-
-
Notifications
You must be signed in to change notification settings - Fork 428
[WIP] [2.0] A maths and hardware intrinsics library for Silk.NET and .NET 5 #173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
||
namespace Silk.NET.Intrinsics.Avx | ||
{ | ||
public partial struct AvxRegister : IRegister<double> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "Register" mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name is poorly chosen, but in the context of the intrinsics library a register is a class capable of performing mathematical operations for a given type using a SIMD register such as AVX or SSE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe IVector
or something would be a better name?
Is Avx
referring to VEX
encoded (128-bit or 256-bit) vectors or strictly 256-bit vectors? If the latter, it might also benefit from a clearer name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah VEX-encoded, but I don't want that leaking out to the user as I want this to be relatively easy to use and not try to introduce too many new concepts to the user - all the user needs to know is "fast maths ooh shiney"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it being VEX encoded an important detail? The operation is ultimately the same, just with better codegen under VEX.
It normally only gets interesting when the size changes, since that changes how much you are processing, etc.
throw new System.NotImplementedException(); | ||
} | ||
|
||
public WorkUnit<float> Normalize2(WorkUnit<float> vector) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure Normalize2
is a "clear" name. I understand what it means with context, but it isn't immediately obvious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
XML docs will cover that when we add them post-development.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might also consider renaming to NormalizeVector2
which would disambiguate and not force users to rely on docs.
throw new System.NotImplementedException(); | ||
} | ||
|
||
public WorkUnit<float> X(WorkUnit<float> vector) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this GetX
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there meant to be a SetX
or WithX
counterpart? (depending on if mutable or immutable)
throw new System.NotImplementedException(); | ||
} | ||
|
||
public WorkUnit<float> NegateMultiplyAddFused(WorkUnit<float> x, WorkUnit<float> y, WorkUnit<float> z) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is an explicit NegateMultiplyAdd
needed? Seems like an optimization the JIT should (and does) do...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will use the FMA register if applicable, otherwise it will use your everyday AVX register.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but I'm not sure that clarifies why it is needed?
I can't think of an optimization you can do knowing that it is -(a * b) + c
vs (a * b) + c
, in which case you should be able to just have MultiplyFusedAdd
and let the JIT optimize MultiplyFusedAdd(Negate(a), b, c)
when FMA is available and just do your normal math, including the negation, when it isn't
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good point, cc @sunkin351
throw new System.NotImplementedException(); | ||
} | ||
|
||
public unsafe WorkUnit<float> ToVector4(float* ptr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this supposed to be a Load
counterpart to Store
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, probably poorly named.
public partial struct AvxRegister | ||
{ | ||
public static WorkUnitFlags Flags { get; } = GetFlags(); | ||
public static WorkUnitFlags Flags128F { get; } = Flags | WorkUnitFlags.Vector128 | WorkUnitFlags.TypeFloat; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Framework design guidelines recommends using the full non language specific name so there is no ambiguity.
That is, these should be Flags128Single
, Flags128UInt64
, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, will rename though these are meant to be private properties that for some reason I left public.
{ | ||
public WorkUnitFlags Flags { get; set; } | ||
public Vector128<T> Vector { get; set; } | ||
public unsafe fixed byte Padding[16]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is fixed byte Padding
and why does it need both the Vector
and the padding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to ensure that you can convert WorkUnit128 to WorkUnit and vice versa using Unsafe.As.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to have a Vector128<T> Reserved { get; set; }
or Vector128<T> Upper
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know, would it? As long as the space gets filled I suppose it doesn't really matter, the WorkUnitXXX types aren't really public-facing APIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fixed sized buffers generally generate security cookies and other bits. Also worse codegen due to being many more fields. Having it be a Vector128<T>
would remove that and clarify how the bits are reserved and may be interpreted.
It would also allow it to be interpreted as an HVA struct for ABI purposes (assuming you do get rid of Flags
like you mentioned might be a consideration below).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah fair enough, the more you know :D
Will implement.
internal struct WorkUnit256<T> where T:unmanaged | ||
{ | ||
public WorkUnitFlags Flags { get; set; } | ||
public Vector256<T> Vector { get; set; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This WorkUnit
is going to be 64-bytes, seems like a lot of wasted space...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly, however typically short-lived.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose of WorkUnitFlags
, they look to encode the type and size of the vector, but it isn't clear how its meant to be used in this context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm considering removing it as I don't think it's worth having. Originally it was gonna hold which set of instructions to use (i.e. AVX or SSE) so that we can have a static class that redirects maths operations to the correct implementation, however I think we can do that without the flags and probably get better treatment from the JIT too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds reasonable. It would also help with throughput due to reduced memory usage, etc.
Generator ops
|
Replaced with #190 |
Summary of the PR
What version does this PR target?
2.0
Related issues, Discord discussions, or proposals
#48
Further Comments
DO NOT SQUASH AND MERGE It must be merged in using a merge commit due to Gamma working on it too.