[WIP] [2.0] A maths and hardware intrinsics library for Silk.NET and .NET 5 #173

Closed
wants to merge 9 commits into from

Member

@Perksey Perksey commented Apr 24, 2020

Summary of the PR

  • Adds a library for using SIMD instructions with .NET
  • Adds a library containing generic matrices, vectors, and quaternions, as well as their related maths operations.

What version does this PR target?

2.0

Related issues, Discord discussions, or proposals

#48

Further Comments

DO NOT SQUASH AND MERGE. It must be merged using a merge commit because Gamma is working on it too.


namespace Silk.NET.Intrinsics.Avx
{
public partial struct AvxRegister : IRegister<double>
Member

What does "Register" mean?

Member Author

The name is poorly chosen, but in the context of the intrinsics library a register is a type capable of performing mathematical operations for a given element type using a SIMD instruction set such as AVX or SSE.

Member

Maybe IVector or something would be a better name?

Is Avx referring to VEX encoded (128-bit or 256-bit) vectors or strictly 256-bit vectors? If the latter, it might also benefit from a clearer name.

Member Author

Yeah, VEX-encoded, but I don't want that leaking out to the user. I want this to be relatively easy to use without introducing too many new concepts: all the user needs to know is "fast maths ooh shiny".

Member

Is it being VEX encoded an important detail? The operation is ultimately the same, just with better codegen under VEX.
It normally only gets interesting when the size changes, since that changes how much you are processing, etc.

throw new System.NotImplementedException();
}

public WorkUnit<float> Normalize2(WorkUnit<float> vector)
Member

I'm not sure Normalize2 is a "clear" name. I understand what it means with context, but it isn't immediately obvious.

Member Author

XML docs will cover that when we add them post-development.

Member

You might also consider renaming to NormalizeVector2 which would disambiguate and not force users to rely on docs.

throw new System.NotImplementedException();
}

public WorkUnit<float> X(WorkUnit<float> vector)
Member

Is this GetX?

Member Author

Yes.

Member

Is there meant to be a SetX or WithX counterpart? (depending on if mutable or immutable)
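
For reference, the getter and an immutable WithX counterpart could be sketched on top of the in-box Vector128<T> extension methods GetElement and WithElement. The class and method names below are hypothetical and not part of this PR; this only illustrates the immutable shape being asked about.

```csharp
using System.Runtime.Intrinsics;

// Hypothetical helper, not part of the PR: shows a GetX / WithX pair
// for an immutable vector wrapper.
public static class ElementAccessSketch
{
    // Reads element 0 (the X component).
    public static float GetX(Vector128<float> vector) => vector.GetElement(0);

    // Returns a copy with element 0 replaced; the input is left unchanged.
    public static Vector128<float> WithX(Vector128<float> vector, float x)
        => vector.WithElement(0, x);
}
```

A mutable design would expose SetX on the owning type instead; WithX fits a value type that is passed around by copy.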

throw new System.NotImplementedException();
}

public WorkUnit<float> NegateMultiplyAddFused(WorkUnit<float> x, WorkUnit<float> y, WorkUnit<float> z)
Member

Why is an explicit NegateMultiplyAdd needed? Seems like an optimization the JIT should (and does) do...

Member Author

This will use the FMA register if applicable, otherwise it will use your everyday AVX register.

Member

Sure, but I'm not sure that clarifies why it is needed?

I can't think of an optimization you can do knowing that it is -(a * b) + c vs (a * b) + c. In that case you should be able to just have MultiplyFusedAdd and let the JIT optimize MultiplyFusedAdd(Negate(a), b, c) when FMA is available, and just do your normal math, including the negation, when it isn't.

Member Author

Very good point, cc @sunkin351
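
A minimal sketch of the composition suggested above, assuming 128-bit float vectors: a single MultiplyAdd that uses FMA when available, plus a separate Negate, so that -(a * b) + c is written as MultiplyAdd(Negate(a), b, c). The class and method names are hypothetical; Fma.MultiplyAdd, Sse.Add, Sse.Multiply and Sse.Xor are the existing System.Runtime.Intrinsics.X86 calls. Whether the JIT folds the negation into a single instruction is not shown here.

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical names; a sketch of the reviewer's suggestion, not the PR's code.
public static class FusedSketch
{
    // Computes (a * b) + c per element, fused when FMA is available.
    public static Vector128<float> MultiplyAdd(
        Vector128<float> a, Vector128<float> b, Vector128<float> c)
    {
        if (Fma.IsSupported)
            return Fma.MultiplyAdd(a, b, c);
        if (Sse.IsSupported)
            return Sse.Add(Sse.Multiply(a, b), c);

        // Scalar fallback for platforms without SSE.
        return Vector128.Create(
            a.GetElement(0) * b.GetElement(0) + c.GetElement(0),
            a.GetElement(1) * b.GetElement(1) + c.GetElement(1),
            a.GetElement(2) * b.GetElement(2) + c.GetElement(2),
            a.GetElement(3) * b.GetElement(3) + c.GetElement(3));
    }

    // Flips the sign bit of every element.
    public static Vector128<float> Negate(Vector128<float> v)
    {
        if (Sse.IsSupported)
            return Sse.Xor(v, Vector128.Create(int.MinValue).AsSingle());

        return Vector128.Create(
            -v.GetElement(0), -v.GetElement(1), -v.GetElement(2), -v.GetElement(3));
    }

    // -(a * b) + c expressed through the two primitives above.
    public static Vector128<float> NegateMultiplyAdd(
        Vector128<float> a, Vector128<float> b, Vector128<float> c)
        => MultiplyAdd(Negate(a), b, c);
}
```

With this shape the public surface only needs MultiplyAdd and Negate; the negated variant becomes a one-line convenience rather than a separate per-instruction-set implementation.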

throw new System.NotImplementedException();
}

public unsafe WorkUnit<float> ToVector4(float* ptr)
Member

Is this supposed to be a Load counterpart to Store?

Member Author

Yeah, probably poorly named.

public partial struct AvxRegister
{
public static WorkUnitFlags Flags { get; } = GetFlags();
public static WorkUnitFlags Flags128F { get; } = Flags | WorkUnitFlags.Vector128 | WorkUnitFlags.TypeFloat;
Member

The Framework Design Guidelines recommend using the full, non-language-specific type name so there is no ambiguity.

That is, these should be Flags128Single, Flags128UInt64, etc.

Member Author

Gotcha, will rename, though these are meant to be private properties that I left public for some reason.

{
public WorkUnitFlags Flags { get; set; }
public Vector128<T> Vector { get; set; }
public unsafe fixed byte Padding[16];
Member

Why is there a fixed byte Padding, and why does it need both the Vector and the padding?

Member Author

This is to ensure that you can convert WorkUnit128 to WorkUnit and vice versa using Unsafe.As.

Member

Would it be better to have a Vector128<T> Reserved { get; set; } or Vector128<T> Upper?

Member Author

I don't know, would it? As long as the space gets filled, I suppose it doesn't really matter; the WorkUnitXXX types aren't really public-facing APIs.

Member

Fixed-size buffers generally cause security cookies and other overhead to be generated. They also lead to worse codegen because they are treated as many more fields. Having it be a Vector128<T> would remove that and clarify how the bits are reserved and may be interpreted.
It would also allow it to be interpreted as an HVA struct for ABI purposes (assuming you do get rid of Flags like you mentioned might be a consideration below).

Member Author

Ah fair enough, the more you know :D

Will implement.
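
A rough sketch of the suggested layout, assuming the Flags member is dropped as discussed further down: the 128-bit unit reserves its upper half with a second Vector128<float> rather than a fixed byte buffer, which keeps it the same size as the 256-bit unit so Unsafe.As can reinterpret one as the other. The type names here are hypothetical stand-ins for WorkUnit128 and WorkUnit256, and float is used instead of a generic T for brevity.

```csharp
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Hypothetical stand-in for WorkUnit128: the upper 16 bytes are reserved
// with a vector field instead of "fixed byte Padding[16]".
[StructLayout(LayoutKind.Sequential)]
public struct WorkUnit128Sketch
{
    public Vector128<float> Vector;
    public Vector128<float> Upper;
}

// Hypothetical stand-in for WorkUnit256.
[StructLayout(LayoutKind.Sequential)]
public struct WorkUnit256Sketch
{
    public Vector256<float> Vector;
}

public static class ReinterpretSketch
{
    // Views the same 32 bytes as a 256-bit unit; no data is copied.
    public static ref WorkUnit256Sketch AsWide(ref WorkUnit128Sketch unit)
        => ref Unsafe.As<WorkUnit128Sketch, WorkUnit256Sketch>(ref unit);
}
```

Both structs are 32 bytes, so the reinterpretation stays within the bounds of the original value; the lower four floats of the wide view are the Vector field and the upper four are the reserved field.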

internal struct WorkUnit256<T> where T:unmanaged
{
public WorkUnitFlags Flags { get; set; }
public Vector256<T> Vector { get; set; }
Member

This WorkUnit is going to be 64 bytes, which seems like a lot of wasted space...

Member Author

Possibly, but these are typically short-lived.

Member

What's the purpose of WorkUnitFlags? They look to encode the type and size of the vector, but it isn't clear how it's meant to be used in this context.

Member Author

I'm considering removing it as I don't think it's worth having. Originally it was going to hold which set of instructions to use (i.e. AVX or SSE) so that we could have a static class that redirects maths operations to the correct implementation. However, I think we can do that without the flags and probably get better treatment from the JIT too.

Member

That sounds reasonable. It would also help with throughput due to reduced memory usage, etc.
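
One way the flag-free redirection could look, as a sketch only: a static class that branches on the IsSupported properties instead of reading a flags field from each work unit. The class name and the choice of Add as the example operation are assumptions; Avx.Add, Sse.Add, GetLower, GetUpper and Vector256.Create are existing APIs.

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical dispatcher; not the PR's implementation.
public static class DispatchSketch
{
    // Adds two 8-element float vectors using the widest available path.
    public static Vector256<float> Add(Vector256<float> left, Vector256<float> right)
    {
        if (Avx.IsSupported)
            return Avx.Add(left, right);

        if (Sse.IsSupported)
        {
            // Process the two 128-bit halves separately.
            return Vector256.Create(
                Sse.Add(left.GetLower(), right.GetLower()),
                Sse.Add(left.GetUpper(), right.GetUpper()));
        }

        // Scalar fallback.
        return Vector256.Create(
            left.GetElement(0) + right.GetElement(0),
            left.GetElement(1) + right.GetElement(1),
            left.GetElement(2) + right.GetElement(2),
            left.GetElement(3) + right.GetElement(3),
            left.GetElement(4) + right.GetElement(4),
            left.GetElement(5) + right.GetElement(5),
            left.GetElement(6) + right.GetElement(6),
            left.GetElement(7) + right.GetElement(7));
    }
}
```

Because the IsSupported checks are resolved by the JIT for the current machine, the branches that cannot be taken are typically eliminated, and no per-value flags need to be stored or read.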

@Perksey Perksey added this to the 2.0 milestone Apr 30, 2020
Member Author

Perksey commented May 27, 2020

Generator ops

  • LD1/LD2/LD3/LD4 - loads a value and stores it into a vector
  • ST1/ST2/ST3/ST4 - stores the vector.
  • LDC - load constant value
  • LDB - loads a shuffle value
  • AND - and operation
  • XOR - exclusive or
  • OR - or
  • ADD.H - horizontal add
  • ADD - add
  • SUB - subtract
  • MUL - multiply
  • DIV - divide
  • REC - reciprocal
  • SQRT - square root
  • RECSQRT - reciprocal square root
  • MAX - maximum
  • MIN - minimum
  • RND - round to nearest integer
  • RND.Z - round to zero (truncate)
  • RND.L - round down (floor)
  • RND.H - round up (ceiling)
  • EQ - compare equal
  • NEQ - compare not equal
  • GT - compare greater than
  • LT - compare less than
  • GTE - greater than or equal
  • LTE - less than or equal
  • STV - broadcasts a scalar to a vector
  • PMT - permutes a vector using a control byte
  • SHF - shuffles two vectors using a control byte
  • SIN - computes the sine of a vector's elements
  • COS - computes the cosine of a vector's elements
  • TAN - computes the tangent of a vector's elements
  • SIN.A - computes the sine of a vector's elements (approx)
  • COS.A - computes the cosine of a vector's elements (approx)
  • TAN.A - computes the tangent of a vector's elements (approx)
  • ATAN - atan
  • ATAN2 - atan2
  • DOT2/DOT3/DOT4 - dot product
  • CRO2/CRO3/CRO4 - cross product

Member Author

Perksey commented Jun 17, 2020

Replaced with #190

@Perksey Perksey closed this Jun 17, 2020
@Perksey Perksey deleted the maths branch June 17, 2020 18:12
2 participants