
Serialization optimizations #801


Merged
3 commits merged · Apr 7, 2020

Conversation

@stebet (Contributor) commented Apr 3, 2020

Proposed Changes

Optimizations for (de)serialization of short, ushort, int, uint, long, and ulong, basically forcing network byte order at all times, regardless of architecture. This gets rid of a lot of redundant endianness checks, making the code a bit more verbose but performing a lot better. Some code here had actually regressed quite a bit from the original implementation from a pure serialization perspective, although it was still faster overall due to overhead in the old NetworkBinaryReader class. This should bring the best of both worlds :)
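For readers unfamiliar with the technique: a minimal sketch of an endianness-independent write (class and method names here are hypothetical, not the PR's exact code). Shifting always emits the most significant byte first, so the output is network order on both little- and big-endian machines, with no runtime `BitConverter.IsLittleEndian` branch.

```csharp
using System;

internal static class NetworkOrderWriter
{
    // Writes val in network (big-endian) byte order. Bit shifts are
    // architecture-independent, so no endianness check is needed.
    internal static void WriteUInt64(Span<byte> span, ulong val)
    {
        if (span.Length < 8)
        {
            throw new ArgumentOutOfRangeException(nameof(span), "Insufficient length to write UInt64 to memory.");
        }

        span[0] = (byte)(val >> 56);
        span[1] = (byte)(val >> 48);
        span[2] = (byte)(val >> 40);
        span[3] = (byte)(val >> 32);
        span[4] = (byte)(val >> 24);
        span[5] = (byte)(val >> 16);
        span[6] = (byte)(val >> 8);
        span[7] = (byte)val;
    }
}
```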

Types of Changes

What types of changes does your code introduce to this project?

  • Bug fix (non-breaking change)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause an observable behavior change in existing systems)
  • Documentation improvements (corrections, new content, etc)
  • Cosmetic change (whitespace, formatting, etc)

Checklist

  • I have read the CONTRIBUTING.md document
  • I have signed the CA (see https://cla.pivotal.io/sign/rabbitmq)
  • All tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in related repositories

@lukebakken lukebakken self-assigned this Apr 3, 2020
@lukebakken lukebakken added this to the 6.0.0 milestone Apr 3, 2020
@lukebakken (Collaborator)

@stebet thanks again! Just so you know, I'm hoping to wrap up version 6.0.0 next week... are there any more 6.0-compatible surprises you have? 🎁

I'm assuming I can run your benchmark application to see the direct benefit of this change?

@stebet (Contributor, Author) commented Apr 3, 2020

You should be able to use the benchmark app, yes, but I doubt you'll see a big change unless you can manage quite a bit of throughput. This is mostly the result of me benchmarking the old vs. new way of (de)serializing and making some fine-tuning adjustments.

No more surprises for the time being :)

@stebet (Contributor, Author) commented Apr 3, 2020

If you take the code from the TestNetworkByteOrderSerialization test and run that in a tight loop with a profiler attached, you should see a hefty change.

@michaelklishin michaelklishin merged commit 337f96c into rabbitmq:master Apr 7, 2020
@stebet stebet deleted the serializationOptimizations branch April 7, 2020 09:45
@bollhals (Contributor)

@stebet I was wondering why the change away from BinaryPrimitives? Would you mind explaining it to me?

@stebet (Contributor, Author) commented May 23, 2020

> @stebet I was wondering why the change away from BinaryPrimitives? Would you mind explaining it to me?

BinaryPrimitives was slower on .NET Framework. It will probably be reintroduced in 7.0.

@bollhals (Contributor)

If it's only for .NET Framework, would it be an option to compile the .NET Standard target with different code? Or will .NET Framework users also use the .NET Standard one? 🤔

@stebet (Contributor, Author) commented May 25, 2020

It could, but BinaryPrimitives on .NET Core isn't really faster (although it is simpler), so I don't see a point in code branching based on TFMs or compilation symbols for the 6.x branch.
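For context, the TFM-based branching being declined here would look roughly like this (a hedged sketch under the assumption that the `NETFRAMEWORK` symbol is defined for .NET Framework targets; the class name is hypothetical):

```csharp
using System;
using System.Buffers.Binary;

internal static class NetworkOrderSerializerSketch
{
    internal static void WriteUInt64(Span<byte> span, ulong val)
    {
#if NETFRAMEWORK
        // Manual shift-based write: benchmarked faster on .NET Framework,
        // where BinaryPrimitives was slower.
        span[0] = (byte)(val >> 56);
        span[1] = (byte)(val >> 48);
        span[2] = (byte)(val >> 40);
        span[3] = (byte)(val >> 32);
        span[4] = (byte)(val >> 24);
        span[5] = (byte)(val >> 16);
        span[6] = (byte)(val >> 8);
        span[7] = (byte)val;
#else
        // On .NET Core the two are comparable; BinaryPrimitives is simpler.
        BinaryPrimitives.WriteUInt64BigEndian(span, val);
#endif
    }
}
```

Both branches produce identical big-endian output; the trade-off being weighed is maintaining two code paths per type across the 6.x branch.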

@bollhals (Contributor)

At least in my tests (I don't have the numbers anymore; I'll have to dig them back up later) with .NET Core 3.1, converting 10x ulong went from ~50 ns down to ~5 ns by using BinaryPrimitives instead of the manual conversion. (That was the only one I quickly checked.)

@stebet (Contributor, Author) commented May 25, 2020

Did you use BenchmarkDotNet?

@bollhals (Contributor)

Yes, I did. Here are the numbers for my quick test:

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.836 (1909/November2018Update/19H2)
Intel Core i5-8600K CPU 3.60GHz (Coffee Lake), 1 CPU, 6 logical and 6 physical cores
.NET Core SDK=3.1.300
[Host] : .NET Core 3.1.4 (CoreCLR 4.700.20.20201, CoreFX 4.700.20.22101), X64 RyuJIT
ShortRun : .NET Core 3.1.4 (CoreCLR 4.700.20.20201, CoreFX 4.700.20.22101), X64 RyuJIT

Job=ShortRun IterationCount=3 LaunchCount=1
WarmupCount=3

    Method                                    |      Mean |     Error |    StdDev | Code Size | Gen 0 | Gen 1 | Gen 2 | Allocated
    ------------------------------------------|----------:|----------:|----------:|----------:|------:|------:|------:|----------:
    WriteUInt64                               | 57.125 ns | 2.8096 ns | 0.1540 ns |     521 B |     - |     - |     - |         -
    WriteUInt64_Span                          | 40.368 ns | 1.3413 ns | 0.0735 ns |     515 B |     - |     - |     - |         -
    WriteUInt64_SpanAndBinaryPrimitives       | 18.160 ns | 0.1676 ns | 0.0092 ns |     236 B |     - |     - |     - |         -
    WriteUInt64_BinaryPrimitivesInlined       |  7.676 ns | 0.2599 ns | 0.0142 ns |     189 B |     - |     - |     - |         -
    WriteUInt64_MemoryBinaryPrimitivesInlined | 19.719 ns | 0.2569 ns | 0.0141 ns |     214 B |     - |     - |     - |         -
    [ShortRunJob]
    [MemoryDiagnoser]
    [DisassemblyDiagnoser]
    public class NetworkOrderSerializerBenchmark
    {
        private Memory<byte> _memory;
        private ulong _start;
        
        public NetworkOrderSerializerBenchmark()
        {
            _memory = new Memory<byte>(new byte[sizeof(byte) * 100]);
            _start = 50;
        }


        [Benchmark]
        public void WriteUInt64()
        {
            for (ulong i = _start; i < _start + 10; i++)
            {
                NetworkOrderSerializer.WriteUInt64(_memory, i);
            }
        }

        [Benchmark]
        public void WriteUInt64_Span()
        {
            var span = _memory.Span;
            for (ulong i = _start; i < _start + 10; i++)
            {
                NetworkOrderSerializer.WriteUInt64_Span(span, i);
            }
        }

        [Benchmark]
        public void WriteUInt64_SpanAndBinaryPrimitives()
        {
            var span = _memory.Span;
            for (ulong i = _start; i < _start + 10; i++)
            {
                NetworkOrderSerializer.WriteUInt64_SpanAndBinaryPrimitives(span, i);
            }
        }

        [Benchmark]
        public void WriteUInt64_BinaryPrimitivesInlined()
        {
            var span = _memory.Span;
            for (ulong i = _start; i < _start + 10; i++)
            {
                NetworkOrderSerializer.WriteUInt64_SpanAndBinaryPrimitives_Inlinable(span, i);
            }
        }

        [Benchmark]
        public void WriteUInt64_MemoryBinaryPrimitivesInlined()
        {
            for (ulong i = _start; i < _start + 10; i++)
            {
                NetworkOrderSerializer.WriteUInt64_BinaryPrimitives_Inlinable(_memory, i);
            }
        }
    }

/* The corresponding methods in NetworkOrderSerializer: */

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        internal static void WriteUInt64(Memory<byte> memory, ulong val)
        {
            if (memory.Length < 8)
            {
                throw new ArgumentOutOfRangeException(nameof(memory), "Insufficient length to write UInt64 to memory.");
            }

            SerializeUInt64(memory.Span, val);
        }

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        internal static void WriteUInt64_Span(Span<byte> span, ulong val)
        {
            if (span.Length < 8)
            {
                throw new ArgumentOutOfRangeException(nameof(span), "Insufficient length to write UInt64 to memory.");
            }

            SerializeUInt64(span, val);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        internal static void WriteUInt64_SpanAndBinaryPrimitives(Span<byte> span, ulong val)
        {
            BinaryPrimitives.WriteUInt64BigEndian(span, val);
        }

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        internal static void WriteUInt64_SpanAndBinaryPrimitives_Inlinable(Span<byte> span, ulong val)
        {
            BinaryPrimitives.WriteUInt64BigEndian(span, val);
        }

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        internal static void WriteUInt64_BinaryPrimitives_Inlinable(Memory<byte> memory, ulong val)
        {
            BinaryPrimitives.WriteUInt64BigEndian(memory.Span, val);
        }
