
Serialization optimizations #801


Merged
3 commits merged · Apr 7, 2020

Conversation

@stebet (Contributor) commented Apr 3, 2020

Proposed Changes

Optimizations for (de)serialization of short, ushort, int, uint, long, and ulong, basically forcing network byte order at all times, regardless of architecture. This gets rid of a lot of redundant endianness checks, making the code a bit more verbose but performing a lot better. Some code here had actually regressed quite a bit from the original implementation from a pure serialization perspective, although it was still faster overall due to overhead in the old NetworkBinaryReader class. This should bring the best of both worlds :)
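For readers unfamiliar with the technique: a minimal sketch of an endianness-independent write (class and method names here are hypothetical, not the PR's exact code). Shifting always emits the most significant byte first, so the output is network order on both little- and big-endian machines, with no runtime `BitConverter.IsLittleEndian` branch.

```csharp
using System;

internal static class NetworkOrderWriter
{
    // Writes val in network (big-endian) byte order. Bit shifts are
    // architecture-independent, so no endianness check is needed.
    internal static void WriteUInt64(Span<byte> span, ulong val)
    {
        if (span.Length < 8)
        {
            throw new ArgumentOutOfRangeException(nameof(span), "Insufficient length to write UInt64 to memory.");
        }

        span[0] = (byte)(val >> 56);
        span[1] = (byte)(val >> 48);
        span[2] = (byte)(val >> 40);
        span[3] = (byte)(val >> 32);
        span[4] = (byte)(val >> 24);
        span[5] = (byte)(val >> 16);
        span[6] = (byte)(val >> 8);
        span[7] = (byte)val;
    }
}
```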

Types of Changes

What types of changes does your code introduce to this project?

  • Bug fix (non-breaking change)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause an observable behavior change in existing systems)
  • Documentation improvements (corrections, new content, etc)
  • Cosmetic change (whitespace, formatting, etc)

Checklist

  • I have read the CONTRIBUTING.md document
  • I have signed the CA (see https://cla.pivotal.io/sign/rabbitmq)
  • All tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in related repositories

@lukebakken lukebakken self-assigned this Apr 3, 2020
@lukebakken lukebakken added this to the 6.0.0 milestone Apr 3, 2020
@lukebakken (Collaborator)

@stebet thanks again! Just so you know, I'm hoping to wrap up version 6.0.0 next week... are there any more 6.0-compatible surprises you have? 🎁

I'm assuming I can run your benchmark application to see the direct benefit of this change?

@stebet (Contributor, Author) commented Apr 3, 2020

You should be able to use the benchmark app, yes, but I doubt you'll see a big change unless you can manage quite a bit of throughput. This is mostly the result of me benchmarking the old vs. new way of (de)serializing and making some fine-tuning adjustments.

No more surprises for the time being :)

@stebet (Contributor, Author) commented Apr 3, 2020

If you take the code from the TestNetworkByteOrderSerialization test and run that in a tight loop with a profiler attached, you should see a hefty change.

@michaelklishin michaelklishin merged commit 337f96c into rabbitmq:master Apr 7, 2020
@stebet stebet deleted the serializationOptimizations branch April 7, 2020 09:45
@bollhals (Contributor)

@stebet I was wondering why the change away from BinaryPrimitives? Would you mind explaining it to me?

@stebet (Contributor, Author) commented May 23, 2020

> @stebet I was wondering why the change away from BinaryPrimitives? Would you mind explaining it to me?

BinaryPrimitives was slower on .NET Framework. It will probably be reintroduced in 7.0.

@bollhals (Contributor)

If it's only for .NET Framework, would it be an option to compile the .NET Standard target with different code? Or will .NET Framework users also use the .NET Standard one? 🤔

@stebet (Contributor, Author) commented May 25, 2020

It could, but BinaryPrimitives on .NET Core isn't really faster (although it is simpler), so I don't see a point in code branching based on TFMs or compilation symbols for the 6.x branch.
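For context, the TFM-based branching being declined here would look roughly like this (a hedged sketch under the assumption that the `NETFRAMEWORK` symbol is defined for .NET Framework targets; the class name is hypothetical):

```csharp
using System;
using System.Buffers.Binary;

internal static class NetworkOrderSerializerSketch
{
    internal static void WriteUInt64(Span<byte> span, ulong val)
    {
#if NETFRAMEWORK
        // Manual shift-based write: benchmarked faster on .NET Framework,
        // where BinaryPrimitives was slower.
        span[0] = (byte)(val >> 56);
        span[1] = (byte)(val >> 48);
        span[2] = (byte)(val >> 40);
        span[3] = (byte)(val >> 32);
        span[4] = (byte)(val >> 24);
        span[5] = (byte)(val >> 16);
        span[6] = (byte)(val >> 8);
        span[7] = (byte)val;
#else
        // On .NET Core the two are comparable; BinaryPrimitives is simpler.
        BinaryPrimitives.WriteUInt64BigEndian(span, val);
#endif
    }
}
```

Both branches produce identical big-endian output; the trade-off being weighed is maintaining two code paths per type across the 6.x branch.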

@bollhals (Contributor)

At least in my tests (I don't have the numbers anymore; I'll have to dig them back up later) with .NET Core 3.1, converting 10x ulong went from ~50 ns down to ~5 ns by using BinaryPrimitives instead of the manual conversion. (That was the only one I quickly checked.)

@stebet (Contributor, Author) commented May 25, 2020

Did you use BenchmarkDotNet?

@bollhals (Contributor)

Yes, I did. Here are the numbers for my quick test:

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.836 (1909/November2018Update/19H2)
Intel Core i5-8600K CPU 3.60GHz (Coffee Lake), 1 CPU, 6 logical and 6 physical cores
.NET Core SDK=3.1.300
[Host] : .NET Core 3.1.4 (CoreCLR 4.700.20.20201, CoreFX 4.700.20.22101), X64 RyuJIT
ShortRun : .NET Core 3.1.4 (CoreCLR 4.700.20.20201, CoreFX 4.700.20.22101), X64 RyuJIT

Job=ShortRun IterationCount=3 LaunchCount=1
WarmupCount=3

    Method                                    |      Mean |     Error |    StdDev | Code Size | Gen 0 | Gen 1 | Gen 2 | Allocated
    ------------------------------------------|----------:|----------:|----------:|----------:|------:|------:|------:|----------:
    WriteUInt64                               | 57.125 ns | 2.8096 ns | 0.1540 ns |     521 B |     - |     - |     - |         -
    WriteUInt64_Span                          | 40.368 ns | 1.3413 ns | 0.0735 ns |     515 B |     - |     - |     - |         -
    WriteUInt64_SpanAndBinaryPrimitives       | 18.160 ns | 0.1676 ns | 0.0092 ns |     236 B |     - |     - |     - |         -
    WriteUInt64_BinaryPrimitivesInlined       |  7.676 ns | 0.2599 ns | 0.0142 ns |     189 B |     - |     - |     - |         -
    WriteUInt64_MemoryBinaryPrimitivesInlined | 19.719 ns | 0.2569 ns | 0.0141 ns |     214 B |     - |     - |     - |         -
    [ShortRunJob]
    [MemoryDiagnoser]
    [DisassemblyDiagnoser]
    public class NetworkOrderSerializerBenchmark
    {
        private Memory<byte> _memory;
        private ulong _start;
        
        public NetworkOrderSerializerBenchmark()
        {
            _memory = new Memory<byte>(new byte[sizeof(byte) * 100]);
            _start = 50;
        }


        [Benchmark]
        public void WriteUInt64()
        {
            for (ulong i = _start; i < _start + 10; i++)
            {
                NetworkOrderSerializer.WriteUInt64(_memory, i);
            }
        }

        [Benchmark]
        public void WriteUInt64_Span()
        {
            var span = _memory.Span;
            for (ulong i = _start; i < _start + 10; i++)
            {
                NetworkOrderSerializer.WriteUInt64_Span(span, i);
            }
        }

        [Benchmark]
        public void WriteUInt64_SpanAndBinaryPrimitives()
        {
            var span = _memory.Span;
            for (ulong i = _start; i < _start + 10; i++)
            {
                NetworkOrderSerializer.WriteUInt64_SpanAndBinaryPrimitives(span, i);
            }
        }

        [Benchmark]
        public void WriteUInt64_BinaryPrimitivesInlined()
        {
            var span = _memory.Span;
            for (ulong i = _start; i < _start + 10; i++)
            {
                NetworkOrderSerializer.WriteUInt64_SpanAndBinaryPrimitives_Inlinable(span, i);
            }
        }

        [Benchmark]
        public void WriteUInt64_MemoryBinaryPrimitivesInlined()
        {
            for (ulong i = _start; i < _start + 10; i++)
            {
                NetworkOrderSerializer.WriteUInt64_BinaryPrimitives_Inlinable(_memory, i);
            }
        }
    }

/* The corresponding methods in NetworkOrderSerializer: */

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        internal static void WriteUInt64(Memory<byte> memory, ulong val)
        {
            if (memory.Length < 8)
            {
                throw new ArgumentOutOfRangeException(nameof(memory), "Insufficient length to write UInt64 to memory.");
            }

            SerializeUInt64(memory.Span, val);
        }

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        internal static void WriteUInt64_Span(Span<byte> span, ulong val)
        {
            if (span.Length < 8)
            {
                throw new ArgumentOutOfRangeException(nameof(span), "Insufficient length to write UInt64 to memory.");
            }

            SerializeUInt64(span, val);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        internal static void WriteUInt64_SpanAndBinaryPrimitives(Span<byte> span, ulong val)
        {
            BinaryPrimitives.WriteUInt64BigEndian(span, val);
        }

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        internal static void WriteUInt64_SpanAndBinaryPrimitives_Inlinable(Span<byte> span, ulong val)
        {
            BinaryPrimitives.WriteUInt64BigEndian(span, val);
        }

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        internal static void WriteUInt64_BinaryPrimitives_Inlinable(Memory<byte> memory, ulong val)
        {
            BinaryPrimitives.WriteUInt64BigEndian(memory.Span, val);
        }
