Skip to content

simplify missed heartbeat recognition #854

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 27, 2020

Conversation

bollhals
Copy link
Contributor

Proposed Changes

The NotifyHeartbeatListener() was showing up on my analysis. Even though it is fairly small and quick, but it is called for every received frame.

I think this simplified implementation should be fast and as safe as before.
Some comments to the implementation:

  • AFAIK _heartbeatCounter is only updated by a single thread (MainLoop), so no additional thread safety needed.
  • The reading do not need to be 100% exact as 1) as long as they're different it's good enough and 2) they're called so little, they won't suffer reading the wrong cached value and even if, nothing changes until 8 missed heartbeat anyway.

Types of Changes

  • Bug fix (non-breaking change which fixes issue #NNNN)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause an observable behavior change in existing systems)
  • Documentation improvements (corrections, new content, etc)
  • Cosmetic change (whitespace, formatting, etc)

Checklist

  • I have read the CONTRIBUTING.md document
  • I have signed the CA (see https://cla.pivotal.io/sign/rabbitmq)
  • All tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in related repositories

@lukebakken lukebakken self-assigned this May 27, 2020
@michaelklishin
Copy link
Contributor

Yes, we don't need perfect precision here simply for the fact that the absolute recommended minimum for heartbeat timeout is 5 seconds, and in most cases something closer to 15-60 seconds would be reasonable.

Copy link
Contributor

@michaelklishin michaelklishin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use a monotonically growing counter that is incremented every time we read a frame, on a long running connection we should overflow int in less than 3 days at a rate of 10K messages/second.

How much of an improvement does this yield? The "ticking overflowing counter" approach is worrisome to me.

@michaelklishin
Copy link
Contributor

michaelklishin commented May 27, 2020

Actually we would overflow int in less than day since every delivery in practice will use at least three frames:

2147483647/(10000*3*3600*24)

An unsigned 64-bit integer would last us quite a bit longer (over 1900 years or so if my math is correct).

@lukebakken lukebakken force-pushed the simplify.missed.heartbeat branch from 1a4c3ac to 2e8ff5d Compare May 27, 2020 15:46
Copy link
Collaborator

@lukebakken lukebakken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The NotifyHeartbeatListener() was showing up on my analysis

Can we get more details about what "showing up on my analysis" means?

@lukebakken
Copy link
Collaborator

@michaelklishin I tested this commit using the following code. As @bollhals states we don't care what the value of _heartbeatCounter is, just that it changes.

lbakken@shostakovich ~/issues/rabbitmq/rabbitmq-dotnet-client/test-overflow
$ cat Program.cs 
using System;

namespace test_overflow
{
    class Program
    {
        static void Main(string[] args)
        {
            int val;

            unchecked
            {
                val = int.MaxValue + 2;
            }

            Console.WriteLine("0 int.MaxValue += 2 {0}", val);

            unchecked
            {
                val = val + 2;
            }

            Console.WriteLine("1 val = val + 2 {0}", val);                                                                                                                                                                                                                      
        }                                                                                                                                                                                                                                                                       
    }                                                                                                                                                                                                                                                                           
}
                                                                                                                                                                                                                                                                               
lbakken@shostakovich ~/issues/rabbitmq/rabbitmq-dotnet-client/test-overflow                                                                                                                                                                                                     
$ dotnet run                                                                                                                                                                                                                                                                    
0 int.MaxValue += 2 -2147483647                                                                                                                                                                                                                                                 
1 val = val + 2 -2147483645

@michaelklishin
Copy link
Contributor

OK, this looks like a simplification overall then. Let's wait for some profiler numbers to assess the difference.

@bollhals
Copy link
Contributor Author

Thanks for the comments, I thought about the overflow but somehow I thought that it will never happen. Glad you did the math.

I've checked again the implementation and I think we can even make it simpler. Since we don't care about the value, just that it changed, it should also be implementable with just 1 normal bool variable.

@bollhals
Copy link
Contributor Author

The NotifyHeartbeatListener() was showing up on my analysis

Can we get more details about what "showing up on my analysis" means?

Sure, this is from the test application of stebet:
image

@bollhals
Copy link
Contributor Author

image
With the bool implementation (I'll push it soon)

@bollhals
Copy link
Contributor Author

/reminder to squash before merge

Copy link
Collaborator

@lukebakken lukebakken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bollhals I am probably mis-reading the screenshots in this comment, but it seems like the number of method calls to NotifyHeartbeatListener should be equal to the number of calls to ReadFrame. You can see that in your first screenshot, but the second one shows far fewer calls to NotifyHeartbeatListener which seems to suggest that the code run in the second example isn't what is shown in this PR. Please correct me if I am mistaken!

@bollhals
Copy link
Contributor Author

@bollhals I am probably mis-reading the screenshots in this comment, but it seems like the number of method calls to NotifyHeartbeatListener should be equal to the number of calls to ReadFrame. You can see that in your first screenshot, but the second one shows far fewer calls to NotifyHeartbeatListener which seems to suggest that the code run in the second example isn't what is shown in this PR. Please correct me if I am mistaken!

I wasn't sure either, and I had to do something else first.
But to me it looks like the method iteself gets inlined at some point later from the JIT or something else, because when I specify NoInlining to the method, the result is this:
image
image

@lukebakken
Copy link
Collaborator

@bollhals awesome, I really appreciate the explanation.

@lukebakken lukebakken modified the milestones: 7.0.0, 6.1.0 May 27, 2020
@lukebakken lukebakken merged commit ef01562 into rabbitmq:master May 27, 2020
michaelklishin pushed a commit that referenced this pull request May 27, 2020
simplify missed heartbeat recognition

(cherry picked from commit ef01562)
@michaelklishin
Copy link
Contributor

Backported to 6.x.

@bollhals bollhals deleted the simplify.missed.heartbeat branch March 2, 2021 20:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants