feat: parse server response and react to error messages #281
Any change to the transport status is guarded by a mutex, but some methods were accessing the field directly, causing race conditions.

The apmproxy tests use a 5s timeout to wait for the status to change during backoff. However, a subtle bug causes the test to fail because the initial delay is exactly 5s. To avoid this, the tests now wait 7s.

Update status name and documentation.

Start backoff mechanism on critical error and log a warning on client errors.
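To illustrate the race fix described above, here is a minimal sketch of a status field that is only ever read or written through mutex-guarded accessors. The `transport` type, the `Status` type, and the `Healthy` and `Failing` values are hypothetical names chosen for the example; only `ClientFailing` appears in this conversation, and none of this is taken from the actual diff.

```go
package main

import (
	"fmt"
	"sync"
)

// Status is a hypothetical type for the transport state.
type Status string

const (
	Healthy       Status = "Healthy"       // assumed name
	ClientFailing Status = "ClientFailing" // name mentioned in the review
	Failing       Status = "Failing"       // assumed name
)

// transport is a hypothetical stand-in for the real client type.
type transport struct {
	mu     sync.RWMutex
	status Status
}

// Status returns the current status under a read lock.
func (t *transport) Status() Status {
	t.mu.RLock()
	defer t.mu.RUnlock()
	return t.status
}

// SetStatus updates the status under a write lock.
func (t *transport) SetStatus(s Status) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.status = s
}

func main() {
	tr := &transport{status: Healthy}

	// Concurrent readers and writers go through the accessors only,
	// so `go run -race` should not report a data race here.
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			tr.SetStatus(Failing)
			_ = tr.Status()
		}()
	}
	wg.Wait()

	fmt.Println(tr.Status())
}
```

The point of the sketch is that no caller touches `status` directly; any method that did so would reintroduce the race the description mentions.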
d726fca to 6d5679d
/test
Looks good overall; I'm mostly wondering about the decision not to back off on auth and validation failures. We should probably have a test or two covering the new states.
// No need to start backoff, this is a temporary status. It usually
// means we went over the limit of events/s.
When would ClientFailing be temporary? From what I can see above, it's either due to auth failure (probably not temporary without user intervention?) or some validation error (probably implies a bug in the agent or server?)
> From what I can see above, it's either due to auth failure (probably not temporary without user intervention?)

Ah, good point! I've made auth errors a critical failure.

> When would ClientFailing be temporary?

From what I could see from the middlewares in the APM server repository, this would happen on data decoding/validation errors, a request body that is too large, or an invalid query. Those errors are tied to a specific request, so I don't think we should trigger a backoff and its associated delay.
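As a rough sketch of the policy discussed here (not the code in this PR), the response status could be mapped to a transport status, with backoff started only for critical failures. The function names, the `Healthy` and `Failing` values, and the exact status-code mapping (401/403 treated as auth errors, other 4xx as request-specific client errors) are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net/http"
)

// Status is a hypothetical type for the transport state.
type Status string

const (
	Healthy       Status = "Healthy"       // assumed name
	ClientFailing Status = "ClientFailing" // name mentioned in the review
	Failing       Status = "Failing"       // assumed name
)

// classify maps an HTTP status code to a transport status.
// The mapping is an assumption, not the APM server contract.
func classify(code int) Status {
	switch {
	case code >= 200 && code < 300:
		return Healthy
	case code == http.StatusUnauthorized || code == http.StatusForbidden:
		// Auth errors are unlikely to resolve without user
		// intervention, so they are treated as critical.
		return Failing
	case code >= 400 && code < 500:
		// Decoding/validation errors, body too large, invalid query:
		// tied to one request, so only a warning is logged.
		return ClientFailing
	default:
		return Failing
	}
}

// shouldBackoff reports whether the backoff mechanism should start.
func shouldBackoff(s Status) bool {
	return s == Failing
}

func main() {
	codes := []int{
		http.StatusAccepted,
		http.StatusBadRequest,
		http.StatusRequestEntityTooLarge,
		http.StatusUnauthorized,
		http.StatusServiceUnavailable,
	}
	for _, code := range codes {
		s := classify(code)
		fmt.Printf("%d -> %s (backoff: %t)\n", code, s, shouldBackoff(s))
	}
}
```

Keeping the classification in one small function makes it straightforward to cover each state with a table-driven test.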
> From what I could see from the middlewares in the APM server repository, this would happen on data decoding/validation errors, request body too large or invalid query. Those are errors tied to a specific request, I don't think we should trigger a backoff and associated delay.

Good point, I think this is fine as is now.
auth errors are not temporary failures
LGTM - thanks for the changes.
@axw any other concerns from your side? I believe this is ready to merge otherwise.
Sorry for the silence, I lost sight of this one. LGTM.
Blocked by #280
Closes #225
Closes #205