On shutdown wait for lambda logs API to report the final platform report metrics #347
@lahsivjar One question: In case that the metrics get reported much later (e.g. with the shutdown, 45 mins after the last invocation), which timestamp is reported for the corresponding metric? Is it the timestamp of the previous invocation or the timestamp of when the In the latter case, would there be an easy way to use the invocation timestamp?
I think I found the answer to my question :-)
@AlexanderWert The timestamp reported will be the timestamp in the
Looking forward to testing this with my functions that run periodically once a day (right now I don't see any metrics at all, hope this will change it) :-)
Somehow the assignee only got applied to the other extension PR. I tested this on 8.6.0 and it worked fine. I discovered an issue with how we were compiling the test function and opened a PR to fix that (#350).
Motivation
`platform.report` metrics for a Lambda invocation are reported during a future invocation (in most cases the next one). As a result, the report metrics for the last invocation are not available until shutdown and, given the behavior of the extension prior to this PR, the last metric ends up being dropped. For periodic (hourly/daily/weekly) invocations, this leads to no `platform.report` metric being reported.
Solution
During shutdown, the extension gets a deadline of 2 seconds. This PR uses part of those 2 seconds to wait for the logs API to send the `platform.report` metric for the last seen invocation. This wait is executed once for each execution environment/instance that served an invocation (NOT for each invocation). Per our initial benchmarks, the wait lasts for a maximum of 40ms (avg: ~5ms).
While the PR fixes the `platform.report` metric for successful invocations, for function crashes (timeouts/OOMs) it is still possible to miss the last `platform.report`.
Note that the `platform.report` metric for the last invocation can take as much as 45 minutes to be reported, since it is collected when the Lambda execution environment shuts down.
How to test?
`platform.report` metrics in Kibana (can be filtered by the KQL query `faas.billed_duration : *` on the metrics data stream) and assert that the count is the same as the number of function invocations. (Note that it can take up to 45 minutes for all the `platform.report` metrics to be indexed.)
Steps 1 & 2 can be performed by running `cd testing && LAMBDA_RUNTIME=go1.x EC_API_KEY=<ec_api_key> make bench`. By default, this makes 500 requests (shown in the summary generated after the above command). After confirming the test, please run `LAMBDA_RUNTIME=go1.x EC_API_KEY=<ec_api_key> make destroy` to delete the infrastructure.
Related Issues
Related to #334