On shutdown wait for lambda logs API to report the final platform report metrics #347

lahsivjar · 2022-11-23T07:18:12Z

Motivation

platform.report metrics for a Lambda invocation are delivered during a later invocation (in most cases the next one). As a result, the report metrics for the last invocation are not available until shutdown and, with the extension's behavior prior to this PR, that last metric is dropped. For functions that are invoked periodically (hourly/daily/weekly), this can mean that no platform.report metrics are reported at all.

Solution

During shutdown the extension gets a deadline of 2 seconds. This PR uses part of those 2 seconds to wait for the Logs API to send us the platform.report metric for the last seen invocation. The wait is executed once for each execution env/instance that served an invocation (NOT once for each invocation). In our initial benchmarks, the wait lasted for a maximum of 40ms (avg: ~5ms).

While the PR fixes the platform.report metric for successful invocations, it is still possible to miss the last platform.report when the function crashes (timeouts/OOMs).

Note that the platform.report metric for the last invocation can take as much as 45 minutes to be reported, since it is only collected when the Lambda execution env shuts down.

How to test?

1. Create a Lambda function with the latest version of the extension and configure it to send load to APM-Server.
2. Invoke the Lambda function a specific number of times.
3. Observe the number of platform.report metrics in Kibana (they can be filtered with the KQL query faas.billed_duration : * on the metrics data stream) and assert that it is the same as the number of function invocations. (Note that it can take up to 45 minutes for all the platform.report metrics to be indexed.)

Steps 1 & 2 can be performed by running cd testing && LAMBDA_RUNTIME=go1.x EC_API_KEY=<ec_api_key> make bench. By default, this will make 500 requests (the count is shown in the summary generated after the above command). After confirming the test, please run LAMBDA_RUNTIME=go1.x EC_API_KEY=<ec_api_key> make destroy to delete the infrastructure.

Related Issues

Related to #334

apmmachine · 2022-11-23T07:26:43Z

💚 Build Succeeded

Build stats

Start Time: 2022-11-28T09:32:32.195+0000
Duration: 9 min 36 sec

Test stats 🧪

Test	Results
Failed	0
Passed	202
Skipped	2
Total	204

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

/test : Re-trigger the build.
run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

AlexanderWert · 2022-11-24T12:15:03Z

@lahsivjar One question: In case that the metrics get reported much later (e.g. with the shutdown, 45 mins after the last invocation), which timestamp is reported for the corresponding metric? Is it the timestamp of the previous invocation or the timestamp of when the platform.report event is processed (which is in that case 45min later)?

In the latter case, would there be an easy way to use the invocation timestamp?

AlexanderWert · 2022-11-24T13:11:03Z

I think I found the answer to my question :-)

lahsivjar · 2022-11-24T13:17:01Z

In case that the metrics get reported much later (e.g. with the shutdown, 45 mins after the last invocation), which timestamp is reported for the corresponding metric?

@AlexanderWert The timestamp reported will be the timestamp in the platform.report log event. I am not sure if that time represents the timestamp of the previous invocation or the timestamp at which the log event was generated, but it is definitely not the processing time. Also, in my tests I have observed the shutdown time to be in the range of 5 to 15 minutes (attaching an example of 250 function invocations plotted with the @timestamp field as well as with ingestion-time to give a rough idea).

^ plotted with @timestamp

^ plotted with ingestion-time

AlexanderWert · 2022-11-24T13:34:38Z

Looking forward to testing this with my functions that run periodically once a day (right now I don't see any metrics at all; I hope this will change that) :-)

kruskall · 2022-12-05T15:25:02Z

Somehow the assignee only got applied to the other extension PR.

I tested this on 8.6.0 and it worked fine. I discovered an issue with how we were compiling the test function and opened a PR to fix that (#350).
I followed the "How to test?" section and ran LAMBDA_RUNTIME=go1.x EC_API_KEY=<ec_api_key> STACK_VERSION=8.6.0 make bench. The number of platform.report metrics matches the result from the make task output, and the metrics are reported correctly.

github-actions bot added the aws-λ-extension AWS Lambda Extension label Nov 23, 2022

Wait for the final platform report metrics on shutdown

2fad63b

lahsivjar force-pushed the 334-fix-pf-report branch from 6bfbe07 to 2fad63b Compare November 24, 2022 11:53

lahsivjar marked this pull request as ready for review November 24, 2022 12:09

Fix log message

0897019

lahsivjar requested review from a team and AlexanderWert November 24, 2022 12:18

AlexanderWert approved these changes Nov 24, 2022

lahsivjar added 2 commits November 24, 2022 21:43

Revert flush buffer to 200ms

ad3ebdc

Refactor process logs to not return an error

11a68d4

axw approved these changes Nov 28, 2022

Add changelog

68357b5

lahsivjar enabled auto-merge (squash) November 28, 2022 09:32

lahsivjar merged commit 5ed72b5 into elastic:main Nov 28, 2022

lahsivjar deleted the 334-fix-pf-report branch November 28, 2022 09:47

lahsivjar mentioned this pull request Nov 28, 2022

8.6 Manual Test Plan elastic/apm-server#9556

Closed

2 tasks

kruskall mentioned this pull request Dec 5, 2022

test: disable cgo in go test function #350

Merged

kruskall self-assigned this Dec 5, 2022

kruskall removed their assignment Dec 5, 2022
