Skip to content

fix: forward logs directly when invocation is over #613

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

kruskall
Copy link
Member

@kruskall kruskall commented Feb 18, 2025

The lambda extension process data in the background during an invocation:

apm-aws-lambda/app/run.go

Lines 197 to 204 in 3f5620c

// APM Data Processing
backgroundDataSendWg.Add(1)
go func() {
defer backgroundDataSendWg.Done()
if err := app.apmClient.ForwardApmData(invocationCtx); err != nil {
app.logger.Error(err)
}
}()

Data is sent to a channel. This is fine during an invocation when the goroutine is running but it's possible that during shutdown the channel will block because there's no goroutine running.

app.logsClient.FlushData(ctx, event.RequestID, event.InvokedFunctionArn, app.apmClient.ForwardLambdaData, true)

The solution is to bypass the channel and forward the data directly during shutdown.

do not send to the lambda data chan and potentially wait
forever causing a timeout on shutdown
@kruskall kruskall requested a review from rockdaboot February 18, 2025 13:37
@github-actions github-actions bot added the aws-λ-extension AWS Lambda Extension label Feb 18, 2025
@rockdaboot
Copy link
Contributor

Can you describe the issue in the PR description and how this PR solves it?

Co-authored-by: Tim Rühsen <tim.ruhsen@elastic.co>
@kruskall
Copy link
Member Author

Updated! 👍

Copy link
Member

@dmathieu dmathieu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add tests, or is that too unpredictable?

@rockdaboot
Copy link
Contributor

Updated! 👍

Thanks! Maybe I misunderstand the code changes, but it looks to me that you replace direct writes to the channel with a function that writes to the data channel. How is that a difference in the data flow logic and don't we still have the same issue then (channel reader doesn't read any more)?

I somewhat agree with @dmathieu that we should try to create a test that reproduces the issue, and then prove that this code solves the issue. It doesn't have to be a unit test, if that is too involved. Maybe some kind of tool / setup / script, whatever is the simplest way.

@sboomsma
Copy link

just nudging here in case it gets forgotten because the related SDH was auto-closed

@kruskall
Copy link
Member Author

kruskall commented Apr 9, 2025

Thanks! Maybe I misunderstand the code changes, but it looks to me that you replace direct writes to the channel with a function that writes to the data channel. How is that a difference in the data flow logic and don't we still have the same issue then (channel reader doesn't read any more)?

There are three places that have been updated. The first two are directly forwarding the data bypassing the channel, the last one is unchanged and just send to a channel.
For the third case: It's fine to send to a channel inside processEvent because there's someone reading from it and in the worst case the ctx will make the func exit.

I somewhat agree with @dmathieu that we should try to create a test that reproduces the issue, and then prove that this code solves the issue. It doesn't have to be a unit test, if that is too involved. Maybe some kind of tool / setup / script, whatever is the simplest way.

maybe we can reuse the setup from @xrmx ? I'm not sure how feasible it is to write a unit test for this.

@rockdaboot
Copy link
Contributor

For the third case: It's fine to send to a channel inside processEvent because there's someone reading from it and in the worst case the ctx will make the func exit.

Thanks for the explanation.

maybe we can reuse the setup from @xrmx ? I'm not sure how feasible it is to write a unit test for this.

@xrmx Can you run your reproducer again with the changes here and confirm that it fixes the issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aws-λ-extension AWS Lambda Extension
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants