Allow the linux perf profiler to see Python calls #96143
⚠️ ⚠️ Note for reviewers, hackers and fellow systems/low-level/compiler engineers ⚠️ ⚠️ If you have a lot of experience with this kind of shenanigans and want to improve the **first** version, **please make a PR against my branch**, **reach out by email**, or **suggest code changes directly on GitHub**. If you have any **refinements or optimizations**, please wait until the first version is merged before you start hacking on or proposing those, so we can keep this PR productive.
minor missed test cleanup to use the modern API from the big review. Automerge-Triggered-By: GH:gpshead
That's really up to the user, unfortunately. The files must be available after the process finishes and at report time, so I don't see how we can clean them up automatically: we don't know when the user has finished with them. Or do you mean in the buildbot? In that case, the tests already delete the files they create, so they should not be polluting the machines and these won't pile up on the buildbots.
Yes, I was talking about the desired buildbot config. A tmpwatcher of some form, set to remove tmp files more than a few hours old, is likely sufficient. With the environment variable set to enable perf everywhere, a single test run probably produces hundreds if not thousands of PIDs. ;) (This is where the buildbot design really shows its age. A fresh container per buildbot worker test session would make sense.)
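For illustration, a minimal sketch of the tmpwatcher-style cleanup described above (a hypothetical standalone script, not the actual buildbot configuration):

```python
# Hypothetical tmpwatch-style cleanup for stale perf map files; a sketch of
# the idea described above, not part of the real buildbot config.
import glob
import os
import time

MAX_AGE_SECONDS = 3 * 60 * 60  # "a few hours"

def remove_stale_perf_maps(tmpdir="/tmp"):
    now = time.time()
    for path in glob.glob(os.path.join(tmpdir, "perf-*.map")):
        try:
            if now - os.stat(path).st_mtime > MAX_AGE_SECONDS:
                os.unlink(path)
        except FileNotFoundError:
            pass  # another worker may have removed it already

if __name__ == "__main__":
    remove_stale_perf_maps()
```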
Ah, in that case I don't think it's needed. The tests check the perf files before and after each run and delete any new file that matches a PID spawned during the tests. I will revise this logic to ensure it works correctly when running parallel test suites, but I think that should be enough.
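A rough sketch of that before/after cleanup logic (the helper names here are hypothetical; the real implementation lives in CPython's test suite):

```python
# Sketch of the snapshot-and-delete cleanup described above. Helper names
# are hypothetical; the actual code is in CPython's test suite.
import glob
import os

def snapshot_perf_maps():
    return set(glob.glob("/tmp/perf-*.map"))

def cleanup_new_perf_maps(before, spawned_pids):
    # Delete only map files that appeared during the test run and whose PID
    # matches a process the tests themselves spawned.
    for path in set(glob.glob("/tmp/perf-*.map")) - before:
        pid = os.path.basename(path)[len("perf-"):-len(".map")]
        if pid.isdigit() and int(pid) in spawned_pids:
            try:
                os.unlink(path)
            except FileNotFoundError:
                pass
```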
@gpshead We have a perf buildbot now: https://buildbot.python.org/all/#/builders/1078
Here is the relevant log from the last build:
Sweet! FYI, our TensorFlow folks internally just reworked their equivalent parallel perf-trampoline-hook Python profile enabling work to do what it needs using our backport of these changes to our internal 3.9, soon to be 3.10, runtime. Once they start using that, it hopefully helps shake out any strange issues (of which I'm sure we all hope there are none). 😄 Marking this closed, as I think everything for it is done at this point?
Nothing critical that I can think of. I will probably add some extra examples to the docs, but I can open more issues for that.
That's wonderful. This will allow us to test this more heavily before the final release 👍
…poline files to the Python directory (python#98675)
The Linux `perf` profiler is a very powerful tool, but unfortunately it is not able to see Python calls (only the C stack), and therefore neither it nor its very complete ecosystem can be used to profile Python applications and extensions.

It turns out that Node and the JVM have developed a way to leverage the `perf` profiler for Java and JavaScript frames. They use their JIT compilers to generate a unique area in memory where they place assembly code that in turn calls the frame evaluator function. These JIT-compiled areas are unique per function/code object. They then use perf map files (perf allows you to place a map in `/tmp/perf-PID.map` with information mapping the JIT-ed areas to a string that identifies them), which lets perf map Java/JavaScript names to the JIT-ed areas, basically showing the non-native function names on the stack.

We can do a simple version of this idea in Python by using a very simple JIT compiler that compiles an assembly template which is then used to jump to `PyEval_EvalFrameDefault`, and by placing the code names and filenames in the special `perf` map file. This allows perf to see Python calls as well, and it works with all the tools in the perf ecosystem, like flamegraphs:
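To make the map mechanism concrete, here is a minimal sketch of writing a perf map entry. The line format is perf's documented `START SIZE name` convention; the writer function and the example addresses are illustrative only (CPython's actual implementation is in C):

```python
# Minimal sketch of emitting a perf map entry for a JIT-ed trampoline.
# Each line in /tmp/perf-PID.map is "<start-addr-hex> <size-hex> <name>",
# which is how perf resolves symbols for JIT-generated code. This function
# is illustrative, not CPython's actual (C) implementation.
import os

def write_perf_map_entry(start_addr: int, size: int, name: str) -> None:
    path = f"/tmp/perf-{os.getpid()}.map"
    with open(path, "a") as f:
        f.write(f"{start_addr:x} {size:x} {name}\n")

# e.g. one entry per compiled trampoline, named after the Python function
# (hypothetical address, size, and naming scheme):
write_perf_map_entry(0x7F0000001000, 0x100, "py::my_function:/app/module.py")
```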
See also:
https://www.brendangregg.com/Slides/KernelRecipes_Perf_Events.pdf
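As a usage note, a short sketch of how the feature can be driven from Python code, assuming the `sys` interface that eventually shipped with this work in CPython 3.12 (`sys.activate_stack_trampoline` / `sys.deactivate_stack_trampoline`); the workload function is just a stand-in:

```python
# Sketch: enabling the perf trampoline from Python code, assuming the
# sys interface that shipped with this work in CPython 3.12.
import sys

def workload():
    # Stand-in for the code you actually want to profile.
    return sum(i * i for i in range(1_000_000))

if sys.version_info >= (3, 12):
    sys.activate_stack_trampoline("perf")  # start writing /tmp/perf-PID.map
    workload()
    sys.deactivate_stack_trampoline()
```

Alternatively, per the CPython 3.12 documentation, the whole interpreter can be run under the trampoline with `python -X perf script.py` (or the `PYTHONPERFSUPPORT` environment variable) and profiled with the usual `perf record -g` / `perf report` workflow.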