Allow the linux perf profiler to see Python calls #96143
⚠️ ⚠️ Note for reviewers, hackers and fellow systems/low-level/compiler engineers ⚠️ ⚠️ If you have a lot of experience with this kind of shenanigans and want to improve the **first** version, **please make a PR against my branch**, **reach out by email**, or **suggest code changes directly on GitHub**. If you have any **refinements or optimizations**, please wait until the first version is merged before you start hacking on or proposing those, so we can keep this PR productive.
minor missed test cleanup to use the modern API from the big review. Automerge-Triggered-By: GH:gpshead
That's really up to the user, unfortunately. The files must be available after the process finishes and at report time, so I don't see how we can clean them up automatically: we don't know when the user has finished with them. Or do you mean in the buildbot? In that case, the tests already delete the files they create, so they should not be polluting the machines and these won't pile up on the buildbots.
Yes, I was talking about the desired buildbot config. A tmpwatcher of some form, set to remove tmp files more than a few hours old, is likely sufficient. With the environment variable set to enable perf everywhere, a single test run probably produces hundreds if not thousands of PIDs. ;) (This is where the buildbot design really shows its age. A fresh container per buildbot worker test session would make sense.)
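For illustration, a minimal sketch of the tmpwatcher-style cleanup described above (a hypothetical standalone script, not the actual buildbot configuration):

```python
# Hypothetical tmpwatch-style cleanup for stale perf map files; a sketch of
# the idea described above, not part of the real buildbot config.
import glob
import os
import time

MAX_AGE_SECONDS = 3 * 60 * 60  # "a few hours"

def remove_stale_perf_maps(tmpdir="/tmp"):
    now = time.time()
    for path in glob.glob(os.path.join(tmpdir, "perf-*.map")):
        try:
            if now - os.stat(path).st_mtime > MAX_AGE_SECONDS:
                os.unlink(path)
        except FileNotFoundError:
            pass  # another worker may have removed it already

if __name__ == "__main__":
    remove_stale_perf_maps()
```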
Ah, in that case I don't think it's needed. The tests check the perf files before and after each run and delete any new file that matches a PID spawned during the tests. I will revise this logic to ensure it works correctly when running parallel test suites, but I think that should be enough.
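A rough sketch of that before/after cleanup logic (the helper names here are hypothetical; the real implementation lives in CPython's test suite):

```python
# Sketch of the snapshot-and-delete cleanup described above. Helper names
# are hypothetical; the actual code is in CPython's test suite.
import glob
import os

def snapshot_perf_maps():
    return set(glob.glob("/tmp/perf-*.map"))

def cleanup_new_perf_maps(before, spawned_pids):
    # Delete only map files that appeared during the test run and whose PID
    # matches a process the tests themselves spawned.
    for path in set(glob.glob("/tmp/perf-*.map")) - before:
        pid = os.path.basename(path)[len("perf-"):-len(".map")]
        if pid.isdigit() and int(pid) in spawned_pids:
            try:
                os.unlink(path)
            except FileNotFoundError:
                pass
```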
@gpshead We have a perf buildbot now: https://buildbot.python.org/all/#/builders/1078
Here is the relevant log from the last build:
Sweet! FYI, our TensorFlow folks internally just reworked their equivalent parallel perf-trampoline-hook Python profile enabling work to do what it needs using our backport of these changes to our internal 3.9, soon to be 3.10, runtime. Once they start using that, it hopefully helps shake out any strange issues (of which I'm sure we all hope there are none). 😄 Marking this closed, as I think everything for it is done at this point?
Nothing critical that I can think of. I will probably add some extra examples to the docs, but I can open more issues for that.
That's wonderful. This will allow us to test this more heavily before the final release 👍
…poline files to the Python directory (python#98675)
The Linux `perf` profiler is a very powerful tool, but unfortunately it is not able to see Python calls (only the C stack), and therefore neither it nor its very complete ecosystem can be used to profile Python applications and extensions.

It turns out that Node and the JVM have developed a way to leverage the `perf` profiler for Java and JavaScript frames. They use their JIT compilers to generate a unique area in memory where they place assembly code that in turn calls the frame evaluator function. These JIT-compiled areas are unique per function/code object. They then use perf map files (perf allows you to place a map in `/tmp/perf-PID.map` with information mapping the JIT-ed areas to a string that identifies them), which lets perf map Java/JavaScript names to the JIT-ed areas, basically showing the non-native function names on the stack.

We can do a simple version of this idea in Python by using a very simple JIT compiler that compiles an assembly template which is then used to jump to `PyEval_EvalFrameDefault`, and by placing the code names and filenames in the special `perf` map file. This allows perf to see Python calls as well, and it works with all the tools in the perf ecosystem, like flamegraphs:
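To make the map mechanism concrete, here is a minimal sketch of writing a perf map entry. The line format is perf's documented `START SIZE name` convention; the writer function and the example addresses are illustrative only (CPython's actual implementation is in C):

```python
# Minimal sketch of emitting a perf map entry for a JIT-ed trampoline.
# Each line in /tmp/perf-PID.map is "<start-addr-hex> <size-hex> <name>",
# which is how perf resolves symbols for JIT-generated code. This function
# is illustrative, not CPython's actual (C) implementation.
import os

def write_perf_map_entry(start_addr: int, size: int, name: str) -> None:
    path = f"/tmp/perf-{os.getpid()}.map"
    with open(path, "a") as f:
        f.write(f"{start_addr:x} {size:x} {name}\n")

# e.g. one entry per compiled trampoline, named after the Python function
# (hypothetical address, size, and naming scheme):
write_perf_map_entry(0x7F0000001000, 0x100, "py::my_function:/app/module.py")
```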
See also:
https://www.brendangregg.com/Slides/KernelRecipes_Perf_Events.pdf
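As a usage note, a short sketch of how the feature can be driven from Python code, assuming the `sys` interface that eventually shipped with this work in CPython 3.12 (`sys.activate_stack_trampoline` / `sys.deactivate_stack_trampoline`); the workload function is just a stand-in:

```python
# Sketch: enabling the perf trampoline from Python code, assuming the
# sys interface that shipped with this work in CPython 3.12.
import sys

def workload():
    # Stand-in for the code you actually want to profile.
    return sum(i * i for i in range(1_000_000))

if sys.version_info >= (3, 12):
    sys.activate_stack_trampoline("perf")  # start writing /tmp/perf-PID.map
    workload()
    sys.deactivate_stack_trampoline()
```

Alternatively, per the CPython 3.12 documentation, the whole interpreter can be run under the trampoline with `python -X perf script.py` (or the `PYTHONPERFSUPPORT` environment variable) and profiled with the usual `perf record -g` / `perf report` workflow.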