Commit 0f53df8

[CSSPGO][llvm-profgen] Fix external address issues of perf reader (return to external addr part)
Previously we had an issue with artificial LBRs whose source is a return. Recall that "an internal code (A) can return to an external address, then from the external address call a new internal code (B); making an artificial branch that looks like a return from A to B can confuse the unwinder". We simply ignored the LBRs after such an artificial LBR, which can miss some samples. This change fixes that by unwinding them correctly instead of ignoring them.

Typical scenarios covered by this change:

1) Multiple sequential callbacks happen at an external address, e.g.
```
[ext, call, foo] [foo, return, ext] [ext, call, bar]
```
The unwinder should avoid having foo return from bar. The wrong call stack looks like [foo, bar].

2) The call stack before and after an external call should be unwound correctly.
```
{call stack1}                                        {call stack2}
[foo, call, ext] [ext, call, bar] [bar, return, ext] [ext, return, foo ]
```
Call stack 1 should be the same as call stack 2. Neither should be truncated.

3) The call stack should be truncated after a call into external code, since we can't do inlining with external code.
```
[foo, call, ext] [ext, call, bar] [bar, call, baz] [baz, return, bar ] [bar, return, ext]
```
The call stack of code in baz should not include foo.

### Implementation:

We leverage an artificial frame to fix #2 and #3: when we get a return artificial LBR, we push an extra artificial frame onto the stack. When we pop a frame, we check whether the parent is an artificial frame and, if so, pop it too (fix #2). A call/return artificial LBR is therefore handled just like a regular LBR, which keeps the call stack intact.

While recording the context on the trie, the artificial frame is used as a tag indicating that we should truncate the call stack (fix #3).

To differentiate #1 and #2, we leverage `getCallAddrFromFrameAddr`. Normally the target of a return is the instruction right after a call instruction, and `getCallAddrFromFrameAddr` returns the address of that call instruction. Otherwise, `getCallAddrFromFrameAddr` returns 0, which is the case of #1.
Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D115550
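
The artificial-frame idea behind fixes #2 and #3 can be sketched in isolation. This is only a simplified model, not the llvm-profgen implementation: frames are function names instead of addresses, and `ExternalFrame`, `CallStack`, `contextOf` and the two free functions are hypothetical names chosen for illustration. As in the unwinder, branch records are walked from newest to oldest, so undoing a return pushes a frame and undoing a call pops one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical marker for "somewhere outside the profiled binary". The real
// tool reserves a special address for this; a name is used here for clarity.
static const std::string ExternalFrame = "<external>";

// Call stack with the root first and the leaf last.
using CallStack = std::vector<std::string>;

// Undo a return from Callee. The leaf is assumed to be the caller's frame.
// For an artificial return (Callee returned into external code, which later
// returned to the caller) an external frame is pushed in between.
void unwindReturn(CallStack &Stack, const std::string &Callee,
                  bool IsArtificial) {
  assert(!Stack.empty() && "the caller's frame is expected on the stack");
  if (IsArtificial)
    Stack.push_back(ExternalFrame);
  Stack.push_back(Callee);
}

// Undo a call made from Caller. An artificial call first pops the callee so
// that the matching external frame becomes the leaf; if there is no external
// frame to match, the sample started inside external code and unwinding
// continues with a context-less stack.
void unwindCall(CallStack &Stack, const std::string &Caller,
                bool IsArtificial) {
  assert(!Stack.empty() && "a leaf frame is expected on the stack");
  if (IsArtificial) {
    bool ParentIsExternal =
        Stack.size() >= 2 && Stack[Stack.size() - 2] == ExternalFrame;
    if (!ParentIsExternal) {
      Stack.assign(1, Caller);
      return;
    }
    Stack.pop_back(); // Drop the callee; the external frame is now the leaf.
  }
  // Regular handling: pop when the parent matches the caller, otherwise
  // replace the leaf with the caller.
  if (Stack.size() >= 2 && Stack[Stack.size() - 2] == Caller)
    Stack.pop_back();
  else
    Stack.back() = Caller;
}

// Context recorded for the leaf. Everything up to and including the nearest
// external frame is dropped, which is the truncation described in fix #3.
std::string contextOf(const CallStack &Stack) {
  std::size_t Begin = 0;
  for (std::size_t I = 0; I < Stack.size(); ++I)
    if (Stack[I] == ExternalFrame)
      Begin = I + 1;
  std::string Context;
  for (std::size_t I = Begin; I < Stack.size(); ++I) {
    if (!Context.empty())
      Context += " @ ";
    Context += Stack[I];
  }
  return Context;
}
```

Starting from the stack `{main, foo}`, undoing an artificial return from `bar` gives `{main, foo, <external>, bar}`, whose recorded context is just `bar`. Undoing the matching artificial call from `foo` then restores `{main, foo}`, so the context of `foo` is the same before and after the callback.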
1 parent 30c3aba commit 0f53df8

7 files changed, +251 -34 lines changed
Binary file not shown.
@@ -0,0 +1,28 @@
+7fe8d7620597
+4007b0
+7fe8d727e493
+5541f689495641d7
+0x40069e/0x7fe8d7620597/P/-/-/2 0x7fe8d7620595/0x400690/P/-/-/1 0x400751/0x7fe8d762058b/P/-/-/1 0x40072b/0x40074c/P/-/-/5 0x4006ce/0x400720/P/-/-/4 0x40071b/0x4006c0/P/-/-/7 0x7fe8d76205a3/0x400715/P/-/-/1 0x4006ae/0x7fe8d7620597/P/-/-/2 0x7fe8d7620595/0x4006a0/P/-/-/3 0x40069e/0x7fe8d762058b/P/-/-/3 0x7fe8d7620589/0x400690/P/-/-/4 0x400590/0x7fe8d7620560/P/-/-/1 0x400710/0x400590/P/-/-/10 0x4006be/0x4006ec/P/-/-/2 0x4006e7/0x4006b0/P/-/-/3 0x400747/0x4006d0/P/-/-/2 0x7fe8d7620589/0x400730/P/-/-/4 0x400590/0x7fe8d7620560/P/-/-/1 0x4007ab/0x400590/P/-/-/2 0x4007bf/0x40077d/P/-/-/1 0x7fe8d76205a3/0x4007b0/P/-/-/2 0x40069e/0x7fe8d7620597/P/-/-/2 0x7fe8d7620595/0x400690/P/-/-/1 0x400751/0x7fe8d762058b/P/-/-/1 0x40072b/0x40074c/P/-/-/4 0x4006ce/0x400720/P/-/-/4 0x40071b/0x4006c0/P/-/-/3 0x7fe8d76205a3/0x400715/P/-/-/4 0x4006ae/0x7fe8d7620597/P/-/-/2 0x7fe8d7620595/0x4006a0/P/-/-/3 0x40069e/0x7fe8d762058b/P/-/-/3 0x7fe8d7620589/0x400690/P/-/-/4
+
+4006ec
+40074c
+7fe8d762058b
+4007b0
+7fe8d727e493
+5541f689495641d7
+0x4006be/0x4006ec/P/-/-/2 0x4006e7/0x4006b0/P/-/-/3 0x400747/0x4006d0/P/-/-/2 0x7fe8d7620589/0x400730/P/-/-/4 0x400590/0x7fe8d7620560/P/-/-/1 0x4007ab/0x400590/P/-/-/2 0x4007bf/0x40077d/P/-/-/3 0x7fe8d76205a3/0x4007b0/P/-/-/2 0x40069e/0x7fe8d7620597/P/-/-/2 0x7fe8d7620595/0x400690/P/-/-/1 0x400751/0x7fe8d762058b/P/-/-/1 0x40072b/0x40074c/P/-/-/7 0x4006ce/0x400720/P/-/-/4 0x40071b/0x4006c0/P/-/-/2 0x7fe8d76205a3/0x400715/P/-/-/1 0x4006ae/0x7fe8d7620597/P/-/-/2 0x7fe8d7620595/0x4006a0/P/-/-/3 0x40069e/0x7fe8d762058b/P/-/-/4 0x7fe8d7620589/0x400690/P/-/-/6 0x400590/0x7fe8d7620560/P/-/-/1 0x400710/0x400590/P/-/-/5 0x4006be/0x4006ec/P/-/-/3 0x4006e7/0x4006b0/P/-/-/3 0x400747/0x4006d0/P/-/-/2 0x7fe8d7620589/0x400730/P/-/-/4 0x400590/0x7fe8d7620560/P/-/-/1 0x4007ab/0x400590/P/-/-/2 0x4007bf/0x40077d/P/-/-/3 0x7fe8d76205a3/0x4007b0/P/-/-/2 0x40069e/0x7fe8d7620597/P/-/-/2 0x7fe8d7620595/0x400690/P/-/-/1 0x400751/0x7fe8d762058b/P/-/-/1
+
+40074c
+7fe8d762058b
+4007b0
+7fe8d727e493
+5541f689495641d7
+0x40072b/0x40074c/P/-/-/6 0x4006ce/0x400720/P/-/-/8 0x40071b/0x4006c0/P/-/-/1 0x7fe8d76205a3/0x400715/P/-/-/2 0x4006ae/0x7fe8d7620597/P/-/-/2 0x7fe8d7620595/0x4006a0/P/-/-/1 0x40069e/0x7fe8d762058b/P/-/-/2 0x7fe8d7620589/0x400690/P/-/-/4 0x400590/0x7fe8d7620560/P/-/-/1 0x400710/0x400590/P/-/-/2 0x4006be/0x4006ec/P/-/-/2 0x4006e7/0x4006b0/P/-/-/3 0x400747/0x4006d0/P/-/-/2 0x7fe8d7620589/0x400730/P/-/-/4 0x400590/0x7fe8d7620560/P/-/-/1 0x4007ab/0x400590/P/-/-/2 0x4007bf/0x40077d/P/-/-/1 0x7fe8d76205a3/0x4007b0/P/-/-/2 0x40069e/0x7fe8d7620597/P/-/-/2 0x7fe8d7620595/0x400690/P/-/-/1 0x400751/0x7fe8d762058b/P/-/-/1 0x40072b/0x40074c/P/-/-/4 0x4006ce/0x400720/P/-/-/4 0x40071b/0x4006c0/P/-/-/10 0x7fe8d76205a3/0x400715/P/-/-/1 0x4006ae/0x7fe8d7620597/P/-/-/2 0x7fe8d7620595/0x4006a0/P/-/-/3 0x40069e/0x7fe8d762058b/P/-/-/2 0x7fe8d7620589/0x400690/P/-/-/4 0x400590/0x7fe8d7620560/P/-/-/1 0x400710/0x400590/P/-/-/6 0x4006be/0x4006ec/P/-/-/4
+
+400720
+40074c
+7fe8d762058b
+4007b0
+7fe8d727e493
+5541f689495641d7
+0x4006ce/0x400720/P/-/-/4 0x40071b/0x4006c0/P/-/-/3 0x7fe8d76205a3/0x400715/P/-/-/1 0x4006ae/0x7fe8d7620597/P/-/-/2 0x7fe8d7620595/0x4006a0/P/-/-/3 0x40069e/0x7fe8d762058b/P/-/-/7 0x7fe8d7620589/0x400690/P/-/-/5 0x400590/0x7fe8d7620560/P/-/-/1 0x400710/0x400590/P/-/-/5 0x4006be/0x4006ec/P/-/-/3 0x4006e7/0x4006b0/P/-/-/3 0x400747/0x4006d0/P/-/-/2 0x7fe8d7620589/0x400730/P/-/-/4 0x400590/0x7fe8d7620560/P/-/-/1 0x4007ab/0x400590/P/-/-/2 0x4007bf/0x40077d/P/-/-/2 0x7fe8d76205a3/0x4007b0/P/-/-/2 0x40069e/0x7fe8d7620597/P/-/-/2 0x7fe8d7620595/0x400690/P/-/-/1 0x400751/0x7fe8d762058b/P/-/-/1 0x40072b/0x40074c/P/-/-/4 0x4006ce/0x400720/P/-/-/4 0x40071b/0x4006c0/P/-/-/2 0x7fe8d76205a3/0x400715/P/-/-/2 0x4006ae/0x7fe8d7620597/P/-/-/2 0x7fe8d7620595/0x4006a0/P/-/-/3 0x40069e/0x7fe8d762058b/P/-/-/2 0x7fe8d7620589/0x400690/P/-/-/4 0x400590/0x7fe8d7620560/P/-/-/1 0x400710/0x400590/P/-/-/5 0x4006be/0x4006ec/P/-/-/2 0x4006e7/0x4006b0/P/-/-/3
@@ -0,0 +1,114 @@
+; RUN: llvm-profgen --format=text --perfscript=%S/Inputs/callback-external-addr.perfscript --binary=%S/Inputs/callback-external-addr.perfbin --output=%t --skip-symbolization
+; RUN: FileCheck %s --input-file %t --check-prefix=CHECK-UNWINDER
+
+; Test if call stack is correctly truncated.
+; CHECK-UNWINDER-NOT: main:3 @ bar
+; CHECK-UNWINDER-NOT: main:3 @ foo
+; CHECK-UNWINDER-NOT: qux:3 @ baz
+; CHECK-UNWINDER-NOT: qux:3 @ bar
+
+; Test if return to wrong internal target
+; CHECK-UNWINDER-NOT: baz:0 @ bar
+; CHECK-UNWINDER-NOT: bar:0 @ baz
+; CHECK-UNWINDER-NOT: baz:0 @ main
+; CHECK-UNWINDER-NOT: bar:0 @ foo
+; CHECK-UNWINDER-NOT: baz:0 @ qux
+
+; Test for callback return from internal address to external address.
+; [foo:2 @ qux:2 @ callBeforeReturn] and [foo:2 @ qux:4 @ callAfterReturn] should exist
+; which means the callback return won't interrupt the previous call stack
+
+; CHECK-UNWINDER: [bar]
+; CHECK-UNWINDER: 1
+; CHECK-UNWINDER: 690-69e:12
+; CHECK-UNWINDER: 0
+; CHECK-UNWINDER: [baz]
+; CHECK-UNWINDER: 1
+; CHECK-UNWINDER: 6a0-6ae:7
+; CHECK-UNWINDER: 0
+; CHECK-UNWINDER: [foo]
+; CHECK-UNWINDER: 2
+; CHECK-UNWINDER: 730-747:5
+; CHECK-UNWINDER: 74c-751:5
+; CHECK-UNWINDER: 1
+; CHECK-UNWINDER: 747->6d0:5
+; CHECK-UNWINDER: [foo:2 @ qux]
+; CHECK-UNWINDER: 4
+; CHECK-UNWINDER: 6d0-6e7:5
+; CHECK-UNWINDER: 6ec-710:6
+; CHECK-UNWINDER: 715-71b:7
+; CHECK-UNWINDER: 720-72b:6
+; CHECK-UNWINDER: 3
+; CHECK-UNWINDER: 6e7->6b0:6
+; CHECK-UNWINDER: 71b->6c0:7
+; CHECK-UNWINDER: 72b->74c:6
+; CHECK-UNWINDER: [foo:2 @ qux:2 @ callBeforeReturn]
+; CHECK-UNWINDER: 1
+; CHECK-UNWINDER: 6b0-6be:6
+; CHECK-UNWINDER: 1
+; CHECK-UNWINDER: 6be->6ec:7
+; CHECK-UNWINDER: [foo:2 @ qux:4 @ callAfterReturn]
+; CHECK-UNWINDER: 1
+; CHECK-UNWINDER: 6c0-6ce:7
+; CHECK-UNWINDER: 1
+; CHECK-UNWINDER: 6ce->720:7
+; CHECK-UNWINDER: [main]
+; CHECK-UNWINDER: 2
+; CHECK-UNWINDER: 77d-7ab:5
+; CHECK-UNWINDER: 7b0-7bf:5
+; CHECK-UNWINDER: 1
+; CHECK-UNWINDER: 7bf->77d:5
+
+; libcallback.c
+; clang -shared -fPIC -o libcallback.so libcallback.c
+
+int callback(int *cnt, int (*func1)(int), int (*func2)(int), int p) {
+  (*cnt)++;
+  return func1(p) + func2(p);
+}
+
+; test.c
+; clang test.c -O0 -g -fno-optimize-sibling-calls -fdebug-info-for-profiling -L $PWD -lcallback -fno-inline
+
+#include <stdio.h>
+
+int callbackCnt = 0;
+
+int callback(int *cnt, int (*func1)(int), int (*func2)(int), int p);
+
+int bar(int p) {
+  return p + 1;
+}
+
+int baz(int p) {
+  return p - 1;
+}
+
+int callBeforeReturn(int p) {
+  return p + 10;
+}
+
+int callAfterReturn(int p) {
+  return p - 10;
+}
+
+int qux(int p) {
+  p += 10;
+  int ret = callBeforeReturn(p);
+  ret = callback(&callbackCnt, bar, baz, ret);
+  ret = callAfterReturn(ret);
+  return ret;
+}
+
+int foo (int p) {
+  p -= 10;
+  return qux(p);
+}
+
+int main(void) {
+  int sum = 0;
+  for (int i = 0; i < 1000 * 1000; i++) {
+    sum += callback(&callbackCnt, foo, bar, i);
+  }
+  printf("callback count=%d, sum=%d\n", callbackCnt, sum);
+}

llvm/test/tools/llvm-profgen/inline-noprobe2.test

+5 -2
@@ -8,8 +8,11 @@
 ; RUN: llvm-profgen --format=extbinary --perfscript=%S/Inputs/inline-noprobe2.perfscript --binary=%S/Inputs/inline-noprobe2.perfbin --output=%t --populate-profile-symbol-list=1
 ; RUN: llvm-profdata show -show-prof-sym-list -sample %t | FileCheck %s --check-prefix=CHECK-SYM-LIST

-; CHECK-ARTIFICIAL-BRANCH: 0
-; CHECK-ARTIFICIAL-BRANCH: 0
+; CHECK-ARTIFICIAL-BRANCH: 2
+; CHECK-ARTIFICIAL-BRANCH: 400870-400870:2
+; CHECK-ARTIFICIAL-BRANCH: 400875-4008bf:1
+; CHECK-ARTIFICIAL-BRANCH: 1
+; CHECK-ARTIFICIAL-BRANCH: 4008bf->400870:2

 ; CHECK-SYM-LIST: Dump profile symbol list
 ; CHECK-SYM-LIST: main

llvm/tools/llvm-profgen/PerfReader.cpp

+67 -15
@@ -51,18 +51,44 @@ namespace llvm {
 namespace sampleprof {

 void VirtualUnwinder::unwindCall(UnwindState &State) {
+  uint64_t Source = State.getCurrentLBRSource();
+  // An artificial return should push an external frame and an artificial call
+  // will match it and pop the external frame so that the context before and
+  // after the external call will be the same.
+  if (State.getCurrentLBR().IsArtificial) {
+    NumExtCallBranch++;
+    // A return is matched and pop the external frame.
+    if (State.getParentFrame()->isExternalFrame()) {
+      State.popFrame();
+    } else {
+      // An artificial return is missing, it happens that the sample is just hit
+      // in the middle of the external code. In this case, the leading branch is
+      // a call to external, we just keep unwinding use a context-less stack.
+      if (State.getParentFrame() != State.getDummyRootPtr())
+        NumMissingExternalFrame++;
+      State.clearCallStack();
+      State.pushFrame(Source);
+      State.InstPtr.update(Source);
+      return;
+    }
+  }
+
+  auto *ParentFrame = State.getParentFrame();
   // The 2nd frame after leaf could be missing if stack sample is
   // taken when IP is within prolog/epilog, as frame chain isn't
   // setup yet. Fill in the missing frame in that case.
   // TODO: Currently we just assume all the addr that can't match the
   // 2nd frame is in prolog/epilog. In the future, we will switch to
   // pro/epi tracker(Dwarf CFI) for the precise check.
-  uint64_t Source = State.getCurrentLBRSource();
-  auto *ParentFrame = State.getParentFrame();
-
   if (ParentFrame == State.getDummyRootPtr() ||
       ParentFrame->Address != Source) {
     State.switchToFrame(Source);
+    if (ParentFrame != State.getDummyRootPtr()) {
+      if (State.getCurrentLBR().IsArtificial)
+        NumMismatchedExtCallBranch++;
+      else
+        NumMismatchedProEpiBranch++;
+    }
   } else {
     State.popFrame();
   }
@@ -118,6 +144,19 @@ void VirtualUnwinder::unwindReturn(UnwindState &State) {
   const LBREntry &LBR = State.getCurrentLBR();
   uint64_t CallAddr = Binary->getCallAddrFromFrameAddr(LBR.Target);
   State.switchToFrame(CallAddr);
+  // Push an external frame for the case of returning to external
+  // address(callback), later if an aitificial call is matched and it will be
+  // popped up. This is to 1)avoid context being interrupted by callback,
+  // context before or after the callback should be the same. 2) the call stack
+  // of function called by callback should be truncated which is done during
+  // recording the context on trie. For example:
+  //  main (call)--> foo (call)--> callback (call)--> bar (return)--> callback
+  //  (return)--> foo (return)--> main
+  // Context for bar should not include main and foo.
+  // For the code of foo, the context of before and after callback should both
+  // be [foo, main].
+  if (LBR.IsArtificial)
+    State.pushFrame(ExternalAddr);
   State.pushFrame(LBR.Source);
   State.InstPtr.update(LBR.Source);
 }
@@ -180,7 +219,9 @@ template <typename T>
 void VirtualUnwinder::collectSamplesFromFrameTrie(
     UnwindState::ProfiledFrame *Cur, T &Stack) {
   if (!Cur->isDummyRoot()) {
-    if (!Stack.pushFrame(Cur)) {
+    // Truncate the context for external frame since this isn't a real call
+    // context the compiler will see.
+    if (Cur->isExternalFrame() || !Stack.pushFrame(Cur)) {
       // Process truncated context
       // Start a new traversal ignoring its bottom context
       T EmptyStack(Binary);
@@ -453,6 +494,21 @@ void HybridPerfReader::unwindSamples() {
                      SampleCounters.size(),
                      "of profiled contexts are truncated due to missing probe "
                      "for call instruction.");
+
+  emitWarningSummary(
+      Unwinder.NumMismatchedExtCallBranch, Unwinder.NumTotalBranches,
+      "of branches'source is a call instruction but doesn't match call frame "
+      "stack, likely due to unwinding error of external frame.");
+
+  emitWarningSummary(
+      Unwinder.NumMismatchedProEpiBranch, Unwinder.NumTotalBranches,
+      "of branches'source is a call instruction but doesn't match call frame "
+      "stack, likely due to frame in prolog/epilog.");
+
+  emitWarningSummary(Unwinder.NumMissingExternalFrame,
+                     Unwinder.NumExtCallBranch,
+                     "of artificial call branches but doesn't have an external "
+                     "frame to match.");
 }

 bool PerfScriptReader::extractLBRStack(TraceStream &TraceIt,
@@ -538,15 +594,6 @@ bool PerfScriptReader::extractLBRStack(TraceStream &TraceIt,
         break;
       }

-      if (Binary->addressIsReturn(Src)) {
-        // In a callback case, a return from internal code, say A, to external
-        // runtime can happen. The external runtime can then call back to
-        // another internal routine, say B. Making an artificial branch that
-        // looks like a return from A to B can confuse the unwinder to treat
-        // the instruction before B as the call instruction.
-        break;
-      }
-
       // For transition to external code, group the Source with the next
       // availabe transition target.
       Dst = PrevTrDst;
@@ -854,10 +901,15 @@ void PerfScriptReader::computeCounterFromLBR(const PerfSample *Sample,
   SampleCounter &Counter = SampleCounters.begin()->second;
   uint64_t EndOffeset = 0;
   for (const LBREntry &LBR : Sample->LBRStack) {
+    assert(LBR.Source != ExternalAddr &&
+           "Branch' source should not be an external address, it should be "
+           "converted to aritificial branch.");
     uint64_t SourceOffset = Binary->virtualAddrToOffset(LBR.Source);
-    uint64_t TargetOffset = Binary->virtualAddrToOffset(LBR.Target);
+    uint64_t TargetOffset = LBR.Target == ExternalAddr
+                                ? ExternalAddr
+                                : Binary->virtualAddrToOffset(LBR.Target);

-    if (!LBR.IsArtificial) {
+    if (!LBR.IsArtificial && TargetOffset != ExternalAddr) {
       Counter.recordBranchCount(SourceOffset, TargetOffset, Repeat);
     }

llvm/tools/llvm-profgen/PerfReader.h

+25 -17
@@ -214,14 +214,6 @@ using AggregatedCounter =

 using SampleVector = SmallVector<std::tuple<uint64_t, uint64_t, uint64_t>, 16>;

-// The special frame addresses.
-enum SpecialFrameAddr {
-  // Dummy root of frame trie.
-  DummyRoot = 0,
-  // Represent all the addresses outside of current binary.
-  ExternalAddr = 1,
-};
-
 // The state for the unwinder, it doesn't hold the data but only keep the
 // pointer/index of the data, While unwinding, the CallStack is changed
 // dynamicially and will be recorded as the context of the sample
@@ -313,6 +305,8 @@ struct UnwindState {

   void popFrame() { CurrentLeafFrame = CurrentLeafFrame->Parent; }

+  void clearCallStack() { CurrentLeafFrame = &DummyTrieRoot; }
+
   void initFrameTrie(const SmallVectorImpl<uint64_t> &CallStack) {
     ProfiledFrame *Cur = &DummyTrieRoot;
     for (auto Address : reverse(CallStack)) {
@@ -426,10 +420,8 @@ struct FrameStack {
   ProfiledBinary *Binary;
   FrameStack(ProfiledBinary *B) : Binary(B) {}
   bool pushFrame(UnwindState::ProfiledFrame *Cur) {
-    // Truncate the context for external frame since this isn't a real call
-    // context the compiler will see
-    if (Cur->isExternalFrame())
-      return false;
+    assert(!Cur->isExternalFrame() &&
+           "External frame's not expected for context stack.");
     Stack.push_back(Cur->Address);
     return true;
   }
@@ -446,10 +438,8 @@ struct ProbeStack {
   ProfiledBinary *Binary;
   ProbeStack(ProfiledBinary *B) : Binary(B) {}
   bool pushFrame(UnwindState::ProfiledFrame *Cur) {
-    // Truncate the context for external frame since this isn't a real call
-    // context the compiler will see
-    if (Cur->isExternalFrame())
-      return false;
+    assert(!Cur->isExternalFrame() &&
+           "External frame's not expected for context stack.");
     const MCDecodedPseudoProbe *CallProbe =
         Binary->getCallProbeForAddr(Cur->Address);
     // We may not find a probe for a merged or external callsite.
@@ -506,6 +496,12 @@ class VirtualUnwinder {
   bool unwind(const PerfSample *Sample, uint64_t Repeat);
   std::set<uint64_t> &getUntrackedCallsites() { return UntrackedCallsites; }

+  uint64_t NumTotalBranches = 0;
+  uint64_t NumExtCallBranch = 0;
+  uint64_t NumMissingExternalFrame = 0;
+  uint64_t NumMismatchedProEpiBranch = 0;
+  uint64_t NumMismatchedExtCallBranch = 0;
+
 private:
   bool isCallState(UnwindState &State) const {
     // The tail call frame is always missing here in stack sample, we will
@@ -516,7 +512,19 @@ class VirtualUnwinder {
   bool isReturnState(UnwindState &State) const {
     // Simply check addressIsReturn, as ret is always reliable, both for
     // regular call and tail call.
-    return Binary->addressIsReturn(State.getCurrentLBRSource());
+    if (!Binary->addressIsReturn(State.getCurrentLBRSource()))
+      return false;
+
+    // In a callback case, a return from internal code, say A, to external
+    // runtime can happen. The external runtime can then call back to
+    // another internal routine, say B. Making an artificial branch that
+    // looks like a return from A to B can confuse the unwinder to treat
+    // the instruction before B as the call instruction. Here we detect this
+    // case if the return target is not the next inst of call inst, then we just
+    // do not treat it as a return.
+    uint64_t CallAddr =
+        Binary->getCallAddrFromFrameAddr(State.getCurrentLBRTarget());
+    return (CallAddr != 0);
   }

   void unwindCall(UnwindState &State);
