Skip to content

[ms] [llvm-ml] SIGSEGV in checkForValidSection in MasmParser #123189

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
MisterDA opened this issue Jan 16, 2025 · 5 comments · Fixed by #123355
Closed

[ms] [llvm-ml] SIGSEGV in checkForValidSection in MasmParser #123189

MisterDA opened this issue Jan 16, 2025 · 5 comments · Fixed by #123355
Labels
crash Prefer [crash-on-valid] or [crash-on-invalid] mc Machine (object) code

Comments

@MisterDA
Copy link

MisterDA commented Jan 16, 2025

I'm trying to cross-compile the OCaml compiler with a Debian host, targeting x86_64-pc-windows with clang-cl. I'm running into a segfault from llvm-ml (the MASM assembler), a drop-in replacement for Microsoft's ml64. I hit the issue with LLVM 18 and LLVM 20 (ea14bdb as the time of writing). Here is a reproducer, as a Dockerfile (build with docker build --platform linux/amd64 .), and the backtrace. I'm also attaching the offending asm file as a text file.

amd64nt.txt

# syntax=docker/dockerfile:1
FROM debian:experimental
ARG LLVM_VERSION=20

ENV DEBUGINFOD_URLS="https://debuginfod.debian.net"
COPY <<EOF /etc/apt/sources.list.d/debug.list
deb http://deb.debian.org/debian-debug/ experimental-debug main
EOF

RUN rm -f /etc/apt/apt.conf.d/docker-clean; echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt update && DEBIAN_FRONTEND=noninteractive apt upgrade -y && \
    DEBIAN_FRONTEND=noninteractive apt-get --no-install-recommends install -y \
    clang-$LLVM_VERSION clang-$LLVM_VERSION-dbgsym \
    clang-tools-$LLVM_VERSION clang-tools-$LLVM_VERSION-dbgsym \
    lld-$LLVM_VERSION lld-$LLVM_VERSION-dbgsym \
    llvm-$LLVM_VERSION llvm-$LLVM_VERSION-dbgsym \
    lldb-$LLVM_VERSION lldb-$LLVM_VERSION-dbgsym \
    make gdb
ADD --keep-git-dir --link https://github.com/ocaml/ocaml.git /root/ocaml
WORKDIR /root/ocaml

ENV LLVM_SYMBOLIZER_PATH=/usr/bin/llvm-symbolizer-$LLVM_VERSION

RUN clang-cl-20 -nologo -EP -TC runtime/caml/domain_state.tbl > runtime/domain_state.inc

# llvm-ml-20 -m64 dislikes parentheses on macro calls
RUN sed -e 's/(//g' -e 's/)//g' -i runtime/domain_state.inc

# llvm-ml-20 doesn't understand NEAR
RUN sed -E -e 's/(EXTRN.*):.*NEAR/\1:PROC/g' -i runtime/amd64nt.asm

RUN llvm-ml-20 -m64 -nologo -Iruntime -c -Foruntime/amd64nt.obj runtime/amd64nt.asm
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: llvm-ml-20 -m64 -nologo -Iruntime -c -Foruntime/amd64nt.obj runtime/amd64nt.asm
 #0 0x00007ffff7fa117a llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) build-llvm/tools/clang/stage2-bins/llvm/lib/Support/Unix/Signals.inc:723:13
 #1 0x00007ffff7f9ed14 llvm::sys::RunSignalHandlers() build-llvm/tools/clang/stage2-bins/llvm/lib/Support/Signals.cpp:106:18
 #2 0x00007ffff7fa182b SignalHandler build-llvm/tools/clang/stage2-bins/llvm/lib/Support/Unix/Signals.inc:413:1
 #3 0x00007ffff6ac0da0 (/lib/x86_64-linux-gnu/libc.so.6+0x3fda0)
 #4 0x00007ffff98923a2 checkForValidSection build-llvm/tools/clang/stage2-bins/llvm/lib/MC/MCParser/MasmParser.cpp:1457:31
 #5 0x00007ffff9895133 parseStatement build-llvm/tools/clang/stage2-bins/llvm/lib/MC/MCParser/MasmParser.cpp:0:7
 #6 0x00007ffff988d2a5 Run build-llvm/tools/clang/stage2-bins/llvm/lib/MC/MCParser/MasmParser.cpp:0:0
 #7 0x000055555555d0f0 AssembleInput build-llvm/tools/clang/stage2-bins/llvm/tools/llvm-ml/llvm-ml.cpp:186:13
 #8 0x000055555555bc9a llvm_ml_main build-llvm/tools/clang/stage2-bins/llvm/tools/llvm-ml/llvm-ml.cpp:0:11
 #9 0x000055555555e45a main build-llvm/tools/clang/stage2-bins/build-llvm/tools/clang/stage2-bins/tools/llvm-ml/llvm-ml-driver.cpp:17:10
#10 0x00007ffff6aaad68 __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:74:3
#11 0x00007ffff6aaae25 call_init ./csu/../csu/libc-start.c:128:20
#12 0x00007ffff6aaae25 __libc_start_main ./csu/../csu/libc-start.c:347:5
#13 0x0000555555559d71 (/usr/lib/llvm-20/bin/llvm-ml+0x5d71)
Segmentation fault

bool MasmParser::checkForValidSection() {
if (!ParsingMSInlineAsm && !getStreamer().getCurrentSectionOnly()) {
Out.initSections(false, getTargetParser().getSTI());
return Error(getTok().getLoc(),
"expected section directive before assembly directive");
}
return false;
}

Presumably getStreamer() returns a nullptr.

It's possibly similar to #97635, I'll ping the participants: @sivan-shani @MaskRay.
Thanks for any help!

@MisterDA MisterDA changed the title [MASM] SIGSEGV in checkForValidSection in MasmParser [ms] [llvm-ml] SIGSEGV in checkForValidSection in MasmParser Jan 16, 2025
@MisterDA
Copy link
Author

Taking the liberty of pinging @ericastor too as you've committed this code.

@sivan-shani
Copy link
Contributor

#97635 ended up not being an actual issue in llvm.
It was an external tool that used some llvm libs, the issue was not in those libs but in the tool itself.

@EugeneZelenko EugeneZelenko added mc Machine (object) code crash Prefer [crash-on-valid] or [crash-on-invalid] and removed new issue labels Jan 16, 2025
@ericastor
Copy link
Contributor

Thanks! It looks like this was missed in db48f1a, which changed the safety of calling getCurrentSectionOnly. I think I have a fix, and will put it through shortly!

Once that's fixed, it looks like you're still going to run into at least a couple of issues.

  1. Your sed command (trying to compensate for the lack of support for optional parentheses on macros) is slightly incorrect, and results in lines like DOMAIN_STATEatomic_uintnat, young_limit. You'll need to replace at least the opening parenthesis with a space character, to prevent issues like this. Alternatively, I'd be happy to accept a patch that adds optional-parenthesis support! (Looks like I had a draft once upon a time... if I have the time, I may try to revive it.)
  2. It seems you're going to hit the problem that we haven't supported the @CatStr built-in just yet. This is a TODO that could use to be done... no chance you have the capacity to contribute? Otherwise, I'm not sure how fast that'll happen.

Thanks for finding another practical use case for llvm-ml - I for one am happy to accept bug reports & contributions!

@MisterDA
Copy link
Author

Hi! Thanks a lot for the quick answer and fix!
Thanks for noticing my flaky sed, I'll change it.

Unfortunately, I don't think I have enough time to invest into patching LLVM (or learning MASM, LLVM internals, and C++)… we'll see. Note for later: @CatStr macro function.

Incidentally, does this make sense to you? Do we still need NEAR/FAR for amd64?

# llvm-ml-20 doesn't understand NEAR
sed -E -e 's/(EXTRN.*):.*NEAR/\1:PROC/g' -i runtime/amd64nt.asm

@ericastor
Copy link
Contributor

I haven't had occasion to see a case where NEAR/FAR mattered... so I haven't gone through & implemented all the variations on that. Unfortunately, MASM is less well-documented than we might like, so it takes work (and a motivating example!) to figure out its exact semantics.

For example, the NEAR question comes down to: exactly what do you expect in the binary when you CALL an EXTRN id : NEAR vs an EXTRN id : PROC? How should they act (on both 64- and 32-bit targets)? That's a thing we could test & try to match, though. I get the issue of not having time available to do it... but someone needs to do it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
crash Prefer [crash-on-valid] or [crash-on-invalid] mc Machine (object) code
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants