Disallow exports that are not valid C/C++ identifiers #23563

Merged
merged 1 commit into from
Feb 3, 2025

Conversation

sbc100
Collaborator

@sbc100 sbc100 commented Jan 31, 2025

I use `str.isidentifier()` from Python here since it has the same rules for valid identifiers as C/C++.

We could try instead to allow this but I'm not sure it's worth it given that our inputs are almost exclusively compiled from C/C++/Rust.

See #23560
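
As an illustration of the check described above, here is a minimal sketch rather than the code from this PR. The helper name `find_invalid_exports` and the sample names are hypothetical. Note that `str.isidentifier()` also accepts non-ASCII letters, so it only approximates the C/C++ rules.

```python
def find_invalid_exports(names):
    """Return the export names that str.isidentifier() rejects."""
    return [name for name in names if not name.isidentifier()]


# Hypothetical export names, chosen only to exercise the check.
exports = ["main", "_PyInit__bcrypt", "foo-bar", "$foo", "core..fmt"]
print(find_invalid_exports(exports))  # ['foo-bar', '$foo', 'core..fmt']
```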

@dschuff
Member

dschuff commented Jan 31, 2025

as I mentioned in #23560 it's actually JS identifier rules that matter here, not C; are they different?

@sbc100
Collaborator Author

sbc100 commented Jan 31, 2025

as I mentioned in #23560 it's actually JS identifier rules that matter here, not C; are they different?

JS identifier rules are a bit more relaxed. You can do $foo for example. I think we want to stick to the more strict C/C++ rules. I don't see any reason to allow anything else at this point.

We can always relax things later if we have the need to.
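
To make the difference between the two rule sets concrete, here is a rough sketch using ASCII-only regular expressions. Both patterns are simplifying assumptions: real JS identifiers also allow many Unicode characters, and the helper name `classify` is hypothetical.

```python
import re

# ASCII-only approximations of each rule set (an assumption of this sketch).
C_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
JS_IDENT = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def classify(name):
    """Report whether a name is valid under the stricter C rules and the looser JS rules."""
    return {
        "c": C_IDENT.fullmatch(name) is not None,
        "js": JS_IDENT.fullmatch(name) is not None,
    }


for name in ["foo", "$foo", "core..fmt"]:
    print(name, classify(name))
```

Here `$foo` passes only the JS pattern, while a name containing `.` fails both.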

I use `str.isidentifier()` here from python since it has the same rules
for valid identifiers as C/C++.

We could try instead to allow this but I'm not sure it's worth it given
that our inputs are almost exclusively compiled from C/C++/Rust.

See emscripten-core#23560
@sbc100 sbc100 enabled auto-merge (squash) February 1, 2025 00:13
@sbc100
Collaborator Author

sbc100 commented Feb 1, 2025

I propose that we land this and then consider expanding later.

@sbc100
Collaborator Author

sbc100 commented Feb 3, 2025

OK to land this?

@sbc100 sbc100 merged commit 08d2cc3 into emscripten-core:main Feb 3, 2025
29 checks passed
@sbc100 sbc100 deleted the invalid_exports branch February 3, 2025 21:17
@hoodmane
Collaborator

hoodmane commented Mar 31, 2025

This change broke rust linking. Apparently Rust likes to put . in its symbols? For instance, when linking bcrypt the following symbols are present:


_ZN102_$LT$std..panicking..begin_panic_handler..FormatStringPayload$u20$as$u20$core..panic..PanicPayload$GT$3get17h30a3dc6781b1270fE
_ZN102_$LT$std..panicking..begin_panic_handler..FormatStringPayload$u20$as$u20$core..panic..PanicPayload$GT$8take_box17h8daf4c01572178f0E
_ZN41_$LT$char$u20$as$u20$core..fmt..Debug$GT$3fmt17h3cdc8398142e4ad0E
_ZN4core3fmt3num3imp52_$LT$impl$u20$core..fmt..Display$u20$for$u20$i32$GT$3fmt17h2f91a7b408b120faE
_ZN4core3fmt3num3imp52_$LT$impl$u20$core..fmt..Display$u20$for$u20$u16$GT$3fmt17h0ce69c07cdda4743E
_ZN4core3fmt3num3imp52_$LT$impl$u20$core..fmt..Display$u20$for$u20$u32$GT$3fmt17h8d4e5e4a3c003025E
_ZN4core3fmt3num3imp54_$LT$impl$u20$core..fmt..Display$u20$for$u20$usize$GT$3fmt17h3f7cd8ccfbfa7191E
_ZN4core3fmt3num55_$LT$impl$u20$core..fmt..LowerHex$u20$for$u20$usize$GT$3fmt17h00fcc06ba2aed218E
_ZN53_$LT$pyo3..err..PyErr$u20$as$u20$core..fmt..Debug$GT$3fmt17hb070043817a867a6E
_ZN54_$LT$bcrypt..Version$u20$as$u20$core..fmt..Display$GT$3fmt17h42a9296e7a7df387E
_ZN58_$LT$std..io..error..Error$u20$as$u20$core..fmt..Debug$GT$3fmt17h1aa29bb7bbc2eeb5E
_ZN59_$LT$core..fmt..Arguments$u20$as$u20$core..fmt..Display$GT$3fmt17h76e6ce769a6f4265E
_ZN60_$LT$getrandom..error..Error$u20$as$u20$core..fmt..Debug$GT$3fmt17h903a4f5113c0f72cE
_ZN60_$LT$std..io..error..Error$u20$as$u20$core..fmt..Display$GT$3fmt17hd1de43a7c5657cf1E
_ZN63_$LT$core..cell..BorrowMutError$u20$as$u20$core..fmt..Debug$GT$3fmt17h377dbb2be6b357a9E
_ZN68_$LT$core..fmt..builders..PadAdapter$u20$as$u20$core..fmt..Write$GT$10write_char17h96b8a02c7ab3b3efE
_ZN68_$LT$core..fmt..builders..PadAdapter$u20$as$u20$core..fmt..Write$GT$9write_str17h8882f4b64c6e1b68E
_ZN69_$LT$std..sys..pal..unix..stdio..Stderr$u20$as$u20$std..io..Write$GT$14write_vectored17h994d9285b1819824E
_ZN69_$LT$std..sys..pal..unix..stdio..Stderr$u20$as$u20$std..io..Write$GT$5write17hf7b76c928cd9d57cE
_ZN89_$LT$std..panicking..rust_panic_without_hook..RewrapBox$u20$as$u20$core..fmt..Display$GT$3fmt17h5ba14cab399dd494E
_ZN92_$LT$std..panicking..begin_panic_handler..StaticStrPayload$u20$as$u20$core..fmt..Display$GT$3fmt17h7a4e0a698b535eefE
_ZN95_$LT$std..panicking..begin_panic_handler..FormatStringPayload$u20$as$u20$core..fmt..Display$GT$3fmt17hb8edd6215e5e1644E
_ZN96_$LT$std..panicking..rust_panic_without_hook..RewrapBox$u20$as$u20$core..panic..PanicPayload$GT$8take_box17hfba217f1a3c9d613E
_ZN98_$LT$std..sys..backtrace..BacktraceLock..print..DisplayBacktrace$u20$as$u20$core..fmt..Display$GT$3fmt17hfea12c1bdc8f45e8E
_ZN99_$LT$std..panicking..begin_panic_handler..StaticStrPayload$u20$as$u20$core..panic..PanicPayload$GT$8take_box17h4e42b4f6003a02d5E
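
For reference, a small sketch of why names like these fail an identifier check. It lists the characters in one of the symbols above that fall outside the ASCII identifier set; the helper name `offending_chars` is hypothetical.

```python
import string

# Characters allowed in an ASCII C identifier (ignoring the leading-digit rule).
ALLOWED = set(string.ascii_letters + string.digits + "_")


def offending_chars(symbol):
    """Return the sorted characters in `symbol` that are not identifier characters."""
    return sorted(set(symbol) - ALLOWED)


sym = "_ZN41_$LT$char$u20$as$u20$core..fmt..Debug$GT$3fmt17h3cdc8398142e4ad0E"
print(offending_chars(sym))   # ['$', '.']
print(sym.isidentifier())     # False
```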

@sbc100
Collaborator Author

sbc100 commented Mar 31, 2025

And do those symbols need to be exported?

One problem is that we cannot create JS symbols with names like that, so we would have to figure out what to do when these symbols are exported. How are these symbols being exported exactly?

@hoodmane
Collaborator

It's linking with -sSIDE_MODULE=2 -sEXPORTED_FUNCTIONS=["_PyInit__bcrypt"]. So no I don't think they are intended as exports.

@hoodmane
Collaborator

llvm-readobj shows they have VISIBILITY_HIDDEN:

  Symbol {
    Name: _ZN44_$LT$$RF$T$u20$as$u20$core..fmt..Display$GT$3fmt17h6faa50f40609d0dcE
    Type: FUNCTION (0x0)
    Flags [ (0x4)
      VISIBILITY_HIDDEN (0x4)
    ]
    ElementIndex: 0x37
  }

@sbc100
Collaborator Author

sbc100 commented Mar 31, 2025

llvm-readobj shows they have VISIBILITY_HIDDEN:

  Symbol {
    Name: _ZN44_$LT$$RF$T$u20$as$u20$core..fmt..Display$GT$3fmt17h6faa50f40609d0dcE
    Type: FUNCTION (0x0)
    Flags [ (0x4)
      VISIBILITY_HIDDEN (0x4)
    ]
    ElementIndex: 0x37
  }

Strange, with visibility=hidden I would not expect the linker to export the symbol. Any idea what's up with that?

@hoodmane
Collaborator

Nope, I will investigate when I have a chance and see if I can figure out what's happening. Since with -sSIDE_MODULE=2 -sEXPORTED_FUNCTIONS=["_PyInit__bcrypt"] and the symbols being marked hidden in the objects, we seem to have asked twice not to export them.

@hoodmane
Collaborator

They are definitely showing up as exports in the final linked .so:


_bcrypt.cpython-313-wasm32-emscripten.so:	file format wasm 0x1

Section Details:

Export[36]:
 - func[89] <__wasm_call_ctors> -> "__wasm_call_ctors"
 - global[51] -> "__rust_no_alloc_shim_is_unstable"
 - global[52] -> "_ZN5alloc4sync18STATIC_INNER_SLICE17h440601ee98d0c01bE"
 - func[177] <_ZN4core3fmt3num3imp54_$LT$impl$u20$core..fmt..Display$u20$for$u20$usize$GT$3fmt17h3f7cd8ccfbfa7191E> -> "_ZN4core3fmt3num3imp52_$LT$impl$u20$core..fmt..Display$u20$for$u20$u32$GT$3fmt17h8d4e5e4a3c003025E"
 - func[117] <_ZN54_$LT$bcrypt..Version$u20$as$u20$core..fmt..Display$GT$3fmt17h42a9296e7a7df387E> -> "_ZN54_$LT$bcrypt..Version$u20$as$u20$core..fmt..Display$GT$3fmt17h42a9296e7a7df387E"
 - func[176] <_ZN4core3fmt3num3imp52_$LT$impl$u20$core..fmt..Display$u20$for$u20$u16$GT$3fmt17h0ce69c07cdda4743E> -> "_ZN4core3fmt3num3imp52_$LT$impl$u20$core..fmt..Display$u20$for$u20$u16$GT$3fmt17h0ce69c07cdda4743E"
 - func[212] <_ZN60_$LT$getrandom..error..Error$u20$as$u20$core..fmt..Debug$GT$3fmt17h903a4f5113c0f72cE> -> "_ZN60_$LT$getrandom..error..Error$u20$as$u20$core..fmt..Debug$GT$3fmt17h903a4f5113c0f72cE"
 - func[295] <_ZN58_$LT$std..io..error..Error$u20$as$u20$core..fmt..Debug$GT$3fmt17h1aa29bb7bbc2eeb5E> -> "_ZN58_$LT$std..io..error..Error$u20$as$u20$core..fmt..Debug$GT$3fmt17h1aa29bb7bbc2eeb5E"
 - func[146] <PyInit__bcrypt> -> "PyInit__bcrypt"
 - func[160] <_ZN4core3fmt3num55_$LT$impl$u20$core..fmt..LowerHex$u20$for$u20$usize$GT$3fmt17h00fcc06ba2aed218E> -> "_ZN4core3fmt3num55_$LT$impl$u20$core..fmt..LowerHex$u20$for$u20$usize$GT$3fmt17h00fcc06ba2aed218E"
 - func[163] <_ZN59_$LT$core..fmt..Arguments$u20$as$u20$core..fmt..Display$GT$3fmt17h76e6ce769a6f4265E> -> "_ZN59_$LT$core..fmt..Arguments$u20$as$u20$core..fmt..Display$GT$3fmt17h76e6ce769a6f4265E"
 - func[171] <_ZN41_$LT$char$u20$as$u20$core..fmt..Debug$GT$3fmt17h3cdc8398142e4ad0E> -> "_ZN41_$LT$char$u20$as$u20$core..fmt..Debug$GT$3fmt17h3cdc8398142e4ad0E"
 - func[179] <_ZN4core3fmt3num3imp52_$LT$impl$u20$core..fmt..Display$u20$for$u20$i32$GT$3fmt17h2f91a7b408b120faE> -> "_ZN4core3fmt3num3imp52_$LT$impl$u20$core..fmt..Display$u20$for$u20$i32$GT$3fmt17h2f91a7b408b120faE"
 - func[177] <_ZN4core3fmt3num3imp54_$LT$impl$u20$core..fmt..Display$u20$for$u20$usize$GT$3fmt17h3f7cd8ccfbfa7191E> -> "_ZN4core3fmt3num3imp54_$LT$impl$u20$core..fmt..Display$u20$for$u20$usize$GT$3fmt17h3f7cd8ccfbfa7191E"
 - func[201] <_ZN68_$LT$core..fmt..builders..PadAdapter$u20$as$u20$core..fmt..Write$GT$9write_str17h8882f4b64c6e1b68E> -> "_ZN68_$LT$core..fmt..builders..PadAdapter$u20$as$u20$core..fmt..Write$GT$9write_str17h8882f4b64c6e1b68E"
 - func[202] <_ZN68_$LT$core..fmt..builders..PadAdapter$u20$as$u20$core..fmt..Write$GT$10write_char17h96b8a02c7ab3b3efE> -> "_ZN68_$LT$core..fmt..builders..PadAdapter$u20$as$u20$core..fmt..Write$GT$10write_char17h96b8a02c7ab3b3efE"
 - func[207] <_ZN63_$LT$core..cell..BorrowMutError$u20$as$u20$core..fmt..Debug$GT$3fmt17h377dbb2be6b357a9E> -> "_ZN63_$LT$core..cell..BorrowMutError$u20$as$u20$core..fmt..Debug$GT$3fmt17h377dbb2be6b357a9E"
 - global[53] -> "__rust_alloc_error_handler_should_panic"
 - global[54] -> "_ZN3std6thread7current7CURRENT17h814f5da4a4d899fdE"
 - func[236] <_ZN53_$LT$pyo3..err..PyErr$u20$as$u20$core..fmt..Debug$GT$3fmt17hb070043817a867a6E> -> "_ZN53_$LT$pyo3..err..PyErr$u20$as$u20$core..fmt..Debug$GT$3fmt17hb070043817a867a6E"
 - global[55] -> "_ZN3std9panicking11panic_count18GLOBAL_PANIC_COUNT17h3218074ee4dae193E"
 - func[296] <_ZN60_$LT$std..io..error..Error$u20$as$u20$core..fmt..Display$GT$3fmt17hd1de43a7c5657cf1E> -> "_ZN60_$LT$std..io..error..Error$u20$as$u20$core..fmt..Display$GT$3fmt17hd1de43a7c5657cf1E"
 - global[56] -> "_ZN3std6thread7current2id2ID17h3489c73247d6251aE"
 - global[57] -> "_ZN3std2io5stdio6stderr8INSTANCE17h732fc941f19950d3E"
 - func[332] <_ZN98_$LT$std..sys..backtrace..BacktraceLock..print..DisplayBacktrace$u20$as$u20$core..fmt..Display$GT$3fmt17hfea12c1bdc8f45e8E> -> "_ZN98_$LT$std..sys..backtrace..BacktraceLock..print..DisplayBacktrace$u20$as$u20$core..fmt..Display$GT$3fmt17hfea12c1bdc8f45e8E"
 - func[367] <_ZN92_$LT$std..panicking..begin_panic_handler..StaticStrPayload$u20$as$u20$core..fmt..Display$GT$3fmt17h7a4e0a698b535eefE> -> "_ZN92_$LT$std..panicking..begin_panic_handler..StaticStrPayload$u20$as$u20$core..fmt..Display$GT$3fmt17h7a4e0a698b535eefE"
 - func[366] <_ZN99_$LT$std..panicking..begin_panic_handler..StaticStrPayload$u20$as$u20$core..panic..PanicPayload$GT$8take_box17h4e42b4f6003a02d5E> -> "_ZN99_$LT$std..panicking..begin_panic_handler..StaticStrPayload$u20$as$u20$core..panic..PanicPayload$GT$8take_box17h4e42b4f6003a02d5E"
 - func[365] <_ZN95_$LT$std..panicking..begin_panic_handler..FormatStringPayload$u20$as$u20$core..fmt..Display$GT$3fmt17hb8edd6215e5e1644E> -> "_ZN95_$LT$std..panicking..begin_panic_handler..FormatStringPayload$u20$as$u20$core..fmt..Display$GT$3fmt17hb8edd6215e5e1644E"
 - func[363] <_ZN102_$LT$std..panicking..begin_panic_handler..FormatStringPayload$u20$as$u20$core..panic..PanicPayload$GT$8take_box17h8daf4c01572178f0E> -> "_ZN102_$LT$std..panicking..begin_panic_handler..FormatStringPayload$u20$as$u20$core..panic..PanicPayload$GT$8take_box17h8daf4c01572178f0E"
 - func[364] <_ZN102_$LT$std..panicking..begin_panic_handler..FormatStringPayload$u20$as$u20$core..panic..PanicPayload$GT$3get17h30a3dc6781b1270fE> -> "_ZN102_$LT$std..panicking..begin_panic_handler..FormatStringPayload$u20$as$u20$core..panic..PanicPayload$GT$3get17h30a3dc6781b1270fE"
 - func[340] <_ZN69_$LT$std..sys..pal..unix..stdio..Stderr$u20$as$u20$std..io..Write$GT$5write17hf7b76c928cd9d57cE> -> "_ZN69_$LT$std..sys..pal..unix..stdio..Stderr$u20$as$u20$std..io..Write$GT$5write17hf7b76c928cd9d57cE"
 - func[341] <_ZN69_$LT$std..sys..pal..unix..stdio..Stderr$u20$as$u20$std..io..Write$GT$14write_vectored17h994d9285b1819824E> -> "_ZN69_$LT$std..sys..pal..unix..stdio..Stderr$u20$as$u20$std..io..Write$GT$14write_vectored17h994d9285b1819824E"
 - global[58] -> "_ZN3std9panicking4HOOK17h377c113085c16d7fE"
 - func[370] <_ZN96_$LT$std..panicking..rust_panic_without_hook..RewrapBox$u20$as$u20$core..panic..PanicPayload$GT$8take_box17hfba217f1a3c9d613E> -> "_ZN96_$LT$std..panicking..rust_panic_without_hook..RewrapBox$u20$as$u20$core..panic..PanicPayload$GT$8take_box17hfba217f1a3c9d613E"
 - func[371] <_ZN89_$LT$std..panicking..rust_panic_without_hook..RewrapBox$u20$as$u20$core..fmt..Display$GT$3fmt17h5ba14cab399dd494E> -> "_ZN89_$LT$std..panicking..rust_panic_without_hook..RewrapBox$u20$as$u20$core..fmt..Display$GT$3fmt17h5ba14cab399dd494E"
 - func[90] <__wasm_apply_data_relocs> -> "__wasm_apply_data_relocs"
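
A listing like the one above can be scanned mechanically. The following is a sketch that assumes the line format shown here; the regular expression and the helper name `parse_exports` are hypothetical and not part of any tool.

```python
import re

# Matches lines such as: - func[146] <PyInit__bcrypt> -> "PyInit__bcrypt"
EXPORT_LINE = re.compile(r'^\s*-\s+(func|global)\[(\d+)\].*->\s+"(.*)"\s*$')


def parse_exports(dump_text):
    """Return (kind, index, export_name) tuples for each export line found."""
    exports = []
    for line in dump_text.splitlines():
        m = EXPORT_LINE.match(line)
        if m:
            exports.append((m.group(1), int(m.group(2)), m.group(3)))
    return exports


# A few lines copied from the listing above.
sample = '''\
 - func[89] <__wasm_call_ctors> -> "__wasm_call_ctors"
 - global[51] -> "__rust_no_alloc_shim_is_unstable"
 - func[117] <_ZN54_$LT$bcrypt..Version$u20$as$u20$core..fmt..Display$GT$3fmt17h42a9296e7a7df387E> -> "_ZN54_$LT$bcrypt..Version$u20$as$u20$core..fmt..Display$GT$3fmt17h42a9296e7a7df387E"
 - func[146] <PyInit__bcrypt> -> "PyInit__bcrypt"
'''

# Export names that would fail a str.isidentifier() check.
bad = [name for _, _, name in parse_exports(sample) if not name.isidentifier()]
print(bad)
```

On this sample, only the mangled `bcrypt..Version` symbol is flagged.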

@sbc100
Collaborator Author

sbc100 commented Mar 31, 2025

Perhaps they are marked with an explicit "export_name" in the linking section of the object file. Perhaps that is taking precedence over the visibility=hidden but perhaps it should not?

@hoodmane
Collaborator

I'll try to see. The first step is to minimize the reproducer. I bet it'll fail even for a hello world style example.

@hoodmane
Collaborator

hoodmane commented May 1, 2025

@sbc100 could we revert this in the meantime to unbreak rust?

@sbc100
Collaborator Author

sbc100 commented May 1, 2025

But wouldn't that result in invalid JS being generated? i.e. all the unexpected exports are exported as JS symbols, and these are invalid JS symbols?

Or is this only when generating a side module? I suppose we could limit the error to when we are actually generating JS (i.e. not when building a side module)?

@hoodmane
Collaborator

hoodmane commented May 2, 2025

Yes, limiting the check to when not building a side module would work for our purposes.

@sbc100
Collaborator Author

sbc100 commented May 2, 2025

Yes, limiting the check to when not building a side module would work for our purposes.

If you would like to submit such a PR, I think that would be acceptable. I would like to see this investigated more (i.e. please include a comment and link to a bug) and fixed at some point though, since those symbols should not be exported, should they?
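
One possible shape for such a change, sketched under assumptions: the function name `check_export_names` and the plain `settings` dict are hypothetical stand-ins, not the actual Emscripten code, and the skip relies on the premise from this thread that no JS is generated when building a side module.

```python
def check_export_names(exports, settings):
    """Raise ValueError for non-identifier exports, unless building a side module."""
    if settings.get("SIDE_MODULE"):
        # Assumption from the discussion: no JS symbols are created for a
        # side module, so invalid JS names are not a problem here.
        return
    invalid = [name for name in exports if not name.isidentifier()]
    if invalid:
        raise ValueError(f"invalid export name(s): {', '.join(invalid)}")


# Skipped for a side module, enforced otherwise.
check_export_names(["core..fmt"], {"SIDE_MODULE": 2})
try:
    check_export_names(["core..fmt"], {"SIDE_MODULE": 0})
except ValueError as err:
    print(err)
```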
