Geth crashes on Holesky when trying to add invalid block after syncing good branch #31320
Comments
Can you define what you mean by "geth crashes"? Or is the issue that geth is stuck, unable to reorg?
It exited with exit code 1, see logs: https://github.com/user-attachments/files/19088525/Terminal.Saved.Output.txt.zip
Oh I see now, the failed to rollback state is a CRIT. Let me look into it.
I think this is a rollback to before the snapshot point, that's why we crash. We can't roll back in this case, so crashing might be the correct behavior? But during longer non-finality, the CL MUST NOT reorg back to a chain before our snapshot, so I'm wondering how best to handle this.
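The constraint described in this comment can be sketched roughly as follows. This is an illustrative model, not Geth's actual code: it assumes a reorg needs state at the common ancestor of the old and new heads, and that a snap-synced node only has state from its snapshot (pivot) block onward. The `header` type and the `commonAncestor` and `canReorg` helpers are hypothetical names made up for this sketch.

```go
package main

// header is a hypothetical, stripped-down block header used only in this sketch.
type header struct {
	number uint64
	hash   string
	parent string
}

// commonAncestor walks both branches back until they meet.
// chain maps block hash to header. It returns false if either branch
// runs off the set of known headers before the two meet.
func commonAncestor(chain map[string]header, a, b string) (header, bool) {
	ha, okA := chain[a]
	hb, okB := chain[b]
	for okA && okB {
		if ha.hash == hb.hash {
			return ha, true
		}
		// Step back on whichever branch is higher, or on both if they are level.
		switch {
		case ha.number > hb.number:
			ha, okA = chain[ha.parent]
		case hb.number > ha.number:
			hb, okB = chain[hb.parent]
		default:
			ha, okA = chain[ha.parent]
			hb, okB = chain[hb.parent]
		}
	}
	return header{}, false
}

// canReorg reports whether a node whose oldest available state is at block
// number pivot could switch from oldHead to newHead (assumption: state at
// the common ancestor is required and nothing below pivot can be restored).
func canReorg(chain map[string]header, oldHead, newHead string, pivot uint64) bool {
	anc, ok := commonAncestor(chain, oldHead, newHead)
	return ok && anc.number >= pivot
}
```

Under these assumptions, a node that snap synced to a single orphaned block cannot reorg to a sibling branch, because the fork point lies below its pivot, regardless of how shallow the fork is.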
Well, with crashes / exits, the CL has no way to identify what's going on. And an error message that does not indicate what the user should do is not helping. Is it: delete datadir and restart Geth with

In any case, the current snap sync behaviour means that Geth treats the initially synced state as a finalized state. That is wrong in any situation where that initial state later gets orphaned, as none of its parent states can be restored, including non-finalized ones. Would it make more sense to change the initial sync target to the

It is tricky, for sure.
The problem is that we need nodes to have the state at the finalizedBlockHash available for syncing, otherwise the snap sync will never finish. I agree that the message should be different and maybe we should just return
Raising the 90k limit doesn't help in this situation; it is not being hit here. That limit only applies if the initially synced state was ≤ finalized. Here, in principle, a random non-canonical branch got synced initially (depth doesn't matter, it could be just a single orphaned block). The canonical branch, which does not descend from the non-canonical branch, was discovered later and switched to after the initial sync to the orphaned block/state had completed.

Perpetual SYNCING is also tricky to resolve. The CL eventually needs an INVALID signal to switch away from the bad branch (ideally on the newPayload if the block is already known to be invalid at that point, as Nethermind does). With SYNCING, the CL will just keep requesting the same head as dictated by fork choice.
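To make the SYNCING versus INVALID point concrete, here is a toy model of the CL side. It is a deliberately simplified sketch: real fork choice is far more involved, and `nextHead` and `roundsUntilSwitch` are hypothetical helpers, not functions from Nimbus or Geth. Only the status strings are taken from the Engine API.

```go
package main

// payloadStatus holds a status string as named by the Engine API.
type payloadStatus string

const (
	statusValid   payloadStatus = "VALID"
	statusInvalid payloadStatus = "INVALID"
	statusSyncing payloadStatus = "SYNCING"
)

// nextHead models (as an assumption) how a CL reacts to the EL's verdict:
// only INVALID makes it abandon its preferred head for the fallback.
func nextHead(preferred, fallback string, status payloadStatus) string {
	if status == statusInvalid {
		return fallback
	}
	return preferred
}

// roundsUntilSwitch repeatedly asks the EL about the CL's current head and
// returns the round in which the CL moved to fallback, or -1 if it was
// still on the preferred head after maxRounds.
func roundsUntilSwitch(el func(head string) payloadStatus, preferred, fallback string, maxRounds int) int {
	head := preferred
	for round := 1; round <= maxRounds; round++ {
		head = nextHead(head, fallback, el(head))
		if head == fallback {
			return round
		}
	}
	return -1
}
```

In this model, an EL that always answers SYNCING leaves the CL on the same head for every round, while an EL that answers INVALID lets it move away on the first round.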
System information
Geth version:
geth --holesky --log.vmodule='rpc=5'
CL client & version: status-im/nimbus-eth2 v25.3.0
ethpandaops/nimbus-eth2:splitview
nimbus_beacon_node \
  --data-dir="$HOME/Downloads/nimbus/data/holesky" \
  --network=holesky \
  --rest \
  --tcp-port=9010 \
  --udp-port=9010 \
  --history=prune \
  --el=http://127.0.0.1:8551 \
  --jwt-secret=/Users/etan/Library/Ethereum/holesky/geth/jwtsecret \
  --max-peers=80
OS & Version:
Commit hash: installed via Homebrew
Expected behaviour
Geth should just say that block is invalid without crashing
Actual behaviour
Geth sometimes crashes when trying to add the invalid block on Holesky
Steps to reproduce the behaviour
Geth is started with
geth --holesky --log.vmodule='rpc=5'
No idea how to reproduce, but logs attached. Geth was on the canonical branch, but then a peer provided the invalid branch to Nimbus, and Nimbus tried to validate it against Geth. Note that Nimbus does not randomly blacklist hardcoded blocks.
Terminal Saved Output.txt.zip
Backtrace