[Parallel Router] Random strong test failures #3029
Comments
@AlexandreSinger @ueqri Any ideas why this might have happened? Thanks a lot!
Update: I just re-ran the tests for the PR and they passed. Update 2: The same tests failed on the master branch of my fork (identical to VTR master).
@AmirhosseinPoolad Hi Amir, I am investigating the test failures you mentioned. I have checked the stdout log files of the failed tests in the two action runs:
(Updated) The CI logs show that these were segmentation faults (the program crashed with signal 11). I also ran these specific tests locally on wintermute 70 times with no errors or failures. I strongly suspect the issue is related to the GitHub runner environment. If there are any GitHub runner docs I should refer to (for limitations or restrictions), or more context beyond the test log files, please let me know and I will try to fix those tests based on that info. Also, could you please ping me the next time this happens? Hopefully that would provide more helpful context for debugging.
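For anyone else trying to reproduce this locally, here is a minimal sketch of the kind of repeat-run loop described above. It is not the script used in this thread: `count_crashes` and the stand-in command are hypothetical, and it assumes a POSIX system where the binary is launched directly, so that a crash shows up as a negative return code. If the test goes through a wrapper script, the wrapper may report the crash differently.

```python
import signal
import subprocess
import sys


def count_crashes(cmd, runs):
    """Run `cmd` `runs` times and count the runs that were killed by a signal.

    Returns a dict mapping signal name (e.g. "SIGSEGV") to the number of runs
    terminated by that signal. Hypothetical helper, for illustration only.
    """
    crashes = {}
    for _ in range(runs):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        # On POSIX, a return code of -N means the child was killed by signal N.
        if result.returncode < 0:
            name = signal.Signals(-result.returncode).name
            crashes[name] = crashes.get(name, 0) + 1
    return crashes


if __name__ == "__main__":
    # Stand-in command that kills itself with SIGSEGV (signal 11).
    # Replace it with the real test invocation (assumption: run directly).
    demo_cmd = [
        sys.executable,
        "-c",
        "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)",
    ]
    print(count_crashes(demo_cmd, runs=3))
```

An empty result after many runs only shows that the crash did not occur in that environment, which matches what was seen locally here.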
From looking at the log, there's nothing very strange about the command line or this circuit. @ueqri: if this issue can't be resolved quickly, we should temporarily comment out this test in CI, and can reactivate it once it is stable.
@ueqri What is the status on this? Should we temporarily comment this test out? It is causing random failures.
…outer Temporarily disabled the `strong_multiclock` test in `vtr_reg_strong` CI regression tests for the parallel connection router, due to some random failures as mentioned in Issue verilog-to-routing#3029. After fixing the problem with the `strong_multiclock` test, this will be reactivated.
Still investigating the issue. I tried ThreadSanitizer but nothing particularly interesting was found. Only a few data races were detected (e.g.,
I am currently following the clue provided by Vaughn to see if anything interesting can be found. In the meantime, we can comment out this test (PR #3047 created). Also, since the CI reproduces these random failures so frequently (I could not reproduce them even once locally after ~100 runs), it might be worth building an identical CI environment myself and running the tests inside it, hopefully catching the seg faults with gdb.
Strong tests for parallel routing can randomly fail. Here's the failed CI run for a PR that did not touch VTR's code in any way and only changed an unrelated workflow file:
https://github.com/verilog-to-routing/vtr-verilog-to-routing/actions/runs/14929980706/job/41943464845#step:8:3197
There was a run failure and some QoR failures in some of the parallel router tests. When I ran the strong tests for the branch locally, they passed without any run or QoR failures, and VTR master also has no strong test failures.