-
Notifications
You must be signed in to change notification settings - Fork 267
Interactive problems checkers are not very robust #465
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I looked into this yesterday evening a bit. We do kill the submission but it ends up in defunct/zombie state. This is because we don't call wait/waitpid. After adding more debug statements, I found that we hang in either slice/read (depending on what type is used). So my current idea is the following: The pstree looks like this:
@eldering I did look for the pipe picture of how we set up judging for interactive problems, but haven't found it. Do you know where it lives? |
I think I never committed that image, but just sent it to domjudge-devel. See attached. Oh, and the |
Thanks for the image, that looks roughly like what I had in mind. |
I investigated a bit more. In the runpipe we wait until both children have exited (note that not the team submission is one of them but runguard). After that we would close the pipes / file descriptors. But the children didn't exit so far. In runguard, we killed the team submission which is in zombie state because we didn't wait for it yet. We do hang on a blocking read, so we don't reach the waitpid which would allow us to terminate ourselves. My current (not well thought out) guess is that we should merge runpipe into runguard. |
@eldering what is this code supposed to do?https://github.com/DOMjudge/domjudge/blob/master/judge/runpipe.c#L252-L253 Anyway, Jaap mentioned in IRC that merging runpipe into runguard is a lot of change which we probably don't want if we can avoid it.
@eldering, since you are clearly the best person to fix this, I'll assign the bug to you :-) |
This fixes a bug with interactive problems where the judgedaemon got stuck on simple forever loops. Partly addresses DOMjudge#465.
This fixes a bug with interactive problems where the judgedaemon got stuck on simple forever loops. Partly addresses #465.
See http://www.problemarchive.org/wiki/index.php/Output_validator and https://github.com/Kattis/problemtools/blob/develop/examples/guess/output_validators/guess_validator/validate.cc Usage and UI will come in follow-up commits. Part of #465.
This basically means that we use the combined run and compare script. (The old method with separate scripts is still supported.) Note that we don't create any compile helpful run scripts yet that call runpipe. This currently has to be done by the jury. Part of #465.
There's still a difference in judging compared to problemtools which needs further discussion: What's the correct verdict if a submission is both TLE and WA and the WA is determined before the time limit? For non-interactive problems, the answer is a clear TLE, but problemtools returns WA for those cases. Part of DOMjudge#465.
There's still a difference in judging compared to problemtools which needs further discussion: What's the correct verdict if a submission is both TLE and WA and the WA is determined before the time limit? For non-interactive problems, the answer is a clear TLE, but problemtools returns WA for those cases. Part of DOMjudge#465.
There's still a difference in judging compared to problemtools which needs further discussion: What's the correct verdict if a submission is both TLE and WA and the WA is determined before the time limit? For non-interactive problems, the answer is a clear TLE, but problemtools returns WA for those cases. Part of #465.
…, override TLE. See Kattis/problemtools#77 for an in-depth discussion. With this, all test submissions of the example guess problem give the expected answer, but we need to do some cleanup and UI/import changes before we can close DOMjudge#465.
Thanks GitHub. Before we can close is not the same as actually close it now ;-) |
This fixes a bug with interactive problems where the judgedaemon got stuck on simple forever loops. Partly addresses #465.
This will give us more test coverage, especially for the new interactive problem format, see DOMjudge#465. Also, we're testing zipped problem import with it.
This will give us more test coverage, especially for the new interactive problem format, see DOMjudge#465. Also, we're testing zipped problem import with it.
This will give us more test coverage, especially for the new interactive problem format, see DOMjudge#465. Also, we're testing zipped problem import with it.
This will give us more test coverage, especially for the new interactive problem format, see #465. Also, we're testing zipped problem import with it.
Update: this is now mostly solved and interactive problems work in the problemtools format. There is still room for improvement, but I don't know if any functional changes that are still to do. |
So then the issue can be closed? |
I'll do some more testing (probably to evening) and close it afterwards. |
I've done more testing and integrated the problemtools example in our travis test. I also sent Kattis/problemtools#112 |
Description of the problem / feature request
Support interactive problems properly, preferably by supporting the problemtools format, see https://github.com/Kattis/problemtools/tree/develop/examples/guess for an example similar to our own boolfind.
Currently it's easy to block a judgedaemon or produce "incorrect" verdicts because of how we separate running from validation.
We should consider adding a proper wrapper to both runner / validator scripts during problem import to make setting up these problems as easy as possible.
Steps to reproduce
Example of a incorrect/malicious submission:
Expected behaviour
Wrong answer.
Actual behaviour
Judgedaemon is hanging.
Any other information that you want to share?
There are other problematic submissions as well, I'll add examples in a later post when I got them.
The text was updated successfully, but these errors were encountered: