Skip to content

Interactive problems checkers are not very robust #465

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
meisterT opened this issue Nov 26, 2018 · 10 comments · Fixed by #484
Closed

Interactive problems checkers are not very robust #465

meisterT opened this issue Nov 26, 2018 · 10 comments · Fixed by #484

Comments

@meisterT
Copy link
Member

Description of the problem / feature request

Support interactive problems properly, preferably by supporting the problemtools format, see https://github.com/Kattis/problemtools/tree/develop/examples/guess for an example similar to our own boolfind.

Currently it's easy to block a judgedaemon or produce "incorrect" verdicts because of how we separate running from validation.
We should consider adding a proper wrapper to both runner / validator scripts during problem import to make setting up these problems as easy as possible.

Steps to reproduce

Example of a incorrect/malicious submission:

int main() {
	while (true) {
		cout << "READ 0" << endl;
	}
	return 0;
}

Expected behaviour

Wrong answer.

Actual behaviour

Judgedaemon is hanging.

Any other information that you want to share?

There are other problematic submissions as well, I'll add examples in a later post when I got them.

@meisterT
Copy link
Member Author

meisterT commented Dec 4, 2018

I looked into this yesterday evening a bit. We do kill the submission but it ends up in defunct/zombie state. This is because we don't call wait/waitpid. After adding more debug statements, I found that we hang in either slice/read (depending on what type is used).

So my current idea is the following:
While processing output from the program, we receive SIGALRM in the parent process and then process it. We do a SA_RESTART, so we restart the system call after handling the signal. This then seems to block indefinitely.

The pstree looks like this:

php(13445)───sh(13566)───testcase_run.sh(13567)───runpipe(13628)─┬─runjury(13629)
                                                                                        └─sudo(13630,root)───runguard(13631)

@eldering I did look for the pipe picture of how we set up judging for interactive problems, but haven't found it. Do you know where it lives?

@eldering
Copy link
Member

eldering commented Dec 4, 2018

new-io-redirection.pdf

I think I never committed that image, but just sent it to domjudge-devel. See attached.

Oh, and the runpipe helper that's executed by the run script is missing here.

@eldering eldering closed this as completed Dec 4, 2018
@meisterT
Copy link
Member Author

meisterT commented Dec 4, 2018

Thanks for the image, that looks roughly like what I had in mind.
Since adding an image doesn't fix the bug, I'll reopen it ;-)

@meisterT meisterT reopened this Dec 4, 2018
@meisterT
Copy link
Member Author

meisterT commented Dec 6, 2018

I investigated a bit more.

In the runpipe we wait until both children have exited (note that not the team submission is one of them but runguard). After that we would close the pipes / file descriptors. But the children didn't exit so far.

In runguard, we killed the team submission which is in zombie state because we didn't wait for it yet. We do hang on a blocking read, so we don't reach the waitpid which would allow us to terminate ourselves.

My current (not well thought out) guess is that we should merge runpipe into runguard.

@meisterT
Copy link
Member Author

meisterT commented Dec 8, 2018

@eldering what is this code supposed to do?https://github.com/DOMjudge/domjudge/blob/master/judge/runpipe.c#L252-L253
I don't get why we do a blocking wait right after the non-blocking wait, but perhaps I'm missing something.

Anyway, Jaap mentioned in IRC that merging runpipe into runguard is a lot of change which we probably don't want if we can avoid it.
So I toss another idea around:

  • add option --notifyontle <pid> to runguard
  • special case runpipe to make it aware which command is which and add own pid to runguard's arguments
  • send SIGUSR1 from runguard to runpipe when we run into SIGALRM
  • do the "right thing" in runpipe on SIGUSR1 receival, i.e. probably close all fds, so that the normal flow in runguard that keeps reading from the pipes can finish

@eldering, since you are clearly the best person to fix this, I'll assign the bug to you :-)

meisterT added a commit to meisterT/domjudge that referenced this issue Dec 30, 2018
This fixes a bug with interactive problems where the judgedaemon got
stuck on simple forever loops.

Partly addresses DOMjudge#465.
meisterT added a commit that referenced this issue Dec 30, 2018
This fixes a bug with interactive problems where the judgedaemon got
stuck on simple forever loops.

Partly addresses #465.
meisterT added a commit that referenced this issue Dec 30, 2018
meisterT added a commit that referenced this issue Dec 30, 2018
meisterT added a commit that referenced this issue Jan 3, 2019
This basically means that we use the combined run and compare script.
(The old method with separate scripts is still supported.)

Note that we don't create any compile helpful run scripts yet that call
runpipe. This currently has to be done by the jury.

Part of #465.
meisterT added a commit to meisterT/domjudge that referenced this issue Jan 3, 2019
There's still a difference in judging compared to problemtools which
needs further discussion:
What's the correct verdict if a submission is both TLE and WA and the WA
is determined before the time limit? For non-interactive problems, the
answer is a clear TLE, but problemtools returns WA for those cases.

Part of DOMjudge#465.
meisterT added a commit to meisterT/domjudge that referenced this issue Jan 4, 2019
There's still a difference in judging compared to problemtools which
needs further discussion:
What's the correct verdict if a submission is both TLE and WA and the WA
is determined before the time limit? For non-interactive problems, the
answer is a clear TLE, but problemtools returns WA for those cases.

Part of DOMjudge#465.
meisterT added a commit that referenced this issue Jan 4, 2019
There's still a difference in judging compared to problemtools which
needs further discussion:
What's the correct verdict if a submission is both TLE and WA and the WA
is determined before the time limit? For non-interactive problems, the
answer is a clear TLE, but problemtools returns WA for those cases.

Part of #465.
meisterT added a commit to meisterT/domjudge that referenced this issue Jan 4, 2019
…, override TLE.

See
Kattis/problemtools#77
for an in-depth discussion.

With this, all test submissions of the example guess problem give the
expected answer, but we need to do some cleanup and UI/import changes
before we can close DOMjudge#465.
@meisterT
Copy link
Member Author

meisterT commented Jan 4, 2019

Thanks GitHub. Before we can close is not the same as actually close it now ;-)

@meisterT meisterT reopened this Jan 4, 2019
thijskh pushed a commit that referenced this issue Jan 5, 2019
This fixes a bug with interactive problems where the judgedaemon got
stuck on simple forever loops.

Partly addresses #465.
meisterT added a commit to meisterT/domjudge that referenced this issue Jan 6, 2019
This will give us more test coverage, especially for the new interactive
problem format, see DOMjudge#465. Also, we're testing zipped problem import with
it.
meisterT added a commit to meisterT/domjudge that referenced this issue Jan 6, 2019
This will give us more test coverage, especially for the new interactive
problem format, see DOMjudge#465. Also, we're testing zipped problem import with
it.
meisterT added a commit to meisterT/domjudge that referenced this issue Jan 6, 2019
This will give us more test coverage, especially for the new interactive
problem format, see DOMjudge#465. Also, we're testing zipped problem import with
it.
meisterT added a commit that referenced this issue Jan 6, 2019
This will give us more test coverage, especially for the new interactive
problem format, see #465. Also, we're testing zipped problem import with
it.
@meisterT
Copy link
Member Author

meisterT commented Jan 6, 2019

Update: this is now mostly solved and interactive problems work in the problemtools format. There is still room for improvement, but I don't know if any functional changes that are still to do.

@thijskh
Copy link
Member

thijskh commented Jan 8, 2019

So then the issue can be closed?

@meisterT
Copy link
Member Author

meisterT commented Jan 8, 2019

I'll do some more testing (probably to evening) and close it afterwards.

@eldering eldering assigned meisterT and unassigned eldering Jan 8, 2019
@meisterT
Copy link
Member Author

I've done more testing and integrated the problemtools example in our travis test. I also sent Kattis/problemtools#112

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants