Interactive problems checkers are not very robust #465

meisterT · 2018-11-26T08:10:24Z

Description of the problem / feature request

Support interactive problems properly, preferably by supporting the problemtools format, see https://github.com/Kattis/problemtools/tree/develop/examples/guess for an example similar to our own boolfind.

Currently it's easy to block a judgedaemon or produce "incorrect" verdicts because of how we separate running from validation.
We should consider adding a proper wrapper to both runner / validator scripts during problem import to make setting up these problems as easy as possible.

Steps to reproduce

Example of an incorrect/malicious submission:

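#include <iostream>
using namespace std;

int main() {
	// Floods the jury program with queries and never reads its replies.
	while (true) {
		cout << "READ 0" << endl;
	}
	return 0;
}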

Expected behaviour

Wrong answer.

Actual behaviour

Judgedaemon is hanging.

Any other information that you want to share?

There are other problematic submissions as well; I'll add examples in a later post once I have them.

meisterT · 2018-12-04T06:22:23Z

I looked into this a bit yesterday evening. We do kill the submission, but it ends up in a defunct/zombie state because we never call wait/waitpid on it. After adding more debug statements, I found that we hang in either splice or read (depending on which one is used).

So my current idea is the following:
While processing output from the program, we receive SIGALRM in the parent process and handle it. The handler is installed with SA_RESTART, so the interrupted system call is restarted after the signal has been handled. The restarted call then seems to block indefinitely.
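To make the SA_RESTART effect concrete, here is a minimal sketch, assuming the blocking call is a plain read() on a pipe. This is not the runguard code; `install_alarm_handler` and `read_until_alarm` are hypothetical names used only for illustration. Without SA_RESTART, the read() is interrupted by SIGALRM and fails with EINTR, so the caller regains control; with SA_RESTART it would be restarted and keep blocking.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

// Set by the handler; sig_atomic_t is safe to write from a signal handler.
static volatile sig_atomic_t got_alarm = 0;

static void on_alarm(int) { got_alarm = 1; }

// Install a SIGALRM handler. With restart == true, SA_RESTART is set and an
// interrupted read() is restarted by the kernel instead of failing with EINTR.
bool install_alarm_handler(bool restart) {
    struct sigaction sa {};
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = restart ? SA_RESTART : 0;
    return sigaction(SIGALRM, &sa, nullptr) == 0;
}

// Hypothetical helper: read from fd, giving up when SIGALRM fires.
// Returns the number of bytes read, 0 on EOF, or -1 with errno == EINTR
// if the timer fired while read() was blocked.
ssize_t read_until_alarm(int fd, char *buf, size_t len, int timeout_ms) {
    assert(timeout_ms > 0);
    got_alarm = 0;
    if (!install_alarm_handler(false)) return -1;  // deliberately no SA_RESTART

    itimerval timer {};
    timer.it_value.tv_sec = timeout_ms / 1000;
    timer.it_value.tv_usec = (timeout_ms % 1000) * 1000;
    if (setitimer(ITIMER_REAL, &timer, nullptr) != 0) return -1;

    ssize_t n = read(fd, buf, len);
    int saved_errno = errno;

    itimerval off {};                              // disarm the timer again
    setitimer(ITIMER_REAL, &off, nullptr);
    errno = saved_errno;
    return n;
}
```

Note that the sketch still has a small race: if the timer fires before read() is entered, the call blocks anyway. A real fix would need something like a non-blocking descriptor or poll() with a timeout.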

The pstree looks like this:

php(13445)───sh(13566)───testcase_run.sh(13567)───runpipe(13628)─┬─runjury(13629)
                                                                 └─sudo(13630,root)───runguard(13631)

@eldering I did look for the pipe picture of how we set up judging for interactive problems, but haven't found it. Do you know where it lives?

eldering · 2018-12-04T06:31:42Z

new-io-redirection.pdf

I think I never committed that image, but just sent it to domjudge-devel. See attached.

Oh, and the runpipe helper that's executed by the run script is missing here.

meisterT · 2018-12-04T07:05:59Z

Thanks for the image, that looks roughly like what I had in mind.
Since adding an image doesn't fix the bug, I'll reopen it ;-)

meisterT · 2018-12-06T16:47:35Z

I investigated a bit more.

In runpipe we wait until both children have exited (note that one of them is runguard, not the team submission itself). After that we would close the pipes / file descriptors. But the children have not exited at that point.

In runguard, we have killed the team submission, which is now in zombie state because we have not waited for it yet. We hang on a blocking read, so we never reach the waitpid that would allow runguard itself to terminate.
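For reference, the kill-then-reap pattern that avoids a lingering zombie can be sketched as follows. This is an illustration under simplified assumptions (a single direct child, SIGKILL as the termination signal), not the actual runguard code; `spawn_busy_child` and `kill_and_reap` are hypothetical helpers.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for a submission that never terminates on its own.
pid_t spawn_busy_child() {
    pid_t pid = fork();
    if (pid == 0) {
        for (;;) pause();  // child: sleep until a signal kills us
    }
    return pid;            // parent: child's pid, or -1 if fork() failed
}

// Kill the child and wait for it, so it does not stay behind as a zombie.
// Returns the signal that terminated the child, 0 if it exited normally,
// or -1 on error.
int kill_and_reap(pid_t child) {
    assert(child > 0);
    if (kill(child, SIGKILL) != 0) return -1;

    int status = 0;
    pid_t r;
    do {
        r = waitpid(child, &status, 0);   // retry if a signal interrupts us
    } while (r == -1 && errno == EINTR);

    if (r != child) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Until waitpid() has returned for the child, the kernel keeps its process table entry around, which is the defunct state visible in ps/pstree.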

My current (not well thought out) guess is that we should merge runpipe into runguard.

meisterT · 2018-12-08T08:55:20Z

@eldering what is this code supposed to do? https://github.com/DOMjudge/domjudge/blob/master/judge/runpipe.c#L252-L253
I don't get why we do a blocking wait right after the non-blocking wait, but perhaps I'm missing something.

Anyway, Jaap mentioned on IRC that merging runpipe into runguard is a big change, which we probably don't want if we can avoid it.
So here is another idea:

add an option --notifyontle <pid> to runguard
special-case runpipe so that it knows which command is which, and have it add its own pid to runguard's arguments
send SIGUSR1 from runguard to runpipe when we run into SIGALRM
do the "right thing" in runpipe on receipt of SIGUSR1, i.e. probably close all fds, so that the normal flow in runguard that keeps reading from the pipes can finish
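The notification idea above could look roughly like this. It is only a sketch of the proposal: the --notifyontle option does not exist in runguard, and `notify_tle`, `install_tle_handler` and `close_pipes_if_notified` are hypothetical names. The handler only sets a flag; the descriptors are closed later from normal program flow.

```cpp
#include <cassert>
#include <csignal>
#include <sys/types.h>
#include <unistd.h>
#include <vector>

// Set from the signal handler, checked from the main loop.
static volatile sig_atomic_t tle_notified = 0;

static void on_sigusr1(int) { tle_notified = 1; }

// "runpipe" side: register interest in the time limit notification.
bool install_tle_handler() {
    struct sigaction sa {};
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, nullptr) == 0;
}

// "runguard" side: called when the time limit alarm fires. notify_pid would
// come from the proposed (hypothetical) --notifyontle <pid> option.
bool notify_tle(pid_t notify_pid) {
    assert(notify_pid > 0);
    return kill(notify_pid, SIGUSR1) == 0;
}

// "runpipe" side: once notified, close all pipe ends so that readers on the
// other side see EOF. Returns the number of descriptors closed.
int close_pipes_if_notified(std::vector<int> &fds) {
    if (!tle_notified) return 0;
    int closed = 0;
    for (int fd : fds) {
        if (close(fd) == 0) ++closed;
    }
    fds.clear();
    return closed;
}
```

In a real runpipe the check would sit in the loop that waits for the children, where the signal would interrupt the blocking wait.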

@eldering, since you are clearly the best person to fix this, I'll assign the bug to you :-)

This fixes a bug with interactive problems where the judgedaemon got stuck on simple forever loops. Partly addresses DOMjudge#465.

See http://www.problemarchive.org/wiki/index.php/Output_validator and https://github.com/Kattis/problemtools/blob/develop/examples/guess/output_validators/guess_validator/validate.cc. Usage and UI will come in follow-up commits. Part of #465.

This basically means that we use the combined run and compare script. (The old method with separate scripts is still supported.) Note that we do not yet generate any helper run scripts that call runpipe; for now this has to be done by the jury. Part of #465.

There's still a difference in judging compared to problemtools which needs further discussion: What's the correct verdict if a submission is both TLE and WA and the WA is determined before the time limit? For non-interactive problems, the answer is a clear TLE, but problemtools returns WA for those cases. Part of DOMjudge#465.

…, override TLE. See Kattis/problemtools#77 for an in-depth discussion. With this, all test submissions of the example guess problem give the expected answer, but we need to do some cleanup and UI/import changes before we can close DOMjudge#465.
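The resulting precedence rule can be written down as a small decision function. This is a simplified sketch, not the DOMjudge implementation: it ignores run errors and other verdicts, and `Verdict`, `RunResult` and `combine_verdict` are hypothetical names.

```cpp
// Simplified verdict set; the real judge knows more outcomes than these.
enum class Verdict { Correct, WrongAnswer, TimeLimit };

// What we assume is known after a combined run/compare script has finished.
struct RunResult {
    bool submission_timed_out;    // the submission hit the time limit
    bool validator_says_wrong;    // the validator reported wrong answer
    bool validator_exited_first;  // the validator exited before the submission
};

// A wrong answer that the validator determined first overrides the timeout;
// otherwise a timeout wins, as for non-interactive problems.
constexpr Verdict combine_verdict(const RunResult &r) {
    return (r.validator_says_wrong && r.validator_exited_first)
               ? Verdict::WrongAnswer
           : r.submission_timed_out ? Verdict::TimeLimit
           : r.validator_says_wrong ? Verdict::WrongAnswer
                                    : Verdict::Correct;
}
```

For example, the forever loop from the issue description is rejected by the validator before the time limit expires, so under this rule it would be judged as wrong answer rather than timelimit.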

meisterT · 2019-01-04T08:26:40Z

Thanks, GitHub. "Before we can close" is not the same as actually closing it now ;-)

This will give us more test coverage, especially for the new interactive problem format, see DOMjudge#465. Also, we're testing zipped problem import with it.

meisterT · 2019-01-06T19:58:05Z

Update: this is now mostly solved and interactive problems work in the problemtools format. There is still room for improvement, but I don't know of any functional changes that are still to be done.

thijskh · 2019-01-08T17:48:28Z

So then the issue can be closed?

meisterT · 2019-01-08T18:19:04Z

I'll do some more testing (probably this evening) and close it afterwards.

meisterT · 2019-01-13T16:48:45Z

I've done more testing and integrated the problemtools example into our Travis test. I also sent Kattis/problemtools#112.

meisterT added bug judging backend labels Nov 26, 2018

eldering closed this as completed Dec 4, 2018

meisterT reopened this Dec 4, 2018

meisterT assigned eldering Dec 8, 2018

meisterT added a commit to meisterT/domjudge that referenced this issue Dec 30, 2018

Also wait for child after terminating it (due to SIGALRM/timelimit).

ef7c42f

This fixes a bug with interactive problems where the judgedaemon got stuck on simple forever loops. Partly addresses DOMjudge#465.

meisterT mentioned this issue Dec 30, 2018

Also wait for child after terminating it (due to SIGALRM/timelimit). #479

Merged

meisterT added a commit that referenced this issue Dec 30, 2018

Also wait for child after terminating it (due to SIGALRM/timelimit).

d7fc771

This fixes a bug with interactive problems where the judgedaemon got stuck on simple forever loops. Partly addresses #465.

meisterT added a commit that referenced this issue Dec 30, 2018

Add a test for #465.

a10eecb

meisterT added a commit that referenced this issue Dec 30, 2018

Add a test for #465.

0656669

meisterT mentioned this issue Jan 3, 2019

Add option to write out metadata from runpipe. #482

Merged

meisterT mentioned this issue Jan 3, 2019

Implement combined run and compare scripts for interactive problems. #483

Merged

meisterT closed this as completed in fc968e2 Jan 4, 2019

meisterT reopened this Jan 4, 2019

meisterT mentioned this issue Jan 4, 2019

If the validator of a combined run/compare script exits first with WA, override TLE. #484

Merged

thijskh pushed a commit that referenced this issue Jan 5, 2019

Also wait for child after terminating it (due to SIGALRM/timelimit).

3b9f831

This fixes a bug with interactive problems where the judgedaemon got stuck on simple forever loops. Partly addresses #465.

meisterT added a commit that referenced this issue Jan 6, 2019

Add UI option to use run script also as compare script.

34fe428

Part of #465.

meisterT mentioned this issue Jan 6, 2019

Load the examples from problemtools in travis. #486

Merged

meisterT added a commit that referenced this issue Jan 6, 2019

Load the examples from problemtools in travis.

5b95519

This will give us more test coverage, especially for the new interactive problem format, see #465. Also, we're testing zipped problem import with it.

eldering assigned meisterT and unassigned eldering Jan 8, 2019

meisterT closed this as completed Jan 13, 2019
