Skip to content

Use validator output if it exits first, for interactive problems #77

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Sep 19, 2018

Conversation

simonlindholm
Copy link
Member

As per the commit message:

Before this change, if a validator detected, say, an invalid input
format and exited, it would close its output pipe, and the submission
would proceed to read EOF and possibly crash/TLE. The output of the
submission would then be displayed to the user.

This isn't great -- it either results in confusing judgements, or
necessitates adding convoluted clean exit protocols to the problem to
deal with these cases. In the latter case, the contestant needs to add
probably-unnecessary hard-to-test code, and the description of the
protocol can bog down the problem statement (see e.g. "Go, Gopher" from
Google Code Jam 2018: https://goo.gl/ccT6t3).

This commit takes the stand that if the validator exits first with a WA
status, that should take precedence. Note that this may result in WA
statuses where the time limit is exceeded.

With a few other cleanups thrown in.

Changes the output format of interactive.cc, I don't know if that matters for other consumers than kattis.

It mixed spaces and tabs.
Before this change, if a validator detected, say, an invalid input
format and exited, it would close its output pipe, and the submission
would proceed to read EOF and possibly crash/TLE. The output of the
submission would then be displayed to the user.

This isn't great -- it either results in confusing judgements, or
necessitates adding convoluted clean exit protocols to the problem to
deal with these cases. In the latter case, the contestant needs to add
probably-unnecessary hard-to-test code, and the description of the
protocol can bog down the problem statement (see e.g. "Go, Gopher" from
Google Code Jam 2018: https://goo.gl/ccT6t3).

This commit takes the stand that if the validator exits first with a WA
status, that should take precedence. Note that this may result in WA
statuses where the time limit is exceeded.
@simonlindholm
Copy link
Member Author

Now also with another (not yet tested) fix for validator SIGPIPEs. Unrelated but pretty similar to (and dependent upon) the rest of the PR.

@simonlindholm
Copy link
Member Author

@austrin Think you could review this? Does the approach make sense?

@simonlindholm
Copy link
Member Author

@austrin Ping! Or are you trying to safeguard yourself against interactive problems in the coding cup finals? ;)

@simonlindholm
Copy link
Member Author

@austrin Re-ping.

Copy link
Contributor

@austrin austrin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the goal of giving the validator a bit more say. One general concern I have however is that this breaks the invariant "judgement = TLE <=> runtime > timelim", and I really don't want to do that. I can see two ways of addressing that:

  1. Change it so that if runtime > timelim then result is TLE regardless of what validator says (but if submission crashed or exceeded walltime limit, which are probably more common ways of failing after validator has terminated, then validator gets its say).
  2. Set runtime=timelim in the case when validator terminated first with WA but submission ran over time

I think option 2 is pretty ugly so would prefer option 1.

volatile int val_pid = -1, user_pid = -1;
volatile int user_status = -1, val_status = -1;
volatile static rusage user_ru;
volatile static rusage val_ru;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you think these (and others below) need to be volatile?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are used from a signal handler, so it seemed "more correct" (even if still actually incorrect -- only volatile sig_atomic_ts and atomics are allowed, and even then we have race conditions that make the whole approach wrong). But until the signal handler is fixed properly, this may at least prevent the compiler from reordering some assignments etc. which I wouldn't be surprised if it does.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense!

while (remaining > 0) {
int status;
struct rusage ru;
int r = wait4(-1, &status, 0, &ru);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change this to wait3?

if (r == user_pid) {
// In case of broken pipes, let validator decide
user_status = (WIFSIGNALED(status) && WTERMSIG(status) == SIGPIPE ? 0 : status);
memcpy((void*)&val_ru, &ru, sizeof(rusage));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be user_ru not val_ru.

@simonlindholm
Copy link
Member Author

Change it so that if runtime > timelim then result is TLE regardless of what validator says (but if submission crashed or exceeded walltime limit, which are probably more common ways of failing after validator has terminated, then validator gets its say).

With the PR as is walltime limit exceeded will practically never happen when the validator exits early, because then we close all pipes and you cannot block on them forever. We could of course decide not to close the pipes and then wait for the wall time limit to hit, but that seems somewhat silly in that it only prolongs time until error...

How about setting runtime = min(validator exit time, submission exit time)?

@austrin
Copy link
Contributor

austrin commented Jun 28, 2018

With the PR as is walltime limit exceeded will practically never happen when the validator exits early, because then we close all pipes and you cannot block on them forever. We could of course decide not to close the pipes and then wait for the wall time limit to hit, but that seems somewhat silly in that it only prolongs time until error...

Right, I agree. (But the crashing part is still probably the most common way for a solution to fail after the validator has terminated.)

How about setting runtime = min(validator exit time, submission exit time)?

I don't think that makes sense, the two programs may need very differing amounts of CPU time and there is no reason why the CPU time of the validator would be any reasonable measure of the CPU time of the submission. However, one possibility that I think would make a lot of sense would be to fetch the current CPU time of the submission process at the time the validator exits.

@simonlindholm
Copy link
Member Author

Right, I agree. (But the crashing part is still probably the most common way for a solution to fail after the validator has terminated.)

If it's just "the most common way" then it doesn't help because you still need to know about the edge case... Also, I'm not sure my experiences agree with crashing being more likely than TLE.

However, one possibility that I think would make a lot of sense would be to fetch the current CPU time of the submission process at the time the validator exits.

Sounds good! I'll see if I can make this work here and in Kattis (tomorrow or so).

@simonlindholm
Copy link
Member Author

Also worth mentioning: the first approach I had to this was to kill the submission when the validator exited; however, it turns out that that doesn't work well with isolate, which runs as sudo and is thus unkillable without complex hackery. Fetching the current CPU time should have the same effect as that (UI-wise), nicely enough.

@niemela
Copy link
Member

niemela commented Jun 28, 2018

However, one possibility that I think would make a lot of sense would be to fetch the current CPU time of the submission process at the time the validator exits.

Sounds good! I'll see if I can make this work here and in Kattis (tomorrow or so).

Agreed, that would give the expected results. Because what you would actually really want is...

Also worth mentioning: the first approach I had to this was to kill the submission when the validator exited;

... to kill the submissions as soon as the validator has decided to give up.

however, it turns out that that doesn't work well with isolate, which runs as sudo and is thus unkillable without complex hackery. Fetching the current CPU time should have the same effect as that (UI-wise), nicely enough.

👍

@simonlindholm
Copy link
Member Author

simonlindholm commented Jul 1, 2018

It appears to be pretty hard to get to that information. wait4 doesn't return usage information about running processes; you have to parse /proc/<id>/stat. However, that only gives you the CPU usage of the process and the sum of CPU usage of terminated-and-waited-for children, and if the process is, say, isolate, both of these values are zero. Which means we only have bad options remaining:

  • do some sort of process tree traversal, with all the racefulness that implies. Could be improved by SIGSTOP'ing the process first, but ugh. In a sense it's OK if we mess up and return an invalid value because the result will be WA anyway, but...
  • kill the submission as soon as the validator stops, and on Kattis's side try to run the program with extended caps that allow killing privileged processes (setcap cap_kill+ep). Hard to get right (security issues, and is it even OK to kill isolate half-way through?).
  • pull a runtime out of thin air, e.g. 0 or exactly the TL.

@austrin
Copy link
Contributor

austrin commented Aug 23, 2018

There is also the (also bad) option of keeping TLE if process uses too much CPU. I would slightly prefer that, but I am also fine with capping runtime at timelim in the case when validator decides on WA before the user process exits.

@simonlindholm
Copy link
Member Author

There is also the (also bad) option of keeping TLE if process uses too much CPU. I would slightly prefer that, but I am also fine with capping runtime at timelim in the case when validator decides on WA before the user process exits.

In my view that's more or less equivalent to (maybe slightly worse than) wontfixing this PR, since it would still require explaining in the problem statement that you must (somehow, by some problem-specific scheme) exit nicely when the validator exits to avoid confusing verdicts. But sure, it's an option.

Anyway, I pushed a commit for the runtime = min(runtime, timelim) behavior (not very well tested but I expect it to work), and another to address the review comments you had earlier.

@austrin
Copy link
Contributor

austrin commented Aug 23, 2018

Sounds good. How about also adding a test submission to examples/guess which gets WA this way?

@austrin austrin merged commit fec8d1e into Kattis:develop Sep 19, 2018
@simonlindholm
Copy link
Member Author

Thanks for merging! I agree that a test is a great idea, I'll try to get to that at some point if you don't do it before me. Also, does this need some sort of spec update or coordination with other judges?

meisterT added a commit to meisterT/domjudge that referenced this pull request Jan 4, 2019
…, override TLE.

See
Kattis/problemtools#77
for an in-depth discussion.

With this, all test submissions of the example guess problem give the
expected answer, but we need to do some cleanup and UI/import changes
before we can close DOMjudge#465.
meisterT added a commit to DOMjudge/domjudge that referenced this pull request Jan 4, 2019
…, override TLE.

See
Kattis/problemtools#77
for an in-depth discussion.

With this, all test submissions of the example guess problem give the
expected answer, but we need to do some cleanup and UI/import changes
before we can close #465.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants