Predicting Generalization in Deep Learning Forum

> Submission Stuck

Hi, my submission seems to be stuck at the 'Submitted' stage.

Can you please check?

Posted by: parthnatekar @ Aug. 7, 2020, 8:56 a.m.

If a submission stays at the 'Submitted' stage, it means that all workers are currently occupied.
This is the intended behavior, and waiting a bit usually works (we have an autoscaling scheme in place).

Posted by: ydjiang @ Aug. 8, 2020, 5:17 p.m.

Hi,

Thanks for your reply.

Also, for some reason, my top scoring submission is not showing up on the dashboard. Is this an error from Codalab's side, and if so, how can this be solved?

Posted by: parthnatekar @ Aug. 13, 2020, 7:24 a.m.

Yes, this error is unfortunately on Codalab's side. There are some bugs when members of a team submit.
If you can send me the time-stamp of the submission, I can make it re-appear for you.

Posted by: ydjiang @ Aug. 13, 2020, 3:43 p.m.

Thanks, the time stamp is 08/09/2020 04:23:04

But I believe this will only show till the next time my team makes a submission?

Posted by: parthnatekar @ Aug. 13, 2020, 5:21 p.m.

It should be on the leaderboard now.

Posted by: ydjiang @ Aug. 13, 2020, 5:35 p.m.

Hi, my submission timestamped 09/23/2020 05:50:23 has been running for nearly 24 hours. It runs on my local machine in a reasonable time.

Very similar (almost identical) previous submissions have had no problems.

Can you please check?

Posted by: parthnatekar @ Sept. 24, 2020, 3:07 a.m.

Hi, if it's been more than 24 hours and there are no ingestion logs, it usually means the submission timed out (i.e. the job has been terminated).
Codalab for some reason does not update the GUI properly.

Posted by: ydjiang @ Sept. 24, 2020, 3:21 a.m.

Thanks for your reply. Is there a reason that two very similar submissions act differently, i.e. one times out while the other doesn't? The memory/time complexity is the same.

Posted by: parthnatekar @ Sept. 24, 2020, 4:19 a.m.

How long does the other one take?

Posted by: ydjiang @ Sept. 24, 2020, 4:21 a.m.

Tentatively about a quarter of a day.

Posted by: parthnatekar @ Sept. 24, 2020, 4:33 a.m.

Ok. That is strange. I re-ran that submission. Let's see if this happens again.

Posted by: ydjiang @ Sept. 24, 2020, 4:48 a.m.

Thank you. Just for reference, I ran this on my local machine on the public data, and it takes a few hours. I have about half the memory that Codalab assigns to individual submissions.

Posted by: parthnatekar @ Sept. 24, 2020, 7:19 a.m.

Hi, the re-run submission has now finished and is waiting to be scored (which I believe will happen soon, but I'll let you know if this doesn't happen).

I'm not sure why the original submission timed out.

However, the re-run has created a new submission with a new timestamp, so I now have a duplicate submission.

Posted by: parthnatekar @ Sept. 24, 2020, 10:23 a.m.

Hi, this is the expected behavior. Re-run copies the old submission and creates a new submission.
I don't think that should count towards your submission count? Could you confirm that?

Posted by: ydjiang @ Sept. 24, 2020, 9:21 p.m.

Hi, yes, the submission that you re-ran from your end is adding to my submission count.

Posted by: parthnatekar @ Sept. 29, 2020, 6:55 a.m.

Hi, this seems to be happening again for the submission with timestamp 10/07/2020 11:26:12. As we get closer to the competition deadline this is costing me valuable time. Can you please look into this?

Posted by: parthnatekar @ Oct. 8, 2020, 5:36 a.m.

I have re-run that submission.
Really sorry for the inconvenience. As it stands, we don't actually have a good solution to this problem because we haven't been able to reliably reproduce the error, and it doesn't help that the jobs are distributed across tens of machines.
I believe this is the first time that Codalab is hosting a competition where submissions need to run for such an extended period of time, so this is pretty much uncharted territory. Thank you for understanding!

Posted by: ydjiang @ Oct. 8, 2020, 5:42 a.m.

Thanks for the prompt reply, I understand.

Would it be possible to increase my daily submission count by 1, since the submission you re-ran has taken up 1 of my slots?

Posted by: parthnatekar @ Oct. 8, 2020, 5:45 a.m.

Sure. You should have 4 submissions now.

Posted by: ydjiang @ Oct. 8, 2020, 5:48 a.m.

Thank you

Posted by: parthnatekar @ Oct. 8, 2020, 6:06 a.m.

Hi, the submission you re-ran failed without any error. It ran for a few models in the past ~1.5 hours (I can see partial results in the ingestion output), which tells me that it isn't timing out. I will recheck my code to see if there are any issues.

Posted by: parthnatekar @ Oct. 8, 2020, 7:56 a.m.

The time-stamp for the above submission is 10/08/2020 05:38:53.

If a submission times out for a particular model, does it continue or stop there?

Posted by: parthnatekar @ Oct. 8, 2020, 7:58 a.m.

The timeout is computed per task. If a task exceeds its allocated time (i.e. number of models × 5 minutes), it automatically times out, and nothing after that point is run. As long as there is time left in the task's budget, however, it's fine for a single model to exceed 5 minutes.
That being said, I don't think your submission timed out. Check the error log for more details.
It's 4 am for me right now so I unfortunately cannot dig further. Let me know if you cannot fix this issue.

Posted by: ydjiang @ Oct. 8, 2020, 8:03 a.m.
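
The per-task timeout rule described in the last reply can be sketched as below. This is only an illustration of the rule as stated in the thread (budget = number of models × 5 minutes, shared across the whole task), not Codalab's or the organizers' actual code. The function name `simulate_task` and the per-model runtimes are hypothetical, and the sketch assumes the budget is checked after each model finishes.

```python
from typing import List, Tuple

# From the thread: the task budget is (number of models) x 5 minutes.
PER_MODEL_MINUTES = 5.0


def simulate_task(model_minutes: List[float],
                  per_model_minutes: float = PER_MODEL_MINUTES) -> Tuple[int, bool]:
    """Return (models_completed, timed_out) for one task.

    model_minutes holds hypothetical runtimes, in minutes, for each model
    in the task. The budget is shared: a single model may take longer than
    per_model_minutes as long as the cumulative time stays within budget.
    """
    budget = len(model_minutes) * per_model_minutes
    elapsed = 0.0
    completed = 0
    for minutes in model_minutes:
        elapsed += minutes
        if elapsed > budget:
            # Budget exhausted: this model and everything after it is not run.
            return completed, True
        completed += 1
    return completed, False


if __name__ == "__main__":
    # One slow model (8 min) is fine because the total (13) fits in 15.
    print(simulate_task([8.0, 2.0, 3.0]))
    # Here the cumulative time passes 15 minutes during the third model.
    print(simulate_task([8.0, 6.0, 4.0]))
```

Under these assumptions, a submission that fails after only a few models while well within the budget, as in the 10/08/2020 05:38:53 case, would point to an error in the code rather than a timeout.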