DSTC 8: End-to-End Multi-Domain Dialog Challenge Track - Task 1 Forum


> A couple of questions wrt. uploading system configs and evaluation

Hi,

I hope this isn't obvious and I have simply overlooked the answers, but I'm a bit stumped about how to properly test our system before submitting it, and about what happens with a submission.
If you could answer some questions, that'd be great!

1) Automatic evaluation:
I have actually just checked the ConvLab GitHub repository and see that you have updated the evaluation description, which is great!
It now says "Automatic end2end Evaluation: The submitted system will be evaluated using the current user-simulator in the repository with milu_rule_rule_template setting in demo.json".
However, in the next sentence it then states "We will report metrics including success rate, average reward, number of turns, precision, recall, and F1 score."

The two sentences are somewhat contradictory, I think? The milu_rule_rule_template setup in demo.json, when we run it, evaluates to something like "100 episodes, 45.55 return, 77.00% success rate, 8.85 turns". On the other hand, when we run the same setup from *baseline.json*, we get a much more detailed output like "100 episodes, 20.30 return, 56.00% success rate, 8.90 turns, 0.64 P, 0.77 R, 0.68 F1, 86.11% book rate". The difference, as far as we can see, is that "baseline.json" specifies an explicit "evaluator: MultiWozEvaluator" module in the environment.
Note that the two results are vastly different wrt. return and success rate, and that only the latter reports precision, recall, and F1 score... hence the question: will it really be the setup from "demo.json", or should we rather trust the numbers from "baseline.json", using the Evaluator class, when training/testing our models before submission?
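
For reference, the only structural difference we can spot between the two specs is an extra evaluator entry inside the environment block of the milu_rule_rule_template setting in baseline.json, roughly like this (abbreviated from our local copy, so the exact nesting and field names may differ slightly):

    "environment": {
        ...
        "evaluator": "MultiWozEvaluator"
    }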

2) Submitting systems
For submitting systems, are we understanding correctly that we are to zip and upload the entire "ConvLab" folder, including the convlab/, data/, docs/, and tutorial/ folders, as well as the run.py script etc.?
Also, how long does it take for the results to show up online? We submitted a test system (using the "Test Submissions" part of the form) as indicated above, and we did get a confirmation email; the status of the submission is shown as "Finished" on the submission page, however, for the score it currently states "---".
Maybe we did something wrong there? We checked, and the submission runs in the Docker environment 0.2.2 ...
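
In case concrete steps help with debugging: for the test submission we ran something like the following from inside the ConvLab checkout (the file list below is simply what we happened to include, matching the folders mentioned above) and then uploaded the resulting submission.zip through the form:

    zip -r ../submission.zip convlab/ data/ docs/ tutorial/ run.py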

Thank you so much for organising this challenge!

Cheers,
Philip

Posted by: PhilJohnG @ Aug. 23, 2019, 9:57 a.m.

Thank you for your interest in the challenge and for raising these questions!

1) You are right about the difference between demo.json and baseline.json. You need to use the evaluator class for both training and evaluation, and the submitted dialog system will be evaluated based on the milu_rule_rule_template setting in baseline.json. We will update the description soon. Thank you for pointing this out!
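
Concretely, you can evaluate locally with the same command you used for demo.json, just pointing at baseline.json instead, e.g. something like the following (adjust to your local checkout):

    python run.py baseline.json milu_rule_rule_template eval

This should produce the detailed output with precision, recall, and F1 that you saw.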

2) The current submission system does not support automatic evaluation, and therefore you cannot see the test submission result. In this competition, we only evaluate the submissions once, after the submission deadline. To guarantee the correctness of the submission format, we have reserved the period Sep. 23, 2019 - Sep. 29, 2019 for participants to test their submissions. If you need anything before then, please do not hesitate to let us know.

Posted by: jincli @ Aug. 24, 2019, 3:42 a.m.