1. Can we use the prompt and the role sentences for our model? Are these sentences also included in the real test data?
2. Are the support dialogs given in the actual evaluation data from the same task as the test dialog (the within-task setting), or only from the same domain (the cross-task setting)?
3. Is the number of support dialogs per test case also 128 in the real test data?
4. During the development phase, can we assume the actual evaluation environment is similar to testing our model on the MultiWOZ dataset, i.e., measuring the slot/intent accuracy of the predicted user responses using the Task 1 baseline NLU?
5. Could you provide a more detailed description of the automatic measures (e.g., F1 score or accuracy of slot/intent detection) that will be used in the actual test environment?
Thank you!
Posted by: han0ah @ Aug. 9, 2019, 1:44 a.m.

Hi! Thanks for your interest and questions; please see the response posted to the issue on GitHub: https://github.com/microsoft/dstc8-meta-dialog/issues/4
Posted by: adatkins @ Aug. 12, 2019, 3:09 p.m.