AutoML 2018 challenge :: PAKDD2018 Forum

> *_public.info files and number of realisations

Hello, I have two unrelated questions.

1 - Will the dataset folders contain .info files in the AutoML phase?
2 - Is the feedback rank based on the submitted code running on a single realisation of each dataset? If not, how many are used?

Thank you.

Posted by: vctrop @ Feb. 2, 2018, 9:42 p.m.

Actually, I realized that the second question does not make sense, sorry.

Posted by: vctrop @ Feb. 2, 2018, 9:54 p.m.

1 - Will the dataset folders contain .info files in the AutoML phase?

Yes. The datasets in the AutoML phase are in the same format as in the Feedback phase.

Posted by: tuweiwei @ Feb. 6, 2018, 5:46 a.m.

Hello,

We also have some questions regarding the public.info file. We know the actual challenge is that we do not know what will happen in the final phase, but nevertheless we wonder why:
* the StarterKit assigns "Mixed" as the default "feat_type" (if there is no info-file). Could you maybe provide us with a set of feat_types we should be aware of?
* public.info for rl does not contain "num_valid" and "usage". Is this intended or do we not need to produce validation predictions for such datasets?
* 'is_sparse' is never active. Should we expect sparse datasets? If yes, could you provide us with possible formats to prepare for that?

Best,
AAD Freiburg

Posted by: aad_freiburg @ Feb. 6, 2018, 1:56 p.m.

Thank you for the answer, tuweiwei, and thank you AAD Freiburg for raising these questions.

Best wishes.

Posted by: vctrop @ Feb. 7, 2018, 3:22 a.m.

Dear AAD Freiburg team,
Given the absence of 'dataname_valid.solution' files in the dataset folders, I believe there is no point in generating validation predictions for any of the datasets, as they could not be used in the model selection procedure.

Posted by: vctrop @ Feb. 7, 2018, 3:37 a.m.

Dear colleagues,

Regarding AAD_Freiburg team's comments, we can answer:

* the StarterKit assigns "Mixed" as the default "feat_type" (if there is no info-file). Could you maybe provide us with a set of feat_types we should be aware of?

> The possible feature types are "Categorical", "Numerical", "Binary" and "Mixed" (which can be further detailed, e.g., "Numerical & Categorical").

* public.info for rl does not contain "num_valid" and "usage". Is this intended or do we not need to produce validation predictions for such datasets?

You are right, we do not expect predictions for validation sets in the final phase (regardless of whether public.info indicates the presence of this split).

* 'is_sparse' is never active. Should we expect sparse datasets? If yes, could you provide us with possible formats to prepare for that?

No sparse datasets will be provided.

Posted by: hugo.jair @ Feb. 7, 2018, 3:57 p.m.
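
For anyone handling the cases discussed above, here is a minimal sketch of reading a public.info file while tolerating a missing file or missing keys such as "num_valid" and "usage". It assumes the file holds one `key = value` pair per line with string values in quotes; that layout, the function name `read_public_info` and the `DEFAULTS` table are assumptions made for illustration, not taken from the StarterKit. Only the "Mixed" default for "feat_type" comes from this thread.

```python
import ast
import os

# Fallback values used when public.info is absent or omits a key.
# "Mixed" is the StarterKit default mentioned in this thread;
# is_sparse = 0 is an assumption based on "no sparse datasets".
DEFAULTS = {"feat_type": "Mixed", "is_sparse": 0}


def read_public_info(path):
    """Parse 'key = value' lines into a dict, starting from DEFAULTS."""
    info = dict(DEFAULTS)
    if not os.path.exists(path):
        return info  # no info file: defaults only
    with open(path) as handle:
        for line in handle:
            if "=" not in line:
                continue  # skip blank or malformed lines
            key, _, raw = line.partition("=")
            raw = raw.strip()
            try:
                # Turns "'Numerical'" into a str and "100" into an int.
                value = ast.literal_eval(raw)
            except (ValueError, SyntaxError):
                value = raw  # keep unquoted text unchanged
            info[key.strip()] = value
    return info
```

Optional keys are best read with `info.get("num_valid")` rather than `info["num_valid"]`, since, as noted above, some datasets do not list them.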