2019 Untapped Energy reCLAIM Data Competition: Regression Challenge Forum


> Treatment of NaNs

Hey, I have a few questions on this overfitting exercise :)

1) Is the current metric a weighted average of all three targets? If yes, what are the weights?
2) How specifically does your current metric on the platform treat NaNs in the predictions?

Thanks.

Posted by: ANTONBIRYUKOV2019 @ Oct. 25, 2019, 7:09 p.m.

The “answers” to your questions are as follows :)
1a. Yes the current metric is a weighted average of all three targets.
1b. I’ll tell you the weights after the competition has closed, but you’ll probably have to remind me.
2. I am not sure why someone would submit NaNs as predictions, so I don't see why you would want to know how the metric handles them. I am not going to answer your question unless you can explain why it makes sense to submit NaNs as predictions.
Thanks.

Posted by: untappedenergy2019 @ Oct. 25, 2019, 7:56 p.m.

Quite a few wells in the data do not have all three target metrics provided in both validation and training sets.

Knowing how the metric treats NaNs helps figure out what to do with them during training: impute them, drop those wells completely, or come up with an adversarial validation strategy.
Since you're trying to follow Kaggle's model, I am pretty sure things like that are disclosed in detail in the appropriate section there.

Posted by: ANTONBIRYUKOV2019 @ Oct. 25, 2019, 8:15 p.m.

I am pretty sure there are a few things I'd do differently if I were told that the metric is calculated on df.dropna(subset=['target_1','target_2','target_3']) vs. calculated as 3 separate numbers using df.dropna(subset=['target_i']), which yields 3 different subsets of wells.
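
To make the difference between those two conventions concrete, here is a minimal pandas sketch. The toy wells and their values are made up for illustration, and the column names target_1, target_2, target_3 are the placeholder names used above, not the competition's real columns. It only shows which rows each convention keeps; it says nothing about how the platform actually scores.

```python
import numpy as np
import pandas as pd

# Placeholder target names (assumption, taken from the post above).
TARGETS = ["target_1", "target_2", "target_3"]

# Hypothetical toy data: four wells, some with missing targets.
df = pd.DataFrame({
    "well_id": ["A", "B", "C", "D"],
    "target_1": [1.0, np.nan, 3.0, 4.0],
    "target_2": [10.0, 20.0, np.nan, 40.0],
    "target_3": [100.0, 200.0, 300.0, 400.0],
})

# Convention 1: score only wells where all three targets are present.
complete = df.dropna(subset=TARGETS)

# Convention 2: score each target on its own subset of wells.
per_target = {t: df.dropna(subset=[t]) for t in TARGETS}

n_complete = len(complete)
n_per_target = {t: len(sub) for t, sub in per_target.items()}

print("wells with all targets:", n_complete)
print("wells per target:", n_per_target)
```

Under the first convention, wells B and C never contribute to the score; under the second, each still contributes to the targets it does have.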

Judging by the fact that you simply did a random shuffled test-val-train split, I can foresee the NaN situation described above being relevant in the test set.

Posted by: ANTONBIRYUKOV2019 @ Oct. 25, 2019, 8:24 p.m.

Okay, now I understand your question. Thanks for the clarification.
Anything in the target fields in the training and validation sets that is blank or NaN should be assumed to be equal to 0. There are no NaNs in the test set targets that you are submitting against.
Let me know if you need further details.
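
For anyone following along, the zero-fill rule above could be applied roughly like this before training. This is only a sketch: the DataFrame, its values, and the target_1, target_2, target_3 column names are hypothetical stand-ins for the real training file.

```python
import numpy as np
import pandas as pd

# Placeholder target names (assumption, not the real column names).
TARGETS = ["target_1", "target_2", "target_3"]

# Hypothetical training rows with blank / NaN targets.
train = pd.DataFrame({
    "well_id": ["A", "B", "C"],
    "target_1": [1.0, np.nan, 3.0],
    "target_2": [np.nan, 20.0, 30.0],
    "target_3": [100.0, 200.0, np.nan],
})

# Treat missing targets as 0, as stated in the reply above.
# Work on a copy so the raw data stays available for comparison.
train_filled = train.copy()
train_filled[TARGETS] = train_filled[TARGETS].fillna(0)

n_missing_before = int(train[TARGETS].isna().sum().sum())
n_missing_after = int(train_filled[TARGETS].isna().sum().sum())

print("missing targets before:", n_missing_before)
print("missing targets after:", n_missing_after)
```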

Posted by: untappedenergy2019 @ Oct. 25, 2019, 8:48 p.m.