2019 Untapped Energy reCLAIM Data Competition: Classification Challenge Forum


> Re: From Song li to CodaLab Team

Somebody sent me an email asking for feedback, but without a valid return address, so I am answering the question here.
Because the data is highly imbalanced, different combinations of training data give different results.
Accuracy ranges from 34%, which is no better than random, to 94% when an extremely simplified model is given 80% of the data to train on and 20 epochs to stabilize. You can even get up to 1.0 in certain cases if 90% of the data is used for training with more epochs.
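One way to reduce how much the score depends on which rows land in the training set is a stratified split, which keeps each class's share roughly the same in both parts. The sketch below uses only the standard library. The function name `stratified_split`, the toy label counts, and the class names are placeholders for illustration, not the competition's actual labels or data.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, train_fraction=0.8, seed=0):
    """Return (train_idx, test_idx) with each class split by the same fraction."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)  # shuffle within the class only
        cut = int(round(len(idxs) * train_fraction))
        train_idx.extend(idxs[:cut])
        test_idx.extend(idxs[cut:])
    return sorted(train_idx), sorted(test_idx)


# Toy, made-up label distribution: one dominant class and two rare ones.
labels = ["active"] * 80 + ["suspended"] * 15 + ["abandoned"] * 5
train_idx, test_idx = stratified_split(labels, train_fraction=0.8, seed=0)

train_counts = Counter(labels[i] for i in train_idx)
test_counts = Counter(labels[i] for i in test_idx)
print(train_counts, test_counts)
```

With a plain random split, a rare class can end up almost entirely in one part, which is one plausible reason the score swings so much between runs.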
In general more than 80% of the wells in Canada are active, so a score of 1.0 is basically not realistic. We therefore need to process the data with SMOTE, or otherwise manipulate it to generate 10,000 times the data, to get a better balanced data set.
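To make the SMOTE idea concrete, here is a simplified, standard-library sketch of its core step: each synthetic point is placed at a random position on the segment between a minority-class sample and one of its nearest minority neighbours. This is not the full algorithm or any library's implementation. The function name `smote_like_oversample` and the toy points are assumptions for illustration. In practice a tested implementation, such as the `SMOTE` class in imbalanced-learn's `imblearn.over_sampling`, would normally be used instead.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority-class feature vectors (made up).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_like_oversample(minority, n_new=6, k=2, seed=0)
print(len(new_points))
```

Oversampling of this kind should be applied to the training split only. Otherwise synthetic copies of test rows leak into training and inflate the score.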
The best model I tried uses a combination of a deep and wide neural network plus a rule-based filter that processes the results after the prediction.
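The post-prediction filter could look roughly like the sketch below: the model's predictions pass through a list of rules, and a rule may override a prediction when it fires. Everything here is hypothetical, including the function names, the field `months_since_last_production`, the 6-month threshold, and the class names. It shows only the general pattern, not the actual rules used.

```python
def apply_rules(predictions, records, rules):
    """Override predictions where a rule fires; the first matching rule wins."""
    final = []
    for pred, rec in zip(predictions, records):
        for rule in rules:
            override = rule(pred, rec)
            if override is not None:
                pred = override
                break
        final.append(pred)
    return final


def recent_production_rule(pred, rec):
    """Hypothetical rule: a well that produced recently is not 'abandoned'."""
    months = rec.get("months_since_last_production")
    if pred == "abandoned" and months is not None and months < 6:
        return "active"
    return None  # no override


# Made-up predictions and records for illustration.
predictions = ["abandoned", "abandoned", "active"]
records = [
    {"months_since_last_production": 2},
    {"months_since_last_production": 48},
    {"months_since_last_production": 1},
]
filtered = apply_rules(predictions, records, [recent_production_rule])
print(filtered)
```

Keeping the rules as separate functions makes it easy to switch each one on or off and measure its effect on the score.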
I was a professional in the oil and gas industry for more than 20 years, and I realize this is a very complicated problem. It is not a simple rule-based or algorithm-based problem; it has a lot to do with company strategy and individual judgement when deciding whether or not to abandon a well. Even though the numbers from the model are high, I don't fully trust the data or the model, so I am not ready to publish it yet. But we can discuss results and code so that we can learn from each other and improve.

Posted by: SeanLiUE2019 @ Nov. 2, 2019, 4:22 a.m.