My understanding is that the Oracle has been trained on the full sample of unaggregated data. Given that we already have models within 3% of the Oracle, is the Oracle score a strict upper bound?
Is it possible to do better than the Oracle?
Even though the noise makes it unlikely, in practice it could be possible to do slightly better than the "Oracle".
As you said, it is only a good model learned on the granular data (without aggregation or noise), so an exceptional model learned on the aggregated, noisy data might perform better.
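To make the point concrete, here is a minimal toy sketch. It is not the competition's data, model, or metric: the one-parameter linear model, the sample sizes, and the helper names (`fit_slope`, `mse`, `sample`) are all assumptions chosen for illustration. It only shows that a good model fitted on clean data is not automatically the best possible scorer on a finite test set.

```python
import random

def fit_slope(points):
    """Least-squares slope for the model y = w * x (no intercept)."""
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    return sxy / sxx

def mse(points, w):
    """Mean squared error of the model y = w * x on the given points."""
    return sum((y - w * x) ** 2 for x, y in points) / len(points)

def sample(rng, n, true_w=2.0, noise=1.0):
    """Draw n toy points from y = true_w * x + Gaussian noise."""
    pts = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        pts.append((x, true_w * x + rng.gauss(0.0, noise)))
    return pts

rng = random.Random(0)
train = sample(rng, 10_000)  # stands in for the granular training data
test = sample(rng, 200)      # finite held-out set used for scoring

w_oracle = fit_slope(train)       # "Oracle": a good fit on the granular data
w_best_on_test = fit_slope(test)  # slope that minimises the error on this test set

oracle_mse = mse(test, w_oracle)
floor_mse = mse(test, w_best_on_test)

print(f"oracle test MSE:          {oracle_mse:.6f}")
print(f"lowest possible test MSE: {floor_mse:.6f}")
```

In this toy setup, any model whose slope happens to land closer to `w_best_on_test` than `w_oracle` does would score better than the "Oracle" on this particular test set. The gap between the two errors is typically small, which fits the "slightly better" above.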
Posted by: eustache @ June 16, 2021, 1:11 p.m.