Diagnostic Questions - The NeurIPS 2020 Education Challenge Forum


> DateOfBirth field in student metadata has invalid values

The values in DateOfBirth range from 9999-12-01 00:00:00.000 to 9999-12-01 00:00:00.000, so every value I see is the same date. Any thoughts on why that is?

Posted by: Raghavan @ Aug. 7, 2020, 8:16 a.m.

Yeah, that made me curse the EEDI developers who didn't implement simple date validators in their registration flow :)

This is just kids being kids, and typing whatever they want in EEDI registration :)

It took me a few hours to clean all this bad data. The snippet below might help you:

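import pandas as pd

# Parse both columns; values that cannot be parsed become NaT
df['DateAnswered'] = pd.to_datetime(df['DateAnswered'], errors='coerce')
df['DateOfBirth'] = pd.to_datetime(df['DateOfBirth'], errors='coerce')
# Age in years when the answer was given (np.timedelta64(1, 'Y') fails on recent pandas)
df['Age'] = (df['DateAnswered'] - df['DateOfBirth']).dt.days / 365.25
# Clamp implausible ages to the range [5, 50]
df['Age'] = df['Age'].clip(lower=5, upper=50)
# Fill the remaining missing ages with the mean age
df['Age'] = df['Age'].fillna(df['Age'].mean())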
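If you want to sanity-check the cleaning on a few rows first, here is a minimal sketch of the same idea wrapped in a function. The function name add_age, the toy rows, and the min_age/max_age parameters are my own assumptions for illustration, not part of the competition data or any Eedi code. One deliberate difference: birth dates that fall after the answer date are treated as missing rather than clamped to 5, since depending on the pandas version a date like 9999-12-01 may either be coerced to NaT or parse as a real (far-future) date.

```python
import pandas as pd


def add_age(df, min_age=5, max_age=50):
    """Return a copy of df with a cleaned 'Age' column (hypothetical helper)."""
    out = df.copy()
    # Values that cannot be parsed become NaT
    out['DateAnswered'] = pd.to_datetime(out['DateAnswered'], errors='coerce')
    out['DateOfBirth'] = pd.to_datetime(out['DateOfBirth'], errors='coerce')
    # Age in years at the time of the answer
    age = (out['DateAnswered'] - out['DateOfBirth']).dt.days / 365.25
    # Assumption: a birth date after the answer date is junk, so mark it missing
    age = age.where(age >= 0)
    # Clamp implausible ages, then fill missing ones with the mean
    age = age.clip(lower=min_age, upper=max_age)
    out['Age'] = age.fillna(age.mean())
    return out


# Toy rows (made up): a plausible date, the 9999 value, a very old date, a toddler
toy = pd.DataFrame({
    'DateAnswered': ['2020-01-15 10:00:00.000'] * 4,
    'DateOfBirth': [
        '2008-01-15 00:00:00.000',
        '9999-12-01 00:00:00.000',
        '1900-01-01 00:00:00.000',
        '2019-01-15 00:00:00.000',
    ],
})
result = add_age(toy)
print(result[['DateOfBirth', 'Age']])
```

In this toy frame the 9999 row ends up with the mean of the other three cleaned ages, the 1900 row is capped at 50, and the 2019 row is raised to 5.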

Cheers!

Posted by: carlossouza @ Aug. 8, 2020, 3:07 p.m.