The values in DateOfBirth are in the range 9999-12-01 00:00:00.000 - 9999-12-01 00:00:00.000. Any thoughts on why it is this way?
Posted by: Raghavan @ Aug. 7, 2020, 8:16 a.m.
Yeah, that made me curse the EEDI developers who didn't implement simple date validators in their registration flow :)
This is just kids being kids, typing whatever they want into the EEDI registration form :)
It took me a few hours to clean up all this bad data. The snippet below might help you:
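import pandas as pd
# Unparseable or out-of-bounds dates become NaT instead of raising
df['DateAnswered'] = pd.to_datetime(df['DateAnswered'], errors='coerce')
df['DateOfBirth'] = pd.to_datetime(df['DateOfBirth'], errors='coerce')
# Age in years, using 365.25 days per year
df['Age'] = (df['DateAnswered'] - df['DateOfBirth']).dt.days / 365.25
# Clamp implausible ages to the range [5, 50]
df['Age'] = df['Age'].clip(lower=5, upper=50)
# Fill missing ages with the mean of the remaining ages
df['Age'] = df['Age'].fillna(df['Age'].mean())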
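If you want to sanity-check the cleaning steps before running them on the real data, here is a minimal sketch on a toy frame. The column names follow the snippet above, but the four rows are made up for illustration, and the 365.25 days-per-year conversion is my own choice rather than anything the dataset prescribes.

```python
import pandas as pd

# Toy data: values are invented purely to exercise each cleaning step.
df = pd.DataFrame({
    "DateAnswered": ["2020-01-15", "2020-01-15", "2020-01-15", "2020-01-15"],
    "DateOfBirth": ["2008-01-15", "9999-12-01", "2019-06-01", "1900-01-01"],
})

# Coerce bad values to NaT rather than raising an exception.
df["DateAnswered"] = pd.to_datetime(df["DateAnswered"], errors="coerce")
df["DateOfBirth"] = pd.to_datetime(df["DateOfBirth"], errors="coerce")

# Age in years (365.25 days per year is an approximation).
df["Age"] = (df["DateAnswered"] - df["DateOfBirth"]).dt.days / 365.25

# Clamp to a plausible range, then fill whatever is still missing.
df["Age"] = df["Age"].clip(lower=5, upper=50)
df["Age"] = df["Age"].fillna(df["Age"].mean())

print(df[["DateOfBirth", "Age"]])
```

The first row comes out at 12 years, the under-5 row is raised to 5, and the 1900 row is capped at 50. The 9999-12-01 row typically ends up as NaT and gets the mean, though how it is handled may depend on your pandas version, so it is worth checking that row in the printed output.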
Cheers!