About the following sentence:
"We thus expect an absolute quality score for each sentence translation (z-standardised DA)."
Are we supposed to z-standardize the DA scores before submission, as if the predictor were a human annotator?
I built models predicting z-scores on the training set directly, which did not achieve high scores.
I then built models predicting DA scores, which achieve an MAE of ~13, while the leaderboard contains entries with an MAE of 0.5.
Are we expected to predict DA scores and then z-standardize them before submission via (x - x.mean()) / x.std()?
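To make the transformation in question concrete, here is a minimal sketch of (x - x.mean()) / x.std() in plain Python. The function name `z_standardize` and the sample predictions are hypothetical and not taken from the task data. The sketch uses the population standard deviation, which matches the default of NumPy's `std()`; pandas' `Series.std()` defaults to the sample standard deviation instead, so the two give slightly different z-scores.

```python
from statistics import fmean, pstdev


def z_standardize(scores):
    """Return (x - mean) / std for each score, using the population std."""
    mu = fmean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All predictions identical: z-scores are undefined.
        raise ValueError("cannot z-standardize constant scores")
    return [(s - mu) / sigma for s in scores]


# Hypothetical raw DA predictions on a 0-100 scale (illustration only).
predictions = [55.0, 70.0, 85.0]
z = z_standardize(predictions)
```

The result has zero mean and unit standard deviation over the submitted predictions, which is not necessarily the same as the standardization applied to the gold annotations.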
Hi Ergun,
according to your submissions, your sentence indices start at 1,
while they should start at 0. This would explain why you get those
scores.
I sent you this information by email yesterday already.
Best,
Fred
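For illustration, the index fix described above could be sketched as follows. This assumes, hypothetically, that a submission is held as a list of (sentence index, score) pairs; the actual submission file format is not shown in this thread, and the name `to_zero_based` and the sample scores are made up.

```python
def to_zero_based(rows):
    """Shift 1-based sentence indices to 0-based, leaving scores untouched."""
    indices = [idx for idx, _ in rows]
    # Refuse to shift anything that is not a clean 1..n sequence,
    # so an already correct file is not shifted to start at -1.
    if indices != list(range(1, len(rows) + 1)):
        raise ValueError("expected consecutive indices starting at 1")
    return [(idx - 1, score) for idx, score in rows]


# Hypothetical submission rows: (sentence index, predicted score).
rows = [(1, 0.12), (2, -0.40), (3, 0.85)]
fixed = to_zero_based(rows)
```

With indices off by one, each prediction is compared against the gold score of a neighbouring sentence, which is consistent with the large error reported above.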