Could you please give some insight into how the regression model is evaluated?
Do you first drop all the non-humorous texts (those whose true label is 0)?
Was the baseline trained after dropping the examples with empty targets, and then used to predict on the whole test corpus?
Thanks for the help
Hi,
Yes, the RMSE is calculated only over the tweets considered humorous; the others are discarded. Note that you still need to predict a rating for every tweet, because you do not know which ones are humorous in the test set.
The baseline was trained using only the ratings for the humorous tweets in the training corpus.
Regards,
Luis
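
For illustration, the evaluation described above could be sketched roughly as follows. This is only a minimal sketch, not the official scoring script: the function name `rmse_on_humorous`, the use of `None` to mark a non-humorous tweet with no gold rating, and the sample values are all assumptions made for the example.

```python
import math


def rmse_on_humorous(gold_ratings, predicted_ratings):
    """RMSE over the tweets that have a gold rating.

    Assumption: a gold rating of None marks a non-humorous tweet.
    Predictions for those tweets are ignored by the metric.
    """
    # Keep only (gold, predicted) pairs where a gold rating exists.
    pairs = [
        (gold, pred)
        for gold, pred in zip(gold_ratings, predicted_ratings)
        if gold is not None
    ]
    if not pairs:
        raise ValueError("no humorous tweets to evaluate on")
    squared_errors = [(gold - pred) ** 2 for gold, pred in pairs]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical data: four test tweets, two of them humorous.
gold = [3.0, None, 1.0, None]
# A rating is predicted for every tweet, humorous or not.
pred = [2.0, 0.5, 2.0, 4.0]

score = rmse_on_humorous(gold, pred)
print(score)
```

The same filter, applied to the training corpus, would leave only the rated humorous tweets for fitting the regressor, which matches how the baseline is described above.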