In the public_train.csv file, the sample with id=82 looks wrongly labelled: it has text in the num_comment_post column and a relatively high value in the num_share_post column.
The same problem also appears at id=432 in the warm-up train set and at id=5835 in the public test set.
Hi,
Thank you for the feedback.
There are minor column-shifting errors in some data points. The long sequence of digits should be in the "timestamp" column, and the strings in the "likes" and "comments" columns are parts of the post content.
Therefore, you can (1) combine the parts of the post content, (2) move the timestamp to its correct column, and (3) treat the values in the "likes", "comments" and "share" columns as missing data.
We apologize for the inconvenience. Thank you for pointing out the error.
Regards,
ReINTEL Team
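
For anyone who wants to apply the three suggested steps programmatically, here is a minimal sketch in plain Python. It is not an official fix from the organizers. The column names num_comment_post and num_share_post come from the report above; post_message, timestamp_post and num_like_post are assumed names, and the function name, the min_ts_digits threshold, and the sample row are hypothetical, so adjust them to the actual files.

```python
# Count columns: the last two names are from the thread, the first is an assumption.
COUNT_COLS = ["num_like_post", "num_comment_post", "num_share_post"]


def is_number(value):
    """Return True if value can be parsed as a number."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def repair_shifted_row(row, text_col="post_message", ts_col="timestamp_post",
                       min_ts_digits=9):
    """Repair one row (a dict) whose post text spilled into later columns.

    text_col, ts_col and min_ts_digits are assumptions, not documented values.
    """
    shifted_cols = [ts_col] + COUNT_COLS
    values = [row.get(col) for col in shifted_cols]

    # Non-numeric strings in these columns are treated as leaked post text.
    fragments = [str(v) for v in values
                 if v not in (None, "") and not is_number(v)]
    if not fragments:
        return dict(row)  # row looks fine, return an unchanged copy

    fixed = dict(row)

    # (1) Combine the parts of the post content.
    head = str(row.get(text_col) or "")
    fixed[text_col] = " ".join([head] + fragments).strip()

    # (2) Move the long digit sequence to the timestamp column.
    candidates = [str(v) for v in values
                  if is_number(v) and len(str(v).split(".")[0]) >= min_ts_digits]
    fixed[ts_col] = candidates[0] if candidates else None

    # (3) Treat likes, comments and shares as missing data.
    for col in COUNT_COLS:
        fixed[col] = None

    return fixed


# Made-up row for illustration only, not the real content of any sample.
example = {
    "id": 0,
    "post_message": "first part",
    "timestamp_post": "second part",
    "num_like_post": "third",
    "num_comment_post": "fourth part",
    "num_share_post": "1585945439",
}
repaired = repair_shifted_row(example)
```

Rows without any text in the timestamp or count columns are returned unchanged, so the function can be mapped over a whole file, for example over the rows yielded by csv.DictReader.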