> Error while loading train data

data=pd.read_csv('/kaggle/working/train.csv')
ParserError: Error tokenizing data. C error: EOF inside string starting at row 46363

The error seems to go away when using the following workaround:
https://github.com/pandas-dev/pandas/issues/22140

However, I would be grateful if you could suggest a cleaner alternative.
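A minimal sketch of one common alternative. It assumes the cause is an unbalanced double quote: the C parser reads until end-of-file looking for the closing quote, and the row number in the message shows where that unclosed string starts. The sample data is hypothetical, and this is not necessarily the workaround described in the linked issue. Passing quoting=csv.QUOTE_NONE makes pandas treat quote characters as ordinary text.

```python
import csv
import io

import pandas as pd

# Hypothetical sample: row 1 opens a double quote that is never closed.
raw = 'id,text\n1,"unterminated\n2,fine\n'

# The default settings fail because the quoted field runs to end-of-file.
try:
    pd.read_csv(io.StringIO(raw))
    error_message = None
except pd.errors.ParserError as exc:
    error_message = str(exc)
print(error_message)

# Treat '"' as a literal character so a stray quote cannot swallow the rest of the file.
df = pd.read_csv(io.StringIO(raw), quoting=csv.QUOTE_NONE)
print(df)
```

The trade-off: with QUOTE_NONE, any field that legitimately relies on quotes to contain the delimiter or a newline will be split incorrectly. For the Kaggle file, it may be worth checking the rows around the reported row first.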

Posted by: vasudev13 @ Sept. 6, 2020, 9:41 p.m.

In most cases, the problem is caused by one of the following:

the delimiters in your data, or
the parser being confused by the headers/columns of the file.

To solve ParserError: Error tokenizing data (pandas.errors.ParserError in current pandas; older versions raised pandas.parser.CParserError), try specifying the sep and/or header arguments when calling read_csv.

pandas.read_csv(fileName, sep='your_delimiter', header=None)
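A small sketch of the call above on hypothetical semicolon-separated data with no header row. It compares the default settings with explicit sep and header arguments. The column names passed via names are illustrative only.

```python
import io

import pandas as pd

# Hypothetical sample: semicolon-separated values, no header row.
raw = "1;alice;90\n2;bob;85\n"

# Default sep=",": each whole line becomes one field, and the first
# data row is consumed as the header.
wrong = pd.read_csv(io.StringIO(raw))
print(wrong.shape)  # (1, 1)

# Explicit delimiter, and no header row in the file: columns are numbered 0..n-1.
df = pd.read_csv(io.StringIO(raw), sep=";", header=None)
print(df.shape)  # (2, 3)

# Optionally supply column names yourself (illustrative names).
named = pd.read_csv(io.StringIO(raw), sep=";", header=None, names=["id", "name", "score"])
print(named.columns.tolist())  # ['id', 'name', 'score']
```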

The Error tokenizing data message can also appear when a separator (for example, a comma ',') is used as the delimiter and a row contains more separators than expected, that is, more fields in the offending row than are defined in the header. In that case, either remove the additional field or, if the extra separator is there by mistake, remove the separator. The better solution is to inspect the offending file and fix it manually, so that you don't need to skip the erroneous lines.
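To investigate the file before fixing it, one option is to count the fields in each record with Python's csv module and list the records that disagree with the header. This is a minimal sketch: find_bad_rows is a hypothetical helper and the sample data is made up. If fixing the file is not practical, pandas 1.3+ can skip such rows with on_bad_lines="skip". Older versions used error_bad_lines=False, which was later removed.

```python
import csv
import io

import pandas as pd

# Hypothetical sample: the last row has one separator too many.
raw = "id,name,score\n1,alice,90\n2,bob,85\n3,carol,70,extra\n"


def find_bad_rows(text, sep=","):
    """Hypothetical helper: return (record_number, field_count) for every
    record whose field count differs from the header's.

    Records are numbered from 1 with the header counted as record 1; this
    equals the line number unless quoted fields span several lines."""
    reader = csv.reader(io.StringIO(text), delimiter=sep)
    header = next(reader)
    bad = []
    for record_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            bad.append((record_no, len(row)))
    return bad


bad_rows = find_bad_rows(raw)
print(bad_rows)  # [(4, 4)]

# Fallback when manual fixing is not an option: drop the malformed rows.
df = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")
print(df.shape)  # (2, 3)
```

Skipping rows silently loses data, so printing the result of find_bad_rows first shows exactly what would be discarded.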

http://net-informations.com/ds/err/token.htm

Posted by: eltonjorn @ July 12, 2022, 4:54 a.m.