Here is a snippet of rows around the offending line. The offending row should be data row 2, 3, or 4, depending on how the pandas code counts lines (with or without the header, 0- or 1-based offset).
When I read this snippet with the same pandas code, there is no error, and the resulting DataFrame is as expected, with 4 rows and 464 columns.
When reading the full biosample TSV table with pandas like this:
import pandas as pd
df_biosample = pd.read_csv("harmonized-table.tsv", sep="\t")
the following error is raised deep into the file:
ParserError: Error tokenizing data. C error: Expected 464 fields in line 4929258, saw 542
I've checked the offending line and its neighbors with awk, counting tabs, and they all have 463 tabs (hence 464 fields). I also looked through the fields in those lines and didn't see any odd characters, just the usual strings, |-separated identifiers, and dates.
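One possible explanation, not confirmed in this thread: awk counts tabs per physical line, but the pandas C parser treats `"` as a quote character by default. If a field *starts* with a `"` that is not closed on the same line, the parser keeps reading across the newline until the next `"`, so several physical lines become one record with more fields than the header, even though every physical line has the expected number of tabs. The line number in the error message may then not match the physical line that awk inspects. The sketch below reproduces this on a tiny invented 3-column table (the data is made up for illustration) and shows that `quoting=csv.QUOTE_NONE` makes `"` an ordinary character.

```python
import csv
import io

import pandas as pd

# Invented 3-column TSV: the third field of the third physical line starts
# with a '"' that is only closed on the next physical line.
SAMPLE = 'a\tb\tc\n0\t0\t0\n1\t2\t"x\ny"\t3\t4\n'

# Every physical line has exactly 2 tabs, as an awk-style check would report.
tab_counts = [line.count("\t") for line in SAMPLE.splitlines()]
print(tab_counts)

# Default quoting: the quoted field spans the newline, so the last two
# physical lines merge into one 5-field record and the C parser raises.
try:
    pd.read_csv(io.StringIO(SAMPLE), sep="\t")
    default_raised = False
except pd.errors.ParserError as err:
    default_raised = True
    print(err)

# With quoting disabled, '"' is kept as a literal character in the field.
df = pd.read_csv(io.StringIO(SAMPLE), sep="\t", quoting=csv.QUOTE_NONE)
print(df.shape)  # (3, 3)
```

If the real file behaves this way, `quoting=csv.QUOTE_NONE` is only appropriate when no field legitimately relies on quoting (for example, to embed tabs or newlines).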
FWIW, the same error occurs with skiprows=2, which is a common suggestion for dealing with problematic headers (that shouldn't be the issue here anyway).
It's a bit of a puzzle.