How to read xlsx files using fread from pydatatable? #3259
Replies: 4 comments
-
Yes, for the moment datatable requires There is a PR to address the issue you mention; however, merging it could significantly drop performance when reading
-
Oops, I am very sorry for stalling the PR. I tried some other things at the Python level, but I could not get better than a ~2x decrease in performance. Using the raw data, as is done for xlrd, is not possible with openpyxl without an equivalent overhead, at least in my attempts. I agree with the previous comments that a C++ implementation may be more aligned with datatable's goals, since the performance decrease seems inevitable with openpyxl. Feel free to close the PR (or build on top of it).
-
OK, thanks for the information.
-
https://github.com/dimastbk/python-calamine offers fast reading of Excel files. We could use it as a backend instead of openpyxl.
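In case it helps anyone who needs this today, here is a rough sketch of reading a sheet with python-calamine and handing the result to datatable from user code. It is not how datatable does or would do it internally. The helper names `rows_to_columns` and `read_xlsx_with_calamine` are made up for illustration, and I am assuming the first row is a header and that each column holds values of one type (mixed-type columns may need extra handling before `dt.Frame` accepts them).

```python
from typing import Any, Dict, List, Optional


def rows_to_columns(rows: List[List[Any]]) -> Dict[str, List[Any]]:
    """Turn row-major sheet data (first row = header) into a dict of columns.

    Hypothetical helper, not part of datatable or python-calamine.
    Duplicate header names are not handled here.
    """
    if not rows:
        return {}
    header, *body = rows
    # Give empty header cells a positional name such as "C0", "C1", ...
    names = [str(h) if h not in ("", None) else f"C{i}" for i, h in enumerate(header)]
    width = len(names)
    columns: Dict[str, List[Any]] = {name: [] for name in names}
    for row in body:
        # Pad short rows so every column ends up with the same length.
        padded = list(row) + [None] * (width - len(row))
        for name, value in zip(names, padded):
            columns[name].append(value)
    return columns


def read_xlsx_with_calamine(path: str, sheet: Optional[str] = None):
    """Read one sheet via python-calamine and wrap it in a datatable Frame."""
    # Imported lazily so rows_to_columns works without these packages installed.
    from python_calamine import CalamineWorkbook
    import datatable as dt

    workbook = CalamineWorkbook.from_path(path)
    name = sheet or workbook.sheet_names[0]
    rows = workbook.get_sheet_by_name(name).to_python()  # list of row lists
    return dt.Frame(rows_to_columns(rows))
```

Usage would be something like `frame = read_xlsx_with_calamine("data.xlsx")`, where `data.xlsx` is a placeholder path. I have not benchmarked this against the openpyxl PR.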
-
We have a requirement to work with XLSX files using pydatatable. I posted a question on SO about how to read XLSX files with fread, and a community member replied that xlrd has to be downgraded for pydatatable to load an XLSX file. That's fine, I can do that. But how is pydatatable going to work with XLSX files in the future, given that the latest versions of xlrd no longer support XLSX?
https://stackoverflow.com/questions/71823357/how-to-read-xlsx-files-using-fread-from-pydatatable
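For anyone landing here with the same problem, a small sketch for checking whether the installed xlrd is one of the releases that still reads XLSX. xlrd dropped XLSX support in 2.0, so the check below simply looks at the major version. The function names are mine, and the `pip install "xlrd<2"` hint is my reading of the "downgrade" advice rather than something datatable documents.

```python
from importlib import metadata
from typing import Optional


def xlrd_can_read_xlsx(version: Optional[str]) -> bool:
    """Return True for xlrd 1.x; xlrd 2.0 and later only read .xls files."""
    if version is None:
        return False
    major = int(version.split(".")[0])
    return major < 2


def installed_xlrd_version() -> Optional[str]:
    """Return the installed xlrd version string, or None if xlrd is missing."""
    try:
        return metadata.version("xlrd")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_xlrd_version()
    if xlrd_can_read_xlsx(version):
        print(f"xlrd {version} can read .xlsx files")
    else:
        # Assumed workaround: pin xlrd below 2.0 in the environment.
        print(f"xlrd {version} cannot read .xlsx files; try: pip install 'xlrd<2'")
```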