np.fromfile() is very slow #58
Comments
I am on vacation so don't have a lot of time to work on this. I recommend that you use the ToSerializable/FromSerializable methods to save/restore an ndarray. Then you can use .NET standard XML/JSON serialization operations to save/restore. If you really need to use fromfile for some reason and the performance does not meet your needs, I suggest trying to write your own code to open a file and parse/save it.
Use either ndarray.ToSerializable() or np.ToSerializable(ndarray a).
Please look at issue 48 in this repository for example code.
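A minimal sketch of the save side of that round trip, assuming np.array accepts a plain .NET array and that the object returned by ToSerializable() serializes cleanly with System.Text.Json (these names and overloads are assumptions; issue 48 has the authoritative example):

using System.IO;
using System.Text.Json;
using NumpyDotNet;

// Build a small array (np.array over a plain .NET array is assumed here).
var a = np.array(new short[] { 1, 2, 3, 4 });

// ToSerializable() is the call named above; persist whatever plain object it
// returns using standard .NET JSON serialization.
object serializable = a.ToSerializable();
File.WriteAllText("array.json", JsonSerializer.Serialize(serializable, serializable.GetType()));

// Restoring is the reverse: deserialize to the same concrete type and pass the
// result to the matching FromSerializable call. The type name and overload are
// version-specific, so see issue 48 in this repository for the complete example.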
I'm sorry for not mentioning that I need to read/write binary files, i.e. not XML/JSON. To be precise: I need to read a file, then create an ndarray from it using some small offset from the start to the end of the file, do calculations, and write the new data back to the same file at the same offset. Typical file size is 100 MB, offset 100 KB. So it seems to me that To/FromSerializable is not the right choice.
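A minimal sketch of that read-modify-write cycle, assuming the frombuffer argument pattern used in the benchmark below; the file name, offset value, and the processedBytes placeholder are illustrative only:

using System.IO;
using NumpyDotNet;

const int offset = 100 * 1024;          // illustrative ~100 KB offset
string path = "data.bin";               // illustrative file name

// Read the whole file once and map the payload region onto an ndarray.
byte[] bytes = File.ReadAllBytes(path);
int payloadBytes = bytes.Length - offset;
// Argument order mirrors the frombuffer call in the benchmark below; check
// whether the count argument is elements or bytes in your NumpyDotNet version.
var a = np.frombuffer(bytes, np.UInt16, payloadBytes / sizeof(ushort), offset);

// ... calculations on 'a' go here ...

// Write the processed values back over the same region of the file.
// 'processedBytes' stands in for the byte[] built from the result array
// (see the Buffer.BlockCopy sketch further down for one way to build it).
byte[] processedBytes = new byte[payloadBytes];
using (var fs = new FileStream(path, FileMode.Open, FileAccess.Write))
{
    fs.Seek(offset, SeekOrigin.Begin);
    fs.Write(processedBytes, 0, processedBytes.Length);
}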
Below is basically what I am doing internally. I don't have time to write the tofile completely today.

One big difference is that python/C code can very quickly cast an array of Int16 values to a byte array and do a very fast write of the data. .NET does not like it if you try to cast Int16 to byte, so you have to write each value in a loop. That will be slower.
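A plain .NET sketch of that per-value loop (not the library's internal code), just to show the shape of the slow path:

using System.IO;

short[] values = { 1, 2, 3, 4, 5 };    // sample data

// Each Int16 goes through its own BinaryWriter call -- many small writes,
// which is much slower than dumping one contiguous byte[] to disk in one call.
using (var fs = new FileStream("slow.bin", FileMode.Create, FileAccess.Write))
using (var bw = new BinaryWriter(fs))
{
    foreach (short v in values)
    {
        bw.Write(v);          // one 2-byte write per element
    }
}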
Got it, thank you. Thinking now if I can get enough speed with .NET at all :(
Here is another idea. Convert the array to bytes first and then write it to disk. See the example below.
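A minimal sketch of that idea, assuming the result values are already available as a plain Int16[] (getting the raw values out of the ndarray is a library-specific step not shown here):

using System;
using System.IO;

short[] values = { 1, 2, 3, 4, 5 };    // sample data

// Bulk-copy the Int16 values into a raw byte[] once, then write the whole
// buffer to disk in a single call -- no per-element loop.
byte[] buffer = new byte[values.Length * sizeof(short)];
Buffer.BlockCopy(values, 0, buffer, 0, buffer.Length);
File.WriteAllBytes("fast.bin", buffer);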
This is the fastest method of all within the scope of NumpyDotNet; it achieved 810 MB/s. Thank you!
I'm porting an app from Python to C# and now I'm trying to choose a .NET numpy equivalent. The options are NumpyDotNet and NumSharp.
NumpyDotNet is the obvious winner because NumSharp has "not implemented" here and there and lacks documentation and samples. But my app needs to read and write lots of data as fast as possible, and here is the problem: NumpyDotNet's np.fromfile() is very slow compared to NumSharp's. Here is a benchmark, in MB/s:
NumpyDotNet:

np.fromfile(fullFilePath, np.UInt8);   ~150 MB/s
np.fromfile(fullFilePath, np.UInt16);  ~140 MB/s
np.fromfile(fullFilePath, np.UInt32);  ~260 MB/s

byte[] bytes = File.ReadAllBytes(fullFilePath);
return np.frombuffer(bytes, np.UInt8, metadata.dataSize, metadata.dataOffset);   ~580 MB/s

byte[] bytes = File.ReadAllBytes(fullFilePath);
return np.frombuffer(bytes, np.UInt16, metadata.dataSize, metadata.dataOffset);  ~690 MB/s

byte[] bytes = File.ReadAllBytes(fullFilePath);
return np.frombuffer(bytes, np.UInt32, metadata.dataSize, metadata.dataOffset);  ~690 MB/s

(don't mind the offset and size, almost the whole array is read)

NumSharp:

np.fromfile(fullFilePath, NPTypeCode.Int32);  ~2700 MB/s

Reading as NPTypeCode.Int16 is not implemented in NumSharp, so I'm unable to measure it.

Python numpy:

np.fromfile(file, np.uint16)  ~2700 MB/s

My app mostly works with UInt16 with some Float32 in the middle of the calculation chain, so I need efficient reading/writing of UInt16.