Dumping arrays into Base64 encoding #9
Writing:

import base64
import json

import numpy as np


def write_array_to_json(array: np.ndarray, file_path: str):
    # Record Fortran order only when the array is F- but not C-contiguous
    # (1-D arrays set both flags, and C is the natural default for them)
    is_fortran = array.flags['F_CONTIGUOUS'] and not array.flags['C_CONTIGUOUS']
    ordering = 'F' if is_fortran else 'C'
    # tobytes() emits C order by default, so pass the recorded order explicitly;
    # otherwise an F-ordered array would be scrambled when it is read back
    binary_data = array.tobytes(order=ordering)
    # Encode the array data as Base64
    encoded_data = base64.b64encode(binary_data).decode('utf-8')
    # Construct the JSON object (the shape tuple is serialized as a JSON list)
    array_metadata = {
        "binaryData": encoded_data,
        "shape": array.shape,
        "dtype": str(array.dtype),
        "ordering": ordering,
        "encoding": "Base64"
    }
    # Write the JSON object to a file
    with open(file_path, 'w') as json_file:
        json.dump(array_metadata, json_file)


# Example usage
array = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float64)
write_array_to_json(array, 'array_data.json')

Reading:

def read_array_from_json(file_path: str) -> np.ndarray:
    # Read the JSON object from the file
    with open(file_path, 'r') as json_file:
        array_metadata = json.load(json_file)
    # Decode the Base64-encoded data
    binary_data = base64.b64decode(array_metadata["binaryData"])
    # Reconstruct a flat array from the bytes (np.frombuffer returns a read-only view)
    array = np.frombuffer(binary_data, dtype=array_metadata["dtype"])
    # Restore the original shape using the recorded memory order
    array = array.reshape(array_metadata["shape"], order=array_metadata["ordering"])
    return array


# Example usage
reconstructed_array = read_array_from_json('array_data.json')
print(reconstructed_array)
Got this built into the model_dump_json method here: I figure if we're b64 encoding we might as well compress :)

This is what I wanna do today anyway: we'll want to make a meta-schema that lets us express multiple encodings for a given array schema. That'll include a reference to what format the array is in and, for a given format like numpy, what information is needed to pack and unpack it.

I like what ya got there as a start for the numpy schema. You got a linkml version? I think we'll also want to store the numpy array format version, dtype, and order in a way that matches their format spec: https://numpy.org/doc/stable/reference/generated/numpy.lib.format.html#module-numpy.lib.format
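For reference, one way to get the version, dtype, and order fields in the same form the npy format uses is to let NumPy write the header and then read it back. This is only a rough sketch: `npy_header_fields` is a hypothetical helper name, the dictionary keys are assumptions that mirror the npy header keys, and it pins format version 1.0 for simplicity.

```python
import io

import numpy as np
from numpy.lib import format as npy_format


def npy_header_fields(array: np.ndarray) -> dict:
    """Hypothetical helper: collect the metadata NumPy stores in an .npy header."""
    buffer = io.BytesIO()
    # Serialize the array in npy format 1.0 into memory (no pickling of object arrays)
    npy_format.write_array(buffer, array, version=(1, 0), allow_pickle=False)
    buffer.seek(0)
    # The magic string carries the (major, minor) format version
    version = npy_format.read_magic(buffer)
    # The 1.0 header holds the shape, the Fortran-order flag, and the dtype
    shape, fortran_order, dtype = npy_format.read_array_header_1_0(buffer)
    return {
        "version": list(version),
        "descr": npy_format.dtype_to_descr(dtype),  # e.g. '<f8' for little-endian float64
        "fortran_order": fortran_order,
        "shape": list(shape),
    }


# Example usage with an F-ordered array
fields = npy_header_fields(np.asfortranarray(np.zeros((2, 3))))
print(fields)
```

Using the `descr` string rather than `str(array.dtype)` keeps byte order explicit, which matters if the JSON is read on a machine with a different endianness.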
This is great, @sneakers-the-rat! I'd probably prefer gzip, as it ships with Python and has no external dependencies. Using a bool for F vs. C sounds fine.
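A minimal sketch of what the gzip variant with a boolean order flag could look like, using only the standard library plus NumPy. The function names `encode_array` and `decode_array`, the `fortran_order` key, and the `"gzip+Base64"` encoding label are assumptions for illustration, not the project's actual schema.

```python
import base64
import gzip

import numpy as np


def encode_array(array: np.ndarray) -> dict:
    """Hypothetical helper: pack an array as gzip-compressed, Base64-encoded bytes."""
    # True only for arrays that are F- but not C-contiguous
    fortran_order = bool(array.flags["F_CONTIGUOUS"] and not array.flags["C_CONTIGUOUS"])
    # Emit the bytes in the same order that the flag records
    raw = array.tobytes(order="F" if fortran_order else "C")
    return {
        "binaryData": base64.b64encode(gzip.compress(raw)).decode("ascii"),
        "shape": list(array.shape),
        "dtype": str(array.dtype),
        "fortran_order": fortran_order,
        "encoding": "gzip+Base64",  # assumed label
    }


def decode_array(payload: dict) -> np.ndarray:
    """Hypothetical helper: inverse of encode_array."""
    raw = gzip.decompress(base64.b64decode(payload["binaryData"]))
    flat = np.frombuffer(raw, dtype=payload["dtype"])
    order = "F" if payload["fortran_order"] else "C"
    return flat.reshape(payload["shape"], order=order)


# Example usage: round-trip an F-ordered array
original = np.asfortranarray(np.arange(6, dtype=np.float64).reshape(2, 3))
restored = decode_array(encode_array(original))
print(np.array_equal(original, restored))
```

The returned dictionary contains only strings, lists, ints, and a bool, so it can be passed straight to `json.dump`.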
Prototype for expressing an array in Base64.