-
Notifications
You must be signed in to change notification settings - Fork 359
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Encode LogMsg
using protobuf
#8347
base: main
Are you sure you want to change the base?
Conversation
Latest documentation preview deployed successfully.
Note: This comment is updated whenever you push a commit. |
Web viewer built successfully. If applicable, you should also test it:
Note: This comment is updated whenever you push a commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for untangling these dependency issues!
generally looks ok to me, few comments and a question about more clear re-use of common arrow seriailzation logic to make re_log_encoding
crate clearer.
|
||
/// Helper function that deserializes raw bytes into arrow schema and record batch | ||
/// using Arrow IPC format. | ||
fn read_arrow_from_bytes<R: std::io::Read>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we have this duplication?
could common arrow serialialization logic be moved somewhere and used between "wire" and "file" protocol?
we also now have codec
, decode
, encoder
and then in protobuf
we also have encoder.rs
and decoder.rs
, not sure if this is temporary, but I wonder if it could be more obvious what is common logic and then what is specific to our file stream and our grpc stream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: decoder
encoder
being in two places, that's because we still have to continue supporting msgpack for the time being. So all the new protobuf encoding stuff is under protobuf
. But fundamentally we have 3 layers, our custom stream-level protocol (crate::decoder
, crate::encoder
), the message encoding (crate::protobuf
, with some leftovers in crate::decoder
and crate::encoder
due to msgpack), and the arrow serialization (should be in crate::arrow
).
re: codec
, it's there separately because that's the format used by gRPC, with its own MessageHeader
. the existing encoder
and decoder
will eventually only be used for files (either local or served over HTTP), so I thought it'd be better to keep them separate.
The naming could be clearer, I will reshuffle the modules a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I think just some naming reshuffling might help to make things more obvious. Perhaps even worth splitting grpc codec
into encoder and decoder just for consistency...
UnsupportedEncoding, | ||
|
||
#[error("Invalid file header")] | ||
InvalidFileHeader, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this unused? ditto for few below
Related
enum LogMsg
with protobuf #7988re_protos
dependencies #8381What
This PR introduces
Serializer::Protobuf
tore_log_encoding
, and inverts the dependency graph ofre_protos
, which no longer depends on otherre_*
crates. This meant the conversion impls from protobuf types to rerun types and back had to be moved into their respective crates. For example,From<StoreId> for
re_protos::common::v0::StoreIdis now in
re_log_types`.When encoding a file using this serializer, the data is encoded using a combination of:
The stream-level protocol has changed only a bit, because compression is no longer done for all messages in the stream, and only the contents of
ArrowMsg
are ever compressed at all. This means theuncompressed_len
andcompressed_len
could be unified to justlen
.The actual layout of the messages has not changed,
LogMsg
is preserved and so are its semantics.The stream of data stored in an example RRD file using this new encoding looks like:
Note that this stream-level protocol is only used for
.rrd
files. On the wire, we will use gRPC, which has its own protocol.In the case of
ArrowMsg
, the schema+chunk is encoded using Arrow IPC into a byte payload, which may additionally be compressed. The compression setting is stored separately for everyArrowMsg
, but per-message compression functionality is not yet exposed throughre_log_encoding
.