Encode `LogMsg` using protobuf #8347

jprochazk · 2024-12-06T17:44:47Z

What

This PR introduces `Serializer::Protobuf` to `re_log_encoding` and inverts the dependency graph of `re_protos`, which no longer depends on other `re_*` crates. As a result, the conversion impls from protobuf types to Rerun types and back had to be moved into their respective crates. For example, `From<StoreId> for re_protos::common::v0::StoreId` is now in `re_log_types`.
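As a minimal sketch of where such a conversion now lives, assuming simplified shapes for both types: once `re_protos` no longer depends on `re_log_types`, only `re_log_types` can name both types, so the impls go there. The two crates are modeled as modules, and the field names, the `StoreKind` variants, and the integer encoding of the kind are hypothetical, not the actual generated code.

```rust
// Hypothetical stand-ins for the two crates. In the real workspace these are
// separate crates, and only re_log_types depends on re_protos.
mod re_protos {
    pub mod common {
        pub mod v0 {
            /// Simplified protobuf-generated message (assumed shape).
            #[derive(Debug, Clone, PartialEq)]
            pub struct StoreId {
                pub kind: i32,
                pub id: String,
            }
        }
    }
}

mod re_log_types {
    use crate::re_protos::common::v0 as proto;

    /// Simplified Rerun-side types (assumed shapes).
    #[derive(Debug, Clone, PartialEq)]
    pub enum StoreKind {
        Recording,
        Blueprint,
    }

    #[derive(Debug, Clone, PartialEq)]
    pub struct StoreId {
        pub kind: StoreKind,
        pub id: String,
    }

    // Rerun type -> protobuf type: the impl sits next to the Rerun type.
    impl From<StoreId> for proto::StoreId {
        fn from(value: StoreId) -> Self {
            let kind = match value.kind {
                StoreKind::Recording => 1, // hypothetical discriminant
                StoreKind::Blueprint => 2, // hypothetical discriminant
            };
            proto::StoreId { kind, id: value.id }
        }
    }

    // Protobuf type -> Rerun type: fallible, since the wire value may be unknown.
    impl TryFrom<proto::StoreId> for StoreId {
        type Error = String;

        fn try_from(value: proto::StoreId) -> Result<Self, Self::Error> {
            let kind = match value.kind {
                1 => StoreKind::Recording,
                2 => StoreKind::Blueprint,
                other => return Err(format!("unknown store kind {other}")),
            };
            Ok(StoreId { kind, id: value.id })
        }
    }
}
```

Decoding uses `TryFrom` rather than `From` in this sketch because an incoming protobuf value may carry a kind the reader does not recognize.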

When encoding a file using this serializer, the data is encoded using a combination of:

- A custom stream-level protocol
- Protocol buffers
- Arrow IPC

The stream-level protocol has changed only slightly: compression is no longer applied to every message in the stream, and only the contents of `ArrowMsg` are ever compressed. This means `uncompressed_len` and `compressed_len` could be unified into a single `len`.

The actual layout of the messages has not changed: `LogMsg` is preserved, and so are its semantics.

The stream of data stored in an example RRD file using this new encoding looks like:

FileHeader { b"RRIO", version, compression, serializer }     ;; 10 bytes
MessageHeader { kind, len }                                  ;; 8 bytes
SetStoreInfo { application_id, store_id, store_source, ... } ;; len bytes
MessageHeader { kind, len }                                  ;; 8 bytes
ArrowMsg { store_id, arrow_msg }                             ;; len bytes
MessageHeader { kind: End, len: 0 }                          ;; 8 bytes
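To make the framing concrete, here is a minimal sketch of writing and reading the `MessageHeader`-prefixed stream. It assumes the 8-byte header is a little-endian `u32` kind followed by a little-endian `u32` length, and it uses hypothetical discriminant values for `MessageKind`; the exact field widths, byte order, and discriminants are not specified above, and the 10-byte `FileHeader` is omitted for brevity.

```rust
use std::io::{self, Read, Write};

/// Hypothetical message kinds; the real discriminant values may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MessageKind {
    End = 0,
    SetStoreInfo = 1,
    ArrowMsg = 2,
}

impl MessageKind {
    fn from_u32(v: u32) -> io::Result<Self> {
        match v {
            0 => Ok(Self::End),
            1 => Ok(Self::SetStoreInfo),
            2 => Ok(Self::ArrowMsg),
            _ => Err(io::Error::new(io::ErrorKind::InvalidData, "unknown message kind")),
        }
    }
}

/// 8-byte header: assumed to be a little-endian u32 kind, then a u32 len.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MessageHeader {
    pub kind: MessageKind,
    pub len: u32,
}

impl MessageHeader {
    pub const SIZE: usize = 8;

    pub fn write_to(&self, w: &mut impl Write) -> io::Result<()> {
        w.write_all(&(self.kind as u32).to_le_bytes())?;
        w.write_all(&self.len.to_le_bytes())
    }

    pub fn read_from(r: &mut impl Read) -> io::Result<Self> {
        let mut buf = [0u8; Self::SIZE];
        r.read_exact(&mut buf)?;
        let kind = MessageKind::from_u32(u32::from_le_bytes(buf[0..4].try_into().unwrap()))?;
        let len = u32::from_le_bytes(buf[4..8].try_into().unwrap());
        Ok(Self { kind, len })
    }
}

/// Writes one framed message: the header, then exactly `len` payload bytes.
pub fn write_message(w: &mut impl Write, kind: MessageKind, payload: &[u8]) -> io::Result<()> {
    let len = u32::try_from(payload.len())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "payload too large"))?;
    MessageHeader { kind, len }.write_to(w)?;
    w.write_all(payload)
}

/// Reads framed messages until the `End` marker and returns (kind, payload) pairs.
pub fn read_messages(r: &mut impl Read) -> io::Result<Vec<(MessageKind, Vec<u8>)>> {
    let mut out = Vec::new();
    loop {
        let header = MessageHeader::read_from(r)?;
        if header.kind == MessageKind::End {
            return Ok(out);
        }
        let mut payload = vec![0u8; header.len as usize];
        r.read_exact(&mut payload)?;
        out.push((header.kind, payload));
    }
}
```

Because `len` is always the size of the bytes that follow the header, the stream layer only needs `len` to advance to the next header, whether or not the payload inside is compressed.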

Note that this stream-level protocol is only used for .rrd files. On the wire, we will use gRPC, which has its own protocol.

In the case of `ArrowMsg`, the schema and chunk are encoded using Arrow IPC into a byte payload, which may additionally be compressed. The compression setting is stored separately for every `ArrowMsg`, but per-message compression is not yet exposed through `re_log_encoding`.
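A minimal sketch of what per-message compression means for an `ArrowMsg`, assuming a simplified struct and a two-variant `Compression` enum (the variant names `Off`/`Lz4` are assumptions): the flag travels with each message, and only the Arrow IPC payload is transformed. The compression codec itself is injected as a closure so the sketch stays self-contained; `encode_arrow_msg` and `decode_arrow_payload` are hypothetical helpers, not `re_log_encoding` API.

```rust
use std::io;

/// Per-message compression flag (variant names assumed for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Compression {
    Off,
    Lz4,
}

/// Simplified ArrowMsg: the Arrow IPC bytes (schema + chunk) plus the
/// compression setting that was applied to this message only.
#[derive(Debug, Clone, PartialEq)]
pub struct ArrowMsg {
    pub store_id: String,
    pub compression: Compression,
    pub payload: Vec<u8>,
}

/// Hypothetical helper: build an ArrowMsg, compressing the IPC bytes only if requested.
pub fn encode_arrow_msg(
    store_id: &str,
    ipc_bytes: &[u8],
    compression: Compression,
    compress: impl Fn(&[u8]) -> Vec<u8>,
) -> ArrowMsg {
    let payload = match compression {
        Compression::Off => ipc_bytes.to_vec(),
        Compression::Lz4 => compress(ipc_bytes),
    };
    ArrowMsg { store_id: store_id.to_owned(), compression, payload }
}

/// Hypothetical helper: recover the Arrow IPC bytes, honoring the message's own flag.
pub fn decode_arrow_payload(
    msg: &ArrowMsg,
    decompress: impl Fn(&[u8]) -> io::Result<Vec<u8>>,
) -> io::Result<Vec<u8>> {
    match msg.compression {
        Compression::Off => Ok(msg.payload.clone()),
        Compression::Lz4 => decompress(&msg.payload),
    }
}
```

For testing, any reversible byte transform can stand in for the codec; a real implementation would call an actual compression library instead.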

github-actions · 2024-12-06T17:54:17Z

Latest documentation preview deployed successfully.

| Result | Commit | Link |
|---|---|---|
| ✅ | `0d3ec40` | https://landing-hx1w29tq8-rerun.vercel.app/docs |

Note: This comment is updated whenever you push a commit.

github-actions · 2024-12-06T17:55:33Z

Web viewer built successfully. If applicable, you should also test it:

I have tested the web viewer

| Result | Commit | Link |
|---|---|---|
| ✅ | `3d405ff` | https://rerun.io/viewer/pr/8347 |

Note: This comment is updated whenever you push a commit.

zehiko

thanks for untangling these dependency issues!

generally looks OK to me; a few comments, and a question about clearer reuse of common Arrow serialization logic to make the `re_log_encoding` crate easier to follow.

crates/build/re_protos_builder/src/bin/build_re_remote_store_types.rs

crates/store/re_chunk_store/src/lib.rs

crates/store/re_log_encoding/src/codec/mod.rs

crates/store/re_log_encoding/src/codec/wire.rs

zehiko · 2024-12-09T07:46:02Z

crates/store/re_log_encoding/src/protobuf/decoder.rs

+
+/// Helper function that deserializes raw bytes into arrow schema and record batch
+/// using Arrow IPC format.
+fn read_arrow_from_bytes<R: std::io::Read>(


why do we have this duplication?

could common Arrow serialization logic be moved somewhere and shared between the "wire" and "file" protocols?

we also now have `codec`, `decoder`, `encoder`, and then in `protobuf` we also have `encoder.rs` and `decoder.rs`. Not sure if this is temporary, but I wonder if it could be made more obvious what is common logic and what is specific to our file stream and our gRPC stream.

jprochazk · 2024-12-09

re: the decoder and encoder being in two places: that's because we still have to keep supporting msgpack for the time being, so all the new protobuf encoding code is under `protobuf`. But fundamentally we have 3 layers: our custom stream-level protocol (`crate::decoder`, `crate::encoder`), the message encoding (`crate::protobuf`, with some leftovers in `crate::decoder` and `crate::encoder` due to msgpack), and the Arrow serialization (should be in `crate::arrow`).

re: `codec`: it's there separately because that's the format used by gRPC, with its own `MessageHeader`. The existing encoder and decoder will eventually only be used for files (either local or served over HTTP), so I thought it'd be better to keep them separate.

The naming could be clearer, I will reshuffle the modules a bit.

zehiko · 2024-12-09

yeah, I think some naming reshuffling might help make things more obvious. It may even be worth splitting the gRPC codec into an encoder and a decoder, just for consistency...

zehiko · 2024-12-09T07:50:30Z

crates/store/re_log_encoding/src/codec/mod.rs

+    UnsupportedEncoding,
+
+    #[error("Invalid file header")]
+    InvalidFileHeader,


is this unused? Ditto for a few below.

crates/store/re_protos/proto/rerun/v0/log_msg.proto

crates/store/re_chunk_store/src/protobuf_conversions.rs

crates/store/re_log_encoding/src/decoder/mod.rs

jprochazk added 3 commits December 5, 2024 12:42

- fix typo (`bdb22ed`)
- temp (`8b51ba8`)
- wip (`9b4a9a3`)

jprochazk added the `include in changelog`, `🪵 Log & send APIs` (Affects the user-facing API for all languages), and `dataplatform` (Rerun Data Platform integration) labels Dec 6, 2024

jprochazk added 2 commits December 6, 2024 18:48

- Merge branch 'main' into jan/recording-protobuf (`317e9c0`)
- fix after merge (`8209db4`)

jprochazk added 3 commits December 6, 2024 18:57

- remove unused dep (`34dcd4c`)
- exclude re_grpc_client/address links (`ba4cfbb`)
- cargo fmt (`691fb46`)

jprochazk mentioned this pull request Dec 6, 2024: Use gRPC everywhere (over the wire) #8349 (Open)

jprochazk added 3 commits December 6, 2024 19:36

- fix lints (`a2e58fb`)
- rm dead comment (`f22cb03`)
- add todo (`b76cf9c`)

jprochazk marked this pull request as ready for review December 6, 2024 18:40

jprochazk added 3 commits December 6, 2024 19:51

- gate behind feature (`f91737b`)
- fix check (`7a720f8`)
- fix more lints (`a2d7ea4`)

zehiko reviewed Dec 9, 2024


jprochazk marked this pull request as draft December 9, 2024 10:31

jprochazk added 3 commits December 10, 2024 11:41

- Merge branch 'main' into jan/recording-protobuf (`4c18bda`)
- update lockfile (`0d3ec40`)
- Merge branch 'main' into jan/recording-protobuf (`3d405ff`)
