-
Notifications
You must be signed in to change notification settings - Fork 223
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch Allocator to act more like an owned arena #467
Comments
This would eliminate the main source of A safety vs performance analysis here with benchmark numbers would probably be interesting to much of the broader Rust community. |
It would be relatively straightforward to change the interface so that @Joeoc2001 would such a change on its own help in your use cases at all? |
I'm not sure that'd solve my use case specifically, unfortunately. The issue with WebAssembly allocators with the current API design of this crate isn't just that the provided module allocators can fail to allocate, but that ultimately the allocator is just arbitrarily instructions provided by the loaded module meaning the module (and host) may do anything, including moving or remapping the module memory in host memory, making the current pointers implementation UB for a malicious, buggy or unexpected allocator implementation. |
If the host can just drop the memory out from under your feet (even if you're holding a mutable reference to it) that doesn't sound like a stable place to perform complex write operations anyway. It might seriously be faster to copy segments directly into the wasm memory instead of having to do at least 3 redirects every time you want to write something somewhere. In fact, how would this work given that the arena stores pointers to the start of each segment? Are we implying the arena doesn't actually store segments anymore? So now it's dynamic, which makes it actually 4 redirects.
Now you could coalesce a lot of this I suppose and redefine the allocator trait to look something like pub trait Allocator {
fn alloc(&self, amount: u32) -> (SegmentId, u32);
fn write_bit(&self, segment: SegmentId, slot: u64, value: bool);
fn write_u8(&self, segment: SegmentId, slot: u64, value: u8);
fn write_16(&self, segment: SegmentId, slot: u64, value: u16);
fn write_u32(&self, segment: SegmentId, slot: u64, value: u32);
fn write_u64(&self, segment: SegmentId, slot: u64, value: u64);
} But once again, every single time you write anywhere you'll hit a redirect. The compiler most likely won't be able to optimize writes in any way. |
I think there's an important distinction between 'can' and 'will' here - the module
Potentially, but probably not for most WebAssembly host implementations if they can guarantee that the instance memory is contiguous in host memory. Then it'd just be a double offset, where the Arena stores offsets into the instance memory giving the starts of segments instead of the raw pointers. Also, the segment locations could only change at the point of calling My ideal trait would look something like this: pub trait Allocator {
type Segment;
fn alloc(&mut self, amount: u32) -> (Self::SegmentId, u32);
fn write_bit(&mut self, segment: Self::Segment, slot: u64, value: bool);
fn write_u8(&mut self, segment: Self::Segment, slot: u64, value: u8);
fn write_16(&mut self, segment: Self::Segment, slot: u64, value: u16);
fn write_u32(&mut self, segment: Self::Segment, slot: u64, value: u32);
fn write_u64(&mut self, segment: Self::Segment, slot: u64, value: u64);
} Then the current Allocator implementations could implement |
This would require every capnp builder type to be generic over the Allocator itself, which would inherently cause every downstream user type to be generic over the Allocator as well. It doesn't really lend itself to ease of use. |
That is something I've considered before, and similarly on the read side having every capnproto-rust/capnp/Cargo.toml Line 33 in 3180a44
capnproto-rust/capnp/Cargo.toml Line 42 in 3180a44
because that information could live in the traits instead.
Yep, that's that cost. My guess is that the cost outweighs the benefits. But still-- it could be interesting to investigate more deeply to get a better sense of just what it would look like. |
The bug described in #484 is maybe tangentially related to the above discussion. The bug would have been avoided if |
Currently, the
Allocator
of amessage::Builder
functions as an API for some kind of Malloc/Free. This is fine when the message to be encoded resides in host memory that we have full control over, but for some use cases this API is overly restrictive.Specifically for my use case, Capn' Proto should be perfect for sharing data with a resident WebAssembly instance, as the data does not need to be copied and can simply be encoded directly into the WebAssembly instance's memory. However doing so requires the blocks of memory in the instance to be allocated by an exported function by the module, which may be fallible, and which requires a mutable borrow over the host memory to be invoked, invalidating Rust's aliasing rules on the mutable pointer returned by the allocator.
An alternative, proposed partially by #78, would be to have the allocator and relevant sub-builders deal with slices instead of raw pointers. Further to this discussion, I would propose that each builder keep integer offsets into the segments in the builder, and have them borrow the segments only for the period in which they are writing the data. This would allow the allocation and deallocation methods to have mutable access to all of the segments, without invalidating any existing pointers if moving the underlying buffers (as may be the case if a WebAssembly instance opted to grow its memory on
alloc
invocation).Performance
Based on zero testing, I would guess that the borrowing from the owned allocator would be mostly inlined away, aside from the one extra level of indirection added by the offset+slice combo. I'd expect most CPUs to be able to deal with this fairly smoothly since the pre-fetcher should notice that the underlying segment is cohesive, but benchmarks would be required.
Backwards compatibility
Since the Allocator already acts like it owns the data allocated, a trait covering temporary borrows of slices only when written (as described above) would be a more general case. Therefore a new trait could be introduced, and an auto-implementation from Allocators to the new trait could be provided, meaning no existing code would break.
Are there any thoughts on this? For now, I intend to encode in a host buffer and then copy the segments into the instance memory, but this obviously isn't ideal when the underlying protocol is designed to be 0-copy.
The text was updated successfully, but these errors were encountered: