-
-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
api: add record-oriented IO functions #41
Conversation
I blame any mistakes on Japanese whisky. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I left a few comments that we should address before merging. But overall this looks great.
In terms of whether we want these APIs though... Could you say more about your use case for them? I grant that they are a nice convenience, but still, always good to hear about use cases.
/// assert_eq!(records[2], "dolor".as_bytes()); | ||
/// # Ok(()) }; example().unwrap() | ||
/// ``` | ||
fn for_byte_terminated_record<F>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the inconsistency in naming? Seems like this should be for_byte_record
given that you have byte_records
above. It's also shorter. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was actually the last bit I added - I was leaning towards being more explicit, but I guess I was a bit sick of it by the time I got to the iterator :)
@@ -208,6 +321,20 @@ pub struct ByteLines<B> { | |||
buf: B, | |||
} | |||
|
|||
/// An iterator over records from an instance of | |||
/// [`std::io::BufRead`](https://doc.rust-lang.org/std/io/trait.BufRead.html). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this add a sentence explaining what a byte record is? e.g., "A byte record is any sequence of bytes terminated by a particular byte chosen by the caller. For example, NUL separate byte strings are said to be NUL-terminated byte records."
if chunk.last() == Some(&terminator) { | ||
for_each_record(&chunk[0..chunk.len() - 1]) | ||
} else { | ||
for_each_record(&chunk) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to keep things consistent, maybe rename trim_slice
to trim_line_slice
and add trim_record_slice
so that it can be used here. Or at least, it would be nice to call for_each_record
in only one place in the source.
Regarding use-case, I mainly see it being used for apps wanting to support This article is what prompted me - she wrote her own custom implementation, lamenting the effort it took. Which is bit of a shame when I've had a previous iteration of this lying on my disk forgotten for the past 6 months. |
I also mentioned similar use cases in #12 (comment) -- I've worked with files that delimited entries with \xff before (as well as \x00). That said, for my case it would have been better on ByteSlice rather than on io (sorry, I missed where this was in my initial comment). |
Note there are iterators included here - they're just thin wrappers around The callback functions simply lend out slices to the buffer, only copying when the buffer does not contain a full record. You can't do this with an iterator, and this can have a substantial impact on performance. |
Yeah, I missed that this was in the io module, and thus the iterator approach involved a good amount of overhead, and edited my comment to remove that bit. |
This mirrors the line iterating methods, but permits the caller to specify an arbitrary terminator. Closes #41
This adds:
byte_records()
iteratorfor_byte_terminated_record_with_terminator()
for_byte_terminated_record()
Which are basically parametrised versions of the existing functions for reading lines.