Change split_single_document to work on STDIN & STDOUT #31

Open
wants to merge 1 commit into
base: master
jelmervdl
Contributor

@jelmervdl jelmervdl commented Jun 14, 2021

Single-document splitting now streams directly from Perl's STDIN to STDOUT. In multidoc mode I locally override STDIN and STDOUT so they point to in-memory variables.

In multidoc mode I still buffer one document at a time, because I didn't see an easy way to stream base64 without re-implementing base64 encoding in Perl. Piping the output through base64 with open() would work, but that would mean forking for every document.

Fixes #30. Also fixes a warning printed by $text = $text.$words[$i] when dealing with an empty line in -k mode.

In b64 mode it still buffers each document's output and encodes it in one go.
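
For readers who want to see the multidoc idea in isolation, here is a rough sketch in Python rather than Perl. It is not the code in this PR. `split_single_document` is a hypothetical stand-in with a toy splitting rule, and `split_multidoc` is an assumed name. The sketch only shows the pattern described above: the single-document routine reads stdin and writes stdout, and the multidoc wrapper temporarily points both at in-memory buffers, then base64-encodes the buffered output in one go.

```python
import base64
import io
import sys
from contextlib import redirect_stdout


def split_single_document():
    """Hypothetical stand-in for the splitter: streams sys.stdin to sys.stdout."""
    for line in sys.stdin:
        # Toy rule (an assumption, not the real splitter): break after ". ".
        sys.stdout.write(line.rstrip("\n").replace(". ", ".\n") + "\n")


def split_multidoc(lines):
    """Yield one base64-encoded, split document per base64-encoded input line."""
    for encoded in lines:
        document = base64.b64decode(encoded).decode("utf-8")
        buffer = io.StringIO()       # holds the output of one document only
        real_stdin = sys.stdin
        sys.stdin = io.StringIO(document)  # temporary override, like a local in Perl
        try:
            with redirect_stdout(buffer):
                split_single_document()
        finally:
            sys.stdin = real_stdin   # always restore the real stream
        # Encode the buffered output in one go, as in b64 mode.
        yield base64.b64encode(buffer.getvalue().encode("utf-8")).decode("ascii")
```

Memory use in this sketch is bounded by the largest single document, not by the whole input, which is the trade-off described above for multidoc mode.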
Successfully merging this pull request may close these issues.

Sentence splitter uses unbounded memory in -k mode