I was debugging browsermt/bergamot-translator#273 when I noticed that xh_scanner tests for MAX_TOKEN_SIZE everywhere it adds characters to its buffer, but does not call push_back(c) if the limit is hit. As a result, if any of the for-loops that add characters to its internal buffers hits that limit, a character may be lost.
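The pattern can be sketched as follows. This is illustrative code, not the actual xh_scanner implementation: the Scanner type, its get_char()/push_back() methods, and the '>' delimiter are assumptions, but the structure mirrors the bug. On hitting MAX_TOKEN_SIZE the loop breaks, and the character that was just read is neither stored nor returned to the input stream:

```cpp
#include <cassert>
#include <string>

constexpr std::size_t MAX_TOKEN_SIZE = 4;  // tiny limit for illustration

// Hypothetical scanner: reads characters and can "unget" one.
struct Scanner {
    std::string input;
    std::size_t pos = 0;
    char get_char() { return pos < input.size() ? input[pos++] : '\0'; }
    void push_back(char) { if (pos > 0) --pos; }  // return last char to input
};

// Buggy pattern: when the buffer is full the loop breaks, but the
// character just consumed by get_char() is silently dropped.
std::string scan_token_buggy(Scanner& s) {
    std::string buf;
    for (char c = s.get_char(); c != '\0' && c != '>'; c = s.get_char()) {
        if (buf.size() < MAX_TOKEN_SIZE)
            buf.push_back(c);
        else
            break;            // bug: missing s.push_back(c)
    }
    return buf;
}

// Fixed: hand the overflowing character back to the input before breaking,
// so the next scan still sees it.
std::string scan_token_fixed(Scanner& s) {
    std::string buf;
    for (char c = s.get_char(); c != '\0' && c != '>'; c = s.get_char()) {
        if (buf.size() < MAX_TOKEN_SIZE) {
            buf.push_back(c);
        } else {
            s.push_back(c);   // character stays available for the next token
            break;
        }
    }
    return buf;
}
```

With input "abcdef>", both versions return "abcd", but the buggy one leaves the stream positioned after the 'e', so the 'e' is never seen again; the fixed one leaves the stream positioned at the 'e'.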
I think this only affects CDATA sections, comments, attribute values, and tag names, so for the main use case of warc2text this bug has little impact.
Edit: Thinking about it, it would only affect the tag filters.