The Text Extractor can't handle empty strings #9
Thanks, @Michael-F-Bryan, for finding this. I will fix it.
If it helps, here is what we log when the panic occurs:

```json
{
  "level": "ERROR",
  "message": "panicked at 'Start index: 0 is greater than or equal to end index: 0', /Users/mohit/.cargo/git/checkouts/proc-blocks-49e4d24512f5d324/1e53d6f/text_extractor/src/lib.rs:44:13",
  "target": "hotg_runicos_base_wasm",
  "module_path": "hotg_runicos_base_wasm",
  "file": "/Users/mohit/.cargo/registry/src/github.com-1ecc6299db9ec823/hotg-runicos-base-wasm-0.10.0/src/lib.rs",
  "line": 52
}
```
I believe @Mohit0928 created #10 to resolve this, but we're running into the panics in the … I think we need to rewrite these lines in a way that lets us handle zero, one, or multiple sentences as output, possibly using …
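A minimal Rust sketch of what handling "zero, one, or multiple sentences" could look like. It assumes the extractor receives the tokens plus a list of half-open `(start, end)` spans; `extract_sentences` and that input shape are hypothetical, not the actual `text_extractor` API. Empty, reversed, or out-of-range spans (such as the `(0, 0)` span in the log above) are skipped instead of triggering a panic.

```rust
/// Hypothetical replacement for the span-extraction logic.
/// Each span is half-open: tokens[start..end].
/// Instead of asserting that every span is valid, invalid spans are skipped,
/// so the caller can get back zero, one, or many sentences.
fn extract_sentences(tokens: &[&str], spans: &[(usize, usize)]) -> Vec<String> {
    spans
        .iter()
        // Drop empty (start == end), reversed, or out-of-range spans.
        .filter(|&&(start, end)| start < end && end <= tokens.len())
        // Join the tokens of each remaining span into one sentence.
        .map(|&(start, end)| tokens[start..end].join(" "))
        .collect()
}
```

With this shape, "no answer" is simply an empty `Vec`, which downstream code can render as an empty string.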
@Michael-F-Bryan, when I ran the Bert rune with sentence 1 empty, I got the tokenizer's "sentence 1 is empty" assertion as the error message.
Isn't this what we're supposed to get, or are we expecting something different?
No, that's not what we want to do. The underlying problem is that the model was unable to generate an answer for the provided inputs, so the text extractor crashed the Rune (the logits tensor passed to the text extractor was all zeroes). Your "sentence 1 is empty" assertion is in the tokenizer, so it won't prevent this situation.

Another thing to keep in mind is that the text passed to a tokenizer is controlled by the user. In general, you should try to handle bad user input gracefully (e.g. by setting the tokenizer output to all zeroes) so that unexpected input won't crash the runtime. There's a pretty common saying for this:
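A minimal sketch of "set the tokenizer output to all zeroes" for bad input, assuming a fixed-length output of `max_len` token ids. Whitespace splitting stands in for a real WordPiece tokenizer, and `tokenize_or_zeros`, the `vocab` map, and `UNK_ID` are all hypothetical names chosen for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical id used for words missing from the vocabulary.
const UNK_ID: i32 = 100;

/// Encodes `sentence` into exactly `max_len` token ids.
/// Empty or whitespace-only input produces all zeroes instead of an
/// assertion failure, so bad user input can't crash the runtime.
fn tokenize_or_zeros(sentence: &str, vocab: &HashMap<&str, i32>, max_len: usize) -> Vec<i32> {
    // Start from the all-zero (padding) output.
    let mut ids = vec![0; max_len];
    // zip stops at whichever runs out first, so long input is truncated
    // and short (or empty) input leaves the remaining slots as zero.
    for (slot, word) in ids.iter_mut().zip(sentence.split_whitespace()) {
        *slot = vocab.get(word).copied().unwrap_or(UNK_ID);
    }
    ids
}
```

Because the zero-filled buffer is the starting point, the empty-input case needs no special branch at all.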
Also, assertions should mainly be used for unrecoverable errors that were most likely caused by a programming bug (e.g. non-finite floats due to a divide by zero). For errors you can expect to see frequently, like empty input, we should try to return a sensible default value such as an empty string.
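To illustrate the distinction, here is a hypothetical `softmax` helper (not part of the proc-blocks code) written in that spirit: empty input is an expected case and gets a default (an empty vector), while a non-finite probability can only come from a bug or corrupted logits upstream, so it stays behind an assertion.

```rust
/// Converts logits into probabilities.
fn softmax(logits: &[f32]) -> Vec<f32> {
    // Expected, recoverable case: nothing to normalize, return a default.
    if logits.is_empty() {
        return Vec::new();
    }
    // Subtract the maximum for numerical stability.
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    let probs: Vec<f32> = exps.iter().map(|&e| e / sum).collect();
    // Unrecoverable case: NaN or infinity here means a bug upstream.
    assert!(
        probs.iter().all(|p| p.is_finite()),
        "non-finite probability: this indicates a bug"
    );
    probs
}
```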
I was playing around with the web runtime when I noticed that passing an empty string to the Bert rune triggers this assertion.

In this case, it looks like `end_index` and `start_index` are both zero. I'm guessing we'll also see this happen when the model is unable to figure out the answer to your question.
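One plausible way both indices end up as zero: if the start and end indices are chosen by taking the argmax of the start and end logits, and ties resolve to the first index, then an all-zero logits tensor yields index 0 for both. The sketch below assumes that selection scheme; `argmax` and `answer_span` are hypothetical helpers, not the text extractor's actual code. It returns `None` instead of asserting when `start >= end`.

```rust
/// Index of the largest logit. Ties resolve to the first index, so an
/// all-zero logits tensor (the model found no answer) gives index 0.
fn argmax(logits: &[f32]) -> usize {
    let mut best = 0;
    for (i, &x) in logits.iter().enumerate() {
        if x > logits[best] {
            best = i;
        }
    }
    best
}

/// Returns the half-open answer span, or `None` when the model produced
/// no usable answer (e.g. start == end == 0 for all-zero logits).
fn answer_span(start_logits: &[f32], end_logits: &[f32]) -> Option<(usize, usize)> {
    let start = argmax(start_logits);
    let end = argmax(end_logits);
    if start < end {
        Some((start, end))
    } else {
        None
    }
}
```

The caller can then map `None` to an empty string rather than crashing the Rune.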