-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a new SuffixParser to handle UTF-8 chracters with more than one byte #82
Conversation
Machine m = new Machine(); | ||
String rule = "{\n" + | ||
" \"status\": {\n" + | ||
" \"weatherText\": [{\"suffix\": \"雨\"}]\n" + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be worth having a suffix with more than one multi-byte character?
" }\n" + | ||
"}"; | ||
m.addRule("r1", rule); | ||
List<String> matchRules = m.rulesForJSONEvent(eventStr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I have been doing recently is added rulesForJSONEvent tests to ACMachineTest and adding rulesForEvent tests (otherwise identical) to MachineTest. I think that is the intended difference between the two test classes? So change this and add a version to ACMachineTest? Of course, we should probably think through a path forward to stop duplicating all tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hadn't thought about it. Let me follow it as well. I filed #83 to keep track of duplicate effort.
Issue #, if available: #73
Description of changes:
Add a new suffix parser that reverses utf-8 characters correctly.
Before we would convert
"雨
to[34, -23, -101, -88]
whileaddSuffixMatch(...)
in ByteMatchine expected the the value to be[34, -88, -101, -23]
.In this change, we're changing how we build
InputCharacter[]
within the default parser by introducing SuffixParser for the current "reverse and then match backwards" quirky behaviour within suffix and anything-bit-suffix matcher.I decided against switching to a wildcard based pattern like
*雨
within suffix becauseNo issues detected within benchmarks.
Benchmark / Performance (for source code changes):
Before
After
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.