-
Notifications
You must be signed in to change notification settings - Fork 154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement optimal parenthesis for ternary operator + add tests #289
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current TranslatorSpec tests lack of support for NimCompiler
and RustCompiler
. I'll add them to existing tests in #290, but you should add them to the new tests.
You can use https://play.nim-lang.org and https://play.rust-lang.org to check the code.
)) | ||
|
||
// When evaluated, must be 12 + 34 = 46 | ||
everybodyExcept("2 < 3 ? 12 + 34 : 45 + 67", "2 < 3 ? 12 + 34 : 45 + 67", ResultMap( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, it is better to explicitly test all possible expressions in arms, not only +
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if I follow what you're suggesting?
What extra value it will bring or rather what type of situation you anticipate to catch with that?
Also, what limit on "all possible expressions" you want to put in here? "All possible" literally is infinite set, we won't be able to test infinite permutations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What extra value it will bring or rather what type of situation you anticipate to catch with that?
Different operators have different priority, so in some cases we need to wrap them in parenthesis. I cannot be sure that in all languages it is handled correctly until it is not tested.
For example, I'm working on bringing Rust tests to TranslatorSpec and found a bug where .to_s
on expression does not generate parenthesis. In all languages them was generated, except Rust. Such mistakes can arise in the future, so I think it is better to protect from them preventively.
"All possible" literally is infinite set, we won't be able to test infinite permutations.
I don't think so. KS expression language has a limited set of operators, just test them all here. +
, -
, *
, /
(float and integer), string concatenation, boolean operators, etc., all from Ast.operators
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understood your correctly, what you're suggesting is effectively testing all permutations of all pairs:
- ternary operator, int addition
- ternary operator, int subtraction
- ternary operator, int multiplication
- etc
I don't think it's a great strategy to move forward with, because of two reasons:
- It's very bulky: we're talking about ~20-30 more tests. It will be very hard to maintain and keep updated manually.
- It doesn't really provide 100% coverage you're looking to, and it quickly gets us into exponential explosion:
- For instance, where do you put the operation? There are three places to put it in ternary: "condition", "ifTrue", "ifFalse", so even for one addition you'll be looking into three tests:
1 + 2 ? 3 : 4
1 ? 2 + 3 : 4
1 ? 2 : 3 + 4
- But it's not enough, we also need to test all permutations in arguments (thus 9 tests total):
1 + 2 ? 3 + 4 : 5
1 + 2 ? 3 : 4 + 5
1 + 2 ? 3 + 4 : 5 + 6
1 ? 3 + 4 : 5
1 ? 3 : 4 + 5
1 ? 3 + 4 : 5 + 6
- If we repeat only that for 30 operators, it will be 9 * 30 = 270 tests.
- But that's not enough, we also need to test combinations of additions + multiplications or addition + bit shift, e.g.
1 << 2 ? 3 + 4 : 5 * 6
. Simple binomial coefficient calculation of C(30)(3) = 30! / (3! * 27!) gives us 4060 combinations. - But that is also not enough, as we likely need to test combinations of combinations, e.g. more than one operator in "condition", "ifTrue" and "ifFalse" parts.
- For instance, where do you put the operation? There are three places to put it in ternary: "condition", "ifTrue", "ifFalse", so even for one addition you'll be looking into three tests:
I don't think it's practical approach to follow. Instead, what I propose is to be aware of the fact that in all languages we track this operator is lower precedence than anything and test only corner cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, just 1 x 2 ? 3 x 4 : 5 x 6
is enough (where x
is operator). Each arm of ternary operator is independent of each other, so it does not require to test all possible variations that differs only in arms. The same is applied to arm and condition. This is only linear from count of operators and will provide 100% coverage. It is not much, actually, just additional 20-30 everybody/everybodyExcept
calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What extra value it will bring or rather what type of situation you anticipate to catch with that?
Different operators have different priority, so in some cases we need to wrap them in parenthesis. I cannot be sure that in all languages it is handled correctly until it is not tested.
My 2 cents: I don't think it makes sense to go super crazy with these manual TranslatorSpec
cases - trying to cover all permutations by hand is not only tedious, but also unmaintainable and in the end it doesn't test the thing it should, i.e. whether all translated expressions will give the (same) expected result when evaluated in each language (to really be sure of that, you would have to manually take each translated expression and paste it into some interactive shell in every target language, which is insane).
In my view, these tests should only be complementary to normal .ksy + .kst tests, not replace them (that would be a step back). They can provide some value that .ksy/.kst tests can't - they can be used to catch that we have unnecessarily too many parentheses, which makes the generated code ugly, but doesn't affect functionality. But IMHO they shouldn't be meant to catch too few parentheses - .ksy/.kst tests are much better suited for this, because then there's always a set of values for which the actual evaluation of the translated expression in the target language will result in a different value than expected, which proves that we really have a problem, and we'll instantly know in what target languages exactly.
I wrote a very immature proof-of-concept Python script that demonstrates what automatic generation of such .ksy/.kst pair could look like: https://gist.github.com/generalmimon/a24d65d1890e73b4036fdd48e7d6772f
To give you an idea, the generated files look something like this:
# expr_gen.ksy
meta:
id: expr_gen
enums:
fruit:
9: apple
6: banana
instances:
l0_r0_lsig0_rsig0:
value: (0 + 7) + 5
l0_r0_lsig0_rsig1:
value: (0 + 7) + 13.7
l0_r0_lsig1_rsig2:
value: (3 + 4.4) + 13
l0_r0_lsig1_rsig3:
value: (3 + 4.4) + 1.1
# ...
l0_r4_lsig0_rsig0:
value: (3 + 12) % 6
l0_r5_lsig0_rsig0:
value: (11 + 13) | 8
l0_r6_lsig0_rsig0:
value: (7 + 13) ^ 14
l0_r7_lsig0_rsig0:
value: (6 + 1) & 12
# ...
l11_r0_lsig3_rsig0:
value: fruit::banana.to_i + 7
l11_r0_lsig3_rsig1:
value: fruit::banana.to_i + 5.3
l11_r0_lsig4_rsig0:
value: false.to_i + 1
l11_r0_lsig4_rsig1:
value: false.to_i + 11.1
# ...
l20_r10_lsig5_rsig1:
value: '(true ? [50, 48, 52] : [56, 57, 51, 48]).to_s(''ASCII'')'
# ...
# expr_gen.kst
id: expr_gen
data: enum_negative.bin
asserts:
- actual: l0_r0_lsig0_rsig0
expected: '12'
- actual: l0_r0_lsig0_rsig1
expected: '20.7'
- actual: l0_r0_lsig1_rsig2
expected: '20.4'
- actual: l0_r0_lsig1_rsig3
expected: '8.5'
# ...
- actual: l0_r4_lsig0_rsig0
expected: '3'
- actual: l0_r5_lsig0_rsig0
expected: '24'
- actual: l0_r6_lsig0_rsig0
expected: '26'
- actual: l0_r7_lsig0_rsig0
expected: '4'
# ...
- actual: l11_r0_lsig3_rsig0
expected: '13'
- actual: l11_r0_lsig3_rsig1
expected: '11.3'
- actual: l11_r0_lsig4_rsig0
expected: '1'
- actual: l11_r0_lsig4_rsig1
expected: '11.1'
# ...
- actual: l20_r10_lsig5_rsig1
expected: '''204'''
# ...
It's not a silver bullet (and the script needs some serious refactoring, comparison operators need to be added, the right-side grouping like X left_op (Y right_op Z)
should be tested too etc.), but I believe it's a much better approach than trying to cover all of these (currently 531 value instances) in TranslatorSpec
.
And I've already found some bugs with this that I shared in #277 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with KST is that they does not run in PRs and I do not know how to run them locally. So it is hard to change code which potentially can break some other languages when I'm working on a language that's interest for me. I do not know many Kaitai languages and I wouldn't be happy that I should blindly change their or common code.
If you could write a documentation how to do that on Windows (and even better also on Windows 7) and setup corresponding checks on CI, it would be great simplification!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with KST is that they does not run in PRs and I do not know how to run them locally.
Happy to help you with that (I'm on Windows too), but not here - are you on Gitter? We can talk privately/publicly as you wish.
If you could write a documentation how to do that on Windows (and even better also on Windows 7) and setup corresponding checks on CI
Both things would be nice, but as with everything, the problem is that someone has to do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@generalmimon Love the idea, that's actually a very good separation line that you've drawn here!
Would you mind creating a PR with that script to tests repo?
@@ -26,6 +26,8 @@ trait CommonOps extends AbstractTranslator { | |||
Ast.operator.BitOr -> 80, | |||
) | |||
|
|||
val IF_EXP_PRECEDENCE = 30 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, this field needs documentation, in what cases it should be used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, let me add that.
@@ -37,7 +37,7 @@ class NimTranslator(provider: TypeProvider, importList: ImportList) extends Base | |||
case Identifier.ROOT => s"${ksToNim(provider.determineType(Identifier.ROOT))}(this.${doName(s)})" | |||
case _ => s"this.${doName(s)}" | |||
} | |||
override def doIfExp(condition: expr, ifTrue: expr, ifFalse: expr): String = | |||
override def doIfExp(condition: expr, ifTrue: expr, ifFalse: expr, extPrec: Int): String = | |||
s"(if ${translate(condition)}: ${translate(ifTrue)} else: ${translate(ifFalse)})" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to rewrite that part too. You can check result at https://play.nim-lang.org
"if " + translate(condition) + | ||
" { " + translate(ifTrue) + " } else { " + | ||
translate(ifFalse) + "}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to rewrite this part too. You can check result at https://play.rust-lang.org
GoCompiler -> | ||
"""var tmp1 int8; | ||
|if (2 < 3) { | ||
| tmp1 = 123 | ||
|} else { | ||
| tmp1 = 456 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
456
clearly doesn't fit into int8
, so I wouldn't approve this as the "expected" correct translation just to make the tests green and provide a false sense of "everything works fine" when it doesn't.
This doesn't even compile, see https://go.dev/play/p/tZ81LHWP5bM:
./prog.go:10:10: cannot use 456 (untyped int constant) as int8 value in assignment (overflows)
6 var tmp1 int8
7 if 2 < 3 {
8 tmp1 = 123
9 } else {
* 10 tmp1 = 456
11 }
12 fmt.Println(tmp1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's actually perfect catch, thanks for noticing it!
I don't think there's anything wrong with the expression, but actually we should fix GoTranslator implementation, so it should use more generic CalcIntType.
GoCompiler -> | ||
"""var tmp1 interface{}; | ||
|if (true) { | ||
| tmp1 = this.Foo | ||
|} else { | ||
| tmp1 = this.Bar | ||
|} | ||
|tmp1._io""".stripMargin, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This too can hardly be approved as correct. Accessing _io
on interface{}
doesn't make sense. See https://go.dev/tour/methods/14:
An empty interface may hold values of any type. (Every type implements at least zero methods.)
The code snippet on the linked page also demonstrates that interface{}
can be anything, e.g. an int
or a string
:
func main() {
var i interface{}
describe(i) // => (<nil>, <nil>)
i = 42
describe(i) // => (42, int)
i = "hello"
describe(i) // => (hello, string)
}
func describe(i interface{}) {
fmt.Printf("(%v, %T)\n", i, i)
}
To implement this correctly in Go, interface{}
will presumably have to be replaced with a concrete "KaitaiStruct" interface, which will define a method Io()
(so tmp1._io
will become tmp1.Io()
), or maybe a different name than Io
should be used to avoid name conflicts with user-defined fields, but it must start with an uppercase letter so that it's accessible from outside the kaitai_struct_go_runtime
package, see https://go.dev/tour/basics/3 (perhaps M_Io()
?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching this too, I was too naive to suppose it will just work.
Looks like we indeed just don't have that functionality at all. As far as I can tell, there's no inheritance in Go, thus we can't have common ancestor KaitaiStruct which will implement storage of a stream, but we indeed can have a common interface promising that every KaitaiStruct type we define has to implement a method that will return what we can use for ._io
.
Let me actually give it a try separately.
The hardest thing here, as you've pointed out, is to decide on a naming. I really don't like M_Io
— it's kind of copying ugly twist we've done in C# on not-even-a-standard-but-convention from a totally different language (C++), with it's m_foo
prefixing.
But, generally speaking, given that our Golang name rendition is UpperCamelCase, which guarantees no underscores, I believe putting in the underscore is a way to go.
I wonder what do you folks think about Kaitai_IO()
? It's a bit verbose, but generally puts a strong emphasis on ownership of the field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I'm concerned, both M_Io
and Kaitai_IO
work the same (both satisfy the Go capital letter requirement, and that's the only thing that matters in the end), I don't have any strong feelings here. Perhaps it indeed doesn't hurt to have the very descriptive Kaitai_
prefix, at least everyone sees that it's a special implicit field coming from the Kaitai_
"namespace", and it pretty much can't clash with anything ever.
More interesting to me is another naming problem :), which is whether to pick Io
or IO
. Personally, as my heart gravitates more towards Rust than Go (and I believe that XmlHttpRequest
is the only spelling that makes sense, the actual name XMLHttpRequest
feels like it can't pick one consistent style of writing acronyms even within itself), Io
feels more natural to me, but since it stands for "I/O", which can be considered an acronym, it seems that the Go convention is to keep all letters in an initialism the same case, which suggests IO
(see https://google.github.io/styleguide/go/decisions#initialisms):
Words in names that are initialisms or acronyms (e.g.,
URL
andNATO
) should have the same case.URL
should appear asURL
orurl
(as inurlPony
, orURLPony
), never asUrl
. This also applies to ID when it is short for “identifier”; writeappID
instead ofappId
.
This makes me die inside a bit, but as they say "when in Rome, do as the Romans do". So IO
is probably indeed more idiomatic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The interesting bit here is that we actually already generate some "public" names for these as per this code. This generates _IO
and I'm pretty sure it just doesn't work.
Personally, I don't have a strong preference — very generally, sticking to prescribed recommendations per language makes sense, especially if they exist and are clear.
Another piece of partially contributing to resolving kaitai-io/kaitai_struct#69 — this one handles ternary operator:
doIfExp()
implementation (+addextPrec
propagation into it) into CommonOps, which does a good job at parenthesizingcondition ? ifTrue ? ifFalse
rendering originating from C.