Implement optimal parenthesis for ternary operator + add tests #289

GreyCat · 2024-03-26T15:35:20Z

Another piece of partially contributing to resolving kaitai-io/kaitai_struct#69 — this one handles ternary operator:

Add common doIfExp() implementation (+add extPrec propagation into it) into CommonOps, which does a good job at parenthesizing
- This provides default implementation which is ok for most languages following popular condition ? ifTrue ? ifFalse rendering originating from C.
- Remove individual copy-pasted implementations in specific languages where we can rely on common one.
- Adjust Python rendering to follow the same correct strategy
- Don't touch special ad hoc implementations (Lua, Go)
Add/fix tests proving that it works in many contexts

Mingun

Current TranslatorSpec tests lack of support for NimCompiler and RustCompiler. I'll add them to existing tests in #290, but you should add them to the new tests.

You can use https://play.nim-lang.org and https://play.rust-lang.org to check the code.

Mingun · 2024-03-28T04:59:52Z

jvm/src/test/scala/io/kaitai/struct/translators/TranslatorSpec.scala

+    ))
+
+    // When evaluated, must be 12 + 34 = 46
+    everybodyExcept("2 < 3 ? 12 + 34 : 45 + 67", "2 < 3 ? 12 + 34 : 45 + 67", ResultMap(


I think, it is better to explicitly test all possible expressions in arms, not only +.

Not sure if I follow what you're suggesting?

What extra value it will bring or rather what type of situation you anticipate to catch with that?

Also, what limit on "all possible expressions" you want to put in here? "All possible" literally is infinite set, we won't be able to test infinite permutations.

What extra value it will bring or rather what type of situation you anticipate to catch with that?

Different operators have different priority, so in some cases we need to wrap them in parenthesis. I cannot be sure that in all languages it is handled correctly until it is not tested.

For example, I'm working on bringing Rust tests to TranslatorSpec and found a bug where .to_s on expression does not generate parenthesis. In all languages them was generated, except Rust. Such mistakes can arise in the future, so I think it is better to protect from them preventively.

"All possible" literally is infinite set, we won't be able to test infinite permutations.

I don't think so. KS expression language has a limited set of operators, just test them all here. +, -, *, / (float and integer), string concatenation, boolean operators, etc., all from Ast.operators.

If I understood your correctly, what you're suggesting is effectively testing all permutations of all pairs:

ternary operator, int addition

ternary operator, int subtraction

ternary operator, int multiplication

etc

I don't think it's a great strategy to move forward with, because of two reasons:

It's very bulky: we're talking about ~20-30 more tests. It will be very hard to maintain and keep updated manually.

It doesn't really provide 100% coverage you're looking to, and it quickly gets us into exponential explosion:

For instance, where do you put the operation? There are three places to put it in ternary: "condition", "ifTrue", "ifFalse", so even for one addition you'll be looking into three tests:

1 + 2 ? 3 : 4

1 ? 2 + 3 : 4

1 ? 2 : 3 + 4

But it's not enough, we also need to test all permutations in arguments (thus 9 tests total):

1 + 2 ? 3 + 4 : 5

1 + 2 ? 3 : 4 + 5

1 + 2 ? 3 + 4 : 5 + 6

1 ? 3 + 4 : 5

1 ? 3 : 4 + 5

1 ? 3 + 4 : 5 + 6

If we repeat only that for 30 operators, it will be 9 * 30 = 270 tests.

But that's not enough, we also need to test combinations of additions + multiplications or addition + bit shift, e.g. 1 << 2 ? 3 + 4 : 5 * 6. Simple binomial coefficient calculation of C(30)(3) = 30! / (3! * 27!) gives us 4060 combinations.

But that is also not enough, as we likely need to test combinations of combinations, e.g. more than one operator in "condition", "ifTrue" and "ifFalse" parts.

I don't think it's practical approach to follow. Instead, what I propose is to be aware of the fact that in all languages we track this operator is lower precedence than anything and test only corner cases.

No, just 1 x 2 ? 3 x 4 : 5 x 6 is enough (where x is operator). Each arm of ternary operator is independent of each other, so it does not require to test all possible variations that differs only in arms. The same is applied to arm and condition. This is only linear from count of operators and will provide 100% coverage. It is not much, actually, just additional 20-30 everybody/everybodyExcept calls.

What extra value it will bring or rather what type of situation you anticipate to catch with that?

Different operators have different priority, so in some cases we need to wrap them in parenthesis. I cannot be sure that in all languages it is handled correctly until it is not tested.

My 2 cents: I don't think it makes sense to go super crazy with these manual TranslatorSpec cases - trying to cover all permutations by hand is not only tedious, but also unmaintainable and in the end it doesn't test the thing it should, i.e. whether all translated expressions will give the (same) expected result when evaluated in each language (to really be sure of that, you would have to manually take each translated expression and paste it into some interactive shell in every target language, which is insane).

In my view, these tests should only be complementary to normal .ksy + .kst tests, not replace them (that would be a step back). They can provide some value that .ksy/.kst tests can't - they can be used to catch that we have unnecessarily too many parentheses, which makes the generated code ugly, but doesn't affect functionality. But IMHO they shouldn't be meant to catch too few parentheses - .ksy/.kst tests are much better suited for this, because then there's always a set of values for which the actual evaluation of the translated expression in the target language will result in a different value than expected, which proves that we really have a problem, and we'll instantly know in what target languages exactly.

I wrote a very immature proof-of-concept Python script that demonstrates what automatic generation of such .ksy/.kst pair could look like: https://gist.github.com/generalmimon/a24d65d1890e73b4036fdd48e7d6772f

To give you an idea, the generated files look something like this:

# expr_gen.ksy meta: id: expr_gen enums: fruit: 9: apple 6: banana instances: l0_r0_lsig0_rsig0: value: (0 + 7) + 5 l0_r0_lsig0_rsig1: value: (0 + 7) + 13.7 l0_r0_lsig1_rsig2: value: (3 + 4.4) + 13 l0_r0_lsig1_rsig3: value: (3 + 4.4) + 1.1 # ... l0_r4_lsig0_rsig0: value: (3 + 12) % 6 l0_r5_lsig0_rsig0: value: (11 + 13) | 8 l0_r6_lsig0_rsig0: value: (7 + 13) ^ 14 l0_r7_lsig0_rsig0: value: (6 + 1) & 12 # ... l11_r0_lsig3_rsig0: value: fruit::banana.to_i + 7 l11_r0_lsig3_rsig1: value: fruit::banana.to_i + 5.3 l11_r0_lsig4_rsig0: value: false.to_i + 1 l11_r0_lsig4_rsig1: value: false.to_i + 11.1 # ... l20_r10_lsig5_rsig1: value: '(true ? [50, 48, 52] : [56, 57, 51, 48]).to_s(''ASCII'')' # ...

# expr_gen.kst id: expr_gen data: enum_negative.bin asserts: - actual: l0_r0_lsig0_rsig0 expected: '12' - actual: l0_r0_lsig0_rsig1 expected: '20.7' - actual: l0_r0_lsig1_rsig2 expected: '20.4' - actual: l0_r0_lsig1_rsig3 expected: '8.5' # ... - actual: l0_r4_lsig0_rsig0 expected: '3' - actual: l0_r5_lsig0_rsig0 expected: '24' - actual: l0_r6_lsig0_rsig0 expected: '26' - actual: l0_r7_lsig0_rsig0 expected: '4' # ... - actual: l11_r0_lsig3_rsig0 expected: '13' - actual: l11_r0_lsig3_rsig1 expected: '11.3' - actual: l11_r0_lsig4_rsig0 expected: '1' - actual: l11_r0_lsig4_rsig1 expected: '11.1' # ... - actual: l20_r10_lsig5_rsig1 expected: '''204''' # ...

It's not a silver bullet (and the script needs some serious refactoring, comparison operators need to be added, the right-side grouping like X left_op (Y right_op Z) should be tested too etc.), but I believe it's a much better approach than trying to cover all of these (currently 531 value instances) in TranslatorSpec.

And I've already found some bugs with this that I shared in #277 (comment).

The problem with KST is that they does not run in PRs and I do not know how to run them locally. So it is hard to change code which potentially can break some other languages when I'm working on a language that's interest for me. I do not know many Kaitai languages and I wouldn't be happy that I should blindly change their or common code.

If you could write a documentation how to do that on Windows (and even better also on Windows 7) and setup corresponding checks on CI, it would be great simplification!

The problem with KST is that they does not run in PRs and I do not know how to run them locally.

Happy to help you with that (I'm on Windows too), but not here - are you on Gitter? We can talk privately/publicly as you wish.

If you could write a documentation how to do that on Windows (and even better also on Windows 7) and setup corresponding checks on CI

Both things would be nice, but as with everything, the problem is that someone has to do it.

@generalmimon Love the idea, that's actually a very good separation line that you've drawn here!

Would you mind creating a PR with that script to tests repo?

Mingun · 2024-03-28T06:31:37Z

shared/src/main/scala/io/kaitai/struct/translators/CommonOps.scala

@@ -26,6 +26,8 @@ trait CommonOps extends AbstractTranslator {
    Ast.operator.BitOr -> 80,
  )

+  val IF_EXP_PRECEDENCE = 30


I think, this field needs documentation, in what cases it should be used

Agree, let me add that.

Mingun · 2024-03-28T06:35:16Z

shared/src/main/scala/io/kaitai/struct/translators/NimTranslator.scala

@@ -37,7 +37,7 @@ class NimTranslator(provider: TypeProvider, importList: ImportList) extends Base
      case Identifier.ROOT => s"${ksToNim(provider.determineType(Identifier.ROOT))}(this.${doName(s)})"
      case _ => s"this.${doName(s)}"
    }
-  override def doIfExp(condition: expr, ifTrue: expr, ifFalse: expr): String =
+  override def doIfExp(condition: expr, ifTrue: expr, ifFalse: expr, extPrec: Int): String =
    s"(if ${translate(condition)}: ${translate(ifTrue)} else: ${translate(ifFalse)})"


Need to rewrite that part too. You can check result at https://play.nim-lang.org

Mingun · 2024-03-28T06:38:27Z

shared/src/main/scala/io/kaitai/struct/translators/RustTranslator.scala

    "if " + translate(condition) +
    	" { " + translate(ifTrue) + " } else { " +
 	translate(ifFalse) + "}"


Need to rewrite this part too. You can check result at https://play.rust-lang.org

generalmimon · 2024-03-28T12:42:26Z

jvm/src/test/scala/io/kaitai/struct/translators/TranslatorSpec.scala

+      GoCompiler ->
+        """var tmp1 int8;
+          |if (2 < 3) {
+          |  tmp1 = 123
+          |} else {
+          |  tmp1 = 456


456 clearly doesn't fit into int8, so I wouldn't approve this as the "expected" correct translation just to make the tests green and provide a false sense of "everything works fine" when it doesn't.

This doesn't even compile, see https://go.dev/play/p/tZ81LHWP5bM:

./prog.go:10:10: cannot use 456 (untyped int constant) as int8 value in assignment (overflows)

6 var tmp1 int8 7 if 2 < 3 { 8 tmp1 = 123 9 } else { * 10 tmp1 = 456 11 } 12 fmt.Println(tmp1)

That's actually perfect catch, thanks for noticing it!

I don't think there's anything wrong with the expression, but actually we should fix GoTranslator implementation, so it should use more generic CalcIntType.

generalmimon · 2024-03-28T12:50:32Z

jvm/src/test/scala/io/kaitai/struct/translators/TranslatorSpec.scala

+      GoCompiler ->
+        """var tmp1 interface{};
+          |if (true) {
+          |  tmp1 = this.Foo
+          |} else {
+          |  tmp1 = this.Bar
+          |}
+          |tmp1._io""".stripMargin,


This too can hardly be approved as correct. Accessing _io on interface{} doesn't make sense. See https://go.dev/tour/methods/14:

An empty interface may hold values of any type. (Every type implements at least zero methods.)

The code snippet on the linked page also demonstrates that interface{} can be anything, e.g. an int or a string:

func main() { var i interface{} describe(i) // => (<nil>, <nil>) i = 42 describe(i) // => (42, int) i = "hello" describe(i) // => (hello, string) } func describe(i interface{}) { fmt.Printf("(%v, %T)\n", i, i) }

To implement this correctly in Go, interface{} will presumably have to be replaced with a concrete "KaitaiStruct" interface, which will define a method Io() (so tmp1._io will become tmp1.Io()), or maybe a different name than Io should be used to avoid name conflicts with user-defined fields, but it must start with an uppercase letter so that it's accessible from outside the kaitai_struct_go_runtime package, see https://go.dev/tour/basics/3 (perhaps M_Io()?)

Thanks for catching this too, I was too naive to suppose it will just work.

Looks like we indeed just don't have that functionality at all. As far as I can tell, there's no inheritance in Go, thus we can't have common ancestor KaitaiStruct which will implement storage of a stream, but we indeed can have a common interface promising that every KaitaiStruct type we define has to implement a method that will return what we can use for ._io.

Let me actually give it a try separately.

The hardest thing here, as you've pointed out, is to decide on a naming. I really don't like M_Io — it's kind of copying ugly twist we've done in C# on not-even-a-standard-but-convention from a totally different language (C++), with it's m_foo prefixing.

But, generally speaking, given that our Golang name rendition is UpperCamelCase, which guarantees no underscores, I believe putting in the underscore is a way to go.

I wonder what do you folks think about Kaitai_IO()? It's a bit verbose, but generally puts a strong emphasis on ownership of the field.

As far as I'm concerned, both M_Io and Kaitai_IO work the same (both satisfy the Go capital letter requirement, and that's the only thing that matters in the end), I don't have any strong feelings here. Perhaps it indeed doesn't hurt to have the very descriptive Kaitai_ prefix, at least everyone sees that it's a special implicit field coming from the Kaitai_ "namespace", and it pretty much can't clash with anything ever.

More interesting to me is another naming problem :), which is whether to pick Io or IO. Personally, as my heart gravitates more towards Rust than Go (and I believe that XmlHttpRequest is the only spelling that makes sense, the actual name XMLHttpRequest feels like it can't pick one consistent style of writing acronyms even within itself), Io feels more natural to me, but since it stands for "I/O", which can be considered an acronym, it seems that the Go convention is to keep all letters in an initialism the same case, which suggests IO (see https://google.github.io/styleguide/go/decisions#initialisms):

Words in names that are initialisms or acronyms (e.g., URL and NATO) should have the same case. URL should appear as URL or url (as in urlPony, or URLPony), never as Url. This also applies to ID when it is short for “identifier”; write appID instead of appId.

This makes me die inside a bit, but as they say "when in Rome, do as the Romans do". So IO is probably indeed more idiomatic.

The interesting bit here is that we actually already generate some "public" names for these as per this code. This generates _IO and I'm pretty sure it just doesn't work.

Personally, I don't have a strong preference — very generally, sticking to prescribed recommendations per language makes sense, especially if they exist and are clear.

Implement optimal parenthesis for ternary operator + add tests

a28b2aa

Mingun suggested changes Mar 28, 2024

View reviewed changes

generalmimon requested changes Mar 28, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement optimal parenthesis for ternary operator + add tests #289

Implement optimal parenthesis for ternary operator + add tests #289

GreyCat commented Mar 26, 2024

Mingun left a comment

Mingun Mar 28, 2024

GreyCat Mar 28, 2024

Mingun Mar 28, 2024

GreyCat Mar 29, 2024

Mingun Mar 29, 2024

generalmimon Mar 29, 2024 •

edited

Loading

Mingun Mar 29, 2024

generalmimon Mar 29, 2024

GreyCat Mar 29, 2024

Mingun Mar 28, 2024

GreyCat Mar 28, 2024

Mingun Mar 28, 2024

Mingun Mar 28, 2024

generalmimon Mar 28, 2024

GreyCat Mar 28, 2024

generalmimon Mar 28, 2024 •

edited

Loading

GreyCat Mar 28, 2024

generalmimon Mar 28, 2024 •

edited

Loading

GreyCat Mar 28, 2024 •

edited by generalmimon

Loading

Implement optimal parenthesis for ternary operator + add tests #289

Are you sure you want to change the base?

Implement optimal parenthesis for ternary operator + add tests #289

Conversation

GreyCat commented Mar 26, 2024

Mingun left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

generalmimon Mar 29, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

generalmimon Mar 28, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

generalmimon Mar 28, 2024 • edited Loading

Choose a reason for hiding this comment

GreyCat Mar 28, 2024 • edited by generalmimon Loading

Choose a reason for hiding this comment

generalmimon Mar 29, 2024 •

edited

Loading

generalmimon Mar 28, 2024 •

edited

Loading

generalmimon Mar 28, 2024 •

edited

Loading

GreyCat Mar 28, 2024 •

edited by generalmimon

Loading