Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement optimal parenthesis for ternary operator + add tests #289

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

GreyCat
Copy link
Member

@GreyCat GreyCat commented Mar 26, 2024

Another piece of partially contributing to resolving kaitai-io/kaitai_struct#69 — this one handles ternary operator:

  • Add common doIfExp() implementation (+add extPrec propagation into it) into CommonOps, which does a good job at parenthesizing
    • This provides default implementation which is ok for most languages following popular condition ? ifTrue ? ifFalse rendering originating from C.
    • Remove individual copy-pasted implementations in specific languages where we can rely on common one.
    • Adjust Python rendering to follow the same correct strategy
    • Don't touch special ad hoc implementations (Lua, Go)
  • Add/fix tests proving that it works in many contexts

Copy link
Contributor

@Mingun Mingun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current TranslatorSpec tests lack of support for NimCompiler and RustCompiler. I'll add them to existing tests in #290, but you should add them to the new tests.

You can use https://play.nim-lang.org and https://play.rust-lang.org to check the code.

))

// When evaluated, must be 12 + 34 = 46
everybodyExcept("2 < 3 ? 12 + 34 : 45 + 67", "2 < 3 ? 12 + 34 : 45 + 67", ResultMap(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, it is better to explicitly test all possible expressions in arms, not only +.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I follow what you're suggesting?

What extra value it will bring or rather what type of situation you anticipate to catch with that?

Also, what limit on "all possible expressions" you want to put in here? "All possible" literally is infinite set, we won't be able to test infinite permutations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What extra value it will bring or rather what type of situation you anticipate to catch with that?

Different operators have different priority, so in some cases we need to wrap them in parenthesis. I cannot be sure that in all languages it is handled correctly until it is not tested.

For example, I'm working on bringing Rust tests to TranslatorSpec and found a bug where .to_s on expression does not generate parenthesis. In all languages them was generated, except Rust. Such mistakes can arise in the future, so I think it is better to protect from them preventively.

"All possible" literally is infinite set, we won't be able to test infinite permutations.

I don't think so. KS expression language has a limited set of operators, just test them all here. +, -, *, / (float and integer), string concatenation, boolean operators, etc., all from Ast.operators.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understood your correctly, what you're suggesting is effectively testing all permutations of all pairs:

  • ternary operator, int addition
  • ternary operator, int subtraction
  • ternary operator, int multiplication
  • etc

I don't think it's a great strategy to move forward with, because of two reasons:

  • It's very bulky: we're talking about ~20-30 more tests. It will be very hard to maintain and keep updated manually.
  • It doesn't really provide 100% coverage you're looking to, and it quickly gets us into exponential explosion:
    • For instance, where do you put the operation? There are three places to put it in ternary: "condition", "ifTrue", "ifFalse", so even for one addition you'll be looking into three tests:
      • 1 + 2 ? 3 : 4
      • 1 ? 2 + 3 : 4
      • 1 ? 2 : 3 + 4
    • But it's not enough, we also need to test all permutations in arguments (thus 9 tests total):
      • 1 + 2 ? 3 + 4 : 5
      • 1 + 2 ? 3 : 4 + 5
      • 1 + 2 ? 3 + 4 : 5 + 6
      • 1 ? 3 + 4 : 5
      • 1 ? 3 : 4 + 5
      • 1 ? 3 + 4 : 5 + 6
    • If we repeat only that for 30 operators, it will be 9 * 30 = 270 tests.
    • But that's not enough, we also need to test combinations of additions + multiplications or addition + bit shift, e.g. 1 << 2 ? 3 + 4 : 5 * 6. Simple binomial coefficient calculation of C(30)(3) = 30! / (3! * 27!) gives us 4060 combinations.
    • But that is also not enough, as we likely need to test combinations of combinations, e.g. more than one operator in "condition", "ifTrue" and "ifFalse" parts.

I don't think it's practical approach to follow. Instead, what I propose is to be aware of the fact that in all languages we track this operator is lower precedence than anything and test only corner cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, just 1 x 2 ? 3 x 4 : 5 x 6 is enough (where x is operator). Each arm of ternary operator is independent of each other, so it does not require to test all possible variations that differs only in arms. The same is applied to arm and condition. This is only linear from count of operators and will provide 100% coverage. It is not much, actually, just additional 20-30 everybody/everybodyExcept calls.

Copy link
Member

@generalmimon generalmimon Mar 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What extra value it will bring or rather what type of situation you anticipate to catch with that?

Different operators have different priority, so in some cases we need to wrap them in parenthesis. I cannot be sure that in all languages it is handled correctly until it is not tested.

My 2 cents: I don't think it makes sense to go super crazy with these manual TranslatorSpec cases - trying to cover all permutations by hand is not only tedious, but also unmaintainable and in the end it doesn't test the thing it should, i.e. whether all translated expressions will give the (same) expected result when evaluated in each language (to really be sure of that, you would have to manually take each translated expression and paste it into some interactive shell in every target language, which is insane).

In my view, these tests should only be complementary to normal .ksy + .kst tests, not replace them (that would be a step back). They can provide some value that .ksy/.kst tests can't - they can be used to catch that we have unnecessarily too many parentheses, which makes the generated code ugly, but doesn't affect functionality. But IMHO they shouldn't be meant to catch too few parentheses - .ksy/.kst tests are much better suited for this, because then there's always a set of values for which the actual evaluation of the translated expression in the target language will result in a different value than expected, which proves that we really have a problem, and we'll instantly know in what target languages exactly.

I wrote a very immature proof-of-concept Python script that demonstrates what automatic generation of such .ksy/.kst pair could look like: https://gist.github.com/generalmimon/a24d65d1890e73b4036fdd48e7d6772f

To give you an idea, the generated files look something like this:

# expr_gen.ksy
meta:
  id: expr_gen
enums:
  fruit:
    9: apple
    6: banana
instances:
  l0_r0_lsig0_rsig0:
    value: (0 + 7) + 5
  l0_r0_lsig0_rsig1:
    value: (0 + 7) + 13.7
  l0_r0_lsig1_rsig2:
    value: (3 + 4.4) + 13
  l0_r0_lsig1_rsig3:
    value: (3 + 4.4) + 1.1
  # ...
  l0_r4_lsig0_rsig0:
    value: (3 + 12) % 6
  l0_r5_lsig0_rsig0:
    value: (11 + 13) | 8
  l0_r6_lsig0_rsig0:
    value: (7 + 13) ^ 14
  l0_r7_lsig0_rsig0:
    value: (6 + 1) & 12
  # ...
  l11_r0_lsig3_rsig0:
    value: fruit::banana.to_i + 7
  l11_r0_lsig3_rsig1:
    value: fruit::banana.to_i + 5.3
  l11_r0_lsig4_rsig0:
    value: false.to_i + 1
  l11_r0_lsig4_rsig1:
    value: false.to_i + 11.1
  # ...
  l20_r10_lsig5_rsig1:
    value: '(true ? [50, 48, 52] : [56, 57, 51, 48]).to_s(''ASCII'')'
  # ...
# expr_gen.kst
id: expr_gen
data: enum_negative.bin
asserts:
- actual: l0_r0_lsig0_rsig0
  expected: '12'
- actual: l0_r0_lsig0_rsig1
  expected: '20.7'
- actual: l0_r0_lsig1_rsig2
  expected: '20.4'
- actual: l0_r0_lsig1_rsig3
  expected: '8.5'
# ...
- actual: l0_r4_lsig0_rsig0
  expected: '3'
- actual: l0_r5_lsig0_rsig0
  expected: '24'
- actual: l0_r6_lsig0_rsig0
  expected: '26'
- actual: l0_r7_lsig0_rsig0
  expected: '4'
# ...
- actual: l11_r0_lsig3_rsig0
  expected: '13'
- actual: l11_r0_lsig3_rsig1
  expected: '11.3'
- actual: l11_r0_lsig4_rsig0
  expected: '1'
- actual: l11_r0_lsig4_rsig1
  expected: '11.1'
# ...
- actual: l20_r10_lsig5_rsig1
  expected: '''204'''
# ...

It's not a silver bullet (and the script needs some serious refactoring, comparison operators need to be added, the right-side grouping like X left_op (Y right_op Z) should be tested too etc.), but I believe it's a much better approach than trying to cover all of these (currently 531 value instances) in TranslatorSpec.

And I've already found some bugs with this that I shared in #277 (comment).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with KST is that they does not run in PRs and I do not know how to run them locally. So it is hard to change code which potentially can break some other languages when I'm working on a language that's interest for me. I do not know many Kaitai languages and I wouldn't be happy that I should blindly change their or common code.

If you could write a documentation how to do that on Windows (and even better also on Windows 7) and setup corresponding checks on CI, it would be great simplification!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with KST is that they does not run in PRs and I do not know how to run them locally.

Happy to help you with that (I'm on Windows too), but not here - are you on Gitter? We can talk privately/publicly as you wish.

If you could write a documentation how to do that on Windows (and even better also on Windows 7) and setup corresponding checks on CI

Both things would be nice, but as with everything, the problem is that someone has to do it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@generalmimon Love the idea, that's actually a very good separation line that you've drawn here!

Would you mind creating a PR with that script to tests repo?

@@ -26,6 +26,8 @@ trait CommonOps extends AbstractTranslator {
Ast.operator.BitOr -> 80,
)

val IF_EXP_PRECEDENCE = 30
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, this field needs documentation, in what cases it should be used

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, let me add that.

@@ -37,7 +37,7 @@ class NimTranslator(provider: TypeProvider, importList: ImportList) extends Base
case Identifier.ROOT => s"${ksToNim(provider.determineType(Identifier.ROOT))}(this.${doName(s)})"
case _ => s"this.${doName(s)}"
}
override def doIfExp(condition: expr, ifTrue: expr, ifFalse: expr): String =
override def doIfExp(condition: expr, ifTrue: expr, ifFalse: expr, extPrec: Int): String =
s"(if ${translate(condition)}: ${translate(ifTrue)} else: ${translate(ifFalse)})"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to rewrite that part too. You can check result at https://play.nim-lang.org

Comment on lines 51 to 53
"if " + translate(condition) +
" { " + translate(ifTrue) + " } else { " +
translate(ifFalse) + "}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to rewrite this part too. You can check result at https://play.rust-lang.org

Comment on lines +230 to +235
GoCompiler ->
"""var tmp1 int8;
|if (2 < 3) {
| tmp1 = 123
|} else {
| tmp1 = 456
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

456 clearly doesn't fit into int8, so I wouldn't approve this as the "expected" correct translation just to make the tests green and provide a false sense of "everything works fine" when it doesn't.

This doesn't even compile, see https://go.dev/play/p/tZ81LHWP5bM:

./prog.go:10:10: cannot use 456 (untyped int constant) as int8 value in assignment (overflows)
      6         var tmp1 int8
      7         if 2 < 3 {
      8                 tmp1 = 123
      9         } else {
  *  10                 tmp1 = 456
     11         }
     12         fmt.Println(tmp1)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's actually perfect catch, thanks for noticing it!

I don't think there's anything wrong with the expression, but actually we should fix GoTranslator implementation, so it should use more generic CalcIntType.

Comment on lines +313 to +320
GoCompiler ->
"""var tmp1 interface{};
|if (true) {
| tmp1 = this.Foo
|} else {
| tmp1 = this.Bar
|}
|tmp1._io""".stripMargin,
Copy link
Member

@generalmimon generalmimon Mar 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This too can hardly be approved as correct. Accessing _io on interface{} doesn't make sense. See https://go.dev/tour/methods/14:

An empty interface may hold values of any type. (Every type implements at least zero methods.)

The code snippet on the linked page also demonstrates that interface{} can be anything, e.g. an int or a string:

func main() {
	var i interface{}
	describe(i) // => (<nil>, <nil>)

	i = 42
	describe(i) // => (42, int)

	i = "hello"
	describe(i) // => (hello, string)
}

func describe(i interface{}) {
	fmt.Printf("(%v, %T)\n", i, i)
}

To implement this correctly in Go, interface{} will presumably have to be replaced with a concrete "KaitaiStruct" interface, which will define a method Io() (so tmp1._io will become tmp1.Io()), or maybe a different name than Io should be used to avoid name conflicts with user-defined fields, but it must start with an uppercase letter so that it's accessible from outside the kaitai_struct_go_runtime package, see https://go.dev/tour/basics/3 (perhaps M_Io()?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this too, I was too naive to suppose it will just work.

Looks like we indeed just don't have that functionality at all. As far as I can tell, there's no inheritance in Go, thus we can't have common ancestor KaitaiStruct which will implement storage of a stream, but we indeed can have a common interface promising that every KaitaiStruct type we define has to implement a method that will return what we can use for ._io.

Let me actually give it a try separately.

The hardest thing here, as you've pointed out, is to decide on a naming. I really don't like M_Io — it's kind of copying ugly twist we've done in C# on not-even-a-standard-but-convention from a totally different language (C++), with it's m_foo prefixing.

But, generally speaking, given that our Golang name rendition is UpperCamelCase, which guarantees no underscores, I believe putting in the underscore is a way to go.

I wonder what do you folks think about Kaitai_IO()? It's a bit verbose, but generally puts a strong emphasis on ownership of the field.

Copy link
Member

@generalmimon generalmimon Mar 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I'm concerned, both M_Io and Kaitai_IO work the same (both satisfy the Go capital letter requirement, and that's the only thing that matters in the end), I don't have any strong feelings here. Perhaps it indeed doesn't hurt to have the very descriptive Kaitai_ prefix, at least everyone sees that it's a special implicit field coming from the Kaitai_ "namespace", and it pretty much can't clash with anything ever.

More interesting to me is another naming problem :), which is whether to pick Io or IO. Personally, as my heart gravitates more towards Rust than Go (and I believe that XmlHttpRequest is the only spelling that makes sense, the actual name XMLHttpRequest feels like it can't pick one consistent style of writing acronyms even within itself), Io feels more natural to me, but since it stands for "I/O", which can be considered an acronym, it seems that the Go convention is to keep all letters in an initialism the same case, which suggests IO (see https://google.github.io/styleguide/go/decisions#initialisms):

Words in names that are initialisms or acronyms (e.g., URL and NATO) should have the same case. URL should appear as URL or url (as in urlPony, or URLPony), never as Url. This also applies to ID when it is short for “identifier”; write appID instead of appId.

This makes me die inside a bit, but as they say "when in Rome, do as the Romans do". So IO is probably indeed more idiomatic.

Copy link
Member Author

@GreyCat GreyCat Mar 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interesting bit here is that we actually already generate some "public" names for these as per this code. This generates _IO and I'm pretty sure it just doesn't work.

Personally, I don't have a strong preference — very generally, sticking to prescribed recommendations per language makes sense, especially if they exist and are clear.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants