Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make qcheck-lin and qcheck-stm available on OCaml 4.x #329

Merged
merged 4 commits into from
Apr 26, 2023

Conversation

edwintorok
Copy link
Contributor

I know this is the 'multicoretests' repo, but the Thread mode of qcheck-lin appears to be useful for testing OCaml 4.x code too :)

Unfortunately 'enabled_if' in 'test' stanza in dune doesn't work: it still attempts to build it, which would fail on OCaml 4, so have to use separate executable/runtest alias where 'enabled_if' works properly.

The following now compiles:

dune build -p qcheck-multicoretests-util,qcheck-stm,qcheck-lin

The tests run, but the output is currently different from OCaml 5:

dune runtest -p qcheck-multicoretests-util,qcheck-stm,qcheck-lin

I haven't added enabled_if to all the other tests that are part of the multicoretests package, too much churn, and most tests wouldn't run because they link both the thread and domain versions.

What do you think?

@jmid
Copy link
Collaborator

jmid commented Apr 18, 2023

This is great, many thanks! 😃 👍

  • I expect the output difference is due to using a different underlying RNG (it switched with OCaml5). Using OCaml5's RNG from https://github.com/ocaml/stdlib-random could help make the outputs agree across major versions.
  • As a first crude approximation, could one simply require OCaml5 for the multicoretests package? I'm hoping that would allow the other 3 packages to be available on 4.x and avoid us having to rewrite all existing (test ...) rules of the test suite...
  • Note: The Thread modes of both qcheck-lin and qcheck-stm don't work as well as the corresponding Domain modes. This would nevertheless allow for people to try it out on 4.x and also use STM.sequential...

@edwintorok
Copy link
Contributor Author

I moved the test that depended on OCaml 5's RNG to the multicoretests package. That package already depends on base-domains which implies OCaml 5, so should be fine now. This passes:

dune runtest -p qcheck-multicoretests-util,qcheck-stm,qcheck-lin --no-buffer

(it also doesn't seem to be actually running any tests, but thats fine I guess for now, this is well tested on OCaml 5...)

@jmid
Copy link
Collaborator

jmid commented Apr 18, 2023

Sorry, for the confusion - I was too brief and unclear: 😬

  • The test/ directory contains "internal tests" of qcheck-multicoretests-util, qcheck-stm, and qcheck-lin
  • The src/ directory contains "the OCaml5 torture test suite" aka the multicoretests package

In our CI we have a separate step to first run and check the former and subsequently the latter.
In essense, test/ contains the tests we would include if we decide to relocate qcheck-multicoretests-util, qcheck-stm, and qcheck-lin to a different repo.

With the above in mind, marking a test in test/ as belonging to multicoretests disturbs the above separation. As such, I would rather prefer to keep the mutable_set_v5 belonging to the qcheck-stm package. A quick-and-dirty way to fix it on 4.x would be to include .expected-files for the 4.x RNG on 64/32-bit too. Another way would be to add https://github.com/ocaml/stdlib-random as an overall dependency. However I'm open to other suggestions too.

This following comment was targeted at the src/ tests, i.e., suggesting that the torture test suite was limited to just OCaml5+ for a start:

As a first crude approximation, could one simply require OCaml5 for the multicoretests package? I'm hoping that would allow the other 3 packages to be available on 4.x and avoid us having to rewrite all existing (test ...) rules of the test suite...

@edwintorok
Copy link
Contributor Author

A quick-and-dirty way to fix it on 4.x would be to include .expected-files for the 4.x RNG on 64/32-bit too.

I'll take a look whether it is possible to use dune's select feature to choose a 4 vs 5 .expected files (I cannot do it based on compiler version, but I can use availability of 'base-domains' as a proxy for >= 5 which should suffice).
32/64 might be more complicated, I'll setup a 32-bit toolchain to check.

@jmid
Copy link
Collaborator

jmid commented Apr 18, 2023

Sounds good! There's yet another approach (using dune file generation) in https://github.com/c-cube/qcheck/blob/main/test/core/dune - mostly as an example of what I hope to avoid 😅

Their Thread mode is useful for testing OCaml 4.x code too.

Unfortunately 'enabled_if' in  'test' stanza in dune doesn't work: it still
attempts to build it, which would fail on OCaml 4, so have to use separate
executable/runtest alias where 'enabled_if' works properly.

The following now compiles:
```
dune build -p qcheck-multicoretests-util,qcheck-stm,qcheck-lin
```

The tests run, but the output is currently different from OCaml 5:
```
dune runtest -p qcheck-multicoretests-util,qcheck-stm,qcheck-lin
```

Signed-off-by: Edwin Török <[email protected]>
It uses a different RNG.

Signed-off-by: Edwin Török <[email protected]>
It uses some new functions from Seq.

Signed-off-by: Edwin Török <[email protected]>
@edwintorok
Copy link
Contributor Author

I've created 32 and 64-bit versions of the .expected files now (using opam switch create 4.14.0+32-bit ocaml-variants.4.14.0+options ocaml-option-32bit to get a 32-bit toolchain), the rules there already used enabled_if so I went with that instead of select.

@jmid
Copy link
Collaborator

jmid commented Apr 18, 2023

I've tried this out locally and it works nicely as intended 👍
Both the 4.x and 5.x Thread modes don't work as nicely as the 5.x Domain ones unfortunately.
There are some previous attempts to improve them in #99

Bonus points for realizing that base-domains are not required for qcheck-multicoretests-util 🥇😃

Q: I see this requires 4.14 now. What prevents us from lowering the required version?
Note: I'm not asking you to lower it! I'm just trying to understand if we are using some Stdlib bindings added in 4.14 (I've had my share of lower bound battles during QCheck releases... 😅 )

@edwintorok
Copy link
Contributor Author

edwintorok commented Apr 18, 2023 via email

@jmid
Copy link
Collaborator

jmid commented Apr 18, 2023

The CI is green modulo unrelated failures:

  • Two Cygwin part1's were cancelled
  • A Cygwin part2 trunk run timed out due to a long-running (9595.9s!) Lin ref int64 test with Thread
  • 3 Windows runs (5.1, 5.1 bytecode, trunk) triggered the threadomain timeout

A Changes entry would be nice to help document the change in the forthcoming release notes.

@jmid
Copy link
Collaborator

jmid commented Apr 25, 2023

I've now added a CHANGES entry in 04df34f.
The PR is still marked draft, but I think it is good to go.
Let me know if you want to include further edits.

@edwintorok edwintorok marked this pull request as ready for review April 26, 2023 08:15
@edwintorok
Copy link
Contributor Author

Sorry, I got distracted by trying out the code and writing some tests using it. It does work, although writing custom shrinking and printing functions is highly recommended: makes shrinking results a lot more useful and finish in a reasonable time.
Here is a snippet:

let is_small s = String.length s < 6

  let shrink_cmd =
    let open QCheck in
    let open KV in
    function
    | Get k ->
        Iter.map
          (fun k -> Get (Key.of_string_exn k))
          Shrink.(filter is_small string @@ Key.to_string k)

let truncated_str s =
  if String.length s < 6 then
    String.escaped s
  else
    Printf.sprintf "%s…(len:%d)"
      (String.sub s 0 6 |> String.escaped)
      (String.length s)

This relies on the assumption (that at least in my code) bugs have 2 origins: due to the byte values contained in the string (in which case small string should be able to fully exercise that), or due to the length of the string (in which case trying to generate different bytes, or different short strings won't reproduce the bug and make shrinking take a huge amount of time, just leave long strings as they were for now if short ones don't repro the problem).
Other improvements might be possible (e.g. I haven't checked whether shrinking would already do a binary search on length or not), although the non-determinism due to threads/domains does sometimes make shrinking or testing slower (for some reason STM seems ~10x slower than Lin). A custom shrinker makes it work wonderfully though.

I might submit a separate PR to add some examples to the docs about writing a shrink_cmd, IIUC that is needed to attempt to shrink the values, otherwise just the number of commands is being shrunk, and custom printers are needed to keep the printed results readable (especially when printing the 2 parallel branches side-by-side)

@jmid
Copy link
Collaborator

jmid commented Apr 26, 2023

This relies on the assumption (that at least in my code) bugs have 2 origins: due to the byte values contained in the string (in which case small string should be able to fully exercise that), or due to the length of the string (in which case trying to generate different bytes, or different short strings won't reproduce the bug and make shrinking take a huge amount of time, just leave long strings as they were for now if short ones don't repro the problem).

Conditional string shrinking based on its length seems like a good idea!
I agree that issues tend to be either caused by the length or a particular char content.
Potentially, such improvements should go upstream into QCheck.
We have also experienced needlessly long runs due to long strings, and tend to use small_string generators as a quick (but unsatisfying) workaround.

I haven't checked whether shrinking would already do a binary search on length or not

A nice and tempting rabbit hole starts here: c-cube/qcheck#242 and continues here: c-cube/qcheck#268

Needlessly long strings being printed also sounds familiar. I created #308 to keep track of it.
Again, improved string printers may also be useful to push upstream to QCheck.

Other improvements might be possible (e.g. I haven't checked whether shrinking would already do a binary search on length or not), although the non-determinism due to threads/domains does sometimes make shrinking or testing slower (for some reason STM seems ~10x slower than Lin). A custom shrinker makes it work wonderfully though.

In contrast to Lin, STM has to perform precond checks for each shrinking candidate, as each of them could potentially contain a simplification that breaks a precondition. Following the original QuickCheck race-condition paper we therefore only accept 'shrink candidates' that satisfy all preconds under all possible interleavings... This is exponential in the length of the 'parallel tails', and so may likely explain how a more economical shrinker triggers fewer such checks and thus exhibits better performance...

I might submit a separate PR to add some examples to the docs about writing a shrink_cmd, IIUC that is needed to attempt to shrink the values, otherwise just the number of commands is being shrunk,

Indeed, without supplying cmd shrinkers only the "spine" of the cmd list triples are reduced.

and custom printers are needed to keep the printed results readable (especially when printing the 2 parallel branches side-by-side)

Yep, the horror cabinet starts in #308 😅

@jmid jmid merged commit 761fbf3 into ocaml-multicore:main Apr 26, 2023
@edwintorok
Copy link
Contributor Author

Hmm,interruptible shrinking might be useful, e.g. if I happen to notice that shrinking is taking a long time (how long, will it take hours or days to finish, and meanwhile seeing even the potentially tiny bit longer error might already be useful...), dumping some sort of "state" that can be printed and further shrunk might be useful although that would require 'cmd' to be both serializable and deserializable.
Some form of 'dump current state' to a file upon a signal (^C or ^I/^T) might be useful so I can inspect it and potentially improve the shrinker. Anyway as you say all these improvements can go directly to qcheck upstream, so I'll start there.

@edwintorok
Copy link
Contributor Author

all preconds under all possible interleavings... This is exponential in the length of the 'parallel tails

Limiting the length of the generated commands might help, and is what Jepsen recommends in their tutorials

I'll try out some ideas and report back (hopefully in the form of some PRs...)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants