wasm-split: Add fuzzer support #7014

kripken · 2024-10-16T22:41:17Z

The support is added but not yet enabled, since it is still finding bugs. I'd
like to land it now anyway, because it would otherwise conflict with other work I am doing.

The first part here is to add a Split testcase handler to the fuzzer,
which runs a wasm, then runs it again after splitting it and
linking the two halves at runtime, and checks for different results.

The second part is support for linking two modules at runtime
in the fuzzer's JS code, which works in tandem with the first part.
New options are added to load and link a second wasm, and to
pick which exports to run.

tlively · 2024-10-18T01:00:14Z

scripts/fuzz_opt.py

+        # get the list of function names, some of which we will decide to split
+        # out
+        wat = run([in_bin('wasm-dis'), wasm] + FEATURE_OPTS)
+        all_funcs = re.findall(r'\n [(]func [$](\S+)', wat)


Interesting, I haven't seen square brackets used as a form of escape before. I think backslashes would work, too, and might be less surprising. It also might be good to stash this regex in compiled form in a global.

Compiling the regex sgtm. About clarity, though, I feel that [(] is better than \( because the slash is similar to \S etc. where it is not escaping anything. The list of things that need escaping differs between regex impls AFAIK, but doing [X] always works.
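
To make the two options in this thread concrete, here is a minimal sketch that compiles the pattern once at module level and checks that the bracket form and the backslash form find the same names. The global names `FUNC_NAME_RE` and `FUNC_NAME_RE_BACKSLASH`, the helper `find_funcs`, and the sample text are hypothetical and only for illustration; they are not taken from the PR.

```python
import re

# Hypothetical global: the pattern from the diff, compiled once.
# [(] and [$] are single-character classes, so they match a literal
# "(" and "$" without relying on backslash escapes.
FUNC_NAME_RE = re.compile(r'\n [(]func [$](\S+)')

# The same pattern written with backslash escapes, for comparison.
FUNC_NAME_RE_BACKSLASH = re.compile(r'\n \(func \$(\S+)')

# Made-up text in the shape the pattern expects: each function starts
# on its own line, indented by one space.
SAMPLE_WAT = '''(module
 (func $foo (param i32)
  (nop)
 )
 (func $bar
  (nop)
 )
)
'''


def find_funcs(wat):
    """Return the function names found in the given text."""
    return FUNC_NAME_RE.findall(wat)
```

Both spellings match the same text in Python's `re` module, so the choice between them is about readability rather than behavior.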

tlively · 2024-10-18T01:02:55Z

scripts/fuzz_opt.py

+        # find the names of the exports. we need this because when we split the
+        # module then new exports appear to connect the two halves of the
+        # original module. we do not want to call all the exports on the new
+        # primary module, but only the original ones.
+        exports = []


It might be simpler to use the --export-prefix option to add a unique prefix to the new exports that we can filter out directly in the JS wrapper.

Hmm, then we'd need to hardcode some special prefix in the JS wrapper, but I'm not sure there is a fixed prefix we can use: we don't want to overlap with existing export names.

FWIW Emscripten uses % as the prefix. Maybe we can just choose an arbitrary one like that and it would be good enough?

I think the difference is that Emscripten has full control over the export names. In the fuzzer we do want to be able to fuzz initial content from anywhere. I suppose we could sanitize that content before fuzzing it, but that seems more complicated to me.

Ok, the current solution LGTM, then.
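
A rough sketch of the trade-off discussed in this thread, under stated assumptions: the export names are read from text with a simple regex, and the helper names `original_exports_only` and `filter_by_prefix`, the sample modules, and the `%` prefixed names are all hypothetical. It is not the code from the PR; it only shows why recording the original exports is safer than filtering by a fixed prefix when the initial content can use any export name.

```python
import re

# Assumption: export names appear as (export "name" ...) and contain no
# escaped quotes. This is a simplification for the sketch.
EXPORT_RE = re.compile(r'\(export "([^"]+)"')


def original_exports_only(original_wat, split_primary_wat):
    """Keep the exports that existed before splitting, in original order."""
    original = EXPORT_RE.findall(original_wat)
    after_split = set(EXPORT_RE.findall(split_primary_wat))
    return [name for name in original if name in after_split]


def filter_by_prefix(split_primary_wat, prefix):
    """Alternative: drop every export that starts with a fixed prefix."""
    names = EXPORT_RE.findall(split_primary_wat)
    return [name for name in names if not name.startswith(prefix)]


# Made-up modules. The original already has an export starting with "%".
ORIGINAL = '(module (export "run" (func 0)) (export "%odd" (func 1)))'
SPLIT_PRIMARY = (
    '(module (export "run" (func 0)) (export "%odd" (func 1)) '
    '(export "%table" (table 0)) (export "%memory" (memory 0)))'
)
```

With these inputs, the recorded-list approach keeps both `run` and `%odd`, while the prefix filter also drops `%odd`, which is the overlap concern raised above.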

tlively · 2024-10-18T01:05:32Z

scripts/fuzz_opt.py

+            opts = ['-O3']
+            new_name = name + '.opt.wasm'
+            run([in_bin('wasm-opt'), name, '-o', new_name, '-all'] + opts + split_feature_opts)
+            nonlocal optimized


Neat, I've never seen nonlocal before.
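
For anyone else who has not run into it: `nonlocal` lets a nested function rebind a variable that belongs to the enclosing function, rather than creating a new local. A minimal sketch follows; the function names `run_handler` and `optimize` are hypothetical, and only the `optimized` flag and the `.opt.wasm` suffix mirror the diff.

```python
def run_handler():
    # Flag owned by the enclosing function.
    optimized = False

    def optimize(name):
        # Without this declaration, the assignment below would create a
        # new local variable and leave the outer flag unchanged.
        nonlocal optimized
        optimized = True
        return name + '.opt.wasm'

    new_name = optimize('primary.wasm')
    return new_name, optimized
```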

tlively · 2024-10-18T01:11:37Z

scripts/fuzz_shell.js

+if (secondBinary) {
+  imports['placeholder'] = new Proxy({}, {
+    get(target, prop, receiver) {
+      // Return a function that does an indirect call using the exported table.
+      return (...args) => exports['table'].get(+prop)(...args);
+    }
+  });
+}


This proxy can just throw an error if it is ever called. Since the secondary module is eagerly loaded, it should be impossible to end up in a placeholder function.

Good point!

tlively · 2024-10-18T01:15:41Z

scripts/fuzz_shell.js

+  var combinedImports = Object.assign({}, imports);
+  combinedImports['primary'] = {};
+  for (var e in exports) {
+    combinedImports['primary'][e] = exports[e];
+  }


This can just be something like var combinedImports = {'primary': exports};. See for example https://github.com/emscripten-core/emscripten/blob/bdde3cd337ed4ac72a641f862b5371048e049946/src/preamble.js#L686.

Oh, thanks, I didn't realize wasm-split routes all secondary module imports through the primary module... does that not add any inefficiency? I guess not as the primary module just imports and exports them, unchanged.

Right, I don't think it should have any overhead.

tlively

LGTM if we decide not to use --export-prefix.

kripken added 4 commits October 16, 2024 10:05

work

fa8a95f

bettr

0e74570

clean

f8cb484

work

3d71854

kripken requested a review from tlively October 16, 2024 22:41

kripken added 4 commits October 16, 2024 15:45

format

b5df612

nans

4f07dae

oops^2

8406b24

flake

6104c04

tlively reviewed Oct 18, 2024

kripken added 3 commits October 18, 2024 11:12

compile regex

d578928

simplify proxy stubs

fc15b5a

feedback

ed9c1ac

tlively approved these changes Oct 18, 2024

kripken merged commit 679c26f into WebAssembly:main Oct 18, 2024
13 checks passed

kripken deleted the fuzz.split.TRUE branch October 18, 2024 19:36
