Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fuzzgen: Statistics framework #4868

Merged
merged 5 commits into from
Sep 27, 2022
Merged

Conversation

afonso360
Copy link
Contributor

👋 Hey,

This PR Introduces a statistics module for fuzzgen. This allows us to measure stuff and get some info about what is going on.

I don't want to commit this in its current format since it has some issues, see the comments below.

Here's the output after 50000 inputs:

== FuzzGen Statistics  ====================
Total Inputs: 50000
Valid Inputs: 29130 (58.3%)
Total Runs: 42358
Successful Runs: 17564 (41.5% of Total Runs)
Timed out Runs: 12445 (29.4% of Total Runs)
Traps:
        user code: heap_oob: 0 (0.0% of Total Runs)
        user code: icall_null: 0 (0.0% of Total Runs)
        user code: int_ovf: 0 (0.0% of Total Runs)
        user code: int_divz: 12349 (29.2% of Total Runs)
        user debug: 0 (0.0% of Total Runs)
        user code: heap_misaligned: 0 (0.0% of Total Runs)
        user code: bad_sig: 0 (0.0% of Total Runs)
        user code: bad_toint: 0 (0.0% of Total Runs)
        user code: interrupt: 0 (0.0% of Total Runs)
        user code: unreachable: 0 (0.0% of Total Runs)
        user code: table_oob: 0 (0.0% of Total Runs)
        resumable: 0 (0.0% of Total Runs)
        user code: stk_ovf: 0 (0.0% of Total Runs)

Comment on lines 173 to 184
fuzz_target!(|bytes: &[u8]| {
STATISTICS.print();
STATISTICS.total_inputs.fetch_add(1, Ordering::SeqCst);

let mut unstructured = Unstructured::new(bytes);
let testcase = match TestCase::arbitrary(&mut unstructured) {
Ok(t) => t,
Err(_) => {
return;
}
};

Copy link
Contributor Author

@afonso360 afonso360 Sep 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think doing this loses us the nice cargo fuzz fmt output which is not great. Do we have some other way of getting the amount of inputs that the fuzzer tried, while using the Arbitrary version of the macro?

Without total_inputs we can't calculate how many inputs we are dropping while generating a TestCase which is currently the most important metric to optimize. But I really don't want to lose cargo fuzz fmt.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a crude hack, we could duplicate the logic in the libfuzzer crate that's triggered by the RUST_LIBFUZZER_DEBUG_PATH environment variable: https://github.com/rust-fuzz/libfuzzer/blob/63b922685a0e714d2e7d2af9050e8860d5f6dc09/src/lib.rs#L203-L218

I've been trying to work out how to gather statistics like this from all the fuzz targets, and the other problem that bothers me is how to report results at the end of the run. At the moment the option I find most appealing is if somebody sorts out rust-fuzz/libfuzzer#46. As I noted over there, it's also possible to use libc::atexit, but only if you're careful not to use any Rust I/O functions in the at-exit handler.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a variety of fuzz targets I've written in the past, we have something like

static ITER: AtomicUsize = AtomicUsize::new(0);

fuzz_target!(|data| {
    let n = ITER.fetch_add(1, Ordering::Relaxed);
    if log::is_enabled(log::Level::Info) && n % 1024 == 0 {
        dump_statistics();
    }

    // ...
});

and this has worked out pretty well, with adjustments to 1024 as needed for the fuzz target's throughput, of course.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's approximately what Afonso has done in this PR, and what Andrew did in the differential fuzz target. It's a pattern that has already helped us find important issues in both fuzzers, so that's great!

An alternative is what the metrics_printer crate does: kick off a thread and dump metrics at a periodic interval. That avoids having to separately tune for each fuzz target's throughput.

But one thing I'd like to be able to do is get these statistics for OSS-Fuzz runs. I had the thought that instead of slowing down fuzzing, maybe we could fetch the corpus from OSS-Fuzz and replay it locally with stats collection turned on. Those corpora currently range from under 2k up to 25k tests. It's possible each corpus is heavily biased in terms of the statistics we care about, compared to the full run which produced it. So maybe this isn't an effective alternative to just dumping stats to the OSS-Fuzz stderr log. But if it is: I don't see any way right now to make metrics printing useful with so few tests, except by setting an environment variable with the number of inputs and having the fuzz target print after it has processed that many inputs.

Afonso's recent testing has a related issue. I noticed the test plan was "ask libfuzzer to run 100,010 inputs" and the statistics are printed after a hardcoded number of inputs (this PR says 50k but I'm guessing it was 100k for these experiments), so there are about 10 tests that get run at the end without being counted in the stats. If you try to make these two numbers equal, it's tedious to reason about whether the current test is included in the count or not, so adding a few extra tests on the end just to be sure makes total sense.

Again, an environment variable could help since we know how many tests we're trying to run. But in both cases I'd really rather just be able to print the statistics when libFuzzer is done with its run, and not have to tune anything at all.

Copy link
Contributor Author

@afonso360 afonso360 Sep 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a crude hack, we could duplicate the logic in the libfuzzer crate that's triggered by the RUST_LIBFUZZER_DEBUG_PATH environment variable: https://github.com/rust-fuzz/libfuzzer/blob/63b922685a0e714d2e7d2af9050e8860d5f6dc09/src/lib.rs#L203-L218

Huh so that's how they do it! I never knew. I think there's a better solution, see below.

I've been trying to work out how to gather statistics like this from all the fuzz targets, and the other problem that bothers me is how to report results at the end of the run.

I think switching to iter % n == 0 for printing is a great idea! And when I'm measuring stuff for other PR's I can always adjust how many runs I do so that it prints at the end of the run. i.e. pick n=10k and run 50k when benchmarking or something like that.

It doesn't fix that exact issue, but at least for my use case it works, and it sounds like it would be useful for OSS-Fuzz as well.


And I've also been thinking maybe we can do without the total_inputs metric. total_inputs is only used for the "Valid Inputs %" and we can get that metric by other means. When looking at the fuzz log we can check the run number that printed this statistics and do some math (run 50k has 25k valid inputs, so its 50% valid). When benchmarking its also easy to calculate manually when presenting results.

Additionally we should soon be at 100% Valid inputs on this target (only 1 PR left! I hope), so it would become really obvious if we are missing some runs, since Valid inputs would no longer be 50000 or 100000 it would be something like 41273 and that would be something that would probably stand out.

It would still be nice to have a 100% there, but while we figure out some solution I think it would be worth merging this. (I'm also tired of cherry-picking these 3 commits everywhere 😄).

Edit:

Here's an example of what I'm proposing:

#46795  NEW    cov: 31167 ft: 170205 corp: 5611/1461Kb lim: 1024 exec/s: 74 rss: 631Mb L: 804/1024 MS: 1 InsertRepeatedBytes-
== FuzzGen Statistics  ====================
Valid Inputs: 20000
Total Runs: 199404
Successful Runs: 197351 (99.0% of Total Runs)
Timed out Runs: 189 (0.1% of Total Runs)
Traps:
        user code: int_divz: 1864 (0.9% of Total Runs)
#46979  NEW    cov: 31167 ft: 170206 corp: 5612/1461Kb lim: 1024 exec/s: 74 rss: 631Mb L: 294/1024 MS: 4 ChangeBinInt-EraseBytes-CopyPart-PersAutoDict- DE: "\xff\xff\xff\xff\xff\xff\xff\xff"-

With this we know we had 20k valid inputs at run ~46795 (there is some uncertainty here) so we have roughly a 42.7% valid rate.

Now that I'm thinking about it this might not work out as I thought it would, since we don't have the guarantee that this prints at the end. So it wouldn't work very well for benchmarking unless we start measuring in terms of valid inputs, which is also doable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think removing the "total inputs" metric is a good plan. It at least lets us get something merged, even if it isn't perfect for all our use cases. I'll be happy to review and approve that!

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator fuzzing Issues related to our fuzzing infrastructure labels Sep 4, 2022
@github-actions
Copy link

github-actions bot commented Sep 4, 2022

Subscribe to Label Action

cc @fitzgen

This issue or pull request has been labeled: "cranelift", "fuzzing"

Thus the following users have been cc'd because of the following labels:

  • fitzgen: fuzzing

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

@afonso360
Copy link
Contributor Author

afonso360 commented Sep 24, 2022

I've made some changes since this was introduced:

  • We now filter out traps that were never triggered, and sort based on the count of traps.
  • As mentioned above, we now print based on "Valid Inputs" instead of "Total Inputs". Its not ideal but hopefully a temporary solution.

Here's the example output:

#99422  NEW    cov: 32426 ft: 189477 corp: 6858/3354Kb lim: 20182 exec/s: 50 rss: 912Mb L: 264/20154 MS: 2 PersAutoDict-InsertRepeatedBytes- DE: "=\x00I\x0a"-
== FuzzGen Statistics  ====================
Valid Inputs: 50000
Total Runs: 752563
Successful Runs: 736535 (97.9% of Total Runs)
Timed out Runs: 1076 (0.1% of Total Runs)
Traps:
        user code: int_divz: 10656 (1.4% of Total Runs)
        user code: int_ovf: 3131 (0.4% of Total Runs)
        user code: bad_toint: 1165 (0.2% of Total Runs)
#99528  NEW    cov: 32426 ft: 189488 corp: 6859/3359Kb lim: 20182 exec/s: 49 rss: 912Mb L: 4759/20154 MS: 1 CopyPart-

@afonso360 afonso360 marked this pull request as ready for review September 24, 2022 05:47
Copy link
Contributor

@jameysharp jameysharp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Just needs a rebase because fuzz/Cargo.toml changed.

@jameysharp jameysharp enabled auto-merge (squash) September 27, 2022 16:01
@jameysharp jameysharp merged commit 65a3af7 into bytecodealliance:main Sep 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift Issues related to the Cranelift code generator fuzzing Issues related to our fuzzing infrastructure
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants