DSP Performance benchmark #22

JayFoxRox · 2018-12-26T05:34:11Z

This can be used to benchmark the (GP) DSP.

Example output:

1 frames: 6287446 ns elapsed (47 ns dryrun, 6287446 ns per frame [3.976193% realtime]), -6037445 ns of budget left

Settings

         unsigned int frames = 1; // Target should be ~15
         unsigned int cycles = 40000; // Target should be ~106666, minimum 40000

frames:

We only run every 15th APU frame, so if you want to batch all frames, you must set frames = 15. With any lower value, we can't produce stable audio output.
For measuring performance, running only every 15th frame is fine.

cycles:

Running 40000 cycles should be enough for some games, but the real DSP probably runs around 106k cycles. The more we can do, the better.
22000 cycles should be enough to finish a single APU frame in DirectSound.
1000 cycles is what the XQEMU master revision uses.

If you keep increasing this value, you might see a performance improvement at some point.
That is because the DSP will then be stuck on a simple loop instruction, which is computationally less expensive than some others (= lower CPU usage / faster). If you pick a lower value, the DSP might still be busy with more complicated tasks when it hits the limit (= higher CPU usage / slower).
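
To compare the cycle counts above, it can help to convert them into the real time they represent. This is a minimal sketch, assuming the 160 MHz DSP clock quoted in this description; `DSP_CLOCK_HZ` and `cycles_to_ns` are hypothetical names and not part of XQEMU:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed DSP clock, taken from the "160 MHz" figure in this description. */
#define DSP_CLOCK_HZ 160000000ULL

/* Hypothetical helper: real time in nanoseconds that `cycles` DSP cycles
 * would take on hardware running at DSP_CLOCK_HZ (result is truncated). */
static uint64_t cycles_to_ns(uint64_t cycles)
{
    /* Guard against overflow in the multiplication below. */
    assert(cycles <= UINT64_MAX / 1000000000ULL);
    return cycles * 1000000000ULL / DSP_CLOCK_HZ;
}
```

Under these assumptions, 40000 cycles correspond to 250000 ns and 106666 cycles to roughly 666662 ns, which is where the 0.666 ms per-frame figure comes from.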

There are certain combinations where the Xbox will refuse to boot, either because the host CPU is busy and won't find time to update the UI, or because something in the DSP went wrong as it was spammed with frames or lacked frames.
It's also possible that there are bugs in our DSP which affect this test.

How to interpret the output

{A} frames: {B} ns elapsed ({C} ns dryrun, {D} ns per frame [{E} % realtime]), {F} ns of budget left

(ns = nanoseconds)

{A} frames: The number of frames that we measured
{B} ns elapsed: The time it took to emulate those APU frames using the DSP
{C} ns dryrun: How long an empty measurement takes (this shows the timer resolution; if it is large, or about as large as {B}, then your system timer resolution is too coarse)
{D} ns per frame: Elapsed time ({B}) divided by the number of frames ({A}), so you know how long each frame took; this must be <= 666666 ns for realtime audio output.
{E} % realtime: We have 0.666 ms per frame (160 MHz ~> 106666 cycles). If we take twice as long to compute 106666 cycles, we only run at 50%, and so on; this must be >= 100% for realtime audio output.
{F} ns of budget left: You have a budget of 0.666 ms for each of the {A} frames. If we run faster than realtime, we have budget left (time we didn't need). If we run slower than realtime, the budget is negative (the amount of time we overspent and would have to catch up); this must be >= 0 for realtime audio output.
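
The relationship between the fields can be sketched in C as follows. This is not the benchmark's actual code: `bench_result` and `bench_summarize` are hypothetical names, and the formulas are inferred from the field descriptions above. The per-frame budget is left as a parameter, because the example output seems to match a budget of about 250000 ns (40000 cycles at 160 MHz) rather than 666666 ns.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the derived fields {D}, {E} and {F}. */
struct bench_result {
    uint64_t ns_per_frame;     /* {D}: elapsed time divided by frames */
    double   percent_realtime; /* {E}: 100 means exactly realtime */
    int64_t  budget_left_ns;   /* {F}: negative when slower than realtime */
};

/* Derive {D}, {E} and {F} from {A} (frames), {B} (elapsed_ns) and an
 * assumed per-frame time budget in nanoseconds. */
static struct bench_result bench_summarize(unsigned int frames,
                                           uint64_t elapsed_ns,
                                           uint64_t budget_per_frame_ns)
{
    struct bench_result r;
    uint64_t total_budget_ns;

    assert(frames > 0 && elapsed_ns > 0);
    total_budget_ns = budget_per_frame_ns * frames;

    r.ns_per_frame = elapsed_ns / frames;
    r.percent_realtime = 100.0 * (double)total_budget_ns / (double)elapsed_ns;
    r.budget_left_ns = (int64_t)total_budget_ns - (int64_t)elapsed_ns;
    return r;
}
```

For instance, `bench_summarize(1, 1333332, 666666)` describes a frame that took twice its budget: it reports 50% realtime and a budget of -666666 ns.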

tl;dr: Make sure {C} is tiny to verify that timing works, then optimize until {E} reaches 100%.
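
A dry run in the sense of {C} can be approximated by reading the clock twice with nothing in between. The sketch below uses the C11 `timespec_get` function purely for illustration. The clock source the benchmark really uses is not shown here, and `now_ns` and `measure_dryrun_ns` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: current time in nanoseconds.
 * timespec_get() is C11 and reads the wall clock; a real benchmark
 * would probably prefer a monotonic clock. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    int base = timespec_get(&ts, TIME_UTC);

    assert(base == TIME_UTC);
    (void)base; /* avoid an unused-variable warning when NDEBUG is set */
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Time an empty measurement: start and stop with no work in between. */
static uint64_t measure_dryrun_ns(void)
{
    uint64_t start = now_ns();
    uint64_t end = now_ns();

    /* The wall clock can step backwards; report 0 in that case. */
    return end >= start ? end - start : 0;
}
```

If the returned value is in the same range as the elapsed time of a real run, the measurement says more about the timer than about the DSP.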

Add hacky code to benchmark DSP

59b6d71

JayFoxRox added the experiments label Dec 26, 2018

JayFoxRox mentioned this pull request Dec 26, 2018

DSP interpreter is slow xqemu/xqemu#155

Open

JayFoxRox mentioned this pull request Jul 14, 2019

Audio PR #2: Move APU frame processing to new thread xqemu/xqemu#250

Closed
