Skip to content

C++20 inference for Spotify's basic-pitch AMT/MIDI generator model with ONNXRuntime and libremidi

License

Notifications You must be signed in to change notification settings

sevagh/basicpitch.cpp

Repository files navigation

basicpitch.cpp

C++20 inference for the Spotify basic-pitch automatic music transcription/MIDI generator neural network with ONNXRuntime, Eigen, and libremidi. Demo apps are provided for WebAssembly/Emscripten and a cli app.

I use ONNXRuntime and scripts from the excellent ort-builder project to implement the neural network inference like so:

  • Convert the ONNX model to ORT (onnxruntime)
  • Include only the operations and types needed for the specific neural network, cutting down code size
  • Compile the model weights to a .c and .h file to include it in the built binaries

After the neural network inference, I use libremidi to replicate the end-to-end MIDI file creation of the real basic-pitch project. I didn't run any official measurements but the WASM demo site is much faster than Spotify's own web demo.

Project design

  • ort-model contains the model in ONNX form, ORT form, and the generated h and c file
  • scripts contain the ORT model build scripts
  • src is the shared inference and MIDI creation code
  • src_wasm is the main WASM function, used in the web demo
  • src_cli is a Linux cli app (for debugging purposes) that uses libnyquist to load the audio files
  • vendor contains third-party/vendored libraries
  • web contains basic HTML/Javascript code to host the WASM demo

Usage

I recommend the tool midicsv for inspecting MIDI events in CSV format without more complicated MIDI software, to compare the files output by basicpitch.cpp to the real basic-pitch.

Python

To run Spotify's own inference code and the original Python inference code with ONNX, use the included inference script:

$ python scripts/python_inference.py --dest-dir ./midi-out-python/ ~/Downloads/clip.wav
...
Using model: /home/sevagh/repos/basicpitch.cpp/ort-model/model.onnx
Writing MIDI outputs to ./midi-out-python/

Predicting MIDI for /home/sevagh/Downloads/clip.wav...
...

CLI app

After following the build instructions below:

$ ./build/build-cli/basicpitch ~/Downloads/clip.wav ./midi-out-cpp
basicpitch.cpp Main driver program
Predicting MIDI for: /home/sevagh/Downloads/clip.wav
Input samples: 441000
Length in seconds: 10
Number of channels: 2
Resampling from 44100 Hz to 22050 Hz
output_to_notes_polyphonic
note_events_to_midi
Before iterating over note events
After iterating over note events
Now creating instrument track
done!
MIDI data size: 889
Wrote MIDI file to: "./midi-out-cpp/clip.mid"

WebAssembly/web demo

For web testing, serve the web static contents with the Python HTTP server:

$ cd web && python -m http.server 8000

Use the website:

web-screenshot

Build instructions

(Only tested on Linux, Pop!_OS 22.04). I'm assuming you have a typical C/C++ toolchain e.g. make, cmake, gcc/g++, for your OS. You also need to set up the Emscripten SDK for compiling to WebAssembly.

Clone the repo with submodules:

$ git clone --recurse-submodules https://github.com/sevagh/demucs.cpp

Create a Python venv (or conda env) and install the requirements:

$ pip install -r ./scripts/requirements.txt

Activate your venv and run the ONNXRuntime builder scripts:

$ activate my-env
$ ./scripts/build-ort-linux.sh
$ ./scripts/build-ort-wasm.sh

Check the outputs:

$ ls build/build-ort-*/MinSizeRel/libonnx.a
build/build-ort-linux/MinSizeRel/libonnx.a  build/build-ort-wasm/MinSizeRel/libonnx.a

Optional: if you want to re-convert the ONNX model to ORT in the ort-model directory, use scripts/convert-model-to-ort.sh ./ort-models/model.onnx. The ONNX model is copied from ./vendor/basic-pitch/basic_pitch/saved_models/icassp_2022/nmp.onnx

Build cli app:

$ make cli
$ ls build/build-cli/basicpitch
build/build-cli/basicpitch

For WebAssembly, first, set up the Emscripten SDK. Then, build the WASM app with your EMSDK env script:

$ export EMSDK_ENV_PATH=/path/to/emsdk/emsdk_env.sh
$ make wasm
$ ls build/build-wasm/basicpitch.wasm
build/build-wasm/basicpitch.wasm

This also copies the updated basicpitch.{wasm,js} to the ./web directory.

About

C++20 inference for Spotify's basic-pitch AMT/MIDI generator model with ONNXRuntime and libremidi

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published