This is the starter kit for the Music XAI Project.
A big thank you to Jongho Kim for providing the codebase!
We recommend creating a conda environment:
conda create -n music_xai python=3.7.13 # Create a conda environment
conda activate music_xai # Activate the conda environment
which python # Make sure it's activated; otherwise, run conda deactivate and then activate it again
pip install -r requirements.txt # Install the required Python packages
pip install protobuf==3.20.*
If you run into errors, you may also need to install the following system packages first:
sudo apt-get install build-essential python3-dev \
libldap2-dev libsasl2-dev slapd ldap-utils tox \
lcov valgrind
MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training, by Mingliang Zeng, Xu Tan, Rui Wang, Zeqian Ju, Tao Qin, Tie-Yan Liu, ACL 2021, is a large-scale pre-trained model for symbolic music understanding. It introduces several mechanisms specifically designed for symbolic music data, including the OctupleMIDI encoding and a bar-level masking strategy, and achieves state-of-the-art accuracy on several music understanding tasks, including melody completion, accompaniment suggestion, genre classification, and style classification.
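To make the OctupleMIDI idea concrete, here is a minimal Python sketch, assuming a simplified, unquantized representation: each note becomes a single 8-element token holding time signature, tempo, bar, position, instrument, pitch, duration, and velocity, so a piece with N notes becomes a sequence of N tokens. The class OctupleToken and the function encode are hypothetical names for illustration; the field order, value ranges, quantization, and note ordering used by the actual MusicBERT preprocessing may differ.

```python
from dataclasses import astuple, dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class OctupleToken:
    """One note as an 8-element token (hypothetical, simplified field set)."""
    time_signature: Tuple[int, int]  # e.g. (4, 4)
    tempo: int                       # beats per minute
    bar: int                         # bar index
    position: int                    # onset position inside the bar
    instrument: int                  # e.g. a General MIDI program number
    pitch: int                       # MIDI pitch, 0-127
    duration: int                    # length in position units
    velocity: int                    # MIDI velocity, 0-127


def encode(notes: List[OctupleToken]) -> List[tuple]:
    """Sort notes by onset (bar, position), then instrument and pitch,
    and flatten each note into one 8-tuple: one token per note."""
    ordered = sorted(notes, key=lambda n: (n.bar, n.position, n.instrument, n.pitch))
    return [astuple(n) for n in ordered]


if __name__ == "__main__":
    notes = [
        OctupleToken((4, 4), 120, 0, 8, 0, 64, 4, 80),
        OctupleToken((4, 4), 120, 0, 0, 0, 60, 8, 90),
    ]
    for token in encode(notes):
        print(token)
```

Because every note is a single token, the sequence length equals the number of notes, which is the kind of compactness the MusicBERT paper cites as a motivation for OctupleMIDI.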
Projects using MusicBERT:
- midiformers: a customized MIDI music remixing tool with an easy-to-use interface. (notebook)
pip install -r requirements.txt
- Use the fairseq version pinned in requirements.txt.
- Install your own PyTorch build that matches your GPU (CUDA version).
To check the fairseq version:
>>> import fairseq
>>> fairseq.__version__
'1.0.0a0+3369427'
- Move into the processed directory:
  cd processed
- Please use the provided segmented MIDI files, total.csv and segment_midi.zip (in the processed directory), since the file names in the original Google Drive file contain errors.
- Other data (e.g., annotator metadata, original files, ...) are in the Drive.
- Convert the total.csv file into a JSON file:
  python map_midi_to_label.py
- The file midi_label_map_apex_reg_cls.json is generated.
- Currently, the peak of a kernel density estimate (KDE) over the annotations for each segment is used as the label.
- Alternatives you can try: using all annotations, the mean, the median, etc.
- You can implement a custom mapping function to filter out unrelated or corrupted labels.
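A minimal sketch of these label-aggregation options, assuming each MIDI segment has a list of numeric annotations. kde_peak returns the location of the highest peak of a Gaussian kernel density estimate, using Scott's rule-of-thumb bandwidth and a simple grid search; both choices are assumptions, and map_midi_to_label.py may compute the KDE differently (e.g., with SciPy). The names kde_peak, AGGREGATORS, aggregate_label, and the valid_range filter are hypothetical. The "use all data" option (keeping every annotation as its own training sample) is not shown.

```python
import math
import statistics
from typing import Callable, Dict, Optional, Sequence, Tuple


def kde_peak(values: Sequence[float], grid_size: int = 512) -> float:
    """Location of the highest peak of a 1-D Gaussian KDE, found by grid search."""
    xs = [float(v) for v in values]
    if not xs:
        raise ValueError("need at least one value")
    if len(xs) == 1 or statistics.pstdev(xs) == 0:
        return xs[0]  # degenerate case: all annotations agree
    # Scott's rule-of-thumb bandwidth for one dimension (assumed choice)
    bw = statistics.stdev(xs) * len(xs) ** (-1 / 5)
    lo, hi = min(xs) - 3 * bw, max(xs) + 3 * bw
    best_x, best_density = lo, -math.inf
    for i in range(grid_size):
        x = lo + (hi - lo) * i / (grid_size - 1)
        # Unnormalized density is enough, since only the argmax matters
        density = sum(math.exp(-0.5 * ((x - v) / bw) ** 2) for v in xs)
        if density > best_density:
            best_x, best_density = x, density
    return best_x


# Hypothetical registry of aggregation strategies
AGGREGATORS: Dict[str, Callable[[Sequence[float]], float]] = {
    "kde_peak": kde_peak,
    "mean": statistics.mean,
    "median": statistics.median,
}


def aggregate_label(
    values: Sequence[float],
    method: str = "kde_peak",
    valid_range: Optional[Tuple[float, float]] = None,
) -> float:
    """Optionally drop annotations outside valid_range, then aggregate the rest."""
    if valid_range is not None:
        low, high = valid_range
        values = [v for v in values if low <= v <= high]
    return AGGREGATORS[method](values)
```

The KDE peak is more robust than the mean when a few annotators disagree strongly, because the outlying ratings barely move the highest mode.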
- Generate the XAI dataset in OctupleMIDI format from the MIDI-to-label mapping file with gen_xai.py:
  python -u gen_xai.py xai
- The train/test split is random.
- Please check JSON_PATH and SUFFIX in gen_xai.py before running it.
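Since gen_xai.py splits train and test randomly, a seeded split like the following sketch keeps the split reproducible across runs. random_split and its test_ratio and seed parameters are hypothetical and do not describe how gen_xai.py itself is implemented.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_split(
    items: Sequence[T], test_ratio: float = 0.2, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of items with a fixed seed; return (train, test)."""
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be in (0, 1)")
    shuffled = list(items)
    # A local RNG leaves the global random state untouched
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    train, test = random_split([f"segment_{i}.mid" for i in range(10)], seed=42)
    print(len(train), len(test))
```

Using a dedicated random.Random(seed) instance, rather than the module-level functions, means other code drawing random numbers cannot change the split.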
- Binarize the raw text-format dataset (this script reads the xai_data_raw_apex_reg_cls folder and writes the output to xai_data_bin_apex_reg_cls):
  bash scripts/binarize_xai.sh xai
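Conceptually, binarization replaces each text token with an integer index from a vocabulary so the training code can load the data efficiently. The toy sketch below illustrates only that idea, assuming whitespace-separated tokens; build_vocab, binarize, and the special-symbol list are hypothetical, and fairseq's own preprocessing (its Dictionary class, frequency-based ordering, and binary file format), which the script presumably relies on, is not reproduced here.

```python
from typing import Dict, List

# Special symbols reserved at the start of the toy vocabulary
SPECIALS = ["<s>", "<pad>", "</s>", "<unk>"]


def build_vocab(lines: List[str]) -> Dict[str, int]:
    """Assign indices to tokens in order of first appearance."""
    vocab = {sym: i for i, sym in enumerate(SPECIALS)}
    for line in lines:
        for tok in line.split():
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def binarize(line: str, vocab: Dict[str, int], append_eos: bool = True) -> List[int]:
    """Map tokens to indices; unknown tokens map to <unk>."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in line.split()]
    if append_eos:
        ids.append(vocab["</s>"])
    return ids


if __name__ == "__main__":
    vocab = build_vocab(["a b", "b c"])
    print(binarize("a c d", vocab))
```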
- Download our pre-trained checkpoints (small and base) here and save them in the checkpoints folder. (A newer version of fairseq is required to use the provided checkpoints; see issue-37 or issue-45.)
- Modify the hyperparameters, checkpoint path, etc. in the .sh file before running it.
- Using a pre-trained model:
- For the regression task:
  bash scripts/regression/train_xai_base_small.sh # checkpoints/checkpoint_last_musicbert_base.pt, checkpoints/checkpoint_last_musicbert_small.pt
- For the classification task or the multitask setting, see scripts/classification and scripts/reg_cls.
- If you get a file path error, try:
  export PYTHONPATH=`pwd`
- To customize the model, see musicbert/__init__.py.
- Some custom arguments are provided there.
- See the fairseq documentation for details.
- Sample script for the regression task using an LSTM:
  bash scripts/train_lstm.sh