NLP Music Project

This is the starter kit for the Music XAI Project.

A big thank you to Jongho Kim for providing the codebase!

Installation

Conda environment

We recommend creating a conda environment:

conda create -n music_xai python=3.7.13 # Create a conda environment
conda activate music_xai # Activate the conda environment
which python # Make sure it's activated. Otherwise, deactivate and then activate again
pip install -r requirements.txt # Install required Python packages
pip install protobuf==3.20.*

If you run into errors, you may also need to run the following before installing:

sudo apt-get install build-essential python3-dev \
    libldap2-dev libsasl2-dev slapd ldap-utils tox \
    lcov valgrind

MusicBERT

"MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training", by Mingliang Zeng, Xu Tan, Rui Wang, Zeqian Ju, Tao Qin, and Tie-Yan Liu (ACL 2021), is a large-scale pre-trained model for symbolic music understanding. It uses several mechanisms designed specifically for symbolic music data, including OctupleMIDI encoding and a bar-level masking strategy, and achieves state-of-the-art accuracy on several music understanding tasks: melody completion, accompaniment suggestion, genre classification, and style classification.

Projects using MusicBERT:

midiformers: a customized MIDI music remixing tool with an easy-to-use interface (notebook).

Model structure of MusicBERT

OctupleMIDI encoding

0. Before you start

Google drive link

https://drive.google.com/drive/folders/1Rzncw8syf__TE5Fb1415P9V5zOcztQ5o

Installation

pip install -r requirements.txt
Use the fairseq version specified in the requirements.txt file.
Install the PyTorch build that matches your GPU setup.

To check the fairseq version, run the following in a Python session:

>>> import fairseq
>>> fairseq.__version__
'1.0.0a0+3369427'
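The check above can also be run as a small script that reports the PyTorch build and whether a GPU is visible. This is only a minimal sketch: the helper name `report_environment` is hypothetical and not part of this repository, and the expected version string is simply the one quoted above.

```python
import importlib

# Version string quoted in this README; adjust if requirements.txt changes.
EXPECTED_FAIRSEQ = "1.0.0a0+3369427"


def report_environment():
    """Collect version info for fairseq and torch (hypothetical helper).

    A package that cannot be imported is reported as None.
    """
    info = {}
    for name in ("fairseq", "torch"):
        try:
            module = importlib.import_module(name)
            info[name] = getattr(module, "__version__", "unknown")
        except Exception:  # missing or broken install
            info[name] = None

    # Only ask about CUDA when torch imported successfully.
    if info["torch"] is not None:
        import torch
        info["cuda_available"] = bool(torch.cuda.is_available())
    else:
        info["cuda_available"] = False

    info["fairseq_matches"] = info["fairseq"] == EXPECTED_FAIRSEQ
    return info


if __name__ == "__main__":
    for key, value in report_environment().items():
        print(f"{key}: {value}")
```

If `fairseq_matches` is False, reinstalling from requirements.txt is probably the first thing to try.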

1. Preparing datasets

1.1 Pre-processing datasets

Work from the processed directory:
```
cd processed
```
Please use the provided segmented MIDI files, total.csv and segment_midi.zip (in the processed directory), because the original Google Drive file contains a file name error.
Other data, such as annotator metadata and the original files, are in the Drive folder.

map_midi_to_label.py

It converts the total.csv file into a JSON file:
```
python map_midi_to_label.py
```
- The file midi_label_map_apex_reg_cls.json is generated.
Currently, the peak value of the kernel density estimate is used as the label.
You can also try other choices, such as using all the data, the mean, or the median.
You can implement a custom mapping function to filter out unrelated or corrupted labels.
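To make the labeling choices above concrete, here is a rough sketch of how a KDE peak can be computed and swapped for the mean or median. It is not the code in map_midi_to_label.py: the function names (`kde_peak`, `aggregate_label`), the Gaussian kernel, the bandwidth rule, and the grid size are all assumptions made for illustration.

```python
import math
import statistics


def kde_peak(values, grid_size=200):
    """Return the grid point where a Gaussian KDE over `values` is highest."""
    values = [float(v) for v in values]
    if not values:
        raise ValueError("values must not be empty")
    lo, hi = min(values), max(values)
    if lo == hi:
        return lo  # all ratings identical, nothing to estimate

    # Normal-reference rule of thumb for the bandwidth (an assumption;
    # the repository may use a different rule or library default).
    bandwidth = 1.06 * statistics.stdev(values) * len(values) ** (-1 / 5)

    step = (hi - lo) / (grid_size - 1)
    best_x, best_density = lo, -1.0
    for i in range(grid_size):
        x = lo + i * step
        # Unnormalized density: the constant factor does not move the peak.
        density = sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        if density > best_density:
            best_x, best_density = x, density
    return best_x


# Hypothetical registry of label aggregation strategies.
AGGREGATORS = {
    "kde_peak": kde_peak,
    "mean": statistics.mean,
    "median": statistics.median,
}


def aggregate_label(ratings, method="kde_peak"):
    """Reduce several annotator ratings for one segment to a single label."""
    return AGGREGATORS[method](ratings)
```

For example, `aggregate_label([1, 5, 5, 5.2, 4.8, 9])` lands close to 5 because the outlying ratings 1 and 9 contribute little density there, while `method="mean"` weights every rating equally.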

gen_xai.py

Generate the XAI music dataset in OctupleMIDI format from the MIDI-to-label mapping file with gen_xai.py:
```
python -u gen_xai.py xai
```
- The train / test sets are split randomly.
- Please check JSON_PATH and SUFFIX in gen_xai.py before running it.
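Because the split is random, results can differ between runs unless the seed is fixed. The sketch below shows one way to get a reproducible random split. It does not reproduce the logic in gen_xai.py; the function name `split_train_test`, the default ratio, and the seed are hypothetical.

```python
import random


def split_train_test(file_names, test_ratio=0.1, seed=0):
    """Deterministically shuffle `file_names` and split into (train, test)."""
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be strictly between 0 and 1")
    # Sort first so the result does not depend on the input order.
    names = sorted(file_names)
    # A private Random instance keeps the global random state untouched.
    random.Random(seed).shuffle(names)
    n_test = max(1, round(len(names) * test_ratio))
    return names[n_test:], names[:n_test]


if __name__ == "__main__":
    segments = [f"segment_{i}.mid" for i in range(20)]  # made-up file names
    train, test = split_train_test(segments, test_ratio=0.2, seed=1)
    print(len(train), len(test))
```

Calling the function twice with the same seed returns the same split, which makes evaluation numbers comparable across experiments.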
Binarize the raw-text dataset (this script reads the xai_data_raw_apex_reg_cls folder and writes xai_data_bin_apex_reg_cls):
```
bash scripts/binarize_xai.sh xai
```

2. Training

Download our pre-trained checkpoints (small and base) and save them in the checkpoints folder. A newer version of fairseq is needed to use the provided checkpoints; see issue-37 or issue-45.

2.1 Fine-tuning on XAI music regression task

Before training, modify the hyperparameters, checkpoint path, and other settings in the .sh file.
Using the pre-trained model

For the regression task:

bash scripts/regression/train_xai_base_small.sh # checkpoints/checkpoint_last_musicbert_base.pt, checkpoints/checkpoint_last_musicbert_base.pt

For the classification task or multitask training, check scripts/classification and scripts/reg_cls.
If you hit a file path error, try: export PYTHONPATH=`pwd`
To customize the model, check musicbert/__init__.py.
- Some custom arguments are provided.
- Check fairseq for detailed information.
A sample script for the regression task using an LSTM:
```
bash scripts/train_lstm.sh
```
