mnskim/nlp_project_music_ai

NLP Music Project

This is the starter kit for the Music XAI Project.

A big thank you to Jongho Kim for providing the codebase!

Installation

Conda environment

We recommend creating a conda environment:

conda create -n music_xai python=3.7.13 # Create a conda environment
conda activate music_xai # Activate the conda environment
which python # Make sure it's activated; otherwise, deactivate and activate again
pip install -r requirements.txt # Install required python packages
pip install protobuf==3.20.*

If you run into errors, you may also need to run the following before installing:

sudo apt-get install build-essential python3-dev \
    libldap2-dev libsasl2-dev slapd ldap-utils tox \
    lcov valgrind

MusicBERT

MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training, by Mingliang Zeng, Xu Tan, Rui Wang, Zeqian Ju, Tao Qin, Tie-Yan Liu, ACL 2021, is a large-scale pre-trained model for symbolic music understanding. It has several mechanisms including OctupleMIDI encoding and bar-level masking strategy that are specifically designed for symbolic music data, and achieves state-of-the-art accuracy on several music understanding tasks, including melody completion, accompaniment suggestion, genre classification, and style classification.

Model structure of MusicBERT


OctupleMIDI encoding
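
As a rough illustration of the two mechanisms named above, the sketch below represents each note as an 8-element tuple (time signature, tempo, bar, position, instrument, pitch, duration, velocity) and masks one element type for every note in a bar. The class name, field order, the `MASK` token, and the helper `mask_field_in_bar` are assumptions made for this example; they are not taken from the MusicBERT codebase.

```python
from typing import List, NamedTuple, Union

MASK = "<mask>"  # hypothetical placeholder token for this sketch
Value = Union[int, str]


class OctupleNote(NamedTuple):
    """One note as an 8-element tuple (names and order are illustrative)."""
    time_signature: Value
    tempo: Value
    bar: Value
    position: Value
    instrument: Value
    pitch: Value
    duration: Value
    velocity: Value


def mask_field_in_bar(notes: List[OctupleNote], bar: int, field: str) -> List[OctupleNote]:
    """Bar-level masking: hide one element type for every note in the given bar."""
    return [n._replace(**{field: MASK}) if n.bar == bar else n for n in notes]


notes = [
    OctupleNote("4/4", 120, 0, 0, 0, 60, 8, 80),
    OctupleNote("4/4", 120, 0, 8, 0, 64, 8, 80),
    OctupleNote("4/4", 120, 1, 0, 0, 67, 16, 72),
]

# Mask the pitch of every note in bar 0; bar 1 is left untouched.
masked = mask_field_in_bar(notes, bar=0, field="pitch")
for note in masked:
    print(note)
```

Masking a whole bar at once, rather than single tokens, keeps the model from recovering a hidden value by copying it from a neighbouring note in the same bar.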

0. Before you start

Google Drive link

Installation

  • pip install -r requirements.txt
  • Use the fairseq version specified in requirements.txt.
  • Install the PyTorch version that matches your GPU.

To check the fairseq version, run the following in a Python shell:

>>> import fairseq
>>> fairseq.__version__
'1.0.0a0+3369427'

1. Preparing datasets

1.1 Pre-processing datasets

  • Work in the processed directory:
    cd processed
    
  • Use the provided total.csv and the segmented MIDI files in segment_midi.zip (both in the processed directory), because the original Google Drive file contains a file name error.
  • Other data (e.g., annotator metadata and the original files) are in the Google Drive.

map_midi_to_label.py

  • Converts total.csv to a JSON file:
    python map_midi_to_label.py
    • This generates midi_label_map_apex_reg_cls.json.
  • Currently, the peak of a kernel density estimate over the annotations is used as the label.
  • You can also try alternatives, such as using all the data, the mean, or the median.
  • You can implement a custom mapping function to filter out unrelated or corrupted labels.
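
To make the labelling options above concrete, here is a minimal sketch of a KDE-peak reducer next to the mean and median, plus a simple filter for out-of-range ratings. It is not the code in map_midi_to_label.py: the function names, the rule-of-thumb bandwidth, the grid search, and the `valid_range` rating scale are all assumptions for illustration.

```python
import math
import statistics
from typing import Callable, Optional, Sequence, Tuple


def kde_peak(values: Sequence[float], grid_size: int = 201) -> float:
    """Return the location of the highest point of a Gaussian KDE over `values`."""
    values = [float(v) for v in values]
    if not values:
        raise ValueError("need at least one value")
    lo, hi = min(values), max(values)
    if lo == hi:
        return lo  # all ratings identical: the peak is that rating
    # Normal-reference rule-of-thumb bandwidth (an assumption of this sketch).
    bandwidth = 1.06 * statistics.stdev(values) * len(values) ** (-1 / 5)

    def density(x: float) -> float:
        # Unnormalised density; the constant factor does not move the peak.
        return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    grid = [lo + i * (hi - lo) / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=density)


def map_label(
    ratings: Sequence[float],
    reducer: Callable[[Sequence[float]], float] = kde_peak,
    valid_range: Tuple[float, float] = (1.0, 9.0),  # hypothetical rating scale
) -> Optional[float]:
    """Drop out-of-range (e.g. corrupted) ratings, then reduce the rest to one label."""
    low, high = valid_range
    kept = [r for r in ratings if low <= r <= high]
    return reducer(kept) if kept else None


ratings = [1, 2, 7, 7, 7.2, 6.8, 7.1]
peak = map_label(ratings)                      # follows the dominant cluster
mean = map_label(ratings, statistics.mean)     # pulled down by the two low ratings
median = map_label(ratings, statistics.median)
print(peak, mean, median)
```

When annotators mostly agree but a few disagree strongly, the KDE peak stays with the majority cluster, whereas the mean shifts toward the outliers.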

gen_xai.py

  • Generate the XAI music dataset in OctupleMIDI format from the MIDI-to-label mapping file with gen_xai.py:

    python -u gen_xai.py xai
    • The train/test split is random.
    • Check JSON_PATH and SUFFIX in gen_xai.py before running it.
  • Binarize the raw text dataset (this script reads the xai_data_raw_apex_reg_cls folder and writes xai_data_bin_apex_reg_cls):

    bash scripts/binarize_xai.sh xai
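
Because the train/test split is random, results can differ between runs. One way to make such a split reproducible is to fix a seed, as in the sketch below. This is an illustration only and does not reflect how gen_xai.py splits the data; the function name, the 20% test ratio, and the file names are assumptions.

```python
import random
from typing import List, Sequence, Tuple


def split_train_test(
    items: Sequence[str], test_ratio: float = 0.2, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Shuffle with a fixed seed and return (train, test) lists."""
    shuffled = sorted(items)  # sort first so the input order does not matter
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical segment file names, for demonstration only.
files = [f"seg_{i}.mid" for i in range(10)]
train, test = split_train_test(files, test_ratio=0.2, seed=0)
print(len(train), len(test))
```

Calling the function again with the same seed returns the same split, which makes comparisons between fine-tuning runs easier.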

2. Training

  • Download our pre-trained checkpoints, small and base, and save them in the checkpoints folder. A newer version of fairseq is needed to use the provided checkpoints; see issue-37 or issue-45.

2.1 Fine-tuning on XAI music regression task

  • Modify the hyperparameters, checkpoint path, etc. in the .sh file.

  • The scripts fine-tune from a pre-trained model.

  • For the regression task:

    bash scripts/regression/train_xai_base_small.sh # checkpoints/checkpoint_last_musicbert_base.pt, checkpoints/checkpoint_last_musicbert_small.pt
  • For the classification task or multitask training, see scripts/classification and scripts/reg_cls.

  • If you get a file path error, try export PYTHONPATH=`pwd`

  • To customize the model, see musicbert/__init__.py

    • Some custom arguments are provided.
    • See the fairseq documentation for details.
  • A sample script for the regression task using an LSTM:

    bash scripts/train_lstm.sh
