Skip to content

junjinyong/nlp_project_music_ai

 
 

Repository files navigation

MusicBERT

MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training, by Mingliang Zeng, Xu Tan, Rui Wang, Zeqian Ju, Tao Qin, Tie-Yan Liu, ACL 2021, is a large-scale pre-trained model for symbolic music understanding. It has several mechanisms including OctupleMIDI encoding and bar-level masking strategy that are specifically designed for symbolic music data, and achieves state-of-the-art accuracy on several music understanding tasks, including melody completion, accompaniment suggestion, genre classification, and style classification.

Projects using MusicBERT:


Model structure of MusicBERT


OctupleMIDI encoding

0. Google drive link

1. Preparing datasets

1.1 Pre-processing datasets

  • please use the provided segmented midi file total.csv segment_midi.zip since there is file name error in original Google Drive file.

  • other data ex) metadata of annotators, original files, ... are in the drive

  • process total.csv file to json file.

    python map_midi_to_label.py
    • File midi_label_map_apex_reg_cls.json is generated.
    • Currently, peak value from kernel density estimation is used as label.
    • You can also try: use all data / mean / median ... etc
  • Generate XAI for music dataset in OctupleMIDI format using the midi to label mapping file with gen_xai.py.

    python -u gen_xai.py
    • Currently, train / test set is splitted by segments
  • Binarize the raw text format dataset. (this script will read xai_data_raw_apex_reg_cls folder and output xai_data_bin_apex_reg_cls)

    bash scripts/binarize_xai.sh xai

2. Training

  • Download our pre-trained checkpoints here: small and base, and save in the checkpoints folder. (a newer version of fairseq is needed for using provided checkpoints: see issue-37 or issue-45)

2.1 Fine-tuning on XAI music regression task

  • you should modify hyperparameters, checkpoint path, etc in sh file.

  • using pre-trained model

    bash scripts/train_xai_base_small.sh # checkpoints/checkpoint_last_musicbert_base.pt, checkpoints/checkpoint_last_musicbert_base.pt
  • If file path error, try export PYTHONPATH=`pwd`

  • To custom the model, check musicbert/__init__.py

    • Some custom arguments are provided
    • Check fairseq for more information.
  • Sample script for Regression task using LSTM

    bash scripts/train_lstm.sh

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 83.1%
  • Shell 16.9%