Replies: 3 comments
-
@VladPetk Hey Vlad! Thank you so much for the update and for sharing your work and results on GitHub. It is very helpful and useful :) I wanted to give you a couple of suggestions to improve your work and results:
I have three that I made just for that purpose: all three datasets at this link are great for BPE IMHO, as they are much more balanced and homogeneous. It is a well-known fact that training on music which is separated into channels produces much better results. Solo instruments do not produce good results with auto-regressive models. So try that first.
BPE_DRAFT_DEMO_DATA_AND_TRAIN.zip

To be honest, BPE does not work well with music for many reasons (from my experience). Primarily because the numerical complexity of music is much greater than that of text or images. For example, a triplet encoding with 128 values for time, 128 for durations, and 128 for pitches would require 128^3 (over two million) combinations, which with BPE produces a dictionary far too large for current models/architectures to handle. So the main and most important step is to develop the most compressed encoding for music if you want to use BPE.

I can also recommend that you not use mixed datasets such as MMD. Instead, try to use more homogeneous datasets, like POP909+POP1k7 for pop music, or ASAP+GiantMIDI+ATEPP for classical music. This should also help to improve the results. However, as I said before, it is best to use music split into parts/channels, so you can also try to make a custom dataset from MMD or LAKH by selecting MIDIs that have piano in parts/channels.

Last but not least, here is a sample I made a long time ago which used BPE and the POP909 dataset. While it played well, the music was not really beautiful or memorable, so there is also work that needs to be done to fix that.

Anyway, if you have any questions about any of it or if you need help/advice from me about any of that stuff, feel free to reach out at any time. And thank you again for sharing the BPE work/results :) Happy Holidays to you! Alex.

PS. For long seq_len I recommend using torch.amp (fp16 precision) and torch.cuda.sdp_kernel (memory-efficient or flash attention).
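A minimal sketch of that PS in PyTorch 2.x terms; `model`, `optimizer`, and `loader` are placeholders for your own training setup:

```python
# Hedged sketch: fp16 autocast + fused attention kernels for long seq_len.
# `model`, `optimizer`, and `loader` are assumed to already exist.
import torch

scaler = torch.cuda.amp.GradScaler()  # rescales the loss to avoid fp16 underflow

for batch in loader:
    optimizer.zero_grad(set_to_none=True)
    # Restrict scaled-dot-product attention to the memory-efficient/flash kernels
    with torch.backends.cuda.sdp_kernel(
        enable_flash=True, enable_mem_efficient=True, enable_math=False
    ):
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = model(batch.cuda())  # assumes the model returns its loss
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```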
-
Hi Alex, apologies for the belated reply. I wanted to try out your suggestions first :)

First of all, the idea about using more homogeneous datasets definitely worked, thanks for that! I constructed a classical piano dataset (approx. 3,200 pieces only) and actually managed to train a model on it that performed better in terms of accuracy (~84%) compared to the one trained on 60K piano pieces (~82%). It's strange, I didn't expect a large model to learn effectively from such a small amount of data, but here we are. Though I guess the improvement in accuracy is not only due to the similar(ish) genres of the pieces - I also have more confidence in the quality of those MIDIs.

About feeding the data to the model sequentially: I ran a few tests, and doing it sequentially led to a lot of instability in training (as might be expected in ML generally, afaik). But perhaps you did it differently? I also tried using padding instead of separator tokens, but both approaches exhibited pretty much the same performance.

About REMI. You're absolutely right that an encoding like that can alter a piece. That's why I only used quantized pieces; then REMI changes pretty much nothing (except for the chosen velocity bins, etc., of course). The motivation behind using REMI was to give the model more info about the timing of the piece (bar positions and time signatures), and it does seem to work, as the generated output stays within bars and fits neatly if transcribed into notes. This bit was important for me, as I envision my model as a kind of helper/inspiration in making music, so being able to effortlessly transfer the output into a DAW was very important. Now that I think of it, though, I haven't tried using a more relaxed encoding on quantized input - perhaps it would work just as well and even reduce the number of tokens needed.

And about BPE. You might very well be right. After doing some more training on various models, it does seem that no-BPE may be just as good. Still worth a shot, I think, if you have lots of data and are going for very long sequences.

And finally, I've also been using mixed precision - it's great! Though for my purposes I don't need long sequences.

Happy holidays to you! And thanks again for all this. Vlad
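PS. The two batching strategies mentioned above (separator tokens vs. padding) look roughly like this as a toy sketch; `SEP_ID`, `PAD_ID`, and `MAX_LEN` are illustrative values, not the ones from these experiments:

```python
import torch

SEP_ID, PAD_ID, MAX_LEN = 1, 0, 1024  # illustrative token ids / context size

def pack_with_separators(pieces, max_len=MAX_LEN, sep_id=SEP_ID):
    """Concatenate pieces with a separator token, then chunk into windows."""
    stream = []
    for piece in pieces:
        stream.extend(piece)
        stream.append(sep_id)
    return [stream[i:i + max_len] for i in range(0, len(stream), max_len)]

def pad_each_piece(pieces, max_len=MAX_LEN, pad_id=PAD_ID):
    """One (possibly truncated) piece per row, right-padded to max_len."""
    rows = [p[:max_len] + [pad_id] * max(0, max_len - len(p)) for p in pieces]
    return torch.tensor(rows)
```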
-
@VladPetk I am glad my suggestions were helpful and useful :) If you want to discuss all this further, feel free to write at any time :) Alex
-
Hey Alex,
I've finally gotten the first results from training a solo piano model with byte-pair encoding.
First, thanks for all your work. I've used quite a bit of your code in my project.
I tried out several approaches but settled on using the x-transformers model trained on a REMI-encoded subset of MMD data.
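For reference, this is roughly what such an x-transformers setup looks like; the hyperparameters below are illustrative placeholders, not my exact settings:

```python
# Hedged sketch of an autoregressive decoder built with lucidrains'
# x-transformers; every hyperparameter here is an illustrative placeholder.
import torch
from x_transformers import TransformerWrapper, Decoder, AutoregressiveWrapper

model = TransformerWrapper(
    num_tokens=2000,                      # BPE vocab size (363 without BPE)
    max_seq_len=1024,
    attn_layers=Decoder(dim=512, depth=8, heads=8),
)
model = AutoregressiveWrapper(model)      # adds shift-by-one loss and sampling

tokens = torch.randint(0, 2000, (1, 1024))  # stand-in for REMI/BPE token ids
loss = model(tokens)
loss.backward()
```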
In short, the model with BPE (vocab size of 2000 vs. 363 without BPE) performed better based on my subjective evaluations, i.e., listening to the generated output. The output was generally less confused and just more musical, so to speak. That is despite the BPE model achieving somewhat lower accuracy than the non-BPE one (70% vs. 80%).
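To make the BPE idea concrete, here is a toy, library-free sketch of the classic merge loop over token-id sequences (not the actual implementation I used):

```python
# Toy BPE: repeatedly replace the most frequent adjacent token pair with a
# new id, growing the vocab from `base_vocab` toward `target_vocab`.
from collections import Counter

def most_frequent_pair(seqs):
    pairs = Counter()
    for seq in seqs:
        pairs.update(zip(seq, seq[1:]))
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(seq, pair, new_id):
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_id)  # replace the pair with the merged token
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def learn_bpe(seqs, base_vocab, target_vocab):
    merges, next_id = {}, base_vocab
    while next_id < target_vocab:
        pair = most_frequent_pair(seqs)
        if pair is None:
            break
        merges[pair] = next_id
        seqs = [merge_pair(s, pair, next_id) for s in seqs]
        next_id += 1
    return merges, seqs

# e.g. merges, compressed = learn_bpe(seqs, base_vocab=363, target_vocab=2000)
```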
Also, I found that using REMI encoding (vs. structured, a version of which I believe you use - at least in this repo) performs better in terms of rhythm. That's probably due to it having bar tokens and specifying the relative positions of notes in a bar. Of course, to achieve that I used only quantized MIDIs, which in itself probably also improved the rhythmic structure.
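If anyone wants to try REMI quickly, the miditok library provides it. A hedged sketch (miditok's API has changed between versions, so the names below follow the 2.x style and may not match yours):

```python
# Hedged sketch of REMI tokenization via miditok (2.x-style API).
from miditok import REMI, TokenizerConfig

config = TokenizerConfig(
    use_time_signatures=True,   # emit TimeSig tokens alongside Bar/Position
    num_velocities=32,          # number of velocity bins (illustrative)
)
tokenizer = REMI(config)
tokens = tokenizer("quantized_piece.mid")  # placeholder path to a MIDI file
print(tokens)  # Bar / Position / Pitch / Velocity / Duration tokens
```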
I started creating a repo; it has a more detailed description of my results. I will try to add more output and details to it soon. https://github.com/VladPetk/Piano_music_transformer/tree/main
I was not really interested in creating whole pieces. I was after generating nice ideas/continuations (as I like to dabble in composition in my free time), so a max_seq_len of 1024 was more than enough for me for now. But if you're interested in generating full compositions, using BPE might have an advantage there too, as it effectively compresses the data and you can fit more of it into the same max_seq_len.
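A quick way to estimate that gain, assuming `raw_seqs` and `bpe_seqs` are placeholder lists holding the same pieces tokenized before and after BPE:

```python
# Estimate how much more music fits in a fixed context window after BPE.
raw_total = sum(len(s) for s in raw_seqs)
bpe_total = sum(len(s) for s in bpe_seqs)
ratio = raw_total / bpe_total
print(f"BPE compression: {ratio:.2f}x")
print(f"A 1024-token window now covers ~{1024 * ratio:.0f} base tokens")
```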
Hope this info is helpful!
Vlad