The Best of ISMIR 2023

Fri, Dec 8, 2023 8-minute read

ISMIR is back in full swing with over 400 registrations, most of them in-person attendees in the beautiful city of Milan. Once again, ISMIR proves that it is home to the most incredible research community I’ve witnessed, and I feel so lucky to be part of it.

Overall, this year’s conversation shifted significantly from last year’s. Generative AI was, unsurprisingly, the hottest topic, and that led to several interesting discussions on ethics and the fair use of training data. This is particularly interesting in the music domain, where training data are typically audio tracks whose artists have not necessarily consented to their use for training ML models. Given that the vast majority of artists struggle to make a living, there might be something potentially interesting in having research institutions license some of their music so that artists are better rewarded.

Additionally, there were several interesting approaches in the symbolic domain, a departure from past ISMIRs, where audio-based approaches tended to dominate.

It’s hard to pick the best papers, but here’s my personal top 10 (in no particular order). Here we go, titans!

Best 10 Papers

VampNet: Music Generation Via Masked Acoustic Token Modeling - Flores García et al.

What a fantastic use of discrete tokens for music generation. Not only is the approach highly interesting (one can condition the model to inpaint, change timbre or rhythm, finish/start songs, etc.), but the paper also comes with a fantastically written open-source implementation, a Max patch plugin, and several incredible examples. On top of all of this, Hugo and Bryan gave a fun performance using VampNet during the music program!
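To give a feel for the core mechanism, here is a minimal sketch of masked acoustic token modeling: a transformer learns to predict randomly masked codec tokens. Everything below (model size, vocabulary, masking rate) is a placeholder of my own, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE = 1024     # hypothetical codec vocabulary size
MASK_ID = VOCAB_SIZE  # extra id reserved for the [MASK] token

class MaskedTokenModel(nn.Module):
    """Tiny stand-in for a transformer over discrete codec tokens."""
    def __init__(self, dim=256):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE + 1, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, VOCAB_SIZE)

    def forward(self, tokens):
        return self.head(self.encoder(self.embed(tokens)))

def masked_modeling_loss(model, tokens, mask_ratio=0.8):
    """Mask a random subset of tokens and predict the originals."""
    mask = torch.rand(tokens.shape) < mask_ratio
    corrupted = tokens.masked_fill(mask, MASK_ID)
    logits = model(corrupted)
    # Only the masked positions contribute to the loss.
    return F.cross_entropy(logits[mask], tokens[mask])

model = MaskedTokenModel()
tokens = torch.randint(0, VOCAB_SIZE, (2, 128))  # fake codec token sequences
loss = masked_modeling_loss(model, tokens)
loss.backward()
```

At generation time, inpainting then amounts to keeping the tokens you want to preserve, masking the rest, and iteratively re-predicting them.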

Data Collection In Music Generation Training Sets: A Critical Analysis - Morreale et al.

This paper brings up highly relevant questions in today’s generative AI world, where ethically sourced data are key: do dataset creators have the artists’ consent to use their content for genAI purposes? Did the dataset curators hire musicians or license their music to put together such datasets? Unsurprisingly, after investigating over a hundred datasets used in ISMIR throughout the years, the results show that most were gathered by scraping the Internet, without asking the original artists for any consent. So happy to see that this type of research is finally making a big splash in the ISMIR community. This work won the Universal Music Group best paper award (kind of ironically, given how much the music industry has historically underpaid the actual musicians).

Towards Building A Phylogeny Of Gregorian Chant Melodies - Hajič jr. et al.

I truly loved this one. The authors applied phylogeny (i.e., the analysis of the evolution of protein or nucleic-acid sequences) to Gregorian chant melodies. This makes it possible to trace the small differences that were introduced into these melodies throughout the centuries. They link their findings to the belief systems that the different monks/sects held around that time. Certain factions believed that chants had to be extensively reviewed, as requested by higher powers, while others were more lax about it. The melodies of the latter group show much more variation across the centuries.

MoisesDB: A Dataset For Source Separation Beyond 4-Stems - Pereira et al.

A dataset with stems curated and recorded specifically for source separation research: 240 new tracks representing a total of 14 hours of music. The dataset is offered free of charge for non-commercial purposes. This is the way to go when curating and releasing datasets, good job Moises.AI! You can download it from their research site.

PESTO: Pitch Estimation With Self-supervised Transposition-equivariant Objective - Riou et al.

A completely self-supervised method for pitch estimation (i.e., no labeled data are required for training). To do so, they transpose the signal and feed both versions to their model, which outputs a probability distribution over pitch bins (i.e., pitch estimation framed as a classification problem). They then define a new loss from the two output distributions, minimizing their difference up to the known transposition. They train the model in a siamese manner to minimize this new “transposition-equivariant” loss, and they yield results very close to SOTA (i.e., CREPE), but without using any labeled data. Very impressive! Winner of the best paper award 🎉.
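As a rough illustration of the equivariance idea (a toy version of my own, not the authors' exact architecture or loss): if the input is transposed by k bins along a log-frequency axis, the predicted pitch distribution should be the original prediction shifted by those same k bins.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PitchNet(nn.Module):
    """Toy model: maps a CQT frame to a distribution over pitch bins."""
    def __init__(self, n_bins=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_bins, 256), nn.ReLU(), nn.Linear(256, n_bins)
        )

    def forward(self, x):
        return F.log_softmax(self.net(x), dim=-1)

def equivariance_loss(model, cqt_frames, shift):
    """Transposing the input by `shift` log-frequency bins should shift
    the predicted pitch distribution by the same amount."""
    transposed = torch.roll(cqt_frames, shifts=shift, dims=-1)
    log_p = model(cqt_frames)
    log_p_transposed = model(transposed)
    # Siamese comparison: prediction for the transposed input vs. the
    # shifted prediction for the original (detached as a stop-gradient,
    # a design choice of this sketch to discourage trivial collapse).
    target = torch.roll(log_p, shifts=shift, dims=-1).exp().detach()
    return F.kl_div(log_p_transposed, target, reduction="batchmean")

model = PitchNet()
frames = torch.randn(4, 128)  # fake batch of CQT frames
loss = equivariance_loss(model, frames, shift=3)
loss.backward()
```

Rolling the CQT frame is a crude stand-in for pitch shifting (it wraps around the frequency axis), but it conveys why no pitch labels are needed: the transposition itself provides the supervision.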

A Review of Validity and Its Relationship to Music Information Research - Sturm & Flexer

Time to leave your horses behind and jump on the causal inference wagon. Sturm & Flexer put a spin on the classic book on causal inference so that it becomes more approachable and familiar to the MIR community. I really enjoy this type of accessible, outreach-style work that tries to bridge fields together. The MIR community has a long history of trying to validate its often hard-to-validate results (e.g., what to do when a musical attribute can be annotated in several different ways, all of them valid?). This might be a step towards a more formal approach to doing so.

LyricWhiz: Robust Multilingual Lyrics Transcription by Whispering to ChatGPT - Zhou et al.

Finally, a lyrics transcriber that seems to work well. The authors leverage two key models from OpenAI: Whisper (for transcription) and ChatGPT (for correction/supervision of the transcription). To do so, they introduce several little tricks to transcribe and then correct the actual lyrics, involving querying Whisper several times and prompting ChatGPT with rather sophisticated queries. Smart to use robots to clean up the mess from other robots! They show that this method yields SOTA results. Finally, they also introduce MulJam, a dataset of close to 400h of music for multilingual lyrics transcription. The dataset and the code to reproduce the experiments can be found here.
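The gist of the pipeline, as a minimal sketch of my own (built on the open-source whisper package and the OpenAI chat API; the prompt, model choices, and parameters are placeholders, not the ones from the paper):

```python
import whisper
from openai import OpenAI

def transcribe_lyrics(audio_path, n_runs=3):
    """Run Whisper several times, then ask a chat model to reconcile
    the candidate transcriptions into one set of lyrics."""
    model = whisper.load_model("large")
    # Sampling at nonzero temperature gives diverse candidate transcripts.
    candidates = [
        model.transcribe(audio_path, temperature=0.4)["text"]
        for _ in range(n_runs)
    ]

    prompt = (
        "Below are several noisy transcriptions of the same song's lyrics. "
        "Merge them into the single most plausible version, fixing obvious "
        "recognition errors:\n\n" + "\n---\n".join(candidates)
    )
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(transcribe_lyrics("song.mp3"))
```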

The Games We Play: Exploring The Impact of ISMIR on Musicology

Do ISMIR papers influence musicology? 100+ ISMIR papers claim to be “useful” to the musicological community. This work investigates whether that is really true, and it turns out… it isn’t. Beyond pseudo-self-citations, the authors could not find a single example of an “independent” (that is, not self-cited) musicological application of the last 10 years of ISMIR work in traditional musicological venues! So happy to see this type of paper at ISMIR. It is important to challenge the ISMIR community and spark discussions on how to better bridge MIR with related fields like musicology.

CLaMP: Contrastive Language-Music Pre-Training for Cross-Modal Symbolic Music Information Retrieval

Fantastic work applying contrastive learning to the analysis of symbolic music. Given that symbolic music is much easier to tokenize than actual audio samples, contrastive learning between text and symbolic music fits perfectly well. On top of that, they introduce tricks to make the training more effective and efficient: bar patching and masking during pre-training. Finally, they do all of that with the help of a dataset of 1.4M music-text pairs (which can be downloaded from Hugging Face). This work won the best student paper award.
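For intuition, here is a minimal sketch of the CLIP-style symmetric contrastive objective that underlies this kind of text-music pre-training (the random embeddings stand in for the outputs of a text encoder and a symbolic-music encoder; this is not CLaMP's actual architecture):

```python
import torch
import torch.nn.functional as F

def contrastive_loss(text_emb, music_emb, temperature=0.07):
    """CLIP-style symmetric InfoNCE: matching text/music pairs should be
    more similar than all mismatched pairs within the batch."""
    text_emb = F.normalize(text_emb, dim=-1)
    music_emb = F.normalize(music_emb, dim=-1)
    logits = text_emb @ music_emb.t() / temperature  # cosine similarities
    targets = torch.arange(len(logits))              # i-th text <-> i-th music
    loss_t2m = F.cross_entropy(logits, targets)
    loss_m2t = F.cross_entropy(logits.t(), targets)
    return (loss_t2m + loss_m2t) / 2

# Fake batch of 8 paired embeddings from the two encoders.
text_emb = torch.randn(8, 512, requires_grad=True)
music_emb = torch.randn(8, 512, requires_grad=True)
loss = contrastive_loss(text_emb, music_emb)
loss.backward()
```

The symmetric formulation pulls matching text-music pairs together from both directions, which is what later enables cross-modal retrieval either way (text-to-music and music-to-text).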

SingStyle111: A Multilingual Singing Dataset With Style Transfer - Dai et al.

Finally, a dataset specifically focused on singing style transfer! I think this can really help fine-tune generation models to learn different styles when synthesizing singing voices. The dataset (available here) contains over 12h of audio, with over 100 songs and 8 different singers. The only thing I miss is some death metal singing, to make the dream of converting pitched vocals into Opeth-like screams come true \m/

Conclusion

ISMIR is home to such an incredibly talented set of researchers. One of the things I truly love about this community is its passionate and collaborative nature. As opposed to other huge communities like Computer Vision, general Machine Learning, or Audio Processing, there is no apparent hierarchy at ISMIR. Big names in the field talk to newcomers in a completely flat manner, which helps make everyone feel included and considered. On top of all of that, our pure love for music is apparent, surfacing in every aspect of the conference: from the keynotes (all three of which were fantastic) to the music program, passing through the satellite events and the serendipitous conversations with attendees. Watching some of the biggest names in the field playing their instruments masterfully alongside one of the keynote speakers during the jam session was pretty mind-blowing and inspiring!

Despite one of the General Chairs (Augusto Sarti) having suffered a serious health issue, and Fabio Antonacci being a newcomer to the community, this was an incredible ISMIR. Everything ran really smoothly, from the banquet to the technical sessions, passing through the jam session! A word on that: some of us were a bit worried that there was not a lot of information about the ISMIR jam session this year (or rather, the available information was very different from what the ISMIR community is used to). So the community stepped up and, after awesome performances by Mark d’Inverno and Joey Stuckey, Matthias Mauch ended up leading a really fun jam session where I had the pleasure to scream for a little bit :)

Overall, congratulations to the organizers, volunteers, authors, and all participants for making ISMIR 2023 an unforgettable one. Hope to see you all in San Francisco next year!