Publications

For further stats and details, check out my Google Scholar Profile.

Peer Reviewed Articles

"AudioChat: Unified Audio Storytelling, Editing, and Understanding with Transfusion Forcing", William Chen, Prem Seetharaman, Rithesh Kumar, Oriol Nieto, Shinji Watanabe, Justin Salamon, Zeyu Jin, (under review), 2026. arXiv Demo
"TAC: Timestamped Audio Captioning", Sonal Kumar, Prem Seetharaman, Ke Chen, Oriol Nieto, Jiaqi Su, Zhepei Wang, Rithesh Kumar, Dinesh Manocha, Nicholas J. Bryan, Zeyu Jin, Justin Salamon, (under review), 2026. arXiv Demo
"Generative Audio Extension and Morphing", Prem Seetharaman,* Oriol Nieto,* Justin Salamon, Proc. of the 51st International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Barcelona, Spain, 2026. arXiv Demo
"Mix2Morph: Learning Sound Morphing From Noisy Mixes", Annie Chu, Hugo Flores García, Oriol Nieto, Justin Salamon, Bryan Pardo, Prem Seetharaman, Proc. of the 51st International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Barcelona, Spain, 2026. arXiv Demo
"PromptSep: Generative Audio Separation Via Multimodal Prompting", Yutong Wen, Ke Chen, Prem Seetharaman, Oriol Nieto, Jiaqi Su, Rithesh Kumar, Minje Kim, Paris Smaragdis, Zeyu Jin, Justin Salamon, Proc. of the 51st International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Barcelona, Spain, 2026. arXiv Demo
"AudioCards: Structured Metadata Improves Audio Language Models For Sound Design", Sripathi Sridhar, Prem Seetharaman, Oriol Nieto, Mark Cartwright, Justin Salamon, Proc. of the 51st International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Barcelona, Spain, 2026. arXiv Demo
"Multi-Domain Audio Question Answering Benchmark Toward Acoustic Content Reasoning", Chao-Han Huck Yang, Sreyan Ghosh, Qing Wang, Jaeyeon Kim, Hengyi Hong, Sonal Kumar, Guirui Zhong, Zhifeng Kong, S. Sakshi, Vaibhavi Lokegaonkar, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha, Gunhee Kim, Jun Du, Rafael Valle, Bryan Catanzaro, Proc. of the 51st International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Barcelona, Spain, 2026. arXiv
"SoundStager: Interactive Design of Story-Driven GenAI Soundscapes for Video", Suhyeon Yoo, Adolfo Hernandez-Sebastian, Prem Seetharaman, Justin Salamon, Oriol Nieto, Anh Truong, Proc. of the ACM Conference on Human Factors in Computing Systems (CHI). Barcelona, Spain, 2026. PDF Video
"SILA: Signal-to-Language Augmentation for Enhanced Control in Text-to-Audio Generation", Sonal Kumar, Prem Seetharaman, Justin Salamon, Dinesh Manocha, Oriol Nieto, Proc. of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). Tahoe City, CA, USA, 2025. arXiv Demo
"FLAM: Frame-Wise Language-Audio Modeling", Yusong Wu, Christos Tsirigotis, Ke Chen, Cheng-Zhi Anna Huang, Aaron Courville, Oriol Nieto, Prem Seetharaman, Justin Salamon, Proc. of the 42nd International Conference on Machine Learning (ICML). Vancouver, BC, Canada, 2025. arXiv Code Demo
"Video-Guided Foley Sound Generation with Multimodal Controls", Ziyang Chen, Prem Seetharaman, Bryan Russell, Oriol Nieto, David Bourgin, Andrew Owens, Justin Salamon, Proc. of the IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR). Nashville, TN, USA, 2025. arXiv Demo
"Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs", Sreyan Ghosh, Chandra Kiran Reddy Evuru, Sonal Kumar, Utkarsh Tyagi, Oriol Nieto, Zeyu Jin, Dinesh Manocha, Proc. of the 13th International Conference on Learning Representations (ICLR). Singapore, 2025. arXiv Demo
🏆 Top 5.1% conference paper (spotlighted)
"MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark", S. Sakshi, Utkarsh Tyagi, Sonal Kumar, Ashish Seth, Ramaneswaran Selvakumar, Oriol Nieto, Ramani Duraiswami, Sreyan Ghosh, Dinesh Manocha, Proc. of the 13th International Conference on Learning Representations (ICLR). Singapore, 2025. arXiv Code Demo
"Sketch2Sound: Controllable Audio Generation via Time-Varying Signals and Sonic Imitations", Hugo Flores García, Oriol Nieto, Justin Salamon, Bryan Pardo, Prem Seetharaman, Proc. of the 50th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Hyderabad, India, 2025. arXiv Demo
"ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds", Sreyan Ghosh, Sonal Kumar, Chandra Kiran Reddy Evuru, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha, Proc. of the 50th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Hyderabad, India, 2025. arXiv Code
"Augment, Drop & Swap: Improving Diversity in LLM Captions for Efficient Music-Text Representation Learning", Ilaria Manco, Justin Salamon, Oriol Nieto, Proc. of the 25th International Society for Music Information Retrieval Conference (ISMIR). San Francisco, CA, USA, 2024. arXiv
🏆 Top 5% conference paper (oral)
"GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities", Sreyan Ghosh, Sonal Kumar, Ashish Seth, Chandra Kiran Reddy Evuru, Utkarsh Tyagi, S. Sakshi, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha, Proc. of the 19th Empirical Methods in Natural Language Processing Conference (EMNLP). Miami, Florida, USA, 2024. arXiv Code Demo
"CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models", Sreyan Ghosh, Ashish Seth, Sonal Kumar, Utkarsh Tyagi, Chandra Kiran Evuru, S. Ramaneswaran, S. Sakshi, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha, Proc. of the 12th International Conference on Learning Representations (ICLR). Vienna, Austria, 2024. arXiv Code Demo
"Bridging High-Quality Audio and Video via Language for Sound Effects Retrieval from Visual Queries", Julia Wilkins, Justin Salamon, Magdalena Fuentes, Juan Pablo Bello, Oriol Nieto, Proc. of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). New Paltz, NY, USA, 2023. arXiv Demo
"Efficient Spoken Language Recognition via Multilabel Classification", Oriol Nieto, Zeyu Jin, Franck Dernoncourt, Justin Salamon, Proc. of the 24th InterSpeech Conference. Dublin, Ireland, 2023. arXiv
🏆 Top 10% conference paper (highlighted)
"Language-Guided Audio-Visual Source Separation via Trimodal Consistency", Reuben Tan, Andrea Burns, Arijit Ray, Bryan A. Plummer, Oriol Nieto, Justin Salamon, Bryan Russell, Kate Saenko, Proc. of the IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR). Vancouver, BC, Canada, 2023. arXiv Code
"Audio-Text Models Do Not Yet Leverage Natural Language", Ho-Hsiang Wu, Oriol Nieto, Juan Pablo Bello, Justin Salamon, Proc. of the 48th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Rhodes, Greece, 2023. arXiv
"Music Enhancement Via Image Translation and Vocoding", Nikhil Kandpal, Oriol Nieto, Zeyu Jin, Proc. of the 47th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Singapore, 2022. arXiv Code
"Deep Embeddings and Section Fusion Improve Music Segmentation", Justin Salamon, Oriol Nieto, Nicholas J. Bryan, Proc. of the 22nd International Society for Music Information Retrieval Conference (ISMIR), pp. 594-601, 2021. PDF
"Multimodal Metric Learning for Tag-Based Music Retrieval", Minz Won, Sergio Oramas, Oriol Nieto, Fabien Gouyon, Xavier Serra, Proc. of the 46th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Toronto, Canada, 2021. arXiv
"Audio-Based Music Structure Analysis: Current Trends, Open Challenges, and Applications", Oriol Nieto, Gautham J. Mysore, Cheng-i Wang, Jordan B. L. Smith, Jan Schlüter, Thomas Grill, Brian McFee, Transactions of the International Society for Music Information Retrieval (TISMIR), 3(1), pp. 246-263, 2020. DOI: 10.5334/tismir.54. PDF
"Mood Classification Using Listening Data", Filip Korzeniowski, Oriol Nieto, Matthew McCallum, Minz Won, Sergio Oramas, Erik Schmidt, Proc. of the 21st International Society for Music Information Retrieval Conference (ISMIR). Montreal, Quebec, Canada, 2020. arXiv
"Data-Driven Harmonic Filters For Audio Representation Learning", Minz Won, Sanghyuk Chun, Oriol Nieto, Xavier Serra, Proc. of the 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Barcelona, Spain, 2020. PDF
"The Harmonix Set: Beats, Downbeats, and Functional Segment Annotations of Western Popular Music", Oriol Nieto, Matthew McCallum, Matthew Davies, Andrew Robertson, Adam Stark, Eran Egozy, Proc. of the 20th International Society for Music Information Retrieval Conference (ISMIR). Delft, The Netherlands, 2019. PDF Code
"Investigating Musical Pattern Ambiguity in a Human Annotated Dataset", Iris Yuping Ren, Oriol Nieto, Hendrik Vincent Koops, Anja Volk, Wouter Swierstra, Proc. of the 15th International Conference on Music Perception and Cognition (ICMPC). Graz, Austria, 2018. PDF
🏆 Best Student Paper
"End-to-End Learning for Music Audio Tagging at Scale", Jordi Pons, Oriol Nieto, Matthew Prockup, Erik Schmidt, Andreas Ehmann, Xavier Serra, Proc. of the 19th International Society for Music Information Retrieval Conference (ISMIR). Paris, France, 2018. arXiv
"Multimodal Deep Learning for Music Genre Classification", Sergio Oramas, Francesco Barbieri, Oriol Nieto, Xavier Serra, Transactions of the International Society for Music Information Retrieval (TISMIR), 2018. arXiv
"Predicting Audio Advertisement Quality", Samaneh Ebrahimi, Hossein Vahabi, Matthew Prockup, Oriol Nieto, Proc. of the 11th ACM International Conference on Web Search and Data Mining (WSDM), 2018. arXiv
"A Deep Multimodal Approach for Cold-start Music Recommendation", Sergio Oramas, Oriol Nieto, Mohamed Sordo, Xavier Serra, Proc. of the 2nd Workshop on Deep Learning for Recommender Systems (DLRS), at RecSys. Como, Italy, 2017. arXiv
"Evaluating Hierarchical Structure in Music Annotations", Brian McFee, Oriol Nieto, Morwaread M. Farbood, Juan Pablo Bello, Frontiers in Psychology, 8, 2017. DOI: 10.3389/fpsyg.2017.01337. PDF
🏆 Best Presentation
"Multi-label Music Genre Classification from Audio, Text, and Images Using Deep Features", Sergio Oramas, Oriol Nieto, Francesco Barbieri, Xavier Serra, Proc. of the 18th International Society for Music Information Retrieval Conference (ISMIR). Suzhou, China, 2017. arXiv
"Systematic Exploration of Computational Music Structure Research", Oriol Nieto, Juan Pablo Bello, Proc. of the 17th International Society for Music Information Retrieval Conference (ISMIR). New York City, NY, USA, 2016. PDF Code
"Hierarchical Evaluation of Segment Boundary Detection", Brian McFee, Oriol Nieto, Juan Pablo Bello, Proc. of the 16th International Society for Music Information Retrieval Conference (ISMIR). Málaga, Spain, 2015. PDF
"librosa: Audio and Music Signal Analysis in Python", Brian McFee, Colin Raffel, Dawen Liang, Daniel P. W. Ellis, Matt McVicar, Eric Battenberg, Oriol Nieto, Proc. of the 14th Python in Science Conference (SciPy). Austin, TX, USA, 2015. PDF
"Music Segment Similarity Using 2D-Fourier Magnitude Coefficients", Oriol Nieto, Juan Pablo Bello, Proc. of the 39th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Florence, Italy, 2014. PDF
🏆 Best Poster Presentation
"MIR_EVAL: A Transparent Implementation of Common MIR Metrics", Colin Raffel, Brian McFee, Eric J. Humphrey, Justin Salamon, Oriol Nieto, Dawen Liang, Daniel P. W. Ellis, Proc. of the 15th International Society for Music Information Retrieval Conference (ISMIR). Taipei, Taiwan, 2014. PDF
"Identifying Polyphonic Patterns from Audio Recordings Using Music Segmentation Techniques", Oriol Nieto, Morwaread M. Farbood, Proc. of the 15th International Society for Music Information Retrieval Conference (ISMIR). Taipei, Taiwan, 2014. PDF
"Embodying Theoretical Research in Music Cognition: Four Proposals for Theory-Driven Experimentation", Andreu Ballús, Eric Arnau, Oriol Nieto, Frederic Font, Alba G. Torrents, Proc. of the Annual Meeting of the Cognitive Science Society. Quebec City, Quebec, Canada, 2014. PDF
"Convex Non-Negative Matrix Factorization for Automatic Music Structure Identification", Oriol Nieto, Tristan Jehan, Proc. of the 38th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Vancouver, BC, Canada, 2013. PDF
"Data Driven and Discriminative Projections for Large-Scale Cover Song Identification", Eric J. Humphrey, Oriol Nieto, Juan Pablo Bello, Proc. of the 14th International Society for Music Information Retrieval Conference (ISMIR). Curitiba, Brazil, 2013. PDF
"Unsupervised Clustering of Extreme Vocal Effects", Oriol Nieto, Proc. of the 10th International Conference on Advances in Quantitative Laryngology, Voice and Speech Research (AQL). Cincinnati, OH, USA, 2013. PDF
"Fortissimo: Force-Feedback for Mobile Devices", Tae Hong Park, Oriol Nieto, Proc. of the 13th International Conference on New Interfaces for Musical Expression (NIME). Daejeon and Seoul, Korea, 2013. PDF
"Even More Tactile Feedback for Mobile Devices", Tae Hong Park, Langdon Crawford, Oriol Nieto, Proc. of the 39th International Computer Music Conference (ICMC). Perth, Australia, 2013. PDF
"Perceptual Evaluation of Automatically Extracted Musical Motives", Oriol Nieto, Morwaread M. Farbood, Proc. of the 12th International Conference on Music Perception and Cognition (ICMPC), pp. 723-727. Thessaloniki, Greece, 2012. PDF
"Compressing Music Recordings Into Audio Summaries", Oriol Nieto, Eric J. Humphrey, Juan Pablo Bello, Proc. of the 13th International Society for Music Information Retrieval Conference (ISMIR), pp. 313-318. Porto, Portugal, 2012. PDF

Algorithms

"MIREX 2016 Entry: MSAF V0.1.0 Submission", Oriol Nieto, Music Information Retrieval Evaluation eXchange (MIREX). New York City, NY, USA, 2016. PDF Code
"MIREX 2014 Entry: 2D Fourier Magnitude Coefficients", Oriol Nieto, Juan Pablo Bello, Music Information Retrieval Evaluation eXchange (MIREX). Taipei, Taiwan, 2014. PDF Code
"MIREX 2014 Entry: Music Segmentation Techniques and Greedy Path Finder Algorithm to Discover Musical Patterns", Oriol Nieto, Morwaread M. Farbood, Music Information Retrieval Evaluation eXchange (MIREX). Taipei, Taiwan, 2014. PDF Code
"MIREX 2014 Entry: Convex Non-negative Matrix Factorization", Oriol Nieto, Tristan Jehan, Music Information Retrieval Evaluation eXchange (MIREX). Taipei, Taiwan, 2014. PDF Code
"MIREX 2013: Discovering Musical Patterns Using Audio Structural Segmentation Techniques", Oriol Nieto, Morwaread M. Farbood, Music Information Retrieval Evaluation eXchange (MIREX). Curitiba, Brazil, 2013. PDF

Theses

"Discovering Structure in Music: Automatic Approaches and Perceptual Evaluations", Oriol Nieto, New York University. PhD Dissertation, 2015. PDF Slides Video
"Voice Transformations for Extreme Vocal Effects", Oriol Nieto, Pompeu Fabra University. Master's Thesis, 2008. PDF
"Desenvolupament Open Source per a E-Learning-II", Oriol Nieto, Polytechnic University of Catalonia. Undergraduate Thesis, 2007. PDF

Selected Talks

"Project Sound Stager", Oriol Nieto, Adobe MAX Sneaks 2025. Los Angeles, CA, USA, 2025. Video
"GenAI for Sound Design", Oriol Nieto, Conversational AI Reading Group at Mila. Montreal, Quebec, Canada, 2025. Video
"Overview, Challenges, and Applications of Audio-based Music Structure Analysis", Oriol Nieto, Women in Music Information Retrieval Workshop (ISMIR). Virtual, 2021. Slides
"Music Recommendation with Waveform-based Architectures", Oriol Nieto, 4th Global AI Conference. Santa Clara, CA, USA, 2020. Slides
"Spectral Analysis and Detection of Extreme Vocal Effects (with CNNs)", Oriol Nieto, Research Seminar. Universitat Pompeu Fabra. Barcelona, Spain, 2019. Slides
"Spectral Analysis and Detection of Extreme Vocal Effects", Oriol Nieto, 2nd International Symposium on Distorted Voices. São Paulo, Brazil, 2019. Slides
"Recommending Music with Waveform Architectures at Scale (Extended Version)", Oriol Nieto, Seminar Series in Data Science. University of San Francisco. San Francisco, CA, USA, 2019. Slides
"Recommending Music with Waveform Architectures at Scale", Oriol Nieto, Deep Learning Barcelona Symposium. Pompeu Fabra University. Barcelona, Spain, 2018. Slides Video
"Cold-Start Music Recommendation Using Multimodal Deep Architectures", Oriol Nieto, Systematic Approaches to Deep Learning Methods for Audio. Erwin Schrödinger Institute, University of Vienna. Vienna, Austria, 2017. PDF
"Long Tail Music Recommendation Using Deep Architectures", Oriol Nieto, International Workshop on Deep Learning for Music (IJCNN). Anchorage, AK, USA, 2017. PDF
"Deep Learning for Large-Scale Music Recommendation", Oriol Nieto, Data-Driven Research in Music Cognition. Stanford University. Stanford, CA, USA, 2017. PDF
"Deep Learning for Music Recommendation: Machine Listening and Collaborative Filtering", Oriol Nieto, Seminar on Music Knowledge Extraction Using Machine Learning. Pompeu Fabra University. Barcelona, Spain, 2016. PDF
"Deep Learning for Large Scale Music Recommendation", Oriol Nieto, Biostat Seminar. Stanford University. Stanford, CA, USA, 2016. PDF
"Multiple Annotations and Subjectivity in the Identification of Segment Boundaries in Music", Oriol Nieto, Morwaread M. Farbood, Cognitive Music Information Retrieval (CogMIR). Toronto, ON, Canada, 2014. PDF
"Music Segment Similarity Using 2D-Fourier Magnitude Coefficients", Oriol Nieto, Juan Pablo Bello, North East Music Information Special Interest Group (NEMISIG). New York, NY, USA, 2014. PDF
"A Perceptually Based Evaluation of Music Boundaries", Oriol Nieto, Morwaread M. Farbood, Juan Pablo Bello, Cognitive Music Information Retrieval (CogMIR). Toronto, ON, Canada, 2013. PDF
"Music Structure Analysis and New Musical Interfaces", Oriol Nieto, Pompeu Fabra University. Barcelona, Spain, 2013. PDF
"Music Structure Analysis by Matrix Factorization", Oriol Nieto, Tristan Jehan, North East Music Information Special Interest Group (NEMISIG). Boston, MA, USA, 2013. PDF

Music

"La Bossa d'Urina: El Primer Disc", Daniel Bolsa, Oriol Nieto, Published by Record Union, 2022. Pandora Spotify Amazon
"Rumbahía: Casi al Compás", Luis Carlos Cobo, Javier Cardona, Erin E. Grant, Diego Melendo, Oriol Nieto, Published by CDBaby, 2021. Pandora Spotify Amazon
"Rumbahía: Aprendiendo", Luis Carlos Cobo, Javier Cardona, Jess Gallegos, Diego Melendo, Oriol Nieto, Published by CDBaby, 2019. Pandora Spotify Amazon
"La Bossa d'Urina: Merda Fina", Daniel Bolsa, Oriol Nieto, Published by Record Union, 2018. Pandora iTunes Spotify Amazon
"Arkaen: Arkaen", Oriol Nieto, Sean Henson, Joey Nuñez, Eli Remas, Garey Rickher, Published by Record Union, 2017. Pandora iTunes Spotify Amazon
"La Bossa d'Urina: La Bossa d'Urina", Daniel Bolsa, Oriol Nieto, Published by Cydonia Records, 2015. Pandora iTunes Spotify Amazon
"Sargon: Vida", Carles Ferreiro, Jordi Llobet, Oriol Nieto, Marc Prim, Published by Weight Recordings, 2009. Pandora iTunes Spotify Amazon
"Sargon: Transcriptions", Carles Ferreiro, Jordi Llobet, Oriol Nieto, Marc Prim, Published by Big Bang Records, 2005. iTunes Spotify

Other

"Automatic Music Tagging with Harmonic CNN", Minz Won, Sanghyuk Chun, Oriol Nieto, Xavier Serra, Late Breaking Session of the International Society for Music Information Retrieval Conference (ISMIR). Delft, The Netherlands, 2019. PDF
"MSAF: Music Structure Analysis Framework", Oriol Nieto, Juan Pablo Bello, International Society for Music Information Retrieval Conference (ISMIR). Málaga, Spain, 2015. PDF
"2013 Late Break Session on Music Segmentation", Oriol Nieto, Jordan B. L. Smith, Proc. of the 14th International Society for Music Information Retrieval Conference (ISMIR). Curitiba, Brazil, 2013. PDF
"Late-break Session on Music Structure Analysis", Bruno Rocha, Jordan B. L. Smith, Geoffroy Peeters, Joe Cheri Ross, Oriol Nieto, Jan Van Balen, Proc. of the 13th International Society for Music Information Retrieval Conference (ISMIR). Porto, Portugal, 2012. PDF
"Sistemas Operativos: Cuaderno de Laboratorio", Oriol Nieto, Àlex Pajuelo, David López, Amador Millan, A. Heredero, Alex Duran, José R. Herrero, Xavier Verdú, Yolanda Becerra, Enric Morancho, Department of Computer Architecture. Polytechnic University of Catalonia, 2007.

Oriol (Uri) Nieto
