Publications
For further stats and details, check out my Google Scholar Profile.
Peer Reviewed Articles
- "AudioChat: Unified Audio Storytelling, Editing, and Understanding with Transfusion Forcing", (under review), 2026. arXiv Demo
- "TAC: Timestamped Audio Captioning", (under review), 2026. arXiv Demo
- "Generative Audio Extension and Morphing", Proc. of the 51st International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Barcelona, Spain, 2026. arXiv Demo
- "Mix2Morph: Learning Sound Morphing From Noisy Mixes", Proc. of the 51st International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Barcelona, Spain, 2026. arXiv Demo
- "PromptSep: Generative Audio Separation Via Multimodal Prompting", Proc. of the 51st International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Barcelona, Spain, 2026. arXiv Demo
- "AudioCards: Structured Metadata Improves Audio Language Models For Sound Design", Proc. of the 51st International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Barcelona, Spain, 2026. arXiv Demo
- "Multi-Domain Audio Question Answering Benchmark Toward Acoustic Content Reasoning", Proc. of the 51st International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Barcelona, Spain, 2026. arXiv
- "SoundStager: Interactive Design of Story-Driven GenAI Soundscapes for Video", Proc. of the ACM Conference on Human Factors in Computing Systems (CHI). Barcelona, Spain, 2026. PDF Video
- "SILA: Signal-to-Language Augmentation for Enhanced Control in Text-to-Audio Generation", Proc. of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). Tahoe City, CA, USA, 2025. arXiv Demo
- "FLAM: Frame-Wise Language-Audio Modeling", Proc. of the 42nd International Conference on Machine Learning (ICML). Vancouver, BC, Canada, 2025. arXiv Code Demo
- "Video-Guided Foley Sound Generation with Multimodal Controls", The IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR). Nashville, TN, USA, 2025. arXiv Demo
- "Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs", Proc. of the 13th International Conference on Learning Representations (ICLR). Singapore, 2025. arXiv Demo
- 🏆 Top 5.1% conference paper (spotlighted): "MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark", Proc. of the 13th International Conference on Learning Representations (ICLR). Singapore, 2025. arXiv Code Demo
- "Sketch2Sound: Controllable Audio Generation via Time-Varying Signals and Sonic Imitations", Proc. of the 50th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Hyderabad, India, 2025. arXiv Demo
- "ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds", Proc. of the 50th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Hyderabad, India, 2025. arXiv Code
- "Augment, Drop & Swap: Improving Diversity in LLM Captions for Efficient Music-Text Representation Learning", Proc. of the 25th International Society for Music Information Retrieval Conference (ISMIR). San Francisco, CA, USA, 2024. arXiv
- 🏆 Top 5% conference paper (oral): "GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities", Proc. of the 19th Empirical Methods in Natural Language Processing Conference (EMNLP). Miami, Florida, USA, 2024. arXiv Code Demo
- "CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models", Proc. of the 12th International Conference on Learning Representations (ICLR). Vienna, Austria, 2024. arXiv Code Demo
- "Bridging High-Quality Audio and Video via Language for Sound Effects Retrieval from Visual Queries", Proc. of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). New Paltz, NY, USA, 2023. arXiv Demo
- "Efficient Spoken Language Recognition via Multilabel Classification", Proc. of the 24th InterSpeech Conference. Dublin, Ireland, 2023. arXiv
- 🏆 Top 10% conference paper (highlighted): "Language-Guided Audio-Visual Source Separation via Trimodal Consistency", Proc. of the IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR). Vancouver, BC, Canada, 2023. arXiv Code
- "Audio-Text Models Do Not Yet Leverage Natural Language", Proc. of the 48th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Rhodes, Greece, 2023. arXiv
- "Music Enhancement Via Image Translation and Vocoding", Proc. of the 47th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Singapore, 2022. arXiv Code
- "Deep Embeddings and Section Fusion Improve Music Segmentation", Proc. of the 22nd International Society for Music Information Retrieval Conference (ISMIR), pp. 594-601, 2021. PDF
- "Multimodal Metric Learning for Tag-Based Music Retrieval", Proc. of the 46th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Toronto, Canada, 2021. arXiv
- "Audio-Based Music Structure Analysis: Current Trends, Open Challenges, and Applications", Transactions of the International Society for Music Information Retrieval (TISMIR), 3(1), pp. 246-263, 2020. DOI: 10.5334/tismir.54. PDF
- "Mood Classification Using Listening Data", Proc. of the 21st International Society for Music Information Retrieval Conference (ISMIR). Montreal, Quebec, Canada, 2020. arXiv
- "Data-Driven Harmonic Filters For Audio Representation Learning", Proc. of the 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Barcelona, Spain, 2020. PDF
- "The Harmonix Set: Beats, Downbeats, and Functional Segment Annotations of Western Popular Music", Proc. of the 20th International Society for Music Information Retrieval Conference (ISMIR). Delft, The Netherlands, 2019. PDF Code
- "Investigating Musical Pattern Ambiguity in a Human Annotated Dataset", Proc. of the 15th International Conference on Music Perception and Cognition (ICMPC). Graz, Austria, 2018. PDF
- 🏆 Best Student Paper: "End-to-End Learning for Music Audio Tagging at Scale", Proc. of the 19th International Society for Music Information Retrieval Conference (ISMIR). Paris, France, 2018. arXiv
- "Multimodal Deep Learning for Music Genre Classification", Transactions of the International Society for Music Information Retrieval (TISMIR), 2018. arXiv
- "Predicting Audio Advertisement Quality", Proc. of the 11th ACM International Conference on Web Search and Data Mining (WSDM), 2018. arXiv
- "A Deep Multimodal Approach for Cold-start Music Recommendation", Proc. of the 2nd Workshop on Deep Learning for Recommender Systems (DLRS), at RecSys. Como, Italy, 2017. arXiv
- "Evaluating Hierarchical Structure in Music Annotations", Frontiers in Psychology, 8, 2017. DOI: 10.3389/fpsyg.2017.01337. PDF
- 🏆 Best Presentation: "Multi-label Music Genre Classification from Audio, Text, and Images Using Deep Features", Proc. of the 18th International Society for Music Information Retrieval Conference (ISMIR). Suzhou, China, 2017. arXiv
- "Systematic Exploration of Computational Music Structure Research", Proc. of the 17th International Society for Music Information Retrieval Conference (ISMIR). New York City, NY, USA, 2016. PDF Code
- "Hierarchical Evaluation of Segment Boundary Detection", Proc. of the 16th International Society for Music Information Retrieval Conference (ISMIR). Málaga, Spain, 2015. PDF
- "librosa: Audio and Music Signal Analysis in Python", Proc. of the 14th Python in Science Conference (SciPy). Austin, TX, USA, 2015. PDF
- "Music Segment Similarity Using 2D-Fourier Magnitude Coefficients", Proc. of the 39th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Florence, Italy, 2014. PDF
- 🏆 Best Poster Presentation: "MIR_EVAL: A Transparent Implementation of Common MIR Metrics", Proc. of the 15th International Society for Music Information Retrieval Conference (ISMIR). Taipei, Taiwan, 2014. PDF
- "Identifying Polyphonic Patterns from Audio Recordings Using Music Segmentation Techniques", Proc. of the 15th International Society for Music Information Retrieval Conference (ISMIR). Taipei, Taiwan, 2014. PDF
- "Embodying Theoretical Research in Music Cognition: Four Proposals for Theory-Driven Experimentation", Proc. of the Annual Meeting of the Cognitive Science Society. Quebec City, Quebec, Canada, 2014. PDF
- "Convex Non-Negative Matrix Factorization for Automatic Music Structure Identification", Proc. of the 38th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Vancouver, BC, Canada, 2013. PDF
- "Data Driven and Discriminative Projections for Large-Scale Cover Song Identification", Proc. of the 14th International Society for Music Information Retrieval Conference (ISMIR). Curitiba, Brazil, 2013. PDF
- "Unsupervised Clustering of Extreme Vocal Effects", Proc. of the 10th International Conference on Advances in Quantitative Laryngology, Voice and Speech Research (AQL). Cincinnati, OH, USA, 2013. PDF
- "Fortissimo: Force-Feedback for Mobile Devices", Proc. of the 13th International Conference on New Interfaces for Musical Expression (NIME). Daejeon and Seoul, Korea, 2013. PDF
- "Even More Tactile Feedback for Mobile Devices", Proc. of the 39th International Computer Music Conference (ICMC). Perth, Australia, 2013. PDF
- "Perceptual Evaluation of Automatically Extracted Musical Motives", Proc. of the 12th International Conference on Music Perception and Cognition (ICMPC), pp. 723-727. Thessaloniki, Greece, 2012. PDF
- "Compressing Music Recordings Into Audio Summaries", Proc. of the 13th International Society for Music Information Retrieval Conference (ISMIR), pp. 313-318. Porto, Portugal, 2012. PDF
Algorithms
- "MIREX 2016 Entry: MSAF V0.1.0 Submission", Music Information Retrieval Evaluation eXchange (MIREX). New York City, NY, USA, 2016. PDF Code
- "MIREX 2014 Entry: 2D Fourier Magnitude Coefficients", Music Information Retrieval Evaluation eXchange (MIREX). Taipei, Taiwan, 2014. PDF Code
- "MIREX 2014 Entry: Music Segmentation Techniques and Greedy Path Finder Algorithm to Discover Musical Patterns", Music Information Retrieval Evaluation eXchange (MIREX). Taipei, Taiwan, 2014. PDF Code
- "MIREX 2014 Entry: Convex Non-negative Matrix Factorization", Music Information Retrieval Evaluation eXchange (MIREX). Taipei, Taiwan, 2014. PDF Code
- "MIREX 2013: Discovering Musical Patterns Using Audio Structural Segmentation Techniques", Music Information Retrieval Evaluation eXchange (MIREX). Curitiba, Brazil, 2013. PDF
Theses
- "Discovering Structure in Music: Automatic Approaches and Perceptual Evaluations", New York University. PhD Dissertation, 2015. PDF Slides Video
- "Voice Transformations for Extreme Vocal Effects", Pompeu Fabra University. Master's Thesis, 2008. PDF
- "Desenvolupament Open Source per a E-Learning-II", Polytechnic University of Catalonia. Undergraduate Thesis, 2007. PDF
Selected Talks
- "Project Sound Stager", Adobe MAX Sneaks 2025. Los Angeles, CA, USA, 2025. Video
- "GenAI for Sound Design", Conversational AI Reading Group at Mila. Montreal, Quebec, Canada, 2025. Video
- "Overview, Challenges, and Applications of Audio-based Music Structure Analysis", Women in Music Information Retrieval Workshop (ISMIR). Virtual, 2021. Slides
- "Music Recommendation with Waveform-based Architectures", 4th Global AI Conference. Santa Clara, CA, USA, 2020. Slides
- "Spectral Analysis and Detection of Extreme Vocal Effects (with CNNs)", Research Seminar. Universitat Pompeu Fabra. Barcelona, Spain, 2019. Slides
- "Spectral Analysis and Detection of Extreme Vocal Effects", 2nd International Symposium on Distorted Voices. São Paulo, Brazil, 2019. Slides
- "Recommending Music with Waveform Architectures at Scale (Extended Version)", Seminar Series in Data Science. University of San Francisco. San Francisco, CA, USA, 2019. Slides
- "Recommending Music with Waveform Architectures at Scale", Deep Learning Barcelona Symposium. Pompeu Fabra University. Barcelona, Spain, 2018. Slides Video
- "Cold-Start Music Recommendation Using Multimodal Deep Architectures", Systematic Approaches to Deep Learning Methods for Audio. Erwin Schrödinger Institute, University of Vienna. Vienna, Austria, 2017. PDF
- "Long Tail Music Recommendation Using Deep Architectures", International Workshop on Deep Learning for Music (IJCNN). Anchorage, AK, USA, 2017. PDF
- "Deep Learning for Large-Scale Music Recommendation", Data-Driven Research in Music Cognition. Stanford University. Stanford, CA, USA, 2017. PDF
- "Deep Learning for Music Recommendation: Machine Listening and Collaborative Filtering", Seminar on Music Knowledge Extraction Using Machine Learning. Pompeu Fabra University. Barcelona, Spain, 2016. PDF
- "Deep Learning for Large Scale Music Recommendation", Biostat Seminar. Stanford University. Stanford, CA, USA, 2016. PDF
- "Multiple Annotations and Subjectivity in the Identification of Segment Boundaries in Music", Cognitive Music Information Retrieval (CogMIR). Toronto, ON, Canada, 2014. PDF
- "Music Segment Similarity Using 2D-Fourier Magnitude Coefficients", North East Music Information Special Interest Group (NEMISIG). New York, NY, USA, 2014. PDF
- "A Perceptually Based Evaluation of Music Boundaries", Cognitive Music Information Retrieval (CogMIR). Toronto, ON, Canada, 2013. PDF
- "Music Structure Analysis and New Musical Interfaces", Pompeu Fabra University. Barcelona, Spain, 2013. PDF
- "Music Structure Analysis by Matrix Factorization", North East Music Information Special Interest Group (NEMISIG). Boston, MA, USA, 2013. PDF
Music
- "La Bossa d'Urina: El Primer Disc", Published by Record Union, 2022. Pandora Spotify Amazon
- "Rumbahía: Casi al Compás", Published by CDBaby, 2021. Pandora Spotify Amazon
- "Rumbahía: Aprendiendo", Published by CDBaby, 2019. Pandora Spotify Amazon
- "La Bossa d'Urina: Merda Fina", Published by Record Union, 2018. Pandora iTunes Spotify Amazon
- "Arkaen: Arkaen", Published by Record Union, 2017. Pandora iTunes Spotify Amazon
- "La Bossa d'Urina: La Bossa d'Urina", Published by Cydonia Records, 2015. Pandora iTunes Spotify Amazon
- "Sargon: Vida", Published by Weight Recordings, 2009. Pandora iTunes Spotify Amazon
- "Sargon: Transcriptions", Published by Big Bang Records, 2005. iTunes Spotify
Other
- "Automatic Music Tagging with Harmonic CNN", Late Breaking Session of the International Society for Music Information Retrieval Conference (ISMIR). Delft, The Netherlands, 2019. PDF
- "MSAF: Music Structure Analysis Framework", International Society for Music Information Retrieval Conference (ISMIR). Málaga, Spain, 2015. PDF
- "2013 Late Break Session on Music Segmentation", Proc. of the 14th International Society for Music Information Retrieval Conference (ISMIR). Curitiba, Brazil, 2013. PDF
- "Late-break Session on Music Structure Analysis", Proc. of the 13th International Society for Music Information Retrieval Conference (ISMIR). Porto, Portugal, 2012. PDF
- "Sistemas Operativos: Cuaderno de Laboratorio", Department of Computer Architecture. Polytechnic University of Catalonia, 2007.