What Remains is the ECHO

The project “ECHO: Explaining Composition, Harmony & Orchestration” was initiated by Prof. Dr. Oliver Bendel and implemented by Lucas Chingis Marty. The final presentation took place on February 19, 2026, at the FHNW Campus Brugg-Windisch. The bachelor’s thesis “ECHO: Explaining Composition, Harmony & Orchestration – A Multimodal AI System for Music Analysis and Education” develops a local multimodal AI system that analyzes musical structures from audio data and explains them in accessible terms. The objective is to bridge the gap between music information retrieval (MIR, the automatic analysis of audio) and natural language explanation through large language models. The system combines multiple analysis components (tempo, key, chord, instrument, and melody recognition) with a locally operated large language model (Llama 3.1-8B) that translates the extracted data into understandable explanations for beginners and intermediate users. The approach also employs retrieval-augmented generation (RAG), guardrails to reduce hallucinations, and a feedback and evaluation system. The system is implemented as a desktop application without cloud dependency. The evaluation includes technical measurements on datasets comprising several hundred music tracks as well as a small user study. The thesis demonstrates that a locally operated system can in principle present musical analysis in an understandable way, although clear accuracy limitations of the applied MIR methods remain. Opportunities could open up not only for music education but also for the preservation of endangered music.

Fig.: A multimodal AI system for music analysis and education
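
To make the idea of translating extracted MIR data into an explanation more concrete, here is a minimal sketch in Python. It is not taken from the thesis: the `TrackAnalysis` container, the `build_prompt` function, and the prompt wording are all hypothetical, and the actual call to a locally running language model is left out. The sketch only shows how analysis results could be packed into a prompt that asks the model to stick to the detected facts, which is one simple form of guardrail.

```python
from dataclasses import dataclass, field


@dataclass
class TrackAnalysis:
    """Hypothetical container for MIR results (not the thesis's data model)."""
    tempo_bpm: float
    key: str
    chords: list[str] = field(default_factory=list)
    instruments: list[str] = field(default_factory=list)


def build_prompt(analysis: TrackAnalysis, level: str = "beginner") -> str:
    """Turn extracted features into a prompt for a local language model.

    The instruction to use only the listed facts is an assumed, simple
    guardrail; the real system may work differently.
    """
    facts = [
        f"- Tempo: {analysis.tempo_bpm:.0f} BPM",
        f"- Key: {analysis.key}",
        f"- Chords: {', '.join(analysis.chords) or 'not detected'}",
        f"- Instruments: {', '.join(analysis.instruments) or 'not detected'}",
    ]
    return (
        f"Explain this piece of music to a {level} listener.\n"
        "Use only the facts listed below. If something is marked "
        "'not detected', say that it is unknown instead of guessing.\n\n"
        "Facts from the audio analysis:\n" + "\n".join(facts)
    )


if __name__ == "__main__":
    # Example values are made up for illustration.
    example = TrackAnalysis(
        tempo_bpm=120.0,
        key="A minor",
        chords=["Am", "F", "C", "G"],
    )
    print(build_prompt(example))
```

The resulting string would then be passed to whatever local model runtime is in use. Keeping the facts in a plain list makes it easy to check later whether an explanation mentions anything the analysis did not actually find.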