Speaker
Description
The recent success of large language models (LLMs) such as GPT and BERT has demonstrated the immense capabilities of transformer-based architectures on natural language processing (NLP) tasks, such as text generation, translation, and summarization, setting new benchmarks in artificial intelligence (AI) performance. Building on this momentum, the AI research community is increasingly focusing on extending the capabilities of LLMs to multimodal data, giving rise to multimodal foundation models. The use of generative models for music generation has also been gaining popularity. In this study, we present an application of multimodal foundation models to video background music generation. Current music generation models are predominantly controlled by a single input modality: text. Video is an alternative input modality with remarkably different requirements for the generation of accompanying background music. Although alternative methods for generating video background music exist, none achieve musical quality and diversity comparable to that of the text-based models. We adapt text-based models to accept video as an alternative input modality for controlling the audio generation process, and we evaluate our approach quantitatively, by analyzing exemplary results in terms of audio quality, and qualitatively, through a case study examining users’ perspectives on the video-audio correspondence of our results.