Description
Chair: Dr. Joachim Köhler (Fraunhofer IAIS, WestAI)
Moderation: Laszlo Friedmann (Fraunhofer IAIS, WestAI)
Content / Abstract:
With the rapid progress in the field of large language models in recent years, the transfer of the underlying technology, i.e. foundation models, to new modalities has become one of the most important research topics in artificial intelligence. At the latest with the introduction of CLIP, this development has been extended to multi-modal foundation models, which are able to process different types of modalities, such as images or text. Due to their outstanding properties, such as their excellent zero- or few-shot capability and their ability to process different modalities, multi-modal foundation models offer huge potential across domains and applications. The overall scope of this session is therefore intended to be broad, covering all topics related to multi-modal foundation models.
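To make the zero-shot capability mentioned above concrete, the following minimal sketch classifies an image against free-form text labels with a pre-trained CLIP model. It assumes the Hugging Face transformers library and the public openai/clip-vit-base-patch32 checkpoint; the image path and labels are illustrative, not part of the session description.

```python
# Minimal zero-shot image classification sketch with CLIP.
# Assumes: pip install torch transformers pillow
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # hypothetical local image
labels = ["a photo of a cat", "a photo of a dog", "a diagram"]

# Embed the image and the candidate texts jointly; no task-specific
# fine-tuning is needed, which is the essence of zero-shot transfer.
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# logits_per_image holds the image-text similarity for each label.
probs = outputs.logits_per_image.softmax(dim=1)
for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```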
Topics of interest include:
- Vision / sound / language / ... models in any possible combination
- Data- and energy-efficient pre-training
- Methodologies for efficient transfer and model compression
- Application to specific domains
- Ethics, risks, and fairness
- Securing private data
Breakthroughs in strongly transferable learning were achieved by training models that use simple, generic losses on large amounts of generic, diverse web-scale data. Crucial for this progress was the increase in pre-training scale, i.e., the model, compute, and dataset scales employed in training. Derived scaling laws suggest that generalization and transferability improve when increasing scales...
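Such scaling laws are commonly expressed as power laws in model, data, and compute scale; a representative form, following Kaplan et al. (2020), is sketched below. The symbols are illustrative and the fitted constants are empirical, not claims of this abstract.

```latex
% Representative power-law scaling of pre-training loss L:
% N = parameter count, D = dataset size, C = training compute;
% N_c, D_c, C_c and the exponents \alpha_* are empirically fitted constants.
L(N) \approx \left(\tfrac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) \approx \left(\tfrac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C) \approx \left(\tfrac{C_c}{C}\right)^{\alpha_C}
```

Under this form, loss falls predictably as any single scale grows, which is the basis for the expectation that generalization and transferability keep improving with larger pre-training runs.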
The recent success of large language models (LLMs) like GPT and BERT has demonstrated the immense capabilities of transformer-based architectures on natural language processing (NLP) tasks, such as text generation, translation, and summarization, setting new benchmarks in Artificial Intelligence (AI) performance. Building on this momentum, the AI research community is increasingly focusing on...
The emergence of multi-modal foundation models, i.e., large AI models pre-trained on vast amounts of data from different modalities, which show emergent behavior and generalization ability across a set of different tasks, brings enormous possibilities across industries. Specific use cases of this technology in the production sector are, however, currently scarce. Therefore, this...
Scholarly knowledge curation faces challenges due to the diverse methodologies used across scientific fields. Tailored approaches and protocols that take each domain's unique characteristics into account are essential to address these challenges. Machine assistance, particularly through Large Language Models (LLMs) such as GPT-3.5, offers significant potential to navigate these complexities and enhance...