Description
During pre-training, a Large Language Model (LLM) learns language features such as syntax, grammar, and, to a certain degree, semantics. In the process, the model not only acquires language but also implicitly acquires knowledge; in this sense, knowledge is a byproduct of language acquisition. This characteristic is inherent in the architecture of modern LLMs. Hence, much like language features, knowledge is learned in a fuzzy manner, which leads to difficulties with highly specific concepts and a tendency to hallucinate.
This talk will explore techniques and strategies for sharpening fuzzy knowledge in LLMs with domain-specific information. Our focus is on lightweight methods that require no further pre-training. We will examine direct text injection strategies for LLM encoders (such as triple injection and K-BERT), the integration of additional features using multilayer perceptrons (MLPs), and the use of modular lightweight components such as knowledge-based adapters. Furthermore, we will investigate injection techniques for LLM decoders that go beyond simple prompting, such as retrieval-augmented generation (RAG), and how these can be enhanced by a multi-agent architecture.
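To give a flavor of the simplest of these methods, the sketch below illustrates the general pattern behind direct triple injection: knowledge-graph triples are verbalized into short sentences and prepended to the model input so the LLM can attend to explicit domain facts. The `verbalize` and `inject_triples` helpers and the example triples are hypothetical, chosen only to illustrate the pattern, and are not taken from the talk itself.

```python
# Minimal sketch: inject verbalized knowledge-graph triples into an LLM input.
# Assumes knowledge is available as (subject, relation, object) triples.

def verbalize(triple):
    """Turn a (subject, relation, object) triple into a short sentence."""
    subj, rel, obj = triple
    return f"{subj} {rel.replace('_', ' ')} {obj}."

def inject_triples(query, triples, max_triples=5):
    """Prepend verbalized triples as explicit context for the query."""
    facts = " ".join(verbalize(t) for t in triples[:max_triples])
    return f"Known facts: {facts}\nQuestion: {query}"

# Illustrative domain facts (hypothetical example data).
triples = [
    ("Metformin", "treats", "type 2 diabetes"),
    ("Metformin", "belongs_to_class", "biguanides"),
]

prompt = inject_triples("Which drug class does metformin belong to?", triples)
print(prompt)  # augmented input, ready to feed to an LLM encoder or decoder
```

More elaborate variants of this idea, such as K-BERT, control where the injected triples attend within the token sequence rather than simply prepending them as plain text.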