9–11 Oct 2024
Mannheim, Schloss
Europe/Berlin timezone

Steering Large Language Models (LLMs) using Sparse Autoencoders (SAEs)

10 Oct 2024, 15:30
30m
Aula (Mannheim, Schloss)

Schloss, 68161 Mannheim

Speaker

Lalith Manjunath (TU Dresden)

Description

A brief introduction to how Sparse Autoencoders (SAEs) can be leveraged to extract interpretable, monosemantic features from the opaque intermediate activations of LLMs, providing a window into their internal representations. We hope to initiate discussions on the methodology of training SAEs on LLM activations, the resulting sparse, high-dimensional representations, and how these can be used for model-steering tasks.
We’ll examine a case study demonstrating the effectiveness of this approach in changing the level of model “proficiency”. This discussion aims to highlight the potential of SAEs as a scalable, unsupervised method for disentangling LLM behaviors, contributing to the broader goals of AI interpretability and alignment.
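
To make the methodology concrete, the following is a minimal sketch (not the speaker's implementation): a sparse autoencoder trained on captured LLM activations with a reconstruction plus L1 sparsity objective, and a steering step that adds one learned decoder direction back into the activations. All dimensions, hyperparameters, names, and the synthetic data are illustrative assumptions.

# Illustrative sketch only; sizes, names, and data are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)   # activations -> sparse feature space
        self.decoder = nn.Linear(d_hidden, d_model)   # sparse features -> reconstructed activations

    def forward(self, x: torch.Tensor):
        features = F.relu(self.encoder(x))            # non-negative codes, pushed toward sparsity
        recon = self.decoder(features)
        return recon, features

def train_step(sae, opt, acts, l1_coeff=1e-3):
    # Reconstruction loss plus an L1 penalty that encourages sparse feature usage.
    recon, feats = sae(acts)
    loss = F.mse_loss(recon, acts) + l1_coeff * feats.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def steer(acts, sae, feature_idx, strength=5.0):
    # Add one feature's decoder direction to the activations to amplify that feature.
    direction = sae.decoder.weight[:, feature_idx]    # column = feature direction in model space
    return acts + strength * direction

if __name__ == "__main__":
    d_model, d_hidden = 512, 4096                      # illustrative sizes
    sae = SparseAutoencoder(d_model, d_hidden)
    opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
    acts = torch.randn(1024, d_model)                  # stand-in for activations captured from an LLM
    for _ in range(100):
        train_step(sae, opt, acts)
    steered = steer(acts[:1], sae, feature_idx=0)      # steered activations would be patched back in

In practice the activations would be gathered from a hook on a chosen layer of the model, and the steered activations written back during generation; this sketch only shows the shape of the SAE training and steering steps discussed in the talk.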

Presentation materials