Steering Large Language Models (LLMs) using Sparse Autoencoders (SAEs)

19 Sept 2024, 11:00
15m
Hannah-Vogt-Saal

Session 3. Large Language Models 🇬🇧

Speaker

Lalith Manjunath (GNOI)

Description

A brief introduction to how Sparse Autoencoders (SAEs) can be leveraged to extract interpretable, monosemantic features from the opaque intermediate activations of LLMs, providing a window into their internal representations. We hope to initiate a discussion on the methodology of training SAEs on LLM activations, the resulting sparse, high-dimensional representations, and how these can be utilized for model-steering tasks.
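As a rough illustration of the training setup alluded to above, the sketch below fits an overcomplete autoencoder with an L1 sparsity penalty to cached LLM activations. This is a minimal PyTorch example under assumed dimensions and hyperparameters, not the speaker's implementation:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete autoencoder: d_hidden >> d_model, ReLU code, trained with L1 sparsity."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        f = torch.relu(self.encoder(x))   # sparse feature activations
        return self.decoder(f), f         # reconstruction + feature code

def sae_loss(x, x_hat, f, l1_coeff=1e-3):
    # Reconstruction error keeps features faithful to the activation;
    # the L1 term drives most features to zero on any given input.
    return (x - x_hat).pow(2).mean() + l1_coeff * f.abs().mean()

# Illustrative training step; `acts` stands in for real cached residual-stream
# activations from one LLM layer (here d_model = 768, a 16x-overcomplete code).
sae = SparseAutoencoder(d_model=768, d_hidden=768 * 16)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
acts = torch.randn(4096, 768)
x_hat, f = sae(acts)
loss = sae_loss(acts, x_hat, f)
loss.backward()
opt.step()
```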
We’ll examine a case study demonstrating the effectiveness of this approach in changing the model’s level of “proficiency”. This discussion aims to highlight the potential of SAEs as a scalable, unsupervised method for disentangling LLM behaviors, contributing to the broader goals of AI interpretability and alignment.
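To make the steering step concrete: once an SAE feature with the desired interpretation is identified, its decoder column gives a direction in activation space that can be added to the model's activations during generation. The sketch below (reusing the SparseAutoencoder above) is hypothetical; the layer path, feature index, and strength are assumptions, not the case study's actual code:

```python
import torch

def make_steering_hook(sae, feature_idx: int, strength: float):
    # Column feature_idx of the decoder weight is that feature's
    # direction in the model's activation space (shape: d_model).
    direction = sae.decoder.weight[:, feature_idx].detach()

    def hook(module, inputs, output):
        # Shift the hooked module's output along the chosen feature direction.
        return output + strength * direction
    return hook

# Hypothetical usage on a GPT-2-style HuggingFace model:
# handle = model.transformer.h[6].mlp.register_forward_hook(
#     make_steering_hook(sae, feature_idx=1234, strength=8.0))
# ... run model.generate(...) with the hook active, then handle.remove()
```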

Primary author

Lalith Manjunath (GNOI)
