19–21 Sept 2023
Alte Mensa
Europe/Berlin timezone

Vision and language models meet visual abductive reasoning for predicting driving hazards

20 Sept 2023, 11:00
45m
Emmy Noether Room (Alte Mensa)

Wilhelmsplatz 3 37073 Göttingen
Data Science Keynotes

Speaker

Masanori Suganuma (Tohoku University)

Description

We stand at the threshold of a transformative period, defined by remarkable advances in large language models (LLMs). Given their capabilities, there is growing interest in extending LLMs to vision and language (VL) tasks, where models draw on the capabilities of LLMs to analyze visual and textual data together.
In this talk, I will introduce our research on using VL models, augmented with LLMs, to predict hazards that drivers may encounter on the road. This challenge requires VL models to forecast and reason about imminent events from ambiguous observations, a task known as visual abductive reasoning. Because this area is still in its early stages, I will also present our new dataset and the baseline methods we have developed to encourage further research.

Primary author

Masanori Suganuma (Tohoku University)