Description
We stand at the threshold of a transformative period, driven by remarkable advances in large language models (LLMs). Building on their capabilities, there is growing interest in extending LLMs to vision-and-language (VL) tasks, in which models analyze visual and textual data jointly.
In this talk, I will introduce our research on using VL models built on LLMs to predict hazards that drivers may encounter on the road. This problem requires VL models to anticipate and reason about imminent events from ambiguous observations, a task characterized as visual abductive reasoning. Because this area is still in its early stages, I will also present a new dataset and the baseline methods we have developed to encourage further research.