Talk to Me, Jarvis: Teaching Autonomous Driving Software to Speak with the Help of LLMs

Institute
Lehrstuhl für Fahrzeugtechnik
Type
Semester thesis / Master's thesis
Content
experimental / theoretical
Description

Background

As autonomous driving technology advances, vehicles are achieving unprecedented levels of autonomy. However, a significant gap remains in their ability to explain their decisions and actions in real time. Transparent communication between autonomous vehicles and their users is critical for trust, safety, and effective human-robot interaction. By leveraging Large Language Models (LLMs), we aim to develop a framework that enables autonomous driving software to explain its current actions, future predictions, and underlying reasoning in human-understandable language.

Objectives

This thesis aims to create a communication framework for autonomous driving software that translates complex internal processes into natural language explanations, thus improving transparency. Key objectives include:

  1. Enhanced System Transparency: Enable an autonomous driving system to articulate its real-time actions, planned maneuvers, and potential risks.
  2. Multi-modal Data Integration for Narration: Link perception, prediction, and planning data to create a seamless narration that reflects both what the system is doing and why.
  3. Speech-to-Command Interface: Evaluate the feasibility of an LLM-based translation layer that converts spoken instructions into high-level commands via a specified interface, allowing natural language interaction with the vehicle.

Literature and Related Work

Recent advancements in autonomous driving have focused on enhancing transparency through interpretability and explainability frameworks. These efforts leverage LLMs for generating understandable narratives, helping users grasp the “why” and “how” behind autonomous actions. Innovations in linking AD systems' intermediate outputs to natural language, as well as efforts in creating adaptable language models, show promise in enabling real-time, contextually appropriate explanations that support trust and improve user interaction. These advancements are foundational for aligning LLM-generated language with AD systems and suggest that such integration could enhance both interpretability and responsiveness in autonomous driving.

Methodology

  1. LLM-based Real-Time Narration:

    • Develop a framework that links the software stack's internal communication to an LLM and create an event-based intermediate layer for real-time narration (a minimal sketch of such a layer follows this list).
  2. LLM-Based Speech-to-Command Interface:

    • Leverage the LLM to interpret spoken instructions and translate them into structured, high-level software commands. This involves designing a translation layer that can parse speech inputs, map them to system-recognized commands, and interact with the ADS interface for execution (see the second sketch after this list).
  3. Experiment Design and Evaluation Metrics:

    • Truthfulness Metric: Develop a truthfulness metric to evaluate the accuracy of the narration against the ADS’s actual states, predictions, and actions. This metric will assess whether the LLM-based narration aligns accurately with what the ADS is doing and planning.
    • Execution Accuracy Metric: Design a metric to assess the correctness of speech-to-command translations. This will measure the successful execution of user instructions, verifying that commands are accurately interpreted and implemented by the ADS (the third sketch after this list illustrates a simple form of both metrics).
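
To make the event-based intermediate layer from step 1 more concrete, the following minimal sketch shows how it could be realized as a ROS2 node in Python. The topic names, the use of plain String messages, and the JSON event format are illustrative assumptions rather than an existing interface of the stack; the LLM call itself would live in a downstream narration client.

```python
# Minimal sketch of an event-based narration layer (illustrative only).
# Assumption: the AD stack publishes its current maneuver decision on
# /planning/maneuver as a plain string; real stacks use richer message types.
import json

import rclpy
from rclpy.node import Node
from std_msgs.msg import String


class NarrationBridge(Node):
    """Turns selected stack messages into compact events for an LLM narrator."""

    def __init__(self):
        super().__init__("narration_bridge")
        # Subscribe to stack output (topic name and type are assumptions).
        self.create_subscription(String, "/planning/maneuver", self.on_maneuver, 10)
        # Narration events are published for a downstream LLM client to verbalize.
        self.event_pub = self.create_publisher(String, "/narration/events", 10)

    def on_maneuver(self, msg: String):
        # Forward only intent-level changes as compact JSON events, so the
        # narration rate stays manageable for real-time LLM calls.
        event = json.dumps({"type": "maneuver", "value": msg.data})
        self.event_pub.publish(String(data=event))


def main():
    rclpy.init()
    rclpy.spin(NarrationBridge())
    rclpy.shutdown()


if __name__ == "__main__":
    main()
```

Restricting the layer to intent-changing events is a deliberate design choice: it keeps the prompt stream small enough for the LLM to narrate in real time.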
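
The second sketch outlines the speech-to-command translation layer from step 2, assuming the spoken instruction has already been transcribed to text. The command whitelist, the JSON schema, and the generic `llm` callable are placeholders for whatever interface and model the ADS actually exposes.

```python
# Sketch of a speech-to-command translation layer (illustrative assumptions:
# the utterance is already transcribed, and `llm` is any callable that
# returns the model's text completion for a given prompt).
import json
from typing import Callable, Optional

# Whitelist of high-level commands the (assumed) ADS interface accepts.
ALLOWED_COMMANDS = {"set_speed", "change_lane", "pull_over", "stop"}

PROMPT_TEMPLATE = (
    "Translate the driver's instruction into a single JSON object with the keys "
    '"command" (one of: set_speed, change_lane, pull_over, stop) and "arguments". '
    'Respond with JSON only. Instruction: "{utterance}"'
)


def utterance_to_command(utterance: str, llm: Callable[[str], str]) -> Optional[dict]:
    """Map a transcribed instruction to a validated high-level command.

    Returns None if the model output cannot be parsed or names an unknown
    command, so nothing unvalidated ever reaches the ADS interface.
    """
    raw = llm(PROMPT_TEMPLATE.format(utterance=utterance))
    try:
        command = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(command, dict) or command.get("command") not in ALLOWED_COMMANDS:
        return None
    return command
```

Validating the model output against a fixed command set before anything reaches the ADS interface is the safety-relevant part of the design; a full implementation would additionally ask the user to rephrase when validation fails.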
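
Finally, the third sketch shows one simple way the two evaluation metrics from step 3 could be operationalized, assuming narrated claims and logged ADS states can be reduced to comparable labels. Aligning free-form narration with continuous ADS states will require a more elaborate matching step in practice.

```python
# Simplest-possible forms of the two metrics (hypothetical data model:
# claims, states, and commands are reduced to comparable string labels).
from typing import Sequence


def truthfulness(narrated_claims: Sequence[str], logged_states: Sequence[str]) -> float:
    """Fraction of narrated claims supported by a logged ADS state, prediction, or action."""
    if not narrated_claims:
        return 1.0
    logged = set(logged_states)
    return sum(claim in logged for claim in narrated_claims) / len(narrated_claims)


def execution_accuracy(intended: Sequence[str], executed: Sequence[str]) -> float:
    """Fraction of user instructions whose executed command matches the intended one.

    Assumes both sequences are aligned per trial (i-th instruction vs. i-th execution).
    """
    if not intended:
        return 1.0
    matches = sum(i == e for i, e in zip(intended, executed))
    return matches / len(intended)
```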

Requirements

Candidates should have:

  • Strong programming skills in Python.
  • Initial experience with ROS2 is a plus.
  • Experience in autonomous driving, machine learning, or large language models.
  • A creative, self-motivated approach to working on cutting-edge problems in autonomous systems transparency.

Teamwork

Applications for this thesis project from individuals or teams of up to two people are welcome.

Please attach a CV and a grade sheet to your application.

Industry partner
Mercedes-AMG
Technologies used
Python, ROS2, Programming, Autonomous Driving, Machine Learning, Deep Learning, Large Language Models
Tags
FTM Studienarbeit, FTM AV, FTM Werner, FTM Informatik
Possible start
Immediately
Contact
Frederik Werner, M.Sc.
Raum: MW 3508
Tel.: +49.89.289.10341
frederik.werner@tum.de