Speech-Enabled VLA for Naturalistic Human-Vehicle Interaction
- Institute
- Professorship of Autonomous Vehicle Systems
- Type
- Semester thesis, Master's thesis
- Content
- experimental, theoretical, constructive
- Description
Join us in developing Vision-Language-Action (VLA) models that directly understand and act upon spoken instructions, paving the way for intuitive and highly personalized autonomous driving experiences!
Are you excited about creating human-like AI interfaces and pushing the boundaries of multimodal interaction? This project aims to move beyond clunky, text-based commands to enable natural, voice-driven control of autonomous vehicles.
Traditional speech-to-text systems fall short because they discard crucial non-semantic information (like voice tone or urgency) embedded in raw speech. This lost information can be vital for customized and nuanced responses. By integrating speech directly into the VLA's "robot policy," we can capture these subtle cues. Imagine an autonomous vehicle that can not only understand "drive more cautiously" but also interpret the urgency in your voice or adapt its style based on your personalized preferences conveyed through speech. This opens up exciting avenues for affective computing and personalized AI in autonomous vehicles, leading to a more empathetic and adaptable driving agent.
You'll extend cutting-edge VLA concepts, originally developed for robot manipulation, to the complex domain of autonomous driving. This will involve designing and training a VLA that interprets spoken commands alongside real-time visual inputs to generate appropriate driving actions. Your work will directly address a significant gap in human-AV interaction, with the potential to revolutionize user experience, improve accessibility, and increase safety through more intuitive and responsive control.
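To make the end-to-end idea concrete, here is a minimal, hypothetical sketch (in PyTorch) of a speech-conditioned driving policy: speech-encoder features and camera features are fused via cross-attention and mapped straight to continuous controls, with no intermediate transcript. All module names, dimensions, and the three-dimensional action space are illustrative assumptions, not an existing implementation or a specific published VLA.

```python
# Minimal, hypothetical sketch of a speech-conditioned driving policy.
# Encoder choices, dimensions, and the action space are illustrative
# assumptions, not an existing codebase.
import torch
import torch.nn as nn

class SpeechConditionedPolicy(nn.Module):
    def __init__(self, audio_dim=512, vision_dim=768, hidden_dim=512, action_dim=3):
        super().__init__()
        # Projections for features from pretrained encoders (in practice,
        # e.g. a wav2vec-style audio encoder and a ViT-style image encoder).
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        self.vision_proj = nn.Linear(vision_dim, hidden_dim)
        # Cross-attention: visual tokens attend to raw-speech tokens, so
        # prosodic cues (tone, urgency) can influence the resulting action,
        # not just the words a transcript would keep.
        self.cross_attn = nn.MultiheadAttention(hidden_dim, num_heads=8,
                                                batch_first=True)
        # Continuous action head, e.g. [steering, acceleration, braking].
        self.action_head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, action_dim),
        )

    def forward(self, audio_tokens, vision_tokens):
        # audio_tokens:  (B, T_audio, audio_dim) speech-encoder outputs
        # vision_tokens: (B, T_patches, vision_dim) camera-frame patches
        a = self.audio_proj(audio_tokens)
        v = self.vision_proj(vision_tokens)
        fused, _ = self.cross_attn(query=v, key=a, value=a)
        # Pool the fused tokens and regress a driving action.
        return self.action_head(fused.mean(dim=1))

policy = SpeechConditionedPolicy()
speech = torch.randn(1, 50, 512)    # ~1 s of speech features
frames = torch.randn(1, 196, 768)   # one ViT-patched camera frame
action = policy(speech, frames)     # -> tensor of shape (1, 3)
```

Because the policy consumes speech embeddings rather than text, non-semantic information such as urgency remains available to the action head.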
Example Thesis Topics
- End-to-End Speech-to-Action VLA for Driving: Develop and train a novel VLA architecture that directly maps raw speech commands and visual observations to driving actions (e.g., steering, acceleration, braking), bypassing intermediate text representations.
- Personalized Driving Styles via Voiceprint Analysis: Investigate how to integrate voiceprint features into VLA training to allow drivers to express and adapt autonomous driving styles based on their unique voice profiles or emotional states.
- Resolving Ambiguity in Spoken Driving Commands: Research and implement techniques within a speech-enabled VLA to identify and resolve ambiguities in natural-language driving instructions (e.g., "turn left" at a complex intersection), possibly through visual context or follow-up clarification.
- Synthesizing Multi-Modal Datasets for Speech-Enabled AVs: Develop methodologies for generating or augmenting datasets that pair driving scenarios with realistic, varied spoken commands, including different intonations and linguistic nuances (see the data-pairing sketch after this list).
- Integrating Affective Computing with VLA Control: Explore how to interpret emotional cues from a driver's speech (e.g., stress, discomfort) and integrate this information into the VLA's decision-making process to enhance passenger comfort and safety.
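For the dataset-synthesis topic above, a pairing schema could look like the following sketch. It is purely illustrative: the field names, shapes, and intonation tags are assumptions, not an existing dataset format (e.g., not nuScenes or CARLA output).

```python
# Illustrative schema for pairing driving clips with spoken commands;
# all names and tags here are assumptions for discussion, not a standard.
from dataclasses import dataclass
import numpy as np

@dataclass
class RecordedCommand:
    waveform: np.ndarray       # raw speech, e.g. 16 kHz mono
    transcript: str            # reference text, kept for evaluation only
    intonation_tag: str        # e.g. "calm", "urgent", "hesitant"

@dataclass
class DrivingSample:
    camera_frames: np.ndarray  # (T, H, W, 3) front-camera clip
    command: RecordedCommand   # the spoken instruction for this clip
    expert_actions: np.ndarray # (T, 3): steering, acceleration, braking

def augment_intonations(sample, variants):
    """Pair one driving clip with several prosodic variants of the same
    instruction, so identical semantics appear under varied delivery."""
    return [DrivingSample(sample.camera_frames, v, sample.expert_actions)
            for v in variants]
```

Keeping the transcript alongside the raw waveform also makes it possible to compare a text-conditioned baseline against the direct speech-to-action model on the same data.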
Technologies Used
Python, PyTorch/TensorFlow, Autonomous Driving, Vision-Language-Action Models (VLAs), Speech Recognition, Natural Language Processing (NLP), Multimodal AI, Deep Learning, Affective Computing, Personalization, Robotics, Human-Robot Interaction (HRI), Sensor Fusion, Simulation (e.g., CARLA), driving datasets (e.g., Waymo Open Dataset, nuScenes), Voice AI libraries.
- Prerequisites
We're looking for students with a strong background in deep learning and a passion for robotics and autonomous systems.
- Solid understanding of deep learning frameworks (PyTorch/TensorFlow).
- Experience with natural language processing (NLP), speech recognition, or multimodal AI.
- Familiarity with computer vision concepts.
- Proficiency in Python.
- Motivation to work on real-world applications and, potentially, with robotics/driving simulators.
If you're ready to make a tangible impact on the future of autonomous vehicles, send us your application. Please include:
- A short motivation letter highlighting your interest in multimodal AI, VLMs/VLAs, and autonomous driving.
- Your CV.
- A recent transcript of records.
- (Optional) Any relevant project work or code samples demonstrating your experience in related fields.
- Tags
- AVS Brusnicki
- Possible Start
- Immediately
- Contact
Roberto Brusnicki
roberto.brusnicki@tum.de