BaSaMa - Fachschaft Maschinenbau: BaSaMa & HiWi

Development of a Benchmarking Framework for Vision-Language-Action Models in Autonomous Driving

Institute

Professur für autonome Fahrzeugsysteme (TUM-ED)

Type

Bachelor's Thesis / Semester Thesis / Master's Thesis

Content

experimental / theoretical

Description

Motivation

The development of autonomous driving is largely driven by the goal of making transportation safer, more efficient, and more sustainable. Since human error remains one of the leading causes of traffic accidents, and increasing traffic volumes pose additional challenges, automated systems offer significant potential to improve both safety and traffic flow.

Traditionally, autonomous driving systems are based on modular architectures with separate components for perception, prediction, and planning. However, this separation often leads to complex interfaces and makes holistic system optimization difficult. Against this backdrop, foundation models, particularly Vision-Language-Action (VLA) models, are gaining attention. They follow an end-to-end paradigm in which driving decisions are learned directly from multimodal inputs, promising improved generalization and a simpler system architecture.

Despite these advantages, the practical deployment of such models on real vehicles has not yet been sufficiently explored. In particular, there is a lack of standardized methodologies for reproducible and comparable evaluation of different approaches. Furthermore, differences in sensor setups, platforms, and operating environments make the transfer of existing models challenging.

This thesis addresses this gap. The goal is to develop a framework that enables the integration and systematic benchmarking of different VLA models on the research vehicle EDGAR. This provides a foundation for evaluating modern end-to-end approaches comparably under real-world conditions and for supporting their targeted further development.

Work packages

Literature review on VLAs for autonomous driving and real-world testing
Selection and specification of suitable models
Design of a benchmarking framework
Implementation and integration on EDGAR
Scenario design and experiment planning
Execution of experiments and benchmarking
Documentation of methodology and results

Requirements

Must-have

Independent and structured working style
Programming experience (preferably in Python, ROS2, or C++)
Curious and analytical mindset
Ability to work in a team
Basic knowledge in AI, particularly in the context of VLMs/VLAs
Good command of German and/or English

Nice-to-have

Knowledge in autonomous driving and/or robotics
Experience with deploying VLMs/VLAs
Experience with Autoware

Ready to bring foundation models to real-world traffic? The thesis can start immediately and may be conducted in German or English.

Please send your CV, transcript of records, and a short statement of motivation to christian.oefinger@tum.de.

Tags

AVS Oefinger, AVS Informatik

Possible start

immediately

Contact

Christian Oefinger
christian.oefinger@tum.de
