Integration and Evaluation of Vision-Language-Action Models for Teleoperation
- Institute
- Lehrstuhl für Fahrzeugtechnik (TUM-ED)
- Type
- Master's Thesis
- Content
- Description
The aim of this work is to integrate NVIDIA's Alpamayo model into the teleoperation software for the research vehicle EDGAR, to design a novel interaction concept, and to evaluate its performance in real-world edge cases.
MOTIVATION
Recently, Vision-Language-Action Models (VLAMs) have proven to be highly capable drivers for autonomous vehicles. NVIDIA's Alpamayo model, for instance, can navigate successfully based on textual instructions, odometry, and camera information. While impressive, a major drawback of these end-to-end methods is their "black box" nature, which leads to a lack of transparency, supervision, and verification abilities.
Therefore, a highly promising use case is deploying these models within teleoperation. Rather than acting completely autonomously, the end-to-end model can serve as an intelligent fallback in common teleoperation scenarios. In this setup, a remote operator remains in the loop to actively instruct and supervise the model in resolving complex interactions. The goal of this work is to extend the initial integration of Alpamayo on the Chair's research vehicle EDGAR, build an interaction concept, and test it in the real world.
- Requirements
YOUR ROLE
The work packages of this thesis comprise:
- Literature Research & Concept Design: Review existing literature on VLAMs in autonomous driving. Design a human-machine interaction concept that enables a remote operator to effectively supervise and instruct the VLAM.
- Development & Integration: Extend the initial deployment of Alpamayo on the research vehicle EDGAR and seamlessly integrate it with the existing teleoperation software stack (see the integration sketch after this list).
- Study Execution: Design and conduct an expert user study (N=10) in common real-world edge cases.
- Evaluation & Discussion: Assess the system based on familiarization time, usability (using the System Usability Scale, SUS; see the scoring sketch after this list), and driving task/situation performance. Compare the results against a standard concept and an existing Trajectory Guidance (TG) concept to evaluate the VLAM interaction concept.
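How the integration work package could look in practice is sketched below: a minimal ROS 2 node that forwards camera images, odometry, and a free-text operator instruction to a VLAM planner and publishes the resulting trajectory. This is only an illustrative sketch; the topic names, the node name, and the run_vlam() helper are assumptions, not the actual EDGAR or Alpamayo interfaces.

```python
# Minimal sketch (assumption, not the real EDGAR interface): a ROS 2 node that
# forwards operator instructions plus the latest sensor data to a VLAM planner.
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from nav_msgs.msg import Odometry, Path
from std_msgs.msg import String


class VlamFallbackNode(Node):
    """Bridges the remote operator and vehicle sensors to a VLAM planner."""

    def __init__(self):
        super().__init__('vlam_fallback')
        self.last_image = None
        self.last_odom = None
        # Sensor inputs; all topic names are placeholders.
        self.create_subscription(Image, '/camera/front/image', self.on_image, 10)
        self.create_subscription(Odometry, '/odometry', self.on_odom, 10)
        # Free-text instruction typed by the remote operator.
        self.create_subscription(String, '/teleop/instruction', self.on_instruction, 10)
        # Planned trajectory handed to the downstream vehicle controller.
        self.path_pub = self.create_publisher(Path, '/vlam/planned_path', 10)

    def on_image(self, msg: Image) -> None:
        self.last_image = msg

    def on_odom(self, msg: Odometry) -> None:
        self.last_odom = msg

    def on_instruction(self, msg: String) -> None:
        if self.last_image is None or self.last_odom is None:
            self.get_logger().warn('No sensor data yet; ignoring instruction.')
            return
        # run_vlam() is a stand-in for the actual Alpamayo inference call.
        self.path_pub.publish(run_vlam(msg.data, self.last_image, self.last_odom))


def run_vlam(instruction: str, image: Image, odom: Odometry) -> Path:
    """Placeholder: returns an empty Path until the real model is wired in."""
    return Path()


def main():
    rclpy.init()
    rclpy.spin(VlamFallbackNode())
    rclpy.shutdown()


if __name__ == '__main__':
    main()
```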
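For the evaluation work package, the System Usability Scale reduces ten 1-5 Likert responses to a 0-100 score: odd-numbered items contribute (response - 1), even-numbered items contribute (5 - response), and the summed contributions are multiplied by 2.5. A small helper illustrating the standard computation:

```python
def sus_score(responses):
    """Standard SUS scoring: ten 1-5 Likert responses, item 1 first.

    Odd-numbered items contribute (response - 1), even-numbered items
    contribute (5 - response); the summed contributions are scaled by 2.5
    to a 0-100 range.
    """
    assert len(responses) == 10, 'SUS requires exactly ten item responses'
    total = sum(
        (r - 1) if i % 2 == 0 else (5 - r)  # i is 0-based, so even i = odd item
        for i, r in enumerate(responses)
    )
    return 2.5 * total


# A neutral rating of 3 on every item yields the scale midpoint of 50.
print(sus_score([3] * 10))  # 50.0
```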
WHAT YOU SHOULD BRING ALONG
- Strong interest and motivation for autonomous driving and human-machine interaction
- Initiative and an independent, structured way of working
- Solid programming skills (e.g., C++, Python)
- Experience in user studies
- Optional but helpful: previous experience with ROS (Robot Operating System) and machine learning frameworks
If you are interested in joining this project, feel free to send me your CV and transcript of records. I look forward to receiving your application!
Email: niklas.krauss@tum.de
- Technologies Used
- Python, PyTorch, VLAM, VLM, Teleoperation, Autonomous Vehicles, Autonomous Driving
- Tags
- FTM Studienarbeit, FTM Krauss, FTM AV, FTM AV Safe Operation, FTM Informatik, FTM Teleoperation
- Possible Start
- immediately
- Contact
- Niklas Krauß
Room: 3507
Tel.: +49 172 1736882
Email: niklas.krauss@tum.de