Integration and Evaluation of Vision-Language-Action Models for Teleoperation
- Institute
- Lehrstuhl für Fahrzeugtechnik (TUM-ED)
- Type
- Master's Thesis
- Content
- Description
The aim of this work is to integrate NVIDIA's Alpamayo model into the teleoperation software for the research vehicle EDGAR, to design a novel interaction concept, and to evaluate its performance in real-world edge cases.
MOTIVATION
Recently, Vision-Language-Action Models (VLAMs) have proven to be highly capable drivers for autonomous vehicles. NVIDIA's Alpamayo model, for instance, can navigate successfully based on textual instructions, odometry, and camera information. While impressive, a major drawback of these end-to-end methods is their "black box" nature, which leads to a lack of transparency, supervision, and verification abilities.
Therefore, a highly promising use case is deploying these models within teleoperation. Rather than acting completely autonomously, the end-to-end model can serve as an intelligent fallback in common teleoperation scenarios. In this setup, a remote operator remains in the loop to actively instruct and supervise the model in resolving complex interactions. The goal of this work is to extend the initial integration of Alpamayo on the Chair's research vehicle EDGAR, build an interaction concept, and test it in the real world.
- Requirements
YOUR ROLE
The work packages of this thesis comprise:
- Literature Research & Concept Design: Review existing literature on VLAMs in autonomous driving. Design a human-machine interaction concept that enables a remote operator to effectively supervise and instruct the VLAM.
- Development & Integration: Extend the initial deployment of Alpamayo on the research vehicle EDGAR and seamlessly integrate it with the existing teleoperation software stack (see the integration sketch after this list).
- Study Execution: Design and conduct an expert user study (N=10) in common real-world edge cases.
- Evaluation & Discussion: Assess the system based on familiarization time, usability (using the System Usability Scale, SUS; see the scoring sketch after this list), and driving task/situation performance. Compare the results against a standard concept and an existing Trajectory Guidance (TG) concept to evaluate the VLAM interaction concept.
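How the integration work package could look in practice is sketched below: a minimal ROS 2 node that forwards camera images, odometry, and a free-text operator instruction to a VLAM planner and publishes the resulting trajectory. This is only an illustrative sketch; the topic names, the node name, and the run_vlam() helper are assumptions, not the actual EDGAR or Alpamayo interfaces.

```python
# Minimal sketch (assumption, not the real EDGAR interface): a ROS 2 node that
# forwards operator instructions plus the latest sensor data to a VLAM planner.
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from nav_msgs.msg import Odometry, Path
from std_msgs.msg import String


class VlamFallbackNode(Node):
    """Bridges the remote operator and vehicle sensors to a VLAM planner."""

    def __init__(self):
        super().__init__('vlam_fallback')
        self.last_image = None
        self.last_odom = None
        # Sensor inputs; all topic names are placeholders.
        self.create_subscription(Image, '/camera/front/image', self.on_image, 10)
        self.create_subscription(Odometry, '/odometry', self.on_odom, 10)
        # Free-text instruction typed by the remote operator.
        self.create_subscription(String, '/teleop/instruction', self.on_instruction, 10)
        # Planned trajectory handed to the downstream vehicle controller.
        self.path_pub = self.create_publisher(Path, '/vlam/planned_path', 10)

    def on_image(self, msg: Image) -> None:
        self.last_image = msg

    def on_odom(self, msg: Odometry) -> None:
        self.last_odom = msg

    def on_instruction(self, msg: String) -> None:
        if self.last_image is None or self.last_odom is None:
            self.get_logger().warn('No sensor data yet; ignoring instruction.')
            return
        # run_vlam() is a stand-in for the actual Alpamayo inference call.
        self.path_pub.publish(run_vlam(msg.data, self.last_image, self.last_odom))


def run_vlam(instruction: str, image: Image, odom: Odometry) -> Path:
    """Placeholder: returns an empty Path until the real model is wired in."""
    return Path()


def main():
    rclpy.init()
    rclpy.spin(VlamFallbackNode())
    rclpy.shutdown()


if __name__ == '__main__':
    main()
```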
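For the evaluation work package, the System Usability Scale reduces ten 1-5 Likert responses to a 0-100 score: odd-numbered items contribute (response - 1), even-numbered items contribute (5 - response), and the summed contributions are multiplied by 2.5. A small helper illustrating the standard computation:

```python
def sus_score(responses):
    """Standard SUS scoring: ten 1-5 Likert responses, item 1 first.

    Odd-numbered items contribute (response - 1), even-numbered items
    contribute (5 - response); the summed contributions are scaled by 2.5
    to a 0-100 range.
    """
    assert len(responses) == 10, 'SUS requires exactly ten item responses'
    total = sum(
        (r - 1) if i % 2 == 0 else (5 - r)  # i is 0-based, so even i = odd item
        for i, r in enumerate(responses)
    )
    return 2.5 * total


# A neutral rating of 3 on every item yields the scale midpoint of 50.
print(sus_score([3] * 10))  # 50.0
```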
WHAT YOU SHOULD BRING ALONG
- Strong interest and motivation for autonomous driving and human-machine interaction
- Initiative and an independent, structured way of working
- Solid programming skills (e.g., C++, Python)
- Experience in user studies
- Optional but helpful: previous experience with ROS (Robot Operating System) and machine learning frameworks
If you are interested in joining this project, feel free to send me your CV and transcript of records. I look forward to receiving your application!
Email: niklas.krauss@tum.de
- Technologies Used
- Python, PyTorch, VLAM, VLM, Teleoperation, Autonomous Vehicles, Autonomous Driving
- Tags
- FTM Studienarbeit, FTM Krauss, FTM AV, FTM AV Safe Operation, FTM Informatik, FTM Teleoperation
- Possible Start
- immediately
- Contact
- Niklas Krauß
Room: 3507
Tel.: +49 172 1736882
Email: niklas.krauss@tum.de