Fachschaft Maschinenbau: BaSaMa & HiWi

Precise and Efficient Scene Understanding for Autonomous Driving with VLMs

Institute

Type

Master's thesis

Content

experimental / theoretical

Description

The Problem:
Vision-Language Models (VLMs) are AI models that are good at understanding a scene in general terms, for example describing what is in a picture. For a self-driving car, however, this is not enough. These models have two major weaknesses:

They aren't very precise at pinpointing exactly where objects are.
They are large and slow, making them unsuitable for the split-second decisions needed for driving.
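
To make "precise" concrete: localization quality in object detection is commonly scored with intersection over union (IoU) between a predicted box and the ground-truth box. The following is a minimal sketch, not part of the project description; the function name `iou` and the `(x1, y1, x2, y2)` corner format are assumptions chosen for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2
    (assumed format, chosen for this illustration).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped to zero if disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union = sum of both areas minus the overlap counted twice.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0
```

An IoU of 1.0 means the predicted box matches the ground truth exactly, and 0.0 means the boxes do not overlap at all.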

The Goal:
We want to create a detection system for self-driving cars that is both smart (understands context like a VLM) and precise and fast (locates objects quickly and accurately).

The Plan:
We will adapt existing VLMs in two key steps:

Specialized Training: We will fine-tune a VLM using driving-specific data. This teaches it to be much better at the precise task of locating cars, pedestrians, and other critical objects on the road.
Model Compression: We will then use a technique called "knowledge distillation" to transfer the understanding from the large, slow VLM into a much smaller and faster model. Think of it as training a compact, efficient student model with the knowledge of a large, smart teacher.
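
The teacher-student idea in the second step can be sketched as a loss function. This is a minimal, pure-Python illustration of the standard classification form of knowledge distillation (temperature-softened teacher outputs combined with the usual hard-label loss), not the method this thesis will necessarily use. The function names, the `alpha` weighting, and the default temperature are assumptions for illustration, and a real detector would also need to distill box predictions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened class distributions.

    The T^2 factor keeps the gradient scale comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student's softened prediction
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def total_loss(student_logits, teacher_logits, true_class,
               alpha=0.5, temperature=2.0):
    """Weighted sum of hard-label cross-entropy and the distillation term.

    alpha is a hypothetical weighting hyperparameter, not from the posting.
    """
    hard = -math.log(softmax(student_logits)[true_class])
    soft = distillation_loss(student_logits, teacher_logits, temperature)
    return alpha * hard + (1 - alpha) * soft
```

When the student reproduces the teacher's logits exactly, the distillation term is zero; the further the two distributions diverge, the larger it gets, which is what pushes the small model toward the large model's behaviour.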

The Result:
The expected result is a lightweight, real-time object detector that does not just detect objects but retains a VLM's understanding of the scene, while being fast and accurate enough for safe autonomous driving.

Key Facts

Type:	Master's thesis (MA), also open to Informatics students
Starting Date:	Immediately
Supervisor:	Prof. Dr.-Ing. Johannes Betz
Advisor:	Yuchen Zhang, M.Sc.
Programming Language:	Python
Language:	English
Required Knowledge:	Python + Computer Vision/Object Detection

Work can begin immediately. If you are interested, simply send an email with your CV and academic transcript to yuchen2.zhangtum.de ;)

Tags

AVS Zhang

Possible start

immediately

Contact

Yuchen Zhang
yuchen2.zhangtum.de
