Precise and Efficient Scene Understanding for Autonomous Driving with VLMs
- Institute
- Professorship of Autonomous Vehicle Systems
- Type
- Master's thesis
- Content
- experimental, theoretical
- Description
The Problem:
Vision-Language Models (VLMs) are AI models that are good at understanding a scene in a general way, such as describing what is in a picture. For a self-driving car, however, this is not enough. These models have two major weaknesses:
- They are not very precise at pinpointing exactly where objects are.
- They are large and slow, making them unsuitable for the split-second decisions needed for driving.
The Goal:
We want to create a detection system for self-driving cars that is both smart (understands context like a VLM) and precise and fast (can locate objects quickly and accurately).
The Plan:
We will adapt existing VLMs in two key steps:
- Specialized Training: We will fine-tune a VLM on driving-specific data. This teaches it to be much better at the precise task of locating cars, pedestrians, and other critical objects on the road.
- Model Compression: We will then use a technique called "knowledge distillation" to transfer the understanding of the large, slow VLM into a much smaller and faster model. Think of it as training a compact, efficient student model with the knowledge of a large, smart teacher.
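The teacher-student idea in the second step can be sketched in a few lines of Python. This is only a minimal illustration, assuming the classic response-based form of knowledge distillation, in which the student is trained to match the teacher's temperature-softened class probabilities. The posting does not specify which distillation variant the thesis will use, and the function names, the temperature value, and the example logits below are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        p * (math.log(p) - math.log(max(q, 1e-12)))  # clamp q to avoid log(0)
        for p, q in zip(p_teacher, p_student)
        if p > 0.0
    )
    return (temperature ** 2) * kl


# Hypothetical class logits for one object: car, pedestrian, cyclist
teacher = [4.0, 1.0, 0.5]
good_student = [3.8, 1.1, 0.4]  # close to the teacher's prediction
poor_student = [0.5, 4.0, 1.0]  # disagrees with the teacher

loss_good = distillation_loss(good_student, teacher)
loss_poor = distillation_loss(poor_student, teacher)
print(loss_good < loss_poor)
```

A student whose predictions are close to the teacher's receives a lower loss than one that disagrees. In practice, a term like this would typically be combined with the ordinary detection loss on ground-truth labels.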
The Result:
The final product will be a lightweight, real-time object detector that does not just see objects but understands the scene with the intelligence of a VLM, while being fast and accurate enough for safe autonomous driving.
Key Facts:
- Type: MA, also for Informatics students
- Starting Date: Immediately
- Supervisor: Prof. Dr.-Ing. Johannes Betz
- Advisor: Yuchen Zhang, M.Sc.
- Programming Language: Python
- Language: English
- Required Knowledge: Python + Computer Vision/Object Detection
Work can begin immediately. If you are interested, simply send an email with your CV and academic transcript to yuchen2.zhangtum.de ;)
- Tags
- AVS Zhang
- Possible start
- immediately
- Contact
- Yuchen Zhang, yuchen2.zhangtum.de