Fachschaft Maschinenbau: Theses & HiWi Jobs

Literature Study: Lightweight and Fast-Inference Vision-Language Models for Autonomous Driving

Institute

Type

Semester Thesis / Master's Thesis

Content

experimental / theoretical

Description

Background
Autonomous driving systems increasingly leverage multimodal AI models, particularly Vision-Language Models (VLMs), to enhance scene understanding, natural language interaction, and scenario interpretation. These models combine visual inputs from cameras with textual instructions or context to perform tasks such as describing traffic scenes, interpreting driver commands, or understanding regulatory signs. However, deploying VLMs in real-time driving environments poses significant challenges due to limited computational resources and strict latency requirements.

Recent research has focused on lightweight VLMs and fast-inference techniques that address these constraints, using methods such as model pruning, knowledge distillation, quantization, and efficient transformer architectures. Despite these advances, it remains unclear how well such lightweight VLMs perform in autonomous driving, particularly under real-time constraints and in diverse, dynamic environments.

To support the development of deployable AI systems for autonomous vehicles, we propose a literature study that surveys lightweight and fast-inference VLMs, focusing on their applicability to driving-related tasks.

Objective
The primary objective of this project is to conduct a thorough literature review of recent advances in lightweight and fast-inference Vision-Language Models for autonomous driving. The student will:

Analyze the architectural and algorithmic techniques used to optimize VLMs for resource-constrained environments.
Explore the application domains within autonomous driving where lightweight VLMs have been evaluated (e.g., scene understanding, visual question answering, language-guided navigation).
Identify key trade-offs between model size, inference latency, and task accuracy.
Highlight domain-specific datasets and benchmarks.
Suggest promising directions for future research and real-world deployment.

We Offer

An exciting and forward-looking research environment.
The opportunity to publish scientific results (subject to merit).
Flexible supervision and the option to conduct the work in either German or English.

Requirements (What You Should Bring)

Initiative and a creative, problem-solving mindset.
Excellent proficiency in either German or English.
Interest in autonomous driving, efficient deep learning models, or multimodal AI.
Familiarity with computer vision or machine learning frameworks (e.g., PyTorch, TensorFlow) is advantageous.
Work can begin immediately. If you are interested in this topic, please send an email with a brief cover letter explaining why you are fascinated by this subject, along with a current transcript of records and your resume, to: yuan_avs.gao@tum.de.
If you are interested in my research in general and would like to explore other topics, please have a look at our recent survey paper: https://arxiv.org/abs/2506.11526
Feel free to contact me to discuss further ideas.

Tags

AVS Gao

Possible start

Immediately

Contact

Yuan Gao
yuan_avs.gao@tum.de
