Compact VLM Architectures for On-Device Scene Understanding in Autonomous Vehicles
- Institute
- Chair of Autonomous Vehicle Systems
- Type
- Semester thesis / Master's thesis
- Content
- experimental / theoretical / constructive
- Description
Join us in tackling one of the most critical challenges in autonomous driving: deploying advanced Vision-Language Models (VLMs) directly onto vehicles for real-time, safety-critical scene understanding!
Are you passionate about making cutting-edge AI efficient and practical? Do you want to work on optimizing large neural networks for deployment on edge devices, where every millisecond counts? This project offers a unique opportunity to build and evaluate novel compact VLM architectures designed specifically for the demanding environment of autonomous vehicles.
While large VLMs excel at general scene understanding, their immense computational needs currently prevent them from achieving the sub-second inference speeds essential for autonomous driving. This thesis will focus on developing task-specific compact VLMs that prioritize the rapid and precise extraction of critical, safety-related details (e.g., road geometry, intersections, other vehicles, pedestrians, traffic signs) over generating verbose, general descriptions.
You'll explore and apply advanced techniques such as quantization, pruning, knowledge distillation, and efficient attention mechanisms. Your work will involve designing architectures that are not only computationally lightweight but also maintain high accuracy for essential driving-related visual elements. This research directly addresses the critical issue of latency in autonomous driving systems, making advanced VLM capabilities viable for real-world deployment and significantly enhancing safety.
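As a flavor of the techniques involved, post-training dynamic quantization, one of the simplest compression methods mentioned above, can be sketched in a few lines of PyTorch. The model below is a toy stand-in for a VLM language head, not an actual architecture from this project:

```python
# Minimal sketch: post-training dynamic quantization with PyTorch.
# TinyVLMHead and its sizes are illustrative placeholders.
import torch
import torch.nn as nn

class TinyVLMHead(nn.Module):
    """Stand-in for the language head of a compact VLM."""
    def __init__(self, dim=256, vocab=1000):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.out = nn.Linear(dim, vocab)

    def forward(self, x):
        return self.out(torch.relu(self.proj(x)))

model = TinyVLMHead().eval()

# Quantize all Linear layers to int8 weights; activations stay float.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 256)
with torch.no_grad():
    out = quantized(x)
```

Quantization-aware training, as in the topics below, goes further by simulating low-bit arithmetic during training so the model can adapt to it.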
You'll contribute to high-impact research with a real chance of publication in top-tier conferences in robotics, computer vision, and AI for autonomous systems. Your work will push the boundaries of efficient artificial intelligence for embodied systems.
Example Thesis Topics (subject to availability):
Efficient VLM Architectures & Optimization
- Quantization-Aware Training for Driving VLMs: Develop and evaluate novel quantization techniques (e.g., 4-bit, 2-bit) tailored for compact VLMs in autonomous driving, minimizing accuracy drop while maximizing hardware efficiency.
- Pruning Strategies for Safety-Critical Features: Investigate structured and unstructured pruning methods to reduce VLM size, specifically retaining or enhancing performance on critical driving tasks (e.g., obstacle detection, lane keeping).
- Knowledge Distillation from Large VLMs: Design effective knowledge distillation pipelines where a powerful, large VLM "teaches" a smaller, more efficient VLM to achieve comparable performance on specific autonomous driving tasks.
- Hybrid VLM Architectures for Dual-Speed Reasoning: Explore hybrid models combining a highly compact VLM for high-frequency perception with a larger, conditionally activated VLM for complex, less time-critical reasoning (e.g., "on-demand" query for ambiguous situations).
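To make the distillation topic above concrete, the classic formulation (after Hinton et al.) combines a KL term on temperature-softened teacher logits with the usual hard-label cross-entropy. The sketch below uses random placeholder tensors in place of real student/teacher VLM outputs:

```python
# Minimal sketch of one knowledge-distillation step: a small "student"
# is trained to match the softened logits of a large "teacher".
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """KL on temperature-softened logits plus standard cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients are comparable across temperatures
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

s = torch.randn(4, 10, requires_grad=True)  # student logits (placeholder)
t = torch.randn(4, 10)                      # teacher logits (placeholder)
y = torch.randint(0, 10, (4,))              # ground-truth labels
loss = distillation_loss(s, t, y)
loss.backward()  # gradients flow to the student only
```

In a real pipeline the teacher runs in `torch.no_grad()` mode and only the student's parameters are updated.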
Task-Specific VLM Design & Evaluation
- Specialized Compact VLMs for Intersection Understanding: Develop a compact VLM explicitly optimized for rapid and accurate interpretation of intersection layouts, traffic light states, and pedestrian crossings.
- Efficient VLM for Long-Tail Scenario Identification: Focus on building a compact VLM capable of quickly identifying and classifying rare or unusual events in driving scenes, prioritizing safety-critical anomalies.
- Benchmarking Compact VLM Performance on Edge Devices: Conduct comprehensive empirical studies comparing various compact VLM optimization techniques (quantization, pruning, distillation) on real autonomous vehicle hardware, analyzing latency, power consumption, and accuracy tradeoffs.
- Hardware-Aware Neural Architecture Search (NAS) for VLMs: Investigate automated methods to design optimal compact VLM architectures directly accounting for the target autonomous vehicle hardware constraints.
Technologies Used
Python, PyTorch/TensorFlow, C++, Autonomous Driving, Large Language Models (LLMs), Vision-Language Models (VLMs), Deep Learning, Model Compression (Quantization, Pruning, Distillation), Efficient Attention Mechanisms, On-Device Inference, Edge AI, Computer Vision, Robotics, Sensor Fusion, Simulation & Datasets (e.g., CARLA, Waymo Open Dataset, nuScenes), NVIDIA Jetson / Drive AGX platforms (or similar).
- Prerequisites
If you're ready to make a tangible impact on the future of autonomous vehicles, send us your application.
Please include:
- A short motivation letter highlighting your interest in efficient AI, VLMs, and autonomous driving.
- Your CV.
- A recent transcript of records.
- (Optional) Any relevant project work or code samples demonstrating your experience in relevant fields.
- Possible start
- immediately
- Contact
Roberto Brusnicki
roberto.brusnicki@tum.de