BaSaMa & HiWi

Robust Sim-to-Real Reinforcement Learning for Dynamic Locomotion and Manipulation

Institute

Professur für autonome Fahrzeugsysteme (TUM-ED)

Type

Master's thesis

Content

experimental / theoretical / constructive

Description

Background

Join our team to develop highly robust Reinforcement Learning (RL) policies in simulation, a crucial step towards deploying intelligent behaviors on real-world humanoid and quadruped robots!

Are you passionate about reinforcement learning and physics simulation? Do you want to teach robots to walk, balance, and grasp before they ever touch the real world? This project tackles the notoriously difficult "Sim-to-Real gap" using state-of-the-art simulation environments and our Unitree robot hardware.

Training complex robotic behaviors directly in the real world is slow, expensive, and potentially damaging to the hardware. Simulators such as MuJoCo and Isaac Sim provide safe, accelerated environments for training deep RL policies. However, simulators are imperfect approximations of reality: policies that perform flawlessly in simulation often fail catastrophically on the physical robot because of unmodeled dynamics, friction variations, and sensor noise.

We will use highly accurate digital twins of the Unitree B2 and G1 robots to design reward functions and train low-level RL policies for dynamic locomotion and object manipulation. Crucially, we will research and implement advanced Domain Randomization and adaptation techniques so that the policies you train on our NVIDIA DGX survive the transition to the physical robots without manual tuning.
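To give a flavor of what Domain Randomization means in practice, here is a minimal, simulator-independent sketch in Python. The class name, parameter names, and value ranges are assumptions chosen for illustration; they are not the project's actual configuration and not an Isaac Sim or MuJoCo API.

```python
import random
from dataclasses import dataclass


@dataclass
class PhysicsParams:
    """One randomized set of simulator parameters (hypothetical names)."""
    mass_scale: float        # multiplier applied to nominal link masses
    friction: float          # ground friction coefficient
    motor_latency_s: float   # actuation delay in seconds
    sensor_noise_std: float  # std. dev. of additive Gaussian sensor noise


# Assumed ranges for illustration only; in a real project they would be
# chosen from measurements of the physical robot.
RANGES = {
    "mass_scale": (0.8, 1.2),
    "friction": (0.4, 1.2),
    "motor_latency_s": (0.0, 0.02),
    "sensor_noise_std": (0.0, 0.05),
}


def sample_params(rng: random.Random) -> PhysicsParams:
    """Draw every parameter uniformly from its range."""
    return PhysicsParams(**{k: rng.uniform(lo, hi) for k, (lo, hi) in RANGES.items()})


def noisy_observation(obs, params: PhysicsParams, rng: random.Random):
    """Add zero-mean Gaussian noise to each sensor reading."""
    return [x + rng.gauss(0.0, params.sensor_noise_std) for x in obs]


rng = random.Random(0)
for episode in range(3):
    params = sample_params(rng)  # fresh physics at every episode reset
    print(episode, params)
```

In a training loop, the sampled values would be written into the simulator at each episode reset, so the policy never sees exactly the same physics twice.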

Example Thesis Topics (subject to availability):

Advanced Domain Randomization for Bipedal Locomotion: Develop systematic domain randomization strategies (varying mass, friction, motor latency, and sensor noise) in Isaac Sim to train robust walking policies for the Unitree G1 humanoid.
Sim-to-Sim vs. Sim-to-Real Transferability Analysis: Train identical RL policies in different physics engines (MuJoCo vs. Isaac Sim) and evaluate their respective zero-shot transfer capabilities to the real Unitree hardware.
Teacher-Student Distillation for Robust Control: Implement a privileged "teacher" policy in simulation (access to exact physical parameters) and distill it into a "student" policy (using only onboard proprioception) for deployment on the B2 or G1.
RL-Based Fall Recovery and Safe Exploration: Train policies specifically designed to handle unexpected perturbations, detect imminent falls, and execute safe recovery maneuvers for bipedal and quadrupedal systems.
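
As a rough illustration of the teacher-student idea in the third topic, the toy sketch below fits a linear "student" that sees only a proprioceptive signal to the actions of a "teacher" that also sees the true friction coefficient. The teacher gains, the scalar observation, and the plain SGD loop are assumptions made for brevity; a real thesis would use neural network policies (e.g., in PyTorch) and observation histories.

```python
import random


def teacher_action(proprio: float, friction: float) -> float:
    """Privileged teacher: uses the true friction value (toy, hypothetical gains)."""
    return 2.0 * proprio - 0.5 * friction


def distill(num_steps: int = 2000, lr: float = 0.05, seed: int = 0):
    """Fit a student a = w * proprio + b to the teacher by SGD on squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(num_steps):
        proprio = rng.uniform(-1.0, 1.0)   # visible to teacher and student
        friction = rng.uniform(0.4, 1.2)   # randomized, hidden from the student
        target = teacher_action(proprio, friction)
        error = (w * proprio + b) - target
        # Gradient step on 0.5 * error**2 with respect to w and b
        w -= lr * error * proprio
        b -= lr * error
    return w, b


w, b = distill()
print(f"student gain w={w:.2f}, bias b={b:.2f}")
```

Because the student cannot observe friction, its gain approaches the teacher's proprioceptive gain while its bias absorbs the teacher's average friction response; the remaining error is the price of not having privileged information.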

Technologies Used: Python, PyTorch, Reinforcement Learning (PPO, SAC), Isaac Sim, MuJoCo, OpenAI Gym / DeepMind Control Suite, Unitree SDK, NVIDIA DGX, Sim2Real.

Your Benefits: Join a High-Performance Robotics Team

Impactful Research: Work on a project where your code doesn't live in a silo; it is a critical gear in an end-to-end pipeline. Your results will directly enable robots to perform complex tasks.
Top-Tier Hardware Stack: Gain exclusive hands-on experience with NVIDIA DGX (training), Jetson Thor (inference), and Unitree humanoids/quadrupeds, a stack very similar to the one used by industry leaders like Tesla, Figure AI, and Physical Intelligence.
Scientific Publication: We aim for high-impact results. If your work meets the quality standards, we will co-author and submit a paper to top-tier robotics/AI conferences (e.g., ICRA, IROS, CoRL, or CVPR).
Professional Career Launchpad: This thesis is designed to mirror the workflow of elite AI labs. We provide dedicated mentorship and professional support to help you land roles at top-tier robotics startups or Big Tech AI labs.
Dynamic Lab Culture: You will be part of a "squad" of motivated Master’s students working in parallel, fostering a collaborative, fast-paced, and supportive environment.

Requirements

We are looking for students who see their thesis not just as a degree requirement, but as a career-defining project.

Must-Have:

English Proficiency: High level of written and spoken English (the language of our research and documentation).
Proactive Mindset: You embrace a "fail fast, learn fast" approach and are comfortable solving hands-on hardware/software integration challenges.
Independence: Ability to own a technical module and drive it forward while communicating effectively with the rest of the team.
Growth Path: A passion for Robotics/AI and an eagerness to learn new technologies.

Nice-to-Have (The "Plus"):

Technical Foundation: Proficiency in Python and/or C++.
Domain Experience: Prior exposure to PyTorch, ROS 2, or physics simulators (Isaac Sim/MuJoCo).
Hardware Skills: Experience working with robotic hardware, sensors, or VR systems.

Ready to build the future of Embodied AI? Send your CV, a recent transcript, and a brief email explaining why you are the right fit for this specific "squad" and what your career goals are.

Possible start

immediately

Contact

Roberto Brusnicki
roberto.brusnicki@tum.de
