Generative Machine Learning for 3D Bio-Aggregate Modeling

Institute
Professorship of Multiscale Modeling of Fluid Materials
Type
Semester Thesis / Master's Thesis
Content
theoretical
Description

The self-assembly of peptides into nanostructures (e.g., fibers, tubes, vesicles) plays a vital role in biomaterials and synthetic biology. Generating these 3D aggregate structures directly from peptide sequence information (e.g., FASTA format) is a key challenge that could accelerate peptide design. While machine learning has shown success in protein folding and molecular modeling, there is limited understanding of how best to represent aggregate morphologies in a way that enables generative models to learn and synthesize them effectively.

This project focuses on the output representation problem: what is the most effective way to represent a self-assembled aggregate structure (e.g., one composed of hundreds of peptides) in 3D so that ML models can learn to generate such configurations from sequences? To address this question, the student will survey existing generative ML approaches to 3D structure generation in soft-matter systems and molecular modeling, with particular attention to how these models encode and generate the 3D output structure. Then, using available datasets (e.g., synthetic data, polymer or lipid assemblies, and approximately 100 peptide aggregate trajectories), the student will implement and benchmark multiple 3D representation strategies, such as voxel grids, point clouds, and implicit neural fields (e.g., occupancy or signed distance functions, SDF).
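
To make the difference between two of these representations concrete, the following minimal sketch converts a point cloud into a binary voxel occupancy grid. It is only an illustration under stated assumptions: the function name `voxelize`, the grid size, and the synthetic spherical shell used as a stand-in for a vesicle-like aggregate are all hypothetical and are not part of the project's data or code.

```python
import numpy as np


def voxelize(points, grid_size=32):
    """Map an (N, 3) point cloud to a boolean occupancy grid of shape (G, G, G).

    Hypothetical helper for illustration; not taken from the project.
    """
    points = np.asarray(points, dtype=float)
    lo = points.min(axis=0)
    hi = points.max(axis=0)
    # Use a single cubic bounding box so the aspect ratio is preserved.
    extent = max(float((hi - lo).max()), 1e-9)
    idx = np.floor((points - lo) / extent * grid_size).astype(int)
    # Points on the upper boundary would land at index G, so clip them back.
    idx = np.clip(idx, 0, grid_size - 1)
    grid = np.zeros((grid_size, grid_size, grid_size), dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid


# Assumed toy data: points on a spherical shell of radius 5 (vesicle-like).
rng = np.random.default_rng(0)
directions = rng.normal(size=(2000, 3))
points = 5.0 * directions / np.linalg.norm(directions, axis=1, keepdims=True)

grid = voxelize(points, grid_size=16)
print(grid.shape, int(grid.sum()))
```

The point cloud keeps exact coordinates but has no fixed size or ordering, whereas the voxel grid has a fixed tensor shape that suits convolutional models at the cost of resolution. That trade-off is one of the aspects the benchmark is meant to examine.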

Each method will be compared in terms of learnability and model performance (reconstruction fidelity, generation plausibility), data requirements, and suitability for peptide self-assembly use cases. The final outcome will be a critical evaluation of how generative models could best represent and produce aggregate morphologies, setting the foundation for future conditional generation from peptide sequences.
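
As an illustration of how reconstruction fidelity might be quantified, the sketch below implements two commonly used measures: a symmetric Chamfer distance for point clouds (here the squared-distance variant) and intersection over union (IoU) for voxel grids. Choosing these particular metrics is an assumption made for this example; the announcement does not prescribe them, and the function names are hypothetical.

```python
import numpy as np


def chamfer_distance(a, b):
    """Symmetric Chamfer distance (squared-distance variant) between
    point clouds a of shape (N, 3) and b of shape (M, 3)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Pairwise squared distances, shape (N, M); fine for small clouds.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    # Mean nearest-neighbour distance in both directions.
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())


def voxel_iou(x, y):
    """Intersection over union of two occupancy grids of equal shape."""
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=bool)
    union = np.logical_or(x, y).sum()
    if union == 0:
        return 1.0  # Convention chosen here: two empty grids match perfectly.
    return float(np.logical_and(x, y).sum() / union)


# Toy check: a cloud compared with itself and with a shifted copy.
cloud = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
shifted = cloud + np.array([0.0, 2.0, 0.0])
print(chamfer_distance(cloud, cloud), chamfer_distance(cloud, shifted))
```

The pairwise distance matrix grows as N × M, so for aggregates with many particles a nearest-neighbour structure such as a k-d tree would likely be needed instead of this dense computation.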

Requirements

Proficiency in Python and experience with machine learning in PyTorch
Familiarity with, or a basic understanding of, 3D data types (e.g., voxel grids, point clouds, meshes)
Basic understanding of generative modeling (e.g. autoencoders, GANs, or diffusion models)
Interest in biomolecular simulation (beneficial but not required)

Possible start
immediately
Contact
Nuno Costa
Room: 5501.02.129
nuno.costa@tum.de