Encoding Chemistry: Molecular Embeddings for ML
- Institute
- Professorship of Multiscale Modeling of Fluid Materials
- Type
- Bachelor's thesis, semester thesis, master's thesis
- Content
- theoretical
- Description
Machine learning models for molecular and materials systems rely heavily on how structures are represented. Different embedding strategies offer distinct advantages and limitations. These range from traditional fingerprints to graph-based message-passing embeddings and continuous 3D representations. Molecular representations are essential because they translate complex chemical structures into numerical forms that models can learn from, capturing key information about atoms, bonding, and geometry. They are widely used in property prediction, molecular design, reaction modeling, and materials screening. In all of these, the quality of the representation strongly influences accuracy, data efficiency, and generalization. This project aims to systematically investigate, compare, and develop molecular representation methods for downstream property prediction tasks.
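The following minimal Python sketch illustrates a graph-based message-passing embedding. Atoms are nodes and bonds are edges. Each atom repeatedly adds up its neighbors' feature vectors, and the atoms are then sum-pooled into one molecule-level vector. This is an illustration under simplifying assumptions, not the method this project will use. The element vocabulary, the one-hot atom features, and the helper names (`one_hot`, `message_passing_step`, `graph_embedding`) are hypothetical. Real message-passing networks also apply learned weight matrices and nonlinearities in every update, which this sketch omits.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical element vocabulary for the one-hot atom features.
ELEMENTS = ["C", "N", "O"]


def one_hot(symbol: str) -> List[float]:
    """Encode an element symbol as a one-hot vector over ELEMENTS."""
    return [1.0 if symbol == element else 0.0 for element in ELEMENTS]


def message_passing_step(
    h: List[List[float]], neighbors: Dict[int, List[int]]
) -> List[List[float]]:
    """One update h_i' = h_i + sum of h_j over bonded neighbors j (no learned weights)."""
    updated = []
    for i, h_i in enumerate(h):
        agg = list(h_i)
        for j in neighbors[i]:
            agg = [a + b for a, b in zip(agg, h[j])]
        updated.append(agg)
    return updated


def graph_embedding(
    symbols: Sequence[str], bonds: Sequence[Tuple[int, int]], steps: int = 2
) -> List[float]:
    """Run `steps` message-passing rounds, then sum-pool atoms into one vector."""
    neighbors: Dict[int, List[int]] = {i: [] for i in range(len(symbols))}
    for i, j in bonds:  # bonds are undirected, so register both directions
        neighbors[i].append(j)
        neighbors[j].append(i)
    h = [one_hot(s) for s in symbols]
    for _ in range(steps):
        h = message_passing_step(h, neighbors)
    # Sum pooling makes the result independent of atom ordering.
    return [sum(column) for column in zip(*h)]


# Heavy-atom graph of ethanol (C-C-O), hydrogens omitted.
print(graph_embedding(["C", "C", "O"], [(0, 1), (1, 2)]))  # [12.0, 0.0, 5.0]
```

Both the neighbor aggregation and the final pooling are sums. As a result, listing the atoms in reverse order (O-C-C instead of C-C-O) gives the same vector. This permutation invariance is one reason graph embeddings are attractive compared with order-dependent encodings.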
- Prerequisites
- Survey Existing Molecular Embedding Techniques
Identify and summarize the main ways molecules and materials can be represented numerically. This includes classical fingerprints, graph-based models, and 3D geometric descriptors.
- Benchmark Performance for Direct Property Prediction
Test different representations by using them as inputs to machine learning models and evaluating how well they predict molecular or material properties (e.g., energy, stability, reactivity).
- Develop or Improve a New Embedding Method
Design or refine a representation that addresses limitations of existing methods, for example by better capturing geometry, long-range interactions, or chemical environments.
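The sketch below computes the Coulomb matrix, a classic 3D geometric descriptor of the kind the survey task mentions. It encodes nuclear charges Z_i on the diagonal and charge products divided by interatomic distances |R_i - R_j| off the diagonal. This is a minimal illustration, assuming coordinates are given in one consistent length unit; the original formulation uses atomic units. The function names are hypothetical, and the two-atom geometry is a toy input, not real data.

```python
import math
from typing import List, Sequence, Tuple

Coordinate = Tuple[float, float, float]


def coulomb_matrix(
    charges: Sequence[int], coords: Sequence[Coordinate]
) -> List[List[float]]:
    """Coulomb matrix: 0.5 * Z_i**2.4 on the diagonal, Z_i * Z_j / |R_i - R_j| off it."""
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Self-interaction term fitted to free-atom energies.
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                # Coulomb repulsion between nuclei i and j.
                matrix[i][j] = charges[i] * charges[j] / math.dist(coords[i], coords[j])
    return matrix


def upper_triangle(matrix: List[List[float]]) -> List[float]:
    """Flatten the symmetric matrix (diagonal included) into a feature vector."""
    return [matrix[i][j] for i in range(len(matrix)) for j in range(i, len(matrix))]


# Two hydrogen atoms one length unit apart (toy geometry, not a real H2 bond length).
cm = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)])
print(cm)                  # [[0.5, 1.0], [1.0, 0.5]]
print(upper_triangle(cm))  # [0.5, 1.0, 0.5]
```

The raw matrix depends on atom ordering and on molecule size. In practice it is therefore usually sorted, reduced to its eigenvalue spectrum, or padded before use. Shortcomings of this kind are examples of what a new or improved embedding could address.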
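The benchmarking task can be sketched in three steps. First, compute each representation for the same molecules. Second, feed every representation to the same model under the same cross-validation folds. Third, compare the errors. The minimal Python version below follows these steps, with several assumptions made only for illustration:
- k-nearest-neighbor regression serves as a stand-in model.
- The mean absolute error is the metric.
- Folds are assigned deterministically by index.
- All function names are hypothetical.

The usage example uses synthetic numbers only to exercise the code and says nothing about real molecular data.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def knn_predict(train_x: List[Vector], train_y: List[float], query: Vector, k: int) -> float:
    """Average the targets of the k training points closest to `query` (Euclidean)."""
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def cross_validated_mae(
    features: List[Vector], targets: List[float], n_folds: int = 5, k: int = 3
) -> float:
    """Mean absolute error of k-NN regression under k-fold cross-validation."""
    n = len(features)
    if n_folds < 2 or n < n_folds:
        raise ValueError("need n_folds >= 2 and at least n_folds samples")
    errors = []
    for fold in range(n_folds):
        # Deterministic fold assignment by index; shuffle the data first in real use.
        train = [i for i in range(n) if i % n_folds != fold]
        train_x = [features[i] for i in train]
        train_y = [targets[i] for i in train]
        for i in range(fold, n, n_folds):  # held-out samples of this fold
            prediction = knn_predict(train_x, train_y, features[i], k)
            errors.append(abs(prediction - targets[i]))
    return sum(errors) / len(errors)


def benchmark(
    representations: Dict[str, List[Vector]], targets: List[float], n_folds: int = 5, k: int = 3
) -> Dict[str, float]:
    """Score every representation with the same model and the same folds."""
    return {
        name: cross_validated_mae(x, targets, n_folds, k)
        for name, x in representations.items()
    }


# Synthetic example: one feature that tracks the target, one that carries no information.
targets = [float(i) for i in range(20)]
reps = {
    "informative": [[float(i)] for i in range(20)],
    "constant": [[0.0] for _ in range(20)],
}
print(benchmark(reps, targets, n_folds=5, k=1))  # {'informative': 1.0, 'constant': 9.4}
```

Holding the model and the folds fixed means that differences in error can be attributed to the representation itself. In a real study, the stand-in model would be replaced by the regressors chosen for the project.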
- Possible start
- immediately
- Contact
- Michał Sanocki
Tel.: 783880014
m.sanocki@tum.de