Ensembling machine learning models to boost molecular affinity prediction

This study proposes a machine learning approach to predicting protein-ligand binding affinities that combines six popular machine learning methods. The pipeline consists of two ensembles, one for classification and one for regression, which assess the binding class and the binding strength, respectively. By combining techniques such as Support Vector Machine, Random Forest, CatBoost, and neural networks, the study demonstrates that an ensemble of diverse methods improves prediction metrics. The models do not rely on atomic spatial coordinates, which allows them to generalize to compounds with unknown conformations.

The methods use ligand features only and avoid structural biases by relying on physico-chemical properties and textual molecular representations. The first ensemble classifies whether a ligand binds to a receptor, while the second predicts the binding affinity. The approach can be adapted to different receptors, which makes it versatile in drug discovery applications. The study shows that combining multiple models improves prediction accuracy for molecular interactions.
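
The two-ensemble layout described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the summary does not say how member outputs are combined, so simple averaging (soft voting) is an assumption here, and the names `classify_binder`, `predict_affinity`, `run_pipeline`, and the constant stand-in models are hypothetical. In practice each member would be a trained model such as an SVM, a Random Forest, or a neural network.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence

# A ligand is represented only by a feature vector (no 3D coordinates).
Features = Sequence[float]
# Hypothetical member interfaces: each model maps features to one number.
ProbModel = Callable[[Features], float]      # probability that the ligand binds
AffinityModel = Callable[[Features], float]  # predicted binding affinity


@dataclass
class Prediction:
    binds: bool
    binding_probability: float
    affinity: float


def classify_binder(features: Features, classifiers: Sequence[ProbModel]) -> float:
    """Classification ensemble: average the member probabilities (assumed rule)."""
    return mean(model(features) for model in classifiers)


def predict_affinity(features: Features, regressors: Sequence[AffinityModel]) -> float:
    """Regression ensemble: average the member affinity predictions (assumed rule)."""
    return mean(model(features) for model in regressors)


def run_pipeline(
    features: Features,
    classifiers: Sequence[ProbModel],
    regressors: Sequence[AffinityModel],
    threshold: float = 0.5,  # assumed decision threshold
) -> Prediction:
    """Report both the binding class and the binding strength for one ligand."""
    probability = classify_binder(features, classifiers)
    affinity = predict_affinity(features, regressors)
    return Prediction(probability >= threshold, probability, affinity)


if __name__ == "__main__":
    # Constant stand-ins for trained models; real members would use the features.
    toy_classifiers = [lambda x: 0.9, lambda x: 0.6, lambda x: 0.3]
    toy_regressors = [lambda x: 6.0, lambda x: 7.0, lambda x: 8.0]
    print(run_pipeline([0.1, 0.2], toy_classifiers, toy_regressors))
```

Averaging is only one way to combine members; weighted voting or a stacked meta-model would fit the same interfaces.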

Takeaways:

  1. Binding Affinity Prediction: The study introduces a machine learning pipeline combining six popular methods to predict protein-ligand binding affinity, enhancing predictive accuracy.
  2. Ensemble Approach: The pipeline consists of two ensembles: a classification ensemble to identify binding ligands and a regression ensemble to predict binding strength.
  3. Diverse ML Methods: The study integrates multiple machine learning techniques, including Support Vector Machine, Random Forest, CatBoost, and graph neural networks, to improve prediction performance.
  4. Ligand-Based Prediction: The methods focus on ligand features alone, avoiding biases from receptor structure and allowing the model to generalize to compounds with unknown conformations.
  5. Model Versatility: The approach is adaptable for different receptors, showcasing its potential for use in various drug discovery applications.
  6. No Structural Dependency: The models rely on physico-chemical properties and textual molecular representations, enabling accurate predictions without requiring atomic spatial coordinates.
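
To illustrate what a textual molecular representation looks like as model input, the sketch below splits a SMILES string into tokens. SMILES is assumed here as the textual format; this summary does not name the format or the tokenizer the authors used. The pattern is a simplified illustration rather than a full SMILES grammar, and `tokenize_smiles` is a hypothetical helper name. Note that the string encodes atoms and bonds only, with no atomic spatial coordinates.

```python
import re
from typing import List

# Simplified token pattern (an assumption for illustration, not a full grammar).
_TOKEN = re.compile(
    r"\[[^\]]+\]"           # bracket atoms such as [C@H] or [NH4+]
    r"|Br|Cl"               # two-letter halogens, tried before single letters
    r"|%\d{2}"              # two-digit ring-closure labels
    r"|[BCNOSPFI]"          # aliphatic atoms of the organic subset
    r"|[bcnosp]"            # aromatic atoms
    r"|[()=#\-+\\/.:~@*$]"  # branches, bonds, and other symbols
    r"|\d"                  # single-digit ring closures
)


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into tokens; reject characters the pattern skips."""
    tokens = _TOKEN.findall(smiles)
    # findall silently skips unmatched characters, so verify a lossless split.
    if "".join(tokens) != smiles:
        raise ValueError(f"Unrecognized characters in SMILES: {smiles!r}")
    return tokens


if __name__ == "__main__":
    # Alanine with a stereocentre: the bracket atom stays a single token.
    print(tokenize_smiles("C[C@H](N)C(=O)O"))
```

A token sequence like this can be mapped to integer ids and fed to a sequence model, which is one common way to use textual representations alongside physico-chemical descriptors.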

Computational Biology and Chemistry


Volume 93

Pages: 107529


01.08.2021

Maksym Druchok, Dzvenymyra Yarish, Sofiya Garkot, Tymofii Nikolaienko, Alex Gurbych


