Updated README
README.md CHANGED
@@ -4,54 +4,57 @@ license: cc-by-nc-4.0

# Matcha: Multi-Stage Riemannian Flow Matching for Accurate and Physically Valid Molecular Docking

- This
-
-

## Abstract
-

## Overview
- Matcha is a molecular docking pipeline that combines multi-stage flow matching with learned scoring and physical validity filtering. Our approach consists of three sequential stages applied consecutively to progressively refine docking predictions, each implemented as a flow matching model operating on appropriate geometric spaces (R^3, SO(3), and SO(2)).
- We enhance the prediction quality through a dedicated scoring model and apply unsupervised physical validity filters to eliminate unrealistic poses.

-

- Compared to various approaches, Matcha demonstrates superior performance on Astex and

<img src="https://github.com/LigandPro/Matcha/raw/main/data/img/time.png" alt="results" width="500"/>

## Installation
-

```bash
- cd
-
```

-
- To run inference with one script, computing all preprocessing steps and docking predictions, use the following command. Provide `--compute_final_metrics` if your dataset has true ligand positions, so we can compute RMSD metrics and PoseBusters filters.
- Argument `-n inference_folder_name` is a name of a folder where to store inference results for dataset.

-
-
-
- This script will provide a step-by-step computation of protein ESM embeddings, docking predictions, physically-aware unsupervised post-filtration, scoring and saving predictions to sdf.

## Citation
-

```bibtex
@misc{frolova2025matchamultistageriemannianflow,
- title={Matcha: Multi-Stage Riemannian Flow Matching for Accurate and Physically Valid Molecular Docking},
- author={Daria Frolova and Talgat Daulbaev and Egor
year={2025},
eprint={2510.14586},
archivePrefix={arXiv},
primaryClass={cs.LG},
- url={https://arxiv.org/abs/2510.14586},
}
```

# Matcha: Multi-Stage Riemannian Flow Matching for Accurate and Physically Valid Molecular Docking

+ This repository hosts the Matcha pipeline checkpoints (three-stage flow matching models). The model is described in the paper [Matcha: Multi-Stage Riemannian Flow Matching for Accurate and Physically Valid Molecular Docking](https://arxiv.org/abs/2510.14586).
+
+ **Code and full documentation:** [GitHub – LigandPro/Matcha](https://github.com/LigandPro/Matcha). The previous release is tagged `v1` there.
+
+ Download the `matcha_pipeline` folder from the **Files and versions** tab and set `checkpoints_folder` in your paths config to the folder that contains it.

## Abstract
+
+ Accurate prediction of protein-ligand binding poses is crucial for structure-based drug design, yet existing methods struggle to balance speed, accuracy, and physical plausibility. We introduce Matcha, a novel molecular docking pipeline that combines multi-stage flow matching with learned scoring and physical validity filtering. Our approach consists of three sequential stages applied consecutively to refine docking predictions, each implemented as a flow matching model operating on appropriate geometric spaces (R³, SO(3), and SO(2)). We enhance the prediction quality through a dedicated scoring model and apply unsupervised physical validity filters to eliminate unrealistic poses. Compared to various approaches, Matcha demonstrates superior performance on Astex and PDBbind test sets in terms of docking success rate and physical plausibility. Moreover, our method works approximately 31× faster than modern large-scale co-folding models.

## Overview

+ Matcha is a molecular docking pipeline that combines multi-stage flow matching with learned scoring and physical validity filtering. Our approach consists of three sequential stages applied consecutively to progressively refine docking predictions, each implemented as a flow matching model operating on appropriate geometric spaces (R³, SO(3), and SO(2)). We enhance the prediction quality through a dedicated scoring model and apply unsupervised physical validity filters to eliminate unrealistic poses.
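
The README text does not say how the flows on each space are parameterized. As a generic illustration only (not taken from the Matcha code), the sketch below shows what a flow matching training target can look like on SO(2), assuming a single rotation angle in radians and a constant-speed shortest-arc path between a starting angle and a target angle. The function names `wrap_angle` and `geodesic_point_and_velocity` are hypothetical.

```python
import math


def wrap_angle(theta: float) -> float:
    """Map an angle in radians to the interval [-pi, pi)."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi


def geodesic_point_and_velocity(theta0: float, theta1: float, t: float):
    """Point at time t in [0, 1] on the shortest arc from theta0 to theta1.

    Returns the interpolated angle and the constant angular velocity along
    that arc, which is what a velocity network would regress onto under
    the assumptions stated above.
    """
    delta = wrap_angle(theta1 - theta0)       # signed shortest rotation
    theta_t = wrap_angle(theta0 + t * delta)  # stay on the circle
    return theta_t, delta
```

The wrap matters because angles near +pi and -pi are close on the circle: interpolating from 3.0 to -3.0 rad should pass through pi, not through 0, which a plain linear interpolation in R would get wrong.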
+
+ More details, CLI usage, and the full pipeline (including GNINA) are in the [GitHub repository](https://github.com/LigandPro/Matcha).

+ Compared to various approaches, Matcha demonstrates superior performance on Astex, PDBBind and PoseBusters test sets in terms of docking success rate and physical plausibility. Moreover, our method works approximately 31× faster than modern large-scale co-folding models.

<img src="https://github.com/LigandPro/Matcha/raw/main/data/img/time.png" alt="results" width="500"/>

## Installation
+
+ Clone the [Matcha repository](https://github.com/LigandPro/Matcha) and install:

```bash
+ cd Matcha
+ uv sync
+ # or: pip install -e .
```

+ Download the `matcha_pipeline` folder from this Hugging Face repo and set `checkpoints_folder` in `configs/paths/paths.yaml` to the parent folder of `matcha_pipeline`.
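
The download step above can also be scripted. The following is a minimal sketch, assuming the `huggingface_hub` package is installed. The `repo_id` argument is a placeholder, since the id of this Hugging Face repo is not stated in the README, and both function names are hypothetical helpers rather than part of Matcha.

```python
from pathlib import Path


def checkpoints_folder_for(target_dir: str) -> Path:
    """Return the value to put in `checkpoints_folder`.

    This is the parent folder of `matcha_pipeline`, not the
    `matcha_pipeline` folder itself.
    """
    parent = Path(target_dir).resolve()
    if not (parent / "matcha_pipeline").is_dir():
        raise FileNotFoundError(f"no matcha_pipeline folder inside {parent}")
    return parent


def fetch_matcha_pipeline(repo_id: str, target_dir: str) -> Path:
    """Download only the `matcha_pipeline` folder into target_dir.

    repo_id is a placeholder: pass the id of this model repository.
    """
    # Imported lazily so checkpoints_folder_for works without huggingface_hub.
    from huggingface_hub import snapshot_download

    snapshot_download(
        repo_id=repo_id,
        allow_patterns=["matcha_pipeline/*"],  # skip everything else in the repo
        local_dir=target_dir,
    )
    return checkpoints_folder_for(target_dir)
```

The path returned by `fetch_matcha_pipeline` is the one to copy into the `checkpoints_folder` entry of `configs/paths/paths.yaml`.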

+ ## Sample usage
+
+ To run docking, save scored predictions and compute metrics, see the instructions provided in [GitHub README](https://github.com/LigandPro/Matcha).

## Citation
+
+ If you use Matcha in your work, please cite:

```bibtex
@misc{frolova2025matchamultistageriemannianflow,
+ title={Matcha: Multi-Stage Riemannian Flow Matching for Accurate and Physically Valid Molecular Docking},
+ author={Daria Frolova and Talgat Daulbaev and Egor Sevriugov and Sergei A. Nikolenko and Dmitry N. Ivankov and Ivan Oseledets and Marina A. Pak},
year={2025},
eprint={2510.14586},
archivePrefix={arXiv},
primaryClass={cs.LG},
+ url={https://arxiv.org/abs/2510.14586},
}
```