# DeepPpIScore

This is the official implementation of the paper "Harnessing Deep Statistical Potential for Biophysical Scoring of Protein-peptide Interactions".

## Table of Contents
- Introduction
- Evaluation of Peptide Binding Mode Prediction Based on the Well-established Unbound Set
- Evaluation of Peptide Binding Mode Prediction Based on the Latest Bound Set
- Comparison with AF-M 2.3 on Peptide Binding Mode Prediction
- Conda Environment Reproduction
- Training Structures and Evaluation Datasets
- Usage
## Introduction

Protein-peptide interactions (PpIs) play a critical role in major cellular processes. Recently, a number of machine learning (ML)-based methods have been developed to predict PpIs, but most of them rely heavily on sequence data, limiting their ability to capture the generalized molecular interactions in three-dimensional (3D) space, which is crucial for understanding protein-peptide binding mechanisms and advancing peptide therapeutics. Protein-peptide docking approaches provide a feasible way to generate the structures of PpIs, but they often suffer from low-precision scoring functions (SFs). To address this, we developed DeepPpIScore, a novel SF for PpIs that employs unsupervised geometric deep learning coupled with physics-inspired statistical potential. Trained solely on curated experimental structures without binding affinity data or classification labels, DeepPpIScore exhibits broad generalization across multiple tasks. Our comprehensive evaluations in bound and unbound peptide binding mode prediction, binding affinity prediction, and binding pair identification reveal that DeepPpIScore outperforms or matches state-of-the-art baselines, including popular protein-protein SFs, ML-based methods, and AlphaFold-Multimer 2.3 (AF-M 2.3). Notably, DeepPpIScore achieves superior results in peptide binding mode prediction compared to AF-M 2.3. More importantly, DeepPpIScore offers interpretability in terms of hotspot preferences at protein interfaces, physics-informed noncovalent interactions, and protein-peptide binding energies.
## Conda Environment Reproduction

### Mamba Installation

```shell
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh
```
Then, the following commands can be used to reproduce the conda environment:

```shell
mkdir -p ~/conda_env/DeepPpIScore
# pass the prefix as a separate argument so the shell expands ~
# (tilde is not expanded after `--prefix=`)
mamba env create --prefix ~/conda_env/DeepPpIScore --file ./env/DeepPpIScore.yaml
mamba activate ~/conda_env/DeepPpIScore
```
Alternatively, download the conda-packed archive DeepPpIScore.tar.gz from Google Drive, and then use the following commands to reproduce the conda environment:

```shell
mkdir -p ~/conda_env/DeepPpIScore
tar -xzvf DeepPpIScore.tar.gz -C ~/conda_env/DeepPpIScore
mamba activate ~/conda_env/DeepPpIScore
conda-unpack
```
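After activating the environment, a quick sanity check can catch a broken unpack before running anything heavy. The sketch below is a minimal, hedged example — the assumption that PyTorch is among DeepPpIScore's dependencies comes from the GPU/CUDA requirements stated in this README, not from an inspected dependency list:

```python
# sanity_check.py -- minimal environment check (hypothetical helper, not part of the repo)
import sys


def check_python(required=(3, 9)):
    # The README states the code was tested with Python 3.9.13
    return sys.version_info[:2] >= required


def check_cuda():
    # Returns True if PyTorch sees a CUDA device (CUDA 11.2 was used in testing);
    # returns None if torch is not installed in the current environment.
    try:
        import torch
    except ImportError:
        return None
    return torch.cuda.is_available()


if __name__ == "__main__":
    print("python ok:", check_python())
    print("cuda available:", check_cuda())
```

Run it on the GPU node you plan to use; `cuda available: False` there usually indicates a driver/CUDA mismatch rather than a problem with the unpacked environment itself.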
## Training Structures and Evaluation Datasets

- The prepared training structures are available on Google Drive: pepbdb_graphs_noH_pocket_topk30.zip and pepbdb_graphs_noH_ligand.zip
- PepSet is available at the PepSet Benchmark
- The prepared BoundPep set is available on Zenodo
- PepBinding is available at PepBinding
- The prepared pMHCSet is available on Zenodo
## Usage

The code was tested successfully in a basic environment equipped with an NVIDIA Tesla V100 GPU card, Python 3.9.13, CUDA 11.2, conda 24.3.0, and mamba 1.5.8.
```shell
git clone https://github.com/zjujdj/DeepPpIScore.git
```
Download esm2_t33_650M_UR50D.pt and esm2_t33_650M_UR50D-contact-regression.pt, and put them in the ./data directory:

```shell
mv esm2_t33_650M_UR50D.pt esm2_t33_650M_UR50D-contact-regression.pt ./data
```
```shell
cd scripts
# the evaluation configurations can be set in model_inference_example.py
# first, submit the following command to a CPU node to generate the graphs using multiprocessing
python3 -u model_inference_example.py > model_inference_example_graph_gen.log
```
The generated graph files for this example are stored in the ./data/temp_graphs_noH directory.
```shell
# then, submit the following command to a GPU node to make predictions
python3 -u model_inference_example.py > model_inference_example.log
```
The prediction results are listed in ./model_inference/DeepPpIScore/8.0/DeepPpIScore_8.0.csv. This CSV file provides four kinds of scores: 'cb-cb score', 'cb-cb norm score', 'min-min score', and 'min-min norm score', where norm score = score / sqrt(contacts). All the analyses in the paper were based on the 'min-min score'.
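The normalization above can be sketched in a few lines of Python. The `rank_poses` helper is hypothetical: the CSV column name is an assumption based on the four score names listed here, not verified against the actual output file, and the assumption that more negative statistical-potential scores indicate more favorable poses should be checked against the paper:

```python
import csv
import math


def norm_score(score, contacts):
    # normalization described in this README: norm score = score / sqrt(contacts)
    return score / math.sqrt(contacts)


def rank_poses(csv_path, column="min-min score"):
    # Hypothetical helper: sort poses so that the most negative (assumed most
    # favorable) 'min-min score' comes first.
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    return sorted(rows, key=lambda r: float(r[column]))
```

The sqrt(contacts) divisor damps the dependence of the raw interface energy on the number of protein-peptide contacts, so that large interfaces are not favored merely for being large.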
Download the prepared training structures pepbdb_graphs_noH_pocket_topk30.zip and pepbdb_graphs_noH_ligand.zip from Google Drive, and unzip them in the ./data directory:

```shell
# unzip training structures
cd ./data
unzip pepbdb_graphs_noH_pocket_topk30.zip
unzip pepbdb_graphs_noH_ligand.zip
```
```shell
# model training; the training configurations can be set in train_model.py
cd ../scripts
python3 -u train_model.py > train_model.log
```