
DeepPpIScore

This is the official implementation for the paper titled 'Harnessing Deep Statistical Potential for Biophysical Scoring of Protein-peptide Interactions'.


Introduction

Protein-peptide interactions (PpIs) play a critical role in major cellular processes. Recently, a number of machine learning (ML)-based methods have been developed to predict PpIs, but most of them rely heavily on sequence data, limiting their ability to capture the generalized molecular interactions in three-dimensional (3D) space, which is crucial for understanding protein-peptide binding mechanisms and advancing peptide therapeutics. Protein-peptide docking approaches provide a feasible way to generate the structures of PpIs, but they often suffer from low-precision scoring functions (SFs). To address this, we developed DeepPpIScore, a novel SF for PpIs that employs unsupervised geometric deep learning coupled with physics-inspired statistical potential. Trained solely on curated experimental structures without binding affinity data or classification labels, DeepPpIScore exhibits broad generalization across multiple tasks. Our comprehensive evaluations in bound and unbound peptide binding mode prediction, binding affinity prediction, and binding pair identification reveal that DeepPpIScore outperforms or matches state-of-the-art baselines, including popular protein-protein SFs, ML-based methods, and AlphaFold-Multimer 2.3 (AF-M 2.3). Notably, DeepPpIScore achieves superior results in peptide binding mode prediction compared to AF-M 2.3. More importantly, DeepPpIScore offers interpretability in terms of hotspot preferences at protein interfaces, physics-informed noncovalent interactions, and protein-peptide binding energies.


Evaluation of Peptide Binding Mode Prediction Based on the Well-established Unbound Set


Evaluation of Peptide Binding Mode Prediction Based on the Latest Bound Set


Comparison with AF-M 2.3 On the Peptide Binding Mode Prediction


Reproducing the Conda Environment

Create the environment using the YAML file provided in the ./env directory

Mamba Installation

curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh

Then, the following commands can be used to reproduce the conda environment:

mkdir -p ~/conda_env/DeepPpIScore
mamba env create --prefix=~/conda_env/DeepPpIScore --file ./env/DeepPpIScore.yaml
mamba activate ~/conda_env/DeepPpIScore

Create the environment using the conda-packed .tar.gz file

Download the conda-packed archive DeepPpIScore.tar.gz from Google Drive, then use the following commands to reproduce the conda environment:

mkdir -p ~/conda_env/DeepPpIScore
tar -xzvf DeepPpIScore.tar.gz -C ~/conda_env/DeepPpIScore
mamba activate ~/conda_env/DeepPpIScore
conda-unpack

Training Structures and Evaluation Datasets

Training Structures

The prepared training structures are available on Google Drive: pepbdb_graphs_noH_pocket_topk30.zip and pepbdb_graphs_noH_ligand.zip

PepSet

PepSet is available at the PepSet Benchmark

BoundPep

The prepared BoundPep set is available on Zenodo

PepBinding

PepBinding is available at PepBinding

pMHCSet

The prepared pMHCSet is available on Zenodo

Usage

The code was tested successfully in a basic environment equipped with an Nvidia Tesla V100 GPU, Python 3.9.13, CUDA 11.2, conda 24.3.0, and mamba 1.5.8.

Step 1: Clone the Repository

git clone https://github.com/zjujdj/DeepPpIScore.git

Step 2: Downloading the ESM2 Checkpoint

Download esm2_t33_650M_UR50D.pt and esm2_t33_650M_UR50D-contact-regression.pt, and place them in the ./data directory:

mv esm2_t33_650M_UR50D.pt esm2_t33_650M_UR50D-contact-regression.pt ./data
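Before launching inference, it can help to confirm both checkpoint files actually landed in ./data. The helper below is a minimal sketch, not part of the repository; only the two file names come from the instructions above.

```python
from pathlib import Path

# Required ESM2 checkpoint files (names taken from the download step above)
REQUIRED_CHECKPOINTS = [
    "esm2_t33_650M_UR50D.pt",
    "esm2_t33_650M_UR50D-contact-regression.pt",
]

def missing_checkpoints(data_dir="./data"):
    """Return the required checkpoint files that are absent from data_dir."""
    data = Path(data_dir)
    return [name for name in REQUIRED_CHECKPOINTS if not (data / name).exists()]
```

If `missing_checkpoints()` returns a non-empty list, download the listed files before proceeding to Step 3.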

Step 3: Inference Example with the Weights Trained in This Study

cd scripts
# the evaluation configurations can be set in model_inference_example.py
# first, submit the following command to a CPU node to generate the graphs using multiprocessing
python3 -u model_inference_example.py > model_inference_example_graph_gen.log

The generated graph files for this example are stored in the ./data/temp_graphs_noH directory.
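The CPU phase above parallelizes graph construction across cores. A rough sketch of that pattern is shown below; `build_graph` is a hypothetical stand-in for the per-complex graph builder, whose real implementation lives in model_inference_example.py.

```python
from multiprocessing import Pool

def build_graph(complex_id):
    # hypothetical placeholder: parse one protein-peptide structure
    # and write its graph file, returning the output file name
    return f"{complex_id}.graph"

def build_all_graphs(complex_ids, n_workers=4):
    # fan the independent graph-building jobs out across CPU cores
    with Pool(processes=n_workers) as pool:
        return pool.map(build_graph, complex_ids)
```

Because each complex is processed independently, the jobs are embarrassingly parallel, which is why the README submits this phase to a CPU node first.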

# then, submit the following command to a GPU node to make predictions
python3 -u model_inference_example.py > model_inference_example.log

The prediction results are listed in ./model_inference/DeepPpIScore/8.0/DeepPpIScore_8.0.csv. This CSV file provides four kinds of score: 'cb-cb score', 'cb-cb norm score', 'min-min score', and 'min-min norm score', where the norm score = score / sqrt(number of contacts). All the analysis in the paper was based on the 'min-min score'.
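The normalization described above can be sketched in a few lines, assuming "contacts" means the number of protein-peptide contacts counted for the complex:

```python
import math

def norm_score(raw_score, n_contacts):
    """Normalized score as described above: score / sqrt(number of contacts)."""
    return raw_score / math.sqrt(n_contacts)
```

For example, a raw score of -8.0 over 16 contacts gives a norm score of -2.0. Dividing by sqrt(contacts) damps the tendency of raw interaction sums to grow with interface size.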

Step 4: Model Re-training with the Training Structures Used in This Study

Download the prepared training structures from Google Drive (pepbdb_graphs_noH_pocket_topk30.zip and pepbdb_graphs_noH_ligand.zip), and unzip them in the ./data directory.

# unzip training structures
cd ./data
unzip pepbdb_graphs_noH_pocket_topk30.zip 
unzip pepbdb_graphs_noH_ligand.zip
# model training; the training configurations can be set in train_model.py
cd ../scripts
python3 -u train_model.py > train_model.log
