Skip to content

a protein descriptor for site prediction

Notifications You must be signed in to change notification settings

KeithTab/Presite

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

60 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ProteinDescriptor

a protein descriptor for site prediction

A four-channel grid protein descriptor is constructed based on ligsite, L-J potential, and Coulomb force, which can be used for protein binding site prediction.By using 16 X 16 X 16 sampling to classify and cluster the blocks, the binding sites of the proteins are finally determined. The detailed processing is shown in the figure blow.

requirement

python == 3.6.x
keras == 2.2.4
tenforflow-gpu == 1.13.1
numpy == 1.16.4
tqdm == 4.13.1
sklearn == 0.20.1

training

dataset (scPDB-2017)

To ensure training and testing, the data set should look like this.

ProteinDescriptor

data

data_raw

train

1a4i_1

protein.mol2, protein.pdbqt, site.mol2

data

data_raw

valid

1a4l_2

protein.mol2, protein.pdbqt, site.mol2

data

data_raw

test

1aiq_2

protein.mol2, protein.pdbqt

The training set and validation set must contain site.mol2 file for every protein (for label determination). At the same time, in order to obtain protein feature, the mol2 and pdbqt files of each protein should be included in the dataset, and the pdbqt files can be obtained through openbabel or autodock script.

Because the data of scPDB is too large, only a small part is provided for operation.

usage

python train.py

The features (grid descriptor) of the training set,validation set and testing set are stored in data/feature directory. In order to be more efficient, the features of all proteins are calculted and stored before model training and testing.

prediction

A trained model is saved in the /model/model.h5. For a new protein in prediction, the mol2 and pdbqt files shuold be prepared for feature calculation.

usage

python predict.py protein.mol2 protein.pdbqt 3/5(pocket number) result_file

example:

python predict.py example/1c6y_1/protein.mol2 example/1c6y_1/protein.pdbqt 3 results_example/result.txt

About

a protein descriptor for site prediction

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%