ProteinDescriptor

A protein descriptor for binding site prediction

A four-channel grid protein descriptor is constructed from LIGSITE, the Lennard-Jones (L-J) potential, and the Coulomb force, and can be used for protein binding site prediction. The grid is sampled into 16 x 16 x 16 blocks, which are classified and clustered to determine the binding sites of the protein. The detailed processing is shown in the figure below.
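To make the grid idea concrete, here is a minimal sketch of how two of the channels (Coulomb and Lennard-Jones) could be evaluated on a regular grid. It is not the code in this repository: the function names, the unit probe charge, the `min_dist` clamp, and the per-atom `sigma`/`epsilon` values are all assumptions chosen for illustration, and units are left arbitrary.

```python
import math


def grid_points(origin, spacing, shape):
    """Yield (index, xyz) pairs for a regular 3D grid."""
    ox, oy, oz = origin
    nx, ny, nz = shape
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                yield (i, j, k), (ox + i * spacing, oy + j * spacing, oz + k * spacing)


def potentials_at(point, atoms, min_dist=0.5):
    """Return (coulomb, lennard_jones) felt by a unit probe at `point`.

    Each atom is (x, y, z, charge, sigma, epsilon). These parameters are
    illustrative assumptions, not the values used by this repository.
    """
    px, py, pz = point
    coulomb = 0.0
    lj = 0.0
    for x, y, z, charge, sigma, epsilon in atoms:
        r = math.sqrt((px - x) ** 2 + (py - y) ** 2 + (pz - z) ** 2)
        r = max(r, min_dist)  # clamp to avoid the singularity at r = 0
        coulomb += charge / r  # Coulomb term for a unit probe charge
        sr6 = (sigma / r) ** 6
        lj += 4.0 * epsilon * (sr6 * sr6 - sr6)  # 12-6 Lennard-Jones term
    return coulomb, lj


def potential_channels(atoms, origin, spacing, shape):
    """Map each grid index to its (coulomb, lennard_jones) pair."""
    return {idx: potentials_at(p, atoms) for idx, p in grid_points(origin, spacing, shape)}
```

With a single atom at the origin (`charge = sigma = epsilon = 1`), the Lennard-Jones term vanishes at distance `sigma`, which is a quick sanity check for the formula.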

requirement

python == 3.6.x
keras == 2.2.4
tensorflow-gpu == 1.13.1
numpy == 1.16.4
tqdm == 4.13.1
sklearn == 0.20.1

training

dataset (scPDB-2017)

For training and testing to work, the dataset should be laid out as follows.

ProteinDescriptor/data/data_raw/train/1a4i_1/    protein.mol2, protein.pdbqt, site.mol2
ProteinDescriptor/data/data_raw/valid/1a4l_2/    protein.mol2, protein.pdbqt, site.mol2
ProteinDescriptor/data/data_raw/test/1aiq_2/     protein.mol2, protein.pdbqt

The training set and validation set must contain a site.mol2 file for every protein (used to determine the labels). In addition, the mol2 and pdbqt files of each protein must be included in the dataset so that the protein features can be computed. The pdbqt files can be generated with Open Babel or the AutoDock scripts.
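The file requirements above can be checked before training with a short script. The following is a sketch only: `find_missing_files` and the `REQUIRED` table are hypothetical helpers written for this README, not part of the repository, and they assume the `train`, `valid` and `test` directory names shown in the layout.

```python
from pathlib import Path

# Files each protein directory is expected to contain, per split
# (taken from the layout described in this README).
REQUIRED = {
    "train": ("protein.mol2", "protein.pdbqt", "site.mol2"),
    "valid": ("protein.mol2", "protein.pdbqt", "site.mol2"),
    "test": ("protein.mol2", "protein.pdbqt"),
}


def find_missing_files(data_raw):
    """Return a list of 'split/protein/file' strings for missing files.

    A missing split directory is reported as 'split/'.
    """
    missing = []
    root = Path(data_raw)
    for split, required in REQUIRED.items():
        split_dir = root / split
        if not split_dir.is_dir():
            missing.append(f"{split}/")
            continue
        for protein_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            for name in required:
                if not (protein_dir / name).is_file():
                    missing.append(f"{split}/{protein_dir.name}/{name}")
    return missing
```

Calling `find_missing_files("data/data_raw")` from the repository root returns an empty list when every protein directory is complete.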

Because the scPDB dataset is very large, only a small subset is provided with this repository.

usage

python train.py

The features (grid descriptors) of the training, validation, and test sets are stored in the data/feature directory. For efficiency, the features of all proteins are calculated and stored before model training and testing.
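The compute-once, reuse-later pattern described above can be sketched as follows. This is an illustration under assumptions: `load_or_compute` is a hypothetical helper, and the pickle format and `<name>.pkl` file naming are stand-ins, since this README does not describe the on-disk format actually used in data/feature.

```python
import pickle
from pathlib import Path


def load_or_compute(name, compute, cache_dir="data/feature"):
    """Return the cached feature for `name`, computing and storing it on a miss.

    `compute` is any callable that takes the protein name and returns
    a picklable feature object.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{name}.pkl"  # assumed naming scheme
    if path.is_file():
        with path.open("rb") as fh:
            return pickle.load(fh)
    feature = compute(name)
    with path.open("wb") as fh:
        pickle.dump(feature, fh)
    return feature
```

The second call for the same protein name reads the stored file instead of invoking `compute` again, which is where the time saving comes from.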

prediction

A trained model is saved at model/model.h5. To run prediction on a new protein, prepare its mol2 and pdbqt files for feature calculation.

usage

python predict.py protein.mol2 protein.pdbqt 3/5(pocket number) result_file

example:

python predict.py example/1c6y_1/protein.mol2 example/1c6y_1/protein.pdbqt 3 results_example/result.txt
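When several proteins need to be processed, the command line above can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of the repository; it only assumes the argument order shown in the usage line and that each protein directory contains protein.mol2 and protein.pdbqt.

```python
import sys
from pathlib import Path


def build_predict_command(protein_dir, n_pockets, result_file):
    """Build the argument list for one predict.py run.

    The argument order mirrors the usage line in this README:
    mol2 file, pdbqt file, pocket number, result file.
    """
    protein_dir = Path(protein_dir)
    return [
        sys.executable,
        "predict.py",
        str(protein_dir / "protein.mol2"),
        str(protein_dir / "protein.pdbqt"),
        str(n_pockets),
        str(result_file),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root, once per protein directory.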


KeithTab/Presite
