durrantlab/EnOpt

EnOpt

Ensemble Optimizer (EnOpt) is a fast, accessible tool that streamlines ensemble-docking and consensus-score analysis. EnOpt takes as input a matrix of docking scores from an ensemble virtual screen, organized as compounds (rows) × protein conformations (columns). It uses simple, interpretable machine learning to identify the most predictive subensembles and an ensemble composite score.
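As a rough illustration of the input layout described above, the following sketch writes a small compounds × conformations score matrix as CSV. The header label, compound names, conformation names, and score values are all made up for illustration, and the exact header conventions EnOpt expects are an assumption here, not something this README specifies.

```python
import csv
import io

# Hypothetical conformation and compound names; the scores are invented values.
conformations = ["conf_1", "conf_2", "conf_3"]
scores = {
    "ligand_A": [-9.1, -7.4, -8.2],
    "ligand_B": [-6.3, -6.9, -5.8],
    "decoy_1": [-5.2, -4.8, -6.1],
}


def write_score_matrix(handle, conformations, scores):
    """Write one row per compound and one column per protein conformation."""
    writer = csv.writer(handle)
    # Header row: a label for the compound column (assumed), then conformations.
    writer.writerow(["compound"] + conformations)
    for compound, row in scores.items():
        writer.writerow([compound] + row)


# Write to an in-memory buffer; use open("scores.csv", "w", newline="") for a file.
buffer = io.StringIO()
write_score_matrix(buffer, conformations, scores)
matrix_csv = buffer.getvalue()
print(matrix_csv)
```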

Setup

Before using EnOpt, make sure you have a Python environment with all necessary packages installed (e.g., NumPy, Pandas, SciPy). We provide a conda specification file to make setting up such an environment easier:

conda create --name [environment name] --file conda_spec_file.txt

To print a guide with all standard options and their usage:

python ensemble_optimizer.py --help

Simple usage

An example of the simplest use of EnOpt:

python ensemble_optimizer.py -f [input file matrix]

Options

Input options

The input CSV file containing the ensemble docking score matrix (required):

-f INPUT_FILE

A file containing the names of known ligands, separated by commas:

-l KNOWN_LIGS, --knownLigs KNOWN_LIGS
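A minimal sketch of producing and reading back a comma-separated known-ligands file follows. The ligand names are hypothetical, and two things are assumptions rather than documented behavior: that the names must match the compound names used in the score matrix, and that surrounding whitespace is tolerated.

```python
# Hypothetical ligand names; presumably these should match the compound
# names in the score matrix (an assumption, not stated in this README).
known_ligands = ["ligand_A", "ligand_B"]


def format_known_ligands(names):
    """Join ligand names into a single comma-separated line."""
    return ",".join(names)


def parse_known_ligands(text):
    """Split a comma-separated line back into names, dropping stray whitespace."""
    return [name.strip() for name in text.split(",") if name.strip()]


contents = format_known_ligands(known_ligands)
print(contents)

# To write the file that would be passed with -l:
# with open("known_ligands.txt", "w") as handle:
#     handle.write(contents)
```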

A JSON file containing all user-specified EnOpt input parameters, as an alternative to the command line input:

--json_input JSON_INPUT

Output options

The prefix of the output file:

--outFile OUT_FILE

The number of known ligands to include in interactive output:

--top_known_out TOP_KNOWN_OUT

The number of unknowns (compounds that are not known ligands) to include in interactive output:

--top_unknown_out TOP_UNKNOWN_OUT

Scoring options

The scoring scheme to use for combining scores across conformations:

--scoringScheme SCORING_SCHEME

(One of "eA", "eB", "rA", or "rB". "eA" uses the average score across all conformations in the ensemble. "eB" uses the best score across all conformations. "rA" uses the average of the score rank for each conformation. "rB" uses the best-ranked score across all conformations. Default: eA.)
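The four schemes can be sketched in plain Python as follows. This is an illustration of the descriptions above, not EnOpt's actual implementation. It assumes the default convention that more negative scores mean stronger binding (so "best" is the minimum), assigns rank 1 to the best score within each conformation, and breaks ties by input order. The function names and example values are hypothetical.

```python
from statistics import mean


def rank_column(values):
    """Rank scores within one conformation; rank 1 is the most negative score."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def combine_scores(matrix, scheme="eA"):
    """Combine per-conformation scores into one value per compound.

    matrix: list of rows (compounds), each a list of scores (conformations).
    """
    if scheme == "eA":  # average score across the ensemble
        return [mean(row) for row in matrix]
    if scheme == "eB":  # best (most negative) score across the ensemble
        return [min(row) for row in matrix]
    # Rank-based schemes: rank compounds within each conformation first.
    rank_columns = [rank_column(list(col)) for col in zip(*matrix)]
    rank_rows = list(zip(*rank_columns))
    if scheme == "rA":  # average rank across conformations
        return [mean(row) for row in rank_rows]
    if scheme == "rB":  # best (lowest) rank across conformations
        return [min(row) for row in rank_rows]
    raise ValueError(f"Unknown scoring scheme: {scheme}")


# Invented scores: three compounds (rows) by three conformations (columns).
example = [
    [-9.1, -7.4, -8.2],
    [-6.3, -6.9, -5.8],
    [-5.2, -4.8, -6.1],
]
for name in ("eA", "eB", "rA", "rB"):
    print(name, combine_scores(example, name))
```

Note how the third compound scores better than the second under "eB" only in the last conformation, which is enough to tie their best ranks under "rB" even though their average ranks differ.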

Whether to compute weights optimized using tree models:

--weightedScore

(If known ligands are provided, EnOpt uses them for optimization. Otherwise, it falls back to score rankings, which is not recommended.)

Whether higher (more positive) scores describe stronger binding:

--invertScoreSign

(The scheme depends on the docking program used; for example, smina uses more negative scores to represent stronger binding. Default: False, meaning that more negative scores represent stronger binding.)
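One way to picture the effect of this flag is as a normalization step that puts all scores on a "more negative is stronger" scale. The sketch below is an assumption about the general idea, not a description of how EnOpt handles the flag internally. The function name and values are hypothetical.

```python
def to_lower_is_better(matrix, higher_is_better=False):
    """Return a copy of the score matrix in which more negative means stronger.

    higher_is_better mirrors the meaning of --invertScoreSign: when True,
    the input scores are negated; when False, they are returned unchanged.
    """
    if not higher_is_better:
        return [list(row) for row in matrix]
    return [[-score for score in row] for row in matrix]


# Invented scores from a program where higher values mean stronger binding.
positive_scores = [[7.5, 6.0], [4.2, 5.1]]
normalized = to_lower_is_better(positive_scores, higher_is_better=True)
print(normalized)
```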

Optimization options

Method to determine weighted scores:

--optimizationMethod OPT_METHOD

(One of "RF" (random forest) or "XGB" (gradient-boosted trees). Default: RF.)

Number of top conformations to include in the "best subensemble":

--topConformations TOPN_CONFS

(Default: 3)
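Conceptually, selecting a "best subensemble" amounts to keeping the N conformations with the largest weights. The sketch below assumes each conformation already has a weight (for example, a feature importance from a tree model). How EnOpt derives and ranks these weights is not specified here, so the weights, names, and function are hypothetical.

```python
def top_conformations(weights, top_n=3):
    """Return the names of the top_n conformations by descending weight."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:top_n]]


# Invented per-conformation weights.
weights = {
    "conf_1": 0.05,
    "conf_2": 0.40,
    "conf_3": 0.10,
    "conf_4": 0.30,
    "conf_5": 0.15,
}
best_subensemble = top_conformations(weights, top_n=3)
print(best_subensemble)
```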

Whether to perform hyperparameter optimization for tree models:

--hyperparam

(Default: False, meaning that default tree model parameters will be used.)

Optional JSON file containing user-provided parameters for optimization:

--tree_params TREE_PARAMS

(If not provided, default hyperparameter optimization options will be used.)
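The options above can be combined in a single call. As a convenience when scripting many runs, the sketch below assembles a command line from the documented flags. The helper function, its parameter names, and the file names are hypothetical; only the flags themselves come from this README. Passing --optimizationMethod only together with --weightedScore is a choice made for this sketch, not a documented requirement.

```python
import sys


def build_enopt_command(input_file, known_ligs=None, scoring_scheme="eA",
                        weighted=False, optimization_method="RF",
                        top_conformations=3, out_prefix=None):
    """Assemble an argument list for ensemble_optimizer.py from documented flags."""
    cmd = [sys.executable, "ensemble_optimizer.py", "-f", input_file]
    if known_ligs is not None:
        cmd += ["-l", known_ligs]
    cmd += ["--scoringScheme", scoring_scheme]
    if weighted:
        cmd += ["--weightedScore", "--optimizationMethod", optimization_method]
    cmd += ["--topConformations", str(top_conformations)]
    if out_prefix is not None:
        cmd += ["--outFile", out_prefix]
    return cmd


# Hypothetical file names.
cmd = build_enopt_command("scores.csv", known_ligs="known_ligands.txt",
                          scoring_scheme="eB", weighted=True, out_prefix="run1")
print(" ".join(cmd))

# To actually run it from the directory containing ensemble_optimizer.py:
# import subprocess
# subprocess.run(cmd, check=True)
```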

Paper/lab link

Find more tools for analysis of protein-ligand binding at https://durrantlab.pitt.edu/durrant-lab-software/.

Contact info

For questions, suggestions, or problems with the tool, contact Roshni Bhatt at [email protected].

Acknowledgements

This work was supported by the National Institutes of Health (1R01GM132353-01) and the University of Pittsburgh's Center for Research Computing, RRID:SCR_022735 (supported by NSF OAC-2117681). We would like to thank Yogindra Raghav for his contributions in generating initial proof-of-concept code. We also thank Darian Yang for assistance in collating and pruning ideas.
