
SAMPL7 physical property challenge manuscript

Provided are figures, tables, and supplementary documents for the "Evaluation of logP, pKa, and log D predictions from the SAMPL7 blind challenge" paper.

Files in .SVG format are compatible with Inkscape, an open source vector graphics editor. LaTeX tables were generated using https://www.tablesgenerator.com/ and saved in .TGN file format for future editing. Tables can be viewed by loading the .TGN files back into the website with the "Load table" option in its menu.

What's Here

  • figures/ - Contains figures used in the paper.

    • accuracy_statistics_logP/ - Overall accuracy assessment for all methods participating in the SAMPL7 log P challenge in .PDF and .SVG file format. Both root-mean-square error (RMSE) and mean absolute error (MAE) are shown, with error bars denoting 95% confidence intervals.
    • correlation_plots_well_performing_methods_logP/ - Predicted vs. experimental value correlation plots of the five best-performing methods and one representative average method in the SAMPL7 log P challenge. Dark and light green shaded areas indicate 0.5 and 1.0 units of error. Error bars indicate standard error of the mean of predicted and experimental values. Figure is available in .PDF and .SVG file format.
    • molecule_prediction_accuracy_logP-classes-marked/ - Molecule-wise prediction accuracy in the log P challenge. Molecules are labeled with their compound class as a reference. (A) shows the MAE calculated for each molecule as an average of all methods. (B) shows the MAE of each molecule separated by method category. (C) shows the log P prediction error distribution for each molecule across all prediction methods. (D) shows the log P prediction error distribution for each molecule calculated for only the five blind ranked submissions that were determined to be consistently well-performing. Figure is available in .PDF and .SVG file format.
    • relative-free-energy-network-example/ - For each molecule in the SAMPL7 pKa challenge, we asked participants to predict the relative free energy between our selected neutral reference microstate and each of the other enumerated microstates for that molecule. This figure illustrates this with challenge molecule SM43: participants were asked to predict the relative free energy between SM43_micro000 (our selected neutral microstate, highlighted in yellow) and each of the other enumerated microstates. Figure is available in .PDF and .SVG file format.
    • titration-graph/ - Graph illustrating the titration method described in the paper. Blue and orange lines represent two charge states: blue states have one more proton than orange states, and thus a formal charge higher by +1. The blue state has one tautomer and the orange state has three, shown as dashed lines. The solid lines are the ensemble-averaged probabilities for each group of microstates with a given charge. The pH at which the two ensemble lines cross is the macroscopic pKa. Figure is available in .PDF and .SVG file format.
    • accuracy-statistics-pKa/ - Overall accuracy assessment for all methods participating in the SAMPL7 pKa challenge. Both RMSE and MAE are shown, with error bars denoting 95% confidence intervals. Figure is available in .PDF and .SVG file format.
    • correlation-plots-well-performing-methods-pKa/ - Predicted vs. experimental value correlation plots of the two best-performing methods and one representative average method in the SAMPL7 pKa challenge. Dark and light green shaded areas indicate 0.5 and 1.0 units of error. Error bars indicate standard error of the mean of predicted and experimental values. Figure is available in .PDF and .SVG file format.
    • correlation-statistics-pKa/ - Overall correlation assessment for all methods participating in the SAMPL7 pKa challenge. Pearson’s R^2 and Kendall’s Rank Correlation Coefficient Tau are shown, with error bars denoting 95% confidence intervals. Figure is available in .PDF and .SVG file format.
    • molecule_prediction_accuracy_pKa-classes-marked/ - Molecule-wise prediction error distribution plots show the prediction accuracy for individual molecules across all prediction methods for the pKa challenge. Molecules are labeled with their compound class as a reference. (A) shows the MAE of each molecule separated by method category, which suggests that the most challenging molecules differed between method categories. (B) shows the error distribution for each molecule over all prediction methods. Figure is available in .PDF and .SVG file format.
    • difficult-transitions-2d-depictions/ - Chemical transformations that repeatedly showed large disagreement among methods on the sign of the predicted relative free energy. Figure is available in .PDF and .SVG file format.
    • microstate_averages_and_distribution-classes-marked/ - The average relative microstate free energy predicted per microstate and the distribution across predictions in the SAMPL7 pKa challenge. Molecules are labeled with their compound class as a reference. (A) shows the average relative microstate free energy predicted per microstate. Error bars are the standard deviation of the relative microstate free energy predictions. (B) shows the distribution for each relative microstate free energy prediction over all prediction methods. Figure is available in .PDF and .SVG file format.
    • micro-disagreement/ - Structures of microstates where relative microstate free energy predictions disagree. The average relative free energy prediction and its standard deviation are listed under each transition. Figure is available in .PDF and .SVG file format.
    • Shannon_entropy_pos_neg_only/ - Shannon entropy (H) of the predicted sign for each microstate transition. Entropy values greater than 0 indicate disagreement among methods on the predicted sign, with larger values reflecting greater disagreement. Microstates with an entropy of 0 are not shown; an entropy of 0 means that all methods predicted the same sign for the free energy change of that transition. Figure is available in .PDF and .SVG file format.
    • accuracy_statistics_logD/ - Overall accuracy assessment for log D estimation. Both RMSE and MAE are shown, with error bars denoting 95% confidence intervals. Figure is in .PDF and .SVG file format.
    • correlation_plots_logD/ - Predicted vs. experimental value correlation plots of all log D estimation methods in the SAMPL7 challenge. Dark and light green shaded areas indicate 0.5 and 1.0 units of error. Error bars indicate standard error of the mean of predicted and experimental values. Figure is in .PDF and .SVG file format.
    • compounds-2d-depiction.pdf - Structures of the 22 molecules used for the SAMPL7 physical property blind prediction challenge in .PDF file format.
    • different-logD-combo-accuracy.pdf - Similar to accuracy_statistics_logD/, but includes additional combinations of pKa and log P predictions used for log D estimation. Shown is the RMSE in calculated log D values, with error bars denoting 95% confidence intervals. Figure is in .PDF file format.
  • SI/ - Contains figures and tables from the paper's Supporting Information (SI).

    • correlation_statistics_logP/ - Overall correlation assessment for all methods participating in the SAMPL7 log P challenge. Pearson’s R^2 and Kendall’s Rank Correlation Coefficient Tau are shown, with error bars denoting 95% confidence intervals. Figure is in .PDF and .SVG file format.
    • compound-classes-and-2d-structures/ - Compound classes and structures of the molecules in the SAMPL7 physical property challenge. Figure is in .PDF and .SVG file format.
    • chemical_property_distribution/ - Distribution of molecular properties of the 22 compounds from the SAMPL7 physical property blind challenge. (A) shows a histogram of experimental log P values. The ticks along the x-axis indicate the individual values. (B) shows a histogram of experimental pKa values. (C) shows a histogram of experimental log D values. (D) shows a histogram of molecular weights calculated for the compounds in the SAMPL7 dataset. (E) shows a histogram of the number of rotatable bonds in each molecule. Figure is in .PDF and .SVG file format.
    • compound-smiles-and-class.csv - SMILES and compound class of SAMPL7 physical property challenge molecules. Table is in .CSV file format.
    • charge-state-number-per-molecule/ - Number of microstates in each charge state for each molecule used in the SAMPL7 pKa challenge, along with the total number of microstates (protomers and tautomers). Table is in .CSV and .TGN file format.
    • logP-statistics.csv - Evaluation statistics calculated for all methods in the log P challenge. Seven statistics are reported: RMSE, MAE, mean (signed) error (ME), coefficient of determination (R^2), linear regression slope (m), Kendall’s Rank Correlation Coefficient Tau, and error slope (ES). The mean and 95% confidence interval of each statistic are presented.
    • pKa-statistics.csv - Same as logP-statistics.csv, but for pKa.
    • logD-statistics.csv - Same as logP-statistics.csv, but for logD.
    • microstate-additional-info.csv - Additional information on the microscopic pKa predictions, in .CSV file format. For each microstate it lists the total number of relative free energy predictions, the average relative free energy prediction and its standard deviation (STD), the minimum and maximum relative free energy predictions, the numbers of positive (+), negative (-), and neutral (0) sign predictions, and the Shannon entropy (H).
    • MM-logP-method-details.tgn - Details of MM-based physical methods in the log P prediction challenge. Force fields, water models, and octanol phase choice are reported. A dry octanol phase indicates the octanol phase was composed of only octanol. A wet octanol phase indicates the octanol phase was treated as a mixture of octanol and water. RMSE, MAE, R^2, and Kendall's Tau values are reported as mean and 95% confidence intervals. Table is in .TGN file format.
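The accuracy figures (accuracy_statistics_logP/, accuracy-statistics-pKa/, accuracy_statistics_logD/) report RMSE and MAE with 95% confidence intervals. The sketch below shows one common way to obtain such intervals: percentile bootstrap resampling over molecules. It is an assumption that this matches the paper's exact procedure (number of samples, point estimate vs. bootstrap mean), and the function names and example values are hypothetical. Only the Python standard library is used.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Metric = Callable[[Sequence[float], Sequence[float]], float]


def rmse(pred: Sequence[float], expt: Sequence[float]) -> float:
    """Root-mean-square error between predicted and experimental values."""
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(pred, expt)) / len(pred))


def mae(pred: Sequence[float], expt: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(p - e) for p, e in zip(pred, expt)) / len(pred)


def mean_error(pred: Sequence[float], expt: Sequence[float]) -> float:
    """Mean signed error (ME); positive values indicate overestimation."""
    return sum(p - e for p, e in zip(pred, expt)) / len(pred)


def bootstrap_ci(
    metric: Metric,
    pred: Sequence[float],
    expt: Sequence[float],
    n_samples: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (metric on full data, lower bound, upper bound).

    Hypothetical helper: molecules are resampled with replacement, keeping
    each prediction paired with its experimental value, and the interval is
    taken from percentiles of the resampled metric values.
    """
    rng = random.Random(seed)
    n = len(pred)
    stats: List[float] = []
    for _ in range(n_samples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([pred[i] for i in idx], [expt[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_samples)]
    upper = stats[min(n_samples - 1, int((1 - alpha / 2) * n_samples))]
    return metric(pred, expt), lower, upper


if __name__ == "__main__":
    # Hypothetical predicted and experimental log P values (not SAMPL7 data).
    predicted = [1.2, 2.5, 0.8, 3.1, 1.9]
    experimental = [1.0, 2.9, 1.1, 2.7, 2.0]
    for name, fn in [("RMSE", rmse), ("MAE", mae), ("ME", mean_error)]:
        value, lo, hi = bootstrap_ci(fn, predicted, experimental)
        print(f"{name}: {value:.2f} [{lo:.2f}, {hi:.2f}]")
```

Resampling whole molecules (rather than individual errors from different molecules) keeps each predicted/experimental pair intact, which is what makes the interval reflect uncertainty due to the choice of compounds.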
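The titration-graph/ entry defines the macroscopic pKa as the pH where the ensemble probabilities of two adjacent charge states cross. The sketch below computes that crossing from relative microstate free energies under stated assumptions: free energies are in kcal/mol relative to the reference microstate at pH 0 and 298.15 K, and each microstate's free energy rises by kT ln 10 per pH unit for every proton it carries beyond the reference (its charge relative to the reference). Under these assumptions, with S_z the sum of 10^(-ΔG_i / (kT ln 10)) over microstates of charge z, the crossing is pKa = log10(S_(z+1)) - log10(S_z). The function names, microstate names, and values are hypothetical, not SAMPL7 data.

```python
import math
from collections import defaultdict
from typing import Dict, Tuple

# Assumed conditions: 298.15 K, free energies in kcal/mol.
R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)
TEMPERATURE = 298.15  # K
KT = R_KCAL * TEMPERATURE
KT_LN10 = KT * math.log(10)  # about 1.364 kcal/mol per pH unit

# Each microstate maps to (relative free energy at pH 0 in kcal/mol,
#                          charge relative to the reference microstate).
Microstates = Dict[str, Tuple[float, int]]


def charge_state_populations(states: Microstates, ph: float) -> Dict[int, float]:
    """Fraction of the ensemble in each charge state at the given pH."""
    weights: Dict[int, float] = defaultdict(float)
    for dg0, charge in states.values():
        # Extra protons make a microstate less favorable as pH increases.
        dg = dg0 + charge * KT_LN10 * ph
        weights[charge] += math.exp(-dg / KT)  # Boltzmann weight
    total = sum(weights.values())
    return {q: w / total for q, w in weights.items()}


def macroscopic_pka(states: Microstates, lower_charge: int) -> float:
    """pH at which charge states lower_charge + 1 and lower_charge are equally populated.

    Both charge states must contain at least one microstate.
    """
    def log10_sum(charge: int) -> float:
        # log10 of sum(10 ** (-dg0 / kT ln10)), computed stably (log-sum-exp).
        terms = [-dg0 / KT_LN10 for dg0, q in states.values() if q == charge]
        top = max(terms)
        return top + math.log10(sum(10 ** (t - top) for t in terms))

    return log10_sum(lower_charge + 1) - log10_sum(lower_charge)


if __name__ == "__main__":
    # Hypothetical molecule mirroring the titration graph: one protonated
    # microstate (+1 relative charge) and three neutral tautomers (0).
    example: Microstates = {
        "micro000": (0.0, 0),   # neutral reference microstate
        "micro001": (1.5, 0),   # neutral tautomer
        "micro002": (3.0, 0),   # neutral tautomer
        "micro003": (-6.0, 1),  # protonated microstate
    }
    pka = macroscopic_pka(example, lower_charge=0)
    print(f"macroscopic pKa: {pka:.2f}")
    # At the macroscopic pKa, the two charge-state populations are equal.
    print(charge_state_populations(example, pka))
```

Because only the ratio of the two charge-state sums matters, the crossing point does not depend on how the probabilities are normalized, which is why the pKa can be computed directly without scanning pH.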
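The correlation figures and statistics tables report Pearson's R^2, Kendall's Rank Correlation Coefficient Tau, and the linear regression slope (m). A minimal pure-Python sketch of these three statistics follows. It assumes Kendall's tau-b (the tie-corrected variant, which is also the default of `scipy.stats.kendalltau`) and a least-squares regression of predicted on experimental values; both choices, and the function names, are assumptions rather than the paper's confirmed implementation.

```python
import math
from typing import Sequence


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def pearson_r2(x: Sequence[float], y: Sequence[float]) -> float:
    """Square of Pearson's correlation coefficient between x and y."""
    mx, my = _mean(x), _mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def regression_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope m of the line y = m * x + b."""
    mx, my = _mean(x), _mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b rank correlation (tie-corrected), O(n^2) pair loop."""
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = (x[i] > x[j]) - (x[i] < x[j])  # sign of the x difference
            dy = (y[i] > y[j]) - (y[i] < y[j])  # sign of the y difference
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    n_pairs = n * (n - 1) // 2
    return (concordant - discordant) / math.sqrt((n_pairs - ties_x) * (n_pairs - ties_y))


if __name__ == "__main__":
    # Hypothetical experimental (x) and predicted (y) values, not SAMPL7 data.
    experimental = [1.0, 2.9, 1.1, 2.7, 2.0]
    predicted = [1.2, 2.5, 0.8, 3.1, 1.9]
    print(f"R^2 = {pearson_r2(experimental, predicted):.2f}")
    print(f"tau = {kendall_tau_b(experimental, predicted):.2f}")
    print(f"m   = {regression_slope(experimental, predicted):.2f}")
```

R^2 measures linear agreement, while Tau only measures whether methods rank molecules in the right order, so a method can have a high Tau but a poor slope or large systematic error.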
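Shannon_entropy_pos_neg_only/ and microstate-additional-info.csv use the Shannon entropy H = -sum(p_k log p_k) to summarize how strongly methods disagree on the sign of a relative microstate free energy. A minimal sketch computing H from the sign counts of one transition follows. The logarithm base is not stated in this README, so base 2 is an assumption; the figure name suggests that only positive and negative signs were counted, so `n_zero` defaults to 0. The function name is hypothetical.

```python
import math


def sign_entropy(n_positive: int, n_negative: int, n_zero: int = 0, base: float = 2.0) -> float:
    """Shannon entropy of the predicted-sign distribution for one transition.

    Each sign category k contributes -p_k * log(p_k), where p_k is the
    fraction of predictions with that sign. Empty categories contribute 0.
    Returns 0.0 when every method predicts the same sign.
    """
    counts = [n_positive, n_negative, n_zero]
    total = sum(counts)
    h = 0.0
    for count in counts:
        if count > 0:  # the limit of p * log(p) as p -> 0 is 0
            p = count / total
            h -= p * math.log(p, base)
    return h


if __name__ == "__main__":
    print(sign_entropy(10, 0))  # all methods agree on the sign
    print(sign_entropy(5, 5))   # evenly split: maximal disagreement for two signs
    print(sign_entropy(8, 2))   # mostly agree
```

With base 2 and only two sign categories, H ranges from 0 (full agreement) to 1 (an even split), which matches the README's reading that larger entropy reflects greater disagreement.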
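accuracy_statistics_logD/ and different-logD-combo-accuracy.pdf evaluate log D values estimated by combining pKa and log P predictions. A minimal sketch of one standard approximation follows, assuming a single ionizable group and that only the neutral species partitions into octanol: for an acid, log D = log P - log10(1 + 10^(pH - pKa)); for a base (pKa of the conjugate acid), log D = log P - log10(1 + 10^(pKa - pH)). The pH 7.4 default and the function name are assumptions for illustration, not necessarily the exact protocol used in the paper.

```python
import math


def log_d(log_p: float, pka: float, ph: float = 7.4, acid: bool = True) -> float:
    """Estimate log D from log P and a single macroscopic pKa.

    Assumes one ionizable group and that only the neutral species partitions.
    For an acid, the ionized fraction grows above the pKa; for a base
    (pKa of the conjugate acid), it grows below the pKa.
    """
    exponent = ph - pka if acid else pka - ph
    # 1 + 10**exponent is the ratio of total to neutral species in water.
    return log_p - math.log10(1.0 + 10.0 ** exponent)


if __name__ == "__main__":
    # Hypothetical acidic compound: log P = 2.5, pKa = 6.4, evaluated at pH 7.4.
    print(round(log_d(2.5, 6.4), 2))
    # Hypothetical basic compound: log P = 2.5, conjugate-acid pKa = 9.4.
    print(round(log_d(2.5, 9.4, acid=False), 2))
```

Under this approximation, errors in pKa propagate into log D most strongly when the pKa is below the pH for an acid (or above it for a base), since each pH unit of ionization then lowers log D by roughly one unit.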