Handy functionality for working with OpenFF data
Available on PyPI, so a pip install should work:
$ pip install openff-spellbook
Preferably install into a preconfigured virtual environment, e.g. conda. Append --user to the pip command if no such environment is being used.
Currently no dependency checking is performed; depending on the functionality used, openforcefield (RDKit), OpenBabel, CMILES, OpenMM, QCElemental, QCPortal, geomeTRIC, and Psi4 may be needed.
Bottled functionality resides in the ui submodule. So far:
The OpenForceField Spellbook TorsionDrive parser and plotter
This utility is an automated pipeline to save and plot TorsionDrive data.
$ python3 -m offsb.ui.qca.torsiondrive -h
usage: torsiondrive.py [-h] [--out_file_name OUT_FILE_NAME]
[--datasets DATASETS] [--qm-energy]
[--mm-energy {None,all,vdw,bonds,angles,dihedrals,outofplanes}]
[--openff-name OPENFF_NAME]
[--openff-parameter OPENFF_PARAMETER]
[--openff-previous OPENFF_PREVIOUS]
{torsiondrive_groupby_openff}
The OpenForceField Spellbook TorsionDrive parser
positional arguments:
{torsiondrive_groupby_openff}
optional arguments:
-h, --help show this help message and exit
--out_file_name OUT_FILE_NAME
--datasets DATASETS
--qm-energy
--mm-energy {None,all,vdw,bonds,angles,dihedrals,outofplanes}
--openff-name OPENFF_NAME
--openff-parameter OPENFF_PARAMETER
--openff-previous OPENFF_PREVIOUS
The OpenForceField Spellbook error scanner for QCArchive
$ python3 -m offsb.ui.qca.errors -h
usage: errors.py [-h] [--save-xyz] [--report-out REPORT_OUT] [--full-report]
The OpenForceField Spellbook error scanner for QCArchive
optional arguments:
-h, --help show this help message and exit
--save-xyz
--report-out REPORT_OUT
--full-report
Run a QCArchive optimization locally
$ python3 -m offsb.ui.qca.run-optimization -h
usage: run-optimization.py [-h] [-o OUT_JSON] [-i] [-m MEMORY] [-n NTHREADS]
optimization_id molecule_id
positional arguments:
optimization_id QCA ID of the optimization to run
molecule_id QCA ID of the molecule to use
optional arguments:
-h, --help show this help message and exit
-o OUT_JSON, --out_json OUT_JSON
Output json file name
-i, --inputs-only just generate input json; do not run
-m MEMORY, --memory MEMORY
amount of memory to give to psi4 eg '10GB'
-n NTHREADS, --nthreads NTHREADS
number of processors to give to psi4
The OpenForceField Spellbook energy extractor from QCArchive
$ python3 -m offsb.ui.qca.energy-per-molecule -h
usage: energy-per-molecule.py [-h] [--report-out REPORT_OUT]
The OpenForceField Spellbook energy extractor from QCArchive
optional arguments:
-h, --help show this help message and exit
--report-out REPORT_OUT
Transform a SMILES string into a QCSchema
$ python3 -m offsb.ui.smiles.load -h
usage: load.py [-h] [-c CUTOFF] [-n MAX_CONFORMERS] [-s LINE_START]
[-e LINE_END] [-H HEADER_LINES] [-u] [-i ISOMERS]
[-o OUTPUT_FILE] [-f FORMATTED_OUT] [-j] [-m] [--ncpus NCPUS]
input
A tool to transform a SMILES string into a QCSchema. Enumerates stereoisomers
if the SMILES is ambiguous, and generates conformers.
positional arguments:
input Input file containing smiles strings. Assumes that the
file is CSV-like, splits on spaces, and the SMILES is
the first column
optional arguments:
-h, --help show this help message and exit
-c CUTOFF, --cutoff CUTOFF
Prune conformers less than this cutoff using all
pairwise RMSD comparisons (in Angstroms)
-n MAX_CONFORMERS, --max-conformers MAX_CONFORMERS
The number of conformations to attempt generating
-s LINE_START, --line-start LINE_START
The line in the input file to start processing
-e LINE_END, --line-end LINE_END
The line in the input file to stop processing (not
inclusive)
-H HEADER_LINES, --header-lines HEADER_LINES
The number of lines at the top of the file to skip
before data begins
-u, --unique-smiles If stereoisomers are generated, organize molecules by
their unambiguous SMILES string
-i ISOMERS, --isomers ISOMERS
The number of stereoisomers to keep if multiple are
found
-o OUTPUT_FILE, --output-file OUTPUT_FILE
The file to write the output log to
-f FORMATTED_OUT, --formatted-out FORMATTED_OUT
Write all molecules to a formatted output as qc_schema
molecules. Assumes singlets! Choose either --json or
--msgpack as the format
-j, --json Write the formatted output to qc_schema (json) format.
-m, --msgpack Write the formatted output to qc_schema binary message
pack (msgpack).
--ncpus NCPUS Number of processes to use.
An example output, if the SMILES input file is just C (methane), would be the following:
{
"C": [
{
"schema_name": "qcschema_molecule",
"schema_version": 2,
"validated": true,
"symbols": [
"C",
"H",
"H",
"H",
"H"
],
"geometry": [
0.00967296,
-0.02006983,
0.01136534,
1.0387219,
1.42757171,
-1.12813096,
1.41684881,
-1.11105294,
1.10602765,
-1.10880164,
-1.23235809,
-1.277628,
-1.35644204,
0.93590916,
1.28836596
],
"name": "CH4",
"molecular_charge": 0.0,
"molecular_multiplicity": 1,
"connectivity": [
[
0,
1,
1.0
],
[
0,
2,
1.0
],
[
0,
3,
1.0
],
[
0,
4,
1.0
]
],
"fix_com": false,
"fix_orientation": false,
"provenance": {
"creator": "QCElemental",
"version": "v0.15.1",
"routine": "qcelemental.molparse.from_schema"
},
"extras": null
}
]
}
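The formatted output is keyed by the input SMILES, with a list of qcschema_molecule dictionaries per key. A minimal Python sketch for inspecting such a file (the file name methane.json is only an example, matching the command shown further below):
import json

# Inspect the formatted output of offsb.ui.smiles.load; assumes it was written
# in JSON format, e.g. with "-f methane.json -j"
with open("methane.json") as fd:
    molecules = json.load(fd)

for smiles, conformers in molecules.items():
    print(f"{smiles}: {len(conformers)} conformer(s)")
    for mol in conformers:
        # each conformer is a qcschema_molecule dictionary
        print("  symbols:", mol["symbols"])
        print("  charge/multiplicity:", mol["molecular_charge"], mol["molecular_multiplicity"])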
Submit an Optimization Dataset based on SMILES
First, generate the JSON input using python3 -m offsb.ui.smiles.load as described above; its formatted output will be the direct input for --input-molecules. Then call the following:
$ python3 -m offsb.ui.qca.submit-optimizations -h
usage: submit-optimizations.py [-h] [--input-molecules INPUT_MOLECULES]
[--metadata METADATA]
[--compute-spec COMPUTE_SPEC]
[--threads THREADS]
[--dataset-name DATASET_NAME] [--server SERVER]
[--priority {low,normal,high}]
[--compute-tag COMPUTE_TAG] [--verbose]
The OpenFF Spellbook QCArchive Optimization dataset submitter
optional arguments:
-h, --help show this help message and exit
--input-molecules INPUT_MOLECULES
A JSON file which contains the QCSchema ready for
submission. The json should be a list at the top-
level, containing dictionaries with a name as a key,
and the value a list of QCMolecules representing the
different conformations of the same molecule. Note
that entry data, e.g. the CMILES info, should not be
specified here as it is generated automatically from
this input.
--metadata METADATA The JSON file containing the metadata of the dataset.
--compute-spec COMPUTE_SPEC
A JSON file containing the new compute specification
to add to the dataset
--threads THREADS Number of threads to use to communicate with the
server
--dataset-name DATASET_NAME
The name of the dataset. This is needed if the dataset
already exists and no metadata is supplied. Useful
when e.g. adding computes or molecules to an existing
dataset.
--server SERVER The server to connect to. The special value
'from_file' will read from the default server
connection config file for e.g. authentication
--priority {low,normal,high}
The priority of the calculations to submit.
--compute-tag COMPUTE_TAG
The compute tag used to match computations with
compute managers. For OpenFF calculations, this should
be 'openff'
--verbose Show the progress in the output.
Here, an example --metadata metadata.json could be:
{
"submitter": "trevorgokey",
"creation_date": "2020-09-18",
"collection_type": "OptimizationDataset",
"dataset_name": "OpenFF Sandbox CHO PhAlkEthOH v1.0",
"short_description": "A diverse set of CHO molecules",
"long_description_url": "https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2020-09-18-OpenFF-Sandbox-CHO-PhAlkEthOH",
"long_description": "This dataset contains an expanded set of the AlkEthOH and PhEthOH datasets, which were used in the original derivation of the frosst specification.",
"elements": [
"C",
"H",
"O"
],
"change_log": [
{"author": "trevorgokey",
"date": "2020-09-18",
"version": "1.0",
"description": "A diverse set of CHO molecules. The molecules in this set were generated to include all stereoisomers if chirality was ambiguous from the SMILES input. Conformations were generated which had an RMSD of at least 4 Angstroms from all other conformers"
}
]
}
If we want to perform optimizations with both B3LYP-D3BJ/DZVP and MM (OpenFF 1.0.0), then the JSON file to give to --compute-spec compute.json could be the following:
{"default":
{"opt_spec":
{"program": "geometric",
"keywords":
{"coordsys": "tric",
"enforce": 0.1,
"reset": true,
"qccnv": true,
"epsilon": 0.0}
},
"qc_spec": {
"driver": "gradient",
"method": "b3lyp-d3bj",
"basis": "dzvp",
"program": "psi4",
"keywords": {
"maxiter": 200,
"scf_properties": [
"dipole",
"quadrupole",
"wiberg_lowdin_indices",
"mayer_indices"
]
}
}
},
"openff-1.0.0":
{"opt_spec":
{"program": "geometric",
"keywords":
{"coordsys": "tric",
"enforce": 0.1,
"reset": true,
"qccnv": true,
"epsilon": 0.0}
},
"qc_spec": {
"driver": "gradient",
"method": "openff-1.0.0",
"basis": "smirnoff",
"program": "openmm",
"keywords": { }
}
}
}
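Because the compute specification is plain JSON, it can also be assembled programmatically. A minimal sketch that writes the same two specifications shown above to compute.json (nothing beyond the structure already shown is assumed):
import json

# The geomeTRIC optimizer settings are identical for both specifications
opt_spec = {
    "program": "geometric",
    "keywords": {"coordsys": "tric", "enforce": 0.1, "reset": True,
                 "qccnv": True, "epsilon": 0.0},
}

compute = {
    "default": {
        "opt_spec": opt_spec,
        "qc_spec": {"driver": "gradient", "method": "b3lyp-d3bj", "basis": "dzvp",
                    "program": "psi4",
                    "keywords": {"maxiter": 200,
                                 "scf_properties": ["dipole", "quadrupole",
                                                    "wiberg_lowdin_indices",
                                                    "mayer_indices"]}},
    },
    "openff-1.0.0": {
        "opt_spec": opt_spec,
        "qc_spec": {"driver": "gradient", "method": "openff-1.0.0",
                    "basis": "smirnoff", "program": "openmm", "keywords": {}},
    },
}

with open("compute.json", "w") as fd:
    json.dump(compute, fd, indent=2)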
Note that the default specification is the standard for fitting new versions of the SMIRNOFF OpenForceField.
Running the command with --verbose will produce the following output. First, create the input molecules:
$ python3 -m offsb.ui.smiles.load methane.smi -n 10 -c 2 -f methane.json -j
1 / 1 ENTRY: C
ISOMER 1/ 1 CONFS: 1 SMILES: C
Inputs: 1 Isomers: 1 Conformations: 1
100%|█████████████████████████████████████████████| 1/1 [00:00<00:00, 2.71it/s]
Totals:
Inputs: 1
Isomers: 1
Conformations: 1
Now submit the optimizations (here to a private server at localhost:7777):
$ python3 -m offsb.ui.qca.submit-optimizations --verbose --metadata metadata.json --compute-spec compute.json --server localhost:7777 --priority normal --compute-tag openff --input-molecules methane.json
Arguments given:
{'compute_spec': 'compute.json',
'compute_tag': 'openff',
'dataset_name': None,
'input_molecules': 'methane.json',
'metadata': 'metadata.json',
'priority': 'normal',
'server': 'localhost:7777',
'threads': None}
Dataset created with the following metadata:
{'change_log': [{'author': 'trevorgokey',
'date': '2020-09-18',
'description': 'A diverse set of CHO molecules. The molecules '
'in this set were generated to include all '
'stereoisomers if chirality was ambiguous from '
'the SMILES input. Conformations were '
'generated which had an RMSD of at least 4 '
'Angstroms from all other conformers',
'version': '1.0'}],
'collection_type': 'OptimizationDataset',
'creation_date': '2020-09-18',
'dataset_name': 'OpenFF Sandbox CHO PhAlkEthOH v1.0',
'elements': ['C', 'H', 'O'],
'long_description': 'This dataset contains an expanded set of the AlkEthOH '
'and PhEthOH datasets, which were used in the original '
'derivation of the frosst specification.',
'long_description_url': 'https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2020-09-18-OpenFF-Sandbox-CHO-PhAlkEthOH',
'short_description': 'A diverse set of CHO molecules',
'submitter': 'trevorgokey'}
Successfully added specification default:
{'opt_spec': {'keywords': {'coordsys': 'tric',
'enforce': 0.1,
'epsilon': 0.0,
'qccnv': True,
'reset': True},
'program': 'geometric'},
'qc_spec': {'basis': 'dzvp',
'driver': 'gradient',
'keywords': {'maxiter': 200,
'scf_properties': ['dipole',
'quadrupole',
'wiberg_lowdin_indices',
'mayer_indices']},
'method': 'b3lyp-d3bj',
'program': 'psi4'}}
Successfully added specification openff-1.0.0:
{'opt_spec': {'keywords': {'coordsys': 'tric',
'enforce': 0.1,
'epsilon': 0.0,
'qccnv': True,
'reset': True},
'program': 'geometric'},
'qc_spec': {'basis': 'smirnoff',
'driver': 'gradient',
'keywords': {},
'method': 'openff-1.0.0',
'program': 'openmm'}}
Loading methane.json into QCArchive...
Number of unique molecules: 1
Entries: 100%|████████████████████████████████████| 1/1 [00:00<00:00, 39.24it/s]
Number of new entries: 1/1
Submitting calculations in batches of 20 for specification default
Tasks: 100%|██████████████████████████████████████| 1/1 [00:00<00:00, 16.18it/s]
Submitting calculations in batches of 20 for specification openff-1.0.0
Tasks: 100%|██████████████████████████████████████| 1/1 [00:00<00:00, 20.08it/s]
Number of new tasks: 2
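After the submission completes, the dataset can be inspected with QCPortal. A minimal sketch, assuming the same local server and the dataset name from the metadata example above (the exact QCPortal calls may vary between versions):
import qcportal as ptl

# Connect to the same local server used for the submission above
client = ptl.FractalClient("localhost:7777", verify=False)

ds = client.get_collection("OptimizationDataset",
                           "OpenFF Sandbox CHO PhAlkEthOH v1.0")

# List the compute specifications attached to the dataset;
# should include "default" and "openff-1.0.0"
print(ds.list_specifications())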
The format of the file required for --datasets in all commands is the following:
TYPE NAME WITH SPACES / SPEC1 SPEC2
Using the dataset submission example above, we could specify:
OptimizationDataset OpenFF Sandbox CHO PhAlkEthOH v1.0 / default openff-1.0.0
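As a minimal sketch, such a file (here the hypothetical name datasets.txt) could be written from Python as:
# Write a one-line --datasets file matching the example above
with open("datasets.txt", "w") as fd:
    fd.write("OptimizationDataset OpenFF Sandbox CHO PhAlkEthOH v1.0 / default openff-1.0.0\n")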