Starting docs cleanup #85

Open
wants to merge 16 commits into
base: v2_final
from
1 change: 0 additions & 1 deletion .gitignore
@@ -136,7 +136,6 @@ Pipfile.lock
notes

*.db
*.toml

experiments/

6 changes: 2 additions & 4 deletions README.md
@@ -124,17 +124,15 @@ We want to add several features to **yahpo_gym** in future versions:
- [rbv2](https://github.com/pfistfl/rbv2) (R-Package) can be used to reproduce runs from all `rbv2_*` in a real setting.
- [iaml](https://github.com/sumny/iaml) (R-Package) can be used to reproduce runs from all `iaml_*` in a real setting.
- [HPOBench](https://github.com/automl/HPOBench) can be used to reproduce several other scenarios in a real setting. Furthermore, we soon hope to integrate our surrogates with **HPOBench** in order to provide a single, common API.
- [SyneTune](https://github.com/awslabs/syne-tune) offers additional benchmarks and helpful tuning. `SyneTune` can be used with YAHPO Gym instances!

### Citation

If you use YAHPO Gym, please cite the following paper:

- Pfisterer, F., Schneider, L., Moosbauer, J., Binder, M., & Bischl, B. (2022). YAHPO Gym - An Efficient Multi-Objective Multi-Fidelity Benchmark for Hyperparameter Optimization. In International Conference on Automated Machine Learning.

Moreover, certain `scenarios` built upon previous work, e.g., the `lcbench` scenario uses data from:

- Zimmer, L., Lindauer, M., & Hutter, F. (2021). Auto-Pytorch: Multi-Fidelity Metalearning for Efficient and Robust AutoDL. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(9), 3079-3090.
- Zimmer, L. (2020). data_2k_lw.zip. figshare. Dataset. https://doi.org/10.6084/m9.figshare.11662422.v1, Apache License, Version 2.0.
Moreover, depending on which scenarios you use, see the **Overview over benchmark instances** above.

**Please make sure to always also cite the original data sources as YAHPO Gym would not have been possible without them!**

2 changes: 2 additions & 0 deletions yahpo_gym/.gitignore
@@ -8,3 +8,5 @@ build/

# Ignore OSX finder directories
.DS_Store

.vscode/*
19 changes: 8 additions & 11 deletions yahpo_gym/README.md
@@ -30,7 +30,7 @@ YAHPO Gym consists of several `scenarios`. A scenario (e.g. `lcbench`) is a coll

The **full, up-to-date overview** can be obtained from the [Documentation](https://slds-lmu.github.io/yahpo_gym/scenarios.html).
The fidelity is given either as the dataset fraction `fraction` or the number of epochs `epoch`.
Search spaces can be numeric, mixed and have dependencies (as indicated in the `H` column).
Search spaces can be numeric, mixed and have hierarchical dependencies (as indicated in the `H` column).

Original data sources are given by:

@@ -46,12 +46,12 @@ Original data sources are given by:
### Installation

```console
pip install yahpo-gym==1.0.1
pip install yahpo-gym==2.0
```

### Setup

To run a benchmark you need to obtain the ONNX model (`new_model.onnx`), the [ConfigSpace](https://automl.github.io/ConfigSpace/) (`config_space.json`) and some encoding info (`encoding.json`).
To run a benchmark you need to obtain the ONNX model (`new_model.onnx`), the [ConfigSpace](https://automl.github.io/ConfigSpace/) (`config_space.json`) and some encoding info (`encoding.json`) for the respective benchmark.

You can download these [here (Github)](https://github.com/slds-lmu/yahpo_data) or [here (Syncshare)](https://syncandshare.lrz.de/getlink/fiCMkzqj1bv1LfCUyvZKmLvd/).

@@ -63,6 +63,7 @@ from yahpo_gym import local_config
local_config.init_config()
local_config.set_data_path("path-to-data")
```

You can test whether the setup was successful by instantiating the object as documented in the **Usage** section below.
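
As a quick sanity check before instantiating a benchmark, one could verify that the three files named in the Setup section exist for a scenario. The sketch below uses only the standard library. The helper `missing_files` and the `<data_path>/<scenario>/` directory layout are assumptions made for illustration; they are not part of the `yahpo_gym` API.

```python
import tempfile
from pathlib import Path

# File names taken from the Setup section of this README.
REQUIRED_FILES = ("new_model.onnx", "config_space.json", "encoding.json")


def missing_files(data_path, scenario):
    """Return the required files that are absent from <data_path>/<scenario>/.

    The per-scenario subdirectory layout is an assumption for this sketch.
    """
    scenario_dir = Path(data_path) / scenario
    return [name for name in REQUIRED_FILES if not (scenario_dir / name).is_file()]


# Demonstration on a throwaway directory that contains only one of the files.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "lcbench"
    demo_dir.mkdir()
    (demo_dir / "config_space.json").write_text("{}")
    print(missing_files(tmp, "lcbench"))  # ['new_model.onnx', 'encoding.json']
```

An empty list means all three files were found; anything else names what still has to be downloaded.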

### Usage
@@ -98,8 +99,8 @@ result_dict = b.objective_function(configuration=config, fidelity={"epoch": 50},

#### Using YAHPO with `syne-tune`

We are currently working on integrating `yahpo_gym` with `syne-tune`.
See [here](https://github.com/awslabs/syne-tune/pull/337) for progress on this issue.
`yahpo-gym` is also integrated in [`syne-tune`](https://github.com/awslabs/syne-tune).
See [syne-tune docs](https://github.com/awslabs/syne-tune/blob/main/examples/launch_asha_yahpo.py) for an extensive example.

### A note on OpenML Task IDs

@@ -111,7 +112,7 @@ To query meta information, use https://www.openml.org/t/<task_id>.

### Example: Tuning an instance using HPBandSter

We include a full example for optimization using **BOHB** on a YAHPO Gym instance in a [jupyter notebook](https://github.com/slds-lmu/yahpo_gym/blob/main/yahpo_gym/notebooks/tuning_hpandster_on_yahpo.ipynb).
We include a full example for optimization using **BOHB** from [HPBandSter](https://github.com/automl/HpBandSter) on a YAHPO Gym instance in a [jupyter notebook](https://github.com/slds-lmu/yahpo_gym/blob/main/yahpo_gym/notebooks/tuning_hpandster_on_yahpo.ipynb).

### All Examples

@@ -126,11 +127,7 @@ We include a full example for optimization using **BOHB** on a YAHPO Gym instanc
If you use YAHPO Gym, please cite the following paper:

- Pfisterer, F., Schneider, L., Moosbauer, J., Binder, M., & Bischl, B. (2022). YAHPO Gym - An Efficient Multi-Objective Multi-Fidelity Benchmark for Hyperparameter Optimization. In International Conference on Automated Machine Learning.

Moreover, certain `scenarios` built upon previous work, e.g., the `lcbench` scenario uses data from:

- Zimmer, L., Lindauer, M., & Hutter, F. (2021). Auto-Pytorch: Multi-Fidelity Metalearning for Efficient and Robust AutoDL. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(9), 3079-3090.
- Zimmer, L. (2020). data_2k_lw.zip. figshare. Dataset. https://doi.org/10.6084/m9.figshare.11662422.v1, Apache License, Version 2.0.
In addition, YAHPO Gym contains `scenarios` built upon previous work; see the *Source* column in the table above for citation info.

**Please make sure to always also cite the original data sources as YAHPO Gym would not have been possible without them!**

2 changes: 2 additions & 0 deletions yahpo_gym/docs/source/examples.rst
@@ -12,3 +12,5 @@ We provide several examples for using `YAHPO Gym`:
- `Paper Examples <https://github.com/slds-lmu/yahpo_gym/blob/main/yahpo_gym/notebooks/code_sample.ipynb>`_

- `Paper Experiments <https://github.com/slds-lmu/yahpo_exps/tree/main/paper>`_

- `Using YAHPO with syne-tune <https://github.com/awslabs/syne-tune/blob/main/examples/launch_asha_yahpo.py>`_
16 changes: 10 additions & 6 deletions yahpo_gym/docs/source/frequently_asked.rst
@@ -8,13 +8,9 @@ Citation

If you use YAHPO Gym, please cite the following paper:

* Pfisterer, F., Schneider, L., Moosbauer, J., Binder, M., & Bischl, B. (2022). YAHPO Gym - An Efficient Multi-Objective Multi-Fidelity Benchmark for Hyperparameter Optimization. In International Conference on Automated Machine Learning.
* Pfisterer, F., Schneider, L., Moosbauer, J., Binder, M., & Bischl, B. (2022). YAHPO Gym - An Efficient Multi-Objective Multi-Fidelity Benchmark for Hyperparameter Optimization. Proceedings of the First International Conference on Automated Machine Learning, in Proceedings of Machine Learning Research 188:3/1-39

Moreover, certain `scenarios` built upon previous work, e.g., the `lcbench` scenario uses data from:

* Zimmer, L., Lindauer, M., & Hutter, F. (2021). Auto-Pytorch: Multi-Fidelity Metalearning for Efficient and Robust AutoDL. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(9), 3079-3090.

* Zimmer, L. (2020). data_2k_lw.zip. figshare. Dataset. https://doi.org/10.6084/m9.figshare.11662422.v1, Apache License, Version 2.0.
In addition, please cite the original sources of the scenarios you use; see the `Scenarios table <https://slds-lmu.github.io/yahpo_gym/scenarios.html>`_.

OpenML task_id and dataset_id
=======================
@@ -100,6 +96,13 @@ While XGBoost can be considered state-of-the art on tabular data and very good p

We are looking into this issue and will try to address it in upcoming versions of `YAHPO Gym`.

Replications and the **repl** parameter (``rbv2_*``, ``iaml_*``)
======================================================================
The *rbv2_* and *iaml_* benchmarks include a **repl** hyperparameter.
Surrogate models here model the individual folds of a 10-fold CV run (as defined in the OpenML tasks), which allows for evaluating *multi-fidelity* scenarios such as running only a subset of cross-validation folds.
Replications here model the *cumulative mean* over the folds, i.e. the value for fold 3 is the mean performance over the first three folds.
By default, the **repl** parameter is fixed to the 10th CV fold.
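
To make the cumulative-mean reading of **repl** concrete, here is a small standalone sketch. The per-fold scores are made-up numbers and ``cumulative_means`` is a hypothetical helper, not part of ``yahpo_gym``. It only illustrates how the value at fold *k* relates to the first *k* folds.

```python
from itertools import accumulate

# Hypothetical per-fold scores of a 4-fold run (made-up values).
fold_scores = [0.5, 1.0, 0.75, 0.25]


def cumulative_means(scores):
    """Return the mean of the first k scores for k = 1..len(scores)."""
    running_sums = accumulate(scores)
    return [total / k for k, total in enumerate(running_sums, start=1)]


means = cumulative_means(fold_scores)
print(means)     # [0.5, 0.75, 0.75, 0.625]
print(means[2])  # value for fold 3: mean over the first three folds -> 0.75
```

Under this reading, the last entry is the mean over all folds, which is what the default setting described above corresponds to.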

Noisy Surrogates
=======================

@@ -112,3 +115,4 @@ This internally works as follows:
While this works well in theory, this was not tested thoroughly and the use of noisy surrogates is therefore discouraged at the moment.
Furthermore, we have not extensively tested whether all noisy surrogates indeed correctly return noisy predictions.
We will improve this in upcoming versions of `YAHPO Gym`.

15 changes: 14 additions & 1 deletion yahpo_gym/docs/source/getting_started.rst
@@ -6,14 +6,27 @@ Getting Started
Installation (Python)
=======================

`YAHPO Gym` can be installed using `pip`:

`YAHPO Gym` can be installed from **PyPI** using `pip`:


.. code-block:: bash

    pip install yahpo-gym

or the latest version directly from GitHub:

.. code-block:: bash

    pip install "git+https://github.com/slds-lmu/yahpo_gym#egg=yahpo_gym&subdirectory=yahpo_gym"
3 changes: 2 additions & 1 deletion yahpo_gym/docs/source/scenarios.rst
@@ -30,7 +30,8 @@ Original data sources are given by:
* [5] Zimmer, L. (2020). data_2k_lw.zip. figshare. Dataset. https://doi.org/10.6084/m9.figshare.11662422.v1, Apache License, Version 2.0.
* [6] None, simply cite Pfisterer, F., Schneider, L., Moosbauer, J., Binder, M., & Bischl, B. (2022). YAHPO Gym - An Efficient Multi-Objective Multi-Fidelity Benchmark for Hyperparameter Optimization. In International Conference on Automated Machine Learning.

Please make sure to always also cite the original data sources as YAHPO Gym would not have been possible without them!
Please make sure to always also **cite** the original data sources as YAHPO Gym would not have been possible without them!


In `yahpo_gym`, there is a `Configuration` object for each **scenario**.

24 changes: 21 additions & 3 deletions yahpo_gym/pyproject.toml
@@ -1,5 +1,7 @@
[build-system]
requires = ["setuptools"]

requires = ["setuptools>=61.0"]

build-backend = "setuptools.build_meta"

[project]
@@ -17,6 +19,18 @@ keywords = [
"yahpo",
]
classifiers = [
"Intended Audience :: Developers",
"Intended Audience :: Science/Research",
"License :: OSI Approved :: Apache Software License",
"Topic :: Scientific/Engineering :: Artificial Intelligence",
"Topic :: Software Development :: Libraries :: Python Modules",
"Operating System :: OS Independent",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3 :: Only",
"Development Status :: 3 - Alpha",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Natural Language :: English",
"Development Status :: 3 - Alpha",
"Intended Audience :: Developers",
"Intended Audience :: Science/Research",
@@ -42,6 +56,9 @@ docs = [
"pandas",
]

[project.scripts]
setup-yahpo = "yahpo_gym.scripts.setup_yahpo:main"

[project.urls]
repository = "https://github.com/slds-lmu/yahpo_gym"
homepage = "https://slds-lmu.github.io/yahpo_gym/"
@@ -50,7 +67,7 @@ documentation = "https://slds-lmu.github.io/yahpo_gym/"

[tool.ruff]
line-length = 200
include = ["pyproject.toml", "yahpo_gym/*.py"]
include = ["pyproject.toml", "yahpo_gym/*.py", "scripts/*.py"]
indent-width = 4
target-version = "py310"
fix = true
@@ -62,4 +79,5 @@ multi_line_output = 3

[tool.mypy]
check_untyped_defs = true
ignore_missing_imports = true
ignore_missing_imports = true

1 change: 1 addition & 0 deletions yahpo_gym/setup.py
@@ -41,4 +41,5 @@
],
},
keywords=["module", "inference", "yahpo"],
url="https://github.com/slds-lmu/yahpo_gym",
)
24 changes: 16 additions & 8 deletions yahpo_gym/yahpo_gym/get_suite.py
@@ -1,9 +1,10 @@
from pandas import read_json
from yahpo_gym.local_config import local_config

def get_suite(type:str, version:float = 1.0):

def get_suite(type: str, version: float = 1.0):
    """
    Interface for benchmark scenario meta information.
    Interface for benchmark scenario meta information.
    Abstract base class used to instantiate configurations that contain all
    relevant meta-information about a specific benchmark scenario.

@@ -12,20 +13,27 @@ def get_suite(type:str, version:float = 1.0):
    type: str
        The type of benchmark to be used. Can be either 'single' (single-objective) or 'multi' (multi-objective).
    version: float
        The version of the benchmark to be used.
        The version of the benchmark to be used. Defaults to 1.0.
    """
    assert type in ['single', 'multi'], "type must be either 'single' or 'multi'"
    assert _data_has_version(version), "version must coincide with version in `local_config.data_path`"
    assert type in ["single", "multi"], "type must be either 'single' or 'multi'"
    assert _data_has_version(
        version
    ), "version must coincide with version in `local_config.data_path`"
    # Get file
    fp = local_config.data_path.joinpath("benchmark_suites").joinpath(f"v{version}").joinpath(f"{type}.json")
    fp = (
        local_config.data_path.joinpath("benchmark_suites")
        .joinpath(f"v{version}")
        .joinpath(f"{type}.json")
    )
    # Read json
    with open(fp, "r") as f:
        data = read_json(f, orient='records')
        data = read_json(f, orient="records")
    return data


def _data_has_version(version: float):
    fp = local_config.data_path.joinpath("VERSION")
    with open(fp, "r") as f:
        for line in f:
            if line.startswith(f'VERSION:{version}'):
            if line.startswith(f"VERSION:{version}"):
                return True
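
The version check in this file boils down to a prefix match against the `VERSION` file, followed by a path lookup. Below is a self-contained sketch of that logic, using a temporary directory in place of `local_config.data_path`. The function names `data_has_version` and `suite_path` and the file contents are illustrative assumptions, not the library's API. Because it is a prefix match, a request for version `1.0` would also accept a file line such as `VERSION:1.01`.

```python
import tempfile
from pathlib import Path


def data_has_version(data_path: Path, version: float) -> bool:
    """True if any line of <data_path>/VERSION starts with 'VERSION:<version>'."""
    with open(data_path / "VERSION", "r") as f:
        return any(line.startswith(f"VERSION:{version}") for line in f)


def suite_path(data_path: Path, type: str, version: float) -> Path:
    """Build <data_path>/benchmark_suites/v<version>/<type>.json."""
    return data_path / "benchmark_suites" / f"v{version}" / f"{type}.json"


with tempfile.TemporaryDirectory() as tmp:
    data_path = Path(tmp)
    # Assumed VERSION file content, mirroring the prefix used in the check.
    (data_path / "VERSION").write_text("VERSION:1.0\n")

    print(data_has_version(data_path, 1.0))  # True
    print(data_has_version(data_path, 2.0))  # False
    print(suite_path(data_path, "single", 1.0).relative_to(data_path).as_posix())
    # benchmark_suites/v1.0/single.json
```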
Empty file.
55 changes: 55 additions & 0 deletions yahpo_gym/yahpo_gym/scripts/setup_yahpo.py
@@ -0,0 +1,55 @@
#!/usr/bin/env python
import argparse
import subprocess
from pathlib import Path
from yahpo_gym import benchmark_set
from yahpo_gym.local_config import local_config
import yahpo_gym.benchmarks.lcbench  # noqa: F401


def setup(dest_dir: Path | str):
    """
    Clone yahpo data into <dest_dir>/yahpo_data
    """
    # Define the repository URL and the branch to check out.
    # `git clone` does not understand the pip-style `<url>@<ref>` suffix,
    # so the ref is passed via `--branch` instead.
    yahpo_data_url = "https://github.com/slds-lmu/yahpo_data.git"
    yahpo_data_branch = "v2_final"

    # Run the git clone command; check=True raises if the clone fails
    dest_dir = Path(dest_dir).joinpath("yahpo_data")
    subprocess.run(
        ["git", "clone", "--branch", yahpo_data_branch, yahpo_data_url, str(dest_dir)],
        check=True,
    )

    # Point yahpo_gym at the freshly cloned data
    local_config.init_config()
    local_config.set_data_path(dest_dir)


def test():
    """Smoke test: evaluate one sampled configuration on an lcbench instance."""
    bench = benchmark_set.BenchmarkSet("lcbench")
    bench.instances
    bench.set_instance("3945")
    value = bench.config_space.sample_configuration(1).get_dictionary()
    result = bench.objective_function(value)
    print(f"Eval objective for {value}: {result}")
    if result is not None:
        print("Setup successful!")


def parse_args():
    """Parse command-line arguments."""
    parser = argparse.ArgumentParser(description="Setup script for yahpo-gym.")
    parser.add_argument(
        "dest_dir",
        help="Destination directory for cloning meta-data required for yahpo-gym.",
    )
    return parser.parse_args()


def main():
    """Main function."""
    args = parse_args()
    print(args)
    setup(args.dest_dir)
    test()


if __name__ == "__main__":
    main()
12 changes: 12 additions & 0 deletions yahpo_train/pyproject.toml
@@ -17,6 +17,18 @@ keywords = [
"yahpo",
]
classifiers = [
"Intended Audience :: Developers",
"Intended Audience :: Science/Research",
"License :: OSI Approved :: Apache Software License",
"Topic :: Scientific/Engineering :: Artificial Intelligence",
"Topic :: Software Development :: Libraries :: Python Modules",
"Operating System :: OS Independent",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3 :: Only",
"Development Status :: 3 - Alpha",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Natural Language :: English",
"Development Status :: 3 - Alpha",
"Intended Audience :: Developers",
"Intended Audience :: Science/Research",