slds-lmu · pfistfl · Apr 9, 2024 · Apr 10, 2024 · Apr 10, 2024 · Apr 2, 2024
diff --git a/.gitignore b/.gitignore
@@ -136,7 +136,6 @@ Pipfile.lock
 notes
 
 *.db
-*.toml
 
 experiments/
 

diff --git a/README.md b/README.md
@@ -124,17 +124,15 @@ We want to add several features to **yahpo_gym** in future versions:
 - [rbv2](https://github.com/pfistfl/rbv2) (R-Package) can be used to reproduce runs from all `rbv2_*` in a real setting.
 - [iaml](https://github.com/sumny/iaml) (R-Package) can be used to reproduce runs from all `iaml_*` in a real setting.
 - [HPOBench](https://github.com/automl/HPOBench) can be used to reproduce several other scenarios in a real setting. Furthermore, we soon hope to integrate our surrogates with **HPOBench** in order to provide a single, common API.
+- [SyneTune](https://github.com/awslabs/syne-tune) offers additional benchmarks and tuning algorithms. `SyneTune` can be used with YAHPO Gym instances!
 
 ### Citation
 
 If you use YAHPO Gym, please cite the following paper:
 
 - Pfisterer, F., Schneider, L., Moosbauer, J., Binder, M., & Bischl, B. (2022). YAHPO Gym - An Efficient Multi-Objective Multi-Fidelity Benchmark for Hyperparameter Optimization. In International Conference on Automated Machine Learning.
 
-Moreover, certain `scenarios` built upon previous work, e.g., the `lcbench` scenario uses data from:
-
-- Zimmer, L., Lindauer, M., & Hutter, F. (2021). Auto-Pytorch: Multi-Fidelity Metalearning for Efficient and Robust AutoDL. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(9), 3079-3090.
-- Zimmer, L. (2020). data_2k_lw.zip. figshare. Dataset. https://doi.org/10.6084/m9.figshare.11662422.v1, Apache License, Version 2.0.
+Moreover, depending on which scenarios you use, please also cite the original data sources listed in the **Overview over benchmark instances** above.
 
 **Please make sure to always also cite the original data sources as YAHPO Gym would not have been possible without them!**
 

diff --git a/yahpo_gym/.gitignore b/yahpo_gym/.gitignore
@@ -8,3 +8,5 @@ build/
 
 # Ignore OSX finder directories
 .DS_Store
+
+.vscode/*
diff --git a/yahpo_gym/README.md b/yahpo_gym/README.md
@@ -30,7 +30,7 @@ YAHPO Gym consists of several `scenarios`. A scenario (e.g. `lcbench`) is a coll
 
 The **full, up-to-date overview** can be obtained from the [Documentation](https://slds-lmu.github.io/yahpo_gym/scenarios.html).
 The fidelity is given either as the dataset fraction `fraction` or the number of epochs `epoch`.
-Search spaces can be numeric, mixed and have dependencies (as indicated in the `H` column).
+Search spaces can be numeric, mixed and have hierarchical dependencies (as indicated in the `H` column).
 
 Original data sources are given by:
 
@@ -46,12 +46,12 @@ Original data sources are given by:
 ### Installation
 
 ```console
-pip install yahpo-gym==1.0.1
+pip install yahpo-gym==2.0
 ```
 
 ### Setup
 
-To run a benchmark you need to obatin the ONNX model (`new_model.onnx`), [ConfigSpace](https://automl.github.io/ConfigSpace/) (`config_space.json`) and some encoding info (`encoding.json`).
+To run a benchmark you need to obtain the ONNX model (`new_model.onnx`), [ConfigSpace](https://automl.github.io/ConfigSpace/) (`config_space.json`) and some encoding info (`encoding.json`) for the respective benchmark.
 
 You can download these [here (Github)](https://github.com/slds-lmu/yahpo_data) or [here (Syncshare)](https://syncandshare.lrz.de/getlink/fiCMkzqj1bv1LfCUyvZKmLvd/).
 
@@ -63,6 +63,7 @@ from yahpo_gym import local_config
 local_config.init_config()
 local_config.set_data_path("path-to-data")
 ```
+
 You can test whether the setup was successful by instantiating the object as documented in the **Usage** section below.
 
 ### Usage
@@ -98,8 +99,8 @@ result_dict = b.objective_function(configuration=config, fidelity={"epoch": 50},
 
 #### Using YAHPO with `syne-tune`
 
-We are currently working on integrating `yahpo_gym` with `syne-tune`.
-See [here](https://github.com/awslabs/syne-tune/pull/337) for progress on this issue.
+`yahpo-gym` is also integrated in [`syne-tune`](https://github.com/awslabs/syne-tune).
+See [syne-tune docs](https://github.com/awslabs/syne-tune/blob/main/examples/launch_asha_yahpo.py) for an extensive example.
 
 ### A note on OpenML Task IDs
 
@@ -111,7 +112,7 @@ To query meta information, use https://www.openml.org/t/<task_id>.
 
 ### Example: Tuning an instance using HPBandSter
 
-We include a full example for optimization using **BOHB** on a YAHPO Gym instance in a [jupyter notebook](https://github.com/slds-lmu/yahpo_gym/blob/main/yahpo_gym/notebooks/tuning_hpandster_on_yahpo.ipynb).
+We include a full example for optimization using **BOHB** from [HPBandSter](https://github.com/automl/HpBandSter) on a YAHPO Gym instance in a [jupyter notebook](https://github.com/slds-lmu/yahpo_gym/blob/main/yahpo_gym/notebooks/tuning_hpandster_on_yahpo.ipynb).
 
 ### All Examples
 
@@ -126,11 +127,7 @@ We include a full example for optimization using **BOHB** on a YAHPO Gym instanc
 If you use YAHPO Gym, please cite the following paper:
 
 - Pfisterer, F., Schneider, L., Moosbauer, J., Binder, M., & Bischl, B. (2022). YAHPO Gym - An Efficient Multi-Objective Multi-Fidelity Benchmark for Hyperparameter Optimization. In International Conference on Automated Machine Learning.
-
-Moreover, certain `scenarios` built upon previous work, e.g., the `lcbench` scenario uses data from:
-
-- Zimmer, L., Lindauer, M., & Hutter, F. (2021). Auto-Pytorch: Multi-Fidelity Metalearning for Efficient and Robust AutoDL. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(9), 3079-3090.
-- Zimmer, L. (2020). data_2k_lw.zip. figshare. Dataset. https://doi.org/10.6084/m9.figshare.11662422.v1, Apache License, Version 2.0.
+In addition, YAHPO Gym contains `scenarios` built upon previous work; see the *Source* column in the table above for citation info.
 
 **Please make sure to always also cite the original data sources as YAHPO Gym would not have been possible without them!**
 

diff --git a/yahpo_gym/docs/source/examples.rst b/yahpo_gym/docs/source/examples.rst
@@ -12,3 +12,5 @@ We provide several examples for using `YAHPO Gym`:
 - `Paper Examples <https://github.com/slds-lmu/yahpo_gym/blob/main/yahpo_gym/notebooks/code_sample.ipynb>`_
 
 - `Paper Experiments <https://github.com/slds-lmu/yahpo_exps/tree/main/paper>`_
+
+- `Using YAHPO with syne-tune <https://github.com/awslabs/syne-tune/blob/main/examples/launch_asha_yahpo.py>`_
diff --git a/yahpo_gym/docs/source/frequently_asked.rst b/yahpo_gym/docs/source/frequently_asked.rst
@@ -8,13 +8,9 @@ Citation
 
 If you use YAHPO Gym, please cite the following paper:
 
-* Pfisterer, F., Schneider, L., Moosbauer, J., Binder, M., & Bischl, B. (2022). YAHPO Gym - An Efficient Multi-Objective Multi-Fidelity Benchmark for Hyperparameter Optimization. In International Conference on Automated Machine Learning.
+* Pfisterer, F., Schneider, L., Moosbauer, J., Binder, M., & Bischl, B. (2022). YAHPO Gym - An Efficient Multi-Objective Multi-Fidelity Benchmark for Hyperparameter Optimization. Proceedings of the First International Conference on Automated Machine Learning, in Proceedings of Machine Learning Research 188:3/1-39
 
-Moreover, certain `scenarios` built upon previous work, e.g., the `lcbench` scenario uses data from:
-
-* Zimmer, L., Lindauer, M., & Hutter, F. (2021). Auto-Pytorch: Multi-Fidelity Metalearning for Efficient and Robust AutoDL. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(9), 3079-3090.
-
-* Zimmer, L. (2020). data_2k_lw.zip. figshare. Dataset. https://doi.org/10.6084/m9.figshare.11662422.v1, Apache License, Version 2.0.
+In addition, please cite the original sources of the scenarios you use; see the `Scenarios table <https://slds-lmu.github.io/yahpo_gym/scenarios.html>`_.
 
 OpenML task_id and dataset_id
 =======================
@@ -100,6 +96,13 @@ While XGBoost can be considered state-of-the art on tabular data and very good p
 
 We are looking into this issue and will try to address it in upcoming versions of `YAHPO Gym`.
 
+Replications and the **repl** parameter (``rbv2_``, ``iaml_``)
+================================================================
+Metrics obtained from the *rbv2_* and *iaml_* benchmarks include a **repl** hyperparameter.
+The surrogate models here model the individual folds of a 10-fold CV run (as defined in the OpenML tasks), which allows for evaluating *multi-fidelity* scenarios such as running only a subset of the cross-validation folds.
+Replications model the *cumulative mean* over the previous folds, i.e. replication 3 is the mean performance over the first three folds.
+By default, the **repl** parameter is fixed to the 10th CV fold.
+
 Noisy Surrogates
 =======================
 
@@ -112,3 +115,4 @@ This internally works as follows:
 While this works well in theory, this was not tested thoroughly and the use of noisy surrogates is therefore discouraged at the moment.
 Furthermore, we have not extensively tested whether all noisy surrogates indeed correctly return noisy predictions.
 We will improve this in upcoming versions of `YAHPO Gym`.
+
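The FAQ entry on the **repl** parameter added above describes replications as cumulative means over cross-validation folds. A minimal sketch of that relationship in plain Python follows. The per-fold scores are made-up illustrative numbers, and `cumulative_mean` is a hypothetical helper, not part of `yahpo_gym`.

```python
from itertools import accumulate


def cumulative_mean(fold_scores):
    """Return the running mean after each fold (entry k-1 averages folds 1..k)."""
    running_sums = accumulate(fold_scores)
    return [total / k for k, total in enumerate(running_sums, start=1)]


# Hypothetical per-fold accuracies of a 10-fold CV run (illustrative values only).
fold_scores = [0.80, 0.90, 0.70, 0.80, 0.80, 0.80, 0.80, 0.80, 0.80, 0.80]
repl_values = cumulative_mean(fold_scores)

# Under the FAQ's description, replication 3 is the mean of the first three folds.
print(round(repl_values[2], 4))
# The default (10th fold) corresponds to the mean over all ten folds.
print(round(repl_values[-1], 4))
```

Read this way, choosing a smaller **repl** value acts like a cheaper, lower-fidelity estimate of the full 10-fold result.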
diff --git a/yahpo_gym/docs/source/getting_started.rst b/yahpo_gym/docs/source/getting_started.rst
@@ -6,14 +6,14 @@ Getting Started
 Installation (Python)
 =======================
 
-`YAHPO Gym` can be installed using `pip`:
+`YAHPO Gym` can be installed from **PyPI** using `pip`:
 
 .. code-block:: bash
 
     pip install yahpo-gym
 
 or the latest version directly from GitHub:
 
 .. code-block:: bash
 
     pip install "git+https://github.com/slds-lmu/yahpo_gym#egg=yahpo_gym&subdirectory=yahpo_gym"

diff --git a/yahpo_gym/docs/source/scenarios.rst b/yahpo_gym/docs/source/scenarios.rst
@@ -30,7 +30,8 @@ Original data sources are given by:
 * [5] Zimmer, L. (2020). data_2k_lw.zip. figshare. Dataset. https://doi.org/10.6084/m9.figshare.11662422.v1, Apache License, Version 2.0.
 * [6] None, simply cite Pfisterer, F., Schneider, L., Moosbauer, J., Binder, M., & Bischl, B. (2022). YAHPO Gym - An Efficient Multi-Objective Multi-Fidelity Benchmark for Hyperparameter Optimization. In International Conference on Automated Machine Learning.
 
-Please make sure to always also cite the original data sources as YAHPO Gym would not have been possible without them!
+Please make sure to always also **cite** the original data sources as YAHPO Gym would not have been possible without them!
+
 
 In `yahpo_gym`, there is a `Configuration` object for each **scenario**. 
 

diff --git a/yahpo_gym/pyproject.toml b/yahpo_gym/pyproject.toml
@@ -1,5 +1,7 @@
 [build-system]
-requires = ["setuptools"]
+
+requires = ["setuptools>=61.0"]
+
 build-backend = "setuptools.build_meta"
 
 [project]
@@ -17,6 +19,18 @@ keywords = [
     "yahpo",
 ]
 classifiers = [
+    "Intended Audience :: Developers",
+    "Intended Audience :: Science/Research",
+    "License :: OSI Approved :: Apache Software License",
+    "Topic :: Scientific/Engineering :: Artificial Intelligence",
+    "Topic :: Software Development :: Libraries :: Python Modules",
+    "Operating System :: OS Independent",
+    "Programming Language :: Python :: 3",
+    "Programming Language :: Python :: 3 :: Only",
+    "Development Status :: 3 - Alpha",
+    "Programming Language :: Python :: 3.10",
+    "Programming Language :: Python :: 3.11",
+    "Natural Language :: English",
     "Development Status :: 3 - Alpha",
     "Intended Audience :: Developers",
     "Intended Audience :: Science/Research",
@@ -42,6 +56,9 @@ docs = [
     "pandas",
 ]
 
+[project.scripts]
+setup-yahpo = "yahpo_gym.scripts.setup_yahpo:main"
+
 [project.urls]
 repository = "https://github.com/slds-lmu/yahpo_gym"
 homepage = "https://slds-lmu.github.io/yahpo_gym/"
@@ -50,7 +67,7 @@ documentation = "https://slds-lmu.github.io/yahpo_gym/"
 
 [tool.ruff]
 line-length = 200
-include = ["pyproject.toml", "yahpo_gym/*.py"]
+include = ["pyproject.toml", "yahpo_gym/*.py", "scripts/*.py"]
 indent-width = 4
 target-version = "py310"
 fix = true
@@ -62,4 +79,5 @@ multi_line_output = 3
 
 [tool.mypy]
 check_untyped_defs = true
-ignore_missing_imports = true
+ignore_missing_imports = true
+
diff --git a/yahpo_gym/setup.py b/yahpo_gym/setup.py
@@ -41,4 +41,5 @@
         ],
     },
     keywords=["module", "inference", "yahpo"],
+    url="https://github.com/slds-lmu/yahpo_gym",
 )
diff --git a/yahpo_gym/yahpo_gym/get_suite.py b/yahpo_gym/yahpo_gym/get_suite.py
@@ -1,9 +1,10 @@
 from pandas import read_json
 from yahpo_gym.local_config import local_config
 
-def get_suite(type:str, version:float = 1.0):
+
+def get_suite(type: str, version: float = 1.0):
     """
-    Interface for benchmark scenario meta information. 
+    Interface for benchmark scenario meta information.
     Abstract base class used to instantiate configurations that contain all
     relevant meta-information about a specific benchmark scenario.
 
@@ -12,20 +13,27 @@ def get_suite(type:str, version:float = 1.0):
     type: str
         The type of benchmark to be used. Can be either 'single' (single-objective) or 'multi' (multi-objective).
     version: float
-        The version of the benchmark to be used.
+        The version of the benchmark to be used. Defaults to 1.0.
     """
-    assert type in ['single', 'multi'], "type must be either 'single' or 'multi'"
-    assert _data_has_version(version), "version must coincide with version in `local_config.data_path`"
+    assert type in ["single", "multi"], "type must be either 'single' or 'multi'"
+    assert _data_has_version(
+        version
+    ), "version must coincide with version in `local_config.data_path`"
     # Get file
-    fp = local_config.data_path.joinpath("benchmark_suites").joinpath(f"v{version}").joinpath(f"{type}.json")
+    fp = (
+        local_config.data_path.joinpath("benchmark_suites")
+        .joinpath(f"v{version}")
+        .joinpath(f"{type}.json")
+    )
     # Read json
     with open(fp, "r") as f:
-        data = read_json(f, orient='records')
+        data = read_json(f, orient="records")
     return data
 
+
 def _data_has_version(version: float):
     fp = local_config.data_path.joinpath("VERSION")
     with open(fp, "r") as f:
         for line in f:
-            if line.startswith(f'VERSION:{version}'):
+            if line.startswith(f"VERSION:{version}"):
                 return True
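The version check in `_data_has_version` above scans a `VERSION` file for a line that starts with `VERSION:<version>`. The following standalone sketch mirrors that logic under stated assumptions: the file contents are inferred only from the code above, and `data_has_version` and `suite_path` are hypothetical helpers that take the data path explicitly instead of reading `local_config`. Unlike the original, the sketch returns an explicit `False` when no line matches.

```python
import tempfile
from pathlib import Path


def data_has_version(data_path: Path, version: float) -> bool:
    """Check whether <data_path>/VERSION has a line starting with 'VERSION:<version>'."""
    with open(data_path / "VERSION", "r") as f:
        return any(line.startswith(f"VERSION:{version}") for line in f)


def suite_path(data_path: Path, type: str, version: float) -> Path:
    """Build the suite file location the same way get_suite does."""
    return data_path / "benchmark_suites" / f"v{version}" / f"{type}.json"


with tempfile.TemporaryDirectory() as tmp:
    data_path = Path(tmp)
    # Assumed file layout: one 'VERSION:<number>' entry per line.
    (data_path / "VERSION").write_text("VERSION:1.0\n")
    print(data_has_version(data_path, 1.0))
    print(data_has_version(data_path, 2.0))
    print(suite_path(data_path, "single", 1.0).relative_to(data_path).as_posix())
```

Note that the check is a prefix match, so a line such as `VERSION:1.01` would also satisfy a request for version `1.0`.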
diff --git a/yahpo_gym/yahpo_gym/scripts/__init__.py b/yahpo_gym/yahpo_gym/scripts/__init__.py
diff --git a/yahpo_gym/yahpo_gym/scripts/setup_yahpo.py b/yahpo_gym/yahpo_gym/scripts/setup_yahpo.py
@@ -0,0 +1,55 @@
+#!/usr/bin/env python
+import argparse
+import subprocess
+from pathlib import Path
+from yahpo_gym import benchmark_set
+from yahpo_gym.local_config import local_config
+import yahpo_gym.benchmarks.lcbench  # noqa: F401
+
+
+def setup(dest_dir: Path | str):
+    """
+    Clone yahpo data into <dest_dir>/yahpo_data
+    """
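+    # Repository URL; the ref is passed via --branch since `git clone` does not accept a `url@ref` suffix
+    yahpo_data_url = "https://github.com/slds-lmu/yahpo_data.git"
+
+    # Clone the v2_final ref and fail loudly if the clone does not succeed
+    dest_dir = Path(dest_dir).joinpath("yahpo_data")
+    subprocess.run(["git", "clone", "--branch", "v2_final", yahpo_data_url, str(dest_dir)], check=True)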
+
+    local_config.init_config()
+    local_config.set_data_path(dest_dir)
+
+
+def test():
+    bench = benchmark_set.BenchmarkSet("lcbench")
+    bench.instances
+    bench.set_instance("3945")
+    value = bench.config_space.sample_configuration(1).get_dictionary()
+    result = bench.objective_function(value)
+    print(f"Eval objective for {value}: {result}")
+    if result is not None:
+        print("Setup successful!")
+
+
+def parse_args():
+    """Parse command-line arguments."""
+    parser = argparse.ArgumentParser(description="Setup script for yahpo-gym.")
+    parser.add_argument(
+        "dest_dir",
+        help="Destination directory for cloning meta-data required for yahpo-gym.",
+    )
+    return parser.parse_args()
+
+
+def main():
+    """Main function."""
+    args = parse_args()
+    print(args)
+    setup(args.dest_dir)
+    test()
+
+
+if __name__ == "__main__":
+    main()
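The setup script above shells out to `git clone`. One way to keep that step testable without network access is to build the command separately from running it, as in the minimal sketch below. `build_clone_command` is a hypothetical helper, not part of `yahpo_gym`. The `--branch` option is standard `git clone` and accepts both branch and tag names. The ref name `v2_final` is taken from the script and has not been verified against the data repository.

```python
from pathlib import Path


def build_clone_command(url, dest_dir, ref=None):
    """Return the argument list for `git clone`, optionally pinned to a branch or tag."""
    cmd = ["git", "clone"]
    if ref is not None:
        cmd += ["--branch", ref]
    # Mirror the script: data is cloned into <dest_dir>/yahpo_data.
    cmd += [url, str(Path(dest_dir) / "yahpo_data")]
    return cmd


cmd = build_clone_command(
    "https://github.com/slds-lmu/yahpo_data.git", "data", ref="v2_final"
)
print(cmd)
# To actually clone, pass the list to subprocess.run(cmd, check=True).
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems when the destination path contains spaces.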
diff --git a/yahpo_train/pyproject.toml b/yahpo_train/pyproject.toml
@@ -17,6 +17,18 @@ keywords = [
     "yahpo",
 ]
 classifiers = [
+    "Intended Audience :: Developers",
+    "Intended Audience :: Science/Research",
+    "License :: OSI Approved :: Apache Software License",
+    "Topic :: Scientific/Engineering :: Artificial Intelligence",
+    "Topic :: Software Development :: Libraries :: Python Modules",
+    "Operating System :: OS Independent",
+    "Programming Language :: Python :: 3",
+    "Programming Language :: Python :: 3 :: Only",
+    "Development Status :: 3 - Alpha",
+    "Programming Language :: Python :: 3.10",
+    "Programming Language :: Python :: 3.11",
+    "Natural Language :: English",
     "Development Status :: 3 - Alpha",
     "Intended Audience :: Developers",
     "Intended Audience :: Science/Research",