Skip to content

Latest commit

 

History

History
29 lines (22 loc) · 1.17 KB

README.md

File metadata and controls

29 lines (22 loc) · 1.17 KB

Post-Processing for ML-LLM-Bench Dataset

This folder contains the post-processing scripts for the ML-LLM-Bench dataset. You can load the dataset using the following code:

from datasets import load_dataset

ml_bench = load_dataset("super-dainiu/ml-bench")    # splits: ['full', 'quarter']

The dataset contains the following columns:

  • github_id: The ID of the GitHub repository.
  • github: The URL of the GitHub repository.
  • repo_id: The ID of the sample within each repository.
  • id: The unique ID of the sample in the entire dataset.
  • path: The path to the corresponding folder in LLM-Bench.
  • arguments: The arguments specified in the user requirements.
  • instruction: The user instructions for the task.
  • oracle: The oracle contents relevant to the task.
  • type: The expected output type based on the oracle contents.
  • output: The ground truth output generated based on the oracle contents.
  • prefix_code: The code snippet for preparing the execution environment

If you want to run ML-LLM-Bench, you need to do post-processing on the dataset. You can use the following code to post-process the dataset:

bash scripts/post_process/prepare.sh