
BUDA overview


BOINC Universal Docker app (BUDA) is a framework for running Docker-based science apps on BOINC.

It's 'universal' in the sense that one BOINC app handles arbitrary science apps. The science app's Dockerfile and executables are in workunits rather than app versions.

On the server, there is a single BOINC app; let's call it 'buda'. It has app versions for the various platforms (Windows, Mac, Linux); each app version contains the Docker wrapper built for that platform.

There are various possible interfaces for submitting jobs to BUDA. We could provide a Python-based remote API, which we (or others) could use to integrate BUDA into batch systems.

But for starters, we implemented a generic (multi-application) web-based job submission system, using the per-user file sandbox system.
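
The remote API mentioned above doesn't exist yet; as a sketch, a Python submission call might look like this (the endpoint name 'buda_submit.php', its parameters, and the response format are all assumptions, not an existing interface):

    # Hypothetical sketch of a BUDA remote job submission call.
    # The endpoint 'buda_submit.php' and its parameters are assumptions;
    # no such API exists yet.
    import requests

    def submit_buda_batch(project_url, auth_key, app_name, variant, jobs):
        """Submit a batch of jobs to one variant of a BUDA science app."""
        request = {
            'authenticator': auth_key,   # submitter's account key
            'app_name': app_name,        # e.g. 'autodock'
            'variant': variant,          # plan class, e.g. 'cpu'
            'jobs': jobs,                # e.g. [{'input_files': [...]}, ...]
        }
        r = requests.post(project_url + '/buda_submit.php', json=request)
        r.raise_for_status()
        return r.json()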

BUDA science apps and versions

BOINC provides a web interface for managing BUDA science apps and submitting jobs to them. These tools assume the following structure:

  • Each science app has a name, like 'worker' or 'autodock'.
  • A science app can have multiple variants, each with a different plan class. There might be variants for 1 CPU, for N CPUs, and for various GPU types. (BUDA handles science apps that use GPUs).

Each science app variant is a collection of files:

  • A Dockerfile
  • A config file, job.toml
  • Input and output templates
  • A main program or script
  • Other files
  • A file 'file_list' listing the other files, in template order
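
For example, a 'cpu' variant of the 'worker' app might contain the following files (the names other than Dockerfile, job.toml, and file_list are illustrative):

    Dockerfile
    job.toml
    template_in
    template_out
    worker         (the main program)
    file_list      (lists the other files, in template order)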

The set of science apps and variants is represented by a directory hierarchy of the form

project/buda_apps/
    <sci_app_name>/
        cpu/
            ... files
        <plan_class>/
            ... files
        ...
    ...

Note: you can build this hierarchy manually, but typically it's maintained using a web interface; see below.

This is similar to BOINC's hierarchy of apps and app versions, except:

  • It's represented in a directory hierarchy, not in database tables
  • Science app variants are not associated with platforms (since we're using Docker).
  • It stores only the current version, not a sequence of versions (that's why we call them 'variants', not 'versions').

BUDA is not polymorphic

Conventional BOINC apps are 'polymorphic': if an app has both CPU and GPU variants, you submit jobs without specifying which one to use; the BOINC scheduler makes the decision.

It would be possible to make BUDA polymorphic, but this would be complex, requiring significant changes to the scheduler. So, at least for now, BUDA is not polymorphic.

When you submit jobs you must specify which plan class to use. This could be a slight nuisance: a given plan class might offer relatively little computing power, so you might avoid submitting to it; but then you wouldn't get that power at all.

Validators and assimilators

In the current BOINC architecture, each BOINC app has its own validator and assimilator. If multiple science apps "share" the same BOINC app, we'll need a way to let them have different validators and assimilators.

This could be built on BOINC's script-based validator and assimilator framework: each science app could specify the names of validator and assimilator scripts, which would be stored in its workunits.
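
As a sketch, a per-app validator script might look like the following, assuming it is invoked with the paths of a result's output files and reports validity through its exit status (the invocation convention and the acceptance test here are placeholders):

    #!/usr/bin/env python3
    # Sketch of a per-science-app validator script.
    # Assumes it is invoked with the paths of a result's output files
    # and signals validity via its exit status; the actual invocation
    # convention would be defined by the script-based framework.
    import sys

    def output_ok(path):
        # Placeholder check: the output file exists and is non-empty.
        try:
            with open(path, 'rb') as f:
                return f.read(1) != b''
        except OSError:
            return False

    if __name__ == '__main__':
        sys.exit(0 if all(output_ok(p) for p in sys.argv[1:]) else 1)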

Implementation notes

BUDA will require changes to the scheduler.

Currently, given a job, the scheduler scans app versions, looking for one the host can accept based on plan class. That won't work here: the plan class is already fixed.

Instead:

  • Add a plan_class field to the workunit (or put it in xml_doc).
  • When considering sending a workunit to a host, if the workunit has a plan class:
    • skip it if there is no app version with that platform / plan class (e.g. we can't send a Metal job to a Windows host)
    • skip it if the host can't handle the plan class
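
In pseudocode, the added check might look like this (the names wu.plan_class, host.platforms, and can_handle_plan_class are illustrative; the actual scheduler is C++):

    # Pseudocode sketch of the proposed scheduler check for BUDA jobs.
    # All names here are illustrative, not actual scheduler identifiers.
    def can_send_to_host(wu, host, app_versions):
        if not wu.plan_class:
            return True     # not a BUDA job; use the existing logic
        # Skip if no app version matches the host's platform and the
        # workunit's plan class (e.g. a Metal job and a Windows host).
        if not any(av.platform in host.platforms
                   and av.plan_class == wu.plan_class
                   for av in app_versions):
            return False
        # Skip if the host can't handle the plan class
        # (e.g. it lacks the required GPU).
        return can_handle_plan_class(host, wu.plan_class)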

If we wanted to make BUDA polymorphic

  • The scheduler would have to scan the buda_apps directory structure (or we could add this info to the database).

  • Jobs would be tagged with the BUDA science app name.

  • The scheduler would scan the variants of that science app.

  • If it finds a plan class the host can accept, it would build wu.xml_doc based on the BUDA app variant info.

The above is possible but would be a lot of work.
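
For concreteness, here's a pseudocode sketch of that polymorphic dispatch; none of these functions exist:

    # Pseudocode sketch of polymorphic BUDA dispatch (not implemented).
    # scan_variants(), can_handle_plan_class(), and build_xml_doc()
    # are hypothetical.
    def pick_variant(host, sci_app_name):
        # Scan the variants of the job's science app.
        for variant in scan_variants('buda_apps', sci_app_name):
            if can_handle_plan_class(host, variant.plan_class):
                # Build wu.xml_doc from this variant's files and templates.
                return build_xml_doc(variant)
        return None     # no variant this host can accept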
