
RemoteInputFiles

David Anderson edited this page Dec 11, 2024 · 8 revisions

Job-based input file management

Input files of BOINC jobs must be available on a public web server. For projects that use remote job submission, job submitters generally don't have login access to the BOINC server, so they can't store files there directly.

There are several options:

  • Files are served from a publicly-accessible server, possibly other than the BOINC server. They must be managed, and file immutability enforced, by a mechanism outside BOINC.
  • User file sandbox: job submitters maintain, via a web interface, a set of files on the server.
  • Job-based file management: files are automatically transferred from the submission machine to the BOINC server via Web RPCs.

This document describes the last of these mechanisms, job-based file management.

In this system, you must supply globally unique physical names for files. The easiest way to do this is to include a hash of the file contents in the name.
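As an illustration of hash-based naming, here is a minimal Python sketch. The `jf_` prefix and the `physical_name` helper are hypothetical and not part of BOINC; any scheme works as long as the resulting names are globally unique.

```python
import hashlib


def physical_name(path, prefix="jf"):
    """Build a name of the form <prefix>_<md5 hex digest> (hypothetical scheme)."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Read in 1 MB chunks so large input files need not fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            md5.update(chunk)
    return "%s_%s" % (prefix, md5.hexdigest())
```

Because the name depends only on the contents, two submitters with identical files produce the same name, and the file needs to be uploaded only once.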

File cleanup is based on file/batch associations. You must create a batch (with create_batch()) before querying or uploading files. Each file can be associated with one or more batches. Files that are no longer associated with an active batch are automatically deleted from the server.
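The cleanup rule can be modeled in a few lines of Python. This is only a local model of the rule stated above, not the server's implementation; the data layout (a dict mapping physical names to sets of batch IDs) is an assumption made for illustration.

```python
def deletable_files(associations, active_batches):
    """Return names of files with no association to any active batch.

    associations: dict mapping physical name -> set of batch IDs (assumed layout).
    active_batches: set of IDs of batches that are still active.
    """
    return sorted(
        name
        for name, batches in associations.items()
        if not (batches & active_batches)
    )


associations = {"jf_aaa": {1, 2}, "jf_bbb": {2}, "jf_ccc": {3}}
# Batch 2 is no longer active, so only the file used solely by batch 2 can go.
print(deletable_files(associations, active_batches={1, 3}))  # ['jf_bbb']
```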

The system uses two Web RPCs. These are implemented as XML sent via HTTP POST; the RPC handler is html/user/job_files.php.

## Python binding

The Python binding is part of the 'BOINC_SERVER' class described in remote job submission. This class provides two functions:

query_files(phys_names, batch_id=0, delete_time=0)

This takes a list of physical filenames. It returns a list of the indices of the files that are not currently on the server:

{'absent_files': {'file': '1'}}

For the other files, it creates batch/file associations as needed.
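A reply like the one above can be turned into a list of integer indices with a small helper. This is a sketch: `absent_indices` is a hypothetical helper, and it assumes that several missing files appear as a list under 'file' and that an empty or missing 'absent_files' entry means nothing is missing. Neither assumption is confirmed by this page.

```python
def absent_indices(reply):
    """Return the indices of absent files as a list of ints."""
    absent = reply.get("absent_files") or {}
    files = absent.get("file", [])
    if isinstance(files, str):
        files = [files]  # a single missing file, as in the example reply
    return [int(i) for i in files]


print(absent_indices({"absent_files": {"file": "1"}}))  # [1]
```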

upload_files(local_names, phys_names, batch_id=0, delete_time=0)

This first calls query_files to get a list of missing files. Then it uploads these files and creates batch/file associations.
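The selection step can be pictured as follows: given the indices reported as absent, pick the matching (local name, physical name) pairs. This is a sketch of the idea only; `files_to_upload` is a hypothetical helper, not a function of the binding.

```python
def files_to_upload(local_names, phys_names, absent):
    """Pair up local and physical names for the files reported as absent.

    absent: list of integer indices into phys_names.
    """
    if len(local_names) != len(phys_names):
        raise ValueError("local_names and phys_names must have the same length")
    return [(local_names[i], phys_names[i]) for i in absent]


pairs = files_to_upload(
    ["in/a.dat", "in/b.dat", "in/c.dat"],
    ["jf_aaa", "jf_bbb", "jf_ccc"],
    absent=[1],
)
print(pairs)  # [('in/b.dat', 'jf_bbb')]
```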

## C++ interface

The following C++ functions are provided (in lib/remote_submit.cpp).
They are to be called on the job submission host;
the files must exist on that host.
```cpp
extern int query_files(
    const char* project_url,
    const char* authenticator,
    std::vector<std::string> &boinc_names,   // must be unique, e.g. by including content hash
    int batch_id,
    std::vector<int> &absent_files,          // output
    std::string& error_message
);
```

Inputs:

  • project_url: the project's master URL
  • authenticator: the job submitter's authenticator.
  • boinc_names: a duplicate-free list of the BOINC physical names of the files. These typically include a hash (e.g. MD5) of the file contents.
  • batch_id: the ID of a batch whose jobs will reference the files (these jobs need not exist yet). The operation will fail if the user is not authorized to submit jobs to the batch's application.

Action: for each file, see if it exists on the server. If it does, create an association to the given batch.

Output:

  • return value: nonzero on error
  • absent_files: a list of files not present on the server (represented as indices into the boinc_names vector).
  • error_message: if error, an explanatory string.

```cpp
extern int upload_files(
    const char* project_url,
    const char* authenticator,
    std::vector<std::string> &paths,
    std::vector<std::string> &boinc_names,
    int batch_id,
    std::string& error_message
);
```

Inputs:

  • project_url, authenticator: as above.
  • paths: a list of paths of the files to be uploaded.
  • boinc_names: a list of BOINC names of these files (see above).
  • batch_id: the ID of a batch with which the files are associated. The operation will fail if the user is not authorized to submit jobs to the batch's application.

Action: Upload the files, and create associations to the given batch.

Output:

  • return value: nonzero on error
  • error_message: if error, an explanatory string.

If you use this system, periodically run the script html/ops/delete_job_files. This will delete files that are no longer associated with an active batch.

## File size limits

Note: this mechanism uploads files via a PHP script. PHP's default maximum file upload size is 2 MB. To increase it, edit /etc/php.ini and set, for example:

upload_max_filesize = 64M
post_max_size = 64M
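On the submission host, it can help to check file sizes against the configured limit before uploading. The sketch below is a hypothetical pre-flight check, not part of BOINC; it assumes you know the server's upload_max_filesize value and interprets the php.ini shorthand suffixes K, M and G as powers of 1024.

```python
import os

# php.ini shorthand suffixes are powers of 1024.
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def php_size_to_bytes(value):
    """Convert a php.ini size such as '64M' to a byte count."""
    value = value.strip()
    suffix = value[-1:].upper()
    if suffix in _UNITS:
        return int(value[:-1]) * _UNITS[suffix]
    return int(value)  # no suffix: plain byte count


def oversized_files(paths, upload_max_filesize="64M"):
    """Return the paths whose size exceeds the given limit."""
    limit = php_size_to_bytes(upload_max_filesize)
    return [p for p in paths if os.path.getsize(p) > limit]
```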