BlueBox (huntdatacenter/BlueBox)

BlueBox helps you move your research workload to distributed compute. It simplifies installing dependency packages on multiple servers, while handling data, code, and results stays as easy as on a single machine.

Set up your home first (master)

Install dependencies on your home server:

sudo apt update && sudo apt install -y tox sshpass

Get this repository:

git clone https://github.com/huntdatacenter/bluebox.git && cd bluebox

Use the code, data, and results folders in the repository for synchronisation (read below). We provide basic examples, but the contents of these folders are otherwise excluded from the git repository, so you can keep using them and still get updates.

Before using your IAAS nodes, create the node list (for example with vim hosts.txt). Follow the example hosts:

[email protected]
[email protected]
[email protected]
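
As a minimal sketch, the node list can also be written from the shell instead of an editor. The user name and addresses below are placeholders (the addresses come from the reserved documentation range 192.0.2.0/24), not real BlueBox nodes; replace them with your own.

```shell
# Write a node list with one user@host entry per line.
# NOTE: ubuntu@192.0.2.x are placeholder values, not real nodes.
cat > hosts.txt <<'EOF'
ubuntu@192.0.2.11
ubuntu@192.0.2.12
ubuntu@192.0.2.13
EOF

# Sanity check: count lines that are not in user@host form.
bad_lines=$(grep -c -v -E '^[^@[:space:]]+@[^@[:space:]]+$' hosts.txt || true)
echo "entries: $(wc -l < hosts.txt | tr -d ' '), malformed: ${bad_lines}"
```

A non-zero malformed count points to a line that should be fixed before running make setup.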

Usage

Run make to get help on commands:

lint                 Run linter
setup                Setup nodes for use
data                 Push data
deps                 Install dependencies
code                 Push code
results              Pull results
clean                Clean results remote
list                 List results remote
cleandata            Clean data remote
listdata             List data remote
run                  Run tasks.txt
help                 Show this help

Setting up environment on IAAS nodes (workers)

When using BlueBox for the first time, or after adding nodes, run the initial setup of the IAAS nodes. The setup:

sets up SSH keys
installs common dependencies
installs code dependencies

make setup

Dependencies

If your code has specific dependencies (apt, pip, R, or conda packages), follow example.packages.yml when defining your own config package.yml. If you only need to update these dependencies on nodes that have already been set up, run:

make deps

Push code

Synchronise scripts from the code directory to all IAAS servers:

make code

Push data

To push data from the ./data directory to the remote nodes, run:

make data

If you need to remove the data from remote nodes:

make cleandata

Pull results

To pull the results from the remote nodes to the ./results directory, run:

make results

To clean the results on the remote nodes after pulling them:

make clean

Run parallel workload

To run a workload, make sure that your own scripts and data are in place on the remote nodes. We provide an example in example.tasks.txt, with one command per line, e.g.:

bash example.sh J01
bash example.sh J02
bash example.sh J03
...
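
A task file like this does not have to be written by hand. The sketch below generates one command per job ID with a loop; the job IDs J01 to J10 are an assumption chosen for illustration, and tasks.txt is used as the output name.

```shell
# Generate one command per line for job IDs J01..J10.
# NOTE: the ID range is an illustrative assumption.
: > tasks.txt                      # start with an empty file
i=1
while [ "$i" -le 10 ]; do
  printf 'bash example.sh J%02d\n' "$i" >> tasks.txt
  i=$((i + 1))
done

# Show the first few generated commands.
head -n 3 tasks.txt
```

The generated file can then be passed to the run target as make run tasks=tasks.txt.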

Starting a workload on nodes:

make run tasks=example.tasks.txt

The command above wraps the distribution of tasks to hosts using GNU Parallel. It is a shortcut for the long version:

parallel --ungroup --joblog task.log --sshloginfile hosts.txt --no-run-if-empty --workdir /home/ubuntu/bluebox :::: tasks.txt

j: number of jobs per node
ungroup: print output to the terminal immediately; do not use it if you need the output of each job kept together
workdir: directory with scripts/code

See the GNU Parallel manual pages for details.
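
Because the long command writes a job log (--joblog task.log), failed tasks can be found after a run. The sketch below assumes GNU Parallel's tab-separated job log layout, where the seventh column is the exit value and the ninth is the command; failed_tasks is a hypothetical helper name, not part of BlueBox.

```shell
# failed_tasks LOGFILE: print the commands whose exit value was non-zero.
# Assumes GNU Parallel's joblog columns:
# Seq Host Starttime JobRuntime Send Receive Exitval Signal Command
failed_tasks() {
  awk -F '\t' 'NR > 1 && $7 != 0 { print $9 }' "$1"
}

# Only inspect the log if a run has already produced one.
if [ -f task.log ]; then
  failed_tasks task.log
fi
```

The printed commands could be saved to a new task file and rerun with make run tasks=<file>.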

To run all commands (clean, code, data, tasks, and results), use the shortcut:

make run-all
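
For orientation, the shortcut corresponds roughly to running the individual targets in the order listed above. This is a sketch of that sequence, not the Makefile's actual recipe; run_all_steps is a hypothetical helper name, and tasks.txt is assumed as the default task file.

```shell
# Run the individual targets in sequence, stopping at the first failure.
# NOTE: assumed equivalent of `make run-all`; the real recipe may differ.
run_all_steps() {
  tasks_file="${1:-tasks.txt}"
  make clean &&
    make code &&
    make data &&
    make run tasks="$tasks_file" &&
    make results
}
```

Called from the repository root, for example as run_all_steps example.tasks.txt, it stops as soon as one step fails, so results are only pulled after a successful run.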

In our example, we just let the nodes sleep for some time and report which nodes are assigned jobs, when the jobs start, and when they are done.
