Below are my instructions for quickly setting up Jupyter notebooks on a SLURM cluster over SSH. These instructions are written for ready use on the Vera computing cluster (PSC-McWilliams Center), but they generalize to any computing cluster running a recent version of SLURM.
There are two ways to run a jupyter notebook on a remote SLURM cluster: on a login node or on a compute node. The compute node instance gives you full access to the cluster's computational resources, but is slightly more difficult to set up than the login node method.
Each method has its drawbacks: Heavy computation on a login node will clog up other users' access to the system, while the compute node method will waste resources when idle. You should use the login node method when doing simple, resource-light computation (e.g. data transfer/exploration, plot generation, etc.) and the compute node method for doing high memory or extended computation (e.g. model fitting, simulation, etc.). Please be conscientious of your use.
- Install jupyter and its dependencies in your home directory. I would recommend doing this within a Conda environment for better module control. Vera has a pre-installed anaconda3 module environment that you can load with:

  ```
  module load anaconda3
  ```

- Navigate to your account root directory `~` and generate a configuration file which dictates server settings:

  ```
  user@server:~$ jupyter notebook --generate-config
  ```
- Generate a hashed password to secure your jupyter instances. The following command will prompt you to enter a password and will automatically write it to your configuration file:

  ```
  user@server:~$ jupyter notebook password
  ```
- Modify the configuration file located at `~/.jupyter/jupyter_notebook_config.py`, un-commenting and changing the following lines. You'll need to search through the config file to find each one:

  ```python
  c.NotebookApp.open_browser = False
  c.NotebookApp.port = 8888  # (You can set this to any four-digit integer)
  ```

  Choose a 4-digit port number other than 8888 to avoid overlapping with another user's jupyter instance.
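  If you'd rather not scroll through the whole config file by hand, one way to locate those two settings is with `grep` (this assumes the default config path generated above):

  ```
  user@server:~$ grep -n -e "open_browser" -e "c.NotebookApp.port" ~/.jupyter/jupyter_notebook_config.py
  ```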
From your local machine, run:

```
user@local:~$ ssh -L <port>:localhost:<port> -t <user>@<server> "jupyter notebook"
```

where `<port>` is the port you set earlier, `<user>` is your cluster user id, and `<server>` is the address of the login server. The `-L` flag tells ssh to tunnel `localhost:<port>` of the remote server to that of your local machine. The `-t` flag opens up the connection as an interactive session, allowing you to pass `SIGINT` (Ctrl-C) to end the jupyter notebook before killing the ssh connection. To open your jupyter notebook within a specific conda environment (e.g. `<env>`), replace the command in quotations with `source activate <env>; jupyter notebook`.
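As a concrete example, with a hypothetical port, username, server address, and environment name (none of these are Vera-specific values):

```
user@local:~$ ssh -L 8890:localhost:8890 -t jdoe@vera.example.edu "source activate astro; jupyter notebook"
```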
You can now access the jupyter server as if it were running on your own localhost. In your browser, navigate to `http://localhost:<port>`.
You can end the jupyter instance and the ssh bridge by either clicking 'Quit' in the Jupyter Dashboard or passing `SIGINT` (Ctrl-C) to the running terminal.
Here are some miscellaneous notes on running on a login node:
- If you run into a "bash: jupyter: command not found" error, you have to replace the `"jupyter notebook"` command with the location of the jupyter binary in the server directories. You can find it using `which jupyter`.
- If you're comfortable with leaving jupyter running indefinitely in the background on your server, remove the `-t` flag and add the `-f` flag, which forks the ssh process to the background. Just know that, in order to shut down the server, you'll need to find the process ID and kill it manually (see the sketch after this list). Here's a stack overflow explaining how to do this.
- You can wrap the long ssh command inside a simple bash script so you don't have to type it up every time. I've included my example in this repo as vera_login.sh (a sketch of what such a script might contain follows this list). This is run with:

  ```
  user@local:~$ sh vera_login.sh
  ```
- This procedure can be modified to run a JupyterLab server instead of just Jupyter notebooks. To do so, simply replace `jupyter notebook` with `jupyter lab` in the above ssh command.
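Since the real vera_login.sh ships with this repo, the version below is just a minimal sketch of what such a wrapper might look like; the port, username, and server address are placeholder assumptions, not the values from the repo script:

```bash
#!/bin/bash
# vera_login.sh (sketch) -- tunnel a login-node jupyter instance to localhost.
# PORT, USER, and SERVER are placeholders; substitute your own values.
PORT=8890
USER=jdoe
SERVER=vera.example.edu

# -L forwards the remote port to this machine; -t keeps the session interactive
# so Ctrl-C shuts down jupyter before the ssh connection closes.
ssh -L ${PORT}:localhost:${PORT} -t ${USER}@${SERVER} "jupyter notebook"
```

And if you did background the tunnel with `-f`, one way to find and kill it later is to match the forwarding spec on the command line (`pgrep -f` and `kill` are standard tools; the port here is again a placeholder):

```
user@local:~$ pgrep -f "8890:localhost:8890"   # prints the PID of the tunnel
user@local:~$ kill <PID>
```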
The below instructions will teach you how to submit a SLURM batch job to run an instance of jupyter on a compute node. Then, we will describe how to forward traffic from that compute node to your local machine, enabling access through your local web browser. The system configuration for Vera is listed here, including the available CPUs and memory of compute nodes.

This procedure is built off of this tutorial: http://docs.ycrc.yale.edu/clusters-at-yale/guides/jupyter/, which describes a general procedure for remote jupyter access in a SLURM scheduler. Since that resource describes much of the what/how/why of the procedure, I'll just give you the important stuff.
- Follow the Setup steps from Jupyter on a login node above.
- Copy jupyter.job into your Vera home directory `~`. This contains all the information necessary for the SLURM scheduler to run your jupyter instance (a sketch of such a job file appears after these steps).
- Change the output/error directory to a desired (junk) folder (lines 6 and 7 in jupyter.job). Here, Jupyter will dump text files containing all the typical output that would otherwise be printed to the terminal.
- Change the desired `<port>` to the one chosen in the setup (line 11 in jupyter.job).
- From your home directory, run:

  ```
  user@server:~$ sbatch jupyter.job
  ```

  This will submit your jupyter job to the cluster for execution. Assuming the cluster isn't too bogged down, your job should soon appear as RUNNING under your username when you check the queue:

  ```
  user@server:~$ squeue
  ```
  Your job should be assigned a node, displayed in the `NODELIST` field (e.g. `compute-1-23`). The name of this node is necessary for the next step; I will refer to it as `<node>`.

- In your local terminal, run the following command:

  ```
  user@local:~$ ssh -N -L <port>:<node>:<port> <user>@<server>
  ```
  This forms a continuously running 'bridge' between your terminal and the compute node running your jupyter notebook; the `-N` flag tells ssh to forward the port without executing any remote command.
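  For example, with the placeholder port 8890, the example node name above, and a hypothetical user and server address, the bridge command would look like:

  ```
  user@local:~$ ssh -N -L 8890:compute-1-23:8890 jdoe@vera.example.edu
  ```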
Finally, you should be able to access your jupyter server on a local web browser. In your browser, navigate to `http://localhost:<port>`.
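For reference, here's a minimal sketch of the kind of batch script jupyter.job contains. The repo's jupyter.job is the authoritative version; everything below (resource requests, paths, port) is a placeholder assumption, though I've laid it out so the line references above (6, 7, and 11) land in the right places:

```bash
#!/bin/bash
#SBATCH --job-name=jupyter
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --time=08:00:00
#SBATCH --output=/home/jdoe/junk/jupyter-%j.out
#SBATCH --error=/home/jdoe/junk/jupyter-%j.err

# Load an environment with jupyter; serve on the chosen port, listening on
module load anaconda3
jupyter notebook --no-browser --port=8890 --ip=0.0.0.0
```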
Lastly, to close everything, you have to both stop the job running on the cluster and kill the ssh bridge connecting you to it.
- To stop the jupyter job, you can either click 'Quit' in the Jupyter Dashboard or `scancel` the `JOBID` of your jupyter process.
- To close the ssh bridge, simply pass `SIGINT` (Ctrl-C) to the running bridge in your local terminal. A minimal example of both steps follows this list.
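Here's the shutdown sequence using standard SLURM commands and a made-up job ID:

```
user@server:~$ squeue -u $USER   # find the JOBID of your jupyter job
user@server:~$ scancel 123456    # substitute the JOBID from above
```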
- Feel free to play around with the requested resources in jupyter.job (e.g. number of cpus, max runtime, etc.).
- Like in Jupyter on a login node, you can also wrap the ssh command in a simple bash script. An example is presented in vera_compute.sh.
- I've also included jupyter_gpu.job to demonstrate how to extend this procedure to run on a GPU node (like twig on Vera); a sketch of the GPU-specific additions follows these notes.
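As a rough illustration of what the GPU variant adds, the directives below are placeholder assumptions (the partition name in particular will be cluster-specific); see jupyter_gpu.job for the real thing:

```bash
# Additions to the batch script for a GPU node (placeholders):
#SBATCH --partition=gpu   # whichever partition hosts the GPU nodes (e.g. twig)
#SBATCH --gres=gpu:1      # request one GPU
```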
Feel free to reach out to me at [email protected] if you have any questions.