Docker/Singularity at the Martinos Center
Due to security concerns with Docker, we do not support running Docker in full access mode on our Linux workstations or the compute cluster. In limited cases, we do support Docker in isolation (namespace remap) mode. This mode lets you build Docker containers but the isolation restriction prevents proper binding of local storage into the container when running them. This means the programs in the container cannot operate on your data outside the container.
In the HPC community a popular alternative is Singularity, now called Apptainer. It lets you run most Docker containers used for data-analysis workflows, and its bind mechanism lets you access data on storage outside the container with the same access you have normally. The big caveat with Apptainer is that building new images from scratch generally requires root access, but there are a couple of workarounds for that.
UPDATE: in 2022 the name changed from Singularity to Apptainer. There is a symlink to apptainer, so the old singularity command will still work for now. Apptainer will also still honor environment variables whose names start with SINGULARITY_ in place of APPTAINER_. However, the tmp/cache directory has changed from ~/.singularity to ~/.apptainer, so if you symlinked this before you need to do it again.
Set up your environment for Docker/Apptainer
Docker is not installed on the center's Linux workstations by default. If you want it installed, you need to request it from the Martinos Help Desk. Check for the existence of the /var/run/docker.sock file to see whether it is installed. Singularity, being a simple user-space program, is installed everywhere. Podman (a Docker-like clone) still needs permissions set up for each user before it can be used.
A very important issue is that Docker (Podman) and Apptainer can end up writing many GBs to areas of your home directory that will overflow your quota. Also, your home directory is on a network-mounted filesystem, and several aspects of container management do not work on network-mounted filesystems; in particular, builds will not work. To prevent these issues you must symlink the directories they use to point to other storage volumes, owned by you or your group, that are on local filesystems of the machine you are using.
The two places that need symlinking are ~/.apptainer and, for Docker/Podman, ~/.local/share/containers. Here is an example:
cd /local_mount/space/myworkstation/1/users/raines   #<- change this to your workstation storage
mkdir apptainer
mkdir apptainer/tmp
mkdir apptainer/cache
mkdir docker
rm -rf ~/.apptainer
ln -s $PWD/apptainer ~/.apptainer
rm -rf ~/.local/share/containers
ln -s $PWD/docker ~/.local/share/containers
(do this, or Singularity/Docker will fill up your home directory or the workstation OS disk at /tmp)
NOTE: Instead of symlinking ~/.apptainer, you can set the location with APPTAINER_TMPDIR and APPTAINER_CACHEDIR. You can also use them to override your symlink in cases where you are doing a build just to export the result to an Apptainer image file and do not need to keep the Apptainer sandbox files or Podman overlays. In those cases you should temporarily set these in your shell to space under /scratch, if that exists on the machine. On the MLSC cluster this is done automatically. For example, using /scratch when it exists on your machine:
mkdir -p /scratch/raines/{tmp,cache}
cd /scratch/raines
export APPTAINER_TMPDIR=/scratch/raines/tmp
export APPTAINER_CACHEDIR=/scratch/raines/cache
Martinos users are welcome to use the machine icepuff1 with this scratch method to do Apptainer sandbox or Podman builds.
Running pre-built images from Docker Hub
Here is an example using dhcp-structural-pipeline.
cd /cluster/mygroup/users/raines
mkdir dhcp-structural-pipeline
cd dhcp-structural-pipeline
apptainer pull dhcpSP.sif docker://biomedia/dhcp-structural-pipeline:latest
(this will take a long time)
(if instead you have a Docker container tar archive, run:
apptainer build dhcpSP.sif docker-archive:///space/foobar/1/projects/dhcp-structural-pipeline.tar)
mkdir data
(copy your T1 and T2 data into new data subdir)
apptainer run -e -B $PWD/data:/data dhcpSP.sif \
  subject1 session1 44 -T1 /data/T1.nii.gz -T2 /data/T2.nii.gz -t 8 -d /data
The -e option gives the container a clean environment when it runs. For some software you may not want this, so that you can pass in settings via environment variables. If you do not use -e, you should remove certain variables that can break the container.
LD_LIBRARY_PATH is one example that will really screw things up. Variables with XDG and DBUS in their names can also cause problems. In dhcp-structural-pipeline, for example, if you have FSLDIR set to something under /usr/pubsw or /space/freesurfer it will fail, since those paths will not be found inside the container. Be aware and be careful.
Check out using -e combined with the --env-file option for more consistent control of your shell environment when using containers, as sketched below.
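For instance, a minimal sketch of this approach (the variables shown are just placeholders for whatever your pipeline actually expects):

cat myenv.txt
----------------------------------------------------------
| OMP_NUM_THREADS=8
| FSLOUTPUTTYPE=NIFTI_GZ
----------------------------------------------------------
apptainer run -e --env-file myenv.txt -B $PWD/data:/data dhcpSP.sif \
  subject1 session1 44 -T1 /data/T1.nii.gz -T2 /data/T2.nii.gz -t 8 -d /data

This way the container starts from a clean environment, and only the variables you list in the file are passed in.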
Also note you do not need to pull the image every time you run it. You only pull it again to get new versions.
To make the file path environment inside the container very much like it is outside the container when running things normally on Martinos Linux machines, you can add the following options:
-B /autofs -B /cluster -B /space -B /homes -B /vast -B /usr/pubsw -B /usr/local/freesurfer
This would let you source the FreeSurfer environment as normal inside the container. For that, though, the container would need to be one with a non-minimal OS install that has all the system libraries FreeSurfer requires.
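For example, something along these lines should work (mycontainer.sif and the FreeSurfer version path are just illustrations; point them at the image and install you normally use):

apptainer exec -B /autofs -B /cluster -B /space -B /homes -B /vast \
  -B /usr/pubsw -B /usr/local/freesurfer \
  mycontainer.sif /bin/bash -c \
  'export FREESURFER_HOME=/usr/local/freesurfer/7.4.1 && \
   source $FREESURFER_HOME/SetUpFreeSurfer.sh && \
   recon-all -version'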
WARNING: The NVIDIA NGC containers do a 'find -L /usr …' in their entrypoint script on startup. So doing -B /usr/pubsw makes startup take over 15 minutes as it then searches the hundreds of GBs of files in /usr/pubsw! This 'find' is pretty useless, so there are two solutions:
- Just do not use -B /usr/pubsw if you don’t need that path in what you are running
- Add -B /cluster/batch/IMAGES/nvidia_entrypoint.sh:/usr/local/bin/nvidia_entrypoint.sh to your Singularity command line to overwrite the entrypoint script with a copy I made that removes the ‘find’
We have also discovered that most Docker images built for NVIDIA GPU use in AI/ML try to do some fancy stuff in their entrypoint script on startup to put the "correct" CUDA libs in a directory named /usr/local/cuda/compat/lib. You will get errors about changing this in Singularity, since the container's internal filesystem is unwritable. Your CUDA programs in the container might also fail if the CUDA libs in that directory are used.
To fix this, add the following option to your singularity/apptainer command:
-B /opt:/usr/local/cuda/compat
This basically nullifies that directory so it is not used and has no libraries. Singularity automatically adds to the LD_LIBRARY_PATH defined in the container a directory with the correct CUDA libs matching the driver running on the host.
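Putting the two NGC fixes together, a command might look like this (the image path is one of the cluster images mentioned elsewhere on this page, and train_model.py is just a placeholder for your own script):

apptainer run --nv \
  -B /cluster/batch/IMAGES/nvidia_entrypoint.sh:/usr/local/bin/nvidia_entrypoint.sh \
  -B /opt:/usr/local/cuda/compat \
  -B /cluster/mygroup/data:/data \
  /cluster/batch/IMAGES/tensorflow-20.12-tf2-py3.sif \
  python3 /data/train_model.py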
For more info on singularity options, run man apptainer-run or man apptainer-exec, or read the User Guide.
The difference between run and exec is that run executes the default entrypoint/startup script built into the container, while exec skips that startup configuration and just runs the command you give on the command line.
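For example, using the image pulled earlier (the exec command assumes the container has /etc/os-release, which most Linux images do):

apptainer run dhcpSP.sif                          # runs the image's built-in entrypoint script
apptainer exec dhcpSP.sif cat /etc/os-release     # skips the entrypoint and runs only this command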
Documentation can be found online here.
Building your own Apptainer images
In most cases, building Apptainer images locally requires full root access via sudo, which we will not grant on our Linux workstations. There are two workarounds. The simplest is the remote build option offered by the organization that makes Apptainer, which you can register for.
# create an Apptainer definition file named myimage.def
# then first try to build normally to see if root access is even needed
apptainer build myimage.sif myimage.def
# if this fails due to root permission problems, try a remote build
apptainer remote login SylabsCloud
apptainer build --remote myimage.sif myimage.def
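If you have never written a definition file, a minimal myimage.def sketch looks something like this (the base image and packages are placeholders, not a recommendation):

Bootstrap: docker
From: ubuntu:22.04

%post
    apt-get update
    apt-get install -y --no-install-recommends python3 python3-pip
    python3 -m pip install numpy

%environment
    export LC_ALL=C

%runscript
    exec python3 "$@"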
If there is anything sensitive in your build, though, you should not use remote build. Instead, pull a SIF image of the base image you want to start with and then modify it as shown in the fakeroot/writable example in the next section below.
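For example, to grab a base image as a SIF file (the ubuntu:22.04 base is just an illustration; use whatever base your software needs):

apptainer pull mybase.sif docker://ubuntu:22.04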
Modify existing Apptainer images
First you should check whether you really need to modify the image. For example, if you are using Python in an image and simply need to add new packages via pip, you can do that without modifying the image by using a PYTHONUSERBASE directory that you bind-mount into the container. For example:
cd /cluster/itgroup/raines
mkdir -p local/lib
vi vars.txt   #create it with your favorite editor (emacs, pico)
cat vars.txt
----------------------------------------------------------
| PYTHONUSERBASE=/cluster/itgroup/raines
| PYTHONPATH=$PYTHONUSERBASE/lib/python3.7/site-packages
| PATH=$PYTHONUSERBASE/bin:$PATH
----------------------------------------------------------
apptainer exec --nv --env-file vars.txt \
  -B /cluster/itgroup/raines -B /scratch:/scratch \
  -B /autofs -B /cluster -B /space -B /vast \
  /cluster/batch/IMAGES/tensorflow-20.12-tf2-py3.sif \
  pip3 install nibabel
apptainer exec --nv --env-file vars.txt \
  -B /cluster/itgroup/raines -B /scratch:/scratch \
  -B /autofs -B /cluster -B /space -B /vast \
  /cluster/batch/IMAGES/tensorflow-20.12-tf2-py3.sif \
  python3 /cluster/itgroup/raines/script_needing_nibabel_and_TF.py
To modify an existing SIF image container file, you first convert it to a sandbox, run a shell inside the sandbox in fakeroot/writable mode, and do the steps in that shell to modify the container as desired. Then you exit the container and convert the sandbox back to a SIF file.
For this to work you will have to email help@nmr.mgh.harvard.edu to request to be added to the /etc/subuid file on the machine you will use for builds, to turn on user namespace mapping. That machine also needs to have a large /scratch volume (sandboxes do not work on network-mounted volumes). You then do something like this example:
mkdir -p /scratch/$USER/{tmp,cache}
cd /scratch/$USER
export APPTAINER_TMPDIR=/scratch/$USER/tmp
export APPTAINER_CACHEDIR=/scratch/$USER/cache
apptainer build --sandbox --fakeroot myTF \
  /cluster/batch/IMAGES/tensorflow-20.11-tf2-py3.sif
apptainer shell --fakeroot --writable --net myTF
> apt-get update
> apt-get install -qqy python3-tk
> python3 -m pip install matplotlib
> exit
apptainer build --fakeroot /cluster/mygroup/users/$USER/myTF.sif myTF
NOTE: you can do a rm -rf /scratch/$USER afterward, but there will be a few files you cannot delete due to the namespace mapping that happens. The daily /scratch cleaner job will eventually clean them up.
Building your own Docker image and running with Apptainer
There are plenty of tutorials on building Docker images online. You should go read one of them to get started (here is the official one). The main things to keep in mind are to tag each build of your image with a unique version tag, and that you DO NOT need to push/upload the image to any hub. The image you build is not a single file; it is a special overlay that ends up under /var/lib/docker (or, for Podman, under ~/.local/share/containers). You never touch these files directly; all interaction is via the docker/podman subcommands.
docker build --tag proj_A_recon:v1 .
docker image ls
Note that not all directives in a Dockerfile convert to Singularity, so some should be avoided. More info can be found here. Basically, only the FROM, COPY, ENV, RUN, CMD, HEALTHCHECK, WORKDIR and LABEL directives are supported. Directives that affect the eventual runtime of the container, like VOLUME, will not translate. A sketch of a Dockerfile using only directives that translate is shown below.
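Here is a minimal Dockerfile sketch limited to the supported directives (the base image, script name and packages are placeholders for whatever your project actually needs):

FROM python:3.10-slim
LABEL maintainer="raines"
ENV PYTHONUNBUFFERED=1
WORKDIR /app
COPY recon_pipeline.py /app/
RUN python3 -m pip install --no-cache-dir numpy nibabel
CMD ["python3", "/app/recon_pipeline.py"]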
You can also do a "docker run -it --rm proj_A_recon:v1 bash" to shell into your container to verify and test things internally.
The next step is to convert it to an Apptainer SIF image. This will be a single file created in the directory you run the command in.
apptainer build proj_A_recon.sif docker-daemon://proj_A_recon:v1
or, if it was actually built with Podman, you need to do
podman save --format oci-archive proj_A_recon:v1 -o proj_A_recon.tar
apptainer build proj_A_recon.sif oci-archive://proj_A_recon.tar
And once that is done you can run it.
apptainer run -B /cluster/mygroup/data:/data proj_A_recon.sif
or
apptainer exec -B /cluster/mygroup/data:/data proj_A_recon.sif /bin/bash