Containerize your HPC applications

When should I containerize my HPC application?

Creating a Docker or Singularity container for your HPC application is a great option for developers who want to incorporate continuous integration into their development workflow and produce a portable build of their application for end-users and customers. This strategy lets you automatically test your application while producing a container (Docker or Singularity) that is ready for use on your Fluid-Slurm-GCP cluster. Additionally, if your customers and users operate in other Google Cloud projects, or elsewhere, you can easily share your container image with them to use on their systems, provided they have a container platform available.

If your application will only run on your cluster or other Fluid-Slurm-GCP clusters, consider either creating a Spack package or creating a VM image instead.

Create a Docker Container

To create a Docker container, you will use Google Cloud Build and the Docker builder to build a container image from a Dockerfile. This requires creating a cloudbuild.yaml file and a Dockerfile to automate the process. Additionally, this setup will allow you to leverage build triggers.

Requirements

To get started, you'll need

  • A Google Cloud project with the Cloud Build API enabled

  • The gcloud SDK installed on your workstation, or access to Google Cloud Shell

  • Your application's source code in a git repository (GitHub, Bitbucket, or Google Source Repositories)

Set up your repository

The steps below will walk you through setting up a cloudbuild.yaml and Dockerfile in your application's repository. If you haven't done so already, clone your application's repository to your local workstation or into Google Cloud Shell.
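
For example, assuming your repository is hosted on GitHub (the URL below is a placeholder for your own repository):

git clone https://github.com/YOUR-ORGANIZATION/APPLICATION.git
cd APPLICATION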

  • Create a Dockerfile. The example below starts from ubuntu:20.04. It copies your application's git repository (from the current path) into a directory called /build in the container. The RUN statements are used to execute commands inside the container, typically for installing your application and its dependencies.

FROM ubuntu:20.04
COPY . /build
RUN DEBIAN_FRONTEND=noninteractive apt-get -y update && apt-get install -y DEPENDENCIES
RUN cd /build && INSTALL_APPLICATION_INSTRUCTIONS
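
As a concrete illustration, a filled-in Dockerfile for a hypothetical CMake-based application might look like the sketch below; the package list and build commands are assumptions that you should replace with your application's actual requirements.

FROM ubuntu:20.04
COPY . /build

# Install build dependencies (illustrative list)
RUN DEBIAN_FRONTEND=noninteractive apt-get -y update && \
    apt-get install -y build-essential cmake gfortran

# Build and install the application from the copied source tree
RUN cd /build && mkdir build && cd build && \
    cmake .. && \
    make && \
    make install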

  • Create a cloudbuild.yaml file in the root of your repository, using the template below as a starting point and replacing APPLICATION with the name of your application.

steps:
- id: APPLICATION Build
  name: 'gcr.io/cloud-builders/docker'
  args: ['build', '.', '-t', 'gcr.io/${PROJECT_ID}/APPLICATION:${_TAG}']

substitutions:
  _TAG: 'latest'

images: ['gcr.io/${PROJECT_ID}/APPLICATION:${_TAG}']
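
The build step above is equivalent to running docker build yourself. If you have Docker installed locally, you can sanity-check the Dockerfile before involving Cloud Build with a command along these lines (PROJECT-ID and APPLICATION are placeholders, as before):

docker build -t gcr.io/PROJECT-ID/APPLICATION:latest .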

Submit a build

To test your application's build with Cloud Build, run the following command from the root directory of your application's repository, replacing PROJECT-ID with your Google Cloud project ID.

gcloud builds submit . --async --project=PROJECT-ID
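
Because the cloudbuild.yaml defines a _TAG substitution with a default of latest, you can also tag a build explicitly at submit time; for example, to produce a hypothetical v1.0.0 tag:

gcloud builds submit . --project=PROJECT-ID --substitutions=_TAG=v1.0.0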

You can check your build status and history from the cloud console at https://console.cloud.google.com/cloud-build/builds.
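
You can also check recent builds from the command line with the gcloud SDK:

gcloud builds list --project=PROJECT-ID --limit=5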

During this process, Cloud Build uses Docker to create a Docker container in an isolated and secure environment. When Docker finishes, the resulting container image is published to your Google Container Registry.

Additional Notes

  • If your application uses MPI, make sure that the MPI installation inside your container matches the MPI installation on your HPC cluster. Note that Fluid-Slurm-GCP ships with OpenMPI 4.0.2, but you can build other MPI flavors with Spack.

  • If you are building GPU-accelerated applications with HIP/ROCm, make sure you install the CUDA toolkit within your container and set HIP_PLATFORM=nvcc using an ENV statement before building your application. Currently, Google Cloud Platform and Fluid-Slurm-GCP only offer Nvidia GPUs. However, by leveraging HIP/ROCm, your application will be portable to AMD platforms (HIP_PLATFORM=hcc). A minimal Dockerfile sketch is shown after this list.
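
A minimal sketch of such a Dockerfile, assuming you start from one of NVIDIA's CUDA development base images (the image tag and the install/build placeholders are illustrative and should be adapted to your application):

FROM nvidia/cuda:11.0.3-devel-ubuntu20.04
COPY . /build

# Target Nvidia GPUs when compiling HIP code
ENV HIP_PLATFORM=nvcc

# Install any remaining dependencies and build your application
# (placeholders, as in the earlier template)
RUN cd /build && INSTALL_APPLICATION_INSTRUCTIONS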

Run your Docker Container

Once you have a Docker image posted to your Google Container Registry (GCR), you can use Singularity to pull and run it on your Fluid-Slurm-GCP cluster.

  • Log in to your cluster's login node

  • Load Singularity into your path by using Spack

spack load singularity

  • Use the gcloud SDK to configure credentials to pull from your Google Container Registry

gcloud auth configure-docker

  • Pull your image from your GCR. Be sure to replace PROJECT-ID with the GCP project ID where your container is hosted and APPLICATION with the name of your application's container. This will save the image as APPLICATION.sif in your current directory.

singularity pull APPLICATION.sif docker://gcr.io/PROJECT-ID/APPLICATION:latest
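
Optionally, you can sanity-check the image before submitting a job. For example, since the Dockerfile above copies your source tree into /build, listing that directory confirms you pulled the image you expect:

singularity exec APPLICATION.sif ls /build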

  • To run your application on the cluster, create a batch file and submit a batch job. Use the example batch script below as a starting point, replacing APPLICATION with the name of your application and CMD with the commands you want to run in your container.

#!/bin/bash
#SBATCH --account=my-slurm-account
#SBATCH --partition=this-partition
#SBATCH --ntasks=1
#SBATCH --ntasks-per-node=1

spack load singularity
singularity run APPLICATION.sif CMD

  • Submit your job with sbatch
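
For example, assuming you saved the batch script above as run-APPLICATION.sh (a hypothetical file name):

sbatch run-APPLICATION.sh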

Running Containers with MPI

If your application uses MPI, make sure that the MPI installation inside your container matches the MPI installation on your HPC cluster. Note that Fluid-Slurm-GCP ships with OpenMPI 4.0.2. To run with MPI, you will need to load OpenMPI with Spack and wrap your singularity run call with mpirun. The example batch file below provides a good starting point.

#!/bin/bash
#SBATCH --account=my-slurm-account
#SBATCH --partition=this-partition
#SBATCH --ntasks=8
#SBATCH --ntasks-per-node=8

spack load singularity openmpi
mpirun -np ${SLURM_NTASKS} singularity run APPLICATION.sif CMD
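
To verify that the container's MPI matches the cluster's (the key assumption for this to work), you can compare the two versions directly:

spack load singularity openmpi

# MPI provided by the cluster (via Spack)
mpirun --version

# MPI baked into the container image
singularity exec APPLICATION.sif mpirun --version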

Running Containers with ROCm or CUDA

Singularity provides simple flags to expose GPUs to your container. If you have a GPU-accelerated application built with either ROCm or CUDA, you need to add the --nv flag to run on Nvidia GPUs (currently, Google Cloud Platform only provides Nvidia GPUs). The example batch file below provides a good starting point for GPU-accelerated containerized applications. Make sure that your compute partition is configured to have GPUs available.

#!/bin/bash
#SBATCH --account=my-slurm-account
#SBATCH --partition=this-partition
#SBATCH --ntasks=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:1
spack load singularity
singularity run --nv APPLICATION.sif CMD
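
To confirm that the GPU is visible inside the container, you can run nvidia-smi through Singularity from a GPU-equipped compute node (for example, within an interactive job); this is an optional check:

spack load singularity
singularity exec --nv APPLICATION.sif nvidia-smi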

Automate Builds: Set up build triggers

Now that you've added the necessary tooling to your application's repository to build your container image, you can set up build triggers. Build triggers are useful for automating builds and tests of your application when commits are pushed to your GitHub, Bitbucket, or Google Source Repositories.

Manually configure

To get started, navigate to the Cloud Build Triggers page in the cloud console (https://console.cloud.google.com/cloud-build/triggers), connect your application's repository, and create a new trigger.

When configuring the Build Configuration for the trigger, set the following variables

  • File Type = Cloud Build configuration file (yaml or json)

  • Cloud Build configuration file location = cloudbuild.yaml

Under the Advanced configuration, add the following substitution variable

  • _TAG = "TAG" | Set the tag applied to the resulting container image. If you exclude this variable, the default value of latest from cloudbuild.yaml is used.

Click Create.

Terraform Infrastructure as Code

Where possible, we encourage the use of infrastructure as code to manage your CI infrastructure. Below, we provide step-by-step instructions for version controlling your Cloud Build infrastructure with Terraform.


We recommend that you use Google Cloud Shell to deploy your build triggers with Terraform. Cloud Shell comes with terraform preinstalled and configured with Google Cloud authentication.

  • Create a file called tf/main.tf in your application's repository with the contents below, replacing {GCS-BUCKET-NAME}, {APPLICATION-NAME}, {GIT-REPO}, and {PROJECT-ID} with values appropriate for your project.

terraform {
  backend "gcs" {
    bucket = "{GCS-BUCKET-NAME}"
    prefix = "{APPLICATION-NAME}-build-triggers"
  }
}

provider "google" {
  version = "3.9"
}

resource "google_cloudbuild_trigger" "prod-trigger" {
  name = "{APPLICATION-NAME}-prod"
  trigger_template {
    branch_name = "master"
    repo_name   = "{GIT-REPO}"
    project_id  = "{PROJECT-ID}"
  }
  substitutions = {
    _TAG = "latest"
  }
  filename = "cloudbuild.yaml"
}

  • Save the file. From the tf/ directory, run the following sequence of commands to deploy your build triggers

terraform init
terraform validate
terraform plan -out=tfplan
terraform apply "tfplan"

Keep in mind that you can also create a build trigger that applies this Terraform configuration for you, so that changes to your build triggers are themselves deployed automatically. Read more on managing infrastructure as code with Cloud Build.
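
After terraform apply completes, you can confirm that the triggers exist with the gcloud SDK (depending on your SDK version, the beta component may be required):

gcloud beta builds triggers list --project=PROJECT-ID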

Integrate with a Gitflow development strategy

Gitflow is a workflow and branching model that maintains two evergreen git branches:

  • dev | A branch for development work that may contain untested builds.

  • prod | A branch for production builds that have been thoroughly tested.

Feature and bugfix branches branch off of dev so that developers can carry out work independently. The dev branch is used for integrating contributions from multiple developers. It can be helpful to automatically test every commit to the dev and prod branches. You can accomplish this by creating two Cloud Build triggers, one for each branch, as sketched below.
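
A minimal sketch of this workflow from a developer's point of view, using illustrative branch names, looks like the following:

# Start new work from the dev branch
git checkout dev
git checkout -b feature/my-feature

# ...commit your changes, then merge back into dev and push;
# pushing to dev fires the dev build trigger
git checkout dev
git merge feature/my-feature
git push origin dev

# When dev has been tested, merge it into prod to fire the production build
git checkout prod
git merge dev
git push origin prod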

Manually Configure

You can manually create triggers through the cloud console, as described previously. To integrate with a Gitflow development workflow, create two triggers, one for the dev branch and one for the prod branch. The branch that triggers a build is set in the Source configuration settings under Branch.

For the dev build trigger, set _TAG = "dev". For the prod build trigger, set _TAG = "latest". With this setup, you will be able to keep your production images stable while making the riskier dev images available for testing.

Terraform

If you're using Terraform, you can use the template HCL block below in your tf/main.tf file to create both triggers.

resource "google_cloudbuild_trigger" "prod-trigger" {
name = "{APPLICATION-NAME}-prod"
trigger_template {
branch_name = "prod"
repo_name = "{GIT-REPO}"
project_id = "{PROJECT-ID}"
}
substitutions = {
_ZONE = "{ZONE}"
_IMAGE_NAME = "{APPLICATION-NAME}"
_SUBNETWORK = "{SUBNETWORK}"
}
filename = "cloudbuild.yaml"
}

resource "google_cloudbuild_trigger" "dev-trigger" {
name = "{APPLICATION-NAME}-dev"
trigger_template {
branch_name = "dev"
repo_name = "{GIT-REPO}"
project_id = "{PROJECT-ID}"
}
substitutions = {
_ZONE = "{ZONE}"
_IMAGE_NAME = "{APPLICATION-NAME}-dev"
_SUBNETWORK = "{SUBNETWORK}"
}
filename = "cloudbuild.yaml"
}