Custom VM Images

When should I create custom VM images?

Creating custom virtual machine images for HPC applications is a great option for teams that want to incorporate continuous integration into their development workflow. This strategy lets you automatically test your application while producing a virtual machine image that is ready for use on your Fluid-Slurm-GCP cluster. Additionally, if your customers and users operate in other Google Cloud projects, you can easily share your VM image with them for use on their Fluid-Slurm-GCP clusters.

If you'd like for your application to be accessible beyond Google Cloud and Fluid-Slurm-GCP, consider either creating a Spack package or containerizing your application.

Use Fluid Numerics Provided Custom VM Images

Fluid Numerics is working on a library of application VM images that are ready to deploy with your Fluid-Slurm-GCP cluster. The codelabs below will walk you through configuring your cluster to use these images.

If there's an image you'd like to see offered, you can open a support ticket with our team.

Create a custom VM image

To create a custom VM image, you will use Google Cloud Build and HashiCorp's Packer to execute a script that you provide to install your application on a Google Compute Engine VM. This requires creating a cloudbuild.yaml file and a packer.json file to help automate the process. Additionally, this setup will allow you to leverage build triggers. We've provided a public, open-source repository with a template and a few examples to help you get started.
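As a sketch of how these two files fit together, a minimal imaging/cloudbuild.yaml might invoke Packer roughly as follows. The builder image name, variable names, and timeout here are illustrative assumptions; the template repository contains the authoritative version.

```yaml
steps:
# Run Packer (via a Packer builder image in your project) against the
# template in imaging/packer.json. Cloud Build substitution variables are
# passed through to Packer as user variables.
- name: 'gcr.io/$PROJECT_ID/packer'
  args:
  - build
  - '-var'
  - 'project_id=$PROJECT_ID'
  - '-var'
  - 'zone=${_ZONE}'
  - '-var'
  - 'image_name=${_IMAGE_NAME}'
  - imaging/packer.json
substitutions:
  _ZONE: 'us-central1-a'        # default zone for the imaging node
  _IMAGE_NAME: 'my-application' # default name for the resulting VM image
timeout: '3600s'
```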

Requirements

To get started, you'll need

Set up your repository

The steps below will walk you through setting up a cloudbuild.yaml and packer.json file in your application's repository. If you haven't done so already, clone your application's repository to your local workstation or into Google Cloud Shell.

  • Clone the fluid-slurm-gcp_custom-image-bakery repository

  • Copy the imaging subdirectory into your application's directory hierarchy.

  • Edit the imaging/startup-script.sh file. This file will contain the instructions for building and installing your application and its dependencies. We recommend that you install your application under /usr/local. When installing your application, you will need to provide a mechanism to bring the application binary into the user's PATH and the application's shared library dependencies into LD_LIBRARY_PATH. You can either write a modulefile and save it under /share/modulefiles, or you can write a script that appends to PATH and LD_LIBRARY_PATH and save it under /etc/profile.d/. Check out our example startup-script that shows how to install OpenFOAM on a Google Cloud VM.

  • Edit the imaging/cloudbuild.yaml so that the default _IMAGE_NAME is the name of your application.
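For the /etc/profile.d/ approach mentioned above, a script for a hypothetical application installed under /usr/local/myapp might look like this (the myapp paths are placeholders for your own install prefix):

```shell
# Hypothetical /etc/profile.d/myapp.sh -- adjust /usr/local/myapp to your
# application's actual install prefix.
export PATH="/usr/local/myapp/bin:${PATH}"
export LD_LIBRARY_PATH="/usr/local/myapp/lib:${LD_LIBRARY_PATH}"
```

Scripts in /etc/profile.d/ are sourced automatically for login shells, so the application is on every user's PATH when compute nodes boot from the image.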

Submit a build

To test out your application's build with Cloud Build, you can simply run the following command from the root directory of your application's repository.

gcloud builds submit . --async --project=PROJECT-ID --config=imaging/cloudbuild.yaml --substitutions=_DISK_SIZE_GB=50

You can check your build status and history from the cloud console at https://console.cloud.google.com/cloud-build/builds.

During this process, Cloud Build uses Packer to deploy a GCE instance (the "imaging node") with the fluid-slurm-gcp-compute VM image. Packer then executes your startup-script.sh file on the instance. When the script is finished, the imaging node is deleted and its disk is saved as a GCE VM image.
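A minimal imaging/packer.json implementing this flow with Packer's googlecompute builder might look roughly like the following. The field values (source image family, SSH username, script path) are illustrative assumptions; see the template repository for the authoritative version.

```json
{
  "builders": [
    {
      "type": "googlecompute",
      "project_id": "{{user `project_id`}}",
      "zone": "{{user `zone`}}",
      "image_name": "{{user `image_name`}}",
      "source_image_family": "fluid-slurm-gcp-compute",
      "disk_size": "{{user `disk_size_gb`}}",
      "ssh_username": "packer",
      "startup_script_file": "imaging/startup-script.sh"
    }
  ]
}
```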

Configure your compute partitions to use your image

Once you have created a virtual machine image in your Google Cloud project, you can configure your cluster to use it.

  • Log in to your cluster's controller instance. You can find this instance (and its IP address) on the Compute Engine page.

  • Create a cluster-configuration file using the cluster-services CLI

sudo su
cluster-services list all > config.yaml

  • Open the config.yaml file in a text editor and edit the partitions block to create a partition for running your application. Make sure to set the disk size to be at least the size used to create your image. For example, to update your first partition to use your application, update the following variables in config.yaml:

  • partitions[0].name = APPLICATION-NAME

  • partitions[0].machines[0].disk_size_gb = 50

  • partitions[0].machines[0].image = projects/PROJECT-ID/global/images/IMAGE-NAME

  • Save the changes in config.yaml. Note that the machine_type you choose depends on your application's compute requirements.

  • Update your cluster's partitions using cluster-services. First, preview the changes you are about to make

cluster-services update partitions --config=config.yaml --preview

  • Once you are ready to apply the changes, run the same command without the --preview flag.

cluster-services update partitions --config=config.yaml

Now, when you submit jobs to this partition and machine block, compute nodes will be launched using your custom VM image.
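Putting the edits above together, the partitions block of config.yaml might look like the following sketch. The machine name is a placeholder, and only the fields discussed above are shown; your config.yaml will contain additional fields.

```yaml
partitions:
- name: APPLICATION-NAME        # partitions[0].name
  machines:
  - name: app-node              # placeholder machine block name
    machine_type: n1-standard-8 # choose based on your application's needs
    disk_size_gb: 50            # at least the size used to build the image
    image: projects/PROJECT-ID/global/images/IMAGE-NAME
```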

Automate Builds: Set up build triggers

Now that you've added the necessary tooling to your application's repository to create a VM image, you can set up build triggers. Build triggers are useful for automating builds and tests of your application when commits are pushed to your GitHub, Bitbucket, or Cloud Source Repositories repository.

Manually configure

To get started, simply

When configuring the Build Configuration for the trigger, set the following variables

  • File Type = Cloud Build configuration file (yaml or json)

  • Cloud Build configuration file location = /imaging/cloudbuild.yaml

Under the Advanced configurations, add the following substitution variables

  • _ZONE = "ZONE" | Set the zone to the GCP zone where the imaging node will be deployed.

  • _SUBNETWORK = "SUBNETWORK" | Set the subnetwork to use when deploying the imaging node. If you exclude this variable, the build will use the default subnetwork.

  • _IMAGE_NAME = "IMAGE-NAME" | Set the name for the resulting VM image.

Click create.

Terraform Infrastructure as Code

Where possible, we encourage the use of infrastructure as code to manage your CI infrastructure. Below, we provide step-by-step instructions for version-controlling your Cloud Build infrastructure with Terraform.


We recommend that you use Google Cloud Shell to deploy your build triggers with Terraform. Cloud Shell comes with Terraform preinstalled and configured with Google Cloud authentication.


terraform {
  backend "gcs" {
    bucket = "{GCS-BUCKET-NAME}"
    prefix = "{APPLICATION-NAME}-build-triggers"
  }
}

provider "google" {
  version = "3.9"
}

resource "google_cloudbuild_trigger" "prod-trigger" {
  name = "{APPLICATION-NAME}-prod"

  trigger_template {
    branch_name = "master"
    repo_name   = "{GIT-REPO}"
    project_id  = "{PROJECT-ID}"
  }

  substitutions = {
    _ZONE       = "{ZONE}"
    _IMAGE_NAME = "{APPLICATION-NAME}"
    _SUBNETWORK = "{SUBNETWORK}"
  }

  filename = "imaging/cloudbuild.yaml"
}

  • Save the file. Run the following sequence of commands to deploy your build triggers

terraform init
terraform validate
terraform plan -out=tfplan
terraform apply "tfplan"

Keep in mind that you can also create build triggers for managing your build triggers. Read more on managing infrastructure-as-code with Cloud Build.

Integrate with a Gitflow development strategy

Gitflow is a workflow and branching model that maintains two evergreen git branches:

  • dev | A branch for development work that may contain untested builds.

  • prod | A branch for production builds that have been thoroughly tested.

Feature and bugfix branches branch off of dev for developers to carry out work independently. The dev branch is used for integrating contributions from multiple developers. It can be helpful to test every commit to the dev and prod branches automatically. You can accomplish this by creating two Cloud Build triggers, one for each branch.
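The branching flow described above can be sketched with plain git commands (the branch and repository names here are illustrative):

```shell
# Sketch of a Gitflow-style setup: prod and dev evergreen branches,
# with feature work branching off of dev.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.email=dev@example.com -c user.name=dev \
    commit --allow-empty -qm "initial commit"
git branch -m prod                       # evergreen production branch
git branch dev                           # evergreen integration branch
git checkout -qb feature/new-solver dev  # feature work branches off dev
git -c user.email=dev@example.com -c user.name=dev \
    commit --allow-empty -qm "add new solver"
git checkout -q dev
git merge -q feature/new-solver          # integrate back into dev
```

Tested changes on dev are later merged into prod, which is where the production build trigger fires.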

Manually Configure

You can manually create triggers through the cloud console, as described previously. To integrate with a Gitflow development workflow, create two triggers, one for a dev branch and one for prod. The branch that triggers a build is set in the Source configuration settings under Branch.

For the dev build trigger, set _IMAGE_NAME = "{APPLICATION-NAME}-dev". For the prod build trigger, set _IMAGE_NAME = "{APPLICATION-NAME}". With this setup, you will be able to keep your production images stable while having the riskier dev images available for testing.

Terraform

If you're using Terraform, you can use the template HCL blocks below in your tf/main.tf file to create both triggers.

resource "google_cloudbuild_trigger" "prod-trigger" {
  name = "{APPLICATION-NAME}-prod"

  trigger_template {
    branch_name = "prod"
    repo_name   = "{GIT-REPO}"
    project_id  = "{PROJECT-ID}"
  }

  substitutions = {
    _ZONE       = "{ZONE}"
    _IMAGE_NAME = "{APPLICATION-NAME}"
    _SUBNETWORK = "{SUBNETWORK}"
  }

  filename = "imaging/cloudbuild.yaml"
}

resource "google_cloudbuild_trigger" "dev-trigger" {
  name = "{APPLICATION-NAME}-dev"

  trigger_template {
    branch_name = "dev"
    repo_name   = "{GIT-REPO}"
    project_id  = "{PROJECT-ID}"
  }

  substitutions = {
    _ZONE       = "{ZONE}"
    _IMAGE_NAME = "{APPLICATION-NAME}-dev"
    _SUBNETWORK = "{SUBNETWORK}"
  }

  filename = "imaging/cloudbuild.yaml"
}