Architecture

Overview

The Fluid-Slurm-GCP system consists of three types of Google Compute Engine (GCE) instances:

  1. Login nodes
  2. Controller node
  3. Compute nodes

Since the v2.0.0 upgrade in February 2020, multiple groups of compute nodes can be aligned with any number of compute partitions. Further, these compute nodes can be deployed to any VPC subnetwork in any GCP project, provided appropriate firewall rules, IAM roles, and quotas are in place. This flexibility permits a simplified approach to constructing an HPC cluster that can leverage compute cycles from all GCP regions worldwide. Additionally, the multi-project distribution of compute resources allows billing administrators handling financial operations (FinOps) to obtain granular breakouts of compute expenses by user, research team/grant, software application, and so on.

When launching from the Fluid-Slurm-GCP Marketplace page, all resources are deployed in a single GCP project. You can use the cluster-services toolkit and the cluster configuration file to customize your cluster to meet your business and technical requirements.
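
As a rough sketch, a typical customization session looks like the commands below. The subcommand and flag names are assumptions based on the cluster-services documentation and should be verified with "cluster-services --help" on your own deployment; config.yaml is simply a placeholder filename.

    # Sketch of a customization session on the controller or a login node.
    # Subcommand and flag names are assumptions; verify with "cluster-services --help".
    sudo su -                                          # cluster-services generally requires root
    cluster-services list all > config.yaml            # dump the active cluster configuration
    # ... edit config.yaml (partitions, machines, mounts, Slurm accounts, etc.) ...
    cluster-services update all --config=config.yaml --preview   # review the planned changes
    cluster-services update all --config=config.yaml             # apply the changes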

Fluid Numerics has also developed Terraform deployment scripts to import and manage more complex deployments built out from an initial Marketplace deployment. Reach out to support@fluidnumerics.com to learn how to gain access to these Terraform scripts and for assistance in designing your customized HPC cluster on Google Cloud Platform.

Login Nodes

Login nodes serve as the primary point of connection to the outside world. Users access the cluster and the Slurm job scheduler through the login nodes using ssh.

For ssh access to function properly, your network must allow ingress on tcp:22, either from all IP addresses or from a set of whitelisted IP addresses. Additionally, users must be granted appropriate permissions and must attach RSA keys to their Cloud Identity profile. Keep in mind that administrators can manage each user's Cloud Identity profile with the Admin SDK; reach out to Fluid Numerics Support if you'd like help managing organization access with IAM and the Admin SDK.
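
For example, a firewall rule like the one below permits ssh from a whitelisted address range to instances carrying a particular network tag. The project, network, tag, and CIDR values are placeholders to adapt for your deployment; only the gcloud flags themselves are standard.

    # Allow ssh (tcp:22) from a whitelisted CIDR range to instances tagged "login".
    # Project, network, tag, and source range are placeholder values.
    gcloud compute firewall-rules create allow-ssh-to-login \
        --project=my-cluster-project \
        --network=my-cluster-vpc \
        --direction=INGRESS \
        --allow=tcp:22 \
        --source-ranges=203.0.113.0/24 \
        --target-tags=login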

Controller

The controller hosts the /home, /apps, and /etc/munge directories by default, and runs the Slurm Controller Daemon (slurmctld), the Slurm Database Daemon (slurmdbd), and the Slurm database. The provided cluster-services CLI can be used to mount /home from a different fileserver, such as Filestore.
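
The sketch below outlines how re-pointing /home at a Filestore export might look. The field names shown in the comments are illustrative assumptions about the cluster-configuration schema rather than the authoritative layout, and the Filestore address is a placeholder, so always preview the change before applying it.

    # Sketch: mount /home from a Filestore export instead of the controller.
    # Field names in the comment block are assumptions; IP and path are placeholders.
    cluster-services list all > config.yaml
    # In config.yaml, add or edit a mounts entry along these lines:
    #   mounts:
    #   - mount_directory: /home
    #     server_directory: 10.11.12.2:/home     # Filestore IP : export path
    #     protocol: nfs
    cluster-services update all --config=config.yaml --preview
    cluster-services update all --config=config.yaml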

The Slurm Controller Daemon is responsible for scheduling jobs and job steps to run on GCP resources. It additionally manages creation and deletion of GCP compute resources through the use of the Slurm Powersave module.
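
You can inspect the power-save hooks that drive instance creation and deletion directly on the controller; scontrol show config is standard Slurm and reports the resume/suspend programs and timers, whose exact paths and values depend on the deployment.

    # Inspect the power-save settings on the controller (standard Slurm command;
    # program paths and timer values vary by deployment).
    scontrol show config | grep -Ei 'resumeprogram|suspendprogram|suspendtime|resumetimeout'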

The Slurm Database Daemon logs job accounting details to the Slurm database. By default, the Slurm database is a MariaDB instance located on the controller. Reach out to Fluid Numerics Support if you'd like help migrating the Slurm database to Cloud SQL for improved scalability and redundancy.
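
Because accounting records land in the Slurm database, standard Slurm accounting tools work as usual; for example, sacct and sreport can break out usage by user and account. The start date and field list below are arbitrary examples.

    # Summarize recent jobs with standard Slurm accounting tools; the start date
    # and format fields are arbitrary examples.
    sacct --starttime=2020-02-01 \
          --format=JobID,User,Account,Partition,AllocCPUS,Elapsed,State
    # Per-account and per-user rollups are also available via sreport:
    sreport cluster AccountUtilizationByUser start=2020-02-01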

Compute Nodes

In Fluid-Slurm-GCP, compute nodes are grouped in the cluster-configuration schema into sets of GCE instances that share identical characteristics (e.g., vCPU count, memory size, disk type and size, accelerator type and count). Any number of compute node groups can be aligned with Slurm partitions, giving you the flexibility to align users and Slurm account groups with access to specific GCP resources. Additionally, this arrangement allows you to build high availability partitions that span multiple GCP zones and regions.
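
The snippet below sketches the general shape of a partition with one compute node group. The field names are illustrative assumptions rather than the published schema, so treat it as an outline and inspect your own configuration for the authoritative layout.

    # Illustrative shape of a partition with one compute node group; field names
    # are assumptions, not the official schema.
    #   partitions:
    #   - name: gpu
    #     machines:
    #     - name: gpu-v100                 # one machine block = one compute node group
    #       machine_type: n1-standard-8
    #       gpu_type: nvidia-tesla-v100
    #       gpu_count: 1
    #       zone: us-west1-b
    #       max_node_count: 20
    # Inspect the partitions currently configured (subcommand name as in the
    # workflow above; verify with "cluster-services --help"):
    cluster-services list partitions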

Compute nodes can be marked as static or ephemeral. Static compute nodes remain as active compute instances in GCP, whereas ephemeral compute nodes are created and deleted on the fly based on scheduled workloads.

Static Compute Nodes

Static compute nodes offer the benefit of near-immediate job launch. However, you must pay for the instances, even if they are idle.

Ephemeral Compute Nodes

Jobs scheduled to ephemeral compute instances can take up to 60 seconds to begin executing; this delay occurs when the instances must be created before the job can run. However, ephemeral compute nodes can offer significant cost savings when idle time is expected.
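
A quick way to observe this behavior is to submit a small job to an ephemeral partition and watch the node state: in sinfo, a trailing "#" on a node's state indicates that Slurm power-save is powering the node up, i.e. the GCE instance is being created. The partition name below is a placeholder.

    # Observe an ephemeral node being provisioned for a new job; "gpu" is a
    # placeholder partition name.
    sbatch --partition=gpu --wrap="hostname"   # submit a trivial job
    squeue -u $USER                            # job sits in PD/CF while the instance boots
    sinfo -p gpu                               # node state carries a trailing "#" during power-up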