Slurm-GCP

What is Fluid-Slurm-GCP?

Fluid Numerics' Slurm-GCP deployment leverages Google Compute Engine resources and the Slurm job scheduler to execute high performance computing (HPC) and high throughput computing (HTC) workloads. Compute nodes are created on-the-fly to execute jobs using custom compute node images. Slurm automatically removes idle compute nodes to minimize the expense of unused compute resources.

Quick Start

Looking to experiment with operating your own HPC cluster? Our Google Cloud Marketplace listing is a great place to get started with a click-to-deploy solution. Within 30 minutes, you can be running HPC and HTC applications in Google datacenters worldwide.

Learn More

HPC Cluster with Terraform

Want to build out more complex infrastructure with a cloud-native HPC cluster and manage your resources using infrastructure-as-code? Use our Terraform modules and examples to deploy and manage your fluid-slurm-gcp cluster alongside other infrastructure components.

Learn more

Fully Managed HPC Cluster

Let us help you! Simply let us know what you want to see in an HPC cluster. We will take care of provisioning Cloud Identity accounts, secure IAM policies, networking infrastructure, and your cloud-native HPC cluster. When it is ready, you'll be able to SSH into your cluster just as you would a traditional HPC system.

Learn more

Get community support for Fluid-Slurm-GCP and discuss usage, feature requests, bugs, and issues

Codelabs to walk through procedures for launching and administering Slurm-GCP

Learn how to quickly launch Fluid Numerics' Slurm-GCP deployment and submit your first job on the cluster.

Learn more about how to operate and customize Slurm-GCP to fit your needs.

Submit a bug report, feature request, or general feedback/questions.

Slurm-GCP updates

July 2020 (v2.4.0)

  • Upgrade to Slurm 20.02
  • Add support for easy CloudSQL integration
  • G Suite SMTP email relay integration for email notification on job completion
  • Terraform modules and examples now publicly available!
  • (bugfix) Enabled storage.full auth-scope for GCSFuse
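
The job-completion email feature above can be exercised through Slurm's standard mail options. As a minimal sketch, assuming the cluster's SMTP relay has already been configured, the Python helper below renders a batch script that requests an email when the job ends. The helper name render_batch_script, the job name, the command, and the address are hypothetical placeholders; --mail-type and --mail-user are standard sbatch options.

```python
def render_batch_script(job_name, command, email):
    """Return the text of a Slurm batch script that emails `email` when the job ends."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        "#SBATCH --mail-type=END",       # ask Slurm to send mail when the job finishes
        f"#SBATCH --mail-user={email}",  # placeholder address, replace with your own
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the script; on a cluster it could be saved to a file and passed to sbatch.
    print(render_batch_script("hello", "srun hostname", "user@example.com"))
```

Whether the message is actually delivered depends on how the relay is set up on your cluster, which this sketch does not cover.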

April 2020 (v2.3.0)

  • (feature upgrade) GCP Marketplace solutions now come with read-write access scopes to GCS storage
  • (bugfix) Resolved issue on compute nodes with hyperthreading disabled causing incorrect core-count configuration in slurm.conf
  • python/2.7.1 and python/3.8.0 are now available under /apps and through environment modules.

The cluster-services CLI has been updated with the version 2.3.0 release of fluid-slurm-gcp. Updates include:

  • Updated help documentation
  • The default_partition item has been added to the cluster-config schema, which allows users to specify a default Slurm partition.
  • The --preview flag, available on all update commands, lets you preview changes to your cluster before they are applied.
  • The --name flag has been removed from cluster-services add user. Individual users can be added to the default Slurm account using cluster-services add user <name>
  • Users can now obtain template cluster-config blocks using cluster-services sample all/mounts/partitions/slurm_accounts
  • User provided cluster-configs are now validated against /apps/cls/etc/cluster-config.schema.json
  • Added cluster-services logging to /apps/cls/log/cluster-services.log
  • Fixed incorrect core count bug with the partitions[].machines[].enable-hyperthreading flag
  • Removed add/remove mounts/partitions options; mounts and partitions are now updated by using update all, update mounts, and/or update partitions calls.
  • The add/remove user calls only add a user to, or remove a user from, the default Slurm account. These calls are strictly convenience calls.
  • The cluster-config schema now specifies compute, controller, and login images in compute_image, controller_image, and login_image rather than in the partitions.machines, controller, and login list-objects.
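
The preview-then-apply workflow described in the list above could be scripted roughly as follows. This is a sketch, not part of the cluster-services toolkit: the function names are hypothetical, and placing --preview after the update target is an assumption about argument order. Only the update targets (all, mounts, partitions) and the --preview flag come from the release notes. The code builds the command lines without running them, since cluster-services exists only on a deployed cluster.

```python
import shlex

# Update targets named in the v2.3.0 release notes.
UPDATE_TARGETS = ("all", "mounts", "partitions")


def build_update_argv(target, preview=True):
    """Build the argument list for a cluster-services update call."""
    if target not in UPDATE_TARGETS:
        raise ValueError(f"unknown update target: {target!r}")
    argv = ["cluster-services", "update", target]
    if preview:
        # Assumption: the flag is accepted after the update target.
        argv.append("--preview")
    return argv


def preview_then_apply(target):
    """Return the two command lines in order: preview first, then the real update."""
    return [build_update_argv(target, preview=True),
            build_update_argv(target, preview=False)]


if __name__ == "__main__":
    for argv in preview_then_apply("partitions"):
        print(shlex.join(argv))
```

Running the preview first and reviewing its output before issuing the second command keeps unintended changes out of a live cluster.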

March 2020 (Fluid-Slurm-GCP+Ubuntu)

Fluid Numerics has released another flavor of fluid-slurm-gcp on GCP Marketplace that is based on the Ubuntu operating system!

In addition to the flexible multi-project/region/zone support of "classic" fluid-slurm-gcp, the fluid-slurm-gcp-ubuntu solution includes:

  • Ubuntu 19.10 Operating System
  • zfs-utils for ZFS filesystem management (but no Lustre kernels)
  • apt package manager
  • Environment modules, Spack, and Singularity (same as the classic fluid-slurm-gcp)

March 2020 (Fluid-Slurm-GCP+OpenHPC)

Fluid Numerics has released another flavor of fluid-slurm-gcp on GCP Marketplace with pre-installed OpenHPC packages.

In addition to the flexible multi-project/region/zone support of "classic" fluid-slurm-gcp, the fluid-slurm-gcp+openhpc solution includes:

  • lmod Environment modules
  • GCC 8.2.0 compilers
  • MPICH, MVAPICH, and OpenMPI
  • Serial and parallel I/O libraries (HDF5, NetCDF, ADIOS)
  • HPC profilers and performance tuning toolkits (Score-P, TAU, Scalasca)
  • Scientific libraries for HPC (MFEM, PETSc, Trilinos, and much more!)

February 2020 (v2.0.0)

Fluid Numerics released upgrades to the Fluid-Slurm-GCP marketplace deployment and the cluster-services CLI toolkit in February 2020. These upgrades came about from cluster configuration schema changes that permit:

  • Specification of multiple compute machines per partition
  • Support for multiple GCP regions and multiple GCP zones
  • User defined compute machine names
  • User defined Slurm accounts, and user-partition alignment through cluster-services
  • Multi-project ready cluster-configuration schema for simplified Orbitera billing platform integrations
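
To illustrate what multiple machines per partition across several regions and zones might look like, here is a small Python sketch. It is not the real cluster-config schema: the nesting of machines under partitions follows the partitions[].machines[] path mentioned in the v2.3.0 notes, but the "name" and "zone" keys and all values are assumptions chosen for illustration. The only external fact relied on is that a GCP zone name is its region name followed by a hyphen and a letter.

```python
from collections import defaultdict

# Illustrative layout only; key names are NOT taken from the actual schema.
partitions = [
    {"name": "cpu", "machines": [
        {"name": "cpu-west", "zone": "us-west1-b"},
        {"name": "cpu-east", "zone": "us-east1-c"},
    ]},
    {"name": "gpu", "machines": [
        {"name": "gpu-central", "zone": "us-central1-a"},
    ]},
]


def region_of(zone):
    """Derive the region from a zone name, e.g. 'us-west1-b' gives 'us-west1'."""
    return zone.rsplit("-", 1)[0]


def machines_by_region(partitions):
    """Group (partition name, machine name) pairs by the region they run in."""
    grouped = defaultdict(list)
    for partition in partitions:
        for machine in partition["machines"]:
            grouped[region_of(machine["zone"])].append(
                (partition["name"], machine["name"]))
    return dict(grouped)


if __name__ == "__main__":
    for region, machines in sorted(machines_by_region(partitions).items()):
        print(region, machines)
```

A view like this can help check that a partition spanning regions is laid out as intended before the configuration is applied.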

Slurm-GCP Issue Collection

If you'd like to submit a bug report, feature request, or general feedback or questions, open a ticket in our fluid-slurm-gcp issue collector, where you can also browse known bugs, issues, and feature requests.