Fluid-Slurm-GCP
Fluid-Slurm-GCP is now an integrated component of your Research Computing Cloud or Cluster
What is Fluid-Slurm-GCP?
Fluid Numerics®' Slurm-GCP deployment leverages Google Compute Engine resources and the Slurm job scheduler to execute high performance computing (HPC) and high throughput computing (HTC) workloads. Compute nodes are created on-the-fly to execute jobs using custom compute node images. Slurm automatically removes idle compute nodes to minimize the expense of unused compute resources.
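For example, submitting a batch job to an empty partition triggers Slurm to power up compute nodes for the job and power them back down once they go idle. A minimal sketch, where the partition name is a placeholder for one defined in your cluster-config:

```
# Create and submit a small test job (partition name is a placeholder)
cat > hello.sh << 'EOF'
#!/bin/bash
#SBATCH --partition=partition-1
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
hostname
EOF

sbatch hello.sh   # Slurm powers up a compute node to run the job
squeue            # the job shows CF (configuring) while the node boots
sinfo             # after the idle timeout, the node returns to a powered-down (idle~) state
```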
Quick Start
Looking to experiment with operating your own HPC cluster? Our Google Cloud Marketplace listing is a great place to get started with a click-to-deploy solution. Within 30 minutes, you can be running HPC and HTC applications in Google Cloud datacenters worldwide.
HPC Cluster with Terraform
Want to build out more complex infrastructure with a cloud-native HPC cluster and manage your resources using infrastructure-as-code? Use the slurm-gcp terraform modules and examples to deploy and manage your fluid-slurm-gcp cluster with other infrastructure components.
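The general workflow follows standard Terraform practice; the example directory and variable file below are placeholders for one of the published examples:

```
cd path/to/a/slurm-gcp-terraform-example      # placeholder; use one of the published examples

terraform init                                # download the Google provider and referenced modules
terraform plan -var-file=my-cluster.tfvars    # preview the resources that will be created
terraform apply -var-file=my-cluster.tfvars   # deploy the controller, login node(s), and network
```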
Fully Managed HPC Cluster
Let us help you! Simply tell us what you want to see in an HPC cluster. We will take care of provisioning Cloud Identity accounts, secure IAM policies, networking infrastructure, and your cloud-native HPC cluster. When it's ready, you'll be able to SSH to your cluster just like a traditional HPC system.
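Logging in works like any other Linux host. A sketch using the gcloud CLI, where the login node name and zone are placeholders for your deployment:

```
gcloud compute ssh my-cluster-login-0 --zone=us-central1-a   # instance name and zone are placeholders

# or, with plain SSH once your public key is in place:
ssh your-username@LOGIN_NODE_EXTERNAL_IP
```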
Get community support for Fluid-Slurm-GCP and discuss usage, feature requests, bugs, and issues
Codelabs that walk through procedures for launching and administering Slurm-GCP
Learn how to quickly launch Fluid Numerics®' Slurm-GCP deployment and submit your first job on the cluster.
Learn more about how to operate and customize Slurm-GCP to fit your needs.
Submit a bug report, feature request, or general feedback/questions.
Fluid-Slurm-GCP updates
April 2022 (4/15/22) - Fluid-Slurm-GCP deprecation and migration to RCC
Fluid-Slurm-GCP is now an integrated component of your Research Computing Cluster from Fluid Numerics
We recommend the following paths to an updated and supported release:
fluid-slurm-gcp-centos-*-v3*: replace with rcc-centos-7-v300-256bf0b
fluid-slurm-gcp-ubuntu-*-v3*: replace with rcc-ubuntu-2004-v300-1104600
fluid-slurm-gcp-ohpc-*: replace with rcc-centos-7-v300-256bf0b
If you need assistance migrating from legacy Fluid-Slurm-GCP to an RCC image, please reach out to support@fluidnumerics.com.
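To confirm the replacement image name before redeploying, you can list the published RCC images with the gcloud CLI; the image project below is a placeholder for the project referenced by your Marketplace or Terraform deployment:

```
# List candidate replacement images (IMAGE_PROJECT is a placeholder)
gcloud compute images list \
    --project=IMAGE_PROJECT \
    --filter="name~'^rcc-'" \
    --no-standard-images
```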
August 2021 (v3.0.0)
The cluster-config schema has been rebased on schedmd/slurm-gcp to smooth the transition between the open-source solution and the supported, licensed fluid-slurm-gcp. The cluster-services CLI has been updated to be consistent with this schema.
Add slurm_qos options to the cluster-config schema, along with cluster-services support for building QOS entries that align with Slurm accounts.
Update spack version (to v0.16.2)
Install GCC 7.5.0, GCC 8.5.0, GCC 9.4.0, GCC 10.2.0, and the Intel OneAPI Compilers v2021.2.0
Install OpenMPI 4.0.5 for each compiler
Update Singularity version (to v3.7.4)
Add support for GVNIC
Add the HPC VM Image Library, with the applications listed below tested and readily available (see the example after this list).
WRF v4.2
Gromacs v2021.2
OpenFOAM (org) v8
Paraview 5.9.1
FEOTS v2
SELF v1.0.0
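A sketch of loading one of these applications through the module tree; the exact module names may differ on your image, so start from module avail:

```
module avail          # list the packages published in the HPC VM Image Library
module load gromacs   # hypothetical module name; use the exact name reported by module avail
gmx --version         # the Gromacs 2021.2 build should now be on your PATH
```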
April 2021 (v2.6.1)
[Resolve: ROCm Spack builds difficult to use with 3rd-party apps]: Move the ROCm install to /opt/rocm via yum repositories
[Improve sysctl.conf for large MPI jobs]: Increase net.core.somaxconn, net.ipv4.tcp_max_syn_backlog, and fs.file-max
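You can confirm the tuned limits on a running node with sysctl:

```
# Inspect the kernel limits raised for large MPI job launches
sysctl net.core.somaxconn net.ipv4.tcp_max_syn_backlog fs.file-max
```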
March 2021 (v2.6.0)
Core package installation transitioned to Spack
GCC 10.2.0 + OpenMPI 4.0.2
ROCm 4.0.0
CUDA 10.0.130
Singularity 3.7.0
Transition from environment modules to lmod
Spack creates the lmod module tree (see the sketch after this list)
Implement HPC Best Practices for Google Cloud Platform on compute node images
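Because Spack generates the Lmod tree, compilers and the packages built against them appear as hierarchical modules. A sketch of loading the GCC and OpenMPI toolchain; module names follow Spack defaults and may carry version or hash suffixes on your image:

```
module avail               # core modules: compilers, CUDA, ROCm, Singularity, ...
module load gcc/10.2.0     # loading a compiler exposes the packages built against it
module load openmpi/4.0.2  # MPI wrappers (mpicc, mpif90) are now on the PATH
mpicc --version
```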
Resolved Bugs
Singularity pull fails - caused by incorrect path reference to squashfs-tools
September 2020 (v2.5.0)
Upgrade Ubuntu 19.10 to Ubuntu 20.04
Upgrade the CentOS kernel
Upgrade the NVIDIA GPU drivers
Build and enable Slurm REST API support
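With slurmrestd enabled, the scheduler can be queried over HTTP. A rough sketch, assuming slurmrestd listens on its default port and JWT authentication is configured; the API version segment depends on the installed Slurm release:

```
export $(scontrol token)   # populates SLURM_JWT when JWT authentication is enabled
curl -s \
  -H "X-SLURM-USER-NAME: $USER" \
  -H "X-SLURM-USER-TOKEN: $SLURM_JWT" \
  http://localhost:6820/slurm/v0.0.36/ping   # port and v0.0.xx segment are deployment-specific
```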
July 2020 (v2.4.0)
Upgrade Slurm 19.05 to Slurm 20.02
Add support for easy CloudSQL integration
Add G Suite SMTP email relay integration for email notifications on job completion
Terraform modules and examples now publicly available!
(bugfix) Enabled storage.full auth-scope for GCSFuse
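The storage.full scope allows nodes to mount Cloud Storage buckets read-write with GCSFuse; a minimal sketch, where the bucket name and mount point are placeholders:

```
mkdir -p $HOME/gcs-data
gcsfuse my-project-bucket $HOME/gcs-data   # mount the bucket read-write

# ... read and write files under $HOME/gcs-data ...

fusermount -u $HOME/gcs-data               # unmount when finished
```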
April 2020 (v2.3.0)
(feature upgrade) GCP Marketplace solutions now come with read-write access scopes to GCS storage
(bugfix) Resolved an issue where compute nodes with hyperthreading disabled caused an incorrect core-count configuration in slurm.conf
python/2.7.1 and python/3.8.0 are now available under /apps and through environment modules.
The cluster-services CLI has been updated with the Version 2.3.0 release of fluid-slurm-gcp. Updates include the following (a combined workflow sketch follows this list):
Updated help documentation
The default_partition item has been added to the cluster-config schema which allows users to specify a default Slurm partition.
A --preview flag for all update commands allows you to preview changes to your cluster before they are applied
The cluster-services add user --name flag has been removed. Individual users can be added to the default Slurm account using cluster-services add user <name>
Users can now obtain template cluster-config blocks using cluster-services sample all/mounts/partitions/slurm_accounts
User provided cluster-configs are now validated against /apps/cls/etc/cluster-config.schema.json
Added cluster-services logging to /apps/cls/log/cluster-services.log
Fixed incorrect core count bug with the partitions[].machines[].enable-hyperthreading flag
Removed add/remove mounts/partitions options; mounts and partitions are now updated by using update all, update mounts, and/or update partitions calls.
The add/remove user calls only add or remove a user from the default Slurm account; they are strictly convenience calls.
The cluster-config schema now specifies compute, controller, and login images in compute_image, controller_image, and login_image rather than in the partitions.machines, controller, and login list-objects.
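Taken together, a typical configuration workflow with the updated CLI looks like the sketch below. How the edited cluster-config is supplied to the update commands depends on your cluster-services version (see cluster-services --help), and the username is hypothetical:

```
cluster-services sample partitions             # print a template partitions block to edit
cluster-services sample slurm_accounts         # print a template slurm_accounts block

cluster-services update partitions --preview   # show pending changes without applying them
cluster-services update partitions             # apply the changes to the cluster

cluster-services add user jane                 # convenience call: add "jane" to the default Slurm account
```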
March 2020 (Fluid-Slurm-GCP+Ubuntu)
Fluid Numerics® has released another flavor of fluid-slurm-gcp on GCP Marketplace that is based on the Ubuntu operating system!
In addition to the flexible multi-project, multi-region, and multi-zone support of "classic" fluid-slurm-gcp, the fluid-slurm-gcp-ubuntu solution includes
Ubuntu 19.10 Operating System
zfs-utils for ZFS filesystem management (but no Lustre kernels)
apt package manager
Environment modules, Spack, and Singularity (same as the classic fluid-slurm-gcp)
March 2020 (Fluid-Slurm-GCP+OpenHPC)
Fluid Numerics® has released another flavor of fluid-slurm-gcp on GCP Marketplace with pre-installed OpenHPC packages.
In addition to the flexible multi-project, multi-region, and multi-zone support of "classic" fluid-slurm-gcp, the fluid-slurm-gcp+openhpc solution includes
lmod Environment modules
GCC 8.2.0 compilers
MPICH, MVAPICH, and OpenMPI
HPC Profilers/Performance Tuning Toolkits (Score-P, Tau, Scalasca)
Scientific libraries for HPC (MFEM, PETSc, Trilinos, and much more!)
February 2020 ( v2.0.0 )
Fluid Numerics® released upgrades to the Fluid-Slurm-GCP marketplace deployment and the cluster-services CLI toolkit in February 2020. These upgrades stem from cluster-configuration schema changes that permit:
Specification of multiple compute machines per partition
Support for multiple GCP regions and multiple GCP zones
User defined compute machine names
User defined Slurm accounts, and user-partition alignment through cluster-services
Multi-project ready cluster-configuration schema for simplified Orbitera billing platform integrations
Slurm-GCP Issue Collection
If you'd like to submit a bug report, feature request, or general feedback/questions, submit a ticket in our fluid-slurm-gcp issue collector and browse our known bugs, issues, and feature requests.