What is Fluid-Slurm-GCP?
Fluid Numerics®' Slurm-GCP deployment leverages Google Compute Engine resources and the Slurm job scheduler to execute high performance computing (HPC) and high throughput computing (HTC) workloads. Compute nodes are created on-the-fly to execute jobs using custom compute node images. Slurm automatically removes idle compute nodes to minimize the expense of unused compute resources.
Looking to experiment with operating your own HPC cluster? Our Google Cloud Marketplace solution is a great place to get started with click-to-deploy. Within 30 minutes, you can be running HPC and HTC applications in any of Google's data centers worldwide.
HPC Cluster with Terraform
Want to build out more complex infrastructure with a cloud-native HPC cluster and manage your resources as infrastructure-as-code? Use the slurm-gcp Terraform modules and examples to deploy and manage your fluid-slurm-gcp cluster alongside other infrastructure components.
Fully Managed HPC Cluster
Let us help you! Simply let us know what you want to see in an HPC cluster. We will take care of provisioning Cloud Identity accounts, secure IAM policies, networking infrastructure, and your cloud-native HPC cluster. When it's ready, you'll be able to SSH to your cluster just like a traditional HPC system.
Learn how to quickly launch Fluid Numerics®' Slurm-GCP deployment and submit your first job on the cluster.
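To make the quickstart concrete, here is a minimal Slurm batch script sketch. The partition name and resource requests are placeholders, not values from this deployment; adjust them to match the partitions defined in your cluster-config.

```shell
# Write a minimal batch script. The heredoc is quoted so $(hostname)
# is expanded at job run time, not when the file is created.
cat > hello.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --partition=partition-1   # hypothetical partition name
#SBATCH --ntasks=1
#SBATCH --time=00:05:00

echo "Hello from $(hostname)"
EOF

# On the cluster's login node you would submit the script with:
#   sbatch hello.sbatch
# and check its status with:
#   squeue -u "$USER"
```

Slurm creates compute nodes on demand to run the job, so the first submission in a while may sit in the queue briefly while a node boots.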
August 2021 (v3.0.0)
April 2021 (v2.6.1)
Resolved: ROCm Spack builds were difficult to use with third-party apps. ROCm is now installed to /opt/rocm via yum repositories.
Improved sysctl.conf for large MPI jobs: increased net.core.somaxconn, net.ipv4.tcp_max_syn_backlog, and fs.file-max.
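The kernel settings above live in /etc/sysctl.conf. The changelog names the keys but not the shipped values, so the numbers below are illustrative only:

```
# /etc/sysctl.conf -- illustrative values; the release notes do not
# record the exact settings shipped with this image
net.core.somaxconn = 4096
net.ipv4.tcp_max_syn_backlog = 4096
fs.file-max = 1048576
```

Changes to this file take effect after running `sysctl -p` or rebooting the node. Larger backlog and file-handle limits help when many MPI ranks open connections simultaneously.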
March 2021 (v2.6.0)
Core package installation transitioned to Spack
GCC 10.2.0 + OpenMPI 4.0.2
Transition from environment modules to lmod
Spack creates the lmod module tree
Implement HPC Best Practices for Google Cloud Platform on compute node images
Fixed a singularity pull failure caused by an incorrect path reference to squashfs-tools
September 2020 (v2.5.0)
Upgraded Ubuntu from 19.10 to 20.04
Upgraded the CentOS kernel
Upgraded NVIDIA GPU drivers
Built and enabled Slurm REST API support
July 2020 (v2.4.0)
Upgraded Slurm from 19.05 to 20.02
Add support for easy CloudSQL integration
Added G Suite SMTP email relay integration for email notifications on job completion
Terraform modules and examples now publicly available!
(bugfix) Enabled storage.full auth-scope for GCSFuse
April 2020 (v2.3.0)
(feature upgrade) GCP Marketplace solutions now come with read-write access scopes to GCS storage
(bugfix) Resolved an issue where compute nodes with hyperthreading disabled were assigned an incorrect core count in slurm.conf
python/2.7.1 and python/3.8.0 are now available under /apps and through environment modules.
The cluster-services CLI has been updated with the Version 2.3.0 release of fluid-slurm-gcp. Updates include:
Updated help documentation
The default_partition item has been added to the cluster-config schema which allows users to specify a default Slurm partition.
A --preview flag for all update commands allows you to preview changes to your cluster before they are applied
The cluster-services add user --name flag has been removed. Individual users can be added to the default Slurm account using cluster-services add user <name>
Users can now obtain template cluster-config blocks using cluster-services sample all/mounts/partitions/slurm_accounts
User provided cluster-configs are now validated against /apps/cls/etc/cluster-config.schema.json
Added cluster-services logging to /apps/cls/log/cluster-services.log
Fixed incorrect core count bug with the partitions.machines.enable-hyperthreading flag
Removed the add/remove mounts/partitions options; mounts and partitions are now updated using update all, update mounts, and/or update partitions calls.
The add/remove user calls only add or remove a user in the default Slurm account. These calls are strictly convenience calls.
The cluster-config schema now specifies compute, controller, and login images in compute_image, controller_image, and login_image rather than in the partitions.machines, controller, and login list-objects.
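Putting the schema changes above together, a cluster-config fragment under the v2.3.0 schema might look like the sketch below. The field names (default_partition, compute_image, controller_image, login_image, partitions.machines, enable-hyperthreading) come from the release notes; the project, image, and partition values are placeholders:

```
# Hypothetical cluster-config fragment; all values are placeholders.
default_partition: partition-1
compute_image: projects/my-project/global/images/my-compute-image
controller_image: projects/my-project/global/images/my-controller-image
login_image: projects/my-project/global/images/my-login-image
partitions:
  - name: partition-1
    machines:
      - name: partition-1-compute
        enable-hyperthreading: true
```

A config like this is validated against /apps/cls/etc/cluster-config.schema.json, and the resulting changes can be inspected before they take effect with cluster-services update all --preview.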
March 2020 (Fluid-Slurm-GCP+Ubuntu)
Fluid Numerics® has released another flavor of fluid-slurm-gcp on GCP Marketplace that is based on the Ubuntu operating system!
In addition to the flexible multi-project/region/zone support of "classic" fluid-slurm-gcp, the fluid-slurm-gcp-ubuntu solution includes:
Ubuntu 19.10 Operating System
zfs-utils for ZFS filesystem management (but no Lustre kernels)
apt package manager
Environment modules, Spack, and Singularity (same as the classic fluid-slurm-gcp)
March 2020 (Fluid-Slurm-GCP+OpenHPC)
Fluid Numerics® has released another flavor of fluid-slurm-gcp on GCP Marketplace with pre-installed OpenHPC packages.
In addition to the flexible multi-project/region/zone support of "classic" fluid-slurm-gcp, the fluid-slurm-gcp+openhpc solution includes:
lmod Environment modules
GCC 8.2.0 compilers
MPICH, MVAPICH, and OpenMPI
February 2020 ( v2.0.0 )
Fluid Numerics® released upgrades to the Fluid-Slurm-GCP marketplace deployment and the cluster-services CLI toolkit in February 2020. These upgrades came about from cluster-configuration schema changes that permit:
Specification of multiple compute machines per partition
Support for multiple GCP regions and multiple GCP zones
User defined compute machine names
User defined Slurm accounts, and user-partition alignment through cluster-services
A multi-project-ready cluster-configuration schema for simplified Orbitera billing platform integrations