Operating HPC clusters on Google Cloud Platform opens up many new possibilities and capabilities for your organization. However, "there is no free lunch": with these new possibilities, there is more for system administrators and engineers to control. The cluster-services command line interface offers an easy-to-use way to modify compute partitions, Slurm accounting, and network-mounted storage.
Through Fluid Numerics' experience developing custom cloud-HPC solutions, we've identified typical operation and maintenance tasks and encapsulated them in cluster-services. The cluster-services command line interface is used to modify compute partitions and to add or remove external filesystems, such as Lustre. Performing these operations on slurm-gcp manually takes multiple steps to ensure that the desired changes are achieved. Rather than modifying configuration files or redeploying, you can use cluster-services to customize your slurm-gcp deployment.
The Fluid Numerics Slurm-GCP marketplace deployment comes with a command line interface, called cluster-services, for managing your resources after deployment. The cluster-services CLI allows you to manage your cluster's partitions and available machines, Slurm accounting, and external filesystem mounts.
The cluster-services CLI was updated in the version 2.3.0 release of fluid-slurm-gcp. Updates include:
- Updated help documentation
- A default_partition item has been added to the cluster-config schema, allowing users to specify a default Slurm partition.
- A --preview flag for all update commands lets you preview the changes to your cluster before actually making them.
- The --name flag has been removed from cluster-services add user. Individual users can be added to the default Slurm account using
cluster-services add user <name>
- Users can now obtain template cluster-config blocks using
cluster-services sample all/mounts/partitions/slurm_accounts
- User-provided cluster-configs are now validated against
- Added cluster-services logging to
- Fixed an incorrect core count bug with the
- Removed the add/remove mounts and add/remove partitions options; mounts and partitions are now updated using the update mounts and/or update partitions calls.
- The add user and remove user calls only add a user to, or remove a user from, the default Slurm account. These calls are strictly convenience calls.
- The cluster-config schema now specifies compute, controller, and login images in
login_image rather than in the
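The sample command is a convenient starting point for writing your own cluster-config blocks. As a minimal sketch, the shell function below writes each template block (mounts, partitions, and slurm_accounts) to its own file for editing. It assumes that cluster-services sample prints the template to standard output as YAML, the same way list all is redirected into config.yaml in the workflow below. The function name dump_samples, the sample-<block>.yaml file names, and the CLUSTER_SERVICES override (used only so the CLI can be replaced by a stub) are hypothetical and not part of cluster-services.

```shell
# Hypothetical helper: write each cluster-config template block printed by
# "cluster-services sample <block>" to its own YAML file for editing.
# Assumes the template is printed to standard output, as "list all" is.
# CLUSTER_SERVICES is a hypothetical override used only to stub the CLI;
# by default the real cluster-services command is run.
dump_samples() {
  local cli="${CLUSTER_SERVICES:-cluster-services}"
  local outdir="${1:-.}"
  local block

  mkdir -p "$outdir" || return 1
  for block in mounts partitions slurm_accounts; do
    # One file per block, e.g. ./sample-partitions.yaml
    "$cli" sample "$block" > "$outdir/sample-$block.yaml" || return 1
  done
}
```

For example, dump_samples ./samples leaves one template file per block in ./samples, from which you can copy the pieces you need into your own config.yaml.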
To customize your cluster, we recommend the following workflow:
1. Create a configuration file from the current configuration:
$ sudo su
[root]# cluster-services list all > config.yaml
2. Edit config.yaml, then validate and preview the changes. All cluster-services update commands validate your config file against the cluster-config schema.
[root]# cluster-services update all --preview --config=config.yaml
3. If the configuration file validates and you approve the previewed changes, apply them:
[root]# cluster-services update all --config=config.yaml
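The preview and apply steps above can be wrapped in a small shell function so that a configuration is never applied without first being previewed. This is a minimal sketch that uses only the commands shown in the steps; the function name apply_config, the confirmation prompt, and the CLUSTER_SERVICES override (so the CLI can be stubbed) are hypothetical additions, not features of cluster-services. It also assumes that cluster-services exits with a non-zero status when validation fails.

```shell
# Hypothetical helper wrapping the recommended workflow: preview (and so
# validate) a cluster-config file, then apply it only after confirmation.
# Run as root. CLUSTER_SERVICES is a hypothetical override used only to
# stub the CLI; by default the real cluster-services command is run.
apply_config() {
  local cli="${CLUSTER_SERVICES:-cluster-services}"
  local config="$1"
  local answer

  if [ -z "$config" ] || [ ! -f "$config" ]; then
    echo "usage: apply_config <config.yaml> (file must exist)" >&2
    return 1
  fi

  # Step 2: validate against the schema and preview the changes.
  # Assumes cluster-services exits non-zero when validation fails.
  "$cli" update all --preview --config="$config" || return 1

  # Step 3: apply only on an explicit "y" or "Y" answer.
  printf 'Apply these changes? [y/N] ' >&2
  read -r answer || answer=""
  case "$answer" in
    y|Y) "$cli" update all --config="$config" ;;
    *) echo "Aborted; no changes applied." >&2; return 1 ;;
  esac
}
```

For example, after editing config.yaml as root, run apply_config config.yaml. Stopping as soon as the preview fails means that, under the exit-status assumption above, a file that fails schema validation never reaches the apply step.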