Fluid Numerics' Slurm-GCP (fluid-slurm-gcp) solution consists of three main instance types: controller, login, and compute.
All three instance types use images provided by Fluid Numerics that are licensed to users according to the Fluid Numerics EULA. Each instance type incurs GCP resource costs in addition to image usage fees, which are charged at a rate of $0.01/vCPU/hour and $0.09/GPU/hour. Fluid Numerics charges this fee so that we can continue to provide useful new features, upgrades, updates, and bug fixes, in addition to contributions to the open-source cloud-HPC ecosystem.
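As a minimal sketch of how the usage fee scales with instance size, the rate quoted above can be written as a small function. The function and constant names are illustrative assumptions, not part of any Fluid Numerics tooling.

```python
# Usage fee rates quoted in this article (USD).
VCPU_FEE_PER_HOUR = 0.01  # per vCPU per hour
GPU_FEE_PER_HOUR = 0.09   # per GPU per hour


def usage_fee_per_hour(vcpus, gpus=0):
    """Hourly image usage fee in USD for one instance (hypothetical helper)."""
    return vcpus * VCPU_FEE_PER_HOUR + gpus * GPU_FEE_PER_HOUR


# A 16-vCPU instance with no GPU, and an 8-vCPU instance with one GPU.
print(f"{usage_fee_per_hour(16):.2f} USD/hour")
print(f"{usage_fee_per_hour(8, gpus=1):.2f} USD/hour")
```

This fee is charged on top of whatever GCP bills for the underlying resources.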
This document will help you better understand the costs associated with the Fluid Numerics Slurm-GCP solution. As a disclaimer, pricing examples are purely illustrative, and your actual costs may differ from the estimates you arrive at by following the ideas discussed in this article.
Google Compute Engine (GCE) instances are billed per second, and the billing rate depends on the number of virtual cores, the CPU platform (e.g. n1, n2, e2), the amount of memory, the boot disk type, and the boot disk size. To illustrate how to estimate costs for your system, I'll walk through an example deployment with the following characteristics:
Controller : n1-standard-16 with 50 GB pd-standard boot disk
Login node : n1-standard-16 with 20 GB pd-standard boot disk
Ephemeral Compute node : n1-highcpu-8 + 1x Nvidia Tesla V100 GPU with 20 GB pd-standard boot disk and 2 local SSDs
In this example, I'll further assume that the controller and login nodes are active 24/7. We have no static compute nodes in this case. Thus, compute nodes only incur costs during job execution, with minor overhead. Because of this, I'll provide monthly cost estimates for the controller and login nodes, and hourly estimates for the compute nodes. The compute node estimate will be developed without assuming any sustained use discounts.
In this example, the controller is an n1-standard-16 instance with 50 GB pd-standard disk that is operated 24/7. I've estimated that this instance costs $390.36/month ($0.535/hour) using the GCP pricing calculator. Note that this rate assumes a sustained use discount of 30% that we earn for operating this system 24/7.
The Fluid Numerics usage fee for this instance is $116.80/month ($0.16/hour = $0.01/vCPU/hour x 16 vCPU). This gives a total monthly cost of $507.16/month.
In this example, the login node is an n1-standard-16 instance with a 20 GB pd-standard disk that is operated 24/7. I've estimated that this instance costs $389.16/month ($0.533/hour) using the GCP pricing calculator. As with the controller, this rate assumes a sustained use discount of 30% that we earn for operating this system 24/7.
The Fluid Numerics usage fee for this instance is $116.80/month ($0.16/hour = $0.01/vCPU/hour x 16 vCPU). This gives a total monthly cost of $505.96/month.
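The controller and login totals above can be reproduced with a few lines of arithmetic. This is a sketch only: the GCP figures are the pricing-calculator estimates quoted in this article, the 730 hours/month convention is the one this article uses, and the function name is a hypothetical helper.

```python
HOURS_PER_MONTH = 730      # month length used throughout this article
VCPU_FEE_PER_HOUR = 0.01   # Fluid Numerics usage fee, USD per vCPU per hour


def monthly_total(gcp_monthly_cost, vcpus):
    """Return (usage_fee, total) in USD/month for an always-on, CPU-only instance."""
    usage_fee = VCPU_FEE_PER_HOUR * vcpus * HOURS_PER_MONTH
    return round(usage_fee, 2), round(gcp_monthly_cost + usage_fee, 2)


# GCP estimates quoted above (already include the 30% sustained use discount).
controller_fee, controller_total = monthly_total(390.36, vcpus=16)
login_fee, login_total = monthly_total(389.16, vcpus=16)

print(f"controller: fee {controller_fee:.2f}, total {controller_total:.2f} USD/month")
print(f"login:      fee {login_fee:.2f}, total {login_total:.2f} USD/month")
```

Swapping in your own calculator estimate and vCPU count gives the equivalent totals for a different machine type.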
Compute ($2.188/node/hour - $3.017/node/hour)
In this example, each ephemeral compute node is an n1-highcpu-8 instance with an Nvidia Tesla V100 GPU accelerator, a 20 GB pd-standard boot disk, and 2 local SSDs. Actual costs for the ephemeral compute nodes are highly dependent on your usage, but I'll show you how we can develop a range of expected per-node-hour costs and monthly costs.
I'll first develop a per-node-hour rate, under the assumption of 24/7 usage. At this rate of usage, each compute node is estimated to cost $1,472.90/node/month ($2.018/node/hour). This rate includes a 30% sustained use discount.
If usage were only 1 hour/day, the per-node-month cost drops to $86.583/node/month, while the hourly rate rises to $2.847/node/hour. I need to point out that the number I'm showing here is different from the total given by the GCP pricing calculator. This is because the calculator assumes that the 20 GB standard disk is persisted 24/7. To arrive at the cost estimate shown here, I take the component cost for the n1-highcpu-8 + Nvidia Tesla V100 + 2x Local SSD ($86.55/month) and add the quoted disk cost multiplied by 30.417/730. Dividing the disk cost by 730 (the hours in a month) gives an estimated hourly cost for the disk. Multiplying by 30.417 (the hours of use in a month at 1 hour/day) then gives the accrued cost for a boot disk that exists only while the instance is active.
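The boot disk proration described above can be sketched as follows. The $86.55/month component cost comes from this article; the $0.80/month disk figure is an assumption (20 GB of pd-standard at an assumed $0.04/GB/month), since the article does not state the quoted disk cost directly, and the function name is hypothetical.

```python
HOURS_PER_MONTH = 730                   # month length used in this article
DAYS_PER_MONTH = HOURS_PER_MONTH / 24   # about 30.417


def prorated_node_month_cost(instance_cost, disk_monthly_cost, active_hours):
    """Monthly cost when the boot disk exists only while the node is running.

    instance_cost:     calculator cost for CPU + GPU + local SSD at this usage (USD/month)
    disk_monthly_cost: calculator cost for the boot disk if persisted 24/7 (USD/month)
    active_hours:      hours per month the node is actually running
    """
    disk_cost = disk_monthly_cost * active_hours / HOURS_PER_MONTH
    return instance_cost + disk_cost


active_hours = 1 * DAYS_PER_MONTH       # 1 hour/day
monthly = prorated_node_month_cost(
    instance_cost=86.55,                # from this article
    disk_monthly_cost=0.80,             # ASSUMPTION: 20 GB pd-standard at $0.04/GB/month
    active_hours=active_hours,
)
hourly = monthly / active_hours

print(f"{monthly:.3f} USD/node/month, {hourly:.3f} USD/node/hour")
```

With these inputs the disk contributes only a few cents per month, which is why the prorated total sits so close to the component cost.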
The Fluid Numerics usage fee for this instance is $0.17/node/hour (= $0.01/vCPU/hour x 8 vCPU/node + $0.09/GPU/hour x 1 GPU/node). Adding this fee to the 24/7 rate ($2.018/node/hour) and the 1 hour/day rate ($2.847/node/hour) gives a range of per-node-hour costs of $2.188/node/hour - $3.017/node/hour.
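To turn this range into a rough budget for a specific job, multiply it by the node-hours you expect to consume. The sketch below assumes the per-node-hour GCP rates from this article; the function name and the 4-node, 10-hour job are hypothetical and purely illustrative.

```python
# Per-node-hour GCP rates from this article (USD).
GCP_RATE_HEAVY_USE = 2.018   # 24/7 usage, includes 30% sustained use discount
GCP_RATE_LIGHT_USE = 2.847   # 1 hour/day usage

# Fluid Numerics usage fee for an n1-highcpu-8 with one V100.
USAGE_FEE = 0.01 * 8 + 0.09 * 1   # USD/node/hour

RATE_LOW = GCP_RATE_HEAVY_USE + USAGE_FEE
RATE_HIGH = GCP_RATE_LIGHT_USE + USAGE_FEE


def job_cost_range(nodes, hours):
    """Return (low, high) cost estimates in USD for a job (hypothetical helper)."""
    node_hours = nodes * hours
    return node_hours * RATE_LOW, node_hours * RATE_HIGH


print(f"rate range: {RATE_LOW:.3f} to {RATE_HIGH:.3f} USD/node/hour")

# Hypothetical job: 4 nodes for 10 hours.
low, high = job_cost_range(nodes=4, hours=10)
print(f"job estimate: {low:.2f} to {high:.2f} USD")
```

Where a given job falls in the range depends on how much sustained use discount the nodes accumulate over the month.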
Planning significant usage?
Many of us in HPC know what "power-users" are. These folks regularly run large jobs (100's-10,000's of CPUs / 10's-1000's of GPUs) for extended periods of time as part of their work. If you see your team as power-users in HPC, reach out to our team at email@example.com to discuss special licensing discounts, as well as services such as capacity planning and performance tuning/optimization for Google Cloud Platform that can help optimize your computing costs.