Common Errors

Invalid Authentication Credential

Problem

After you reboot your fluid-slurm-gcp cluster, Slurm commands may stop working from the login node. Most often this happens because the system clocks on the controller and login node are out of sync when the munge service starts on each node. When this occurs, commands like sinfo fail with errors like the following:

sinfo: error: If munged is up, restart with --num-threads=10

sinfo: error: Munge encode failed: Failed to access "/var/run/munge/munge.socket.2": No such file or directory

sinfo: error: slurm_send_node_msg: authentication: Invalid authentication credential
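
To tell this failure apart from other sinfo errors, you can match the command's output against the messages shown above. The following is a minimal sketch, not part of fluid-slurm-gcp: the function name is_munge_auth_error is hypothetical, and the patterns are taken only from the error text in this section.

```shell
# Hypothetical helper (not shipped with fluid-slurm-gcp or Slurm).
# Returns 0 when the given text contains one of the munge-related
# error messages listed in this section, and 1 otherwise.
is_munge_auth_error() {
  printf '%s\n' "$1" | grep -Eq 'Munge encode failed|Invalid authentication credential'
}

# Only call sinfo when it is installed, so the snippet is safe to run anywhere.
if command -v sinfo >/dev/null 2>&1; then
  output="$(sinfo 2>&1 || true)"
  if is_munge_auth_error "$output"; then
    echo "sinfo failed with a munge authentication error"
  fi
fi
```

If the message is printed on the login node, the failure matches the problem described here.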

Solution

To solve this problem:

  1. SSH into your cluster's controller.

  2. Restart the munge service:
    sudo service munge restart
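
Because the usual cause is clock drift between the controller and the login node, it can help to compare the two clocks before or after the restart. The sketch below is an illustration under assumptions: clock_skew_seconds is a hypothetical helper, and "controller" in the commented usage is a placeholder hostname, not a name defined by fluid-slurm-gcp.

```shell
# Hypothetical helper: print the absolute difference, in seconds,
# between two Unix epoch timestamps.
clock_skew_seconds() {
  if [ "$1" -ge "$2" ]; then
    echo $(( $1 - $2 ))
  else
    echo $(( $2 - $1 ))
  fi
}

# Example usage from the login node ("controller" is a placeholder hostname):
#   local_now="$(date -u +%s)"
#   remote_now="$(ssh controller date -u +%s)"
#   clock_skew_seconds "$local_now" "$remote_now"
#
# After restarting munge, a local round trip checks that the daemon responds:
#   munge -n | unmunge
```

A large value suggests the clocks should be synchronized before munge is restarted again. The ssh round trip adds a small delay, so treat the result as approximate.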

If you continue to experience issues, reach out to fluid-slurm-gcp@fluidnumerics.com for community support.