slurm_notifier
Man page for the SPANK slurm_notifier plugin
- Date: 30 Sep 2019
- Manual section: 7
- Manual group: LDMS sampler
SYNOPSIS
Within plugstack.conf: required OVIS_PREFIX/LIBDIR/ovis-ldms/libslurm_notifier.so stream=STREAM_NAME timeout=TIMEOUT_SEC [user_debug] client=XPRT:HOST:PORT:AUTH
…
DESCRIPTION
slurm_notifier is a SPANK plugin that notifies ldmsd about job events (e.g. job start, job termination) and related information (e.g. job_id, task_id, task process ID). Notifications are sent through the ldmsd_stream publish mechanism. See SUBSCRIBERS below for plugins known to consume the SPANK plugin messages.
stream=STREAM_NAME specifies the name of the stream to publish to. The default value is slurm.
timeout=TIMEOUT_SEC is the time-out, in seconds, of the LDMS connections (default 5).
user_debug, if present, enables sending certain plugin management debugging messages to the user’s slurm output (default: disabled; slurm_debug2() receives the messages instead).
client=XPRT:HOST:PORT:AUTH specifies the ldmsd to which slurm_notifier publishes the data. XPRT is the transport type; supported values include sock, rdma, ugni, and fabric. HOST is the hostname or IP address of the machine on which the ldmsd resides. PORT is the port on which the ldmsd listens. AUTH is the LDMS authentication method that the ldmsd uses (munge or none). The client option can be repeated to specify multiple ldmsd instances.
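The layout of the client option can be illustrated with a short Python sketch that splits XPRT:HOST:PORT:AUTH values into named fields. This is not the plugin's own parser: the names ClientSpec, parse_client, and clients_from_options are hypothetical, and the sketch assumes that HOST contains no ':' characters.

```python
from typing import List, NamedTuple


class ClientSpec(NamedTuple):
    """One client=XPRT:HOST:PORT:AUTH value, split into its four fields."""
    xprt: str
    host: str
    port: int
    auth: str


def parse_client(spec: str) -> ClientSpec:
    """Split an XPRT:HOST:PORT:AUTH string into named fields.

    Assumption: HOST contains no ':' (so no bare IPv6 literals).
    """
    parts = spec.split(":")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected XPRT:HOST:PORT:AUTH, got {spec!r}")
    xprt, host, port_text, auth = parts
    port = int(port_text)  # raises ValueError if PORT is not a number
    if not 0 < port < 65536:
        raise ValueError(f"port out of range in {spec!r}")
    return ClientSpec(xprt, host, port, auth)


def clients_from_options(options: str) -> List[ClientSpec]:
    """Collect every client=... token from a plugstack.conf option string."""
    prefix = "client="
    return [
        parse_client(token[len(prefix):])
        for token in options.split()
        if token.startswith(prefix)
    ]
```

For example, clients_from_options("stream=slurm client=sock:localhost:10000:munge client=sock:node0:10000:munge") yields two ClientSpec entries, one per repeated client option.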
WORKFLOW_ID
The WORKFLOW_ID environment variable from users’ job submission environment
is used to identify jobs that belong to the same “workflow”. The WORKFLOW_ID
is now a part of job event messages (job_init, step_init, task_init,
task_exit, step_exit and job_exit), and the value can be accessed by
msg["data"]["workflow_id"]. Please note that libslurm_notifier.so needs
to be in /etc/slurm/plugstack.conf on the job-submission node (e.g. head
node) so that sbatch/salloc can pass WORKFLOW_ID to slurmctld
(and subsequently slurmd).
Example job submission:
$ export WORKFLOW_ID=mywork_01
$ sbatch myjob.sh
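On the consuming side, reading the value via msg["data"]["workflow_id"] might look like the following Python sketch. It assumes the message arrives as JSON text; the sample message is hypothetical and shows only the fields named in this page, and the function name workflow_id_of is illustrative rather than part of LDMS.

```python
import json
from typing import Optional

# Hypothetical sample message; real messages carry more fields than shown.
SAMPLE_MESSAGE = '{"data": {"job_id": 1234, "workflow_id": "mywork_01"}}'


def workflow_id_of(raw_message: str) -> Optional[str]:
    """Return msg["data"]["workflow_id"], or None if it is absent."""
    msg = json.loads(raw_message)
    if not isinstance(msg, dict):
        return None
    data = msg.get("data")
    if not isinstance(data, dict):
        return None
    # Jobs submitted without WORKFLOW_ID set may lack this key (assumption).
    return data.get("workflow_id")


if __name__ == "__main__":
    print(workflow_id_of(SAMPLE_MESSAGE))  # prints mywork_01
```

Returning None instead of raising keeps a subscriber usable for jobs whose messages do not include a workflow ID.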
SUBSCRIBERS
The following plugins are known to process slurm_notifier messages:
slurm_sampler (collects slurm job & task data)
slurm_sampler2 (collects slurm job & task data)
papi_sampler (collects PAPI data from tasks identified)
linux_proc_sampler (collects /proc data from tasks identified)
EXAMPLES
/etc/slurm/plugstack.conf:
required /opt/ovis/lib64/ovis-ldms/libslurm_notifier.so stream=slurm timeout=5 client=sock:localhost:10000:munge client=sock:node0:10000:munge
SEE ALSO
spank(8), slurm_sampler(7), papi_sampler(7), linux_proc_sampler(7), ldmsd(8), ldms_quickstart(7)