ldmsd_smplrp
LDMSD Sampler Policy
- Date:
2026-04-07
- Manual section:
7
- Manual group:
LDMS
- Version:
LDMS 4.5
SYNOPSIS
smplrp_add name=NAME path=SMPLRP_JSON_PATH
smplrp_start name=NAME
smplrp_stop name=NAME
smplrp_del name=NAME
DESCRIPTION
An LDMSD Sampler Policy (smplrp) is a configuration object in LDMSD that
manipulates sampler plugin instances according to the configuration provided in
SMPLRP_JSON_PATH. Currently, smplrp supports policies with actions based
on Slurm job_init and job_exit events:
- load/term action:
  - load + config + start a new plugin instance on job_init event
  - term the plugin instance (with delay) on job_exit event
- interval modification action:
  - stop + start a plugin instance with the specified interval on job_init
    and/or job_exit events.
The JSON configuration format for smplrp is described in the following section.
smplrp JSON Configuration
The smplrp JSON file must have a dictionary as its root object, as follows:
{
"message_tag": "__STR__", # optional
"component_id": "__COMPONENT_ID__",
"job_event_actions": [
{
# action_1; see ACTIONS section below
},
{
# action_2; see ACTIONS section below
},
...
]
}
message_tag is an optional attribute that tells the policy which ldms_msg tag
to subscribe to in order to receive Slurm job information from the
slurm_notifier. The default value is "slurm".
The component_id value supplies the component_id=VALUE parameter of the
config command, which most sampler plugins require. The specified VALUE can
reference an environment variable, e.g.
"component_id": "${COMP_ID}"
job_event_actions is a LIST of action objects. The actions specified in
this list are executed on job_init and job_exit events.
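The root-object layout above can be checked before handing a file to smplrp_add. The following Python sketch is only an illustration: check_policy is a hypothetical helper, not part of LDMS, and treating a missing component_id as a problem is an assumption based on the skeleton above. Note that Python's json module rejects the # annotations used in the skeleton, so the sketch expects a file without them.

```python
import json

# The two action kinds named in this manual page.
KNOWN_ACTION_TYPES = {"load_term", "interval_mod"}

def check_policy(text):
    """Return a list of problems found in a smplrp JSON document (empty if none)."""
    try:
        root = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(root, dict):
        return ["root object must be a dictionary"]
    problems = []
    # Assumption: component_id is expected, since most sampler plugins need it.
    if "component_id" not in root:
        problems.append("missing component_id")
    actions = root.get("job_event_actions")
    if not isinstance(actions, list):
        problems.append("job_event_actions must be a list")
        return problems
    for i, action in enumerate(actions):
        kind = action.get("type") if isinstance(action, dict) else None
        if kind not in KNOWN_ACTION_TYPES:
            problems.append(f"action {i}: unknown type {kind!r}")
    return problems
```

An empty list means the sketch found nothing to complain about; it does not prove that ldmsd will accept the file.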
ACTIONS
There are currently two kinds of actions: load_term and interval_mod.
A load_term action loads, configures and starts a sampler plugin
instance on job_init event, and terminates the plugin instance (with delay) on
job_exit event.
An interval_mod action modifies the sampling interval of the target plugin
instance on job_init and job_exit events. For example, one might want to
sample at 10Hz while one or more jobs are running, and go back to 1Hz when
there are no jobs running.
Please see the LOAD_TERM ACTION and INTERVAL_MOD ACTION sections for their respective descriptions and configuration.
LOAD_TERM ACTION
load_term action acts on job_init and job_exit events.
On job_init event, the load_term action does the following:
1) Issues the `load name=NAME plugin=PLUGIN` command. The values of NAME and
   PLUGIN are determined from load_term["name"] and load_term["plugin"]
   respectively. load_term["name"] is treated as a STRING TEMPLATE (see STRING
   TEMPLATE section) where "%J" is substituted by the Slurm Job ID. To avoid
   plugin name collisions, it is advisable to include "%J" as a part of
   load_term["name"].
2) Issues the `config name=NAME producer=PRDCR instance=INST ...` command (or
   commands) to configure the plugin instance from 1) if load succeeded. The
   NAME, PRDCR and INST values are determined from load_term["name"],
   load_term["producer"] and load_term["instance"] (STRING TEMPLATE)
   respectively. The ... extra config command attributes are obtained from the
   attribute-values in the load_term["config"] object.
   In the case that the sampler plugin requires multiple config commands to
   work, a load_term["configs"] (notice the 's') LIST of config objects can be
   specified.
   REMARK: The values may contain environment variables (e.g. ${HOSTNAME}),
   and they will be used to construct the config command(s). The config
   handler logic in ldmsd_request will replace the values accordingly, as it
   does for a config command from a configuration file.
3) Issues the `start name=NAME interval=INTERVAL offset=OFFSET` command after
   successful config command(s). The NAME, INTERVAL and OFFSET are obtained
   from load_term["name"], load_term["interval"] and load_term["offset"]
   respectively.
On job_exit event, the load_term action schedules a timeout event
specified by load_term["term_delay"] that will later issue `term
name=NAME` command. If load_term["term_delay"] is not specified, the
term command is issued immediately on job_exit event.
The load_term action configuration format and attributes are described as
follows:
{
"type": "load_term",
"plugin": "__STR__", # [required] plugin name e.g. "meminfo"
"name": "__TEMPLATE__", # [optional] plugin instance name e.g. "foo-%J", see STRING TEMPLATE section
# default: "%N-%L-%J"
"producer": "__STR__", # [optional] e.g. "${HOSTNAME}"
# default: gethostname()
"instance": "__TEMPLATE__", # [optional] LDMS set name e.g. "${HOSTNAME}/foo/%J"
# default: "%P/%N-%L-%J"
"interval": "__TIME_STR__", # [optional] sampling interval e.g. "1s"
# default: "1s"
"offset": "__TIME_STR__", # [optional] sampling offset e.g. "200ms"
# default: "0s"
"term_delay": "__TIME_STR__", # [optional] plugin termination delay e.g. "10s"
# default: "0s"
"config": { # [optional]
# additional attribute-values for `config` command
"__ATTR0__": "__VALUE0__",
...
# The non-attribute-value config arguments (keywords) can be specified
# as attributes with a null value.
"__KEYWORD0__": null,
...
},
"configs": [
# [optional] Each object in this list will result in a config call. This
# is to support plugins that require multiple config calls.
{
# config_0
"__ATTR_0_0__": "__VALUE_0_0__",
...
"__KEYWORD_0_0__": null,
...
},
{
# config_1
"__ATTR_1_0__": "__VALUE_1_0__",
...
"__KEYWORD_1_0__": null,
...
},
...
]
}
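The command sequence that a load_term action produces can be sketched in Python. This is an illustration only, not ldmsd's implementation. The function name load_term_commands and its return layout are hypothetical. Environment variables such as ${HOSTNAME} are left unexpanded, the %W placeholder is not handled, and letting "configs" take precedence over "config" when both are present is an assumption.

```python
import re
import socket

def _expand(template, values):
    # Single pass over %<CHAR> placeholders so that "%%" stays a literal "%".
    return re.sub(r"%(.)", lambda m: values.get(m.group(1), m.group(0)), template)

def load_term_commands(action, policy_name, job_id, component_id):
    """Build the commands for one load_term action (hypothetical helper)."""
    plugin = action["plugin"]
    # Documented default for "producer" is gethostname().
    producer = action.get("producer") or socket.gethostname()
    values = {"J": str(job_id), "N": policy_name, "L": plugin,
              "P": producer, "C": str(component_id), "%": "%"}
    name = _expand(action.get("name", "%N-%L-%J"), values)
    instance = _expand(action.get("instance", "%P/%N-%L-%J"), values)
    # Assumption: "configs" (a list) wins over "config" (a single object).
    configs = action.get("configs") or [action.get("config", {})]

    job_init = [f"load name={name} plugin={plugin}"]
    for cfg in configs:
        cmd = (f"config name={name} producer={producer} "
               f"instance={instance} component_id={component_id}")
        for attr, value in cfg.items():
            # A null value marks a bare keyword argument.
            cmd += f" {attr}" if value is None else f" {attr}={value}"
        job_init.append(cmd)
    interval = action.get("interval", "1s")
    offset = action.get("offset", "0s")
    job_init.append(f"start name={name} interval={interval} offset={offset}")

    return {
        "job_init": job_init,
        "job_exit": [f"stop name={name}"],
        # The term command is issued after term_delay has elapsed.
        "delayed": (action.get("term_delay", "0s"), f"term name={name}"),
    }
```

Calling it with {"plugin": "procnetdev2", "producer": "node020"}, policy name "sp0", job ID 10 and component ID "20" yields a load, a config and a start command for the instance sp0-procnetdev2-10.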
INTERVAL_MOD ACTION
interval_mod action (imod for short) acts on job_init and
job_exit events.
On the job_init event of the FIRST tenant, the imod action performs stop +
start on the targeted plugin with the new interval from imod["job_init"].
If the job is not the first tenant (other jobs started before it and are
still running), imod ignores the job_init event.
On the job_exit event of the LAST tenant, the imod action performs stop +
start on the targeted plugin with the new interval from imod["job_exit"].
If the job is not the last tenant (other jobs are still running), imod
ignores the job_exit event.
{
"type": "interval_mod",
"name": "__TEMPLATE__", # [required] identifier of the plugin to modify.
# Also see STRING TEMPLATE section.
"job_init": { # [required] The sampling interval for `job_init` event
"interval": "__TIME_STR__", # [required]
"offset": "__TIME_STR__" # [optional] default "0s"
},
"job_exit": { # [required] The sampling interval for `job_exit` event
"interval": "__TIME_STR__", # [required]
"offset": "__TIME_STR__" # [optional] default "0s"
}
}
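The first-tenant and last-tenant rule for interval_mod can be sketched as a small Python class that tracks the set of running jobs. This is a hedged illustration, not ldmsd's code: the IntervalMod class is hypothetical, the "name" value is used as-is without template expansion, and ignoring a job_exit for a job that was never seen is an assumption.

```python
class IntervalMod:
    """Track running jobs and emit stop/start only on tenancy edges (sketch)."""

    def __init__(self, action):
        self.action = action
        self.running = set()  # job IDs currently running

    def _restart(self, event):
        spec = self.action[event]
        name = self.action["name"]
        offset = spec.get("offset", "0s")  # documented default
        return [f"stop name={name}",
                f"start name={name} interval={spec['interval']} offset={offset}"]

    def job_init(self, job_id):
        first_tenant = not self.running
        self.running.add(job_id)
        return self._restart("job_init") if first_tenant else []

    def job_exit(self, job_id):
        if job_id not in self.running:
            return []  # assumption: unknown jobs are ignored
        self.running.remove(job_id)
        return self._restart("job_exit") if not self.running else []
```

With two overlapping jobs, only the first job_init and the last job_exit return commands; the events in between return an empty list.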
STRING TEMPLATE
Some configuration values (e.g. load_term["name"]) support string template
expansion. The special %<CHAR> placeholders will be replaced as follows:
- %J: Slurm Job ID.
- %C: smplrp["component_id"].
- %P: load_term["producer"].
- %L: load_term["plugin"].
- %N: the policy name (from smplrp_add command).
- %W: the WORKFLOW_ID (from job env).
- %%: literal %.
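The placeholder rules above can be sketched in Python as follows. The function expand_template is hypothetical and is not how ldmsd implements the expansion. Leaving unknown placeholders untouched is an assumption, since this page does not say how they are treated.

```python
import re

def expand_template(template, job_id, component_id, producer, plugin,
                    policy, workflow_id=""):
    """Expand %<CHAR> placeholders in a smplrp string template (sketch)."""
    values = {
        "J": str(job_id),        # Slurm Job ID
        "C": str(component_id),  # smplrp["component_id"]
        "P": producer,           # load_term["producer"]
        "L": plugin,             # load_term["plugin"]
        "N": policy,             # policy name from smplrp_add
        "W": workflow_id,        # WORKFLOW_ID from the job environment
        "%": "%",                # "%%" becomes a literal "%"
    }

    def substitute(match):
        # Assumption: an unknown placeholder is left unchanged.
        return values.get(match.group(1), match.group(0))

    # One left-to-right pass, so "%%J" yields "%J" rather than a job ID.
    return re.sub(r"%(.)", substitute, template)
```

For instance, the default name template "%N-%L-%J" with policy "sp0", plugin "procnetdev2" and job ID 10 expands to "sp0-procnetdev2-10".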
Example 1
This example shows minimal configuration for both load_term action and
interval_mod action. Attributes that have default values are omitted.
ldmsd.conf file:
msg_enable
smplrp_add name=sp0 path=sp0.json
smplrp_start name=sp0
sp0.json file:
{
"component_id": "${COMPONENT_ID}",
"job_event_actions": [
{
"type": "load_term",
"plugin": "procnetdev2"
},
{
"type": "interval_mod",
"name": "meminfo",
"job_init": { "interval": "0.1s" },
"job_exit": { "interval": "1s" }
}
]
}
To explain concretely what happens on job_init and job_exit events,
suppose that:
- ldmsd is running on node020.
- The ${COMPONENT_ID} environment variable is set to "20".
on job_init
When job_init of job_id 10 arrives, the following commands are
constructed and issued to ldmsd:
load name=sp0-procnetdev2-10 plugin=procnetdev2
config name=sp0-procnetdev2-10 producer=node020 instance=node020/sp0-procnetdev2-10 component_id=20
start name=sp0-procnetdev2-10 interval=1s offset=0
stop name=meminfo
start name=meminfo interval=0.1s offset=0
The first 3 lines are derived from the first action (load_term). The
name default value is %N-%L-%J (policy name - plugin - job_id), which is
expanded to “sp0-procnetdev2-10”. The default value of producer is the value
from gethostname(), which is “node020” in this case. The instance
default value is %P/%N-%L-%J, which is expanded to
“node020/sp0-procnetdev2-10”. The default interval is “1s”, and the default
offset is “0”.
The last 2 lines are derived from the 2nd action (interval_mod) using the
interval from interval_mod["job_init"].
on job_exit
When job_exit of job_id 10 arrives, the following commands are issued to
ldmsd:
stop name=sp0-procnetdev2-10
term name=sp0-procnetdev2-10
stop name=meminfo
start name=meminfo interval=1s offset=0
The first two lines are from the 1st action (load_term). Note that in this
case the plugin instance is immediately terminated because the term_delay is
not set (default: 0).
The last 2 lines are derived from the 2nd action (interval_mod) using the
interval from interval_mod["job_exit"].
Example 2
This is an example that specifies all values.
ldmsd.conf file:
msg_enable
smplrp_add name=sp0 path=sp0.json
smplrp_start name=sp0
sp0.json file:
{
"message_tag": "myslurm",
"component_id": "${COMPONENT_ID}",
"job_event_actions": [
{
"type": "load_term",
"plugin": "procnetdev2",
"name": "mynetdev-%J",
"producer": "${HOSTNAME}",
"instance": "${HOSTNAME}/mynetdev-%J",
"interval": "2s",
"offset": "0s",
"term_delay": "10s",
"config": {
"ifaces": "lo,eth0"
}
},
{
"type": "interval_mod",
"name": "vmstat",
"job_init": { "interval": "0.1s", "offset": "0s" },
"job_exit": { "interval": "1s", "offset": "0.1s" }
},
{
"type": "interval_mod",
"name": "procstat",
"job_init": { "interval": "0.1s", "offset": "0s" },
"job_exit": { "interval": "1s", "offset": "0.2s" }
}
]
}
In this example, the sp0 sampler policy subscribes to the myslurm message tag.
To explain concretely what happens on job_init and job_exit events,
suppose that:
- ldmsd is running on node020.
- The ${COMPONENT_ID} environment variable is set to "20".
on job_init
When job_init of job_id 10 arrives, the following commands are
constructed and issued to ldmsd:
load name=mynetdev-10 plugin=procnetdev2
config name=mynetdev-10 producer=node020 instance=node020/mynetdev-10 component_id=20 ifaces=lo,eth0
start name=mynetdev-10 interval=2s offset=0s
stop name=vmstat
start name=vmstat interval=0.1s offset=0s
stop name=procstat
start name=procstat interval=0.1s offset=0s
The first chunk (3 lines) is from action 1 (load_term), the second chunk (2
lines) is from action 2 (interval_mod on vmstat), and the third chunk (2
lines) is from action 3 (interval_mod on procstat).
on job_exit
When job_exit of job_id 10 arrives, the following commands are issued to
ldmsd:
stop name=mynetdev-10
stop name=vmstat
start name=vmstat interval=1s offset=0.1s
stop name=procstat
start name=procstat interval=1s offset=0.2s
The first chunk is from action 1 (load_term). This time, sp0 only stops
the instance because term_delay is set to 10s. The term command
is scheduled to execute 10s in the future.
The second chunk is from action 2 (interval_mod on vmstat), using
interval from job_exit.
The third chunk is from action 3 (interval_mod on procstat), using
interval from job_exit.
After 10 seconds have passed, term name=mynetdev-10 is issued to
ldmsd.