netlink-notifier

Transmit Linux kernel netlink process life messages to ldmsd streams.

Date: 25 June 2021

Manual section: 8

Manual group: LDMS sampler

ldms-notify - systemd service

SYNOPSIS

ldms-netlink-notifier [OPTION…]

DESCRIPTION

The netlink-notifier generates JSON messages for ldmsd and JSON-aware LDMS samplers. Its messages are mostly compatible with those from the slurm spank-based notifier.

OPTIONS

-c   use task comm field for process name.
-d   strip off directory path from process name.
-D   specify run duration in seconds. If unspecified, run forever.
-e   select which events to monitor.
-E   equivalent to -e all.
-g   show glyphs for event types in debug mode.
-h   show this help.
-i seconds    time (float) to sleep between checks for processes exceeding the short dir filter time.
              If the -i value > the -m value, -i may effectively filter out additional processes.
-j file       file to log json messages and transmission status.
-J file       file to dump json messages format examples to.
-l   force stdout line buffering.
-L file      log to file instead of stdout.
-r   run with real time FIFO scheduler (available on some kernels).
-s   show short process name in debugging.
-S   suppress stream message publication.
-t   show debugging trace messages.
-u umin      ignore processes with uid < umin
-v lvl  log level for stream library messages. Higher is quieter. Error messages are >= 3.
-q   run quietly
-x   show extra process information.
-X   equivalent to -Egrx.
The ldmsd connection and commonly uninteresting or short-lived processes may be specified with the options or environment variables below.
The 'short' options do not override the options that exclude processes entirely.
--exclude-programs[=]<path>   change the default value of exclude-programs
      When repeated, all values are concatenated.
      If given with no value, the default (nullexe):<unknown> is removed.
      If not given, the default is used unless
      the environment variable NOTIFIER_EXCLUDE_PROGRAMS is set.
--exclude-dir-path[=]<path>   change the default value of exclude-dir-path
      When repeated, all values are concatenated.
      If given with no value, the default /sbin is removed.
      If not given, the default is used unless
      the environment variable NOTIFIER_EXCLUDE_DIR_PATH is set.
--exclude-short-path[=]<path>         change the default value of exclude-short-path
      When repeated, all values are concatenated.
      If given with no value, the default /bin:/usr is removed.
      If not given, the default is used unless
      the environment variable NOTIFIER_EXCLUDE_SHORT_PATH is set.
--exclude-short-time[=][val]  change the default value of exclude-short-time.
      If repeated, the last value given wins.
      If given with no value, the default 1 becomes 0 unless
      the environment variable NOTIFIER_EXCLUDE_SHORT_TIME is set.
--stream[=]<val>      change the default value of stream.
      If repeated, the last value given wins.
      The default slurm is used if env NOTIFIER_LDMS_STREAM is not set.
--xprt[=]<val>        change the default value of xprt.
      If repeated, the last value given wins.
      The default sock is used if env NOTIFIER_LDMS_XPRT is not set.
--host[=]<val>        change the default value of host.
      If repeated, the last value given wins.
      The default localhost is used if env NOTIFIER_LDMS_HOST is not set.
--port[=]<val>        change the default value of port.
      If repeated, the last value given wins.
      The default 411 is used if env NOTIFIER_LDMS_PORT is not set.
--auth[=]<val>        change the default value of auth.
      If repeated, the last value given wins.
      The default munge is used if env NOTIFIER_LDMS_AUTH is not set.
--reconnect[=]<val>   change the default value of reconnect.
      If repeated, the last value given wins.
      The default 600 is used if env NOTIFIER_LDMS_RECONNECT is not set.
--timeout[=]<val>     change the default value of timeout.
      If repeated, the last value given wins.
      The default 1 is used if env NOTIFIER_LDMS_TIMEOUT is not set.
--track-dir[=]<path>     change the pids published directory.
      The default is used if env NOTIFIER_TRACK_DIR is not set.
      The path given should be on a RAM-based file system for efficiency,
      and it should not contain any files except those created by
      this daemon. When enabled, track-dir will be populated even if
      -S is used to suppress the stream output.
--purge-track-dir    if track-dir is set, purge any files there
      which do not correspond to current processes.
      Equivalently, NOTIFIER_PURGE_TRACK_DIR may be set.
--component_id=<U64>     set the value of component_id.
      If not set, the component_id field is not included in the stream formats produced.
--ProducerName=<name>    set the value of ProducerName
      If not set, the ProducerName field is not included in the stream formats produced.
--format=N           change the format of messages to version N.
         If not set, the highest available format is used. See MESSAGE FORMATS.
--jobid-file=FILE    look for job_id numbers in FILE. The default is not to look
     for a job id file if this option is not given nor NOTIFIER_JOBID_FILE is defined.
     See JOB ID FILES for details.

ENVIRONMENT

The following variables override defaults if a command line option is not present, as described in the options section.

NOTIFIER_EXCLUDE_PROGRAMS="(nullexe):<unknown>"
NOTIFIER_EXCLUDE_DIRS=/sbin
NOTIFIER_EXCLUDE_SHORT_PATH=/bin:/usr
NOTIFIER_EXCLUDE_SHORT_TIME=1
NOTIFIER_TRACK_DIR=/var/run/ldms-netlink-tracked
NOTIFIER_LDMS_RECONNECT=600
NOTIFIER_LDMS_TIMEOUT=1
NOTIFIER_LDMS_STREAM=slurm
NOTIFIER_LDMS_XPRT=sock
NOTIFIER_LDMS_HOST=localhost
NOTIFIER_LDMS_PORT=411
NOTIFIER_LDMS_AUTH=munge
NOTIFIER_FORMAT=3
NOTIFIER_HEARTBEAT=(none)
NOTIFIER_PURGE_TRACK_DIR
NOTIFIER_JOBID_FILE=(none)

Omitting (nullexe):<unknown> from NOTIFIER_EXCLUDE_PROGRAMS may cause incomplete output related to processes no longer present. In exotic circumstances, this may be desirable. The value of NOTIFIER_PURGE_TRACK_DIR is not used to enable purge, just its presence.
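
The precedence described above (command line option first, then environment variable, then built-in default) can be sketched as follows. This is an illustrative sketch only, not the notifier's own code; the function name resolve_setting and its parameters are hypothetical, and the special handling of the exclude options given with no value is not modeled.

```python
import os


def resolve_setting(cli_value, env_name, default, environ=None):
    """Pick the effective value: command line, then environment, then default.

    resolve_setting is a hypothetical helper used only for illustration.
    cli_value is None when the option was not given on the command line.
    """
    environ = os.environ if environ is None else environ
    if cli_value is not None:
        return cli_value          # an explicit option always wins
    if env_name in environ:
        return environ[env_name]  # otherwise the environment variable, if set
    return default                # otherwise the documented default


if __name__ == "__main__":
    # Effective port when no --port option is given.
    print(resolve_setting(None, "NOTIFIER_LDMS_PORT", "411"))
```

For example, with NOTIFIER_LDMS_PORT unset and no --port option, the sketch yields the documented default of 411.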

FILES

Users or other processes may discover which processes are the subject of notifications by examining the files in

/NOTIFIER_TRACK_DIR/*

For each pid started event which would be emitted to an LDMS stream, a temporary file with the name of the pid is created in NOTIFIER_TRACK_DIR. The file will contain the json event attempted. The temporary file will be removed when the corresponding pid stopped event is sent. These files are not removed when the notifier daemon exits, so that they will be found after a restart. Client applications may validate a file by checking the contents against the /proc/$pid/stat content, if it exists. Invalid files should be removed by clients or system scripts; the purge option is provided to optionally do this on start.
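
A client-side cleanup pass over the tracking directory might look like the sketch below. It only checks whether a pid-named file still has a matching /proc/<pid>/stat entry, which is weaker than comparing the file contents against /proc/$pid/stat (a pid can be reused by an unrelated process). The function name stale_track_files and the proc_root parameter are hypothetical and exist only for this illustration.

```python
import os


def stale_track_files(track_dir, proc_root="/proc"):
    """List pid-named files in track_dir that have no live /proc/<pid>/stat.

    Hypothetical helper: existence of the stat file is used as a rough
    liveness check; it does not compare the JSON contents with the stat data.
    """
    stale = []
    for name in sorted(os.listdir(track_dir)):
        if not name.isdigit():
            continue  # not a pid-named file; leave it alone
        if not os.path.exists(os.path.join(proc_root, name, "stat")):
            stale.append(os.path.join(track_dir, name))
    return stale


if __name__ == "__main__":
    track_dir = os.environ.get("NOTIFIER_TRACK_DIR", "/var/run/ldms-netlink-tracked")
    if os.path.isdir(track_dir):
        for path in stale_track_files(track_dir):
            print("stale:", path)
```

The sketch only reports candidates; whether to delete them is left to the caller, since the --purge-track-dir option already covers removal at daemon start.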

JOB ID FILES

The job id file must contain KEY=VALUE pairs, one per line. Lines starting with # are ignored. If the filename given is /search, a list of default locations is checked (/var/run/ldms_jobinfo.data, /var/run/ldms.slurm.jobinfo, /var/run/ldms.jobinfo). The following variable names are looked for in the job id file, and the first one found is used: JOBID, JOB_ID, LSB_JOBID, PBS_JOBID, SLURM_JOBID, SLURM_JOB_ID.
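
The lookup described above can be sketched as follows. This is not the notifier's parser: the function name parse_jobid_text is hypothetical, "first found" is assumed to mean the first name in the documented list order (not the first line in the file), and quoting of values is not handled.

```python
# Variable names in the order documented above.
JOBID_KEYS = ("JOBID", "JOB_ID", "LSB_JOBID", "PBS_JOBID", "SLURM_JOBID", "SLURM_JOB_ID")


def parse_jobid_text(text):
    """Return the job id from KEY=VALUE text, or None if no known key is present.

    Hypothetical helper. Assumption: keys are tried in JOBID_KEYS order.
    """
    pairs = {}
    for line in text.splitlines():
        line = line.strip()  # assumption: surrounding whitespace is not significant
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        pairs[key.strip()] = value.strip()
    for key in JOBID_KEYS:
        if key in pairs:
            return pairs[key]
    return None


if __name__ == "__main__":
    sample = "# written by the scheduler prolog\nUSER=alice\nSLURM_JOB_ID=42\n"
    print(parse_jobid_text(sample))
```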

MESSAGE FORMATS

Message formats tuned to SLURM, LSF, and Linux without a batch scheduler are published, based on what the notifier detects and the user's choice of ProducerName and component_id. The version of the tuned formats is specified by number. If started with the -J option, an example of each available message format is dumped to the specified file.

Format 0 omits the start time from slurm process end messages (since it is only sometimes known) and omits process duration, which depends on the start time.

Format 1 includes the start time for slurm processes (or the dummy value 0 when unknown) and includes process duration for all end messages. When the start time is unavailable, a duration of -1.0 is published. Merging data from other sources may allow durations flagged as -1 to be computed in some later data cleanup step.

Format 2 extends process end messages with the executable name in field exe. When this is not available, an exe value of /no-exe-data is published. Merging data from other sources may allow exe flagged as /no-exe-data to be computed in some later data cleanup step.

Format 3 harmonizes schemas across linux, slurm, and lsf task types so that all may be stored in common tables for task_exit and task_init events if slurm specific fields are omitted from the storage.

NOTES

The core of this utility is derived from forkstat(8).

The output of this utility, if used to drive a sampler, usually needs to be consumed on the same node.

If not used with a sampler, the --component_id or --ProducerName options are needed to add a node identifier to the messages. Normally a process-following sampler that creates sets will add the node identifier automatically.

When the daemon is started after a process is started, the process start time and therefore process duration may not be available. Similarly exe may not be available. In message formats which report start time, 0 indicates data was unavailable. For processes without completely known time bounds, the duration is reported as -1.0. For processes without known program paths, exe is reported as /no-exe-data.
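
A later cleanup step could turn these placeholder values into explicit missing values before storage or merging. The sketch below assumes a flat dictionary with the keys start, duration, and exe; only exe is named in this page, the other two key names are hypothetical, and the real layout should be checked against the examples written by the -J option.

```python
def normalize_sentinels(event):
    """Return a copy of event with 'unknown' placeholder values replaced by None.

    Hypothetical helper. Assumed keys: "start" and "duration" (names not
    confirmed by this page) and "exe" (named in the format 2 description).
    """
    cleaned = dict(event)
    if cleaned.get("start") == 0:
        cleaned["start"] = None      # 0 means the start time was unavailable
    if cleaned.get("duration") == -1.0:
        cleaned["duration"] = None   # -1.0 means the time bounds were incomplete
    if cleaned.get("exe") == "/no-exe-data":
        cleaned["exe"] = None        # program path was not known
    return cleaned


if __name__ == "__main__":
    print(normalize_sentinels({"start": 0, "duration": -1.0, "exe": "/no-exe-data"}))
```

Marking the values as missing, rather than keeping 0 or -1.0, avoids treating the placeholders as real measurements when computing statistics over durations.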

Several options affect only the trace output.

The check for sufficient privilege occurs after -J and --help options are processed.

EXAMPLES

To run for 30 seconds with test output to the screen and json.log, connecting to the ldmsd started by the ‘ldms-static-test.sh blobwriter’ test:

netlink-notifier -t -D 30 -g -u 1 -x  -e exec,clone,exit  \
     -j json.log --exclude-dir-path=/bin:/sbin:/usr \
     --port=61061 --auth=none --reconnect=1

To run in a typical deployment (sock, munge, port 411, localhost, forever, 10 minute reconnect):

netlink-notifier

To run in a systemd .service wrapper, excluding root-owned processes:

EnvironmentFile=-/etc/sysconfig/ldms-netlink-notifier.conf
ExecStart=/usr/sbin/ldms-netlink-notifier -u 1 -x -e exec,clone,exit

To run in a systemd .service wrapper, excluding root-owned processes, with debugging files:

EnvironmentFile=-/etc/sysconfig/ldms-netlink-notifier.conf
ExecStart=/usr/sbin/ldms-netlink-notifier -u 1 -x -e exec,clone,exit -j /home/user/nl.json -L /home/user/nl.log -t --ProducerName=%H

SEE ALSO

forkstat(8), ldmsd(8), ldms-static-test(8)


© Copyright 2025, Sandia National Laboratories and Open Grid Computing, Inc.
