ldmsd

Date:

28 Feb 2018

NAME

ldmsd - Start an ldms daemon

SYNOPSIS

ldmsd [OPTION…]

DESCRIPTION

The ldmsd command can be used to start an ldms daemon. Plugin configuration of the ldmsd can be done via a configuration file or the ldmsd_controller.

Starting ldmsd with the configuration file option enables you to statically configure a sampler without requiring python. Dynamically configuring samplers with ldmsd_controller requires python. Currently, v2’s ldmsctl can still be used to dynamically configure a sampler without requiring python. This capability will be replaced, and its use is not recommended.

ENVIRONMENT

The ldmsd-check-env program will dump currently set environment variables that may influence ldmsd and plugin behavior.

The following environment variables must often be set:

LD_LIBRARY_PATH

Path to ovis/lib and libevent2/lib, if not in a system default path. Depending on the system these may be lib64 instead of lib.

PATH

Include the path to the sbin directory containing ldmsd.
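
As an illustration of the two variables above, a shell profile fragment might look like the following sketch. The prefixes /opt/ovis and /opt/libevent2 are assumptions chosen for the example, not values mandated by ldmsd; substitute the local installation paths (and lib64 where applicable).

```shell
# Assumed install prefixes (hypothetical); adjust to the local system.
OVIS_PREFIX=/opt/ovis
LIBEVENT_PREFIX=/opt/libevent2

# Prepend the library directories, keeping any existing value.
# The ${VAR:+...} form avoids a trailing ':' when the variable is unset.
export LD_LIBRARY_PATH="$OVIS_PREFIX/lib:$LIBEVENT_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Make the ldmsd binary in sbin reachable.
export PATH="$OVIS_PREFIX/sbin:$PATH"
```

Running ldmsd-check-env afterwards is one way to confirm what the daemon will see.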

The following environment variables may be set to override compiled-in defaults:

ZAP_LIBPATH

Path to ovis/lib/ovis-ldms

LDMSD_PLUGIN_LIBPATH

Path to ovis/lib/ovis-ldms

LDMSD_PIDFILE

Full path name of pidfile overriding the default /var/run/ldmsd.pid unless the command line argument “-r pidfilepath” is present.

LDMSD_LOG_TIME_SEC

If present, log messages are stamped with the epoch time rather than the date string. This is useful when sub-second information is desired or when correlating log messages with other epoch-stamped data.

LDMSD_SOCKPATH

Path to the unix domain socket for the ldmsd. Default is created within /var/run. If you must change the default (e.g., not running as root and hence /var/run is not writeable), set this variable (e.g., /tmp/run/ldmsd) or specify “-S socketpath” to ldmsd.

LDMSD_MEM_SZ

The size of memory reserved for metric sets. Set this variable or specify “-m” to ldmsd. See the -m option for further details. If both are specified, the -m option takes precedence over this environment variable.

LDMSD_UPDTR_OFFSET_INCR

The increment to the offset hint in microseconds. This is only for updaters that determine the update interval and offset automatically. For example, if the offset hint is 100000 (100 milliseconds into the second), the updater offset will be 100000 + LDMSD_UPDTR_OFFSET_INCR. The default is 100000 (100 milliseconds).
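
The arithmetic described above can be sketched in shell as follows. The variable names hint, incr, and offset are illustrative only, and the snippet mirrors the description rather than the daemon's internal code.

```shell
# Offset hint advertised by the sampler, in microseconds (example value from the text).
hint=100000

# Use the environment variable if set, otherwise the documented default of 100000.
incr="${LDMSD_UPDTR_OFFSET_INCR:-100000}"

# The updater offset is the hint plus the increment.
offset=$((hint + incr))
echo "updater offset: ${offset} microseconds"
```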

CRAY Specific Environment variables for ugni transport

ZAP_UGNI_PTAG

For XE/XK, the PTag value as given by apstat -P. For XC, the value does not matter but the environment variable must be set.

ZAP_UGNI_COOKIE

For XE/XK, the Cookie value corresponding to the PTag value as given by apstat -P. For XC, the Cookie value (not Cookie2) as given by apstat -P.

ZAP_UGNI_CQ_DEPTH

Optional value for the CQ depth. Default is 2048.

ZAP_UGNI_STATE_INTERVAL

Optional. If set, ldmsd will check all nodes’ states via the rca interface. States for all nodes are checked and stored at intervals determined by this environment variable. The stored values are consulted before contacting a node. If you choose to use this option, the rule of thumb is to set ZAP_UGNI_STATE_INTERVAL and ZAP_UGNI_STATE_OFFSET such that the node states are checked before the metric set update occurs (see interval and offset in ldmsd_controller).

ZAP_UGNI_STATE_OFFSET

Optional. Only relevant if ZAP_UGNI_STATE_INTERVAL is set. Defaults to zero. Offset from zero for checking the node states (see ZAP_UGNI_STATE_INTERVAL, above).

OPTIONS

General/Configuration Options:

-F

Run in foreground mode; don’t daemonize the program. Default is false.

-B, --banner version-file-mode [0, 1, 2]

When run in daemon mode, controls the existence of the banner file. Mode 0 suppresses the version file. Mode 1 deletes it at daemon exit. Mode >= 2 leaves it in place for debugging after daemon exit. Default mode is 1. The banner contains the software and protocol versions information, which is also logged at the INFO level. The banner file name is always the pidfile name with .version appended.

-c CONFIG_PATH

The path to the configuration file (optional, default: <none>). The configuration file contains a batch of ldmsd controlling commands, such as `load` for loading a plugin, and `prdcr_add` for defining an ldmsd producer to aggregate from (see ldmsd_controller(8) for a complete list of commands, or simply run ldmsd_controller then help). The commands in the configuration file are executed sequentially, except for prdcr_start, updtr_start, strgp_start, and failover_start, which are deferred. If failover_start is present, the failover service will start first (among the deferred commands). Then, upon failover pairing success or failure, the other deferred configuration objects will be started. Please also note that while the failover service is in use, prdcr, updtr, and strgp cannot be altered (started, stopped, or reconfigured) over in-band configuration. See also REORDERED COMMANDS below.

-m, --set_memory MEMORY_SIZE

MEMORY_SIZE is the maximum size of pre-allocated memory for metric sets. The given size must be less than 1 petabyte. For example, 20M or 20mb are 20 megabytes. The default is adequate for most ldmsd acting in the collector role. For aggregating ldmsd, a rough estimate of preallocated memory needed is (Number of nodes aggregated) x (Number of metric sets per node) x 4k. Data sets containing arrays may require more. The estimate can be checked by enabling DEBUG logging and examining the mm_stat bytes_used+holes value at ldmsd exit.
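
The rough estimate for an aggregator can be worked through in shell as in the sketch below. The node and set counts are hypothetical, and 4k is assumed here to mean 4096 bytes.

```shell
# Hypothetical aggregator workload.
nodes=1000          # number of nodes aggregated
sets_per_node=5     # number of metric sets per node

# (nodes) x (sets per node) x 4k, with 4k assumed to be 4096 bytes.
bytes=$((nodes * sets_per_node * 4096))

# Round up to whole megabytes for use with -m.
mem_mb=$(( (bytes + 1048575) / 1048576 ))
echo "suggested option: -m ${mem_mb}M"
```

Since sets containing arrays may need more, the result is best treated as a lower bound and checked against the mm_stat value mentioned above.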

-n, --daemon_name NAME

The name of the daemon. By default, it is “HOSTNAME:PORT”. The failover feature uses the daemon name to verify the buddy name, and the producer name of kernel metric sets is the daemon name.

-r, --pid_file pid_file

The path to the pid file and prefix of the .version banner file for daemon mode.
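
The precedence between -r, LDMSD_PIDFILE, and the compiled-in default, together with the banner file naming rule, can be illustrated with the following sketch. The function name resolve_pidfile is hypothetical and the code only restates the documented rules; it is not ldmsd's implementation.

```shell
# Print the pid file path that would be in effect.
# $1: the value passed to -r, or an empty string if -r was not given.
resolve_pidfile() {
    if [ -n "$1" ]; then
        # A command-line -r value overrides everything else.
        echo "$1"
    else
        # Otherwise use LDMSD_PIDFILE, falling back to the documented default.
        echo "${LDMSD_PIDFILE:-/var/run/ldmsd.pid}"
    fi
}

pidfile=$(resolve_pidfile "")
# The banner file is always the pid file name with .version appended.
banner="${pidfile}.version"
echo "pid file: $pidfile, banner file: $banner"
```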

-V

Display LDMS version information and then exit.

-u plugin_name

Display the usage for the named plugin. Special names all, sampler, and store match all, sampler type, and store type plugins, respectively.

Communication Options:

-x XPRT:PORT:HOST

Specifies the transport type to listen on. May be specified more than once for multiple transports. The XPRT string is one of ‘rdma’, ‘sock’, or ‘ugni’ (CRAY XE/XK/XC). A transport specific port number must be specified following a ‘:’, e.g. rdma:10000. An optional host or address may be specified after the port, e.g. rdma:10000:node1-ib, to listen to a specific address.

The listening transports can also be specified in the configuration file using the listen command, e.g. `listen xprt=sock port=1234 host=node1-ib`. Please see ldmsd_controller(8) section LISTEN COMMAND SYNTAX for more details.
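
To show how the two notations relate, the following sketch converts an XPRT:PORT[:HOST] string into the equivalent listen line. The helper name xprt_to_listen is hypothetical and is not part of the LDMS tools.

```shell
# Convert "XPRT:PORT" or "XPRT:PORT:HOST" into a configuration-file listen line.
xprt_to_listen() {
    spec=$1
    xprt=${spec%%:*}      # text before the first ':'
    rest=${spec#*:}       # text after the first ':'
    port=${rest%%:*}      # port is the next field
    case $rest in
        *:*) host=${rest#*:} ;;   # optional host or address follows the port
        *)   host= ;;
    esac
    if [ -n "$host" ]; then
        echo "listen xprt=$xprt port=$port host=$host"
    else
        echo "listen xprt=$xprt port=$port"
    fi
}

xprt_to_listen rdma:10000:node1-ib
xprt_to_listen sock:411
```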

-a, --default_auth AUTH

Specify the default LDMS Authentication method for the LDMS connections in this daemon (when the connections do not specify authentication method/domain). Please see ldms_authentication(7) for more information. If this option is not given, the default is “none” (no authentication). Also see ldmsd_controller(8) section AUTHENTICATION COMMAND SYNTAX for how to define an authentication domain.

-A, --default_auth_args NAME=VALUE

Pass the NAME=VALUE option to the LDMS Authentication plugin. This command line option can be given multiple times. Please see ldms_authentication(7) for more information, and consult the plugin manual page for plugin-specific options.

Log Verbosity Options:

-l, --log_file LOGFILE

LOGFILE is the path to the log file for status messages. Default is stdout unless given. The syslog facility is used if LOGFILE is exactly “syslog”. Silence can be obtained by specifying /dev/null for the log file or using command line redirection as illustrated below.

-v, --log_level LOG_LEVEL

LOG_LEVEL can be one of DEBUG, INFO, ERROR, CRITICAL or QUIET. The default level is ERROR. QUIET produces only user-requested output. (Note: this has changed from the previous release where q designated no (QUIET) logging).

-t

Truncate the log file if it already exists.

-L, --log_config <CINT:PATH> | <CINT> | <PATH>

Append configuration replay messages or configuration debugging messages to the log indicated by -l (when PATH is omitted) or to the file named PATH. Bit values of CINT correspond to:
 0: no messages
 1: debug messages from the generic 'request' handler
 2: config history messages in replayable format
 4: query history messages in replayable format
 8: failover debugging messages
16: include delta time prefix when using PATH
32: include epoch timestamp prefix when using PATH

These values may be added together to enable multiple outputs. All messages are logged at the user-requested level, LDMSD_LALL. CINT values 2, 26 and 27 are often interesting. When CINT is omitted, 1 is the default. When PATH is used, the log messages are flushed to the file as they are generated.
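
Because CINT is a sum of bit values, a combined value can be broken back down into its parts. The sketch below does this with shell arithmetic; the function name decode_cint is hypothetical.

```shell
# Print the bit values that make up a CINT, joined with '+'.
decode_cint() {
    cint=$1
    out=""
    for bit in 1 2 4 8 16 32; do
        # Keep each bit value that is set in CINT.
        if [ $((cint & bit)) -ne 0 ]; then
            out="${out:+$out+}$bit"
        fi
    done
    # A CINT of 0 enables no messages.
    echo "${out:-0}"
}

decode_cint 26
decode_cint 27
```

With the bit meanings listed above, 26 (2+8+16) selects config history, failover debugging, and the delta time prefix, and 27 adds the generic request handler debug messages.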

Kernel Metric Options:

-k, --publish_kernel

Publish kernel metrics.

-s, --kernel_set_file SETFILE

Text file containing kernel metric sets to publish. Default: /proc/sys/kldms/set_list

Thread Options:

-P, --worker_threads THR_COUNT

THR_COUNT is the number of event threads to start.

SPECIFYING COMMAND-LINE OPTIONS IN CONFIGURATION FILES

Users can use the ‘option’ command to specify some command-line options in a configuration file.

option <COMMAND-LINE OPTIONS>

Command-line options supported by the ‘option’ command and the corresponding attributes:

-a, --default_auth

-A, --default_auth_args

-B, --banner

-k, --publish_kernel

-l, --log_file PATH

-m, --set_memory

-n, --daemon_name

-P, --worker_threads

-r, --pid_file

-s, --kernel_set_path

-v, --log_level

-L, --log_config <CINT[:PATH]>

Specifying the listen endpoints in configuration files

Users can use the ‘listen’ command to define the listen endpoints. For example,

listen xprt=sock port=411

Example

> cat ldmsd.conf

# cmd-line options
option --log_file /opt/ovis/var/ldmsd.log --log_level ERROR
option -m 2GB -P 16
option -a munge
listen xprt=ugni port=411
# meminfo
load name=meminfo
config name=meminfo producer=nid0001 instance=nid0001/meminfo
start name=meminfo interval=1000000 offset=0

RUNNING LDMSD ON CRAY XE/XK/XC SYSTEMS USING APRUN

ldmsd can be run either as a user or as root using the appropriate PTag and cookie.

Check (or set) the PTag and cookie.

Cray XE/XK Systems:

> apstat -P
PDomainID           Type    Uid   PTag     Cookie
LDMS              system      0     84 0xa9380000
foo               user    22398    243  0x2bb0000

Cray XC Systems:

> apstat -P
PDomainID   Type   Uid     Cookie    Cookie2
LDMS      system     0 0x86b80000          0
foo         user 20596 0x86bb0000 0x86bc0000

Set the environment variables ZAP_UGNI_PTAG and ZAP_UGNI_COOKIE with the appropriate ptag and cookie.

Run ldmsd directly or as part of a script launched from aprun. In either case, use aprun with the correct -p <ptag> when running.
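
As a sketch of setting the two variables from the apstat -P listing, the following uses the sample XE/XK output shown above and selects the LDMS row. On a real system the sample text would be replaced by the live output of apstat -P; the choice of the LDMS domain and the variable name apstat_output are assumptions made for the example.

```shell
# Sample XE/XK listing copied from the example above.
# On a real system, use instead: apstat_output=$(apstat -P)
apstat_output='PDomainID           Type    Uid   PTag     Cookie
LDMS              system      0     84 0xa9380000
foo               user    22398    243  0x2bb0000'

# In the XE/XK layout, PTag is column 4 and Cookie is column 5.
ZAP_UGNI_PTAG=$(printf '%s\n' "$apstat_output" | awk '$1 == "LDMS" { print $4 }')
ZAP_UGNI_COOKIE=$(printf '%s\n' "$apstat_output" | awk '$1 == "LDMS" { print $5 }')
export ZAP_UGNI_PTAG ZAP_UGNI_COOKIE

echo "ZAP_UGNI_PTAG=$ZAP_UGNI_PTAG ZAP_UGNI_COOKIE=$ZAP_UGNI_COOKIE"
```

The XC listing has no PTag column, so the column numbers differ there; per the ENVIRONMENT section, ZAP_UGNI_PTAG must still be set on XC but its value does not matter.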

REORDERED COMMANDS

Certain commands are reordered when processing input scripts specified with -c. Items related to failover are handled as described in the ‘-c’ section above. Other commands are promoted to run before any non-promoted commands from the loaded script. In particular, env, loglevel, listen, auth, and option are promoted.

NOTES

OCM flags are unsupported at this time.

BUGS

None known.

EXAMPLES

$ /tmp/opt/ovis/sbin/ldmsd -x sock:60000 -p unix:/var/run/ldmsd/metric_socket -l /tmp/opt/ovis/logs/1


$ /tmp/opt/ovis/sbin/ldmsd -x sock:60000 -p sock:61000 -p unix:/var/run/ldmsd/metric_socket

SEE ALSO

ldms_authentication(7), ldmsctl(8), ldms_ls(8), ldmsd_controller(8), ldms_quickstart(7)