ldmsd

Start an ldms daemon

Date:

28 Feb 2018

Manual section:

8

Manual group:

LDMSD

SYNOPSIS

ldmsd [OPTION…]

DESCRIPTION

The ldmsd command is used to start an instance of an ldmsd server. Configuration of the ldmsd is accomplished statically with a configuration file provided on the command line, or dynamically with the ldmsd_controller or distributed configuration management server Maestro.

ENVIRONMENT

The following environment variables may be set to override compiled-in defaults:

ZAP_LIBPATH

Path to the location of the LDMS transport libraries.

LDMSD_PLUGIN_LIBPATH

Path to the location of the LDMS plugin libraries.

LDMSD_PIDFILE

Full path name of a file overriding the default of /var/run/ldmsd.pid. The command line argument “-r pid-file-path” takes precedence over this value.

LDMSD_LOG_TIME_SEC

If present, log messages are stamped with the epoch time rather than the date string. This is useful when sub-second information is desired or correlating log messages with other epoch-stamped data.

LDMSD_MEM_SZ

The size of memory reserved for metric sets. Set this variable or specify “-m” to ldmsd. See the -m option for further details. If both are specified, the -m option takes precedence over this environment variable.

LDMS_DELETE_TIMEOUT

The timeout period (in seconds) before ldmsd forcibly frees memory of deleted metric sets when network problems prevent normal cleanup. Under normal conditions, metric set memory is freed automatically and this timeout is not used. If network issues prevent proper cleanup, deleted sets will remain in memory and can be observed using the “set_stats” command. Consider increasing this value if network problems are temporary and you want to allow more time for normal cleanup to complete. If not set, defaults to 60 seconds.

LDMSD_UPDTR_OFFSET_INCR

The increment to the offset hint in microseconds for updaters that determine the update interval and offset automatically. For example, if the offset hint is 100000, the updater offset will be 100000 + LDMSD_UPDTR_OFFSET_INCR. The default is 100000 (100 milliseconds).

LDMSD_LOG_DATE_TIME

When set, this environment variable tells ldmsd to prefix each line of output with a timestamp in human readable form.

LDMSD_LOG_TIME_SEC

When set, this environment variable tells ldmsd to prefix each line of output with a timestamp in the form of seconds-since-epoch.

CRAY Specific Environment variables for ugni transport

ZAP_UGNI_PTAG

For XE/XK, the PTag value as given by apstat -P. For XC, The value does not matter but the environment variable must be set.

ZAP_UGNI_COOKIE

For XE/XK, the Cookie value corresponding to the PTag value as given by apstat -P For XC, the Cookie value (not Cookie2) as given by apstat -P

ZAP_UGNI_CQ_DEPTH

Optional value for the CQ depth. The default is 2048.

ZAP_UGNI_STATE_INTERVAL

Optional. If set, then ldmsd will check all nodes’ states via rca interface. States for all nodes are checked and stored at intervals determined by this environment variable. The stored values are checked against before contacting a node. If you choose to use this option, then the rule of thumb is to set ZAP_UGNI_STATE_INTERVAL and ZAP_UGNI_STATE_OFFSET such that the node states are checked before the metric set update occurs (see interval and offset in ldmsd_controller)

ZAP_UGNI_STATE_OFFSET

Optional. Only relevant if ZAP_UGNI_STATE_INTERVAL is set. Defaults to zero. Offset from zero for checking the nodes state (see ZAP_UGNI_STATE_INTERVAL, above).

OPTIONS

General/Configuration Options:

-c CONFIG_PATH

The path to configuration file (optional, default: <none>). The configuration file contains a batch of ldmsd controlling commands, such as `load` for loading a plugin, and `prdcr_add` for defining a ldmsd producer to aggregate from (see ldmsd_controller(8) for a complete list of commands, or simply run ldmsd_controller then help). The commands in the configuration file are executed sequentially, except for prdcr_start, updtr_start, strgp_start, and failover_start that will be deferred. If failover_start is present, the failover service will start first (among the deferred). Then, upon failover pairing success or failure, the other deferred configuration objects will be started. Please also note that while failover service is in use, prdcr, updtr, and strgp cannot be altered (start, stop, or reconfigure) over in-band configuration. See also REORDERED COMMANDS below.

-y CONFIG_PATH

The path to a YAML configuration file (optional, default: <none>). The YAML configuration file contains a description of an entire cluster of LDMS daemons. Please see “man ldmsd_yaml_parser” for more information regarding the YAML configuration file.

-m, –set_memory MEMORY_SIZE

MEMORY_SIZE is the maximum size of pre-allocated memory for metric sets. The given size must be less than 1 petabytes. For example, 20M or 20mb are 20 megabytes. The default is adequate for most ldmsd acting in the collector role. For aggregating ldmsd, a rough estimate of preallocated memory needed is (Number of nodes aggregated) x (Number of metric sets per node) x 4k. Data sets containing arrays may require more. The estimate can be checked by enabling DEBUG logging and examining the mm_stat bytes_used+holes value at ldmsd exit.
-n, –daemon_name NAME

The name of the LDMS daemon. By default it is “HOSTNAME:PORT”. When configuring a LDMSD with a YAML configuration file, the “daemon_name” identifies a daemon defined in the configuration file. For more information about YAML configuration files, please see “man ldmsd_yaml_parser”.
-r, –pid_file pid_file

The path to the pid file and prefix of the .version banner file

-V

Display LDMS version information and then exit.

-u plugin_name

Display the usage for named plugin. Special names all, sampler, and store match all, sampler type, and store type plugins, respectively.

Communication Options:

-x XPRT:PORT:HOST

Specifies the transport type to listen on. May be specified more than once for multiple transports. The XPRT string is one of ‘rdma’, ‘sock’, or ‘ugni’ (CRAY XE/XK/XC). A transport specific port number must be specified following a ‘:’, e.g. rdma:10000. An optional host or address may be specified after the port, e.g. rdma:10000:node1-ib, to listen to a specific address.

The listening transports can also be specified in the configuration file using listen command, e.g. `listen xprt=sock port=1234 host=node1-ib`. Please see ldmsd_controller(8) section LISTEN COMMAND SYNTAX for more details.

-a, –default_auth AUTH

Specify the default LDMS Authentication method for the LDMS connections in this process (when the connections do not specify authentication method/domain). Please see ldms_authentication(7) for more information. If this option is not given, the default is “none” (no authentication). Also see ldmsd_controller(8) section AUTHENTICATION COMMAND SYNTAX for how to define an authentication domain.

-A, –default_auth_args NAME=VALUE

Passing the NAME=VALUE option to the LDMS Authentication plugin. This command line option can be given multiple times. Please see ldms_authentication(7) for more information, and consult the plugin manual page for plugin-specific options.

Log Verbosity Options:

-l, –log_file LOGFILE

LOGFILE is the path to the log file for status messages. Default is stdout unless given. The syslog facility is used if LOGFILE is exactly “syslog”. Silence can be obtained by specifying /dev/null for the log file or using command line redirection as illustrated below.
-v, –log_level LOG_LEVEL

LOG_LEVEL can be one of DEBUG, INFO, WARN, ERROR, CRITICAL or QUIET. The default level is ERROR. QUIET produces only user-requested output.
-L,–log_config <CINT:PATH> | <CINT> | <PATH>

Append configuration replay messages or configuration debugging messages to the log indicated by -l (when PATH is omitted) or to the file named PATH. Bit values of CINT correspond to:
 0: no messages
 1: debug messages from the generic 'request' handler
 2: config history messages in replayable format
 4: query history messages in replayable format
 8: failover debugging messages
16: include delta time prefix when using PATH
32: include epoch timestamp prefix when using PATH

These values may be added together to enable multiple outputs. All messages are logged at the user-requested level, LDMSD_LALL. CINT values 2, 26 and 27 are often interesting. When CINT is omitted, 1 is the default. When PATH is used, the log messages are flushed to as they are generated.

SPECIFYING COMMAND-LINE OPTIONS IN CONFIGURATION FILES

While command-line options are useful for quick configuration, complex setups or repeated deployments benefit from configuration files. These files provide a centralized location to define all initial settings for an LDMS daemon, promoting readability, maintainability, and easy sharing across deployments.

Configuration commands equivalent to command-line options can be used in configuration files as an alternative approach to specifiying the initial state of an LDMS daemon. For a complete list of these commands and detailed information about configuration file syntax, environment variables, and command processing order, please see ldmsd_config_files(8).

Specifying the listen endpoints in configuraton files

Users can use the ‘listen’ command to define the listen endpoints. For example,

listen xprt=sock port=411

Example

> cat ldmsd.conf

# cmd-line options
log_file path=/opt/ovis/var/ldmsd.log
log_level level=ERROR
set_memory size=2GB
worker_threads num=16
default_auth plugin=munge
listen xprt=ugni port=411
# meminfo
load name=meminfo
config name=meminfo producer=nid0001 instance=nid0001/meminfo
start name=meminfo interval=1000000 offset=0

RUNNING LDMSD ON CRAY XE/XK/XC SYSTEMS USING APRUN

ldsmd can be run as either a user or as root using the appropriate PTag and cookie.

Check (or set) the PTag and cookie.

Cray XE/XK Systems:

> apstat -P
PDomainID           Type    Uid   PTag     Cookie
LDMS              system      0     84 0xa9380000
foo               user    22398    243  0x2bb0000

Cray XC Systems:
> apstat -P
PDomainID   Type   Uid     Cookie    Cookie2
LDMS      system     0 0x86b80000          0
foo         user 20596 0x86bb0000 0x86bc0000

Set the environment variables ZAP_UGNI_PTAG and ZAP_UGNI_COOKIE with the appropriate ptag and cookie.

Run ldmsd directly or as part of a script launched from aprun. In either case, Use aprun with the correct -p <ptag> when running.

REORDERED COMMANDS

Certain commands in are reordered when processing input scripts specified with -c or -y. Items related to failover are handled as described in the ‘-c’ and ‘-y’ sections above. Other commands are promoted to run before any non-promoted commands from the loaded script. In particular, env, loglevel, listen, auth, and option are promoted.

NOTES

OCM flags are unsupported at this time.

BUGS

None known.

EXAMPLES

$/tmp/opt/ovis/sbin/ldmsd -x sock:60000 -p unix:/var/run/ldmsd/metric_socket -l /tmp/opt/ovis/logs/1


$/tmp/opt/ovis/sbin/ldmsd -x sock:60000 -p sock:61000 -p unix:/var/runldmsd/metric_socket

SEE ALSO

ldms_authentication(7), ldmsctl(8), ldms_ls(8), ldmsd_controller(8), ldms_quickstart(7)