Plugin_app_sampler
- Date:
30 Sep 2019
NAME
ldmsd_app_sampler - LDMSD app_sampler plugin
SYNOPSIS
config name=app_sampler producer=PRODUCER instance=INSTANCE [ schema=SCHEMA ] [ component_id=COMPONENT_ID ] [ stream=STREAM_NAME ] [ metrics=METRICS ] [ cfg_file=PATH ]
DESCRIPTION
``app_sampler`` collects metrics from ``/proc/<PID>`` for the SLURM jobs/tasks currently running on the system. It depends on the ``slurm_notifier`` SPANK plugin to send SLURM job/task events over ``ldmsd_stream`` (``stream`` option, default: slurm). A set is created for each task when the task starts, named in the following format: ``PRODUCER_NAME/JOB_ID/TASK_PID``. The set is deleted when the task exits.
By default, ``app_sampler`` samples all available metrics (see the ``LIST OF METRICS`` section). Users may down-select the list of metrics to monitor by specifying the ``metrics`` option (a comma-separated string) or by writing a JSON configuration file and specifying the ``cfg_file`` option (see the ``EXAMPLES`` section).
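The set naming format described above can be illustrated with a short Python sketch. The helper names ``set_instance_name`` and ``parse_set_instance_name`` are hypothetical and are not part of LDMS; the sketch only assumes that the job ID and task PID are integers and that the name has the ``PRODUCER_NAME/JOB_ID/TASK_PID`` shape.

```python
from typing import Tuple


def set_instance_name(producer: str, job_id: int, task_pid: int) -> str:
    """Compose a set name in the PRODUCER_NAME/JOB_ID/TASK_PID format."""
    return f"{producer}/{job_id}/{task_pid}"


def parse_set_instance_name(name: str) -> Tuple[str, int, int]:
    """Split a set name back into (producer, job_id, task_pid).

    Splitting from the right keeps the last two components intact even
    if the producer name itself happens to contain a '/'.
    """
    producer, job_id, task_pid = name.rsplit("/", 2)
    return producer, int(job_id), int(task_pid)
```

Such a helper may be useful on the consumer side, for example to group the per-task sets of one job by their ``JOB_ID`` component.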
CONFIG OPTIONS
- name
Must be app_sampler.
- producer
The name of the data producer (e.g. hostname).
- instance
Required by sampler_base but not used by app_sampler. Any string is accepted, but the option must be present.
- schema
The optional schema name (default: app_sampler).
- component_id
An integer identifying the component (default: 0).
- stream
The name of the ``ldmsd_stream`` to listen on for SLURM job events (default: slurm).
- metrics
The comma-separated list of metrics to monitor. The default is ‘’ (empty), which is equivalent to monitoring ALL metrics.
- cfg_file
The alternative config file in JSON format. The file is expected to have an object that may contain the following attributes:
{ "stream": "STREAM_NAME", "metrics": [ METRICS ] }
The default values are assumed for the attributes that are not specified. Attributes other than ‘stream’ and ‘metrics’ are ignored.
If ``cfg_file`` is given, the ``stream`` and ``metrics`` options are ignored.
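The precedence rules above can be sketched in Python as follows. This is an illustration of the documented behavior, not the plugin's actual code: the function name ``effective_settings`` and its arguments are hypothetical, and an empty metrics list is used here to stand for "all metrics".

```python
import json

DEFAULT_STREAM = "slurm"  # documented default for the stream option


def effective_settings(options, cfg_text=None):
    """Return (stream, metrics) following the documented precedence.

    options  -- dict of key=value pairs from the config line
    cfg_text -- contents of the cfg_file, or None if cfg_file was not given
    """
    if cfg_text is not None:
        # cfg_file given: the stream and metrics options are ignored, and
        # attributes other than "stream" and "metrics" are ignored as well.
        cfg = json.loads(cfg_text)
        stream = cfg.get("stream", DEFAULT_STREAM)
        metrics = list(cfg.get("metrics", []))
    else:
        stream = options.get("stream", DEFAULT_STREAM)
        raw = options.get("metrics", "")
        metrics = [m for m in raw.split(",") if m]
    return stream, metrics  # an empty list stands for ALL metrics
```

Note that with a ``cfg_file`` that omits ``stream``, the default stream name applies even if ``stream=`` was also given on the config line.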
LIST OF METRICS
/* from /proc/[pid]/cmdline */
cmdline_len, cmdline,
/* the number of open files */
n_open_files,
/* from /proc/[pid]/io */
io_read_b, io_write_b, io_n_read, io_n_write, io_read_dev_b, io_write_dev_b, io_write_cancelled_b,
/* /proc/[pid]/oom_score */
oom_score,
/* /proc/[pid]/oom_score_adj */
oom_score_adj,
/* path of /proc/[pid]/root */
root,
/* /proc/[pid]/stat */
stat_pid, stat_comm, stat_state, stat_ppid, stat_pgrp, stat_session, stat_tty_nr, stat_tpgid, stat_flags,
stat_minflt, stat_cminflt, stat_majflt, stat_cmajflt, stat_utime, stat_stime, stat_cutime, stat_cstime,
stat_priority, stat_nice, stat_num_threads, stat_itrealvalue, stat_starttime, stat_vsize, stat_rss, stat_rsslim,
stat_startcode, stat_endcode, stat_startstack, stat_kstkesp, stat_kstkeip, stat_signal, stat_blocked,
stat_sigignore, stat_sigcatch, stat_wchan, stat_nswap, stat_cnswap, stat_exit_signal, stat_processor,
stat_rt_priority, stat_policy, stat_delayacct_blkio_ticks, stat_guest_time, stat_cguest_time,
stat_start_data, stat_end_data, stat_start_brk, stat_arg_start, stat_arg_end, stat_env_start, stat_env_end,
stat_exit_code,
/* from /proc/[pid]/status */
status_name, status_umask, status_state, status_tgid, status_ngid, status_pid, status_ppid, status_tracerpid,
status_uid, status_real_user, status_eff_user, status_sav_user, status_fs_user,
status_gid, status_real_group, status_eff_group, status_sav_group, status_fs_group,
status_fdsize, status_groups, status_nstgid, status_nspid, status_nspgid, status_nssid,
status_vmpeak, status_vmsize, status_vmlck, status_vmpin, status_vmhwm, status_vmrss,
status_rssanon, status_rssfile, status_rssshmem, status_vmdata, status_vmstk, status_vmexe, status_vmlib,
status_vmpte, status_vmpmd, status_vmswap, status_hugetlbpages, status_coredumping, status_threads,
status_sig_queued, status_sig_limit, status_sigpnd, status_shdpnd, status_sigblk, status_sigign, status_sigcgt,
status_capinh, status_capprm, status_capeff, status_capbnd, status_capamb,
status_nonewprivs, status_seccomp, status_speculation_store_bypass,
status_cpus_allowed, status_cpus_allowed_list, status_mems_allowed, status_mems_allowed_list,
status_voluntary_ctxt_switches, status_nonvoluntary_ctxt_switches,
/* /proc/[pid]/syscall */
syscall,
/* /proc/[pid]/timerslack_ns */
timerslack_ns,
/* /proc/[pid]/wchan */
wchan,
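To give a feel for where the ``stat_*`` values come from, the following Python sketch parses the leading fields of a ``/proc/[pid]/stat`` line. It is not the plugin's implementation. The pairing of metric names with fields is an assumption based on the field order documented in proc(5), and the function name ``parse_stat_line`` is hypothetical. Only the first fifteen fields are handled.

```python
# Names for the fields that follow "pid (comm)" in /proc/[pid]/stat,
# in proc(5) order. The stat_* naming is assumed to mirror these fields.
STAT_FIELDS = [
    "stat_state", "stat_ppid", "stat_pgrp", "stat_session", "stat_tty_nr",
    "stat_tpgid", "stat_flags", "stat_minflt", "stat_cminflt",
    "stat_majflt", "stat_cmajflt", "stat_utime", "stat_stime",
]


def parse_stat_line(line):
    """Parse the leading fields of one /proc/[pid]/stat line into a dict.

    The comm field is wrapped in parentheses and may itself contain spaces
    or parentheses, so it is located via the first '(' and the last ')'.
    """
    lparen = line.index("(")
    rparen = line.rindex(")")
    result = {
        "stat_pid": int(line[:lparen]),
        "stat_comm": line[lparen + 1:rparen],
    }
    rest = line[rparen + 1:].split()
    for name, value in zip(STAT_FIELDS, rest):
        # state is a single character; the remaining fields are integers
        result[name] = value if name == "stat_state" else int(value)
    return result
```

On a Linux system, ``parse_stat_line(open("/proc/self/stat").read())`` would return the values for the calling process.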
BUGS
No known bugs.
EXAMPLES
Example 1
Get everything:
config name=app_sampler
Example 2
Down-select metrics and use a non-default stream name:
config name=app_sampler metrics=stat_pid,stat_utime stream=mystream
Example 3
Down-select using a config file, with the default stream:
config name=app_sampler cfg_file=cfg.json
# cfg.json
{ "metrics" : [ "stat_pid", "stat_utime" ] }
NOTES
Some of the optionally collected data might be security sensitive.
The status_uid and status_gid values can alternatively be collected as “status_real_user”, “status_eff_user”, “status_sav_user”, “status_fs_user”, “status_real_group”, “status_eff_group”, “status_sav_group”, “status_fs_group”. These string values are most efficiently collected if both the string value and the numeric values are collected.
SEE ALSO
ldmsd(8), ldms_quickstart(7), ldmsd_controller(8), ldms_sampler_base(7), proc(5), sysconf(3), environ(3).