Sampler Plugin Developer Documentation
Section 1: Threading Overview
Execution Context
Plugin interfaces (constructor(), config(), sample(), destructor()) are
not always called from the same thread. Understanding which thread calls which interface
is important because it determines what operations are safe to perform and whether
additional synchronization is needed. ldmsd uses three categories of threads:
Worker threads are created and managed by ldmsd. Their responsibilities are:
- Schedule samples
- Call sample()
- Handle configuration commands (load, config, start, stop, term) in configuration files given with either -c or -y
Dedicated sampling thread — if the plugin instance is configured to use a dedicated
thread, ldmsd creates one when start is called and deletes it when stop is
called. Its sole responsibility is to call sample() for that plugin instance.
IO threads (also referred to as transport threads) are created and managed by Zap. Their responsibilities are:
- Handle configuration commands (load, config, start, stop, term) sent by a remote interface (ldmsd_controller and maestro)
- Deliver Stream and Message to sampler plugins
Section 2: API Threading Guarantees
Each plugin interface may be called by different threads depending on how the command
that triggers it is delivered — whether from a configuration file processed at startup or
from a remote client such as ldmsd_controller or maestro. The following describes
which thread is responsible for calling each interface.
Config Commands to Sampler Plugin Interface Calls
- constructor() is called by the thread handling the load configuration command:
  - a worker thread when the load command is in config files
  - an IO thread when the load command is sent by a remote client such as ldmsd_controller or maestro
- usage() is called by the thread handling the usage configuration command:
  - An IO thread typically calls usage() because users normally invoke it from ldmsd_controller
  - usage() is expected to return an immutable string containing a short description of the plugin and how to configure it
- config() is called by:
  - a worker thread when the command is in config files
  - an IO thread when the command is sent by a remote client such as ldmsd_controller or maestro
- sample() is only ever called by a single thread, which is either:
  - a worker thread shared with other ldmsd operations (e.g., other sampler plugins' sample() calls), or
  - a dedicated thread created by ldmsd at the config time if the plugin instance is configured to have a dedicated sampling thread
- destructor() can be called by either a worker thread or an IO thread. It is called when the reference count of the plugin configuration object reaches zero.
Beyond knowing which thread calls which interface, plugin authors can rely on a set of ordering and availability guarantees that ldmsd enforces across the plugin lifecycle. These are safe to assume regardless of which thread is active:
Section 3: Concurrency Scenarios
This section describes which plugin operations can run concurrently and which are mutually exclusive. ldmsd serializes most plugin operations to prevent race conditions. usage() is the exception — it is not serialized because it only returns an immutable string and therefore does not modify any plugin state.
Concurrent Operations
usage() may run concurrently with the other operations between constructor() and
destructor() calls.
Mutually Exclusive Operations
constructor(), config(), sample(), and destructor() are mutually
exclusive. All these operations are serialized by ldmsd — only one executes at a time.
Section 4: Synchronization Mechanism
This section describes how to write thread-safe sampler plugins. It covers two distinct mechanisms: plugin-level locking for protecting plugin-owned data when the plugin manages its own threads, and the LDMS set transaction API for protecting metric consistency during updates.
Thread Safety of LDMS Sets
- LDMS sets have built-in transaction safety via “consistent/inconsistent” flags
- sample() must call ldms_transaction_begin() before updating a metric value and call ldms_transaction_end() after updating all metric values
- Set readers can call ldms_set_is_consistent() to verify that a set is not currently being updated
- Sampler plugin authors do NOT need additional locks for the LDMS set data itself

The LDMS transaction APIs exist only so that external set readers can detect inconsistent metric values. For example, an ldmsd aggregator automatically checks whether a set is inconsistent so that it does not store a set that is in the inconsistent state.
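The update pattern described above can be sketched with a small self-contained model. The toy_set type and the toy_* functions are hypothetical stand-ins that only mimic the consistent/inconsistent flag in a single thread; a real plugin would instead call ldms_transaction_begin(), ldms_transaction_end(), and the LDMS metric setters on its own set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an LDMS set: two metrics plus a consistency flag. */
struct toy_set {
    bool consistent;
    uint64_t metric[2];
};

/* Stand-in for ldms_transaction_begin(): mark the set as being updated. */
static void toy_transaction_begin(struct toy_set *s) { s->consistent = false; }

/* Stand-in for ldms_transaction_end(): mark the update as complete. */
static void toy_transaction_end(struct toy_set *s) { s->consistent = true; }

/* Stand-in for ldms_set_is_consistent(). */
static bool toy_set_is_consistent(const struct toy_set *s) { return s->consistent; }

/* Shape of a sample() body: every metric write sits inside one transaction. */
static int toy_sample(struct toy_set *s, uint64_t a, uint64_t b)
{
    toy_transaction_begin(s);
    s->metric[0] = a;
    s->metric[1] = b;
    toy_transaction_end(s);
    return 0;
}

/* Shape of a reader: use the values only when the set is consistent. */
static bool toy_read(const struct toy_set *s, uint64_t out[2])
{
    if (!toy_set_is_consistent(s))
        return false;
    out[0] = s->metric[0];
    out[1] = s->metric[1];
    return true;
}
```

A reader that sees the flag cleared simply skips the set, which mirrors the aggregator behavior described above; no plugin-side mutex is involved.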
Available Locks
Do not call ldmsd_cfgobj_lock(). The cfgobj lock protects internal ldmsd properties.
Sampler plugin authors should not use or depend on this lock. Use pthread_mutex_t for
plugin data.
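A minimal sketch of plugin-level locking with pthread_mutex_t, assuming a plugin that runs its own helper thread alongside sample(). The names plug_ctx, plug_record, plug_snapshot, and helper_main are hypothetical and are not part of the ldmsd API.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical plugin-owned state shared between a helper thread and sample(). */
struct plug_ctx {
    pthread_mutex_t lock;   /* protects count and last */
    uint64_t count;
    uint64_t last;
};

static int plug_ctx_init(struct plug_ctx *c)
{
    c->count = 0;
    c->last = 0;
    return pthread_mutex_init(&c->lock, NULL);
}

/* Called from the plugin's own helper thread whenever a new value arrives. */
static void plug_record(struct plug_ctx *c, uint64_t v)
{
    pthread_mutex_lock(&c->lock);
    c->count++;
    c->last = v;
    pthread_mutex_unlock(&c->lock);
}

/* Called from sample(): copy both fields under the lock so they match. */
static void plug_snapshot(struct plug_ctx *c, uint64_t *count, uint64_t *last)
{
    pthread_mutex_lock(&c->lock);
    *count = c->count;
    *last = c->last;
    pthread_mutex_unlock(&c->lock);
}

/* Helper thread body: records the values 1..1000. */
static void *helper_main(void *arg)
{
    struct plug_ctx *c = arg;
    for (uint64_t v = 1; v <= 1000; v++)
        plug_record(c, v);
    return NULL;
}
```

Keeping the mutex inside the plugin's own context structure means the plugin never needs to touch the cfgobj lock to protect its data.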
Section 5: Plugin Lifecycle
This section describes the lifecycle of a sampler plugin instance. A plugin instance transitions through a defined set of states in response to configuration commands. The state determines which plugin interfaces ldmsd may call and which commands are valid at that point. Note that a plugin instance must be stopped before it can be terminated — the term command is only valid from the INIT or CONFIGURED states.
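The rule about the term command can be written as a one-line predicate. This is only an illustration: the enum and function names are hypothetical and do not correspond to ldmsd's internal state representation.

```c
#include <stdbool.h>

/* Lifecycle states named in this section (hypothetical enum names). */
enum plug_state { PLUG_INIT, PLUG_CONFIGURED, PLUG_RUNNING, PLUG_TERMINATING };

/* term is valid only from INIT or CONFIGURED; a RUNNING instance
 * must be stopped first. */
static bool term_is_valid(enum plug_state s)
{
    return s == PLUG_INIT || s == PLUG_CONFIGURED;
}
```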
State Diagram
[Figure: Sampler Plugin Lifecycle State Diagram]
State Descriptions
The table below describes each state in the plugin lifecycle, how it is entered, and which operations are permitted in that state.
| State | How to Enter | Plugin Status | Allowed Operations | Operation Calling Thread |
|---|---|---|---|---|
| INIT | load command calls | Plugin loaded but not configured | | Worker or IO |
| CONFIGURED | | Plugin configured | | Worker or IO |
| RUNNING | | Plugin actively sampling data | | Worker or dedicated sampling thread (for |
| TERMINATING | | Plugin being destroyed | | Worker or IO (whoever drops last reference) |
Section 6: Using the LDMS Message Service in Sampler Plugins
Terminology
- LDMS Message Bus (declared in ldms_msg.h, API prefix ldms_msg_): The pub/sub routing layer in LDMS. Each message is published with a tag name; subscribers register callbacks against a tag name or regex pattern to receive matching messages. The Message Bus manages the routing and delivery of messages to authorized subscribers. See ldms_msg(7).
- Message channel (declared in ldms_msg_chan.h, API prefix ldms_msg_chan_): A resilient mechanism that sits on top of the LDMS Message Bus. A message channel manages multiple message buses, reconnects if a peer disconnects, and queues data while the peer is down. See ldms_msg_chan(7).
Checking Whether the Message Service Is Enabled
Both APIs require the message service to be enabled. Check with
ldms_msg_is_enabled() before using either API:
if (!ldms_msg_is_enabled()) {
    /* handle disabled case */
}
How to handle a disabled message service depends on the plugin:
- If the plugin requires the message service to function at all, return an error from constructor() to prevent the plugin from loading.
- If the plugin has modes that do not use the message service, return an error from config() only when the user configures a mode that requires it.
Plugins must not call ldms_msg_enable() to enable the service themselves.
Whether the message service is enabled is an ldmsd instance-level decision, not a
per-plugin one.
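The two ways of handling a disabled message service can be sketched as follows. Most of this is assumed for illustration: msg_enabled_stub() stands in for ldms_msg_is_enabled(), the function signatures are simplified and do not match the real plugin interface, the "publish" mode name is hypothetical, and ENOTSUP is only one plausible choice of error code.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Stand-in for ldms_msg_is_enabled(); a real plugin would call that instead. */
static bool msg_enabled_flag;
static bool msg_enabled_stub(void) { return msg_enabled_flag; }

/* Case 1: the plugin cannot work without the message service,
 * so the (simplified) constructor refuses to load. */
static int strict_constructor(void)
{
    if (!msg_enabled_stub())
        return ENOTSUP;
    return 0;
}

/* Case 2: only the hypothetical "publish" mode needs the service,
 * so the (simplified) config rejects just that mode. */
static int flexible_config(const char *mode)
{
    if (strcmp(mode, "publish") == 0 && !msg_enabled_stub())
        return ENOTSUP;
    return 0;
}
```

In both cases the plugin only reads the enabled state and reports an error; it never tries to switch the service on.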
Which API to Use
The choice depends on whether the plugin needs to own a connection to a peer.
A plugin that relies on connections already managed by ldmsd uses the LDMS Message
Bus API (ldms_msg_) directly — no connection management required.
A plugin that needs its own dedicated and reliable connection to a peer — whether
connecting out to a remote peer or listening for incoming connections — uses the
message channel API (ldms_msg_chan_).
| Use case | API to use |
|---|---|
| Plugin does not need to own a connection | LDMS Message Bus API (ldms_msg_) |
| Plugin needs to own a connection to a peer | Message channel API (ldms_msg_chan_) |
Key points:
- Within each case, the plugin can subscribe, publish, or both as needed.
- A single message channel can publish on any number of different tag names. A plugin needs at most one channel per peer.
- Without a message channel, if the peer the plugin depends on disconnects, there is no reconnect logic and the plugin will not recover.
- A plugin that needs to own a connection only for subscribing should be aware that ldms_msg_chan in subscribe mode creates a listening transport endpoint bound to a local port. This is appropriate for plugins that act as a passive listener for an incoming publisher connection.
Appropriateness by Plugin Interface
The following table summarizes which functions are appropriate to call from each
plugin interface. For threading details and what is safe to call from within the
callback, see ldms_msg(7) and ldms_msg_chan(7).
| API | | | | | callback |
|---|---|---|---|---|---|
| | Yes | Yes | Not appropriate [1] | Not appropriate [1] | Yes |
| | No | Yes | Not appropriate [1] | Yes | Yes |
| | Yes | Yes | Yes | No | Yes [2] |
| | Yes | Yes | Not appropriate [1] | No | No |
| | No | Yes | Yes [3] | No | Yes |
| | No | Yes | Not appropriate [1] | Yes | Yes [4] |
| | Yes | Yes | No | No | No |
Notes
Callback
For the LDMS Message Bus API, the callback is invoked by whichever thread called
ldms_msg_publish() for messages published on the same bus, or by an IO thread
for messages arriving from a remote peer. The LDMS_MSG_EVENT_CLIENT_CLOSE
event is delivered to signal that teardown is complete and resources may be freed.
For the message channel API, the callback is always invoked by an IO thread.
LDMS_MSG_EVENT_CLIENT_CLOSE is handled internally by the channel and is not
delivered to the application callback.
For full details on callback threading, what is safe to call, and teardown
patterns, see ldms_msg(7) and ldms_msg_chan(7).
Section 7: LDMSD Internal Implementation Details (For Maintainers Only)
This section documents ldmsd-internal implementation details for ldmsd maintainers. Plugin authors do not need to read this section.
Lock Acquisition Order
The following describes the acquisition order for the plugin cfgobj lock:
- Take when handling start; release before replying back
- Take before calling sample(); release after returning
- Take when handling stop; release before replying back
- Take when handling term:
  - Release before calling ldmsd_set_deregister()
  - Re-acquire the lock
  - Release before removing the cfgobj from the tree
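The term sequence above can be modeled with a plain pthread mutex. This is a sketch under stated assumptions, not ldmsd's implementation: toy_cfgobj, toy_set_deregister(), and toy_handle_term() are hypothetical stand-ins, and the trylock inside the stand-in exists only to make the "lock must not be held here" step observable.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a plugin cfgobj and its lock. */
struct toy_cfgobj {
    pthread_mutex_t lock;
    int deregistered;   /* set by the stand-in for ldmsd_set_deregister() */
    int removed;        /* set when the object leaves the (imaginary) tree */
};

/* Stand-in for ldmsd_set_deregister(). Returns 0 if the cfgobj lock was
 * free at the time of the call, or EBUSY if it was still held. */
static int toy_set_deregister(struct toy_cfgobj *o)
{
    int rc = pthread_mutex_trylock(&o->lock);
    if (rc != 0)
        return rc;
    o->deregistered = 1;
    pthread_mutex_unlock(&o->lock);
    return 0;
}

/* Models the lock order listed for the term command. */
static int toy_handle_term(struct toy_cfgobj *o)
{
    int rc;

    pthread_mutex_lock(&o->lock);     /* take when handling term */
    pthread_mutex_unlock(&o->lock);   /* release before deregistering sets */
    rc = toy_set_deregister(o);
    pthread_mutex_lock(&o->lock);     /* re-acquire the lock */
    pthread_mutex_unlock(&o->lock);   /* release before removing from the tree */
    if (rc == 0)
        o->removed = 1;               /* tree removal happens unlocked */
    return rc;
}
```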