Storage Plugin Developer Documentation
Section 1: Threading Overview
Execution Context
Storage plugin interfaces (constructor(), config(), open(), store(),
flush(), commit(), close(), destructor()) are not always called from
the same thread. Understanding which thread calls which interface is important because
it determines what operations are safe to perform and whether additional synchronization
is needed. ldmsd uses three categories of threads relevant to storage plugins:
Worker threads are created and managed by ldmsd. Their responsibilities are:
Handle configuration commands (
load,config,strgp_add,strgp_prdcr_add,strgp_start,strgp_stop,term) in configuration files from-cor-y
IO threads (also referred to as transport threads) are created and managed by Zap. Their responsibilities are:
Handle configuration commands (
load,config,strgp_add,strgp_prdcr_add,strgp_start,strgp_stop,term) sent by remote clients, e.g. (ldmsd_controllerandmaestro)Deliver metric set updates from producers to the update callback, which triggers the storage path
Storage worker threads are created and managed by ldmsd. Their responsibilities are:
Dequeue storage events from the storage worker queue
Call
store(),flush(), andcommit()on storage plugins
Section 2: Storage Paths
A storage plugin supports one or both of the following invocation paths. Which path
ldmsd uses for a given storage policy depends on whether that policy was configured
with the decomp= attribute.
Legacy Path
In the legacy path, open() is called once when the first matching producer set
update arrives, and store() is called on every subsequent update. The plugin
receives a raw ldms_set_t snapshot and an array of metric indices. This path is
appropriate for backends that can store LDMS metric sets without a fixed schema.
Decomposed Path
In the decomposed path, a decomposition layer runs in the IO thread before the
storage event is posted. It transforms the LDMS set snapshot into a list of typed
rows (ldmsd_row_t), each representing a flattened record with named, typed columns.
The storage worker thread then calls commit() with these rows. open() and
store() are not called on this path.
New storage plugins must implement commit(). The decomposed path decouples the
plugin from the LDMS schema layout, supports schema evolution through the decomposition
configuration, and enables backends that require a fixed schema (SQL tables, Avro
streams, columnar formats, etc.).
Important
Do NOT free row_list inside commit(). ldmsd calls
decomp->release_rows() after commit() returns. Freeing it yourself
causes a double-free.
Event-Based Storage
Both paths use an event-based model that decouples storage from data collection. When
a metric set update arrives, the IO thread creates a snapshot of the set, runs
decomposition if configured, and posts a storage event to a storage worker queue, then
returns immediately. A storage worker thread dequeues the event and calls
store()/flush() or commit() on the plugin. Because store() and
commit() run asynchronously on storage worker threads, the IO thread is free to
continue processing updates without waiting for storage operations to complete. The
storage worker pool absorbs bursts and variations in storage latency — the IO thread
is only affected if all storage workers are at capacity, at which point it blocks
until a worker becomes available.
The storage worker pool is shared across all storage policies. Worker assignment is
round-robin per event — a given storage policy’s events are not pinned to a specific
worker. For the architecture, configuration (storage_threads, storage_queue),
and tuning of the storage worker pool, see ldmsd_event_based_storage(7).
Section 3: API Threading Guarantees
Each plugin interface may be called by different threads depending on how the command
that triggers it is delivered — whether from a configuration file processed at startup
or from a remote client such as ldmsd_controller or maestro. The following
describes which thread is responsible for calling each interface.
Interface-by-Interface Breakdown
constructor()is called by the thread handling theloadconfiguration command:A worker thread if the command comes from a configuration file (
-cor-y)An IO thread if the command comes from
ldmsd_controllerormaestro
usage()is called by the thread handling theusageconfiguration command:A worker thread if the command comes from a configuration file (
-cor-y)An IO thread if the command comes from
ldmsd_controllerormaestro. This makes most sense for users to use it inldmsd_controllerIt is expected to return an immutable string describing the plugin and how to configure it.
usage()is not serialized with any other operation.
config()is called by the thread handling theconfigcommand:A worker thread if the command comes from a configuration file
An IO thread if the command comes from
ldmsd_controllerormaestroThe plugin instance’s cfgobj lock is held by ldmsd when
config()is called.
open()is called by an IO thread when the first matching producer set update arrives and the storage policy is RUNNING but has no open container yet. The strgp lock for that storage policy is held whenopen()is called. This interface is only used on the legacy path (see Legacy Path).store()is called by a storage worker thread, once per storage event. No lock is held by ldmsd whenstore()is called — neither the strgp lock nor the cfgobj lock. This interface is only used on the legacy path (see Legacy Path).flush()is called by the same storage worker thread that calledstore()for that event, immediately afterstore()returns, when the storage policy’sflush_intervalhas elapsed. No lock is held.commit()is called by a storage worker thread, once per storage event, when the storage policy is configured with a decomposition (decomp=). It is called instead ofstore()and follows the same threading rules: no lock is held. See Decomposed Path for details on whencommit()vsstore()is used.close()is called by the thread handlingstrgp_stop:A worker thread if the command comes from a configuration file
An IO thread if the command comes from
ldmsd_controllerormaestroThe strgp lock is held during
strgp_stop.
destructor()is called by the thread that drops the last reference to the plugin instance — either a worker thread or an IO thread. By the timedestructor()is called, all storage policies associated with this plugin instance have been stopped and closed.
Section 4: Concurrency Scenarios
This section describes which plugin operations can run concurrently and which are
mutually exclusive. ldmsd does not serialize store(), flush(), and commit()
with any lock — storage plugin authors must handle concurrency explicitly.
Concurrent Operations
config()andstore()/commit()— ldmsd does not serialize these.config()holds the plugin instance’s cfgobj lock, but the storage worker holds no lock when callingstore()orcommit(). If your plugin instance state can be modified byconfig()and read or written bystore()orcommit(), that state is subject to a data race. The plugin must protect it.store()/commit()for different storage policies — Multiple storage workers may call into the same plugin instance simultaneously for different storage policies. For example, if two storage policiesstrgp1andstrgp2both use the same plugin instance,store()forstrgp1andstore()forstrgp2may run concurrently on different storage worker threads.open()for different storage policies — Each storage policy has its own strgp lock, soopen()for different storage policies may run concurrently inside the same plugin instance. Ifopen()modifies shared plugin instance state (such as a container registry), the plugin must protect that state.Multiple
store()/commit()calls for the same storage policy — may execute simultaneously on different storage workers if events are queued faster than a single storage worker processes them. Storage worker assignment is round-robin per event, not sticky per storage policy, so two events for the same storage policy may be dispatched to different storage workers and overlap.config()andopen()/close()—config()holds the cfgobj lock;open()andclose()hold the strgp lock. These are different locks with no ordering relationship. Ifconfig(),open()andclose()can both modify the same plugin instance state, the plugin must protect it.
Mutually Exclusive Operations
open() and close() for a given storage policy do not overlap with each
other or with any store(), flush(), or commit() for that storage
policy.
Section 5: Synchronization Mechanism
This section describes when storage plugin authors need locks, what kind to use, and which ldmsd-internal mechanisms are off-limits.
What Locking Primitives to Use
Use standard pthread_mutex_t for protecting plugin-owned data.
Do NOT use ldmsd’s internal cfgobj lock (ldmsd_cfgobj_lock()). It protects internal
ldmsd properties and must not be acquired by plugins.
A mutex protecting plugin instance state should be initialized in constructor() and
destroyed in destructor().
A mutex protecting per-store-handle state should be initialized in open() and
destroyed in close().
Lock Order Guideline
If your plugin uses multiple locks, document your lock acquisition order and follow it consistently.
Do not call ldmsd APIs from within a plugin lock. ldmsd APIs that operate on configuration objects (storage policies) acquire internal ldmsd locks. Calling them while holding a plugin lock risks deadlock because ldmsd may itself hold those internal locks when it calls your plugin.
Release all plugin locks before returning from any interface.
Section 6: Plugin Lifecycle
The storage plugin does not define plugin-level states the way the sampler plugin does. The lifecycle is instead described by the sequence of interface calls that ldmsd makes on the plugin instance and on the storage policies attached to it.
Plugin Instance Sequence
load → constructor()
config → config() [may repeat]
term → destructor()
The plugin instance has a configured flag set after the first successful config()
call. There is no plugin-level RUNNING or STOPPED state — those belong to the storage
policy, not the plugin instance.
Storage Policy Sequence (per storage policy)
strgp_start → storage policy transitions to RUNNING
→ open() called on first matching producer set update
→ store()/flush() [legacy path only]
or commit() called per update (storage worker thread)
strgp_stop → storage policy transitions to STOPPED
→ close() called (same thread as strgp_stop)
strgp_del → storage policy removed (must already be STOPPED)
A plugin instance may have zero or more storage policies attached to it at any time,
each independently in STOPPED or RUNNING state. The destructor() is only called
after all associated storage policies have been stopped and deleted.
Section 7: Using the LDMS Message Service in Storage Plugins
For terminology, how to check whether the message service is enabled, which API to use, and callback threading details, see Section 6 of the Sampler Plugin Developer Documentation. The information there applies equally to storage plugins.
This section covers only what differs for storage plugins: which message service functions are appropriate to call from each storage plugin interface and the concurrency implications.
Appropriateness by Plugin Interface
The following table summarizes which functions are appropriate to call from each plugin
interface. For threading details and what is safe to call from within the callback, see
ldms_msg(7) and ldms_msg_chan(7).
The table below covers the plugin instance interfaces (constructor(), config(),
destructor(), open(), store(), commit(), close()). flush()
follows the same rules as store().
API |
constructor() |
config() |
open() |
store() / commit() |
close() |
destructor() |
callback |
|---|---|---|---|---|---|---|---|
|
Yes |
Yes |
Yes |
No [1] |
No [1] |
No [1] |
Yes |
|
No |
Yes |
No |
No [1] |
Yes |
Yes |
Yes |
|
Yes |
Yes |
Yes |
Yes [3] |
No |
No |
Yes [2] |
|
Yes |
Yes |
Yes |
No [1] |
No |
No |
No |
|
No |
Yes |
No |
Yes [4] |
No |
No |
Yes |
|
No |
Yes |
No |
No [1] |
Yes |
Yes |
Yes [5] |
|
Yes |
Yes |
Yes |
No |
No |
No |
No |
Notes:
[1] Not a thread safety issue. These are lifecycle operations that generally
belong in constructor(), config(), open(), close(), or
destructor().
[2] ldms_msg_publish(NULL, ...) invokes subscriber callbacks inline before
returning. Avoid holding plugin-internal locks across this call inside a callback — see
ldms_msg(7).
[3] store() and commit() run on storage worker threads with no ldmsd lock held.
Publishing from these functions is safe but adds latency to the storage worker.
[4] ldms_msg_chan_publish() blocks until space is available in the channel’s queue.
In store() / commit(), a full queue will stall the storage worker thread and
increase event queue depth. Consider whether blocking is acceptable or whether the
message should be queued for a separate thread.
[5] Avoid calling ldms_msg_chan_close() with cancel=0 from within the callback —
it will block the IO thread for the duration of teardown. Use cancel=1 if closing
from within a callback.
Callback
For the LDMS Message Bus API, the callback is invoked by whichever thread called
ldms_msg_publish() for messages published on the same bus, or by an IO thread for
messages arriving from a remote peer. The LDMS_MSG_EVENT_CLIENT_CLOSE event is
delivered to signal that teardown is complete and resources may be freed.
For the message channel API, the callback is always invoked by an IO thread.
LDMS_MSG_EVENT_CLIENT_CLOSE is handled internally by the channel and is not
delivered to the application callback.
For full details on callback threading, what is safe to call, and teardown patterns,
see ldms_msg(7) and ldms_msg_chan(7).
Concurrency with storage operations: Message callbacks run on IO threads, which are
distinct from storage worker threads. A callback may run concurrently with store()
or commit() for any storage policy. If the callback and store()/commit()
access shared plugin state, the plugin must protect that state with a mutex.
Section 8: LDMSD Internal Implementation Details (For Maintainers Only)
This section documents ldmsd-internal implementation details for ldmsd maintainers. Plugin authors do not need to read this section.
Lock Acquisition Order
The following describes the acquisition order for the strgp lock and related locks:
The strgp lock is taken in
ldmsd_strgp_update_prdcr_set()whenopen()is called on the first matching producer set update. It is released before the IO thread returns.The strgp lock is taken when handling
strgp_stopand held duringclose(). It is released before replying.The strgp lock is NOT held during
store(),flush(), orcommit(). Storage events run entirely outside the strgp lock to avoid blocking the IO thread and other producers.The plugin instance’s cfgobj lock is held during
config()but not duringstore()orcommit(), so ldmsd provides no serialization between them.The strgp lock is taken when handling
strgp_del: the storage policy must be STOPPED (EBUSYis returned otherwise), the reference count is checked (EBUSYis returned if in-flight references remain), and the plugin instance’s reference to the storage policy is released and the storage policy is removed from the cfgobj tree.
SEE ALSO
ldmsd(8) ldmsd_controller(8) ldmsd_event_based_storage(7)
ldms_msg(7) ldms_msg_chan(7)