Strategic Architecture for Native Centralized Time-Series Storage in LibreNMS

tristanbob · 11 May 2026 06:01

LibreNMS could benefit greatly from modernizing its data collection and graphing capabilities. This will make LibreNMS even more valuable than it already is today.

I’ve set up a LibreNMS → VictoriaMetrics → Grafana pipeline with custom dashboards, and it’s working very well. But I want to see similar capabilities in LibreNMS natively.

(FYI, there was a previous discussion on this topic by @willhseitz from 2021.)

Below is a proposed architecture I generated about this topic. Please feel free to provide corrections or alternate ideas. Thanks! - Tristan


The Architectural Imperative for Metric Storage Modernization

The foundation of LibreNMS, a widely deployed open-source network monitoring platform, is intricately tied to Round Robin Database (RRD) files for time-series metric storage. Inherited from its predecessor, Observium, this architecture relies heavily on RRDtool to define rigid data structures, store polled metrics, and dynamically render server-side graph images. While RRD files ensure predictable disk space utilization by automatically consolidating and discarding older data points through predefined Round Robin Archives (RRAs), the fundamental design introduces severe scalability limitations in modern, high-density network environments. As deployments scale to tens of thousands of ports and devices, the constant read-modify-write cycle of thousands of individual RRD files generates massive input/output operations per second (IOPS). Although the implementation of RRDCached mitigates immediate disk I/O bottlenecks by buffering writes in memory before flushing them to disk in batches, it merely defers the fundamental limitations of decentralized file-based storage.

Furthermore, the distributed polling architecture in LibreNMS, which utilizes a Dispatcher Service and Redis for horizontal scaling, requires all poller nodes to share a common storage backend. In the context of RRD, this necessitates complex and often brittle shared filesystems like NFS or GlusterFS to ensure all pollers and the central web interface can read and write to the same .rrd files across the /opt/librenms/rrd/ directory hierarchy. This shared filesystem requirement introduces significant latency and single points of failure, which degrade the efficiency of the polling cycle. Additionally, the graphical presentation layer is tightly coupled to the storage engine itself. LibreNMS utilizes modular PHP scripts to construct complex shell commands that execute the rrdtool graph binary, which parses the RRD files and outputs a static Portable Network Graphics (PNG) image to a temporary directory before streaming it to the client browser. This server-side rendering paradigm prevents the implementation of modern, interactive client-side graphing features such as dynamic zooming, real-time tooltip value inspection, and fluid panning without repeatedly querying the backend to generate a newly rendered static image.

While LibreNMS has incrementally introduced integrations with external Time-Series Databases (TSDBs) such as InfluxDB, Prometheus, Graphite, and OpenTSDB, these implementations operate strictly as write-only export mechanisms. The system replicates the data destined for RRD and pushes it to these external endpoints over HTTP protocols. However, the core LibreNMS web interface possesses no native capability to read from these TSDBs to render its built-in device, port, and health graphs. Consequently, administrators seeking modern visualization must deploy supplementary platforms like Grafana, manually rebuilding dashboards that replicate the automatic discovery topologies provided natively by LibreNMS. A comprehensive architectural redesign is required to sever the dependency on RRDtool entirely. This necessitates decoupling the storage ingestion layer from the visualization layer, introducing a centralized open-source TSDB as the primary backend, and refactoring the PHP presentation logic to consume JSON-based API payloads rendered via a modern JavaScript charting library.

Deconstructing the Legacy LibreNMS Metric Pipeline

To successfully abstract and replace the metric storage engine, it is necessary to map the exact code paths and structural dependencies that bind LibreNMS to RRDtool. The legacy system architecture operates across three highly interdependent phases: data definition, data ingestion, and graphical rendering. Understanding these components is critical to ensuring that a replacement architecture achieves absolute feature parity.

The Polling and Ingestion Subsystem

During the device discovery and polling phases, LibreNMS executes modular PHP scripts and YAML definitions tailored to specific vendor operating systems. The primary metadata, configuration settings, and sensor state information are stored in a relational database (typically MariaDB or MySQL). However, the time-series measurements themselves are directed to the RRD subsystem. When an SNMP query returns a valid numeric metric, the system invokes the LibreNMS\RRD\RrdDefinition class to establish the strict parameters of the metric. This class defines the exact structural requirements of the RRD file, specifying the Data Source (DS) type—such as GAUGE, COUNTER, DERIVE, or ABSOLUTE—as well as the minimum and maximum data bounds. The definition dictates how the metric will be treated over time, ensuring that 32-bit and 64-bit counter wraps and gauge fluctuations are handled correctly according to the strict mathematical rules of the RRD specification.

Once the definition is established, the polling mechanism utilizes the DataStorageInterface, specifically invoking the put method on the Datastore object, which routes the data into the storage backend. If a target .rrd file does not exist within the /opt/librenms/rrd/<hostname>/ directory structure, the system executes an rrdtool create command to initialize it, establishing the predetermined Round Robin Archives (RRAs) that govern how data will be averaged, minimized, and maximized over explicit time intervals. Subsequently, the system executes an rrdtool update command. If RRDCached is configured, this update is routed through a UNIX socket (e.g., unix:/run/rrdcached.sock) or over a TCP connection (${IPADDRESS}:42217), deferring the physical disk write to a background daemon.

Because RRD files strictly enforce time steps based on the configured polling interval, configuring parameters like the STEP and HEARTBEAT values is critical. By default, LibreNMS polls devices every 300 seconds (5 minutes), with a heartbeat of 600 seconds. If a polling cycle experiences latency and fails to deliver data within the heartbeat window, RRD registers a null value, resulting in discontinuous graphs and missing data points. Migrating to a 1-minute polling interval requires executing scripts like lnms maintenance:rrd-step to physically restructure the binary RRD files, an intensely disk-heavy operation that highlights the inflexibility of the format.
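The effect of the heartbeat can be illustrated with a small model. This is a deliberately simplified sketch, not RRDtool's actual consolidation algorithm: it only flags an update as unknown when it arrives more than `heartbeat` seconds after the previous one. The function name and the sample data are hypothetical.

```python
from typing import List, Optional, Tuple


def apply_heartbeat(
    updates: List[Tuple[int, float]], heartbeat: int = 600
) -> List[Tuple[int, Optional[float]]]:
    """Replace a value with None when the gap since the previous update
    exceeds the heartbeat. The first update is kept as-is in this model."""
    result: List[Tuple[int, Optional[float]]] = []
    previous_ts: Optional[int] = None
    for ts, value in updates:
        if previous_ts is not None and ts - previous_ts > heartbeat:
            result.append((ts, None))  # late update: recorded as unknown
        else:
            result.append((ts, value))
        previous_ts = ts
    return result


# The third update arrives 700 s after the second, which exceeds 600 s.
samples = [(0, 10.0), (300, 11.0), (1000, 12.0), (1300, 13.0)]
print(apply_heartbeat(samples))
# [(0, 10.0), (300, 11.0), (1000, None), (1300, 13.0)]
```

The `None` entry is what shows up as a gap in the rendered graph.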

The Disk I/O Bottleneck and Distributed Polling Complexity

The decentralized nature of RRD storage creates profound infrastructure challenges as LibreNMS environments scale. Each network interface, sensor, CPU core, and memory pool generates its own discrete .rrd file. For a moderate deployment of 2,700 devices and 65,000 ports, the system must constantly update hundreds of thousands of individual files every five minutes. Even with the buffering capabilities of RRDCached, the underlying hypervisor must manage massive quantities of random write operations, often requiring administrators to migrate the entire /opt/librenms/rrd directory to dedicated NVMe storage arrays or even in-memory RAM disks to prevent the polling queue from stalling.

This storage paradigm becomes exponentially more complex when implementing the LibreNMS Distributed Polling feature. Distributed polling is designed to spread the SNMP data collection workload across multiple discrete servers for horizontal scaling, coordinated via Redis and a Dispatcher Service. However, because all pollers must write metric data, and the central web server must read that data to generate UI graphs, the /opt/librenms/rrd directory must be accessible to all nodes. This relies on Network File System (NFS) mounts or clustered file systems. Network instability or lock contention on the NFS share immediately degrades poller performance, creating a fragile architectural dependency where a storage network issue can halt all network telemetry collection simultaneously. A centralized TSDB is uniquely equipped to resolve this by transforming all poller nodes into ephemeral, stateless agents that transmit payloads over stateless HTTP APIs rather than performing lock-based file operations.

The Server-Side Graphical Rendering Engine

The most complex barrier to replacing RRD is the deeply embedded graph generation logic located within the includes/html/graphs/ directory. Unlike modern decoupled web architectures that serve raw JSON time-series data to a frontend rendering library, LibreNMS dynamically builds extensive command-line strings that are fed directly into the rrdtool binary executable.

When an administrator requests a graph via the web interface, the request is parsed through html/graph.php, which subsequently loads includes/html/graphs/graph.inc.php. Depending on the specific context of the request (e.g., viewing an interface’s packet discard rate), the system loads specific mapping files, such as includes/html/graphs/generic_simplex.inc.php. These mapping files define the visual parameters of the image: color hex codes for the lines and shaded areas, the unit text labels for the Y-axis, and the specific Data Sources (DS) to be extracted from the binary file.

The system then invokes the rrdtool_graph function, located in includes/rrdtool.inc.php. This function utilizes rrdtool_build_command to compile the variables into a massive string containing RRD graphing instructions. Crucially, this string contains imperative logic for mathematical transformations. RRDtool utilizes Reverse Polish Notation (RPN) via CDEF (Compute Data Definition) commands to manipulate data on the fly before drawing it. For example, converting raw SNMP octet byte counters into bits per second is executed dynamically via a command sequence like CDEF:out_bits=out_bytes,8,*. The command may also include VDEF (Variable Data Definition) instructions to calculate the 95th percentile, the total aggregated volume over the time period, and standard deviations.
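To make the RPN step concrete, here is a minimal sketch of how an expression such as out_bytes,8,* is evaluated point by point. It covers only the four basic arithmetic operators, not the full RRDtool RPN operator set, and the function name `eval_rpn` is hypothetical.

```python
from typing import Dict, List


def eval_rpn(expression: str, series: Dict[str, List[float]]) -> List[float]:
    """Evaluate a comma-separated RPN expression for every sample index.

    Tokens are series names, numeric constants, or one of + - * /.
    `series` must be non-empty and all series must have the same length.
    """
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
    }
    length = len(next(iter(series.values())))
    output: List[float] = []
    for i in range(length):
        stack: List[float] = []
        for token in expression.split(","):
            if token in ops:
                b = stack.pop()  # right operand is on top of the stack
                a = stack.pop()
                stack.append(ops[token](a, b))
            elif token in series:
                stack.append(series[token][i])
            else:
                stack.append(float(token))
        output.append(stack.pop())
    return output


# Bytes per second converted to bits per second.
print(eval_rpn("out_bytes,8,*", {"out_bytes": [125.0, 250.0, 0.0]}))
# [1000.0, 2000.0, 0.0]
```

Any replacement backend has to reproduce this kind of per-sample arithmetic somewhere, either in the query language or in the application.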

Once the command string is constructed, it is piped to a background process via Proc->sendCommand($cmd), instructing the rrdtool binary to parse the historical data, execute the RPN mathematics, and output a static PNG image to the /tmp/ directory. This image is then read by the PHP processor, encoded as a base64 string, and served back to the browser. This architecture inherently prevents client-side interactivity; features such as interactive data point inspection, responsive canvas resizing without refreshing, and localized dynamic zooming are structurally impossible because the frontend browser has no access to the underlying time-series data, only a flattened raster image.

| Component | Legacy RRDtool Implementation | Proposed Centralized TSDB Architecture |
| --- | --- | --- |
| Storage Medium | Decentralized Binary Files (`.rrd`) | Centralized Database Cluster |
| Disk I/O Profile | High Random Write IOPS (even with caching) | Optimized Sequential Batch Writes |
| Distributed State | Requires Shared Filesystem (NFS) | Stateless HTTP Pollers (via `vmagent`) |
| Mathematical Logic | Server-Side RPN (`CDEF` / `VDEF`) | Database Query Engine (e.g., MetricsQL) |
| Graphical Rendering | Server-Side Static PNG Generation | Client-Side JavaScript (HTML5 Canvas) |
| Data Interactivity | None (static images require server-side re-rendering) | Native (hover tooltips, dynamic zoom) |

Evaluating Existing TSDB Integration Shortcomings

Over the years, community contributors have attempted to circumvent the limitations of RRD by building export integrations for modern Time-Series Databases. LibreNMS currently features configuration parameters to push metrics to Graphite, InfluxDB, OpenTSDB, and Prometheus. While these integrations demonstrate that LibreNMS can format its polled data for external ingestion, they are explicitly designed as unidirectional transport mechanisms. The official documentation clearly states that these backends cannot be used to display graphs within the LibreNMS interface. Users must rely entirely on external dashboarding software, such as Grafana, to visualize the exported metrics. Furthermore, analyzing the architectural implementations of these export modules reveals significant flaws that prevent them from serving as a 1:1 primary storage replacement for RRD in their current state.

Selecting the Optimal Centralized Backend: VictoriaMetrics

Given the structural limitations of RRDtool and the integration flaws of InfluxDB and Prometheus, VictoriaMetrics emerges as the most viable, performant, and architecturally sound open-source TSDB to serve as the primary backend for LibreNMS. Engineered specifically to handle massive volumes of telemetry data while maintaining extreme cost-efficiency, VictoriaMetrics serves as a high-performance drop-in replacement for both Prometheus and InfluxDB environments.

Columnar Storage and Unmatched Compression Ratios

The primary advantage of VictoriaMetrics is its extraordinary data compression capabilities. RRD files rely on static, pre-allocated block structures, meaning a newly created file instantly occupies its maximum required disk footprint on the filesystem, regardless of how much data it currently contains. VictoriaMetrics, utilizing optimized Log-Structured Merge (LSM) trees organized into columnar data files, achieves compression ratios up to 50:1 compared to uncompressed TSDBs.

Benchmarking demonstrates that VictoriaMetrics requires significantly less RAM and CPU compute power than Prometheus, scanning up to 50 million raw samples per second per CPU core during querying operations. In production environments, migrating from InfluxDB or Prometheus to VictoriaMetrics routinely results in a 60% to 90% reduction in disk space utilization and a massive decrease in memory pressure. For a LibreNMS deployment managing billions of data points across a multi-year retention period, this compression efficiency fundamentally alters the hardware economics of the monitoring platform.

Ingestion Throughput and Protocol Flexibility

Crucially for the LibreNMS architecture, VictoriaMetrics natively ingests the InfluxDB line protocol over standard HTTP endpoints. This allows LibreNMS to completely bypass the restrictive and synchronous Prometheus Pushgateway, utilizing rapid, high-volume HTTP POST requests. In performance comparisons, VictoriaMetrics has demonstrated the ability to ingest data at rates nearly six times faster than early versions of InfluxDB IOx, processing over 4 million rows per second on a single node. This guarantees that the LibreNMS polling threads will not be stalled waiting for storage acknowledgments.
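As a rough illustration of what a poller would send, the sketch below formats one sample as an InfluxDB line protocol record. The measurement name `ports` and the tag names are assumptions chosen for the example, not the names LibreNMS uses today, and the function only handles float fields.

```python
from typing import Dict


def _escape(text: str, special: str) -> str:
    """Backslash-escape each character in `special`."""
    for ch in special:
        text = text.replace(ch, "\\" + ch)
    return text


def to_line_protocol(
    measurement: str,
    tags: Dict[str, str],
    fields: Dict[str, float],
    timestamp_ns: int,
) -> str:
    """Build one line: measurement,tag=value field=value timestamp.

    Tags are sorted by key; the timestamp is in nanoseconds, which is the
    line protocol default precision.
    """
    if not fields:
        raise ValueError("at least one field is required")
    tag_part = "".join(
        f",{_escape(k, ', =')}={_escape(v, ', =')}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{_escape(k, ', =')}={float(v)}" for k, v in fields.items()
    )
    return f"{_escape(measurement, ', ')}{tag_part} {field_part} {timestamp_ns}"


line = to_line_protocol(
    "ports",  # hypothetical measurement name
    {"hostname": "switch-01", "ifIndex": "101"},
    {"ifOutOctets": 123456.0},
    1778540000 * 1_000_000_000,
)
print(line)
# ports,hostname=switch-01,ifIndex=101 ifOutOctets=123456.0 1778540000000000000
```

Many such lines can be joined with newlines and sent in a single HTTP POST body, which is what makes batching cheap compared to per-file RRD updates.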

Furthermore, unlike InfluxDB, VictoriaMetrics provides a fully open-source cluster version, allowing deployments to scale horizontally by decoupling the ingestion (vminsert), storage (vmstorage), and query (vmselect) layers into discrete microservices.

MetricsQL: Bridging the RRD CDEF/VDEF Gap

Replacing the RRD graphing engine requires a query language capable of executing complex mathematical transformations on the fly. VictoriaMetrics utilizes MetricsQL, a query language that is backward-compatible with PromQL and adds its own extensions. MetricsQL provides built-in functions for calculating rates, derivatives, percentages, and complex arithmetic operations across multiple time series.

When LibreNMS currently relies on an RRD CDEF command to multiply a byte counter by 8 to display bits per second, MetricsQL can replicate this natively during the data retrieval phase. When LibreNMS relies on a VDEF command to calculate the 95th percentile of bandwidth utilization for billing purposes, MetricsQL’s quantile_over_time rollup function can execute the same calculation over the stored data points for the billing window. (histogram_quantile is not the right tool here: it operates on histogram buckets, not on the samples of a single series.) This mathematical parity is the linchpin that allows the server-side image generation to be retired in favor of raw data extraction.
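One way to express the 95th-percentile calculation is quantile_over_time applied to a subquery over the bits-per-second rate. The helper below only builds the query string; the function name, the label names, and the 30-day window are assumptions for illustration.

```python
from typing import Dict


def p95_bits_query(
    metric: str, labels: Dict[str, str], window: str = "30d", step: str = "5m"
) -> str:
    """Return a query for the 95th percentile of bits/s over `window`.

    The inner expression converts the octet counter to bits per second;
    the [window:step] subquery samples it at `step` resolution.
    """
    selector = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    rate_bits = f"rate({metric}{{{selector}}}[{step}]) * 8"
    return f"quantile_over_time(0.95, ({rate_bits})[{window}:{step}])"


print(p95_bits_query("ifOutOctets", {"hostname": "switch-01", "ifIndex": "101"}))
# quantile_over_time(0.95, (rate(ifOutOctets{hostname="switch-01",ifIndex="101"}[5m]) * 8)[30d:5m])
```

Whether this matches the existing billing percentile exactly would need to be checked against the current RRD-based results, since the two engines may interpolate differently.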

Designing the Unified Open-Source Architecture

Transforming LibreNMS to utilize VictoriaMetrics as its primary data store involves a rigorous multi-layered software engineering effort. The objective is to construct an architecture that captures data efficiently via optimized HTTP payloads, abstracts the querying logic to remain backend-agnostic, and delegates the graphical rendering to the client’s web browser.

Resilient Ingestion via Local vmagent Relays

By architecting the new LibreNMS driver to route all metric writes exclusively through a local vmagent instance on each poller node, the system gains profound operational resilience and flexibility. vmagent is a lightweight metrics collection and routing agent developed by VictoriaMetrics that accepts push-based payloads and forwards them to the central database cluster.

The primary advantage of deploying vmagent locally is strong protection against data loss. Network monitoring platforms often experience gaps in telemetry data when the central TSDB is taken offline for version upgrades, scaling, or routine maintenance. If a Wide Area Network (WAN) link between a remote LibreNMS distributed poller and the central data center drops, direct API writes would fail entirely. vmagent addresses this by acting as a durable queue. If the central VictoriaMetrics cluster becomes unreachable, vmagent buffers the unsent metrics in local files on persistent disk and keeps accepting new data while it waits for connectivity to be restored. Once the database is back online, vmagent automatically flushes the persistent queue to the remote storage, backfilling the historical data so that the resulting graphs have no gaps, provided the buffer did not overflow during the outage. Administrators can enforce storage limits on this buffer using the -remoteWrite.maxDiskUsagePerURL parameter, ensuring that an extended outage does not inadvertently exhaust the poller node’s entire local disk.

In addition to preventing data loss, passing metrics through vmagent introduces several other structural benefits:

Bandwidth Optimization: When transmitting data from the local vmagent to the central cluster, the agent packages the data using the native VictoriaMetrics remote write protocol. This protocol is highly compressed, reducing network bandwidth utilization by 2x to 5x over WAN links compared to pushing standard, uncompressed JSON or InfluxDB line protocol payloads directly.
Minimal Resource Footprint: Unlike other metrics agents that rely heavily on a Write-Ahead Log (WAL), vmagent is explicitly engineered without one. This design choice drastically reduces its CPU and RAM consumption and allows the agent to restart nearly instantaneously (skipping broken chunks if necessary) without needing to perform slow WAL replays.
Stream Aggregation and Deduplication: vmagent features a built-in processing pipeline that can manipulate data in-flight before it is transmitted. It can perform stream aggregation—such as automatically calculating and forwarding 5-minute averages rather than raw high-frequency samples—and real-time deduplication to aggressively reduce the volume of data stored centrally.

Constructing the Data Retrieval Abstraction Layer

The most significant structural alteration is the dismantling of the server-side image generation. Currently, LibreNMS lacks a unified, native class for extracting queried data back into the application, operating strictly through the Rrd facade to pipe commands to the shell.

A new core interface, logically named LibreNMS\Data\Retrieve, must be instituted to act as the translation layer between the web application’s request for data and the underlying syntax of the TSDB. When a user requests a traffic graph for a specific interface via the web UI, the frontend will issue an asynchronous AJAX request to an internal LibreNMS REST API endpoint (e.g., /api/v0/devices/:hostname/ports/:id/metrics).

The LibreNMS\Data\Retrieve driver will intercept this request, parse the requested parameters, and construct a targeted MetricsQL query. For instance, replacing the RRD logic that calculates outbound bits per second from a raw octet counter involves translating the mathematical steps. The system will issue a query to the VictoriaMetrics /api/v1/query_range endpoint:

rate(ifOutOctets{hostname="switch-01", ifIndex="101"}[5m]) * 8

This query leverages the VictoriaMetrics native rate() function to calculate the per-second derivative of the counter over a five-minute window, and immediately applies an arithmetic operator (* 8) to convert bytes to bits. This operation elegantly replaces the CDEF:out_bits=out_bytes,8,* logic previously handled exclusively by the RRDtool binary. VictoriaMetrics processes this command across the clustered backend with sub-second latency and returns a standardized JSON payload containing an array of timestamps and their corresponding floating-point values.
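The request and response handling described above might look roughly like the following. This is a sketch under stated assumptions: the base URL (a single-node instance on localhost:8428) and both function names are hypothetical, no HTTP request is actually made, and the sample response simply follows the Prometheus-style matrix format that query_range returns.

```python
import json
from typing import Any, Dict, List
from urllib.parse import urlencode


def build_query_range_url(
    base_url: str, query: str, start: int, end: int, step: str
) -> str:
    """Build a /api/v1/query_range URL with properly encoded parameters."""
    params = urlencode({"query": query, "start": start, "end": end, "step": step})
    return f"{base_url}/api/v1/query_range?{params}"


def matrix_to_series(response_text: str) -> List[Dict[str, Any]]:
    """Convert a matrix response into series of [unix_ms, value] pairs."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    series = []
    for item in payload["data"]["result"]:
        # Timestamps arrive in seconds and values as strings.
        points = [[int(round(float(ts) * 1000)), float(v)] for ts, v in item["values"]]
        series.append({"labels": item["metric"], "data": points})
    return series


url = build_query_range_url(
    "http://localhost:8428",  # assumed address of the TSDB
    'rate(ifOutOctets{hostname="switch-01", ifIndex="101"}[5m]) * 8',
    1778540000,
    1778543600,
    "5m",
)
print(url)

sample = json.dumps({
    "status": "success",
    "data": {"resultType": "matrix", "result": [
        {"metric": {"hostname": "switch-01"},
         "values": [[1778540000, "1000"], [1778540300, "2000.5"]]},
    ]},
})
print(matrix_to_series(sample))
```

Converting to millisecond tuples at this layer keeps the frontend independent of the backend's native response format.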

Re-engineering the Presentation Layer

Receiving a JSON array of timestamps and values fundamentally shifts the burden of graph rendering from the backend PHP server to the client’s local browser context. The legacy PHP files situated within the includes/html/graphs/ directory must be repurposed. Instead of executing shell command arrays, these files will act as declarative mapping templates, defining which MetricsQL query templates correspond to which LibreNMS UI elements.

Implementing Client-Side Rendering Integration

The LibreNMS WebUI must integrate a modern, high-performance JavaScript charting library to replace the static PNGs. Frameworks such as Apache ECharts, Chart.js, or uPlot are optimal candidates due to their proven ability to render large, dense time-series datasets efficiently using HTML5 Canvas or WebGL technologies.

When the user navigates to a device page, the browser will request the layout structure from the PHP server. The PHP server will return the HTML shell, embedding the necessary JavaScript parameters and the required API endpoint URLs. The client’s browser will then execute asynchronous fetch() requests to the /api/v0/devices/... endpoints, retrieving the JSON time-series data processed by VictoriaMetrics. The chosen JavaScript library will ingest this JSON and instantly render the graph within the browser window.

Achieving Graphing Capability Parity

To satisfy the requirement that the new system must possess at least the same capabilities as the existing system, the client-side implementation must meticulously replicate the nuanced features of the RRD engine:

Dynamic Zooming and Panning: Client-side libraries natively support interactive click-and-drag zooming. Because the raw data is held in the browser’s memory, zooming does not require a round-trip request to the backend server to generate a new image, vastly improving the fluidity of the interface.
Precise Tooltip Inspection: RRD images cannot show discrete values when hovered over. The JavaScript implementation provides instant, precise tooltip readouts of the data point values at any given timestamp, resolving a long-standing user interface limitation.
Custom Graph Definitions: LibreNMS relies on user-contributed configuration files located in resources/definitions/config_definitions.json (or locally in config.php) to define custom graphs, such as application-specific session counts or active users. The new architecture maps these JSON definitions directly to the MetricsQL queries, ensuring that the custom graph ecosystem continues to function flawlessly without requiring users to manually configure Grafana panels.
Advanced Aggregations: Parity with RRD’s VDEF features (such as generating 95th percentile lines for bandwidth billing) will be achieved by appending additional MetricsQL queries to the payload. The JavaScript library will receive the 95th percentile static value from VictoriaMetrics and render it as an overlay line on the canvas.

Data Retention, Downsampling, and Historical Migration

A fundamental shift in data storage architecture requires addressing the lifecycle of the data itself. RRD files are celebrated for their simplicity in data retention; they automatically and rigidly downsample data into lower-resolution archives over time (e.g., aggregating 5-minute data points into 2-hour averages after 30 days). This guarantees that the RRD file never expands beyond its initial byte allocation.
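The consolidation behaviour described here can be sketched in a few lines. This is a simplified model of an AVERAGE archive, not RRDtool's implementation: buckets are keyed by their start time and unknown values are not handled. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def consolidate_average(
    points: List[Tuple[int, float]], bucket_seconds: int
) -> List[Tuple[int, float]]:
    """Average (unix_seconds, value) points into fixed-width time buckets."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in points:
        bucket_start = ts - ts % bucket_seconds
        buckets[bucket_start].append(value)
    return [
        (start, sum(values) / len(values))
        for start, values in sorted(buckets.items())
    ]


# Four hours of 5-minute samples reduced to two 2-hour averages.
five_minute_points = [(i * 300, float(i)) for i in range(48)]
print(consolidate_average(five_minute_points, 7200))
# [(0, 11.5), (7200, 35.5)]
```

Forty-eight stored points become two, which is the trade RRD makes automatically: bounded size in exchange for lost resolution on older data.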

Replicating RRA with Recording Rules

VictoriaMetrics is an append-only TSDB that stores every raw metric pushed to it. While its 50:1 compression ratio ensures that storing billions of raw metrics requires a fraction of the space of uncompressed databases, storing raw 5-minute polling data indefinitely is computationally inefficient for performing multi-year trend analysis. The open-source version of VictoriaMetrics does not natively feature automatic downsampling, a capability that is strictly reserved for its commercial enterprise tier.

However, the architecture can precisely replicate RRD’s automatic aggregation behavior by leveraging the open-source vmalert component to execute recording rules. The LibreNMS configuration architecture can dynamically generate vmalert YAML configuration files that define continuous background MetricsQL queries. For example, to replicate an RRA that stores 1-hour averages of network interface traffic, a recording rule can be configured to execute every hour on the hour:

YAML

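groups:
  - name: downsample_traffic
    interval: 1h
    rules:
      # Average per-second rate over the last hour. Averaging the raw
      # counter with avg_over_time would not yield a meaningful traffic value.
      - record: ifOutOctets:1h_avg
        expr: rate(ifOutOctets[1h])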

The vmalert daemon evaluates these expressions against the raw data stored in VictoriaMetrics and writes the resulting aggregated data point back into the database under a new metric designation (ifOutOctets:1h_avg). To enforce strict data pruning and replicate RRD’s fixed-size constraints, administrators can deploy multiple vmstorage nodes (or utilize vmagent routing parameters) with distinct retention configurations. The raw, high-resolution 5-minute metrics can be routed to a storage pool with a -retentionPeriod of 30 days, while the downsampled metrics generated by vmalert can be directed to a separate storage pool configured with a multi-year retention policy. This guarantees perpetual data visibility without uncontrolled disk consumption.

The Historical RRD Migration Pathway

For established LibreNMS deployments, preserving historical data stored in legacy .rrd files is an absolute operational requirement. Migrating binary RRD structures into a centralized TSDB necessitates the development of a highly specific translation utility.

The system will require the development of a CLI utility command, conceptually similar to the existing lnms maintenance:rrd-step or lnms migrate scripts. This script will recursively iterate through the /opt/librenms/rrd/ hierarchy. For each identified RRD file, it will invoke the rrdtool xport command to extract the historical time-series data into a parseable JSON or XML format. The script will then parse this output, structurally mapping the hierarchical directory names and filenames into native VictoriaMetrics labels (e.g., algorithmically converting the file path /opt/librenms/rrd/switch-core-01/port-id10.rrd into the label payload {hostname="switch-core-01", port_id="10"}).
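The path-to-label mapping could be sketched as follows. The `port-id<N>.rrd` pattern comes from the example above; the fallback label `rrd_name` for other files and the function name are hypothetical, and a real migration tool would need a mapping for every RRD naming scheme LibreNMS uses.

```python
import re
from pathlib import PurePosixPath
from typing import Dict

PORT_FILE = re.compile(r"^port-id(?P<port_id>\d+)$")


def labels_from_rrd_path(path: str, rrd_root: str = "/opt/librenms/rrd") -> Dict[str, str]:
    """Derive labels from <rrd_root>/<hostname>/<name>.rrd.

    Raises ValueError if the path is outside rrd_root or has another shape.
    """
    relative = PurePosixPath(path).relative_to(rrd_root)
    if len(relative.parts) != 2 or relative.suffix != ".rrd":
        raise ValueError(f"unexpected RRD path layout: {path}")
    labels = {"hostname": relative.parts[0]}
    match = PORT_FILE.match(relative.stem)
    if match:
        labels["port_id"] = match.group("port_id")
    else:
        labels["rrd_name"] = relative.stem  # hypothetical fallback label
    return labels


print(labels_from_rrd_path("/opt/librenms/rrd/switch-core-01/port-id10.rrd"))
# {'hostname': 'switch-core-01', 'port_id': '10'}
```

Keeping the mapping in one pure function makes it easy to unit-test against a listing of real file names before any data is moved.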

The script will then stream this historical payload via the InfluxDB line protocol into the local vmagent relay for buffering and eventual ingestion. Because extracting data from thousands of RRD files is intensely I/O bound, this migration script must be executed asynchronously and optionally across multiple worker threads to prevent locking the system. The script will append the original backdated timestamps (converted from UNIX epoch seconds to milliseconds) to ensure the historical data aligns seamlessly with newly incoming active telemetry.

Strategic Conclusion

The persistence of RRDtool within modern network monitoring solutions represents a technological debt that fundamentally constrains scalability, limits user interface modernization, and isolates critical telemetry data in rigid, inaccessible silos. While the integration of external Time-Series Databases as secondary, write-only endpoints within LibreNMS has provided a functional stopgap for advanced users, it has comprehensively failed to resolve the core architectural limitations of the platform’s primary storage and rendering engines.

By architecting a solution that completely excises RRDtool and elevates VictoriaMetrics to the primary native datastore, the monitoring system is fundamentally transformed. Writing locally to vmagent relays adds a critical layer of operational resilience, ensuring that network telemetry is securely buffered during maintenance windows or network interruptions. Furthermore, VictoriaMetrics provides the requisite ingestion velocity, extreme data compression efficiency, and the mathematical query language complexity (MetricsQL) necessary to absorb the demanding, high-cardinality workloads generated by active SNMP network polling. Finally, transitioning from static, server-side CDEF image compilation to JSON-driven, client-side graphical rendering modernizes the presentation layer to match contemporary analytical standards.

While the requisite codebase refactoring—specifically the translation of thousands of legacy rrdtool graph definitions into the new LibreNMS\Data\Retrieve API configurations and the implementation of vmalert for downsampling—is an extensive undertaking, the strategic dividends are incontrovertible. The resulting unified architecture decouples storage from compute, eradicates dangerous shared-filesystem dependencies in distributed polling environments, prevents data loss at the edge, and establishes a highly scalable, fully interactive network observability platform capable of meeting the demands of next-generation enterprise networks.

tristanbob · 20 May 2026 00:16

Title: RFC: Incremental graph data API and ECharts modernization (keeping RRD as default)

This is a design discussion, not a code PR. I’d like to get maintainer feedback before writing any code.

Problem

LibreNMS graphs are image-only today. The graph pipeline goes from RRD directly to a PNG — there is no intermediate data contract. This makes it hard to:

Build interactive, browser-rendered graphs
Power external dashboards or API integrations without screen-scraping
Swap or supplement the metric backend without rewriting graph rendering

Proposed approach

The idea is to introduce a storage-neutral JSON graph-data API in front of the existing RRD path, then add an optional ECharts renderer that consumes it, then optionally add VictoriaMetrics as a dual-write backend later. Each step is independently useful and doesn’t require the next one.

RRD remains the default throughout. Existing /graph.php and API image endpoints are never touched.

What this does NOT change

RRD remains the default metric store and rendering path
Existing image graph endpoints (/graph.php, /api/v0/.../port_bits, etc.) are unchanged
VictoriaMetrics is entirely optional — no new services required by any early PR
ECharts is opt-in behind a feature flag (graphs.renderer = 'echarts'; default: 'rrd')
No existing behavior changes unless an operator explicitly enables a flag

Proposed PR sequence

Each PR is independently mergeable. None depends on the next.

1. RRD-backed JSON for device_poller_perf: new /api/v0/devices/{hostname}/graphs/device_poller_perf/data endpoint returning normalized JSON; no renderer changes, no new services
2. Optional ECharts renderer for device_poller_perf: feature-flagged front-end renderer; RRD image rendered by default
3. Port bits graph parity: extend the JSON API and ECharts renderer to port_bits
4. VictoriaMetrics write datastore: optional dual-write adapter; no reads yet
5. VictoriaMetrics query adapter: backend swap behind the same JSON API for graphs that already have JSON endpoints
6. Port packets / errors / discards
7. Health and wireless sensor graphs
8. Billing ECharts renderer
9. Application and custom graph helpers

Example JSON payload

For device_poller_perf (one of the simplest device graphs — a good proof-of-concept target):

{
  "status": "ok",
  "graph": {
    "id": "device_poller_perf:123",
    "type": "device_poller_perf",
    "title": "Poller Performance",
    "subtitle": "router1.example.net",
    "unit": "seconds",
    "from": 1778540000,
    "to": 1778543600,
    "step": 300,
    "series": [
      {
        "name": "Poller time",
        "key": "poller_time",
        "type": "line",
        "unit": "seconds",
        "data": [[1778540000000, 12.5]],
        "style": { "area": true, "stack": null },
        "stats": { "min": 10.1, "max": 15.7, "avg": 12.8, "last": 12.5 }
      }
    ],
    "markers": [],
    "meta": { "source": "rrd", "fallback_used": false }
  }
}

The series data array uses [unix_ms, value] tuples — compatible with ECharts and most charting libraries out of the box.
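
As a rough illustration of how a client might consume this payload, the Python sketch below parses a trimmed copy of the example, converts the millisecond timestamps, and recomputes the `stats` block. Two things are assumptions rather than part of the proposal: the extra data points, and the use of `null` to mark a gap in the data.

```python
import json
from datetime import datetime, timezone
from typing import Optional

# Trimmed copy of the example payload. The second and third points, and the
# use of null for a missing sample, are assumptions made for this sketch.
PAYLOAD = """
{
  "status": "ok",
  "graph": {
    "type": "device_poller_perf",
    "from": 1778540000,
    "to": 1778543600,
    "step": 300,
    "series": [
      {
        "key": "poller_time",
        "unit": "seconds",
        "data": [[1778540000000, 12.5], [1778540300000, null], [1778540600000, 13.5]]
      }
    ]
  }
}
"""


def point_time(point: list) -> datetime:
    """Convert the unix-millisecond timestamp of a [unix_ms, value] tuple to UTC."""
    return datetime.fromtimestamp(point[0] / 1000, tz=timezone.utc)


def summarize(series: dict) -> Optional[dict]:
    """Recompute min/max/avg/last over the non-null values of one series."""
    values = [value for _, value in series["data"] if value is not None]
    if not values:
        return None
    return {
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
        "last": values[-1],
    }


def evenly_spaced(series: dict, step_seconds: int) -> bool:
    """Check that consecutive timestamps are exactly one step apart."""
    stamps = [ts for ts, _ in series["data"]]
    return all(b - a == step_seconds * 1000 for a, b in zip(stamps, stamps[1:]))


graph = json.loads(PAYLOAD)["graph"]
first_series = graph["series"][0]
stats = summarize(first_series)
```

Note that `from`, `to`, and `step` are in seconds while the series timestamps are in milliseconds, so a consumer has to scale one of them before comparing.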

Proposed class structure (Stage 1)

New namespace LibreNMS\Graph\:

LibreNMS/Graph/
  GraphQuery.php            — time range + graph type + device context
  GraphSeries.php           — one data series (name, type, unit, data points, stats)
  GraphDataResult.php       — wraps title/subtitle/unit/series/markers
  RrdGraphDataProvider.php  — reads RRD, builds GraphDataResult
  Definitions/
    Device/
      PollerPerfGraph.php   — graph-specific RRD fetch spec

Route: GET /api/v0/devices/{hostname}/graphs/{graph_type}/data

Handler: get_device_graph_data() in includes/html/api_functions.inc.php (following get_bill_graphdata() precedent)
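
The class split above can be sketched in Python to show the intended data flow; this is an analogue for discussion, not the PHP implementation. The field names, the `GraphDataProvider` protocol, and the `StaticProvider` stand-in are assumptions. Only the class roles (query, series, result, provider) come from the listing.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Protocol, Tuple

Point = Tuple[int, Optional[float]]  # (unix_ms, value)


@dataclass(frozen=True)
class GraphQuery:
    graph_type: str
    hostname: str
    start: int  # unix seconds
    end: int    # unix seconds


@dataclass
class GraphSeries:
    name: str
    key: str
    unit: str
    data: List[Point]
    type: str = "line"


@dataclass
class GraphDataResult:
    title: str
    subtitle: str
    unit: str
    series: List[GraphSeries]
    markers: list = field(default_factory=list)
    source: str = "rrd"


class GraphDataProvider(Protocol):
    def fetch(self, query: GraphQuery) -> GraphDataResult: ...


class StaticProvider:
    """Hypothetical stand-in for RrdGraphDataProvider: serves canned samples."""

    def __init__(self, source: str, samples: List[Tuple[int, float]]) -> None:
        self.source = source
        self.samples = samples  # (unix_seconds, value)

    def fetch(self, query: GraphQuery) -> GraphDataResult:
        points = [(ts * 1000, v) for ts, v in self.samples if query.start <= ts <= query.end]
        series = GraphSeries("Poller time", "poller_time", "seconds", points)
        return GraphDataResult("Poller Performance", query.hostname, "seconds",
                               [series], source=self.source)


def to_payload(query: GraphQuery, result: GraphDataResult) -> dict:
    """Build the JSON-ready dict; nothing here depends on the backend."""
    return {
        "status": "ok",
        "graph": {
            "type": query.graph_type,
            "title": result.title,
            "subtitle": result.subtitle,
            "unit": result.unit,
            "from": query.start,
            "to": query.end,
            "series": [
                {"name": s.name, "key": s.key, "type": s.type, "unit": s.unit,
                 "data": [list(p) for p in s.data]}
                for s in result.series
            ],
            "markers": result.markers,
            "meta": {"source": result.source},
        },
    }
```

In this sketch, swapping the provider changes only `meta.source`; the handler and the renderer see the same structure either way, which is the storage-neutral property the proposal is after.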

Questions for maintainers

1. Is the proposed route shape (/api/v0/devices/{hostname}/graphs/{graph_type}/data) acceptable, or is there a preferred convention?
2. Is LibreNMS\Graph\ an acceptable new namespace, or would you prefer another location?
3. Should the handler go in api_functions.inc.php (existing pattern) or in a new dedicated controller?
4. Any concerns about introducing ECharts as an optional JS dependency (lazy-loaded, zero bundle impact when flag is rrd)?
5. Any naming concerns before the first code PR?

The goal is to make the graph-data contract storage-neutral without disrupting any existing behavior. Happy to adjust the approach based on feedback before writing Stage 1 code.

tristanbob · 20 May 2026 07:50

The top graph uses RRD; the bottom graph uses ECharts.
I wanted to get one chart working close to the existing RRD design before working on others.
I’ll keep posting my updates as I progress.

laf · 20 May 2026 09:16

The v0 API will be marked as deprecated in the not too distant future. A v1 API pull request is in the works and will be the path forward, so I wouldn’t base your code on v0.

LibreNMS\Graph is probably ok but that can be easily changed so I wouldn’t worry about it for now.

No issues with ECharts as long as it’s performant.

murrant · 20 May 2026 16:08

The main challenge is changing things without breaking any existing installs.

The metadata for current rrd metrics is atrocious. I don’t see much here about metadata management.

There are a LOT of graphs in LibreNMS, it will not be possible to move them all in a reasonable timeframe.

The webui in LibreNMS typically does not use the API to fetch data. Perhaps v1 could make this possible.

tristanbob · 20 May 2026 17:27

Thanks for the replies and information.
I will look for the v1 API to make sure I submit my PRs using that version.

My goal is to make this change as pain-free as possible, both for PR reviewers and for end-users.
Users can continue using RRD for everything, or they can enable a feature to use the new graphs. This will only affect graphs that have been updated for the new system, so we can slowly migrate graphs in phases. Data for these graphs will still come from RRD (via the new API).

The design of these new graphs will initially match RRD as closely as possible, but we have the flexibility to change this in the future.

Lastly, users can choose for the graph data to come from a different backend (instead of RRD), such as VictoriaMetrics.

What do you mean by metadata management? Do you have goals, suggestions, advice for this?

murrant · 21 May 2026 14:30

Individual small changes is my suggestion. Identify foundational problems that need to change first. Also there is a change to our rrdtool execution pending that I hope to merge this release.

I mean how do you identify a metric? Typically metrics are port.in_errors or something like that, but you have thousands of that metric in a typical install. How do you identify a subset of those metrics? Currently this is inconsistent and unorganized and some even have high cardinality.

Last note. LibreNMS should be opinionated, we want users seeing metrics without making choices as much as possible.
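
The identification question above (one metric name, thousands of instances, and a need to pick out a subset) can be sketched in Python as a name plus a set of labels. This is an illustrative model only: the label names `hostname` and `ifName` are assumptions, not an agreed LibreNMS schema.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class SeriesId(NamedTuple):
    """A series is identified by its metric name plus a sorted set of labels."""
    name: str
    labels: Tuple[Tuple[str, str], ...]


def series_id(name: str, labels: Dict[str, str]) -> SeriesId:
    # Sorting makes the identity independent of the order labels were given in.
    return SeriesId(name, tuple(sorted(labels.items())))


def select(series: Iterable[SeriesId], name: str, **matchers: str) -> List[SeriesId]:
    """Return every series with this name whose labels equal all the matchers."""
    return [
        s for s in series
        if s.name == name
        and all(dict(s.labels).get(key) == value for key, value in matchers.items())
    ]


# Hypothetical inventory: the same metric name on three different ports.
inventory = [
    series_id("port.in_errors", {"hostname": "router1.example.net", "ifName": "eth0"}),
    series_id("port.in_errors", {"hostname": "router1.example.net", "ifName": "eth1"}),
    series_id("port.in_errors", {"hostname": "router2.example.net", "ifName": "eth0"}),
]

router1_ports = select(inventory, "port.in_errors", hostname="router1.example.net")
```

With a consistent label set, "all error counters on router1" becomes a label match instead of a per-graph lookup. Labels whose values are unbounded would recreate the high-cardinality problem mentioned above, so the set of allowed labels would need to be curated.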

tristanbob · 21 May 2026 21:27

I’ve made progress on my echarts integration (it reads data from RRD)

tristanbob · 22 May 2026 16:11

Thanks for this advice!
My plan is to build the full vertical slice of the new data flow and iterate on that design using the lessons learned along the way.
Once I (and my agents) are happy, we will design an initial PR that has the minimal scope required.

tristanbob · 22 May 2026 16:13

Here is my current branch, if you want to follow the process:

murrant · 27 May 2026 01:51

Is the VictoriaMetrics datastore just a clone of Prometheus?
There is too much boilerplate in the graph definitions. They should be dead simple to define (but still allow for complex code when needed).
Add your own copyright on files.
We cannot use database id fields as tags unless we implement soft deletes more widely across LibreNMS.
It is also a good idea to minimize data copying/transforms if we can, ideally to once at most. This is just a note, not a specific observation.

Focus on a few things so you can keep iterating. Remember, we want the best implementation, not the first.

tristanbob · 27 May 2026 04:32

Thank you for this feedback!
To be honest, I got really far but realized I was making a monster (the UI worked really well but the code was overly complex).
I’ll start bringing things back to reality and working towards the goals and priorities you have described.
Cheers!

murrant · 27 May 2026 04:40

yep, I’ve worked on this a few times.
Like I said before identify something you can upstream. Small bites.

One thing I think would help a lot is normalizing metrics and tags for those metrics. Right now RRD is a little difficult to “retrieve” tag data from as the encoding is terse and lossy.

Having a prototype, no matter how unwieldy, helps you think about things and experiment.
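
One way to picture the normalization step described above is the Python sketch below. It assumes a hypothetical per-port file naming scheme (`port-id<N>.rrd`) and a hypothetical mapping from RRD data-source names to metric names; neither is taken from the LibreNMS code. It shows why the legacy encoding is lossy: the file name carries only a database id, so the tags have to be recovered from a separate lookup.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical legacy naming scheme: the file name holds only a database id.
PORT_FILE = re.compile(r"^port-id(?P<port_id>\d+)\.rrd$")

# Hypothetical mapping from RRD data-source names to normalized metric names.
DS_TO_METRIC = {
    "INERRORS": "port.in_errors",
    "OUTERRORS": "port.out_errors",
}


def normalize(
    hostname: str,
    filename: str,
    ds_name: str,
    ports: Dict[int, Dict[str, str]],
) -> Optional[Tuple[str, Dict[str, str]]]:
    """Map (file, data source) to (metric, tags), or None if it cannot be resolved."""
    match = PORT_FILE.match(filename)
    metric = DS_TO_METRIC.get(ds_name)
    if match is None or metric is None:
        return None
    port = ports.get(int(match.group("port_id")))
    if port is None:
        # The port row is gone, so the file name alone cannot say which
        # interface this data belonged to.
        return None
    # Tag by stable names rather than the database id.
    return metric, {"hostname": hostname, "ifName": port["ifName"]}


ports_table = {123: {"ifName": "eth0"}}
resolved = normalize("router1.example.net", "port-id123.rrd", "INERRORS", ports_table)
orphaned = normalize("router1.example.net", "port-id999.rrd", "INERRORS", ports_table)
```

The `orphaned` case returns `None`, which is the failure mode that makes raw database ids risky as tags: once the row is deleted, the stored series can no longer be tied back to an interface.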

tristanbob · 30 May 2026 23:59

Here is a video demonstrating LibreNMS with dynamic graphs powered by Apache ECharts, with VictoriaMetrics as an optional backend data source.

This integration works well, but the code needs to be prepared in a way that can be accepted by the LibreNMS maintainers. I will try my best to make that happen!

tristanbob · 1 June 2026 21:51

Here is the branch I’m using as the “goal” for my PRs. (Note: This branch is never meant to be merged.) I use it as a source to draft the incremental PRs that head towards it.

Here are some docs to help understand what this includes:

github.com/tristanbob/librenms

doc/Developing/Graph-Data-Architecture.md

goal

# Graph Data Architecture

LibreNMS separates graph concerns into three layers: definitions describe what a
graph means, backends fetch metric samples, and the renderer draws the result.
This separation means a graph definition written once works with any supported
metric backend and any renderer.

---

## Definition Layer

Every graph type implements `GraphDefinition`. A definition is a plain PHP class
with no framework dependencies; it declares what data a graph needs and how it
should be presented, without knowing how that data is fetched or drawn.

### Interface at a Glance

| Method | Returns | Purpose |
|--------|---------|---------|
| `graphType()` | `string` | Unique graph type key, e.g. `device_icmp_perf` |

This file has been truncated.

github.com/tristanbob/librenms

doc/Developing/VictoriaMetrics.md

goal

# VictoriaMetrics — Developer Reference

This document covers the internal design of LibreNMS's VictoriaMetrics integration
for contributors adding or extending VM support in code. For operator setup,
configuration, and the RRD migration command, see
[Extensions/metrics/VictoriaMetrics](../Extensions/metrics/VictoriaMetrics.md).

## Why VictoriaMetrics?

LibreNMS polls hundreds or thousands of devices every few minutes, producing a high
write throughput of small samples. VictoriaMetrics was chosen over alternatives
(InfluxDB, Prometheus, TimescaleDB) for the following reasons:

- **High-throughput ingestion with low memory overhead.** VM sustains millions of
  samples per second on modest hardware and uses significantly less RAM than
  Prometheus or InfluxDB under equivalent write load, because its storage engine
  defers most work to background merge passes rather than holding large in-memory
  indexes.
- **Compact on-disk storage.** VM's two-level compression scheme typically reaches
  ~0.4 bytes per sample on real-world network monitoring data — roughly 10× smaller

This file has been truncated.

tristanbob · 4 June 2026 00:54

I published the repo that I use to work on LibreNMS:

This repo has a single ./start-dev-server.sh command that builds and starts a complete local stack:

- LibreNMS web server
- MariaDB
- SNMP test device
- poller loop
- VictoriaMetrics
- vmagent
- seeded admin user
- seeded graph data

willhseitz · 9 June 2026 20:34

Holy crap, this is awesome! Keep up the great work and hoping to see these changes in release!

tristanbob · 9 June 2026 21:56

Thank you! I am working with the maintainers on an upgrade of the graphs page to modernize it to Laravel and that is the first thing they requested me to do. We are also waiting for the v1 API to be finalized so this change will be compatible with it.