Releases: netdata/netdata
v2.6.0
Table of Contents
Release Summary
This release brings AI-powered monitoring intelligence and expanded platform support to all Netdata users.
Feature | What's New |
---|---|
AI Integration | • MCP server support enables AI assistants to query your infrastructure • Natural language questions in AI Insights ("What went wrong at 3 PM?") |
Enterprise Integration | • Full SCIM group provisioning for Okta • Automatic space/room access based on Okta groups |
Network Monitoring | • SNMP profile-based collection with 100+ device profiles (alpha) • Auto-detection for Cisco, Palo Alto, F5, and more |
Platform Expansion | • Native packages for RHEL 10, AlmaLinux 10, Rocky Linux 10 • systemd journal support for static builds via Rust implementation |
Release Highlights
Model Context Protocol (MCP) Server Integration
Every Netdata Agent and Parent now functions as an MCP server, enabling AI assistants like Claude Desktop to query and analyze your infrastructure monitoring data through a built-in WebSocket interface.
What MCP Enables
AI assistants gain read-only access to your monitoring data:
- Infrastructure Discovery: Hardware specs, OS details, and streaming topology
- Metric Intelligence: Full-text search across all contexts, instances, dimensions, and labels
- System Insights: Execute functions for processes, network connections, systemd journals, and Windows events
- Alert Analysis: View real-time alerts and complete alert history
- Advanced Analytics: Complex metric queries with ML-powered anomaly detection
- Root Cause Analysis: Correlate metrics and anomaly scores to identify issues
Security First
- Sensitive functions (logs, process monitoring) require temporary API keys
- Existing Netdata permissions control all data access
- WebSocket connections need explicit configuration in AI clients
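As a rough illustration of what an AI client sends over that WebSocket, MCP messages are JSON-RPC 2.0 requests. The sketch below only builds the payloads and opens no connection. The endpoint URL, the client name, and the protocol version string are assumptions for illustration, not values taken from these release notes.

```python
import json
from itertools import count

# Assumed endpoint, for illustration only; check the Netdata MCP docs for the real URL.
MCP_URL = "ws://localhost:19999/mcp"

_ids = count(1)  # JSON-RPC requests carry a unique id per request


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request, the envelope MCP uses for its calls."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def initialize_request(client_name="example-client"):
    """First message of an MCP session: the client announces itself."""
    return jsonrpc_request("initialize", {
        "protocolVersion": "2024-11-05",  # assumed protocol revision
        "capabilities": {},
        "clientInfo": {"name": client_name, "version": "0.0.1"},
    })


def list_tools_request():
    """Ask the server which tools it exposes."""
    return jsonrpc_request("tools/list")


if __name__ == "__main__":
    for req in (initialize_request(), list_tools_request()):
        print(json.dumps(req))
```

A real client would send these strings over a WebSocket connection to the Agent or Parent and read the JSON-RPC responses; desktop AI clients normally handle this exchange themselves once configured.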
Scalable Visibility
AI assistant visibility scales with your connection point:
Connection Point | Visibility Scope |
---|---|
Netdata Child/Standalone | Single node only |
Netdata Parent | Parent + all connected children |
Netdata Cloud | Full infrastructure (coming soon) |
AI Insights: Enhanced with Natural Language Investigation
AI Insights now understands your questions. Simply ask "What went wrong yesterday at 3 PM?" and get a comprehensive report targeting your specific concern—no more manual metric correlation or dashboard hunting during incidents.
Available Reports
Report Type | Analysis Period | Answers Questions Like |
---|---|---|
Infrastructure Summary | 24 hours - 1 month | "How healthy is my infrastructure?" |
Capacity Planning | 3 months - 2 years | "When will I run out of resources?" |
Performance Optimization | 24 hours - 1 quarter | "Where are my bottlenecks?" |
Anomaly Analysis | 6 hours - 7 days | "What caused the outage?" |
Investigation (NEW) | Custom timeframe | "Why did latency spike at 3 PM?" |
Alert Troubleshooting | Real-time | "How do I fix this alert?" (Preview) |
What's New
- Natural Language Queries: Ask questions in plain English about any timeframe or issue
- Targeted Analysis: Get reports focused on your specific problem, not generic overviews
- Alert Resolution Guidance: Coming soon—automated investigation of active alerts with fix recommendations
Privacy and Limits
- Reports are generated on demand and disposed of immediately afterwards
- Your infrastructure data is never used for AI training
- All report types count toward a single monthly limit of 10 reports
Note
Alert Troubleshooting is currently in preview and will be gradually rolled out to all users.
Okta Integration: Full SCIM Group Provisioning Support
The Okta integration now supports complete SCIM group provisioning, enabling automatic synchronization of both users and groups between Okta and Netdata Cloud.
What's New
Previously limited to user provisioning, the integration now includes:
Capability | Before | Now |
---|---|---|
User Provisioning | ✅ Create, update, deactivate users | ✅ Create, update, deactivate users |
Group Sync | ❌ Manual group management | ✅ Automatic group synchronization |
Space/Room Access | ❌ Manual assignment | ✅ Auto-assignment based on Okta groups |
Automated Access Management
When you add or remove users from groups in Okta, the changes are reflected in Netdata Cloud immediately. This enables powerful automation scenarios:
- Assign users to specific Netdata spaces based on their Okta department groups
- Grant room access automatically based on team membership
- Revoke access immediately when users leave groups
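Under SCIM 2.0 (RFC 7644), a group membership change is typically expressed as a PatchOp request against the group resource. The sketch below builds such a request body. The user IDs are hypothetical, and the exact requests Okta sends to Netdata Cloud are not described in these notes.

```python
import json

PATCH_SCHEMA = "urn:ietf:params:scim:api:messages:2.0:PatchOp"


def group_member_patch(op, user_ids):
    """Build a SCIM 2.0 PatchOp body that adds or removes group members."""
    if op not in ("add", "remove"):
        raise ValueError("op must be 'add' or 'remove'")
    if op == "add":
        # A single add operation can carry several members at once.
        operations = [{
            "op": "add",
            "path": "members",
            "value": [{"value": uid} for uid in user_ids],
        }]
    else:
        # One remove operation per member, selected with a SCIM path filter.
        operations = [
            {"op": "remove", "path": f'members[value eq "{uid}"]'}
            for uid in user_ids
        ]
    return {"schemas": [PATCH_SCHEMA], "Operations": operations}


if __name__ == "__main__":
    print(json.dumps(group_member_patch("add", ["user-123"]), indent=2))
```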
Learn how to configure SCIM group provisioning in our documentation or explore the Netdata integration in Okta's marketplace.
Automated SNMP Monitoring with Device Profiles
Netdata v2.6.0 adds SNMP profile-based collection (alpha), transforming complex SNMP monitoring into a plug-and-play experience. The profile system makes enterprise network monitoring accessible to everyone, from home labs to data centers, with the simplicity Netdata is known for.
Getting started is simple:
- Existing users: Profiles are automatically enabled and your devices will be detected and monitored with no additional configuration
- New users: Just configure SNMP credentials, and Netdata handles the rest
Important
As an alpha release, expect rapid improvements and possible profile format changes in future versions.
What's New
Before | Now with Profiles |
---|---|
Manual OID configuration | Auto-detection with 100+ device profiles |
Limited to IF-MIB metrics | Full device metrics: CPU, memory, temperature, status |
Complex setup per device | Drop-in YAML profiles |
No vendor intelligence | Vendor-specific metrics and transformations |
Fixed monitoring only | Support for custom profiles for specialized devices |
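Auto-detection of this kind is commonly driven by the device's sysObjectID: each profile declares one or more OID patterns, and the most specific match wins. The sketch below illustrates that idea only. The profile names and patterns are made up, and Netdata's actual matching rules may differ.

```python
from fnmatch import fnmatchcase

# Hypothetical profile table: profile name -> sysObjectID patterns.
PROFILES = {
    "generic-device": ["1.3.6.1.*"],
    "cisco": ["1.3.6.1.4.1.9.*"],
    "cisco-catalyst": ["1.3.6.1.4.1.9.1.*"],
}


def match_profile(sys_object_id, profiles=PROFILES):
    """Return the profile whose matching pattern is the longest, or None."""
    best_name, best_len = None, -1
    for name, patterns in profiles.items():
        for pattern in patterns:
            if fnmatchcase(sys_object_id, pattern) and len(pattern) > best_len:
                best_name, best_len = name, len(pattern)
    return best_name
```

Using pattern length as the measure of specificity keeps the example short; a production matcher would more likely compare OID components one by one.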
Extensive Device Coverage
Netdata ships with profiles for major network vendors, adapted from Datadog’s battle-tested definitions:
Category | Vendors Included |
---|---|
Switches & Routers | Cisco (Catalyst, Nexus, ASR, ISR), Arista, Juniper, HP/HPE, Dell, Extreme |
Firewalls | Palo Alto, Fortinet FortiGate, Cisco ASA, Checkpoint, SonicWall |
Wireless | Aruba, Cisco WLC, Ubiquiti, Alcatel-Lucent |
Load Balancers | F5 BIG-IP, Citrix NetScaler, A10 Thunder |
Infrastructure | APC UPS/PDU, Dell servers, standard MIBs (BGP, OSPF, TCP/UDP) |
Tip
This is just the beginning. We're actively expanding coverage based on user feedback. Missing metrics for your devices? Let us know!
Native Package Support for RHEL 10 and Derivatives
Netdata now provides native packages for RHEL 10, AlmaLinux 10, and Rocky Linux 10.
These packages ensure seamless integration with corporate deployment tools, automated updates, and compliance requirements typical in enterprise environments. Whether you're running RHEL 10 in production or using AlmaLinux or Rocky Linux as alternatives, you get the same reliable, optimized Netdata experience.
Rust-Based systemd-journal Plugin for Static Builds
Static build users can now access sy...
v2.5.4
Netdata v2.5.4 is a patch release to address issues discovered since v2.5.3.
This patch release provides the following bug fixes and updates:
- Improved label sanitization in Go plugins by removing null bytes from values (commit, @ilyam8)
- Improved Go plugin startup performance by loading SNMP profiles only when used instead of all at startup (commit, @ilyam8)
- Added `-NoProfile` parameter to Windows installer PowerShell execution for cleaner environment setup (#20550, @thiagoftsm)
- Optimized memory usage by switching label structures to use the ARAL allocator and reducing memory footprint (#20502, @stelfrag)
- Fixed CPU architecture matching for Go plugin builds in 32-bit static builds (#20502, @Ferroin)
- Fixed Redis collector to properly maintain TLS configuration for `rediss` connections (#20478, @ilyam8)
- Fixed Go weblog collector to exclude HTTP 429 status codes from 4xx error category (#20443, @Slind14)
- Fixed registry save operation by correcting integer overflow issues and adding exponential backoff for failed save attempts (#20437, @ktsaou)
- Improved agent shutdown responsiveness by reducing streaming connection timeout from 1000ms to 250ms (#20434, @stelfrag)
- Fixed database statement handling with improved thread cleanup and validation before finalization (#20433, @stelfrag)
- Fixed memory corruption issue in query progress updates by preventing access to freed web client structures (#20431, @ktsaou)
- Added vendored Protobuf and Abseil libraries to static builds with necessary patches for cross-platform compatibility (#17774, @Ferroin)
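The exponential backoff mentioned for failed registry saves follows a common pattern: wait longer after each failure, up to a cap. Below is a minimal sketch of that pattern, with a hypothetical `save` callable and illustrative delay values. The agent's real implementation is in C and its parameters are not given here.

```python
import time


def backoff_delays(base=1.0, factor=2.0, cap=60.0, attempts=5):
    """Return the wait in seconds before each retry: base, base*factor, ... capped."""
    return [min(cap, base * factor ** i) for i in range(attempts)]


def save_with_backoff(save, delays, sleep=time.sleep):
    """Call save() until it returns True, sleeping between failed attempts."""
    for delay in delays:
        if save():
            return True
        sleep(delay)
    return save()  # final attempt after the last wait
```

Passing `sleep` as a parameter lets the retry loop be exercised without real delays.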
Support options
As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter an issue with any of the changes made in this release or any feature in the Netdata Agent, feel free to contact us through one of the following channels:
- Netdata Learn: Find documentation, guides, and reference material for monitoring and troubleshooting your systems with Netdata.
- GitHub Issues: Make use of the Netdata repository to report bugs or open a new feature request.
- GitHub Discussions: Join the conversation around the Netdata development process and be a part of it.
- Community Forums: Visit the Community Forums and contribute to the collaborative knowledge base.
- Discord Server: Jump into the Netdata Discord and hang out with like-minded sysadmins, DevOps, SREs, and other troubleshooters. More than 2000 engineers are already using it!
v2.5.3
Netdata v2.5.3 is a patch release to address issues discovered since v2.5.2.
This patch release provides the following bug fixes and updates:
- Fixed context update handling by adjusting conditions for hub queue management (#20416, @stelfrag)
- Added ability to debug individual jobs in go.d.plugin instead of all jobs within a module (#20394, @ilyam8)
- Added debug logging for HTTP response validation in go.d.plugin HTTP check collector (#20392, @ilyam8)
- Fixed duplicate name handling in go.d.plugin dynamic configuration userconfig action (#20346, @ilyam8)
- Fixed Oracle database collector to correctly calculate tablespace usage percentages and prevent negative values (#20373, #20378, @ilyam8)
- Fixed database engine performance by optimizing file rotation and indexing operations with better job scheduling and concurrency handling (#20354, @stelfrag)
v2.5.2
Netdata v2.5.2 is a patch release to address issues discovered since v2.5.1.
This patch release provides the following bug fixes and updates:
- Fixed crash by preventing dynamic configuration initialization for virtual nodes (#20324, @ilyam8)
- Updated eBPF library dependency to version 1.5.1 (#20316, @thiagoftsm)
- Fixed dynamic configuration issue that incorrectly assigned plugin configurations to virtual nodes instead of the localhost context (#20312, @ktsaou)
- Changed user transition log messages from debug to info level (#20308, @ilyam8)
- Fixed memory issue by preventing use-after-free when accessing parent information (#20305, @ktsaou)
- Fixed use-after-free memory issue in plugins.d inflight function handling (#20304, @ktsaou)
- Fixed metadata synchronization shutdown to proceed even when event loop command submission fails (#20303, @stelfrag)
- Fixed SNMP collector to properly format system information (#20293, #20301, @ilyam8)
- Fixed database maintenance scheduling to properly sequence journal indexing after file rotation operations (#20264, @stelfrag)
- Fixed various minor issues including improved shutdown logging, division by zero protection, and updated dimension status messages (#20263, @stelfrag)
- Improved MSSQL collector performance by moving database queries to a separate thread (#20230, @thiagoftsm)
v2.5.1
Netdata v2.5.1 is a patch release to address issues discovered since v2.5.0.
This patch release provides the following bug fixes and updates:
- Fixed obsolete chart cleanup to properly handle virtual nodes (#20254, @ilyam8)
- Fixed SNMP collector to use 32-bit counters for network interfaces when 64-bit counters aren't available (#20249, @ilyam8)
- Fixed SNMP collector to fall back to interface description (ifDescr) when interface name (ifName) is empty (#20248, @ilyam8)
- Fixed SNMP discovery by correcting SNMPv3 credential parameter names to match expected values (#20247, #20256, @ilyam8)
- Fixed compilation on older distributions by removing uv_sleep function call that isn't available in older libuv versions (#20243, @stelfrag)
- Fixed claiming in Docker by improving detection of localhost environments and providing correct claim command instructions (#20240, @stelfrag)
- Added user configuration option to override default thread stack size (#20236, @stelfrag)
- Fixed CouchDB collector to use correct units (bytes instead of KiB) for database size charts (#20235, @ilyam8)
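The two SNMP interface fixes above amount to simple fallbacks: prefer the 64-bit IF-MIB counter (ifHCInOctets) and the interface name (ifName), and fall back to the 32-bit counter (ifInOctets) and the description (ifDescr) when those are missing or empty. The sketch below uses a plain dictionary to stand in for one polled interface row. That layout is an assumption for illustration, not the collector's data model.

```python
def interface_label(row):
    """Prefer ifName; fall back to ifDescr when ifName is missing or empty."""
    return row.get("ifName") or row.get("ifDescr") or "unknown"


def inbound_octets(row):
    """Prefer the 64-bit counter; fall back to the 32-bit one if it is absent."""
    value = row.get("ifHCInOctets")
    if value is None:  # a counter value of 0 is valid, so test for None explicitly
        value = row.get("ifInOctets")
    return value
```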
v2.5.0
Release Summary
Netdata v2.5.0 continues our commitment to stability with significant improvements to system robustness. This release focuses on eliminating potential crashes, resolving memory issues, and enhancing thread management across the codebase. We've implemented comprehensive deadlock detection, improved resource cleanup procedures, and added protection against corrupted data files.
Acknowledgments
- @barracuda156 for fixing compilation on macOS versions earlier than 11.
- @luiizaferreirafonseca for fixing grammar in the main README file.
- @rhoriguchi for fixing filtering of systemd-nspawn container payload in cgroups monitoring.
Contributions
Collectors
Improvements
- Added default filtering for systemd-nspawn container payload in cgroups monitoring (#20155, #20168, @ilyam8, @rhoriguchi)
- Added per-database lock metrics to Windows MSSQL collector (#20141, @thiagoftsm)
Other
- Reorganized code in Windows plugin IIS module for better maintainability (#20182, @thiagoftsm)
- Added initial work-in-progress implementation of Netdata exporter for OpenTelemetry (#20171, #20199, @ilyam8)
- Removed legacy code that handled the WMI to Windows collector renaming in Go module configurations (#20166, @ilyam8)
- Cleaned up SNMP collector by removing unused code from vendored Datadog profile components (#20164, @ilyam8)
- Added detailed UPS response logging in debug mode for APC UPS collector (#20157, @ilyam8)
- Improved test coverage for OpenTelemetry journald exporter remote client functionality (#20143, @ilyam8)
- Added metric descriptions and proper unit definitions to SNMP collector profiles for improved chart rendering (#20100, #20163, @Ancairon)
Packaging/Installation
All changes
Documentation
All changes
- Added documentation for centralizing and managing namespaced logs (#20217, @ktsaou)
- Improved security and privacy design documentation (#20208, @kanelatechnical)
- Added comprehensive documentation for the Dynamic Configuration system, including component usage guidelines and developer information (#20187, #20232, @ktsaou)
- Improved systemd journal logs documentation (#20184, @kanelatechnical)
- Updated platform support documentation to reflect current compatibility with the latest FreeBSD and macOS versions (#20165, @ilyam8)
- Improved dashboard and charts documentation with better formatting, consistent language, and enhanced visual elements for easier navigation (#20162, @kanelatechnical)
- Fixed grammar and improved clarity in the main README file (#20144, @luiizaferreirafonseca)
- Changed installation documentation to use proper admonition syntax for informational blocks (#20136, @kanelatechnical)
- Changed deployment documentation title from singular to plural form (#20133, @kanelatechnical)
- Updated installation documentation with improved structure, user-friendly language, visual aids, and proper Docusaurus syntax (#20122, @kanelatechnical)
Other Notable Changes
Bug Fixes
- Fixed potential crashes by adding null pointer checks when accessing journal and data files (#20226, @stelfrag)
- Added local collection of analytics data to support API information requests, while still respecting telemetry preferences for external reporting (#20221, @stelfrag)
- Fixed exporting engine issues including crash on shutdown in static builds and timeout handling when waiting for threads to exit (#20212, @ktsaou)
- Fixed thread allocation to consider system memory constraints, preventing crashes during startup on systems with high CPU counts but limited RAM (#20192, @ktsaou)
- Fixed potential crash during thread termination in exporting engine (#20191, @ktsaou)
- Fixed signal handling to ignore maintenance signals during shutdown process to prevent conflicts (#20190, @ktsaou)
- Fixed potential crash when handling repeating alerts that were not properly queued (#20186, @stelfrag)
- Fixed race condition when logging pending messages by ensuring atomic operations (#20185, #20188, #20189, @ktsaou)
- Fixed health configuration schema parameter for database lookup `absolute` option to prevent UI validation failures (#20161, @ilyam8)
- Fixed multiple memory issues including optimized context queues, buffer overflow protection, thread synchronization for metadata transitions, improved dictionary cleanup, and proper ML model resource management (#20159, @ktsaou)
- Fixed label memory accounting to prevent negative values in memory tracking (#20158, @stelfrag)
- Fixed crash in Windows MSSQL collector during performance data processing (#20131, #20032, @thiagoftsm)
- Fixed database engine startup to safely handle corrupted journal files by skipping them during metrics registry population (#20128, @stelfrag)
- Fixed memory leak by properly freeing ACLK message payloads when MQTT connection is unavailable (#20125, @stelfrag)
- Fixed memory leaks and improved cleanup procedures across multiple modules, including plugins.d threads, diskspace plugin, and pattern arrays (#20120, @ktsaou)
v2.4.0
Release Summary
Netdata v2.4.0 is a stability-focused release that addresses many issues that were identified thanks to the new agent reporting system introduced in v2.3.0. This release significantly improves reliability by fixing multiple crash scenarios and memory leaks throughout the codebase.
Key Highlights
Category | Improvements |
---|---|
Memory Optimization | • Resolved significant memory leaks in container monitoring systems, particularly affecting Kubernetes deployments • Fixed memory leaks across database engine components, health alarm entries, and alert pattern matching • Improved SQLite memory management with maximum heap limits and dynamic memory release under system pressure |
Stability Improvements | • Fixed numerous crashes in the Windows performance counters handling and container monitoring systems • Improved error handling when dbengine files reside on disks with errors • Enhanced journal file handling with better error logging • Optimized shutdown sequences to prevent resource leaks and crashes • Fixed ACLK synchronization issues to properly handle dynamic host configuration changes |
New Features | • Windows Service Monitoring: Added capability to track running states (running, stopped, pending, paused) of Windows services through the windows.plugin/PerflibServices collector (disabled by default, requires manual activation) |
Acknowledgments
- @dave818 for fixing a cron job syntax error in the updater script by correcting the time format.
- @ycdtosa for adding missing --offline-install-source option documentation to kickstart script usage information, adding Synology-specific user and group creation commands to kickstart script for improved DSM compatibility, and updating Synology installation documentation to clearly differentiate steps required for older DSM versions.
Contributions
Collectors
Improvements
- Added Windows service monitoring to track running states including running, stopped, pending, and paused services (windows.plugin/PerflibServices) (#19990, @thiagoftsm)
Bug fixes
- Fixed Prometheus collector to use appropriate units instead of "ratio" for measurements (go.d/prometheus) (#20069, @ilyam8)
- Fixed crash in Windows Hyper-V collector caused by unpopulated shared buffer values (windows.plugin/PerflibHyperV) (#20060, @thiagoftsm)
- Fixed MegaCLI collector to properly handle adapter configurations with no connected drives (go.d/megacli) (#20046, @ilyam8)
Other
- Added socket and remote client capabilities to OpenTelemetry journald exporter (#20038, #20033, #20121, @ilyam8)
- Added hostname labels to virtual nodes in Go-based collectors (#20030, @ilyam8)
- Added preliminary support for custom YAML files in SNMP collector that will be used for single metrics in future releases (go.d/snmp) (#20020, @Ancairon)
Packaging/Installation
All changes
- Fixed cron job syntax error in updater script by correcting the time format (#20039, @dave818)
- Added missing --offline-install-source option documentation to kickstart script usage information (#20025, @ycdtosa)
- Added Synology-specific user and group creation commands to kickstart script for improved DSM compatibility (#20024, @ycdtosa)
- Added Docker tag rotation system to track the four most recent nightly builds with relative numeric identifiers (#19734, #20089, @Ferroin)
Documentation
All changes
- Improved clarity, structure, and examples throughout the Alerts & Notifications documentation (#20085, @kanelatechnical)
- Updated documentation to provide clearer guidance on transitioning to static builds for end-of-life platforms (#20075, #20110, @ralphm)
- Added documentation for the `remove-stale-node` command in the Nodes Ephemerality guide (#20057, @ralphm)
- Fixed code block formatting in Log2Journal documentation to comply with MDX 3 requirements (#20056, @Ancairon)
- Simplified OIDC configuration by removing parameters no longer needed after adding Discovery support (#20053, @juacker)
- Improved documentation for observability centralization, including streaming, replication, and node management, with clearer language and structure (#20052, #20073, @kanelatechnical)
- Removed on-premises documentation files relocated to a dedicated repository (#20023, @Ancairon)
- Improved Windows installer and Machine Learning documentation with simpler language and better organization (#20021, @kanelatechnical)
- Improved deployment guides with clearer explanations of standalone installations and centralization options (#20004, @kanelatechnical)
- Updated Synology installation documentation to clearly differentiate steps required for older DSM versions (#19989, #19993, #20010, @ycdtosa)
- Improved installation documentation with more concise instructions for macOS, offline installation, IPv4 configuration, native packages, and Docker deployment (#19987, @kanelatechnical)
- Improved installation documentation for Ansible, Azure, AWS, Kickstart script, and Kubernetes deployments with better organization and clarity (#19981, @kanelatechnical)
- Fixed documentation order to provide a more logical top-to-bottom reading flow in kickstart installation guide (#19975, @kanelatechnical)
- Updated SCIM documentation to include new Groups support functionality (#19969, @juacker)
Other Notable Changes
Bug Fixes
- Fi...
v2.3.2
Netdata v2.3.2 is a patch release to address issues discovered since v2.3.1.
This patch release provides the following bug fixes and updates:
- Fixed journal file creation reliability with improved error handling and simplified allocation process (#20018, @ktsaou)
- Fixed leakage of build environment identifiers by blacklisting GitHub runner machine IDs (#20016, @ktsaou)
- Fixed potential memory access violations by adding validation for journal file headers and page boundaries (#20013, @stelfrag)
- Fixed a rare crash condition by properly reinitializing data collection for obsolete or archived dimensions (#20007, @ktsaou)
- Fixed MegaCLI collector to properly handle missing battery backup units (#20008, @ilyam8)
- Changed UUID generation to use version 4 format for better uniqueness (#20002, @ktsaou)
- Added detection for additional CI environment variables to automatically disable telemetry (#19999, @ktsaou)
- Fixed Agent status system to handle null UUIDs and improved tracking of shutdown time, crash counts, and connection states (#19996, #20003, #20011, @ktsaou)
- Added detailed worker thread status information and enhanced crash diagnostics capabilities (#19992, @ktsaou)
- Fixed potential crash in Windows perflib collector when handling null pointers (#19985, @ktsaou)
- Fixed error reporting to preserve errno values during out-of-memory conditions (#19984, @ktsaou)
- Fixed potential crashes when handling empty data arrays (#19983, @ktsaou)
- Fixed Agent shutdown by properly joining ACLK and metadata threads before closing database connections (#19980, @stelfrag)
- Fixed random crashes during shutdown by avoiding precompiled database statements for host metadata (#19978, @stelfrag)
- Limited maximum database file size to 1GB to optimize memory usage during file operations (#19977, @stelfrag)
- Fixed crash in variable lookup function when processing search results with scores (#19972, @ktsaou)
- Fixed ACLK synchronization thread shutdown with better termination sequence and timeout handling for stuck operations (#19966, @stelfrag)
- Improved Parent node startup performance by preloading UUIDs into metrics registry for faster initialization (#19964, @ktsaou)
- Fixed Windows installer to properly manage configuration files and handle upgrades correctly (#19962, @thiagoftsm)
- Fixed potential crash in health alarm cleanup when unlinking alerts from charts (#19956, @stelfrag)
- Fixed buffer overflow when processing cloud rooms during Agent claiming on startup (#19954, @stelfrag)
- Updated Agent status reporting system with enhanced crash diagnostics, anonymized stack traces, and ACLK connection status tracking (#19953, #19957, #19959, @ktsaou)
- Fixed thread creation issues by adding retry logic when system resource limits are temporarily reached (#19951, @stelfrag)
- Added monitoring of IIS Application Pool metrics to Windows collector (#19950, @thiagoftsm)
- Improved metadata thread stability with better shutdown handling and enhanced event loop management (#19929, @stelfrag)
- Fixed potential deadlocks by processing alert configuration database operations asynchronously through the metadata thread (#19885, @stelfrag)
- Reworked shared memory management in eBPF plugin for more reliable interprocess communication (#19844, @thiagoftsm)
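For reference on the UUID change above, a version 4 UUID is built from random bits with the version field fixed to 4. Python's standard library is used here purely for illustration, since the agent itself is written in C:

```python
import uuid


def new_id():
    """Generate a random (version 4) UUID."""
    return uuid.uuid4()


if __name__ == "__main__":
    u = new_id()
    print(u, "version:", u.version)
```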
v2.3.1
Netdata v2.3.1 is a patch release to address issues discovered since v2.3.0.
This patch release provides the following bug fixes and updates:
- Fixed debug information handling by including it in default builds while disabling separate debuginfo packages for Debian-based distributions (#19946, #19948, @Ferroin)
- Fixed static build configuration to avoid unnecessary libunwind compilation (#19939, @Ferroin)
- Improved detection of low memory conditions with more aggressive monitoring (#19938, @ktsaou)
- Fixed installation path for updater script crontab configuration (#19935, @ralphm)
- Fixed validation of database page size limits for 32-bit compression format (#19932, @stelfrag)
- Fixed compilation issues when building without database engine support or with address sanitizer enabled (#19930, @stelfrag)
- Added more system resource metrics to the status file, including memory usage and enhanced out-of-memory protection information (#19928, #19937, @ktsaou)
- Fixed security issue by preventing exposure of absolute file paths in web server responses (#19925, @ktsaou)
- Fixed security vulnerability in daemon status file handling by using file descriptor-based permissions to prevent race conditions (#19924, @Ferroin)
- Removed insecure SVG generation endpoint to prevent potential code injection vulnerabilities (#19919, @ilyam8)
- Fixed unaligned memory access in socket message buffer by properly aligning memory structures (#19917, @vkalintiris)
- Fixed ACLK synchronization by ensuring thread initialization completes before proceeding with startup (#19916, @stelfrag)
- Fixed issue where commands could be queued before ACLK initialization was complete (#19914, @ktsaou)
- Fixed potential crash when database engine encounters null data files during range operations (#19913, @ktsaou)
- Fixed Agent status reporting to handle first-run scenarios when no previous status file exists (#19912, @ktsaou)
- Added initial implementation of libbacktrace for improved crash diagnostics (#19910, @ktsaou)
- Fixed reliability calculation to properly handle normal Agent exit cases (#19909, @ktsaou)
- Added enhanced shutdown diagnostics with timeouts and improved system information in crash reports including cloud provider details (#19903, @ktsaou)
Support options
As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter an issue with any of the changes made in this release or any feature in the Netdata Agent, feel free to contact us through one of the following channels:
- Netdata Learn: Find documentation, guides, and reference material for monitoring and troubleshooting your systems with Netdata.
- GitHub Issues: Make use of the Netdata repository to report bugs or open a new feature request.
- GitHub Discussions: Join the conversation around the Netdata development process and be a part of it.
- Community Forums: Visit the Community Forums and contribute to the collaborative knowledge base.
- Discord Server: Jump into the Netdata Discord and hang out with like-minded sysadmins, DevOps, SREs, and other troubleshooters. More than 2000 engineers are already using it!
v2.3.0
Netdata Growth
- 1.5 million downloads per day
- 73.7k GitHub stars!
- 656.1M Docker Hub pulls!
Netdata continues to experience phenomenal growth, with over 1.5 million downloads daily through Cloudflare and Docker Hub, fueling user observability worldwide.
Thanks to your unwavering support ❤️, Netdata is the leader in the observability category in the CNCF landscape, ahead of all other solutions, including Elasticsearch, Grafana, and Prometheus, in GitHub stars. This demonstrates the trust and admiration of our community.
This success drives rapid adoption among enterprises, reflecting the growing recognition of Netdata as the go-to observability solution for both cloud-native and on-premises environments. Our commitment remains steadfast: to deliver cutting-edge, AI-powered observability with unmatched performance and simplicity—all while being significantly more affordable.
We are also proud to see our users and customers run high-scale deployments, reliably ingesting millions of samples per second while streamlining their operations with Netdata.
As we evolve, our focus on empowering businesses with higher-fidelity AI insights ensures Netdata remains the easiest and fastest way to optimize infrastructure and applications at any scale. 🚀
Do you like Netdata? Give Netdata a ⭐ too, on GitHub!
Release Summary
Netdata 2.3 delivers significant enhancements to monitoring reliability and scalability:
- Crash Handling & Reporting: A zero-sampling system that captures and analyzes agent crashes with complete diagnostic information, significantly improving reliability across diverse environments.
- Extreme Cardinality Protection: Automatic safeguards that maintain performance in high-scale environments with millions of time series while intelligently managing metadata retention.
- Nodes Ephemerality & Streaming Alerts: A sophisticated approach to handling node connections in distributed environments, reducing alert noise by distinguishing between permanent and ephemeral nodes.
- SNMP Service Discovery: A new system that automatically finds and monitors SNMP-enabled devices on configured networks, eliminating manual per-device configuration.
Release Highlights
Nodes Ephemerality & Streaming Alerts
Netdata 2.3 implements a more sophisticated approach to handling node connections in distributed environments. We now define ephemeral nodes as "nodes that are expected to disconnect without raising alerts", enabling smarter monitoring of dynamic infrastructure.
Feature | Description |
---|---|
Smart Node Classification | Distinguish between permanent infrastructure (servers) and ephemeral resources (containers, auto-scaling instances) |
Targeted Alerting | Disconnection alerts trigger only for permanent nodes, reducing alert noise and focusing attention on genuine issues |
Dynamic Infrastructure Support | Configure auto-scaling cloud instances, containers, and test environments as ephemeral to prevent unnecessary alerts |
Simple Configuration | Mark nodes as ephemeral with a single setting in netdata.conf: is ephemeral node = yes |
Automated Cleanup | Configurable retention periods to automatically remove disconnected ephemeral nodes from dashboards |
Selective Cloud Notifications | Netdata Cloud now sends node-unreachable notifications exclusively for permanent nodes |
Node Management CLI | Use netdatacli mark-stale-nodes-ephemeral to clear alerts for permanently offline nodes |
Learn more about managing ephemeral nodes.
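The targeted alerting rule in the table above can be illustrated with a minimal Python sketch. This is not Netdata's implementation: the `Node` class and the `disconnection_alerts` function are hypothetical names used only to show the idea that a disconnected node raises an alert only when it is not marked ephemeral.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    ephemeral: bool  # hypothetical field mirroring "is ephemeral node = yes"
    connected: bool


def disconnection_alerts(nodes):
    """Return the names of nodes that should raise a disconnection alert.

    Only permanent (non-ephemeral) nodes that are currently disconnected qualify.
    """
    return [n.name for n in nodes if not n.connected and not n.ephemeral]


nodes = [
    Node("db-server", ephemeral=False, connected=False),    # permanent, offline
    Node("ci-runner-42", ephemeral=True, connected=False),  # ephemeral, offline
    Node("web-server", ephemeral=False, connected=True),    # permanent, online
]

print(disconnection_alerts(nodes))  # ['db-server']
```

In this sketch the offline CI runner produces no alert because it is expected to disconnect, while the offline database server does.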
Extreme Cardinality Protection
Netdata 2.3 introduces automatic protection against extreme cardinality issues when combining high-dimensional metrics with long retention periods. This system provides:
Feature | Description |
---|---|
Intelligent Detection | Automatically identifies contexts with excessive ephemeral metrics (≥1000 instances with >50% ephemerality) |
Balanced Protection | Preserves all actively collected metrics while selectively clearing retention for ephemeral ones |
Resource Optimization | Prevents memory bloat and performance degradation from abandoned time-series metadata |
Configurable Thresholds | Adjustable settings for instance count and ephemerality percentage to match your environment |
Transparent Operation | Detailed logging of all protection activities for easy monitoring and verification |
This protection maintains Netdata's performance even in high-scale environments with millions of time series, while still allowing unlimited cardinality for high-resolution data. Learn more about configuring this feature.
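As a rough illustration of the detection rule (≥1000 instances with >50% ephemerality), the check could be sketched as follows. The function name, parameter names, and the exact definition of an "ephemeral" instance (here: an instance that is no longer actively collected) are assumptions made for this example, not Netdata's actual code or configuration keys.

```python
def needs_cardinality_protection(
    total_instances,
    ephemeral_instances,
    min_instances=1000,         # default threshold quoted in the release notes
    min_ephemerality_pct=50.0,  # default threshold quoted in the release notes
):
    """Return True when a context crosses both thresholds.

    Hypothetical helper: a context qualifies only if it has at least
    `min_instances` instances AND strictly more than `min_ephemerality_pct`
    percent of them are ephemeral.
    """
    if total_instances < min_instances:
        return False
    ephemerality_pct = 100.0 * ephemeral_instances / total_instances
    return ephemerality_pct > min_ephemerality_pct


print(needs_cardinality_protection(5000, 4000))  # True  (80% ephemeral)
print(needs_cardinality_protection(5000, 1000))  # False (20% ephemeral)
print(needs_cardinality_protection(200, 199))    # False (too few instances)
```

Both thresholds are exposed as parameters to mirror the "Configurable Thresholds" row above.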
Crash Handling & Reporting
We've implemented a powerful, zero-sampling crash monitoring system that captures and analyzes agent restarts and crashes with complete diagnostic information. This solution leverages systemd's journal for flexible, scalable event tracking without additional licensing costs. With anonymous telemetry enabled, this system helps us identify critical issues across diverse environments, significantly improving Netdata's reliability for all users. Read more about our approach in this blog post.
Feature | Description |
---|---|
Zero-Sampling Collection | Captures every single crash event without sampling, providing complete visibility into system behavior |
Comprehensive Diagnostics | Records detailed stack traces, error messages, and system context for accurate root cause analysis |
Efficient Deduplication | Intelligent system that prevents redundant reporting (only one crash type per agent per day) |
Privacy-Focused | No IP addresses collected, only anonymous telemetry with user opt-out option |
Lightweight Implementation | Minimal performance impact; activates only when the Agent starts, stops, or crashes |
Cost-Effective Architecture | Leverages existing systemd journal infrastructure instead of expensive third-party solutions |
High Scalability | Processes up to 20,000 events per second per instance with horizontal scaling capability |
Flexible Analysis | Transforms complex JSON data into flattened journal entries for powerful filtering and correlation |
Proven Results | Already identified and resolved dozens of critical issues across diverse environments |
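Two ideas from the table above, flattening nested JSON into journal-style fields and deduplicating reports, can be sketched in Python. This is an illustrative sketch only: the field naming scheme, the sample event, and the `CrashDeduplicator` class are assumptions, and the deduplication key reflects one reading of "only one crash type per agent per day" (each crash type is reported at most once per agent per day).

```python
from datetime import date


def flatten(obj, prefix=""):
    """Flatten nested dicts and lists into one level of UPPER_SNAKE keys."""
    fields = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            fields.update(flatten(value, f"{prefix}{key}_".upper()))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            fields.update(flatten(value, f"{prefix}{index}_"))
    else:
        # Scalar leaf: drop the trailing underscore and store as text.
        fields[prefix[:-1]] = str(obj)
    return fields


class CrashDeduplicator:
    """Remember which (agent, crash type, day) combinations were reported."""

    def __init__(self):
        self._seen = set()

    def should_report(self, agent_id, crash_type, day):
        key = (agent_id, crash_type, day)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


# Hypothetical crash event; real status payloads will differ.
event = {
    "agent": {"id": "abc", "version": "2.3.0"},
    "fatal": {"signal": "SIGSEGV", "stack": ["f1", "f2"]},
}
print(flatten(event))

dedup = CrashDeduplicator()
today = date(2025, 3, 1)
print(dedup.should_report("abc", "SIGSEGV", today))  # True, first report
print(dedup.should_report("abc", "SIGSEGV", today))  # False, duplicate
```

Flat key/value pairs of this kind are what make per-field filtering and correlation practical in journal tooling.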
SNMP Discovery
Netdata 2.3 adds an SNMP service discovery system that automatically finds and monitors SNMP-enabled devices on your networks.
Feature | Description |
---|---|
Automated Device Detection | Scans configured networks to discover SNMP-enabled devices without manual configuration |
Flexible Network Configuration | Supports various IP range formats including single IPs, ranges, and CIDR notation (up to 512 IPs per subnet) |
Customizable Credentials | Configure multiple credential sets with support for SNMPv2c and SNMPv3 with various security levels |
Performance Optimization | Controls network impact through concurrent scan limits and configurable caching of discovery results |
Seamless Integration | Automatically... |
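To make the network configuration row more concrete, here is a small Python sketch that expands a single IP, a dash-separated range, or a CIDR block into individual addresses and enforces the 512-address limit. The exact syntax Netdata accepts, and whether oversized subnets are rejected or truncated, are not stated here, so both are assumptions; `expand_target` is a hypothetical helper built on Python's standard `ipaddress` module.

```python
import ipaddress

MAX_IPS_PER_SUBNET = 512  # limit quoted in the release notes


def expand_target(target):
    """Expand 'a.b.c.d', 'start-end', or CIDR notation into a list of addresses.

    Assumption: targets larger than the limit are rejected with ValueError.
    For simplicity, CIDR blocks include the network and broadcast addresses.
    """
    if "/" in target:
        network = ipaddress.ip_network(target, strict=False)
        if network.num_addresses > MAX_IPS_PER_SUBNET:
            raise ValueError(f"{target}: more than {MAX_IPS_PER_SUBNET} addresses")
        return list(network)
    if "-" in target:
        start_text, end_text = target.split("-", 1)
        start = ipaddress.ip_address(start_text.strip())
        end = ipaddress.ip_address(end_text.strip())
        if end < start:
            raise ValueError(f"{target}: range end precedes range start")
        count = int(end) - int(start) + 1
        if count > MAX_IPS_PER_SUBNET:
            raise ValueError(f"{target}: more than {MAX_IPS_PER_SUBNET} addresses")
        return [start + offset for offset in range(count)]
    return [ipaddress.ip_address(target)]


print([str(ip) for ip in expand_target("192.0.2.1-192.0.2.3")])
# ['192.0.2.1', '192.0.2.2', '192.0.2.3']
print(len(expand_target("10.0.0.0/23")))  # 512
```

Capping the expansion before scanning keeps a mistyped prefix (for example a /8) from generating millions of probes.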