Rrdtool getting NaN only on recently added devices?


#1

Hello All,

Facing a weird issue here and hoping to get help. My rrdtool is only storing NaN for any devices I have added in the last few months; devices added before that (almost 80) are still working fine to this day. I have looked at permissions and config and I cannot explain how this is happening. When I go into the RRD directory, pick a device I recently added which I know is pushing 5+ gb of traffic, and run rrdtool dump to get an XML, I see this on all the graphs…

NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
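To put a number on how much of a file is unknown, the dump can be parsed instead of eyeballed. This is only a minimal sketch: `count_values` and `SAMPLE_DUMP` are made-up names, and the sample is a trimmed stand-in for real `rrdtool dump` output, which wraps each stored value in a `<v>` element inside a `<row>`.

```python
import math
import xml.etree.ElementTree as ET

# Trimmed, hypothetical stand-in for the XML that `rrdtool dump` writes.
SAMPLE_DUMP = """<rrd>
  <rra>
    <cf>AVERAGE</cf>
    <database>
      <row><v>NaN</v><v>NaN</v></row>
      <row><v>1.5000000000e+03</v><v>NaN</v></row>
    </database>
  </rra>
</rrd>"""


def count_values(dump_xml):
    """Return (known, unknown) counts over every <v> cell in the dump."""
    root = ET.fromstring(dump_xml)
    known = unknown = 0
    for cell in root.iter("v"):
        value = float(cell.text.strip())  # float("NaN") parses to nan
        if math.isnan(value):
            unknown += 1
        else:
            known += 1
    return known, unknown


print(count_values(SAMPLE_DUMP))
```

A file where the known count is zero across every archive has never accepted a usable sample, which is a different situation from a file that has gaps.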

[[email protected] librenms]# su librenms
bash-4.2$ ./validate.php

Component | Version
LibreNMS | 1.38
DB Schema | 247
PHP | 7.0.27
MySQL | 5.5.56-MariaDB
RRDTool | 1.6.0
SNMP | NET-SNMP 5.7.2

====================================

[OK] Composer Version: 1.6.3
[OK] Dependencies up-to-date.
[OK] Database connection successful
[OK] Database schema correct
bash-4.2$


#2

When I look at the debug output for one of the devices that is not graphing, it looks like it is properly getting the data. I will post a small snippet here without any identifying info; if someone needs more info I can share it privately.

https://pastebin.com/2drfB3zF


#3

If I recall correctly, I had this behavior when the volume holding the RRDs was full.
That might be the issue with your install as well.


#4

Good thought. It is currently not full, but it does remind me to add some space to the / partition…

[[email protected] librenms]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/cl-root 39G 32G 6.7G 83% /
devtmpfs 4.8G 0 4.8G 0% /dev
tmpfs 4.8G 0 4.8G 0% /dev/shm
tmpfs 4.8G 8.7M 4.8G 1% /run
tmpfs 4.8G 0 4.8G 0% /sys/fs/cgroup
/dev/sda1 1014M 308M 707M 31% /boot
/dev/mapper/cl-home 19G 195M 19G 2% /home
tmpfs 983M 0 983M 0% /run/user/1000
tmpfs 983M 0 983M 0% /run/user/995
[[email protected] librenms]#
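`df -h` only reports block usage; a filesystem can also be "full" because it has run out of inodes. A rough sketch of checking both from Python follows. The function name `usage_percent` is hypothetical, and `os.statvfs` is only available on Unix-like systems.

```python
import os


def usage_percent(path="/"):
    """Return (block_pct_used, inode_pct_used) for the filesystem at path."""
    st = os.statvfs(path)

    # Same idea as df's Use%: used / (used + available to unprivileged users).
    blocks_used = st.f_blocks - st.f_bfree
    blocks_total = blocks_used + st.f_bavail
    block_pct = 100.0 * blocks_used / blocks_total if blocks_total else 0.0

    # Some filesystems report zero inodes; treat that as "not applicable".
    inodes_used = st.f_files - st.f_ffree
    inode_pct = 100.0 * inodes_used / st.f_files if st.f_files else 0.0

    return block_pct, inode_pct


block_pct, inode_pct = usage_percent("/")
print(f"blocks: {block_pct:.0f}% used, inodes: {inode_pct:.0f}% used")
```

Pointing `path` at the directory that holds the rrd files checks the volume that actually matters here.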


#5

Run

rrdtool update REDACTED/port-id3210.rrd N:184739983693173:1927335122941610:0:0:1077804656665:1675757568658:U:U:0:0:U:20:11630753:1434295:1733328 --daemon unix:/run/rrdcached.sock

manually and see what the output is (change REDACTED to the hostname). It must be run in the rrd dir, afaik.
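For anyone reading the update string above: it is a colon-separated list with the timestamp first (`N` means "now"), followed by one value per data source, where `U` marks a value as unknown. A small sketch that splits it apart (`parse_update` is a made-up helper name, and it assumes integer counters as in this sample):

```python
def parse_update(arg):
    """Split an rrdtool update argument into (timestamp, values).

    Unknown readings ("U") are returned as None.
    """
    timestamp, *fields = arg.split(":")
    values = [None if field == "U" else int(field) for field in fields]
    return timestamp, values


# The update string from the command above.
UPDATE = (
    "N:184739983693173:1927335122941610:0:0:1077804656665:1675757568658:"
    "U:U:0:0:U:20:11630753:1434295:1733328"
)

ts, values = parse_update(UPDATE)
print(ts, len(values), values.count(None))
```

The number of values should match the number of data sources that `rrdtool info` lists for the file. Here only three of the readings are `U`, so the poller is handing rrdtool mostly real numbers.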


#6

bash-4.2$ rrdtool update REDACTED/port-id3210.rrd N:184739983693173:1927335122941610:0:0:1077804656665:1675757568658:U:U:0:0:U:20:11630753:1434295:1733328 --daemon unix:/run/rrdcached.sock
bash-4.2$
Graphs are still blank.

Some additional troubleshooting info:
validate.php: https://pastebin.com/VgqzPnEc
I tried rolling back to rrdtool 1.4.8, no change, went back to 1.6.0

Looking at the device poller graphs I see this: https://cdn.discordapp.com/attachments/327529583601647617/431209558246162443/unknown.png

The one on the left is working perfectly; the one on the right is not.

Permissions on the .11.14 rrd files: https://pastebin.com/YqqPCiGR and on the 10.85 that's working: https://pastebin.com/m61gzkHi

11.14 show rrd command in the web interface https://pastebin.com/tYjYrBJB

10.85 show rrd command in the web interface https://pastebin.com/4JTYf65u

This is the full debug of poller.php for 11.14 showing data being placed in the files: https://pastebin.com/JHRSK3Eb

Tried with and without rrdcached, no change.

I'm really at a loss here.


#7

Someone else has reported similar but I’ve not seen any reason why this would be happening.

Genuinely, I’m at a loss as to why you’re experiencing this issue.


#8

I too am at a loss. I was really hoping to avoid a fresh install because of the time needed to get my 90+ devices loaded back into a clean install, but it sounds like that's where I am. Is there an easy way to export/import device IP and community info? Otherwise I will just plan on spending a day building a new image.


#9

I’m not really sure a re-install would be of any use, but if you do one, you may as well just dump the db and re-import it, then copy the rrd files and config.php. None of that should affect the rrd files being updated.


#10

Welp, on a FRESH clean install devices graph perfectly. Now I'm going to work on reimporting the DB and rrd files and see what happens…


#11

Well, on a fresh clean install everything is graphing perfect.


#12

I have very much the same problem and it is blocking deployment.

In my instance it seemed closely associated with the installation of the check_mk agent. That may be coincidence.

This problem seems to appear in some form, in other systems (apart from librenms) that run rrdtool.

This is running in Ubuntu 18.04 - the flavour may be important - on a RHEV vm. I suggest that we

I have checked with tcpdump to see data flowing into librenms, and I have checked by running the scripts manually on localhost. validate.php is uninformative. I have tried wiping the cache so it could be rebuilt, and the data came back (per the rrdtool dump) as NaN.

I definitely need some help with this.

Reinstalling from scratch when (not if) it breaks again isn’t the best option. One thing that occurs to me is that the database is involved, in part, in doing this work. I may remove some (or all) devices and see if it can be resync’d.


#13

So far I have had no luck, but I have noticed that I cannot add a device using check_mk without snmp.

In other words, if I add a device with the intention of getting data solely from the unix_agent, it does not seem possible to enable the agent for the device. This is probably a separate question, but it came up here because I started out from the point of view that the system was somehow getting confused by having both running.

Not working yet. It is applications that are broken for this installation, the system health obtained through snmp has worked… so far.


#14

Definitely, if I enable the agent but disable snmp I get nothing. This could be a red herring, but that connection is unusual.

NaN comes from data returned as a string when numbers are expected, or divide-by-zero… and it is instructive that a “fresh install” didn’t have the problem.
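For context, rrdtool itself shows NaN in a dump wherever a value was stored as unknown, and an update can mark a value unknown explicitly with `U`. If non-numeric agent output were the cause, a collector could guard against it along these lines. This is purely an illustrative sketch, not how LibreNMS does it; `to_update_token` is a hypothetical name.

```python
import math


def to_update_token(raw):
    """Map one raw reading to an rrdtool update token; 'U' means unknown."""
    try:
        value = float(raw.strip())
    except (ValueError, AttributeError):
        # Not a number at all (text, empty string, None).
        return "U"
    if math.isnan(value) or math.isinf(value):
        # Parses as a float but is not a usable measurement.
        return "U"
    return raw.strip()


readings = ["42", " 3.5 ", "n/a", "", "NaN"]
print(":".join(to_update_token(r) for r in readings))
```

Running the same check over the raw output of the agent script would show quickly whether any field is coming back as text rather than a number.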

Clearing individual (broken) servers from the system did not fix this. Going with an snmpd-only configuration now on a “new” server. snmpd did “discover” the mysql in place there. I have to wait some time to see if it gets data for it. :frowning_face:

Nope

Have a notion that there is another way to get it, if there are strings being parsed and the data being returned isn’t in a compatible character set. The librenms mariadb is utf8mb4 - not sure what happens in the data for rrdtool - but this is stubborn as.

Trying a hexdump on the rrd entries leads me to believe this isn’t an easy problem. I cannot (yet) see a problem with the file that returns the NaN entries on dump. The NaN appears in that rrd dump however, on an unrelated server that has had a separate installation of rrdtool.

Fundamentally no version of this gets mysql information back into the system.

Scraped out the server and reinstalled. System cannot see mysql data, but detects that the app is installed, on both the check_mk and snmpd installations. Both scripts run without trouble locally and return valid data. The System data is returned OK, app data seems to be toast.

Can I install something else that has some apps on it? We’ll try the postgres scripted installation… and we see that it and its associated ntp client are both reporting. So this problem actually appears to be quite specific: it is mysql, and since I added the postgres server after the two mysql boxes, we know it is limited to that display.