Possible new SNMP interface

murrant · 6 December 2016 02:30

I’ve been wanting to rewrite our SNMP functions as a class so they can be mocked easily in tests. I ruled out simply moving the existing code into a class; starting fresh lets us avoid legacy code and keep things clean.

Here are my initial thoughts on an interface:

https://gist.github.com/murrant/d3aad3a0bd2709fc143423942b9dc5e9

SNMP.php

namespace LibreNMS;

class SNMP
{
    /**
     * @param array $device
     * @param string|array $oids single or array of oids to get
     * @param string $mib Additional mibs to search, optionally you can specify full oid names
     * @param string $mib_dir Additional mib directory, should be rarely needed, see definitions to add per os mib dirs
     * @return array array or object of results, format to be determined

(The gist preview is truncated here; see the link above for the full file.)

This will have built-in caching, so we will never call snmpget/snmpwalk twice for the same data, and the caller doesn’t need to know whether a result came from the cache.
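The caching idea can be sketched roughly as below. This is a minimal illustration and not the code from the gist: the class name CachingFetcher, the callable backend, the backendCalls counter, and the cache-key format are all assumptions made for the example.

```php
<?php
// Hypothetical sketch: memoize SNMP fetches so the same data is never requested twice.
class CachingFetcher
{
    private $backend;          // callable that performs the real fetch
    private $cache = array();  // results keyed by hostname + oid
    public $backendCalls = 0;  // counts real fetches, for demonstration only

    public function __construct($backend)
    {
        $this->backend = $backend;
    }

    public function get(array $device, $oid)
    {
        $key = $device['hostname'] . '|' . $oid;
        if (!array_key_exists($key, $this->cache)) {
            // First request for this device/oid pair: ask the backend and remember the answer.
            $this->backendCalls++;
            $this->cache[$key] = call_user_func($this->backend, $device, $oid);
        }
        return $this->cache[$key];
    }
}

// Stand-in backend; a real one would run snmpget/snmpwalk.
$fetcher = new CachingFetcher(function ($device, $oid) {
    return "value of $oid on {$device['hostname']}";
});

$device = array('hostname' => 'router1');
$first = $fetcher->get($device, 'sysName.0');
$second = $fetcher->get($device, 'sysName.0'); // served from the cache
```

The caller uses get() the same way in both cases, which is the point: whether the value was cached is invisible to it.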

I think we can build a nicely parsed result for most use cases, but we leave the getRaw() and walkRaw() functions public for any oddball use cases.

I am not sure whether returning an array or an object would be better. I am leaning towards an object, as it makes it more obvious what to expect, rather than having to print_r() the array to understand what you get. I’ll add clear documentation either way so the result is predictable.

I’d especially like to hear if there are thoughts on improving what I have so far or any issues that can be foreseen.

murrant · 6 December 2016 16:50

I believe I will be using an object to supply the data to callers.

This will allow the consumer to access the data as they desire without having to resort to SNMP::getRaw().

I will make a collection class LibreNMS\SNMP\DataSet

DataSet will have functions such as getByID(), so if you walk multiple related oids, you can group them by ID.
Another example is getByKey(), which will get all data that was walked under a certain key.

Individual pieces of data will be encapsulated in LibreNMS\SNMP\OID objects, which will have functions such as getKey(), getID(), and getData().
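To make the grouping idea concrete, here is a rough sketch. Only the class and method names come from the plan above; the constructor arguments, the method bodies, and the assumption that "key" means the object name and "ID" means the table index are guesses for illustration, and the namespace is left out to keep the example short.

```php
<?php
// Hypothetical sketch of the DataSet/OID idea; method bodies are guesses.
class OID
{
    private $key;  // object name, e.g. "ifDescr"
    private $id;   // index within the table, e.g. "1"
    private $data; // value returned by the device

    public function __construct($key, $id, $data)
    {
        $this->key = $key;
        $this->id = $id;
        $this->data = $data;
    }

    public function getKey() { return $this->key; }
    public function getID() { return $this->id; }
    public function getData() { return $this->data; }
}

class DataSet
{
    private $items;

    public function __construct(array $items)
    {
        $this->items = $items;
    }

    // All values that share an index, keyed by object name.
    public function getByID($id)
    {
        $result = array();
        foreach ($this->items as $oid) {
            if ($oid->getID() === $id) {
                $result[$oid->getKey()] = $oid->getData();
            }
        }
        return $result;
    }

    // All values walked under one object name, keyed by index.
    public function getByKey($key)
    {
        $result = array();
        foreach ($this->items as $oid) {
            if ($oid->getKey() === $key) {
                $result[$oid->getID()] = $oid->getData();
            }
        }
        return $result;
    }
}

$set = new DataSet(array(
    new OID('ifDescr', '1', 'eth0'),
    new OID('ifDescr', '2', 'eth1'),
    new OID('ifAlias', '1', 'uplink'),
));

$port1 = $set->getByID('1');         // ifDescr and ifAlias for index 1
$descrs = $set->getByKey('ifDescr'); // every ifDescr, keyed by index
```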

Gorian · 6 December 2016 20:05

I definitely would go with an object over an array, much more flexible and useful in the long run.

murrant · 6 December 2016 20:15

I was looking at using collections from Laravel as I have been reading this:

But, that requires PHP 5.6

adaniels21487 · 6 December 2016 23:00

Hi @murrant,
I’d like to see a status flag that is checked before every query and set on a timeout.

The thought being that if SNMP on a device is unavailable, we should only wait for the connection to time out once, and then not try it again for that polling cycle.
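One way this could look, as a sketch only: the names GuardedFetcher and TimeoutException, the callable backend, and the choice to return null for skipped queries are all assumptions for the example, not part of any existing code.

```php
<?php
// Hypothetical sketch: remember that a device timed out and skip further queries.
class TimeoutException extends Exception
{
}

class GuardedFetcher
{
    private $backend;               // callable that performs the real fetch
    private $unreachable = array(); // hostnames that timed out this polling cycle

    public function __construct($backend)
    {
        $this->backend = $backend;
    }

    public function get(array $device, $oid)
    {
        $host = $device['hostname'];
        if (isset($this->unreachable[$host])) {
            return null; // already known to be down; do not wait for another timeout
        }
        try {
            return call_user_func($this->backend, $device, $oid);
        } catch (TimeoutException $e) {
            $this->unreachable[$host] = true; // set the flag on the first timeout
            return null;
        }
    }
}

// Stand-in backend that always times out, counting how often it is called.
$attempts = 0;
$fetcher = new GuardedFetcher(function ($device, $oid) use (&$attempts) {
    $attempts++;
    throw new TimeoutException('no response from ' . $device['hostname']);
});

$device = array('hostname' => 'switch1');
$a = $fetcher->get($device, 'sysName.0');  // times out, sets the flag
$b = $fetcher->get($device, 'sysDescr.0'); // skipped without touching the backend
```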

Thanks,
Aaron

murrant · 7 December 2016 00:59

Good idea, that should be no problem.

laf · 7 December 2016 01:14

Seems like a good path forward - especially the caching side.

I do have a question on that, not knowing enough about PHP and its memory allocation of variables: is it a bad thing if we cache the SNMP data for 500 interfaces for the whole of the polling run?

murrant · 7 December 2016 01:17

We can check to see how much memory it is. I would think it would be in the single digit megabytes. If using the Python dispatcher, shouldn’t the poller be run for only one device anyway?
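A quick way to get a rough number is to build a synthetic data set and compare memory_get_usage() before and after. The field names and values below are made up for the test, and real poller data will differ in size, so treat the result only as an order-of-magnitude check.

```php
<?php
// Rough measurement with synthetic data; field names and sizes are made up.
$before = memory_get_usage();

$cache = array();
for ($ifIndex = 1; $ifIndex <= 500; $ifIndex++) {
    $cache[$ifIndex] = array(
        'ifDescr' => 'GigabitEthernet0/' . $ifIndex,
        'ifAlias' => 'customer port ' . $ifIndex,
        'ifHCInOctets' => (string) ($ifIndex * 1000000),
        'ifHCOutOctets' => (string) ($ifIndex * 2000000),
    );
}

// Difference in allocated bytes attributable to the synthetic cache.
$bytes = memory_get_usage() - $before;
printf("500 interfaces, 4 values each: about %.2f MB\n", $bytes / 1048576);
```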

Also, this is just the frontend. I plan to accommodate pluggable backends.

murrant · 7 December 2016 23:09

Building initial prototype here:

https://github.com/murrant/librenms/tree/snmp-object

A lot of refinement is needed still. Right now it has basic functionality. Next I’m going to work on tests. Then refining and documentation.

No caching or any optimizations yet.

laf · 8 December 2016 16:00

Yes, poller-wrapper will only fire one poller.php per device, but you may have 16 (or fewer, or more) running at once, so if poller.php ends up at 2MB then that’s 32MB just for polling. Does that matter? Not everyone throws a lot of resources at things.

Like I said, it’s just a question.

murrant · 23 December 2016 14:37

I’m not sure how I should handle errors such as unreachable, or oid not present, or when the connection gets interrupted mid fetch. Should I add an item to the collection that gets returned, set a property on the collection, throw an exception? I think for the oid not present, setting an error on the oid item (in the collection) seems reasonable. But I’m not sure what to do with the more generic errors.

laf · 23 December 2016 18:14

Do exceptions stop the execution? If so that doesn’t seem the right way. Returning an error though seems sensible for oid not present as we want to see in debug that snmp didn’t return data we expect.

If a connection gets interrupted mid-fetch, I’d say we should discard that data. I can’t see how having partial data would be any good.
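The two policies discussed here (an error recorded on the individual item for a missing oid, and everything discarded for an interrupted fetch) could be sketched like this. The function name buildResult and the array layout are hypothetical, chosen only to illustrate the behaviour.

```php
<?php
// Hypothetical sketch of the two error policies discussed above.
function buildResult(array $rows, $interrupted)
{
    if ($interrupted) {
        // Partial data is discarded: report a generic error and no items.
        return array('error' => 'fetch interrupted', 'items' => array());
    }

    $items = array();
    foreach ($rows as $name => $value) {
        $items[$name] = array(
            'value' => $value,
            // A missing oid keeps its slot, with an error that debug output can show.
            'error' => $value === null ? 'oid not present' : null,
        );
    }
    return array('error' => null, 'items' => $items);
}

// null stands in for "the device did not return this oid".
$ok = buildResult(array('sysName.0' => 'router1', 'sysContact.0' => null), false);
$cut = buildResult(array('sysName.0' => 'router1'), true);
```

Neither path throws, so a failed fetch does not stop the rest of the poller run.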

murrant · 23 December 2016 18:17

Thanks, those are the same conclusions I was coming to.

murrant · 25 January 2017 21:51

Also, I ported laravel/collections to PHP 5.3

You can find it here: https://github.com/murrant/collect

murrant · 3 February 2017 17:58

A little progress report.

Code is working for NetSNMP, and a bit of POC for PHP-SNMP.
Some testing is written, but needs to be more complete.

The calls all return Collections, which allows for some really nice data manipulation.
Specifically, a DataSet object containing OIDData object(s) (which are both subclasses of Collection) is returned.
Any frequent data manipulation patterns will be added to DataSet and maybe OIDData if appropriate.

OIDData contains many properties with the results of the snmp call.

Example, for core snmp polling this was our previous code:

librenms/librenms/blob/master/includes/polling/core.inc.php

<?php
/*
 * LibreNMS Network Management and Monitoring System
 * Copyright (C) 2006-2011, Observium Developers - http://www.observium.org
 *
 * This program is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * See COPYING for more details.
 */

(new \LibreNMS\Modules\Core())->poll($os);

First we get several variables in a single snmp get. We don’t have to worry if any or all of these failed.
We incorporate a workaround for a device bug by trimming backslashes (\) from both ends of every data value.

$oids = array('sysUpTime.0', 'sysLocation.0', 'sysContact.0', 'sysName.0', 'sysDescr.0', 'sysObjectID.0');
$sys_data = SNMP::get($device, $oids, 'SNMPv2-MIB')->map(function($item) {
    // Remove leading & trailing backslashes added by VyOS/Vyatta/EdgeOS
    $item['value'] = trim($item['value'], '\\');
    return $item;
});

We populate $poll_device for backward compatibility. (->all() makes it a plain array)

$poll_device = $sys_data->pluck('value', 'name')->all();

Now comes the nifty part. We fetch additional uptime data and add in sysUpTime, which we already have.

$uptime_oids = array('SNMP-FRAMEWORK-MIB::snmpEngineTime.0', 'HOST-RESOURCES-MIB::hrSystemUptime.0');
$uptime_data = SNMP::get($device, $uptime_oids)
    ->push($sys_data->where('name', 'sysUpTime')->first());

If the device is Windows, don’t use hrSystemUptime:

if ($device['os'] == 'windows') {
    $uptime_data = $uptime_data->reject(function ($item) {
        return $item['name'] == 'hrSystemUptime';
    });
}

There is another rejection for devices with bad snmpEngineTime, which is essentially the same.

Finally, we get the highest uptime that hasn’t been removed. The seconds field is a convenience field that normalizes all time and other integer data that returns seconds.

$uptime = $uptime_data->max('seconds');
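For anyone who wants to try the selection logic without the SNMP class or the Collection library, here is a plain-array sketch of the same steps. The sample values are made up, and it assumes each item carries the name and seconds fields described above.

```php
<?php
// Plain-array version of the uptime selection, with made-up sample values.
$uptime_data = array(
    array('name' => 'snmpEngineTime', 'seconds' => 86400),
    array('name' => 'hrSystemUptime', 'seconds' => 172800),
    array('name' => 'sysUpTime', 'seconds' => 90000),
);
$device = array('os' => 'windows');

if ($device['os'] == 'windows') {
    // Same effect as ->reject(): drop hrSystemUptime for Windows devices.
    $uptime_data = array_filter($uptime_data, function ($item) {
        return $item['name'] != 'hrSystemUptime';
    });
}

// Same effect as ->max('seconds'): the highest remaining value.
$seconds = array();
foreach ($uptime_data as $item) {
    $seconds[] = $item['seconds'];
}
$uptime = max($seconds);
```

With this sample data the largest value (hrSystemUptime) is filtered out first, so the result comes from sysUpTime.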

Not too shabby.

Additional Collections reading:

murrant · 14 February 2017 18:50

I moved work here, to simplify rebasing without losing all the history just yet.

https://github.com/murrant/librenms/tree/snmp-rewrite

I think it’s getting closer, but has a lot of rough edges. I need to re-organize and review the code and write more unit tests.

laf · 14 February 2017 23:28

So some thoughts after playing with the code:

Allow the cache time to be configurable by the user, including disabling it completely.

poller and disco with either -v or -d should disable caching.

We need to look at a way to display the debug data back in a nicer format, potentially. Maybe we need to look at -d and -dd, with the former just having nicer output for users and -dd being more for dev work (with more verbose info).

Great work

murrant · 15 February 2017 06:46

Thanks for the comments. I finished implementing the enable/disable of the cache; it was there already, but not complete. Use $config['cache']['enable'] = false; or Config::set('cache.enable', false);
Cache time is at cache.time (it is in seconds).
Also, cache.dir controls the storage location in the case where file caching is used.

I’m not so sure about disabling the cache when debugging. How do you debug an issue that involves cached data? I’ve implemented it for now to test out.

laf · 15 February 2017 23:28

Then we need a flag to show it’s cached and an option to not cache the data. I wouldn’t want to sit there hitting poller for 5 minutes seeing the same data whilst debugging / developing.

murrant · 16 February 2017 03:17

The debug output is different for cached data. Instead of showing the snmp command, we print “Cached” and the cache key.
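As a sketch of how that could behave together with a cache time in seconds: the function name debugLine, the cache entry layout, the key format, and the sample command are all assumptions for illustration, and the real output format may differ.

```php
<?php
// Hypothetical sketch: time-limited cache whose debug line differs for cached data.
function debugLine(array $cache, $key, $command, $now, $ttl)
{
    // An entry counts as fresh while it is younger than the cache time.
    $fresh = isset($cache[$key]) && ($now - $cache[$key]['stored']) < $ttl;

    // Cached data shows the key; otherwise show the command that would run.
    return $fresh ? 'Cached ' . $key : $command;
}

$key = 'router1|sysName.0';
$command = 'snmpget -v2c -c public router1 sysName.0'; // sample command only
$cache = array($key => array('stored' => 1000, 'value' => 'router1'));
$ttl = 300; // seconds, in the spirit of cache.time

$hit = debugLine($cache, $key, $command, 1100, $ttl);   // 100s old: still cached
$stale = debugLine($cache, $key, $command, 1400, $ttl); // 400s old: expired
```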