After inspecting the IP/MAC addresses with network team, we found possible root cause of this amount of entries. All examples given below are for the 660 IPs per MAC entry.
Total entries per MAC - 660:
SELECT COUNT(*) FROM ipv4_mac WHERE mac_address="ffffffffffff" ORDER BY port_id;
Total entries per MAC with most popular IP address - 526:
SELECT COUNT(*) FROM ipv4_mac WHERE ipv4_address="10.10.10.10" ORDER BY port_id;
Total unique entries per MAC with most popular IP address - 64:
SELECT DISTINCT * FROM ipv4_mac WHERE ipv4_address="10.10.10.10" ORDER BY port_id;
So basically, in a lot of those entries there is duplicate data, since there is no uniqueness criteria in the table:
+--------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+-------+
| port_id | int(11) | NO | MUL | NULL | |
| device_id | int(11) | YES | | NULL | |
| mac_address | varchar(32) | NO | MUL | NULL | |
| ipv4_address | varchar(32) | NO | | NULL | |
| context_name | varchar(128) | NO | | NULL | |
+--------------+--------------+------+-----+---------+-------+
So we made a copy of PROD DB and removed the duplicates:
ALTER TABLE ipv4_mac ADD id INT PRIMARY KEY AUTO_INCREMENT;
DELETE entry1 FROM ipv4_mac entry1
INNER JOIN ipv4_mac entry2
WHERE entry1.id > entry2.id AND entry1.port_id=entry2.port_id AND entry1.device_id=entry2.device_id AND entry1.mac_address=entry2.mac_address AND entry1.ipv4_address=entry2.ipv4_address AND entry1.context_name=entry2.context_name;
Total entries per MAC - 96:
SELECT COUNT(*) FROM ipv4_mac WHERE mac_address="ffffffffffff" ORDER BY port_id;
Total entries per MAC with most popular IP address - 64:
SELECT COUNT(*) FROM ipv4_mac WHERE ipv4_address="10.10.10.10" ORDER BY port_id;
Total unique entries per MAC with most popular IP address - 64:
SELECT DISTINCT * FROM ipv4_mac WHERE ipv4_address="10.10.10.10" ORDER BY port_id;
And now top results for IPv4 per MAC overall look like this:
SELECT COUNT(mac_address) AS macs FROM ipv4_mac GROUP BY mac_address HAVING macs > 1 ORDER BY macs DESC LIMIT 10;
+------+
| macs |
+------+
| 547 |
| 401 |
| 230 |
| 180 |
| 177 |
| 171 |
| 148 |
| 117 |
| 115 |
| 114 |
+------+
Hm… I guess this does not solve all the problems. OK, here is how the network team explains this:
One of the specifics to mention is that one device can have over X VLANS and a separate unique /30 subnet in each that is used for VRRP signalling. That already makes at least 2 IP addresses per VLAN. If we add, like, 4 devices that have an IP in each VLAN - their count increases rapidly.
For testing purposes, here are some results for the FDB table SQL query, when duplicate entries are absent:
SELECT F.port_id AS port_id, F.device_id, ifInErrors, ifOutErrors, ifOperStatus, ifAdminStatus, ifAlias, ifDescr, mac_address, V.vlan_vlan AS vlan, hostname, hostname AS device, group_concat(M.ipv4_address SEPARATOR ', ') AS ipv4_address
FROM ports_fdb AS F
LEFT JOIN devices AS D USING(device_id)
LEFT JOIN ports AS P USING(port_id, device_id)
LEFT JOIN vlans AS V USING(vlan_id, device_id)
LEFT JOIN (SELECT port_id, COUNT(*) portCount FROM ports_fdb GROUP BY port_id) AS C ON C.port_id=F.port_id
LEFT JOIN ipv4_mac AS M USING(mac_address)
WHERE 1 GROUP BY device_id, port_id, mac_address, vlan, hostname, ifAlias, ifAdminStatus, ifDescr, ifOperStatus, ifInErrors, ifOutErrors ORDER BY C.portCount ASC LIMIT 0,50;
Before - ERROR 3 (HY000): Error writing file ‘/tmp/MYf3v9dc’ (Errcode: 28 - No space left on device) (10GB all used up).
After - ERROR 3 (HY000): Error writing file ‘/tmp/MYe0NDnX’ (Errcode: 28 - No space left on device) (10GB all used up).
SELECT F.port_id AS port_id, F.device_id, ifInErrors, ifOutErrors, ifOperStatus, ifAdminStatus, ifAlias, ifDescr, mac_address, V.vlan_vlan AS vlan, hostname, hostname AS device, group_concat(distinct M.ipv4_address SEPARATOR ', ') AS ipv4_address
FROM ports_fdb AS F
LEFT JOIN devices AS D USING(device_id)
LEFT JOIN ports AS P USING(port_id, device_id)
LEFT JOIN vlans AS V USING(vlan_id, device_id)
LEFT JOIN (SELECT port_id, COUNT(*) portCount FROM ports_fdb GROUP BY port_id) AS C ON C.port_id=F.port_id
LEFT JOIN ipv4_mac AS M USING(mac_address)
WHERE 1 GROUP BY device_id, port_id, mac_address, vlan, hostname, ifAlias, ifAdminStatus, ifDescr, ifOperStatus, ifInErrors, ifOutErrors ORDER BY C.portCount ASC LIMIT 0,50;
Before - ERROR 3 (HY000): Error writing file ‘/tmp/MYUDeZ2a’ (Errcode: 28 - No space left on device) (10GB all used up).
After - ERROR 3 (HY000): Error writing file ‘/tmp/MYfJYXB9’ (Errcode: 28 - No space left on device) (10GB all used up).
SELECT F.port_id AS port_id, F.device_id, ifInErrors, ifOutErrors, ifOperStatus, ifAdminStatus, ifAlias, ifDescr, mac_address, V.vlan_vlan AS vlan, hostname, hostname AS device, group_concat(M.ipv4_address SEPARATOR ', ') AS ipv4_address
FROM ports_fdb AS F
LEFT JOIN devices AS D USING(device_id)
LEFT JOIN ports AS P USING(port_id, device_id)
LEFT JOIN vlans AS V USING(vlan_id, device_id)
LEFT JOIN (SELECT port_id, COUNT(*) portCount FROM ports_fdb GROUP BY port_id) AS C ON C.port_id=F.port_id
LEFT JOIN ipv4_mac AS M USING(mac_address)
WHERE 1
GROUP BY device_id, port_id, mac_address, vlan ORDER BY C.portCount ASC LIMIT 0,50;
Before - 50 rows in set, 3286 warnings (24.28 sec)
After - 50 rows in set, 3247 warnings (22.05 sec)
SELECT F.port_id AS port_id, F.device_id, ifInErrors, ifOutErrors, ifOperStatus, ifAdminStatus, ifAlias, ifDescr, mac_address, V.vlan_vlan AS vlan, hostname, hostname AS device, group_concat(distinct M.ipv4_address SEPARATOR ', ') AS ipv4_address
FROM ports_fdb AS F
LEFT JOIN devices AS D USING(device_id)
LEFT JOIN ports AS P USING(port_id, device_id)
LEFT JOIN vlans AS V USING(vlan_id, device_id)
LEFT JOIN (SELECT port_id, COUNT(*) portCount FROM ports_fdb GROUP BY port_id) AS C ON C.port_id=F.port_id
LEFT JOIN ipv4_mac AS M USING(mac_address)
WHERE 1
GROUP BY device_id, port_id, mac_address, vlan ORDER BY C.portCount ASC LIMIT 0,50;
Before - 50 rows in set (24.43 sec)
After - 50 rows in set (22.70 sec)