Multiple NICs on the same Subnet – Avoiding ARP Flux

This is an interesting testing dilemma I haven’t had to deal with in a long time. The story starts out like so:

I have a machine with 2 or more NICs. I have a target system that will be used to do network testing. All NICs on my Server Under Test (SUT) need to be on the same subnet to talk to my target server.

One way of accomplishing this could be to have each NIC on the SUT on its own subnet, like so:

eth0: 10.0.0.100/24
eth1: 10.0.1.100/24

and have the inbound NIC on my target server support two addresses like this:

eth0:0: 10.0.0.1/24
eth0:1: 10.0.1.1/24
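
If you go that route, note that modern Linux doesn't actually need named alias interfaces; you can stack the addresses directly on the port with iproute2. A minimal sketch, using the target addresses from the example above:

$ sudo ip addr add 10.0.0.1/24 dev eth0
$ sudo ip addr add 10.0.1.1/24 dev eth0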

While that’s easy in theory, in practice it becomes cumbersome if you don’t know how many SUTs you’ll have, nor how many NICs are in each. I’ve run network tests before on servers with up to 200 NIC ports, meaning my target server would have to carry over 200 alias IP addresses on its inbound port.

So the easiest way to handle this is to put all the SUT NICs on a single subnet along with my target machine. This lets me hook up as many NICs as I have address space for, meaning I could hook up a LOT of SUTs and run them all simultaneously.
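
For reference, putting the SUT NICs on that shared subnet is just a matter of assigning each port an address from it on the SUT itself. A quick sketch, using the SUT addresses that appear in the examples below:

$ sudo ip addr add 10.0.0.123/24 dev eth0
$ sudo ip addr add 10.0.0.128/24 dev eth1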

The problem with this approach, however, is ARP flux. This occurs when an ARP request is sent for one address, but the target responds with the MAC address of a different NIC. To demonstrate, here I am using arping to ping both NICs on my 1U test server:

ubuntu@critical-maas:~$ sudo arping -I eth0 10.0.0.123
ARPING 10.0.0.123 from 10.0.0.1 eth0
Unicast reply from 10.0.0.123 [00:30:48:65:5E:0C] 0.745ms
Unicast reply from 10.0.0.123 [00:30:48:65:5E:0C] 0.779ms
Unicast reply from 10.0.0.123 [00:30:48:65:5E:0C] 0.757ms

ubuntu@critical-maas:~$ sudo arping -I eth0 10.0.0.128
ARPING 10.0.0.128 from 10.0.0.1 eth0
Unicast reply from 10.0.0.128 [00:30:48:65:5E:0C] 0.887ms
Unicast reply from 10.0.0.128 [00:30:48:65:5E:0C] 0.901ms
Unicast reply from 10.0.0.128 [00:30:48:65:5E:0C] 0.849ms

As you can see, no matter which interface IP address I ping, the MAC from eth0 responds saying “Yeah, that’s me!” This can lead to ARP cache poisoning, especially when you’re looking at a system with more than just 2 NICs. All manner of fun things happen when that occurs. But the primary problem when testing is that in this scenario you’re not fully testing your target NIC; you’re testing packets going out on eth1 and coming in on eth0.
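
You can watch this happen from the target’s side by dumping its ARP cache after pinging both SUT addresses; with ARP flux in play, both entries resolve to the same MAC. A sketch, with output abbreviated to the relevant fields:

$ ip neigh show dev eth0
10.0.0.123 lladdr 00:30:48:65:5e:0c REACHABLE
10.0.0.128 lladdr 00:30:48:65:5e:0c REACHABLE

So we need to fix this issue.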

The first things we need to fix involve ARP handling in the Linux kernel. The default behaviour that induces ARP flux is actually safe in most cases, and gives you a better chance of packets reaching their target. It’s not so good for testing, however. So we use sysctl to change some kernel settings:


$ sudo sysctl -w net.ipv4.conf.all.arp_announce=1
$ sudo sysctl -w net.ipv4.conf.all.arp_ignore=2

First, setting arp_announce to 1 does this (quoting the kernel documentation):

Try to avoid local addresses that are not in the target’s subnet for this interface. This mode is useful when target hosts reachable via this interface require the source IP address in ARP requests to be part of their logical network configured on the receiving interface. When we generate the request we will check all our subnets that include the target IP and will preserve the source address if it is from such subnet. If there is no such subnet we select source address according to the rules for level 2.

Then, setting arp_ignore to 2 does this:

Reply only if the target IP address is local address configured on the incoming interface and both with the sender’s IP address are part from same subnet on this interface.

Now, on older kernels (2.4 and earlier), that was enough. But on newer kernels, 2.6 onward through the current 3.x versions, an additional change is necessary due to changes in how rp_filter is handled. So to make this work on 2.6+ kernels, we set the additional rp_filter value:

$ sysctl -w net.ipv4.conf.all.rp_filter=0
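
Keep in mind that sysctl -w changes are lost on reboot. To make them permanent, add the settings to /etc/sysctl.conf (or a file under /etc/sysctl.d/) and reload with sudo sysctl -p:

net.ipv4.conf.all.arp_announce = 1
net.ipv4.conf.all.arp_ignore = 2
net.ipv4.conf.all.rp_filter = 0

One caveat: for all three of these settings the kernel uses the stricter (maximum) of the conf.all and per-interface values, so if your distribution ships per-interface defaults like net.ipv4.conf.default.rp_filter=1, you may need to zero those out as well.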

With those settings in place, it all magically starts working the way it should. NOW when we arping the SUT’s eth1 address, we can see that eth1 is actually responding:

ubuntu@critical-maas:~$ sudo arping -I eth0 10.0.0.128
[sudo] password for ubuntu:
ARPING 10.0.0.128 from 10.0.0.1 eth0
Unicast reply from 10.0.0.128 [00:30:48:65:5E:0D] 0.937ms
Unicast reply from 10.0.0.128 [00:30:48:65:5E:0D] 0.888ms
Unicast reply from 10.0.0.128 [00:30:48:65:5E:0D] 0.844ms
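
With more than a couple of NICs, it’s worth scripting this verification. A rough sketch, looping arping over the SUT addresses used above; each address should now answer with its own distinct MAC:

$ for ip in 10.0.0.123 10.0.0.128; do sudo arping -c 3 -I eth0 $ip; done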

Let’s demonstrate this with some traffic load.

Before changing the kernel settings:

ubuntu@supermicro:~$ netstat -ni
Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0 1500 0 2864 0 0 0 583 0 0 0 BMRU
eth1 1500 0 58 0 0 0 2035 0 0 0 BMRU
lo 65536 0 0 0 0 0 0 0 0 0 LRU
ubuntu@supermicro:~$ sudo ping -I eth1 -f -c 10000 10.0.0.1
PING 10.0.0.1 (10.0.0.1) from 10.0.0.128 eth1: 56(84) bytes of data.
--- 10.0.0.1 ping statistics ---
10000 packets transmitted, 10000 received, 0% packet loss, time 298ms
rtt min/avg/max/mdev = 0.210/0.258/0.497/0.024 ms, ipg/ewma 0.295/0.260 ms
ubuntu@supermicro:~$ netstat -ni
Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0 1500 0 12897 0 0 0 600 0 0 0 BMRU
eth1 1500 0 60 0 0 0 12035 0 0 0 BMRU
lo 65536 0 0 0 0 0 0 0 0 0 LRU

See here that we send 10000 ICMP packets out eth1 on the SUT (eth1’s TX-OK rises by 10000), but the reply packets are all accepted on eth0 instead (eth0’s RX-OK absorbs roughly 10000 packets while eth1’s RX-OK barely moves).

Now after changing:

ubuntu@supermicro:~$ netstat -ni
Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0 1500 0 47106 0 0 0 1483445 0 0 0 BMRU
eth1 1500 0 42805 0 0 0 8179 0 0 0 BMRU
lo 65536 0 0 0 0 0 0 0 0 0 LRU
ubuntu@supermicro:~$ sudo ping -c 10000 -I eth1 -f 10.0.0.1
PING 10.0.0.1 (10.0.0.1) from 10.0.0.128 eth1: 56(84) bytes of data.
--- 10.0.0.1 ping statistics ---
10000 packets transmitted, 10000 received, 0% packet loss, time 2496ms
rtt min/avg/max/mdev = 0.166/0.216/0.367/0.010 ms, ipg/ewma 0.249/0.216 ms
ubuntu@supermicro:~$ netstat -ni
Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0 1500 0 57046 0 0 0 1503462 0 0 0 BMRU
eth1 1500 0 52811 0 0 0 18179 0 0 0 BMRU
lo 65536 0 0 0 0 0 0 0 0 0 LRU

Now you can see that the 10000 ICMP packets exit eth1 and the echo replies are all accepted on eth1, as they should be.
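
If you run this check often, the counter comparison is easy to script as well. A rough sketch that snapshots eth1’s receive counter from sysfs around the flood (interface names and addresses match the example above):

$ rx() { cat /sys/class/net/$1/statistics/rx_packets; }
$ before=$(rx eth1)
$ sudo ping -c 10000 -I eth1 -f 10.0.0.1
$ after=$(rx eth1)
$ echo "eth1 accepted $((after - before)) packets during the flood"

With the ARP settings in place, the delta should be close to 10000; before the fix it stays near zero while eth0’s counter jumps instead.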
