Building a redundant router setup with Open Source Software (part 2)

In part 1 I went over the basics of failover. The problem with such a setup is that TCP connections will not survive a failover from ar0 to ar1, or vice versa. The main issue is that each router has a different WAN IP, and as long as that is the case, seamless failover will never work.

In a corporate setup, a company usually has its own subnet, and it announces that subnet on each of its routers. It doesn't matter which router routes your packets: the source and destination IP stay the same, so the TCP sessions can survive.

On most home connections you will not be able to do this, so we need to work around this problem somehow. I had already mentioned that both ar0 and ar1 get an internal IP from the ISP router. In my case, this is via DHCP. It doesn't really matter whether you get an internal or an external IP; the following workaround works for both scenarios.

I came up with this trick when I still lived in Belgium. Before I moved to Bulgaria, I had a business subscription from Belgium's largest ISP. It came with a static external IP, and that IP had a PTR record resolving to a hostname in my company's domain. But I also had a guest network, for visitors, and I didn't want their traffic to come from that IP. Fortunately, the ISP allowed me to request more than 1 IP via DHCP. My modem was already connected to a switch, so I could connect a 2nd ethernet interface to the switch, and have it request another IP via DHCP and use policy routing to force the guest network to use the secondary interface. But this would require an extra cable and waste an additional switch port. Surely there had to be a better way...

Enter macvlan! With macvlan, you can create a virtual interface on top of a physical ethernet interface, with a different MAC address. Any packets sent out on the macvlan interface will come from a different MAC address than the one from the physical interface. It's like connecting the physical interface to a switch and connecting that switch to the modem (in this case), but without the switch.
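
If you want to play with macvlan outside of the OpenWrt config first, roughly the same thing can be done by hand with iproute2. This is only a minimal sketch: macvlan_cmd is a helper name I made up, the MAC address is just an example, and the helper merely prints the command so you can look at it before running it as root yourself.

```shell
#!/bin/sh
# macvlan_cmd PARENT NAME MAC
# Hypothetical helper: prints (does not run) the iproute2 command that
# creates a macvlan interface NAME on top of PARENT with the given MAC.
macvlan_cmd() {
        printf 'ip link add link %s name %s address %s type macvlan\n' "$1" "$2" "$3"
}

# Example: print the command for a macvlan called 'wan' on top of eth2.
macvlan_cmd eth2 wan 02:00:00:00:00:01
```

Printing instead of executing is a deliberate choice here, since creating interfaces on a live router is not something you want to do by accident.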

So I did a quick test: I made a macvlan interface on both routers with the same MAC address, et voilà, I got the same WAN IP on both routers. The solution to the "different WAN IP" problem!

So here's how it works...

The first thing we need to do: create the macvlan interface. This is the same on both routers and goes into /etc/config/network:

config device 'eth2_wan'
        option type 'macvlan'
        option name 'wan'
        option ifname 'eth2'
        option macaddr '02:68:71:2e:62:67'
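
A side note on picking the MAC address: you can choose it yourself, but it should be a unicast address, and preferably a locally administered one so it cannot clash with a vendor-assigned address. In the first octet, bit 0 marks multicast and bit 1 marks locally administered, which is why an address starting with 02 is a safe choice. A small sketch to check this, assuming nothing more than a POSIX shell (the function name is mine, and it only looks at the first octet):

```shell
#!/bin/sh
# is_local_unicast_mac MAC
# Succeeds if the first octet marks the address as unicast (bit 0 clear)
# and locally administered (bit 1 set). Returns 2 if the first octet is
# not two hex digits. The rest of the address is not validated.
is_local_unicast_mac() {
        first_octet="${1%%:*}"
        case "$first_octet" in
                [0-9a-fA-F][0-9a-fA-F]) ;;
                *) return 2 ;;
        esac
        octet_val=$((0x$first_octet))
        [ $((octet_val & 1)) -eq 0 ] && [ $((octet_val & 2)) -ne 0 ]
}

# Usage:
#   is_local_unicast_mac 02:68:71:2e:62:67 && echo "ok to use"
```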

That's it for the L2 interface. Now the L3 part, again the same on both routers:

config interface 'wan'
        option ifname 'wan'
        option proto 'dhcp'
        option auto '0'

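Note the option auto '0': it keeps the interface from being brought up at boot. Since we have a redundant setup, the active router will normally have the WAN connection up already, and a rebooting router should not disrupt it by bringing up its own copy of the interface.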

The next thing we need is to activate this macvlan interface whenever a router takes over. For this, we need to modify the keepalived configuration to run a script whenever it changes its state.

/etc/keepalived/keepalived.conf on ar0:

! Configuration File for keepalived

global_defs {
   router_id ar0
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0.54
    virtual_router_id 54
    priority 254
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.54.1/24 dev eth0.54
    }
    notify_master "/etc/keepalived/primary-backup.sh primary"
    notify_backup "/etc/keepalived/primary-backup.sh backup"
}

/etc/keepalived/keepalived.conf on ar1:

global_defs {
   router_id ar1
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0.54
    virtual_router_id 54
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    track_interface {
        eth2
    }
    virtual_ipaddress {
        192.168.54.1/24 dev eth0.54
    }
    notify_master "/etc/keepalived/primary-backup.sh primary"
    notify_backup "/etc/keepalived/primary-backup.sh backup"
}
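
Since these two files are edited by hand on two different machines, it is easy to let them drift apart. Both routers need to agree on settings such as virtual_router_id, advert_int and auth_pass, while priority is supposed to differ. A rough sketch to compare a setting between two copies of the config; the function names are made up, and the parsing is deliberately naive (first matching "key value" line wins):

```shell
#!/bin/sh
# vrrp_value FILE KEY: print the first value of KEY found in FILE.
vrrp_value() {
        awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# same_vrrp_setting FILE_A FILE_B KEY: succeed only if KEY is present
# in FILE_A and has the same value in FILE_B.
same_vrrp_setting() {
        val_a="$(vrrp_value "$1" "$3")"
        val_b="$(vrrp_value "$2" "$3")"
        [ -n "$val_a" ] && [ "$val_a" = "$val_b" ]
}

# Usage, after copying the config of ar1 over to ar0 (path is an example):
#   same_vrrp_setting /etc/keepalived/keepalived.conf /tmp/ar1.conf virtual_router_id \
#           || echo "virtual_router_id differs"
```

That covers the keepalived side. Both configs call /etc/keepalived/primary-backup.sh on every state change.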

This notify script will bring the connection up or down and start or stop some services, depending on the state the router is transitioning to. We'll start with a basic version, again the same on both devices:

#!/bin/sh

services() {
        case "$1" in
                start|stop)
                        /etc/init.d/dnsmasq "$1"
                        /etc/init.d/odhcpd "$1"
                        /etc/init.d/miniupnpd "$1"
                ;;
        esac
}

case "$1" in
        primary)
                ifup wan
                services start
                ;;
        backup)
                ifdown wan
                services stop
                ;;
        *)
                # Log the unexpected argument so it shows up in the system log
                logger "ERROR: unknown state transition: $1"
                echo "Usage: primary-backup.sh {primary|backup}"
                exit 1
                ;;
esac

exit 0

That's it. Whenever the main router reboots, the backup router will detect it, assign the virtual IP to its LAN interface and bring up the virtual WAN interface. As the WAN interface uses the same MAC address as on the other router, the assigned IP will be the same. Very cool, now our TCP sessions can survive a failover!
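
Before trusting the script with a real failover, it can be handy to see what it would do without touching the live connection. Here is a dry-run variant as a sketch. The DRY_RUN variable and the run and transition functions are names I picked for illustration; they are not part of keepalived or OpenWrt.

```shell
#!/bin/sh
# run CMD [ARGS...]: execute the command, or only print it if DRY_RUN=1.
run() {
        if [ "${DRY_RUN:-0}" = "1" ]; then
                echo "would run: $*"
        else
                "$@"
        fi
}

# transition STATE: same steps as the basic script, routed through run().
transition() {
        case "$1" in
                primary)
                        run ifup wan
                        for svc in dnsmasq odhcpd miniupnpd; do
                                run "/etc/init.d/$svc" start
                        done
                        ;;
                backup)
                        run ifdown wan
                        for svc in dnsmasq odhcpd miniupnpd; do
                                run "/etc/init.d/$svc" stop
                        done
                        ;;
                *)
                        echo "unknown state: $1" >&2
                        return 1
                        ;;
        esac
}

# Only act when a state was passed on the command line.
if [ $# -gt 0 ]; then
        transition "$1"
fi
```

Saved under a name of your choice, calling it with DRY_RUN=1 in the environment and primary as the argument prints the four commands instead of running them.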

In theory this should work, but in practice you will probably be using stateful firewall rules. A stateful firewall uses connection tracking to allow return packets for connections that have already been initiated. For this to work, all connections are tracked in connection tracking tables. This happens in the kernel, so the information is only kept locally by each router. So when a failover happens, existing sessions will not be in the connection tracking tables of the backup router, and the firewall will still end up dropping them after a successful failover... DAMN! Now what?
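
To get a feeling for what is at stake, you can look at the table on the active router. On Linux the entries are usually readable from /proc/net/nf_conntrack (if your kernel build exposes that file) or via conntrack -L from conntrack-tools. A tiny sketch that counts the established TCP sessions in such a listing; the function name is made up and it simply matches on the text of each line:

```shell
#!/bin/sh
# count_established: read conntrack entries on stdin and print how many
# lines describe a TCP connection in the ESTABLISHED state.
count_established() {
        awk '/tcp/ && /ESTABLISHED/ { n++ } END { print n + 0 }'
}

# Usage on the router (assuming the proc file exists on your build):
#   count_established < /proc/net/nf_conntrack
```

Run it on both routers and the difference is exactly the set of sessions that would not survive a failover.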

Don't worry. We can fix this by synchronising these connection tracking tables between both routers. But that's for part 3.
