Many corporate networks implement router redundancy for obvious reasons. Most of them are probably using very expensive hardware with proprietary protocols, but in fact it is possible to build such a setup for less than EUR100 and use it at home. Just get a couple of cheap routers that are well supported by LEDE/OpenWrt.

As you may be wondering "why on earth would someone want this at home", I'll start with some background info. Before I started using OpenWrt, I used x86-based routers for many years. At some point we had 2 Internet connections at home, and I decided to do an experiment. I had a P3-500 with 512MB RAM and 4 Fast Ethernet NICs. This system was running Debian and Xen, with 4 VMs on it: ar0, ar1, br0 and br1 (access/border routers). Each of these VMs had 1 physical and 1 virtual NIC. The virtual NICs were all linked together on a bridge on the host system. The physical NIC of br0 was connected to a cable modem, while the physical NIC of br1 was connected to an ADSL modem. The physical NICs of ar0 and ar1 were connected to a single switch. All 4 VMs were running Quagga with OSPF, and both br0 and br1 announced 0.0.0.0/0 to ar0 and ar1. Both ar0 and ar1 were running vrrpd to share a virtual IP. Of course this wasn't real redundancy, as everything ran on a single machine, but it was fun to do and I learned a lot from it. Maybe the best part was that I could download torrents at higher speeds, as this kind of traffic could be load-balanced over both uplinks.

After ditching one of the ISPs, I replaced the setup with a WRT54GL running OpenWrt, back in 2007. As ISPs in Belgium started offering higher speeds, and the 4MB flash was rather tiny for all the software I used to run, the WRT54GL was quickly pushed to its limits. So in October 2009 I bought 2 Ubiquiti RouterStation Pros to replace it, with the idea of maybe building a redundant setup again at some point, but the 2nd one ended up being used as an AP instead. I initially assembled the router in a cigar box, the first thing I found that it would fit in. Later I bought an indoor enclosure from Netgate. For pictures of V1 and V2, see https://www.linux-ipv6.be/OpenWrt/RSPro/.
At that point I started getting more involved with OpenWrt, with my first contribution being committed in January 2010.

Then many years passed without me building a redundant setup again. I did buy some other hardware: 2x Mikrotik RB2011L-IN, and later 2x Ubiquiti EdgeRouter Lite. You'll notice they both lack WiFi. The reason is simple: with dedicated routers, I could reflash my access points without taking down the Internet uplink. Days of flashing a device several times with a new image weren't uncommon, so I switched to dedicated routers and dumb access points.

And then the day arrived. In 2015 I moved from Belgium to Bulgaria. I was still playing around with OpenWrt a lot, as I had been running trunk ever since I started using the RSPros. Knowing that I would travel between Belgium and Bulgaria every now and then, and might stay in other countries for some time, I decided it was finally time to do a redundant setup again, this time with OpenWrt. OK, it's not 100% redundancy, but my current setup allows me to flash an image remotely without losing connectivity when a device doesn't come back online after a bad flash. So I took my 2 ERLs with me to Bulgaria, and that's where the background info ends and the technical part begins ;-)

The internal subnet is 192.168.54.0/24. The main router is called ar0 and has IP 192.168.54.254; the backup router is called ar1 and has IP 192.168.54.253. This is still the case today.

Version 1: simple failover

My ISP, Mtel, installed a Huawei HG8245T GPON router. Initially this was set up as a router, and my ERLs received an internal IP. The key to having ar1 take over from ar0 when ar0 goes down is a virtual IP on the LAN side. Instead of vrrpd, I decided to use keepalived this time.

/etc/keepalived/keepalived.conf on ar0:

! Configuration File for keepalived

global_defs {
   router_id ar0
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0.54
    virtual_router_id 54
    priority 254
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.54.1/24 dev eth0.54
    }
}

/etc/keepalived/keepalived.conf on ar1:

global_defs {
   router_id ar1
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0.54
    virtual_router_id 54
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    track_interface {
        eth2
    }
    virtual_ipaddress {
        192.168.54.1/24 dev eth0.54
    }
}
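With these configs, VRRP decides who holds 192.168.54.1: the router with the highest priority wins, and the "state" line only sets the starting state. Below is a minimal Python model of that election, not keepalived's actual code. It assumes that a router which is down sends no advertisements, that a router whose track_interface link is down goes into FAULT and stops competing, and that preemption is enabled (keepalived's default), so ar0 reclaims the address once it is back. The Router class and elect_master function are hypothetical names used only for this illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from ipaddress import IPv4Address


@dataclass
class Router:
    """A VRRP participant; values mirror the keepalived configs above."""
    name: str
    priority: int              # keepalived 'priority'
    primary_ip: IPv4Address    # the router's own address on eth0.54
    alive: bool = True         # False while the box is down or reflashing
    tracked_up: dict[str, bool] = field(default_factory=dict)  # track_interface -> link up?

    def eligible(self) -> bool:
        # A dead router sends no adverts; a tracked interface that is down
        # puts the instance in FAULT state, so it does not compete either.
        return self.alive and all(self.tracked_up.values())


def elect_master(routers: list[Router]) -> Router | None:
    """Return the router that should hold the virtual IP, or None if none can.

    Highest priority wins; on a tie the higher primary IP wins, as in the VRRP RFCs.
    Being stateless, this model implies preemption: a recovered ar0 wins again.
    """
    candidates = [r for r in routers if r.eligible()]
    if not candidates:
        return None
    return max(candidates, key=lambda r: (r.priority, int(r.primary_ip)))


if __name__ == "__main__":
    ar0 = Router("ar0", 254, IPv4Address("192.168.54.254"))
    ar1 = Router("ar1", 100, IPv4Address("192.168.54.253"), tracked_up={"eth2": True})
    print(elect_master([ar0, ar1]).name)  # ar0
    ar0.alive = False
    print(elect_master([ar0, ar1]).name)  # ar1
    ar1.tracked_up["eth2"] = False
    print(elect_master([ar0, ar1]))       # None
```

The last case shows what track_interface buys on ar1 in this model: if ar1's own uplink (eth2) is down, it will not grab the virtual IP and blackhole LAN traffic.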

That should be enough to enable the virtual IP and have it float from ar0 to ar1 if ar0 goes down. For the devices on your LAN to actually use it, the only thing left to do is configure dnsmasq to hand out the virtual IP as default gateway and DNS server (and optionally as NTP server, should you use it).

/etc/config/dhcp, lan section, identical on both routers:

config dhcp 'lan'
	option interface 'lan'
	option ignore '0'
	option start '100'
	option limit '100'
	list dhcp_option '3,192.168.54.1'
	list dhcp_option '6,192.168.54.1'
	list dhcp_option 'option:ntp-server,192.168.54.1'
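Since the virtual IP and both router addresses are assigned statically, they should sit outside the DHCP range. A small Python check of that, assuming OpenWrt's usual reading of start and limit (an offset from the network address and a maximum number of leases), so start '100' and limit '100' cover 192.168.54.100 to 192.168.54.199. The dhcp_pool function and the STATIC table are hypothetical helpers for this illustration only.

```python
from __future__ import annotations

from ipaddress import IPv4Address, IPv4Network


def dhcp_pool(network: str, start: int, limit: int) -> tuple[IPv4Address, IPv4Address]:
    """First and last lease address, assuming OpenWrt's start/limit semantics:
    start is an offset from the network address, limit the number of leases."""
    net = IPv4Network(network)
    first = net.network_address + start
    last = first + (limit - 1)
    if start < 1 or limit < 1 or last not in net or last == net.broadcast_address:
        raise ValueError("pool does not fit inside the subnet")
    return first, last


LAN = "192.168.54.0/24"
STATIC = {  # addresses from the text that must stay out of the pool
    "virtual IP": IPv4Address("192.168.54.1"),
    "ar1": IPv4Address("192.168.54.253"),
    "ar0": IPv4Address("192.168.54.254"),
}

if __name__ == "__main__":
    first, last = dhcp_pool(LAN, start=100, limit=100)
    for name, addr in STATIC.items():
        if addr not in IPv4Network(LAN):
            print(f"{name} ({addr}) is outside {LAN}")
        elif first <= addr <= last:
            print(f"{name} ({addr}) collides with the DHCP pool")
    print(f"pool: {first} - {last}")  # pool: 192.168.54.100 - 192.168.54.199
```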

That's it: simple failover. TCP sessions will not survive a failover, as ar0 and ar1 each have a different WAN IP. I will document how I worked around this in part 2.
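Besides dropped TCP sessions, expect a short gap while ar1 notices that ar0 is gone. Per the VRRP RFCs, a backup declares the master dead after Master_Down_Interval = 3 * advert_int + Skew_Time, with Skew_Time = (256 - priority) * advert_int / 256. The sketch below just evaluates that formula; it assumes keepalived follows the RFC timers, which may not hold exactly for every version. With advert_int 1 and ar1's priority of 100, it comes to roughly 3.6 seconds.

```python
def master_down_interval(backup_priority: int, advert_int: float = 1.0) -> float:
    """Seconds a VRRP backup waits without adverts before taking over.

    Uses the RFC 5798 (VRRPv3) formula; with advert_int = 1 s the RFC 3768
    (VRRPv2) formula gives the same result. Higher-priority backups have a
    smaller skew, so they take over first.
    """
    if not 1 <= backup_priority <= 254:
        raise ValueError("a backup's priority must be between 1 and 254")
    skew_time = (256 - backup_priority) * advert_int / 256
    return 3 * advert_int + skew_time


if __name__ == "__main__":
    # ar1 from the config above: priority 100, advert_int 1
    print(master_down_interval(100))  # 3.609375
```

Lowering advert_int shortens the gap, at the cost of more advertisement traffic and a higher chance of a spurious takeover on a busy LAN.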