6.4 Partial failover
A partial failover covers the scenario where a tenant still has its infrastructure up and running, and only one or more virtual machines are having issues. In this situation, nobody wants to run a complete site failover to solve the problems of a few VMs; partial failover lets the tenant start the replicas of one or more VMs at the service provider side, while all the other VMs keep running at the tenant side.
To make this possible, the technology integrated into the Network Extension Appliance (NEA) extends (hence the name) any customer network to the service provider site, so that production VMs can communicate with replicas without any change in the IP addressing.
This happens because the NEA creates, for each involved network, a Layer 2 VPN tunnel that transparently extends the tenant network to the corresponding service provider network.
6.27: Veeam Cloud Connect Partial Failover
The Cloud Gateway at the service provider is responsible for interconnecting the two NEAs, one at the tenant and one at the service provider. Thanks to this interconnection, the OpenVPN client running at the tenant can initiate a VPN tunnel towards the OpenVPN server running at the service provider side. The final result is that a Layer 2 tunnel is created between the two networks and, thanks to a proxy-ARP solution running in both appliances, packets can travel inside the tunnel and VMs can communicate with each other, regardless of the site where they are powered on.
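The proxy-ARP idea can be illustrated with a toy model. This is a minimal sketch of the general technique, not the NEA's actual implementation: the class name, the MAC address, the IP addresses, and the way the appliance learns which addresses live at the other site are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class ProxyArpAppliance:
    """Toy model of an appliance doing proxy ARP (hypothetical, for illustration)."""
    own_mac: str
    # IP addresses assumed to be reachable only through the tunnel
    remote_ips: Set[str] = field(default_factory=set)

    def answer_arp(self, requested_ip: str) -> Optional[str]:
        """Return the MAC address to reply with, or None to stay silent."""
        if requested_ip in self.remote_ips:
            # Answer on behalf of the remote VM, so local hosts send its
            # traffic to the appliance, which then forwards it into the tunnel.
            return self.own_mac
        # Local hosts answer their own ARP requests.
        return None


# Example values only: 192.168.1.20 stands for a replica running at the other site.
tenant_nea = ProxyArpAppliance(own_mac="00:50:56:aa:bb:01",
                               remote_ips={"192.168.1.20"})
```

In this model a production VM asking "who has 192.168.1.20?" receives the appliance's MAC address, which is why no IP addressing change is needed on either side.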
NOTE: virtual machines running at the service provider can reach the internet by using the internet connection of the tenant. Any packet created at the service provider with a destination other than its own subnet is forwarded to the default gateway, which is running at the tenant side.
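The forwarding rule in the note is the ordinary on-link versus off-link decision every IP host makes. The following sketch shows that decision with Python's standard `ipaddress` module; the function name and the example addresses are assumptions, not values taken from the product.

```python
import ipaddress


def next_hop(destination: str, local_network: str, default_gateway: str) -> str:
    """Return the address a host sends a packet to for a given destination."""
    network = ipaddress.ip_network(local_network)
    if ipaddress.ip_address(destination) in network:
        # Same subnet: deliver directly (through the Layer 2 tunnel if needed).
        return destination
    # Any other destination goes to the default gateway, which in a partial
    # failover is still the gateway at the tenant side.
    return default_gateway
```

For a replica in an example 192.168.1.0/24 network with gateway 192.168.1.1, a packet for another VM in that subnet is delivered directly, while a packet for a public address is handed to 192.168.1.1 and therefore crosses the tunnel before leaving through the tenant's internet connection.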
Partial failover operation
To initiate a partial failover, the tenant selects, from the ready replicas, the virtual machine he wants to fail over. Note that Veeam doesn’t verify whether the original virtual machine is still running, so IP address conflicts may occur if the tenant doesn’t check this before starting the partial failover.
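Since this check is left to the tenant, a quick probe of the original VM's address before starting the failover can help. The sketch below is one possible approach and is not part of Veeam: the function name, the default port (3389, assuming a Windows VM with Remote Desktop enabled), and the timeout are illustrative assumptions. A host that silently drops packets will look unused, so a negative result is only a hint.

```python
import socket


def address_in_use(ip, port=3389, timeout=2.0, connect=socket.create_connection):
    """Return True if something at ip appears to be alive on the given TCP port."""
    try:
        with connect((ip, port), timeout=timeout):
            return True  # a service accepted the connection
    except ConnectionRefusedError:
        return True  # a host replied with a reset, so the address is taken
    except OSError:
        return False  # timeout or unreachable: no evidence of a live host
```

If the function returns True for the IP address of the VM about to be failed over, the original machine is probably still powered on and should be shut down or disconnected first.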
6.28: Start the failover of a single VM
In the wizard, the tenant can add additional VMs to the partial failover, and for each of them he can select the restore point he wants to use:
6.29: Select the restore point to be used for the failover
Once the wizard is finished, the operation completes after a few seconds:
6.30: Partial failover is completed successfully
What has happened behind this screen? A few things.
On both sides, the NEAs are started so that the VPN tunnel and the proxy-ARP components are up and running. This is the NEA at the tenant side:
6.31: NEA at tenant side is started
On the service provider side, both the NEA and the requested replica VM are started. The replica VM has the same configuration and IP address as its original copy:
6.32: test-2012 VM is started at the service provider side
The service provider can also verify in the Veeam Backup & Replication console that the failover has been started by Tenant 1:
6.33: test-2012 is in failover state at the service provider
and he can also see the two tasks originated by the failover:
6.34: Task list at the service provider
There is a completed Cloud Failover task, related to the power-on of the replica VM, and a VPN Tunnel task in active state, as the failover is still in progress.
The final result, for the tenant, is that any connection towards the failed-over VM happens as usual:
6.35: Connection to the original VM and its replica
We can see two ping operations here. The first one, against the original VM, has a time below 1 ms and a TTL of 128, signs that the ping packet reached a local VM. The second one has a higher latency and a TTL of 126: both machines are still in the same subnet, thanks to the Layer 2 extension, but the replica is reached over a link towards a remote location.
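The TTL values can be read more precisely: every device that routes a packet decrements its TTL by one, so the gap between the sender's initial TTL and the observed value gives the number of forwarding hops. The helper below is a small sketch of that arithmetic; it assumes the sender used one of the common initial values (64, 128, or 255), and the function name is illustrative.

```python
COMMON_INITIAL_TTLS = (64, 128, 255)


def estimate_hops(observed_ttl: int):
    """Return (assumed initial TTL, hops crossed) for a TTL seen in a ping reply."""
    if not 0 < observed_ttl <= 255:
        raise ValueError("TTL must be between 1 and 255")
    # Assume the sender started from the smallest common value that is
    # not below what was observed.
    initial = min(t for t in COMMON_INITIAL_TTLS if t >= observed_ttl)
    return initial, initial - observed_ttl
```

With the values from the screenshot, a TTL of 128 corresponds to zero hops and a TTL of 126 to two hops. Two hops would be consistent with the reply being forwarded once by each appliance, although the text does not state this explicitly.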
The partial failover is working correctly and can be kept up and running for as long as the tenant needs it. Once the failover is no longer needed, the tenant can choose among different options:
6.36: Options for a failed over VM
Undo failover stops the replica VM at the service provider side, and any change applied to that VM is lost. If the tenant has made changes to the replica VM and wants that version to be the one used from now on, he can choose Failback to production to replicate the replica VM back into his production site.