We never cease to be astounded by customers’ responses when we engage in a conversation with them about vMotion or Live Migration. There is a general perception that if you have ‘deployed’ vMotion then you will not experience any downtime. How can we put this gently? Quite categorically, this is wrong.
vMotion or Live Migration is a fantastic piece of technology that allows you to move a running virtual machine from one host to another. There is no downtime or damage to the application, and services continue to run. There is a fundamental limitation, though, when it comes to downtime: both hosts – the one you are migrating from and the one you are migrating to – must be up and running. If you have a host with ‘a problem’ then forget it; migration doesn’t work and you will have downtime.
So the idea that this fantastic migration capability will ‘save’ you if your host were to die is complete nonsense. If your host has disappeared off the face of your infrastructure then you are well and truly into the realm of HA (high availability). This is a different technology that relies on complex ‘failure recovery’ mechanisms, which incur varying degrees of downtime. During these failover/recovery periods, which can range from a few seconds to several minutes, a backup system automatically restarts the applications and logs users back on. This failover process can degrade performance, compromise throughput and will result in the loss of ‘in-flight’ data.
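To make the distinction concrete, the failover pattern described above can be sketched in a few lines of code. This is a purely illustrative simulation, not the implementation of any real HA product: the class, the heartbeat interval and the miss threshold are all hypothetical, chosen only to show why a restart-based recovery loses in-flight work.

```python
# Hypothetical sketch of restart-based HA failover: a monitor watches
# heartbeats from the primary host; after enough missed beats it
# declares the primary dead and restarts service on a backup.
# All names and thresholds here are illustrative assumptions.

MISS_THRESHOLD = 3  # missed heartbeats before declaring the primary failed


class FailoverMonitor:
    def __init__(self):
        self.missed = 0
        self.active_host = "primary"
        self.in_flight = []  # requests currently being processed on the primary

    def record_heartbeat(self):
        # A heartbeat from the primary resets the failure counter.
        self.missed = 0

    def tick(self):
        """Called once per heartbeat interval with no heartbeat received.

        Returns the list of lost in-flight requests if a failover
        happened on this tick, otherwise None.
        """
        self.missed += 1
        if self.active_host == "primary" and self.missed >= MISS_THRESHOLD:
            # Failover: restart on the backup. Work that was in flight
            # on the dead primary is lost -- the 'in-flight data' loss
            # the article refers to.
            lost = list(self.in_flight)
            self.in_flight.clear()
            self.active_host = "backup"
            return lost
        return None


monitor = FailoverMonitor()
monitor.in_flight = ["req-1", "req-2"]
monitor.record_heartbeat()                    # primary is healthy
events = [monitor.tick() for _ in range(3)]   # then the primary goes silent
print(monitor.active_host)                    # -> backup
print(events[-1])                             # -> ['req-1', 'req-2']
```

Even in this toy version, the backup only takes over after several missed heartbeats, and whatever the primary was doing at the moment of failure is simply gone. Real HA stacks add many refinements, but the basic detect-then-restart shape, and the downtime window it implies, is the same.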
The best way to ensure continuous availability of critical applications is to adopt a strategy that prevents downtime from happening in the first place, rather than just recovering from downtime once it has occurred. There are two different ways that this can be achieved and the suitability of either will always depend on the circumstances.
One option is to use high availability software solutions with built-in fault tolerance. This strategy combines the physical resources of two standard servers into a single operating environment with complete redundancy of all underlying hardware and data, preventing unplanned downtime and keeping applications running even in the event of component or system failures.
A second recommendation is to deploy fault-tolerant servers that are specifically designed to ‘detect, isolate and correct system problems before they cause costly system downtime or data loss’. These ultra-high-availability, fault-tolerant systems do this through the use of a ‘lockstep’ architecture that simultaneously processes work instructions and synchronises memory in two separate hardware components, so the user experiences no interruption to operations even if a component fails. These systems monitor themselves and hundreds of critical conditions, enabling proactive resolution of issues before they impact system performance or availability.
Both of these solutions offer operational simplicity, allowing your existing applications to run without the risk and expense of modifications that could negatively affect applications that may not have been altered in years.
Don’t rely on live migration if you truly need an “Always-On” world.
Article written by Andy Bailey, Solutions Architect, Stratus Technologies.