2015-12-07




With any new hypervisor, or one that an admin has not worked with before, high availability is one of the first questions: what happens to the VMs running on a host that crashes? The Acropolis hypervisor (AHV) is built on CentOS KVM, which does not offer HA natively. Nutanix has built additional functionality into AHV to provide virtual machine High Availability (HA), which keeps VMs available in the event of a host or block outage. When a host fails, the VMs that were running on it are restarted on other healthy nodes in the cluster. Three HA configuration options are available to account for different cluster configurations.

By default, every AHV cluster provides best-effort HA, even when the cluster has not been explicitly configured for HA. Best-effort HA does not reserve any resources. Because admission control is not enforced, there may not be enough capacity to restart all the VMs from the failed host. In other words, AHV clusters that have not been configured for HA still have some protection: depending on how much compute capacity is free in the cluster, all or only a subset of the affected VMs may be restarted.
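To make "all or a subset" concrete, here is a minimal sketch of what best-effort restart means when nothing is reserved. It is a hypothetical model, not the AHV scheduler: it assumes memory is the only constraint, and it assumes a largest-first, first-fit placement order. The `Host`, `VM` and `best_effort_restart` names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    free_mem_gb: int  # memory currently unused on this surviving host


@dataclass
class VM:
    name: str
    mem_gb: int


def best_effort_restart(failed_vms, survivors):
    """Hypothetical best-effort model: no capacity was reserved in advance,
    so VMs are placed greedily and any VM that does not fit stays down."""
    restarted, not_restarted = [], []
    # Assumption: try the largest VMs first so they are not crowded out.
    for vm in sorted(failed_vms, key=lambda v: v.mem_gb, reverse=True):
        target = next((h for h in survivors if h.free_mem_gb >= vm.mem_gb), None)
        if target is None:
            not_restarted.append(vm.name)  # no host has room: VM is not restarted
        else:
            target.free_mem_gb -= vm.mem_gb
            restarted.append((vm.name, target.name))
    return restarted, not_restarted
```

With 24 GB and 20 GB free on the two surviving hosts, a failed host running 32, 16, 16 and 8 GB VMs gets three of its four VMs back; the 32 GB VM fits nowhere and stays down. That is exactly the gap that the reservation methods below are designed to close.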

HA on an Acropolis cluster is configured through Prism and enabled with a single click. Prism examines the cluster and reserves capacity for a specific number of host failures, using either host reservations or segment reservations. The choice is based on how uniformly the hosts in the cluster are configured, and Prism selects the method with the least overhead.
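The selection rule can be sketched in a few lines. This is a simplified, hypothetical reading of the defaults described in the next two paragraphs (identical RAM on every host leads to host reservations, mixed RAM leads to segment reservations); it does not attempt to model how Prism actually weighs overhead, and `choose_reservation_method` is an illustrative name.

```python
def choose_reservation_method(host_mem_gb):
    """Hypothetical rule: identical RAM on every host -> reserve whole hosts,
    otherwise -> reserve fixed-size segments."""
    if not host_mem_gb:
        raise ValueError("cluster has no hosts")
    # A single distinct RAM size means the hosts are uniform.
    return "host" if len(set(host_mem_gb)) == 1 else "segment"


print(choose_reservation_method([256, 256, 256, 256]))  # uniform cluster
print(choose_reservation_method([256, 512, 256]))       # mixed cluster
```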

Host reservations – With this method an entire host is reserved for failover protection. The least used host in the cluster is selected as a reserve node, and all VMs on that node are migrated off to other nodes in the cluster so that the full capacity of that node is available for VM failover. This is the default HA method when all hosts within the cluster have the same amount of RAM. Prism will configure the number of failover hosts to match the number of failures the cluster will tolerate for the configured Replication Factor (RF).
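A rough sketch of the host reservation arithmetic follows, assuming that "least used" means least used memory and that the number of reserved hosts equals RF minus one (RF2 tolerates one failure, RF3 tolerates two). The `Host` class and `pick_reserve_hosts` function are hypothetical; the function also checks that the VMs on the reserved hosts can actually be migrated onto the remaining hosts.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    mem_gb: int
    used_mem_gb: int


def pick_reserve_hosts(hosts, replication_factor):
    """Hypothetical host-reservation model: reserve the (RF - 1) least-used
    hosts and report how much cluster memory that takes out of service."""
    failures_to_tolerate = replication_factor - 1  # RF2 -> 1 host, RF3 -> 2 hosts
    if failures_to_tolerate >= len(hosts):
        raise ValueError("not enough hosts to reserve")
    by_usage = sorted(hosts, key=lambda h: h.used_mem_gb)
    reserved = by_usage[:failures_to_tolerate]
    others = by_usage[failures_to_tolerate:]
    # VMs on the reserved hosts must be migrated off, so the other hosts
    # need enough free memory to absorb them.
    to_evacuate = sum(h.used_mem_gb for h in reserved)
    spare = sum(h.mem_gb - h.used_mem_gb for h in others)
    if to_evacuate > spare:
        raise ValueError("remaining hosts cannot absorb the evacuated VMs")
    reserved_mem = sum(h.mem_gb for h in reserved)
    total_mem = sum(h.mem_gb for h in hosts)
    return [h.name for h in reserved], reserved_mem, reserved_mem / total_mem
```

For four 256 GB hosts with 120, 64, 200 and 150 GB in use, RF2 reserves the host with 64 GB in use, which takes 256 GB (25%) of cluster memory out of service. Under RF3 the two least-used hosts would need to shed 184 GB onto hosts with only 162 GB free, so this sketch refuses the configuration.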

Segment reservations – This method divides the cluster into fixed-size segments of CPU and memory. Each segment corresponds to the largest VM that is guaranteed to be restarted if a failure occurs. The second input is the number of host failures to tolerate. Using these inputs, the scheduler enforces admission control so that enough resources are always reserved to restart the VMs if any host in the cluster fails. This is the default method when the hosts in the cluster have different amounts of RAM.
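The following sketch shows segment-based admission control under stated assumptions. It is a hypothetical model, not Nutanix's scheduler. The segment is sized to cover the largest VM in each dimension (vCPU and memory), and every displaced VM is assumed to need at most one segment. A new VM is admitted only if, for every combination of failed hosts, the surviving hosts still have enough whole free segments for all displaced VMs. All class and function names are illustrative.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class VM:
    name: str
    vcpus: int
    mem_gb: int


@dataclass
class Host:
    name: str
    cpus: int
    mem_gb: int
    vms: list = field(default_factory=list)

    def free(self):
        """(free CPUs, free memory in GB) left after the VMs on this host."""
        return (self.cpus - sum(v.vcpus for v in self.vms),
                self.mem_gb - sum(v.mem_gb for v in self.vms))


def segment_size(hosts):
    """Assumed segment: large enough for the biggest VM in each dimension."""
    all_vms = [vm for h in hosts for vm in h.vms]
    if not all_vms:
        raise ValueError("no VMs to size segments from")
    return (max(v.vcpus for v in all_vms), max(v.mem_gb for v in all_vms))


def segments_on(host, seg):
    """Number of whole segments that fit into a host's free capacity."""
    free_cpu, free_mem = host.free()
    return max(0, min(free_cpu // seg[0], free_mem // seg[1]))


def admits(hosts, failures_to_tolerate=1):
    """True if every combination of failed hosts can be restarted elsewhere."""
    if not any(h.vms for h in hosts):
        return True
    seg = segment_size(hosts)
    indices = range(len(hosts))
    for failed in combinations(indices, failures_to_tolerate):
        needed = sum(len(hosts[i].vms) for i in failed)  # one segment per VM
        available = sum(segments_on(hosts[i], seg)
                        for i in indices if i not in failed)
        if needed > available:
            return False
    return True


def can_power_on(hosts, target_index, vm, failures_to_tolerate=1):
    """Admission control: power on the VM only if the guarantee still holds."""
    target = hosts[target_index]
    free_cpu, free_mem = target.free()
    if vm.vcpus > free_cpu or vm.mem_gb > free_mem:
        return False
    target.vms.append(vm)
    if admits(hosts, failures_to_tolerate):
        return True
    target.vms.pop()  # roll back: admitting this VM would break the guarantee
    return False
```

In this model, a large VM grows the segment size for the entire cluster. That is why one oversized VM can cause a power-on request to be refused even though its own host has room for it.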

As you add blocks and nodes to your AHV cluster with the expand cluster function, Prism configures the new AHV hosts with the same profile settings as the existing hosts. This ensures that the new resources are included in the cluster's HA calculations and that any changes are applied to all hosts.

How to configure HA via Prism

Like many other Nutanix functions, configuring HA on AHV in Prism takes one click. Once you are logged into Prism, click the gear icon in the upper right, then find and select Manage VM High Availability.



This opens a pop-out window like the one below. To enable HA, tick the box. The system then shows how much memory will be reserved and how many host failures the cluster will be protected against. Click Save, and Prism configures all of the AHV hosts in the cluster with the HA settings.



How to modify HA via ACLI

The Acropolis command line interface (ACLI) is another way to configure or modify HA settings. Because HA is so easy to set up via Prism, I expect admins will need this method only to change the default settings for corner cases or non-standard behaviors. To get started, connect to one of the CVMs via SSH and type acli.

There are two main HA commands in the ACLI:

ha.get

The ha.get command returns the current HA configuration for the AHV cluster. The image below shows an example. It confirms that HA is set up and configured to tolerate a single host failure. It also shows that the reservation type is set to reserve a host rather than use segments, because all of the hosts in my cluster are identically configured.

ha.update

The ha.update command handles both the initial HA configuration and changes to a cluster that is already configured for HA. It lets you change the failover type, evacuation mode, number of failures, reservation type, and whether to wait for evacuation.

As always, if you want to learn more about Nutanix or HA, head over to the Nutanix Bible and read the section on High Availability for more details.
