Fault domains and update domains – High Availability and Redundancy Concepts

Within Azure’s data centers, the hardware is grouped into fault domains. A fault domain is a group of VMs that share the same power source and network switch, which means a failure of either component would affect all VMs in that fault domain. Similarly, an update domain is a group of VMs that can be patched and rebooted together during planned maintenance.

With this in mind, when building VMs, you can choose to place them in an availability set, which instructs the Azure platform to spread the VMs in that set across update and fault domains. A single availability set can have up to 3 fault domains and 20 update domains; therefore, if you have more than 3 VMs in a cluster or farm, at least 2 will reside in the same fault domain.
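To make the spreading behavior concrete, here is a minimal sketch that models placement as a simple round-robin over the domains. This is an illustrative assumption only: Azure does not guarantee that domains are assigned in this order, and the function name `place_vms` and the update domain count of 5 are chosen for the example rather than taken from the text above.

```python
from collections import Counter


def place_vms(vm_count, fault_domains=3, update_domains=5):
    """Toy model: assign VM i to a (fault domain, update domain) pair round-robin.

    The real platform decides placement itself; this only illustrates
    what happens when there are more VMs than fault domains.
    """
    return [(i % fault_domains, i % update_domains) for i in range(vm_count)]


placements = place_vms(5)
for vm, (fd, ud) in enumerate(placements):
    print(f"vm{vm}: fault domain {fd}, update domain {ud}")

# Count how many VMs landed in each fault domain.
fd_usage = Counter(fd for fd, _ in placements)
print("VMs per fault domain:", dict(fd_usage))
```

With 5 VMs and 3 fault domains, two of the fault domains end up holding 2 VMs each, so a single rack-level failure in this model can take out 2 of the 5 VMs but never all of them.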

Important note

Availability sets only protect your VMs from hardware and power failures, and they do not provide any protection against OS failures or application failures. Generally, these types of issues would need to be mitigated with application-level architecture.

When building a VM, the default and recommended disk type is a managed disk. Managed disks follow a similar pattern for protecting against hardware failure. When a VM is built in an availability set, its disks are also created in separate storage fault domains that are aligned to the VM's fault domain. The following diagram shows an example of this:

Figure 14.2 – Availability Zones example

Using availability sets with multiple VMs increases the SLA from 99.9% to 99.95%, which equates to an allowed outage of 21.92 minutes a month.

Availability sets help protect you against isolated hardware failures. However, Availability Zones can protect you against an entire data center failure.

Availability Zones

Each Azure region contains multiple separate data centers that are at least 10 miles (16 km) apart. Each data center has its own power, networking, and security. They are purposefully placed to provide some protection against natural events such as flooding.

Not all regions support Availability Zones. However, the majority of them do. Each region that does support Availability Zones will have at least three zones within the region. For details of each region and whether they support Availability Zones, see the following link: https://azure.microsoft.com/en-us/global-infrastructure/geographies/#geographies.

When building VMs, you have the option to build them in specific Availability Zones. By spreading your VMs among zones and placing them behind a load balancer, the SLA on them increases to 99.99%, or a potential downtime of only 4.38 minutes per month.
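The downtime figures quoted in this section follow directly from the SLA percentages. The sketch below reproduces them, assuming an average month of 365.25 days / 12 (730.5 hours, or 43,830 minutes); that month length and the function name `allowed_downtime_minutes` are assumptions made for the example. It uses `Decimal` so the rounding to two decimal places is exact.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumption: an "average" month of 365.25 days / 12 = 43,830 minutes.
MINUTES_PER_MONTH = Decimal("365.25") * 24 * 60 / 12


def allowed_downtime_minutes(sla_percent):
    """Return the monthly downtime (in minutes) permitted by an SLA percentage.

    sla_percent is passed as a string, e.g. "99.95", to avoid float rounding.
    """
    unavailable_fraction = (Decimal(100) - Decimal(sla_percent)) / Decimal(100)
    downtime = MINUTES_PER_MONTH * unavailable_fraction
    return downtime.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


for label, sla in [
    ("single VM", "99.9"),
    ("availability set", "99.95"),
    ("Availability Zones", "99.99"),
]:
    print(f"{label}: {sla}% -> {allowed_downtime_minutes(sla)} minutes per month")
```

Under that assumption, the three tiers come out at 43.83, 21.92, and 4.38 minutes per month respectively, so each step up roughly halves or better the permitted outage.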

Important note

When building resilient architecture with VMs, you will typically use a load balancer. However, making a load balancer zone-redundant is optional and only supported on the standard SKU. Therefore, to ensure resilience against zone failures, you must build your VMs across Availability Zones and use a zone-redundant, standard SKU load balancer.

For many organizations, zone redundancy is adequate because the data centers within a region are physically separated from one another. For organizations that require regional resilience, either for protection or for performance benefits, services can be built across regions.

To achieve regional resilience, additional components are required to distribute traffic accordingly, such as Azure Traffic Manager or Azure Front Door. These are covered in Chapter 8, Network Connectivity and Security.

VMs built across regions are not managed in the same way as VMs in an availability set. The Azure platform will not automatically place VMs or distribute them between the regions. The following diagram shows an example of how this might look:

Figure 14.3 – Cross region, zone redundant architecture example

Typically, you would build VMs in both regions across Availability Zones, place them behind load balancers, and then configure the regional network distribution component, such as Traffic Manager, accordingly. In other words, cross-region resilience usually requires more forethought and architecture.
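The routing decision made by the regional distribution component can be pictured with a small model. The sketch below is a toy illustration of priority-based failover between two regions and does not call any Azure API; the `RegionEndpoint` class, the `pick_region` function, and the region names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RegionEndpoint:
    name: str
    priority: int       # lower number = preferred region
    healthy_zones: int  # zones still serving traffic behind the regional load balancer


def pick_region(endpoints):
    """Return the preferred region that still has at least one healthy zone."""
    candidates = [e for e in endpoints if e.healthy_zones > 0]
    if not candidates:
        return None  # every region is down
    return min(candidates, key=lambda e: e.priority)


primary = RegionEndpoint("region-a", priority=1, healthy_zones=3)
secondary = RegionEndpoint("region-b", priority=2, healthy_zones=3)

# A single zone failure is absorbed inside the primary region.
primary.healthy_zones = 2
print(pick_region([primary, secondary]).name)

# A full regional outage shifts traffic to the secondary region.
primary.healthy_zones = 0
print(pick_region([primary, secondary]).name)
```

In this model, the zone-redundant load balancer handles the loss of a zone, and traffic only moves to the second region when the whole primary region is unavailable. Other routing approaches, such as sending users to the nearest region for performance, would need a different selection rule.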

Creating VMs across Availability Zones is a great way to protect your application. However, it is a manual process in which you must configure each VM. Load balancing workloads across VMs also provides performance benefits. However, it can again be cumbersome to create new VMs when you need to scale manually.

Azure provides an automated way of load balancing your apps across VMs using a feature called scale sets.

