In the previous two chapters, we examined how to create scalable databases and the different options for integrating them using data flows.
This chapter looks at how we can ensure our solutions are highly available and automatically respond to failures.
Many Azure components, especially Platform as a Service (PaaS) and serverless options such as Azure Functions, automatically implement high availability. In Chapter 11, Comparing Application Components, we examined how to architect applications to take full advantage of those features.
However, Infrastructure as a Service (IaaS) components such as virtual machines require more planning to withstand outages. Azure Storage and Azure databases offer options beyond the default configuration that extend high availability across regions. Again, we touched on this in Chapter 12, Creating Scalable and Secure Databases, when investigating the use of data replicas for scalability.
In this chapter, we will revisit the use of replicas for storage and databases, this time specifically with high availability in mind. As you will see, however, scalability and redundancy go hand in hand.
We will also implement high-availability options for virtual machines (VMs) by using VM scale sets.
This chapter will specifically cover the following concepts:
- Understanding virtual machine availability
- Understanding storage availability
- Understanding SQL database availability
- Understanding Cosmos DB availability
Technical requirements
This chapter will use the Azure portal (https://portal.azure.com) for examples.
Understanding virtual machine availability
A common misconception in Azure is that VMs are automatically highly available. This is true only to a point: if the underlying hardware fails, Azure moves the VM to healthy hardware, but this process temporarily interrupts access to that VM.
Additionally, during maintenance events, the Azure platform may need to forcefully reboot your VM. Although the reboot is performed gracefully, it still causes a brief outage for your workload.
Finally, in the unlikely event that an entire region suffers an outage, for example, due to a networking failure, your VMs will be inaccessible until that outage is resolved.
Another aspect of your VM availability is the type of disks you build it with. Standard magnetic hard disk drives (HDDs) have the lowest availability, whereas premium solid-state drives (SSDs) have the highest, due to how they are built and distributed.
These factors can have a significant impact on the Service-Level Agreement (SLA) of your service. For example, a single-instance VM with a standard HDD has an SLA of 95%, whereas a single-instance VM with a premium SSD has an SLA of 99.9%. To put this into context, an SLA of 95% means you could experience up to 36.53 hours of downtime a month without any compensation, whereas 99.9% reduces this to 43.83 minutes.
This does not mean you will suffer these outages; it means that if your service does fail for longer than these periods, you are eligible for service credits. It is a good measure of the overall potential availability for when (not if) things go wrong.
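The downtime figures for each SLA follow from simple arithmetic. The following minimal sketch reproduces them, assuming an average month of 730.5 hours (365.25 days × 24 hours ÷ 12 months); the function name is illustrative only.

```python
# Average month length in hours: 365.25 days * 24 hours / 12 months = 730.5.
HOURS_PER_MONTH = 365.25 * 24 / 12


def max_monthly_downtime_hours(sla_percent: float) -> float:
    """Return how many hours of downtime a monthly SLA tolerates before credits apply."""
    return HOURS_PER_MONTH * (1 - sla_percent / 100)


if __name__ == "__main__":
    for sla in (95.0, 99.9):
        hours = max_monthly_downtime_hours(sla)
        # Print the allowance both in hours and in minutes for comparison.
        print(f"{sla}% SLA -> {hours:.2f} hours ({hours * 60:.2f} minutes) per month")
```

Each extra "nine" shrinks the permitted downtime by a factor of ten, which is why moving from 95% to 99.9% takes the allowance from more than a day to well under an hour.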
For workloads that cannot tolerate any outage, or that must keep outages to a minimum, there are several options.
First, every option requires that you build at least two VMs running the same service. Typically, these VMs are placed behind a load balancer so that if one fails, the other takes over, as we can see in the following diagram:

Figure 14.1 – Using Azure Load Balancer to protect VMs
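To make the failover behavior concrete, the following is a simplified, in-process model of a load balancer, not Azure Load Balancer's actual implementation. It assumes, as Azure Load Balancer does by default, that each flow is mapped to a backend by hashing its five-tuple (source IP, source port, destination IP, destination port, protocol), and that backends failing their health probe are removed from rotation. The `Backend` class, `pick_backend` function, VM names, and addresses are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Backend:
    name: str
    healthy: bool = True  # in a real load balancer, set by health probes


def pick_backend(backends: List[Backend], flow: Tuple) -> Backend:
    """Map a flow to one healthy backend; unhealthy backends are skipped."""
    healthy = [b for b in backends if b.healthy]
    if not healthy:
        raise RuntimeError("no healthy backends available")
    # Hash the five-tuple so the same flow keeps landing on the same backend.
    digest = hashlib.sha256(repr(flow).encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(healthy)
    return healthy[index]


if __name__ == "__main__":
    pool = [Backend("vm-1"), Backend("vm-2")]
    flow = ("10.0.0.4", 50123, "10.0.1.10", 443, "TCP")
    print("before failure:", pick_backend(pool, flow).name)
    pool[0].healthy = False  # simulate vm-1 failing its health probe
    print("after vm-1 fails:", pick_backend(pool, flow).name)
```

Whichever VM the flow originally hashed to, once `vm-1` is marked unhealthy all new traffic is routed to `vm-2`, which is the behavior the diagram illustrates.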
Using a load balancer can help protect a VM; however, on its own, it is not enough.
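One way to see why redundancy alone falls short is the standard series/parallel reliability calculation. Two VMs only multiply their availability if they fail independently. If both depend on a single shared component, such as the same host, rack, or region from the failure scenarios described earlier, that component caps the result. The following sketch uses these textbook formulas; the 99.9% figure for the shared component is a hypothetical number for illustration, not an Azure SLA.

```python
def parallel_availability(instance_availability: float, instances: int) -> float:
    """Availability of N redundant instances, assuming their failures are independent."""
    return 1 - (1 - instance_availability) ** instances


def with_shared_dependency(redundant_availability: float, shared_availability: float) -> float:
    """Combine a redundant group with one shared component it depends on (in series)."""
    return redundant_availability * shared_availability


if __name__ == "__main__":
    vm = 0.999  # single-VM availability, matching the 99.9% example
    pair = parallel_availability(vm, 2)
    print(f"two independent VMs: {pair:.6%}")
    # Hypothetical: both VMs share one rack that is itself 99.9% available.
    shared = with_shared_dependency(pair, 0.999)
    print(f"same two VMs on one shared rack: {shared:.6%}")
```

With a shared dependency, the pair ends up less available than a single VM on its own, which is why the two VMs must also be separated from each other's failure domains.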