Sometime having the right command in place opens up new doors to test things, like a failover for example. In this post we will take a look at a new ability that has recently surface within the Azure eco-system to help manage fail-overs. Let’s jump to it.
High availability is a crucial component for data professionals, even when operating in a cloud environment such as Azure. Thankfully, Microsoft Azure Platform as a Service (PaaS) is architected in a way that offers high availability for service right out of the gate. This helps to ensure that your databases, such as Azure SQL Database and Azure SQL Managed Instances, are always available without having to lift a finger. What even better is Microsoft now offers the ability to manually control a failover over for these resources which gives data professionals more granular control.
Previously, the service would manage this aspect and Microsoft would initiate the failover if needed. But what if I wanted to test the failover to see how my applications would react? Would a failover impact my end users? There was not any way to test this even though the service offers a high level of availability. Thankfully that has changed and we can now control, to a degree, failovers for Azure SQL Platform as a Service resources, including Azure SQL Database, Elastic Pools, and SQL Managed Instances.
How can we manage a high availability failover in Azure SQL PaaS?
To facilitate the failovers, you must do this through some type of command line interface. This means either PowerShell, Azure CLI, or a REST API call. There is currently not a way to manage this through the portal. In the future we could possibly see such capability, but I do not know if or when that would come to fruition. For the purposes of this post, we will look at PowerShell.
There are three powershell cmdlets that will failover Azure SQL resources.
This cmdlet will failover an individual database. If the database is involved within an elastic pool, the failover will not affect the entire pool and will only affect the database itself. In testing, failing over a database involved with an elastic pool did not affect the databases membership in the pool. Furthermore, if the database is within an Availability Zone, the database will be failed over to a secondary zone and all client connections will be redirected to the new primary zone.
It is also worth noting that there is a “-ReadableSecondary” switch that would instead a failover the readable secondary. Since you could be using a readable secondary to off-load read workloads it would make sense to test how its failover would impact those workloads.
This cmdlet will failover an entire elastic pool which means all the databases within the pool will failover. This cmdlet will be handy if you are utilizing elastic pools to help minimize Azure costs but still want to test a failover.
Like it’s the two predecessors, this cmdlet will failover a SQL Managed Instance. It also has a readable secondary switch that you can utilize to failover the readable secondary.
Are there any limitations?
With great power comes great responsibility and such is the case here. Given the intrusive nature of the failover within the Azure eco-system, it stands to make sense that you can only failover the resources every so often. Currently, at the time of this post, the documentation states you can only failover every 30 minutes. However, during testing things, I got a different error message that states it’s every 15 minutes.
click to see full image - azure-sql-offers-manual-failover-for-paas-resources
I have given feedback to Microsoft regarding this discrepancy and they were able to get it resolved and the documentation will be updated to reflect a 15 minute duration between failover events.
What else would this help fix?
Even with the highly durable infrastructure that Microsoft has built, there are occasions where hardware issues arise where the service might not failover. While failing over to a DR solution (such as active geo-replication or automatic failover groups) would help to resolve it, if things are configured correctly that is more intrusive to the application. By having the ability to failover, customers can now initiate a failover when hardware issues surface without having to implement their disaster recovery solutions.
Microsoft continues to enhance and improve the Azure SQL eco-system. By having the ability to control and test failovers for Azure SQL resources just further provides a deeper level of control for data professionals. If you are utilizing Microsoft Azure or even planning on moving Azure, I highly recommend you get familiar with how this feature works so that you can verify with certainty how your applications will handle a database high availability failover.