Chapter 7. High Availability, Protection, and Recovery using Microsoft Azure

Microsoft Azure can be used to protect your on-premises assets such as virtual machines, applications, and data. In this chapter, you will learn how to use Microsoft Azure to store backup data, to replicate data, and even to orchestrate the failover and failback of a complete data center.

You will also learn how applications and the data running on Azure are protected by the Azure infrastructure and what we can do to get maximum uptime for our applications.

We will focus on the following topics:

· Microsoft Failover Clustering

· Microsoft Azure Backup Vault for storage of backup data

· Azure Site Recovery for failover and failback of virtual machines

· SQL Server Replication to protect SQL databases

· Microsoft StorSimple

· Azure snapshots to protect Azure virtual machines

High availability in Microsoft Azure

One of the most important limitations of Microsoft Azure is the lack of an SLA for single-instance virtual machines. If a virtual machine is not part of an availability set, that instance is not covered by any kind of SLA. The reason for this is that when Microsoft needs to perform maintenance on Azure hosts, in many cases, a reboot is required. A reboot means the virtual machines on that host will be unavailable for a while. So, in order to accomplish High Availability for your application, you should have at least two instances of the application running at any point in time. As mentioned in previous chapters, Microsoft is working on some sort of hot patching, which enables virtual machines to remain active on hosts being patched. Details were not available at the time of writing.

High Availability is a crucial feature that must be an integral part of an architectural design, rather than something that can be "bolted on" to an application afterwards. Designing for High Availability involves leveraging both the development platform and the available infrastructure in order to ensure an application's responsiveness and overall reliability. The Microsoft Azure cloud platform offers software developers PaaS extensibility features and offers network administrators IaaS computing resources that enable availability to be built into an application's design from the beginning. The good news is that organizations with mission-critical applications can now leverage core features within the Microsoft Azure platform in order to deploy highly available, scalable, and fault-tolerant cloud services that have been shown to be more cost-effective than traditional approaches that leverage on-premises systems.

Microsoft Failover Clustering support

Windows Server Failover Clustering (WSFC) is not supported on Azure. However, Microsoft does support SQL Server AlwaysOn Availability Groups. For AlwaysOn Availability Groups, there is currently no support for availability group listeners in Azure. Also, you must work around a DHCP limitation in Azure when creating WSFC clusters in Azure. After you create a WSFC cluster using two Azure virtual machines, the cluster name cannot start because it cannot acquire a unique virtual IP address from the DHCP service. Instead, the IP address assigned to the cluster name is a duplicate address of one of the nodes. This has a cascading effect that ultimately causes the cluster quorum to fail, because the nodes cannot properly connect to one another.

So if your application uses Failover Clustering, it is likely that you will not move it over to Azure. It might run, but Microsoft will not assist you when you encounter issues.

Load balancing

Besides clustering, we can also create highly available nodes using load balancing. Load balancing is useful for stateless servers. These are servers that are identical to each other and do not have a unique configuration or data.

When two or more virtual machines deliver the same application logic, you will need a mechanism that is able to redirect network traffic to those virtual machines. The Windows Network Load Balancing (NLB) feature in Windows Server is not supported on Microsoft Azure; instead, the Azure load balancer provides this function. It analyzes incoming network traffic, determines the type of traffic, and reroutes it to a service.

The Azure load balancer is provided as a cloud service. In fact, this cloud service runs on virtual appliances managed by Microsoft. These are completely software-defined. The moment an administrator adds an endpoint, a set of load balancers is instructed to pass incoming network traffic on a certain port to a port on a virtual machine. If a load balancer fails, another one will take over.

Azure load balancing is performed at layer 4 of the OSI model. This means the load balancer is not aware of the application content of the network packets. It just distributes packets based on network ports.

To load balance over multiple virtual machines, you can create a load-balanced set by performing the following steps (a PowerShell equivalent is sketched after the list):

1. In Azure Management Portal, select the virtual machine whose service should be load balanced.

2. Select Endpoints in the upper menu.

3. Click on Add.

4. Select Add a stand-alone endpoint and click on the right arrow.

5. Select a name or a protocol and set the public and private port.

6. Enable create a load-balanced set and click on the right arrow.

7. Next, fill in a name for the load-balanced set.

8. Fill in the probe port, the probe interval, and the number of probes. This information is used by the load balancer to check whether the service is available. It will connect to the probe port at the specified interval. If the specified number of consecutive probes all fail to connect, the load balancer will no longer distribute traffic to this virtual machine.

9. Click on the check mark.
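
As referenced above, the same load-balanced set can also be created with the classic Azure PowerShell module instead of the portal. The following is a minimal sketch; the cloud service, virtual machine names, ports, and probe values are hypothetical examples:

# Minimal sketch; service, VM, and endpoint names are hypothetical examples.
# Assumes the classic (Service Management) Azure PowerShell module and a
# subscription imported with Import-AzurePublishSettingsFile.
foreach ($vmName in "web1", "web2") {
    Get-AzureVM -ServiceName "myservice" -Name $vmName |
        Add-AzureEndpoint -Name "http" -Protocol tcp -LocalPort 80 -PublicPort 80 `
            -LBSetName "web-lbset" -ProbeProtocol tcp -ProbePort 80 `
            -ProbeIntervalInSeconds 15 |
        Update-AzureVM
}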

The load balancing mechanism is hash-based. Microsoft Azure Load Balancer uses a five tuple (source IP, source port, destination IP, destination port, and protocol type) to calculate the hash that is used to map traffic to the available servers.

A second load balancing mode was introduced in October 2014. It is called Source IP Affinity (also known as session affinity or client IP affinity). When Source IP Affinity is used, connections initiated from the same client computer go to the same DIP endpoint.
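
To make the idea concrete, the following fragment is a simplified illustration of hash-based distribution; it is not Azure's actual implementation, and the addresses are made up:

# Simplified illustration of hash-based distribution; not Azure's implementation.
$servers = "10.0.0.4", "10.0.0.5", "10.0.0.6"   # DIPs in the load-balanced set
function Select-Server([string]$key) {
    $md5   = [System.Security.Cryptography.MD5]::Create()
    $bytes = $md5.ComputeHash([System.Text.Encoding]::UTF8.GetBytes($key))
    $hash  = [System.BitConverter]::ToUInt32($bytes, 0)
    return $servers[$hash % $servers.Count]
}
# Default mode hashes the full five tuple, so a new source port may land on another server
Select-Server "198.51.100.7:52131->23.96.1.1:80/TCP"
# Source IP Affinity hashes only the client and destination IP,
# so the same client keeps reaching the same server
Select-Server "198.51.100.7->23.96.1.1"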

These load balancers provide high availability inside a single data center. If a virtual machine that is part of a load-balanced set fails, the load balancer will notice this and remove that virtual machine's IP address from its table.

However, load balancers will not protect against the failure of a complete data center. The domain names that are used to direct clients to an application route to a particular virtual IP address that is bound to an Azure data center.

To keep an application accessible even if an Azure region has failed, you can use Azure Traffic Manager. This service can be used for several purposes:

· To failover to a different Azure region if a disaster occurs

· To provide the best user experience by directing network traffic to the Azure region closest to the location of the user

· To reroute traffic to another Azure region whenever there's any planned maintenance

The main task of Traffic Manager is to map a DNS query to an IP address that is the access point of a service.

This job can be compared, for example, with the job of someone working at the X-ray machines at an airport. I'm guessing that you have all seen those multiple rows of X-ray machines. The length of the queue at each X-ray machine differs at any moment. An officer standing at the entry of the area distributes people over the available X-ray machines so that all queues remain roughly equal in length.

Traffic Manager provides you with a choice of load-balancing methods, including performance, failover, and round-robin. Performance load balancing measures the latency between the client and the cloud service endpoint. Traffic Manager is not aware of the actual load on virtual machines servicing applications.
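
As a sketch, a Traffic Manager profile using the performance method could be created with the classic Azure PowerShell module as follows; the profile, domain, and endpoint names are hypothetical:

# Sketch; profile, domain, and endpoint names are hypothetical examples.
$tm = New-AzureTrafficManagerProfile -Name "myapp-tm" `
    -DomainName "myapp.trafficmanager.net" -LoadBalancingMethod Performance `
    -Ttl 300 -MonitorProtocol Http -MonitorPort 80 -MonitorRelativePath "/"
# Register the cloud services that Traffic Manager may return in DNS answers
$tm = Add-AzureTrafficManagerEndpoint -TrafficManagerProfile $tm `
    -DomainName "myapp-westeurope.cloudapp.net" -Type CloudService -Status Enabled
$tm = Add-AzureTrafficManagerEndpoint -TrafficManagerProfile $tm `
    -DomainName "myapp-eastus.cloudapp.net" -Type CloudService -Status Enabled
# Commit the changes to the profile
Set-AzureTrafficManagerProfile -TrafficManagerProfile $tm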

As Traffic Manager resolves endpoints of Azure cloud services only, it cannot be used for load balancing between an Azure region and a non-Azure region (for example, Amazon EC2) or between on-premises and Azure services.

It will perform health checks on a regular basis. This is done by querying the endpoints of the services. If the endpoint does not respond, Traffic Manager will stop distributing network traffic to that endpoint for as long as the state of the endpoint is unavailable.

Traffic Manager is available in all Azure regions. Microsoft charges for using this service based on the number of DNS queries that are received by Traffic Manager. As the service is attached to an Azure subscription, you will be required to contact Azure support to transfer Traffic Manager to a different subscription.

The following table shows the difference between Azure's built-in load balancer and Traffic Manager:

                         Load balancer                      Traffic Manager
Distribution targets     Must reside in the same region     Can be across regions
Load balancing           5 tuple, Source IP Affinity        Performance, failover, and round-robin
Level                    OSI layer 4 (TCP/UDP ports)        DNS queries (OSI layer 7)

Third-party load balancers

In certain configurations, the default Azure load balancer might not be sufficient. There are several vendors supporting or starting to support Azure. One of them is Kemp Technologies.

Kemp Technologies offers a free load balancer for Microsoft Azure. The Virtual LoadMaster (VLM) provides layer 7 application delivery. The virtual appliance has some limitations compared to the commercially available unit. The maximum bandwidth is limited to 100 Mbps and High Availability is not offered. This means the Kemp LoadMaster for Azure free edition is a single point of failure. Also, the number of SSL transactions per second is limited.

One of the use cases in which a third-party load balancer is required is when we use Microsoft Remote Desktop Gateway. As you might know, Citrix has been supporting the use of Citrix XenApp and Citrix XenDesktop running on Azure since 2013. This means service providers can offer cloud-based desktops and applications using these Citrix solutions.

To make this a working configuration, session affinity is required. Session affinity makes sure that network traffic is always routed over the same server.

Windows Server 2012 Remote Desktop Gateway uses two HTTP channels, one for input and one for output, which must be routed over the same Remote Desktop Gateway server. The Azure load balancer is only able to do round-robin-style load balancing, which does not guarantee that both channels use the same server.

However, hardware and software load balancers that support IP affinity, cookie-based affinity, or SSL ID-based affinity (and thus ensure that both HTTP connections are routed to the same server) can be used with Remote Desktop Gateway.

Another use case is load balancing of Active Directory Federation Services (ADFS). Microsoft Azure can be used as a backup for on-premises Active Directory (AD). Suppose your organization is using Office 365. To provide single sign-on, a federation has been set up between Office 365 directory and your on-premises AD. If your on-premises ADFS fails, external users would not be able to authenticate. By using Microsoft Azure for ADFS, you can provide high availability for authentication.

Kemp LoadMaster for Azure can be used to load balance network traffic to ADFS and is able to do proper load balancing. To install Kemp LoadMaster, perform the following steps:

1. Download the Publish Profile settings file from https://windows.azure.com/download/publishprofile.aspx.

2. Use PowerShell for Azure with the Import-AzurePublishSettingsFile command.

3. Upload the KEMP supplied VHD file to your Microsoft Azure storage account.

4. Publish the VHD as an image.

5. The VHD will be available as an image. The image can be used to create virtual machines.

The complete steps are described in the documentation provided by Kemp.
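
For reference, the first four steps roughly translate to the following classic Azure PowerShell commands; the file paths, storage account, and image name are hypothetical, and the Kemp documentation remains the authoritative source:

# Sketch; paths, storage account, and image name are hypothetical examples.
# Steps 1 and 2: import the downloaded publish settings file
Import-AzurePublishSettingsFile -PublishSettingsFile "C:\azure\mysubscription.publishsettings"
# Step 3: upload the Kemp-supplied VHD to a storage account
Add-AzureVhd -LocalFilePath "C:\kemp\LoadMaster.vhd" `
    -Destination "https://mystorageaccount.blob.core.windows.net/vhds/LoadMaster.vhd"
# Step 4: publish the uploaded VHD as an image that can be used to create VMs
# (the LoadMaster appliance is Linux-based; follow Kemp's guidance for the OS type)
Add-AzureVMImage -ImageName "KempVLM" -OS Linux `
    -MediaLocation "https://mystorageaccount.blob.core.windows.net/vhds/LoadMaster.vhd"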

Geo-replication of data

Microsoft Azure has geo-replication of Azure Storage enabled by default. This means all of your data is not only stored at three different locations in the primary region, but also replicated and stored at three different locations in the paired region.

However, this data cannot be accessed by the customer. Microsoft has to declare a data center or storage stamp as lost before it will fail over to the secondary location.

In the rare circumstance where a failed storage stamp cannot be recovered, you will experience many hours of downtime. So, you have to make sure you have your own disaster recovery procedures in place.

Zone Redundant Storage

Microsoft offers a third option you can use to store data. Zone Redundant Storage (ZRS) allows data to be replicated to a secondary facility located in the same region or in a paired region. Instead of storing six copies of data like geo-replicated storage does, only three copies of data are stored. So, ZRS is a mix of locally redundant storage and geo-replicated storage. The cost of ZRS is about 66 percent of the cost of GRS.
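
The redundancy option is a property of the storage account. As a sketch, with the classic Azure PowerShell module the three options could be selected as follows; the account names and the location are hypothetical:

# Sketch; account names and the location are hypothetical examples.
# Locally redundant: three copies in the primary facility
New-AzureStorageAccount -StorageAccountName "myaccountlrs" -Location "West Europe" -Type Standard_LRS
# Zone redundant: three copies spread over facilities
New-AzureStorageAccount -StorageAccountName "myaccountzrs" -Location "West Europe" -Type Standard_ZRS
# Geo-redundant: six copies, three of them in the paired region
New-AzureStorageAccount -StorageAccountName "myaccountgrs" -Location "West Europe" -Type Standard_GRS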

Snapshots of the Microsoft Azure disk

Server virtualization solutions such as Hyper-V and VMware vSphere offer the ability to save the state of a running virtual machine. This can be useful when you're making changes to the virtual machine but want to have the ability to reverse those changes if something goes wrong.

This feature is called a snapshot. Basically, a virtual disk is saved by marking it as read only. All writes to the disk after a snapshot has been initiated are stored on a temporary virtual disk. When a snapshot is deleted, those changes are committed from the delta disk to the initial disk.

While the Microsoft Azure Management Portal does not have a feature to create snapshots, there is an ability to make point-in-time copies of virtual disks attached to virtual machines.

Microsoft Azure Storage has a versioning capability. Under the hood, this works differently from snapshots in Hyper-V: it creates a snapshot blob of the base blob. Snapshots are by no means a replacement for a backup, but it is nice to know that you can save the state and quickly revert if required.
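
A hedged sketch of creating such a point-in-time copy of a VHD blob with the Azure PowerShell storage cmdlets follows; the storage account, key, container, and blob names are hypothetical:

# Sketch; storage account, key, container, and blob names are hypothetical.
$ctx = New-AzureStorageContext -StorageAccountName "mystorageaccount" `
    -StorageAccountKey "<storage account key>"
# Get the page blob that backs the virtual machine disk
$blob = Get-AzureStorageBlob -Container "vhds" -Blob "myvm-disk0.vhd" -Context $ctx
# Create a snapshot blob of the base blob; ideally stop the VM first for consistency
$snapshot = $blob.ICloudBlob.CreateSnapshot()
$snapshot.SnapshotTime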

Introduction to geo-replication

By default, Microsoft replicates all data stored on Microsoft Azure Storage to the secondary location located in the paired region. Customers are able to enable or disable the replication. When enabled, customers are charged.

When Geo Redundant Storage has been enabled on a storage account, all data is asynchronously replicated. At the secondary location, data is stored on three different storage nodes. So even when two nodes fail, the data is still accessible.

However, before the read access Geo-Redundant feature was available, customers had no way to actually access replicated data. The replicated data could only be used by Microsoft when the primary storage could not be recovered again.

Microsoft will try everything to restore data in the primary location and avoid a so-called geo-failover process. A geo-failover process means that a storage account's secondary location (the replicated data) will be configured as the new primary location. The problem is that a geo-failover process cannot be done per storage account, but needs to be done at the storage stamp level. As you learned in Chapter 3, Understanding the Microsoft Azure Architecture, a storage stamp has multiple racks of storage nodes. You can imagine how much data and how many customers are involved when a storage stamp needs to failover. Failover will have an effect on the availability of applications. Also, because of the asynchronous replication, some data will be lost when a failover is performed.

Microsoft is working on an API that allows customers to fail over a storage account themselves. When geo-redundant replication is enabled, you will only benefit from it when Microsoft has a major issue. Geo-redundant storage is neither a replacement for a backup nor a disaster recovery solution.

Microsoft states that the Recovery Point Objective (RPO) for Geo Redundant Storage is about 15 minutes. That means that if a failover is required, customers can lose about 15 minutes of data. Microsoft does not provide an SLA on how long geo-replication will take.

Microsoft does not give an indication for the Recovery Time Objective (RTO). The RTO indicates the time required by Microsoft to make data available again after a major failure that requires a failover. Microsoft once had to deal with a failure of storage stamps. They did not do a failover but it took many hours to restore the storage service to a normal level.

In 2013, Microsoft introduced a new feature called Read Access Geo Redundant Storage (RA-GRS). This feature allows customers to perform reads on the replicated data. This increases the read availability from 99.9 percent when GRS is used to above 99.99 percent when RA-GRS is enabled.

Microsoft charges more when RA-GRS is enabled. RA-GRS is an interesting addition for applications that are primarily meant for read-only purposes. When the primary location is not available and Microsoft has not done a failover, writes are not possible.

The availability of the Azure Virtual Machine service is not increased by enabling RA-GRS. While the VHD data is replicated and can be read, the virtual machine itself is not replicated. Perhaps this will be a feature for the future.

Disaster recovery using Azure Site Recovery

Disaster recovery has always been one of the top priorities for organizations. IT has become a very important, if not mission-critical, factor for doing business. A failure of IT could result in loss of money, customers, orders, and brand value.

There are many situations that can disrupt IT, such as:

· Hurricanes

· Floods

· Earthquakes

· Disasters such as a failure of a nuclear power plant

· Fire

· Human error

· Outbreak of a virus

· Hardware or software failure

While these threats are clear and the risk of being hit by such a threat can be calculated, many organizations do not have proper protection against them.

In three different situations, disaster recovery solutions can help an organization to continue doing business:

· Avoiding a possible failure of IT infrastructure by moving servers to a different location.

· Avoiding a disaster situation, such as hurricanes or floods, since such situations are generally well known in advance due to weather forecasting capabilities.

· Recovering as quickly as possible when a disaster has hit the data center. Disaster recovery is needed when a disaster unexpectedly hits the data center, such as a fire, hardware failure, or human error.

Some reasons for not having a proper disaster recovery plan are complexity, lack of time, and ignorance; however, in most cases, a lack of budget and the belief that disaster recovery is expensive are the main reasons. Almost all organizations that have been hit by a major disaster causing unacceptable periods of downtime started to implement a disaster recovery plan, including technology, immediately after they recovered. However, in many cases, this insight came too late. According to Gartner, 43 percent of companies experiencing disasters never reopen and 29 percent close within 2 years.

Server virtualization has made disaster recovery a lot easier and cost effective. Verifying that your DR procedure actually works as designed and matches RTO and RPO is much easier using virtual machines.

Since Windows Server 2012, Hyper-V has a feature for asynchronous replication of virtual machine virtual disks to another location. This feature, Hyper-V Replica, is very easy to enable and configure and does not cost extra. Hyper-V Replica is storage agnostic, which means the storage type at the primary site can be different from the storage type used at the secondary site. So, Hyper-V Replica works perfectly when your virtual machines are hosted on, for example, EMC storage while an HP solution is used at the secondary site.
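
Enabling Hyper-V Replica for a single virtual machine takes only a few PowerShell commands on the primary host. In this sketch, the host and virtual machine names are hypothetical and the Replica server is assumed to be already configured to accept replication:

# Sketch; host and VM names are hypothetical, and the Replica server must
# already be configured as a replication target.
Enable-VMReplication -VMName "sql01" `
    -ReplicaServerName "replicahost.contoso.local" -ReplicaServerPort 80 `
    -AuthenticationType Kerberos -CompressionEnabled $true
# Start the initial copy of the virtual disks over the network
Start-VMInitialReplication -VMName "sql01"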

While replication is a must for DR, another very useful feature in DR is automation. As an administrator, you really appreciate the option to click on a button after deciding to perform a failover and sit back and relax. Recovery is mostly a stressful job when your primary location is flooded or burned and lots of things can go wrong if recovery is done manually.

This is why Microsoft designed Azure Site Recovery. Azure Site Recovery is able to assist in disaster recovery in several scenarios:

· A customer has two data centers both running Hyper-V managed by System Center Virtual Machine Manager. Hyper-V Replica is used to replicate data at the virtual machine level.

· A customer has two data centers both running Hyper-V managed by System Center Virtual Machine Manager. NetApp storage is used to replicate between two sites at the storage level.

· A customer has a single data center running Hyper-V managed by System Center Virtual Machine Manager.

· A customer has two data centers both running VMware vSphere. In this case, InMage Scout software is used to replicate between the two data centers. Azure is not used for orchestration.

· A customer has a single data center that is not managed by System Center Virtual Machine Manager.

In the scenarios with a single data center, Microsoft Azure is used as a secondary data center if a disaster makes the primary data center unavailable.

Microsoft also announced support for a scenario where vSphere is used on-premises and Azure Site Recovery is used to replicate data to Azure. To enable this, InMage software will be used. Details were not available at the time this book was written.

In the first two described scenarios, Site Recovery is used to orchestrate the failover and failback to the secondary location. The management is done using the Azure Management Portal, which is available in any browser supporting HTML5. So a failover can be initiated even from a tablet or smartphone.

Using Azure as a secondary data center for disaster recovery

Azure Site Recovery went into preview in June 2014. For organizations using Hyper-V, there is no direct need to have a secondary data center as Azure can be used as a target for Hyper-V Replica.

Some of the characteristics of the service are as follows:

· Allows nondisruptive disaster recovery failover testing

· Automated reconfiguration of the network configuration of guests

· Storage agnostic: supports any type of on-premises storage supported by Hyper-V

· Support for VSS to enable application consistency

· Protects more than 1,000 virtual machines (Microsoft tested with 2,000 virtual machines and this went well)

To be able to use Site Recovery in this scenario, customers do have to use System Center Virtual Machine Manager; Site Recovery cannot be used without it. Site Recovery uses information, such as the virtual networks defined in SCVMM, to map networks available in Microsoft Azure.

Site Recovery does not support sending a copy of the virtual hard disks on removable media to an Azure data center to avoid performing the initial replication over the WAN (seeding). Customers will need to transfer all the replication data over the network. ExpressRoute will help to get a much better throughput compared to a site-to-site VPN over the Internet.

Failover to Azure can be as simple as clicking on a single button. Site Recovery will then create new virtual machines in Azure and start the virtual machines in the order defined in the recovery plan. A recovery plan is a workflow that defines the startup sequence of virtual machines. It is possible to stop the recovery plan to allow a manual check, for example. If all is okay, the recovery plan will continue doing its job. Multiple recovery plans can be created.

Microsoft Volume Shadow Copy Services (VSS) is supported. This allows application consistency. Replication of data can be configured at intervals of 30 seconds, 5 minutes, or 15 minutes. Replication is performed asynchronously.

For recovery, 24 recovery points are available. These are like snapshots or point-in-time copies. If the most recent replica cannot be used (for example, because of damaged data), another replica can be used for restore. You can configure extended replication. In extended replication, your Replica server forwards changes that occur on the primary virtual machines to a third server (the extended Replica server). After a planned or unplanned failover from the primary server to the Replica server, the extended Replica server provides further business continuity protection. As with ordinary replication, you configure extended replication by using Hyper-V Manager, Windows PowerShell (using the –Extended option), or WMI.

At the moment, only the VHD virtual disk format is supported. Generation 2 virtual machines that can be created on Hyper-V are not supported by Site Recovery. Generation 2 virtual machines have a simplified virtual hardware model and support Unified Extensible Firmware Interface (UEFI) firmware instead of BIOS-based firmware. Also, booting from PXE, a SCSI hard disk, or a SCSI DVD, as well as Secure Boot, is supported in Generation 2 virtual machines.

However, on March 19, Microsoft responded to numerous customer requests for Site Recovery support of Generation 2 virtual machines. Site Recovery will soon support Gen 2 VMs. On failover, the VM will be converted to a Gen 1 VM. On failback, the VM will be converted back to Gen 2. This conversion is done until the Azure platform natively supports Gen 2 VMs.

Customers using Site Recovery are charged only for consumption of storage as long as they do not perform a failover or failover test.

Failback is also supported. After running for a while in Microsoft Azure, customers are likely to move their virtual machines back to the on-premises primary data center. Site Recovery will replicate back only the changed data.

Mind that customer data is not stored in Microsoft Azure when Hyper-V Recovery Manager is used. Azure is used to coordinate the failover and recovery. To be able to do this, it stores information on network mappings, runbooks, and names of virtual machines and virtual networks. All data sent to Azure is encrypted.

By using Azure Site Recovery, we can perform service orchestration in terms of replication, planned failover, unplanned failover, and test failover. The entire engine is powered by Azure Site Recovery Manager.

Let's have a closer look on the main features of Azure Site Recovery. It enables three main scenarios:

· Test Failover or DR Drills: Enable support for application testing by creating test virtual machines and networks as specified by the user. Without impacting production workloads or their protection, HRM can quickly enable periodic workload testing.

· Planned Failovers (PFO): For compliance or in the event of a planned outage, customers can use planned failovers: virtual machines are shut down, final changes are replicated to ensure zero data loss, and then virtual machines are brought up in order on the recovery site as specified by the recovery plan (RP). More importantly, failback is a single-click gesture that executes a planned failover in the reverse direction.

· Unplanned Failovers (UFO): In the event of an unplanned outage or a natural disaster, HRM opportunistically attempts to shut down the primary machines if some of the virtual machines are still running when the disaster strikes. It then automates their recovery on the secondary site as specified by the RP.

If your secondary site uses a different IP subnet, Site Recovery is able to change the IP configuration of your virtual machines during the failover.

Part of the Site Recovery installation is the installation of a VMM provider. This component communicates with Microsoft Azure. Site Recovery can be used even if you have a single VMM to manage both primary and secondary sites.

Site Recovery does not rely on availability of any component in the primary site when performing a failover. So it doesn't matter if the complete site including link to Azure has been destroyed, as Site Recovery will be able to perform the coordinated failover.

Azure Site Recovery to customer-owned sites is billed per protected virtual machine per month. The costs are approximately €12 per month. Microsoft bills for the average number of protected virtual machines per month. So, if you are protecting 20 virtual machines in the first half of the month and 0 in the second half, you will be charged for 10 virtual machines for that month.

When Azure is used as a target, Microsoft will only charge for consumption of storage during replication. The costs for this scenario are €40.22/month per instance protected.

As soon as you perform a test failover or actual failover Microsoft will charge for the virtual machine CPU and memory consumption.

Requirements

To be able to use Azure Site Recovery, the following items are required:

· System Center 2012 SP1 with the latest cumulative update, or System Center 2012 R2. Note that for small businesses, Microsoft also offers Site Recovery without the need for System Center.

· A certificate.

· A Microsoft Azure subscription with the Site Recovery feature enabled.

· At least one on-premises data center with at least one instance of SCVMM 2012.

Configuring Azure Site Recovery

Enabling Azure Site Recovery involves the following steps:

1. Enable the Site Recovery vault.

2. Create and upload a certificate.

3. Download and install the Recovery Manager provider.

4. Choose which clouds should be protected.

5. Map networks.

6. Enable the virtual machines that should be protected.

A step-by-step instruction guide on how to configure Azure Site Recovery will take too many pages. You will find an excellent how-to at http://msdn.microsoft.com/en-us/library/dn788903.aspx.

In the previous section, we discussed how to recover to Azure. One of the requirements for this scenario is Virtual Machine Manager. Not every organization needs Virtual Machine Manager. Many organizations use Hyper-V managed by native Microsoft tools or by a third-party tool such as 5Nine Manager for Hyper-V.

In December 2014, Microsoft announced disaster recovery for branch offices and SMBs through Azure Site Recovery. This allows organizations that do not use Virtual Machine Manager to use Azure as a replication target.

The procedure to protect virtual machines running on Hyper-V is described in this Microsoft blogpost: http://azure.microsoft.com/en-us/documentation/articles/hyper-v-recovery-manager-hypervsite/.

Installing a replica Active Directory controller in Azure

Microsoft Azure can be used as a secondary location to keep a copy of Active Directory. If a disaster hits the primary site that makes Active Directory partly or totally unavailable, you have at least Active Directory still operational in a secondary location.

Think about how much time this saves compared to having to fully restore one of your most crucial assets. You can have an Active Directory server in Microsoft Azure as a replica. In this section, I will describe the steps you need to take to create such a replica.

The requirements include a VPN connection between your on-premises infrastructure and the Microsoft Azure network. For security reasons, it is not advisable to run Active Directory replication over the public interface of Microsoft Azure using endpoints.

This webpage provides step-by-step instructions on how to configure AD replication to Azure: http://blogs.technet.com/b/keithmayer/archive/2013/01/20/step-by-step-extending-on-premises-active-directory-to-the-cloud-with-windows-azure-31-days-of-servers-in-the-cloud-part-20-of-31.aspx.
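
Once the VPN is in place and a virtual machine is running in Azure, promoting it to an additional domain controller is a standard ADDSDeployment task. A minimal sketch follows; the domain name, site name, and paths are hypothetical, and placing the AD database on a data disk follows common guidance for Azure:

# Sketch; run inside the Azure virtual machine once it can reach the on-premises
# domain controllers over the VPN. Domain, site, and paths are hypothetical.
Install-WindowsFeature AD-Domain-Services -IncludeManagementTools
Install-ADDSDomainController -DomainName "contoso.local" `
    -SiteName "Azure-West-Europe" -InstallDns `
    -Credential (Get-Credential CONTOSO\Administrator) `
    -DatabasePath "F:\NTDS" -LogPath "F:\NTDS" -SysvolPath "F:\SYSVOL"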

Using Microsoft Azure as a backup target

Storage in Microsoft Azure has many advantages. Capacity is almost unlimited, so there is no need for the so-called forklift upgrades that are common with on-site storage. Provisioning of new storage capacity is done with a few mouse clicks. You won't be able to do that when extending on-premises storage, as this involves placing cabinets in racks, supplying power, networking, and I/O.

Microsoft Azure provides two options to use storage as a backup target:

· Using regular Azure Storage

· Using Azure Backup

Azure Storage is the regular blob storage that is used by many applications, virtual machine hard disks, and so on.

Azure Backup is especially targeted for storage of backup data. Microsoft charges the two quite differently. Azure Backup is almost eight times more expensive than Azure Storage. For Azure Backup, Microsoft does not charge for bandwidth, transactions, or computation. Data stored in Azure Backup is compressed.

Azure Storage can be used by many backup applications as a backup target. One example is Veeam Backup & Replication Cloud Edition.

Veeam Backup & Replication Cloud Edition features a virtual disk option. When enabled, administrators are able to mount any Azure Storage account as a drive letter in Windows Explorer.

Microsoft has a service that uses these capabilities of cloud storage. Azure Backup is a simple way to protect and recover files. The service can be accessed using an agent that supports Windows Server and Data Protection Manager.

Note that this is not yet an enterprise-ready solution. It lacks features such as one-step bare metal recovery. Still, Azure Backup is very useful in branch offices, small and medium businesses, and other environments of a similar size.

When using the Microsoft-supplied Windows Server Backup or DPM agent, all data is encrypted using a passphrase you select. The bandwidth usage can be throttled depending on the time of day.

Some characteristics of the agent are:

· It supports bandwidth throttling

· It supports file servers, SQL Server, SharePoint, and Exchange

· It supports incremental backups that reduce bandwidth consumption

· It supports recovery to a server other than the original server

· It supports data encryption performed on-premises

· It supports recovery of individual files

Microsoft Azure Backup has a limitation of a maximum of 1.65 TB of data per volume (after installation of update KB2989574) that can be backed up in one backup operation. The standalone server and DPM solutions have different retention maximums. In the standalone server scenario, backups can be retained in the vault for up to 30 days. This is configurable in the Windows Azure Backup agent's scheduling wizard.

Azure Backup lets you set multiple retention policies on backup data. Backup data can be stored for multiple years by maintaining more backup copies near term and fewer backup copies as the backup data ages. The maximum number of backup copies that can be stored in Azure is 366.

Azure Backup integrates with the Azure Import service to send the initial backup data to an Azure data center. This capability enables customers to ship the initial backup data on disk to the nearest Azure data center.

Many backup applications such as Veeam Backup & Replication, CA ARCserve, and Microsoft Data Protection Manager provide the ability to use Microsoft Azure as a storage target. Also, backup data directly from Windows Server can be stored in Microsoft Azure.

While capacity is unlimited, there are some caveats:

· In a scenario in which you want to back up local servers to Microsoft Azure, you should keep in mind the time required to restore that data. It can take a considerable amount of time to restore the data, especially over slow WAN connections.

· While the costs of cloud-based storage are dropping, in many scenarios it is more expensive than on-premises storage. Cloud-based storage is charged per month. Customers are not only charged for consumed storage capacity, but also for I/O transactions and data that leaves the data center. This, however, is a relatively small percentage of the total costs.

· The costs per GB of Azure Backup are relatively high in comparison to the Azure Storage offering. At the time of writing this, costs are seven times higher than regular Azure Storage.

To use the Azure backup vault, it is not required to have a site-to-site VPN connection.

In this section, you are going to learn how to use Microsoft Azure for a backup target. We will do that in the following steps:

1. Enable the Backup Vault.

2. Create a certificate.

3. Download and install the backup agent.

4. Create a backup schedule.

Step 1 – enabling Azure Vault

The first step is to enable the Azure backup vault. The vault is a storage location dedicated to the storage of backup data. Data stored in the vault cannot be accessed using the storage accounts connected to your Azure subscription. The steps to enable a backup vault are as follows:

1. In Azure Management Portal, select the New button in the left corner.

2. Navigate to Data Services | Recovery Services | Backup Vault | Quick create.

3. Fill in a name for the vault and select the region in which you would like to store the data in that vault.

Note

It is wise to select a different region for the backup vault than the region of the servers you want to back up. This is true for the backup of on-premises servers as well as for Azure virtual machines. Suppose the east coast of the US is hit by a hurricane and your data center, power, or network connections are affected. There is a chance the Azure data center is affected as well if it is located in the same region.

4. Click on Create Vault.

Note

It can take a while for the backup vault to be created. To check the status, you can monitor the notifications at the bottom of the portal. After the backup vault has been created, a message will tell you that the vault has been successfully created and it will be listed in the resources for Recovery Services as Online.

The next step is to create certificates. For each server whose backups you would like to store in the Azure vault, you need to create a certificate.

Step 2 – creating a certificate

To connect to the vault, we need a certificate. We have two options here:

· Use any valid Secure Sockets Layer (SSL) certificate that is issued by a certification authority (CA) that is trusted by Microsoft (and whose root certificates are distributed through the Microsoft Root Certificate Program).

· Create your own certificate. This is created using the makecert tool. How to get this tool is described in Chapter 4, Building an Infrastructure on Microsoft Azure, of this book, in which we set up a client-to-site VPN connection.

Makecert is included in the Windows SDK for Windows 7 and the Windows SDK for Windows 8. These can be downloaded for free. During the setup of the SDK, select only the Windows Software Development Kit feature and deselect all other installation features.

After the installation is complete, makecert.exe can be located in the C:\Program Files (x86)\Windows Kits\8.0\bin\x64 folder.

Azure Vault has some requirements regarding attributes of the certificate:

· The certificate must have a key length of at least 2048 bits

· Enhanced key usage should be Client Authentication

· The validity period should not exceed 3 years (the certificate should be an x.509 v3 certificate)

The certificate we created earlier to connect App Controller to Microsoft Azure cannot be used for this purpose. This is because that certificate uses Server Authentication as its Enhanced Key Usage.

To create a certificate using the makecert tool, start a command prompt with the Run as administrator option. If you start the command prompt without administrative privileges, the creation will fail. Do this while signed in to the Windows Server you want to back up.

Type in the following command:

makecert.exe -r -pe -n CN=AzureBackup -ss my -sr localmachine -eku 1.3.6.1.5.5.7.3.2 -e 01/19/2015 -len 2048 AzureBackup.cer

The CN parameter used in the previous command is the name you want to give the certificate. Make sure the date entered in the -e parameter is not more than 3 years from the current date.

After the certificate has been created, upload it to Azure by performing the following steps:

1. In Azure Management Portal, click on Recovery Services.

2. Click on the vault you created in the previous step.

3. Click on the Manage Certificate button in the lower menu bar.

4. Navigate to the folder containing the .cer certificate, select it, and click on the check mark.

5. If the certificate is fine, no errors will be shown. The certificate can be checked by selecting the backup vault and then clicking on Dashboard. Details of the certificate are shown on the right-hand side of the screen.

Step 3 – downloading and installing the Azure Backup agent

Microsoft provides free clients for the following operating systems and applications:

· Microsoft Data Protection Manager 2012 SP1 and later

· Windows Server 2012

· Windows Server 2012 Essentials

· Windows Server 2012 R2

· Windows 7 Service Pack 1, Windows 8 and Windows 8.1

· Windows Server 2008 R2 with Service Pack 1 (SP1)

· Windows Server 2008 with Service Pack 2 (SP2)

In the portal, select the backup vault. Next, select the Dashboard menu item. On the right, you will see a link to download the backup agent.

After the download has finished, start the setup. Accept or change the defaults of your choice. The required components will be installed automatically.

After a successful installation, a shortcut to Windows Server Backup is located in the Administrative Tools folder.

We are now going to configure Azure Backup so that it is able to access the previously created backup vault. We will also make sure our data is encrypted by using an encryption passphrase.

1. Start Windows Server Backup and select Backup in the left pane.

2. Click on Register server in the right pane.

3. Fill in a proxy server if appropriate. Click on Next.

4. At the vault identification, click on the Browse button and select the .cer certificate we uploaded to Azure in the previous steps. When the correct certificate has been selected, you are presented with a location of the Azure Vault. Select it and click on Next.

5. Next, fill in a passphrase to encrypt the backup data. Select a file to store the passphrase. Make sure to store the passphrase file in a secure location such as a USB key stored in a safe. If the passphrase is lost, nobody will be able to recover the data stored in Azure.

6. Click on Register.

Step 4 – creating a backup schedule

The next step is to schedule a backup. This is a straightforward process. First, you select the files and folders to include in the backup. The next step is to configure a schedule; a maximum of three backups can be made per day. On the next screen of the wizard, you can specify the number of days for which backups should be retained. That is it. You can now use Microsoft Azure to store backup data.
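
The same schedule can also be created with the Online Backup (OB) PowerShell cmdlets that are installed together with the agent. The following is a sketch; the folders, schedule, and retention values are hypothetical examples:

# Sketch; folders, schedule, and retention values are hypothetical examples.
$policy    = New-OBPolicy
$fileSpec  = New-OBFileSpec -FileSpec "D:\Data", "D:\Profiles"
$schedule  = New-OBSchedule -DaysOfWeek Monday, Wednesday, Friday -TimesOfDay 22:00
$retention = New-OBRetentionPolicy -RetentionDays 30
Add-OBFileSpec -Policy $policy -FileSpec $fileSpec
Set-OBSchedule -Policy $policy -Schedule $schedule
Set-OBRetentionPolicy -Policy $policy -RetentionPolicy $retention
# Make this the active backup policy for this server
Set-OBPolicy -Policy $policy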

Restoring data is very simple. Just click on the Recover data button and you are guided by a wizard through the restoration process.

Recovery of virtual machines

A backup is only proven when it can be restored. So in this section, we are not focusing on how to install the agent and make backups; instead, we are going to learn how to perform a restore.

We do that using a fictitious scenario. Our monitoring tool noticed that a business-critical virtual machine is not running anymore. After contacting Microsoft and checking the status, you learned that there is a serious issue with storage. As we need to restore the application that runs on the failed virtual machine as soon as possible, we decide to restore the virtual machine in an alternate data center.

We cannot use Hyper-V Replica in Azure to replicate virtual machines, and we did not use another replication tool. So, we need to install a new virtual machine from scratch.

Luckily, we installed Active Directory in a different Azure data center that replicates with the primary region. The Azure Backup agent is not able to perform a bare metal restore. If you want to be able to do that, you will need a two-step approach. The first step is to use Windows Server Backup to create a backup and store it as a file on-premises. The second step is to back up this file using the Azure Backup agent to the Microsoft Azure backup vault.

Recovery is the opposite. Obviously, this is not an ideal situation as it is time-consuming and error-prone.

Using Microsoft StorSimple

Traditional SAN storage is often expensive. Each time the maximum capacity has been reached, another large investment has to be made to acquire more storage. It does not scale well because of those large capital investments.

In many cases, however, the data stored on a SAN is hardly ever accessed. So, why not move this data to the cloud? That is exactly what Microsoft StorSimple does: archiving rarely used data to Microsoft Azure while keeping the data accessible to applications and users.

Microsoft StorSimple is a hardware appliance that offers primary storage, archive, backup, and disaster recovery in one solution. Microsoft Azure Storage is used as a cheap and scalable storage tier to store archive, backup, and disaster recovery data. Other supported cloud platforms are, for example, Atmos, OpenStack, HP, and Amazon Web Services. However, once the next firmware version has been released, StorSimple will only support Microsoft Azure as a storage target.

StorSimple is a so-called cloud-integrated storage (CiS) solution. It can archive unstructured data: common files like those created by Office, photo-editing software, and so on. StorSimple cannot be used to archive database files, virtual hard disks of operational virtual machines, or files such as Outlook .pst files.

Significant cost savings over traditional storage and backup are accomplished by using Microsoft Azure Storage as the lowest storage tier, storing archive and backup data. The solution replaces tape handling and offsite backup hardware, as backup data is stored in Azure as well.

Of all data created over time, on average 85 percent or more is regarded as cold. This means it is hardly ever accessed. The hardware appliance contains three tiers of local storage plus it uses cloud storage as the lowest tier.

The available tiers are as follows:

· SSD: Linear (raw tier 1)

· SSD: Deduplicated (tier 2), and concurrent inline block-level dedupe

· SAS: Deduplicated and compressed (tier 3)

· Cloud: Deduplicated, compressed, and encrypted (tier 4)

The life cycle of data starts when it is written to NVRAM. This is RAM that is battery-backed. Then, the data is written to SSD. Data is not compressed or deduplicated while stored in tier 1.

When data is not accessed for a while, blocks are deduplicated but still stored on SSD. The dedupe rate depends on the type of data. Storage capacity can be increased 2 to 5 times because of the deduplication.

When blocks get cooler, they are compressed and moved to the SAS tier. Note that individual blocks are automatically tiered, not complete files. So, the most frequently accessed blocks of files remain stored on the fastest SSD tier, ensuring good performance.

The final stage of the data life cycle is the cloud tier. Data is encrypted and moved to Azure (or any other cloud storage).

The metadata that is used to track the location of each block is always stored locally on the StorSimple device.

Weighted Storage Layout is the technology that determines when a block of data is moved to another tier. It analyzes the frequency of use, the age of data, reference counts, and preferences set by administrators to decide when to move data.

The last tier of storage is the cloud. StorSimple is able to move data that is not accessed frequently to Microsoft Azure Storage. Note that it does not use the Azure Backup vault, which has much higher costs per GB than Microsoft Azure Storage.

To make sure data stored in Azure is safe, AES-256 military-grade encryption is used for all data leaving the appliance. Only the customer has access to the encryption key. The data is presented to physical servers and virtualization hosts over the iSCSI protocol. Both Hyper-V and VMware vSphere are supported.

Many components of the appliance are redundant: power supply, storage controller, and network interface. The disk configuration is RAID-10 with a hot spare hard disk. Software upgrades are nondisruptive.

StorSimple is available in four models ranging in a storage capacity from 2 TB to 40 TB. A typical use case for StorSimple is for storage of file server data. The data stored on a file server is typically accessed frequently within weeks after creation. Then, the data is hardly accessed and is a candidate to be archived.

Another typical use case is for projects. Projects run for a limited time. After the project has finished, that data needs to be available for archiving. Storing that kind of data on cheap storage will reduce the TCO for storage a lot.

StorSimple has two types of snapshots. A snapshot is a point-in-time picture of the data in a volume. Local snapshots are like a traditional backup; their main purpose is to recover accidentally deleted files. Each snapshot only stores the data that has changed compared to the previous snapshot, and it is deduplicated and compressed. This has no performance impact, since the appliance does not have to go out and read through all metadata to work out which blocks or files have changed, like a traditional backup does. Reading through file information is one of the worst enemies of backing up unstructured data; if you have millions of files (which is common), it can take hours just to work out which files have changed before you back up a single file. So, StorSimple can efficiently give you local point-in-time copies that are near instant to back up and restore from.

Cloud snapshots allow us to create a snapshot of a StorSimple volume and replicate that snapshot to Azure for archiving and disaster recovery purposes. The first cloud snapshot contains the whole dataset; later snapshots only contain the changed blocks. These cloud snapshots are policy-based and can be kept for hours, days, weeks, months, or years as required.

StorSimple supports a maximum of 64 storage accounts per system.

Restoring individual files

Restoring accidentally deleted files is very simple. An administrator connects to a local or cloud snapshot, clones it, and then mounts it as a drive letter or mount point. Files can then be copied from the clone back to the original volume.

Disaster recovery using Microsoft StorSimple

A unique feature of the StorSimple appliance is the ability to quickly recover from a volume failure or complete loss of a data center. StorSimple creates VSS snapshots of volumes and stores these in Azure Storage or any other supported cloud. If your primary data center has failed, the recovery of data can be performed easily and quickly.

A high level overview of the steps to recover when a data center is lost is as follows:

1. Install a replacement StorSimple device in the recovery location.

2. Connect it to the Microsoft Azure Storage account.

3. Fill in the encryption key.

4. Publish the volumes to the servers by mounting the cloud snapshots.

5. While the data is still physically located in Azure storage, applications will be able to access the data. If data is accessed, it is transferred from Azure to the StorSimple appliance.

While in a traditional disaster recovery all data is restored in one go, StorSimple restores only the requested data (hot data). Data that is not requested (cold data) is automatically restored at a later time. This so-called thin restore downloads the metadata map that describes the state of the system and provides an image of the volume's contents at the time a snapshot is taken. This map is typically 0.1 percent the size of the stored data.

As soon as the metadata map has been downloaded, systems are able to access the data. To applications and users, all data appears to be local.

Microsoft has temporary offers in which StorSimple appliances are free if customers purchase $50,000 or $100,000 of Azure storage per year. One year of free Gold support is included in this offer.

Backing up and restoring Azure virtual machines

If you are using Microsoft Azure to host legacy applications, you are likely to want a backup of your virtual machines.

Backup data should never be stored in the same physical location as the live data. So, we must make sure backup data is stored in another location than the Azure region that hosts our virtual machines.

As Microsoft Azure data centers do not provide backup to tape and no physical access to collect tapes, the only option to get data offsite is by using disk to disk copy.

We have three options to do that:

· Store backup data in our on-premises data center

· Store backup data in another Azure region

· Store backup data in a non-Microsoft Azure cloud

Storing backup data in an on-premises data center means all the backup data is transferred over a site-to-site VPN. It can take quite some time for the transfer to finish. Apart from the transfer time, Microsoft will charge for data leaving Azure.

Another aspect to consider is the time it will take to restore data. In case of a restore, data has to be transferred back from on-premises to the Azure data center.

Storing data in another Azure data center is another option. Note that Microsoft will charge for egress data leaving an Azure data center, whether the destination is on-premises or another Azure data center. The bandwidth available for traffic between Azure data centers is about 10 GB per hour.

Using PowerShell, we have the option to copy blobs from one Azure data center to another. The disadvantage of a blob copy is that the virtual machine accessing that blob should be shut down, so in most cases we cannot use this method. A sketch of such a copy is shown below.
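
The following is a minimal sketch of such a cross-region blob copy using the classic Azure PowerShell storage cmdlets; the storage accounts, keys, containers, and blob names are hypothetical:

# Sketch; storage accounts, keys, containers, and blob names are hypothetical.
$src = New-AzureStorageContext -StorageAccountName "storagewesteurope" -StorageAccountKey "<source key>"
$dst = New-AzureStorageContext -StorageAccountName "storagenortheurope" -StorageAccountKey "<destination key>"
# Start an asynchronous, server-side copy of the VHD blob to the other region
Start-AzureStorageBlobCopy -SrcContainer "vhds" -SrcBlob "myvm-disk0.vhd" -Context $src `
    -DestContainer "backups" -DestBlob "myvm-disk0.vhd" -DestContext $dst
# Wait for the copy to finish and show its status
Get-AzureStorageBlobCopyState -Container "backups" -Blob "myvm-disk0.vhd" -Context $dst -WaitForComplete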

You might think, "So what about geo-replication?" Microsoft makes sure that data is replicated to another data center. However, geo-replicated data is a type of insurance for Microsoft. If Microsoft is not able to recover data due to unavailability of a storage stamp or a complete data center, they can decide to make the replicated data primary. Note that as a customer, you have no control over accessing that data.

Microsoft offers read access geo-replication, which was in preview at the time of writing this book. This enables customers to read from the geo-replicated storage, which is ideal for a backup solution.

At the time of writing, there are no third-party tools that are able to read data from geo-replicated storage. Each blob in the primary location is addressed by using a URL. To read the replicated data, a different URL is required.
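
The pattern is straightforward: the secondary endpoint uses the storage account name with a -secondary suffix. For example (the account, container, and blob names are hypothetical, and reads still require the account key or a SAS token):

# Sketch; account, container, and blob names are hypothetical examples.
# Primary endpoint (read/write):
$primary   = "https://mystorageaccount.blob.core.windows.net/vhds/myvm-disk0.vhd"
# Read-only secondary endpoint, available when RA-GRS is enabled:
$secondary = "https://mystorageaccount-secondary.blob.core.windows.net/vhds/myvm-disk0.vhd"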

I do expect that backup vendors will support RA-GRS. This enables customers to restore their data themselves.

Summary

In this chapter, you learned about ways to protect virtual machines running both on-premises and in Microsoft Azure. Microsoft is frequently adding new features that help to protect applications. Azure Site Recovery is a very interesting service to protect Hyper-V virtual machines without huge investments in data center facilities, hardware, and software.

In the next chapter, we are going to learn how physical and virtual servers can be migrated to Azure using various types of tools. Microsoft offers some sophisticated tools that allow us to migrate servers to Azure with very limited downtime.