You're reading from Microsoft SharePoint Server 2019 and SharePoint Hybrid Administration Deploy, configure, and manage SharePoint on-premises and hybrid scenarios

Product type Paperback

Published in Oct 2020

Publisher Packt

ISBN-13 9781800563735

Length 536 pages

Edition 1st Edition

Author (1):

Aaron Guilmette

Planning for disaster recovery

Disaster recovery is the set of measures you undertake when your deployment has undergone a significant failure that exceeds the capabilities of your fault tolerance. Some example scenarios that might require disaster recovery efforts include the following:

Storage failures: For example, both redundant disk controllers in your storage environment fail before you can return the system to full capacity, or more than one disk fails simultaneously in a RAID-5 disk volume.
Virtual machine host failures: If your environment comprises virtual machines and the underlying virtual machine hypervisors fail in a way that prevents the virtual machines supporting your environment from powering up.
Software updates: This could apply to operating system updates, application platform updates, driver updates, or other application updates that render the system unusable.
Database failure: Since the majority of SharePoint Server's services rely on storing and retrieving information from databases, a catastrophic database failure could prevent components in the farm from working correctly.
Primary data center site compromise: Any event that impacts your primary data center, such as an extended power outage, a flood or another natural disaster, a network connectivity service interruption, or military action.

Your organization may require you to be prepared to resume activities in the event of any of these scenarios (or others that may apply to your environment). The ability to recover or restore operations is gauged by three measurements:

Recovery Point Objective (RPO): The RPO can be expressed in several ways, such as "the last available backup from which to initiate a restore" or "the acceptable amount of data loss."
Recovery Level Objective (RLO): A sub-function of the RPO, the RLO defines the granularity at which you need to be able to recover (such as a data center, rack, host, farm, server, application, database, site, document library, folder, or file).
Recovery Time Objective (RTO): The maximum amount of time it should take to get a system operational with the data parameters of the RPO. This can also be referred to as how long the outage can last or "how long we're down."

When starting to develop an RPO, many organizations state that "no data loss is acceptable." While no-loss solutions are possible, their cost can rise significantly as the amount of data in a system and its frequency of activity increase. Frequently, a "no data loss" policy is not cost-justifiable. The business needs to determine how costly an outage is (quantified by the business, legal, and financial risk from an extended outage) before work can begin on recommending technical solutions.

A business recovery objective or requirement might be expressed as follows:

Must be able to recover a SharePoint farm at the document library level (RLO) in less than 2 hours (RTO) with no more than 2 hours of potential data loss (RPO).
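A requirement like this can be written down as data and checked against a proposed backup schedule. The following is a minimal sketch, not a prescribed method: the class, field, and function names are hypothetical, and it assumes that a failure just before the next backup loses one full backup interval and that the restore time has been measured in a recovery test.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjective:
    """A business recovery requirement (all names here are illustrative)."""
    rlo: str        # recovery granularity, e.g. "document library"
    rto: timedelta  # service must be restored in less than this time
    rpo: timedelta  # no more than this much data may be lost


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    # Assumption: a failure just before the next backup loses one full interval.
    return backup_interval


def meets_objective(objective: RecoveryObjective,
                    backup_interval: timedelta,
                    measured_restore_time: timedelta) -> bool:
    """True when both the data-loss window and the restore time fit the objective."""
    return (worst_case_data_loss(backup_interval) <= objective.rpo
            and measured_restore_time < objective.rto)


# The example requirement from the text: document library level, 2-hour RTO, 2-hour RPO.
objective = RecoveryObjective("document library",
                              rto=timedelta(hours=2),
                              rpo=timedelta(hours=2))

# A nightly backup exposes up to 24 hours of data, so it misses the RPO.
print(meets_objective(objective, timedelta(hours=24), timedelta(hours=1)))     # False
# An hourly backup with a 90-minute tested restore satisfies both limits.
print(meets_objective(objective, timedelta(hours=1), timedelta(minutes=90)))   # True
```

The RLO is carried only as a label here; confirming that a backup product can actually restore at that granularity is a separate check.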

As you put together a disaster recovery plan for SharePoint Server, it's important to start with the organization's goals (such as the number of hours of downtime or how much potential data loss is acceptable), and then recommend strategies, processes, and products based on that business requirement.

Outage costs

Outages generally fall into three categories:

Planned loss of application or service (such as a service upgrade or scheduled maintenance)
Unplanned loss of application or service
Loss of data

Loss of an application or service may prevent your organization from generating revenue or performing activities required for the business to operate, which may have a financial impact, depending on the application or service that is inoperable. An application or service can also incur a partial loss (such as running in a degraded fashion), which may render the system usable for some activities and not for others.

Planned outages are typically communicated to business users or customers and are scheduled to happen during low periods of activity. Unplanned outages, conversely, happen without notice due to some type of system failure.

Loss of data, depending on the type of data affected, could have a significant financial impact on an organization.

Depending on the type of application or data hosted by a SharePoint Server environment and the type of outage incurred, you may need to evaluate one or more disaster recovery options.

Disaster recovery options, costs, and considerations

Disaster recovery options (and their costs) can be quite varied, from a simple backup and restore to full standby data center solutions. Here are some examples of disaster recovery options:

Type	Components	Notes	Relative Deployment and Maintenance Cost	Recovery Time
Tape or disk-to-disk backup solution	Tape or disk-to-disk backup hardware, software	This simplest form of disaster recovery covers only the applications and data. It is typically the cheapest option to deploy and maintain, but it depends on the organization being able to provide infrastructure, should the need to recover data arise.	Lowest	Longest
Cold standby infrastructure	Dedicated servers ready to be configured in the event of a disaster	This solution builds on having backups by providing dedicated hardware. This hardware is not configured or maintained but is waiting for a disaster so that it can be configured to meet the exact recovery requirements. Cold standby infrastructure is typically infrastructure that can be available within hours or days.	Low	Long
Warm standby infrastructure	Dedicated servers that are regularly maintained and available	A warm standby infrastructure disaster recovery scenario leverages dedicated equipment that is kept up to date on a schedule using regular restores or synchronizations of data. Warm standby infrastructure can typically be used to make a solution available within minutes to hours.	Medium	Medium
Hot standby infrastructure	Dedicated servers that are regularly maintained and kept up to date, ready for failover	Hot standby infrastructure, like warm standby infrastructure, is dedicated equipment that is kept up to date. Unlike warm standby infrastructure, however, hot standby infrastructure is ready to take over within seconds to minutes. Hot standby infrastructure plans frequently rely on load balancing and data replication technologies.	Expensive	Shortest
Cold standby data center	Dedicated data center space with equipment ready to be provisioned	A cold standby data center strategy relies on having available equipment and backups at a secondary location. This is a somewhat expensive solution to maintain (data center space and networking and server equipment are required, and backups must be kept available) and has both high recovery time and high recovery point objectives. It will likely take days or weeks to get a cold standby data center operational.	Somewhat expensive	Long
Warm standby data center	Dedicated data center space with pre-configured equipment, ready to accept failover or restores	Similar to a warm standby infrastructure solution, a warm standby data center disaster recovery solution means you have equipment mostly up to date at a remote location. The most recent data can be applied to this environment, typically within minutes or hours.	More expensive	Medium
Hot standby data center	Dedicated servers that are regularly maintained and kept up to date, ready for failover in a separate data center space	Building on the concepts of hot standby infrastructure, a hot standby data center recovery strategy is the most resilient (and expensive) solution to maintain as it requires both investment (data center space, dedicated equipment, software, networking, and communications) and sound process execution. Hot standby data centers can be ready within seconds to minutes and can have the lowest recovery time and recovery point objectives for overcoming a full primary site disaster.	Most expensive	Shortest
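The table above can be treated as a lookup when shortlisting options against a required recovery time. The following is a rough sketch under stated assumptions: the time scale is a coarse ordering taken from the upper end of each range in the Notes column, the tape or disk-to-disk row has no stated range and is therefore never shortlisted, and only the data center rows are assumed to survive the loss of the primary site. All class, field, and function names are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class Scale(IntEnum):
    """Coarse time scale, ordered from fastest to slowest."""
    SECONDS = 0
    MINUTES = 1
    HOURS = 2
    DAYS = 3
    WEEKS = 4


@dataclass(frozen=True)
class Option:
    name: str
    cost: str                 # relative cost label copied from the table
    slowest: Optional[Scale]  # upper end of the stated readiness range, if any
    separate_site: bool       # assumption: True only for the data center rows


OPTIONS = [
    Option("Tape or disk-to-disk backup solution", "Lowest", None, False),
    Option("Cold standby infrastructure", "Low", Scale.DAYS, False),
    Option("Warm standby infrastructure", "Medium", Scale.HOURS, False),
    Option("Hot standby infrastructure", "Expensive", Scale.MINUTES, False),
    Option("Cold standby data center", "Somewhat expensive", Scale.WEEKS, True),
    Option("Warm standby data center", "More expensive", Scale.HOURS, True),
    Option("Hot standby data center", "Most expensive", Scale.MINUTES, True),
]


def candidates(required: Scale, survive_site_loss: bool) -> List[str]:
    """Names of options whose slowest stated readiness fits the required scale."""
    return [
        o.name for o in OPTIONS
        if o.slowest is not None
        and o.slowest <= required
        and (o.separate_site or not survive_site_loss)
    ]


# Recovery within hours that must also survive loss of the primary site.
print(candidates(Scale.HOURS, survive_site_loss=True))
# ['Warm standby data center', 'Hot standby data center']
```

The result is returned in table order rather than sorted by cost, because the table's cost labels are relative and do not define a strict ranking across the infrastructure and data center rows.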

As with designing a fault-tolerance strategy, you'll also want to design a disaster recovery strategy that takes failure domains into account. These failure domains might include the following:

Application, workload, database, or service
Infrastructure or platform
Farm
Data center

Finally, no disaster recovery plan is complete without documentation that allows the technicians or support staff to return services to their full operational status. These operational recovery plans (sometimes referred to as runbooks or playbooks) should include things such as the following:

Step-by-step printed instructions used to recover services from each failure or disaster mode, covering details such as operating system installation and configuration, IP address schemes, or database names
Tested scripts for building, deploying, and testing the configuration
Operational procedures for restoring data
Correct versions of software installation media and any applicable licensing information (such as key files, licenses, or other activation/registration information necessary to bring the service online)
Emergency contact information for building access, infrastructure personnel, and application or business owners

Evaluating the business objectives (recovery time objective and recovery point objective) in conjunction with the budget will help you arrive at an appropriate disaster recovery strategy for your organization.

Azure Site Recovery is a Microsoft Azure-based disaster recovery service that can be leveraged in lieu of building and maintaining a physical disaster recovery site. It can be used as a disaster recovery solution for physical or virtual machines. For more information on configuring Azure Site Recovery for SharePoint, go to https://docs.microsoft.com/en-us/azure/site-recovery/site-recovery-sharepoint.

Next, we'll look at backup and restore as part of the SharePoint Server planning process.