Microsoft SharePoint Server 2019 and SharePoint Hybrid Administration

Deploy, configure, and manage SharePoint on-premises and hybrid scenarios

Product type Paperback
Published in Oct 2020
Publisher Packt
ISBN-13 9781800563735
Length 536 pages
Edition 1st Edition
Author Aaron Guilmette

Designing for high availability

When designing any system for high availability, you typically need to address questions such as the following:

  • What types of failures should a system be able to sustain?
  • How many failures should a system be able to sustain?
  • What steps (manual or automatic) need to be executed to ensure availability?
  • What systems or processes can we put in place to avoid interruptions in the first place?

These types of questions speak to the concept of dependability. A dependable system is one that is available to service a request and is able to continue serving requests despite failures of the component architecture (such as a server or network device) or supporting services (such as electricity). Dependability has six core attributes:

  • Availability: Measures the system's readiness to accept and respond to new requests for service
  • Reliability: Measures how a system can continue to operate after an unexpected event
  • Safety: Measures a system's level of risk to users and the environment
  • Confidentiality: The ability to control or prevent unauthorized disclosure of information
  • Integrity: Measures the presence or absence of an improper system alteration (such as data corruption)
  • Maintainability: A qualitative measurement for how easily a system is kept current, repaired, or updated
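
To make the availability attribute concrete, it is often expressed as the percentage of time a system is able to serve requests. The following is a minimal sketch, assuming a 365-day year; the function name and the percentage targets are illustrative and are not taken from this chapter.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years are ignored in this sketch


def allowed_downtime_hours(availability_pct: float) -> float:
    """Return the yearly downtime budget, in hours, for an availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


if __name__ == "__main__":
    # Illustrative targets only; pick values that match your own service levels.
    for target in (99.0, 99.9, 99.99):
        print(f"{target}% -> {allowed_downtime_hours(target):.2f} hours/year")
```

A calculation like this can help frame the trade-off between the cost of added redundancy and the amount of downtime the business can accept.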

When designing a system, these ideas or attributes of dependability can be used when building a Fault-Error-Failure chain to help identify potential errors and solve them before they are expressed during operation.

The Fault-Error-Failure chain design principles are used in the development of most modern, highly available systems. The original work that introduces the chain, Fundamental Concepts of Dependability, is available at https://www.cs.rutgers.edu/~rmartin/teaching/spring03/cs553/readings/avizienis00.pdf.

From a practical standpoint, these questions of dependability can be broken up into four main categories:

  • Fault forecasting
  • Fault avoidance
  • Fault removal
  • Fault tolerance

Let's examine each of these with regard to designing a highly available SharePoint Server environment.

Fault forecasting

Fault forecasting is the prediction of likely or potential failures. With respect to SharePoint Server architectures, some of the following components come to mind:

  • Server hardware, including components such as memory, chassis, power supplies, or mainboards
  • Storage hardware, including components such as disk drives or other storage media, storage array software or firmware, or disk controllers
  • Networking, including devices (switches, routers, firewalls, proxies, and load balancers), cabling components, and inbound and outbound connectivity to the internet or other sites
  • Power, including any power cables, switch boxes, outlets, power strips, uninterruptible power supplies, building or site power, and redundant power generation
  • Software, such as application binaries or updates, Secure Sockets Layer (SSL) certificates, operating system binaries or updates, database servers, application services, and components

Each of those component categories represents one or more potential failures for an environment. In the forecasting stage, it's important to determine as many things as possible that can go wrong, as well as the likelihood and service impact of each.

Faults will happen in any environment, so devising strategies to identify potential faults and their impacts will help you design highly available systems.
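
One way to organize the output of the forecasting stage is to score each potential fault by likelihood and service impact and then rank the results. The following is a minimal sketch; the 1-to-5 scales, the sample scores, and the names `Fault` and `rank_faults` are hypothetical and would need to be replaced with your own assessment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fault:
    component: str
    likelihood: int  # hypothetical scale: 1 (rare) to 5 (frequent)
    impact: int      # hypothetical scale: 1 (minor) to 5 (farm-wide outage)

    @property
    def risk(self) -> int:
        # Simple product of the two scores; other weightings are possible.
        return self.likelihood * self.impact


def rank_faults(faults):
    """Sort faults from the highest to the lowest risk score."""
    return sorted(faults, key=lambda f: f.risk, reverse=True)


# Made-up scores for three of the component categories listed above.
forecast = [
    Fault("Disk drive failure", likelihood=4, impact=2),
    Fault("Expired SSL certificate", likelihood=3, impact=4),
    Fault("Site power loss", likelihood=1, impact=5),
]

if __name__ == "__main__":
    for fault in rank_faults(forecast):
        print(f"{fault.risk:>2}  {fault.component}")
```

The ranking only indicates where to focus design effort first; a low score does not mean that a fault can be ignored.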

Fault avoidance

Once potential faults in the architecture have been identified, you can design around them. The premise of fault avoidance (or fault prevention) is to introduce elements that prevent faults. In the context of SharePoint Server architecture, this can mean several things, such as the following:

  • Rigorous change control processes to understand modifications being made to the environment
  • Development, test, or other sandbox-style environments where modifications are made and evaluated prior to production deployment
  • Automated or scripted procedures to reduce the opportunity for human-caused failures
  • Planning for redundancy and multiple failure modes

Fault avoidance is critical from both the design and operational perspectives to help ensure a high level of service and availability for a given service or application.

Fault removal

The goal of fault removal is to reduce the number and severity of service faults. Fault removal activities can be broadly divided into two categories:

  • During the planning, design, or development of a system
  • During the operation of a system

From a SharePoint Server perspective, removing faults during the development or planning of a system is the iterative process of identifying potential faults, such as disk drive or database failure (fault forecasting), designing a system to mitigate or prevent them (fault avoidance), and then performing testing that would trigger a particular failure mode.

For example, if you are planning for disk drive failure in a storage array, you would do the following:

  1. Implement a storage subsystem with redundant features, such as disk mirroring.
  2. Deploy an application or service utilizing the storage subsystem.
  3. Introduce a failure, such as removing a disk drive, that would normally trigger a system failure.
  4. Verify that the application or service continues to operate.

If the service or application fails to continue operating, you need to review the error logs and conditions, revise the deployment methodology or design, and then repeat the testing. Through this process, you can provide assurance to the business that the system will perform as designed.
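
The four test steps can be sketched against a toy model of a mirrored volume. This is an illustration of the test procedure only; `MirroredVolume` and `run_fault_injection_test` are hypothetical names, and the in-memory "disks" stand in for real storage hardware.

```python
class MirroredVolume:
    """Toy model of a disk mirror: every write goes to all healthy disks."""

    def __init__(self, disk_count: int = 2):
        self.disks = [dict() for _ in range(disk_count)]
        self.healthy = [True] * disk_count

    def write(self, key, value):
        if not any(self.healthy):
            raise OSError("no healthy disks remain")
        for disk, ok in zip(self.disks, self.healthy):
            if ok:
                disk[key] = value

    def read(self, key):
        # Serve the read from the first healthy disk.
        for disk, ok in zip(self.disks, self.healthy):
            if ok:
                return disk[key]
        raise OSError("no healthy disks remain")

    def fail_disk(self, index: int):
        self.healthy[index] = False


def run_fault_injection_test() -> bool:
    volume = MirroredVolume()                       # Step 1: redundant storage
    volume.write("report.docx", b"contents")        # Step 2: service uses it
    volume.fail_disk(0)                             # Step 3: introduce a failure
    return volume.read("report.docx") == b"contents"  # Step 4: verify operation


if __name__ == "__main__":
    print("service survived disk failure:", run_fault_injection_test())
```

If the final check returned `False`, that would correspond to the case described above, where you review the logs, revise the design, and repeat the test.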

Addressing the concept of fault removal during operation, using the previous example of disk drive failure, might look something like this:

  1. The disk in the storage subsystem fails.
  2. The disk subsystem continues operating in a degraded state.
  3. The technician replaces the failed disk.
  4. The system returns to a normal operational state.

In the preceding example, Step 1 is the failure mode. Step 2 indicates that the system's design has successfully resulted in continuing operations. In Step 3, the technician is performing fault removal by removing a failed device and replacing it with an operational one. In Step 4, the system has recovered and has returned to a normal operating state, free of faults.

In the previous failure scenario, the disk subsystem may have been designed to sustain the failure of a single disk drive. After the disk has failed in Step 1, the system is at risk until the disk has been replaced in Step 3. The ability of a system to continue operating is compromised with each further fault, so it's important to minimize the time between the failure and its removal.
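
To see why the exposure window matters, you can estimate the chance that the surviving disk also fails before the replacement is in place. The following sketch assumes a constant, independent annual failure rate for the disk, which is a simplification: it ignores rebuild time and correlated failures. The 2% rate and the function name are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24


def second_failure_probability(annual_failure_rate: float,
                               exposure_hours: float) -> float:
    """Probability that the surviving disk fails during the exposure window.

    Assumes a constant failure rate, so the chance of surviving a fraction
    of a year is the yearly survival probability raised to that fraction.
    """
    survival_per_year = 1 - annual_failure_rate
    return 1 - survival_per_year ** (exposure_hours / HOURS_PER_YEAR)


if __name__ == "__main__":
    afr = 0.02  # hypothetical 2% annual failure rate
    for hours in (4, 24, 72):
        risk = second_failure_probability(afr, hours)
        print(f"{hours:>3} hours degraded -> {risk:.6%} chance of a second failure")
```

Under these assumptions the risk grows with every hour spent in the degraded state, which is the reason for keeping the replacement time short.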

Fault tolerance

Finally, the design goal of fault tolerance is to address how systems react when faults happen. As we've already stated, faults will happen. Fault-tolerant design plays a crucial role in allowing services to continue while faults are removed.

As a practitioner, you'll often be faced with choices and trade-offs to make on fault-tolerant designs, such as spending resources on redundant database hardware or additional servers in the SharePoint Server farm.

When creating a highly available, fault-tolerant design for SharePoint, you'll likely need to account for the following fault domains:

| Fault Domains | Examples |
|---|---|
| Rack and power infrastructure | Server racks, power distribution units, power circuits, uninterruptible power supplies, fans, and cooling equipment |
| Physical server infrastructure and components | Servers, server chassis, server backplanes or midplanes, hard disk drives, controllers, network interface cards, and processors |
| Virtual server infrastructure and components | Virtual machine hosts |
| Network infrastructure and components | Rack-based switches, cabling, core switching, load balancers and traffic directors, and firewalls |
| Storage infrastructure and components | Storage networking components, disk arrays, disks, disk controllers, and Redundant Array of Independent Disks (RAID) settings |
| Application services and components | SharePoint application servers, Distributed Cache servers, User Profile Service, and the Search Service application |
| Database services and components | SQL Server failover clustering or AlwaysOn availability groups for content, configuration, and service application databases |

In the fault forecasting step, you identified potential failures that could affect the SharePoint Server system and designed methods in the fault avoidance step to help mitigate or reduce the impact of the faults on the environment.

In addition to fault-tolerant designs, you also need to prepare for recovery from catastrophic failures (such as a natural disaster) that span all components in either a single fault domain or multiple fault domains.
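
A quick way to review a design against its fault domains is to check whether every server role is spread across more than one of them. The following is a minimal sketch; the server names, roles, and rack assignments are hypothetical, and a rack is used as the fault domain purely for illustration.

```python
from collections import defaultdict

# Hypothetical inventory: (server name, role, fault domain such as a rack)
servers = [
    ("SP-WFE-01", "Front-end", "Rack A"),
    ("SP-WFE-02", "Front-end", "Rack B"),
    ("SP-APP-01", "Application", "Rack A"),
    ("SP-APP-02", "Application", "Rack A"),
    ("SP-SQL-01", "Database", "Rack B"),
]


def single_points_of_failure(inventory):
    """Return the roles whose servers all sit in a single fault domain."""
    domains_by_role = defaultdict(set)
    for _name, role, domain in inventory:
        domains_by_role[role].add(domain)
    return sorted(role for role, domains in domains_by_role.items()
                  if len(domains) < 2)


if __name__ == "__main__":
    for role in single_points_of_failure(servers):
        print(f"{role}: all servers share one fault domain")
```

In this sample inventory, the Application role has two servers but both are in the same rack, so a rack-level failure would still take the role offline.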

In the next section, we'll look at using highly available designs to mitigate the impact of failures of various service databases.

Supported SharePoint high-availability designs

A SharePoint farm has many moving pieces. A successful highly available design requires understanding how the various components can be made resilient. The database design considerations cover four options for each service database:

  • Database mirroring for high availability
  • Database mirroring or log shipping for disaster recovery
  • SQL AlwaysOn availability group for high availability
  • SQL AlwaysOn availability group for disaster recovery

The following service databases support all four options:

  • Content database(s)
  • App Management database
  • Business Connectivity Service database
  • Managed Metadata Service database
  • PerformancePoint Services database
  • Power Pivot Service database
  • Project Server database
  • Secure Store database
  • SharePoint Translation Services database
  • Subscription Settings database
  • User Profile Service – profile database
  • User Profile Service – synchronization database
  • User Profile Service – social tagging database
  • Word Automation Services database

The following service databases support only three of the four options:

  • SharePoint Search Service – analytics reporting database
  • Usage and Health Collection database

The following service databases support only two of the four options:

  • Configuration database
  • Central Administration database
  • SharePoint Search Service – administration database
  • SharePoint Search Service – crawl database
  • SharePoint Search Service – link database

The following service database supports only one of the four options:

  • State Service database

For the databases that support only some of the options, check the Microsoft documentation for the specific options that apply.

For more information on the specific SQL or SharePoint versions necessary to support certain high-availability designs, go to https://docs.microsoft.com/en-us/sharepoint/administration/supported-high-availability-and-disaster-recovery-options-for-sharepoint-databas.

One of the common threads you'll see in the databases' availability design is the support for SQL Server AlwaysOn availability groups. Microsoft recommends AlwaysOn availability groups for all databases in a SharePoint Server environment from the perspective of same-farm high availability.

Service applications support high availability behind load balancers. After using the SharePoint Products Configuration Wizard to configure a role for your server, add a configuration object (such as a virtual IP) to your load balancer that includes all of the servers hosting an application or service.
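
Conceptually, the load balancer's configuration object is a pool of servers from which unhealthy members are skipped. The following toy sketch shows that behavior with simple round-robin selection; it does not represent any particular load balancer product, and the class, method, and server names are hypothetical.

```python
from itertools import cycle


class ServerPool:
    """Round-robin pool that skips servers marked as unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._rotation = cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        if not self.healthy:
            raise RuntimeError("no healthy servers in pool")
        # At least one server is healthy, so one full pass always finds it.
        for _ in range(len(self.servers)):
            candidate = next(self._rotation)
            if candidate in self.healthy:
                return candidate


if __name__ == "__main__":
    pool = ServerPool(["SP-APP-01", "SP-APP-02", "SP-APP-03"])
    pool.mark_down("SP-APP-02")  # simulate a failed health check
    print([pool.next_server() for _ in range(4)])
```

Requests keep flowing to the remaining servers while the failed one is repaired, which is the fault-tolerant behavior the pool is meant to provide.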

While a fault-tolerant and resilient design is important from a design and day-to-day operational perspective, you also need a plan for business continuity concerns in the event of a significant problem. That is where disaster-recovery planning is helpful.
