There are a few principles and architectures that are commonly applied when designing enterprise applications. First and foremost, the goal of any architecture is to support business needs at the lowest possible cost (cost being time and resources). A business wants software that enables it rather than acting as a bottleneck. In today's world, availability, reliability, and performance are the three key performance indicators (KPIs) of any system.
In this section, first, we will look at the issues with monolithic architectures, and then we will see how to avoid them by using widely adopted and proven architectures for developing enterprise applications.
Consider a classical monolithic e-commerce website application, such as the one shown in the following diagram, with all the business logic and functionality in a single app and data stored in a classical SQL database:
Figure 1.8 – A monolithic app
The monolithic architecture was widely adopted 15–20 years ago, but plenty of problems arose for software engineering teams when systems grew and business needs expanded over time. Let's look at some of the common issues with this approach.
Common issues with monolithic apps
Let's take a look at the scaling issues:
- In a monolithic app, the only way to scale is to replicate the entire application by adding more compute, even when only one part of it needs the extra capacity. This leads to higher operational costs and unoptimized resource utilization. Sometimes, scaling becomes impossible due to conflicting resource needs between components.
- As all the features typically share a single data store, lock contention can lead to high latency, and there are physical limits to how far a single storage instance can scale.
Here is a list of issues associated with availability, reliability, and performance:
- Any change in the system requires the redeployment of all components, leading to downtime and reduced availability.
- Any non-persistent state, such as sessions stored in a web app, is lost on every deployment, forcing users to abandon whatever workflows they had in progress.
- A bug in one module, such as a memory leak or a security flaw, leaves every module vulnerable and has the potential to take down the whole system.
- Because modules are tightly coupled and share resources, resource usage is rarely optimal, leading to high latency in the system.
Lastly, let's see what the impact on the business and engineering teams is:
- The impact of a change is difficult to quantify and requires extensive testing. Hence, it slows down the rate of delivery to production. Even a small change will require the entire system to be deployed again.
- In a single, tightly coupled system, there are practical limits on how many teams can collaborate in parallel to deliver features.
- New scenarios such as mobile apps, chatbots, and analytics engines take more effort, as there are no independent, reusable components or services.
- Continuous deployment is almost impossible.
Let's try to solve these common problems by adopting some proven principles and architectures.
Separation of concerns/single-responsibility architecture
Software should be divided into components or modules based on the kind of work each performs, with every module or component owning a single, well-defined responsibility within the overall system. Interaction between components happens via interfaces or messaging systems, as the following sketch illustrates. The n-tier and microservices architectures covered next show two ways this separation of concerns is put into practice.
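Here is a minimal Python sketch of this idea, with purely illustrative names (OrderRepository, SqlOrderRepository, OrderService): the business component depends only on an interface, so the storage implementation can be swapped without touching it.

```python
from abc import ABC, abstractmethod


# The interface is the contract that other components depend on.
class OrderRepository(ABC):
    @abstractmethod
    def save(self, order_id: str, amount: float) -> None: ...


# One concrete implementation; it could be replaced by another store
# without changing the business component below.
class SqlOrderRepository(OrderRepository):
    def save(self, order_id: str, amount: float) -> None:
        print(f"INSERT INTO orders VALUES ('{order_id}', {amount})")


# Business component owning a single responsibility: placing orders.
class OrderService:
    def __init__(self, repository: OrderRepository) -> None:
        self._repository = repository  # depends only on the interface

    def place_order(self, order_id: str, amount: float) -> None:
        self._repository.save(order_id, amount)


OrderService(SqlOrderRepository()).place_order("A-100", 49.99)
```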
N-tier architecture
N-tier architecture divides an application into three (or n) tiers:
- Presentation (known as the UX layer, the UI layer, or the work surface)
- Business (known as the business rules layer or the services layer)
- Data (known as the data storage and access layer)
Figure 1.9 – N-tier architecture
These tiers can be owned, managed, and deployed separately. For example, multiple presentation layers, such as web, mobile, and bot front ends, can leverage the same business and data tiers.
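As a rough illustration, here is a small Python sketch (the class and function names are invented for this example) in which two presentation-tier front ends reuse the same business and data tiers:

```python
# Data tier: storage and access only.
class ProductData:
    def get_price(self, sku: str) -> float:
        return {"book": 29.0, "pen": 2.5}.get(sku, 0.0)


# Business tier: rules only; it never renders output and reaches storage
# solely through the data tier's API.
class PricingService:
    def __init__(self, data: ProductData) -> None:
        self._data = data

    def price_with_tax(self, sku: str) -> float:
        return round(self._data.get_price(sku) * 1.2, 2)


# Two presentation tiers (web and bot) reusing the same business and data tiers.
def web_view(service: PricingService, sku: str) -> str:
    return f"<p>{sku}: {service.price_with_tax(sku)}</p>"


def bot_view(service: PricingService, sku: str) -> str:
    return f"{sku} costs {service.price_with_tax(sku)}"


service = PricingService(ProductData())
print(web_view(service, "book"))
print(bot_view(service, "book"))
```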
Microservices architecture
Microservices architecture consists of small, loosely coupled, independent, and autonomous services. Let's see their benefits:
- Services can be deployed and scaled independently. An issue in one service has a local impact and can be fixed by deploying just the impacted service. Services are not compelled to share a technology stack or frameworks.
- Services communicate with each other via well-defined APIs or messaging systems such as Azure Service Bus.
Figure 1.10 – Microservices architecture
As you can see in the preceding diagram, each service can be owned by an independent team and have its own release cycle. Services are responsible for managing their own data stores. Scenarios demanding lower latency can be optimized by introducing a cache or a high-performance NoSQL store.
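The following Python sketch, using only the standard library, hints at this style: a hypothetical catalog service owns its own data and exposes it through a small HTTP API, and another service talks to it only through that contract (the port, paths, and names are illustrative, not a prescribed design):

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical catalog microservice: it owns its own data store.
CATALOG = {"book": {"sku": "book", "price": 29.0}}


class CatalogHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        sku = self.path.strip("/")               # for example, GET /book
        item = CATALOG.get(sku)
        self.send_response(200 if item else 404)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps(item or {}).encode())

    def log_message(self, *args):                # keep the demo output quiet
        pass


server = HTTPServer(("127.0.0.1", 8081), CatalogHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# A separate ordering service knows only the catalog's API contract,
# never its database or internal classes.
with urllib.request.urlopen("http://127.0.0.1:8081/book") as response:
    print(json.loads(response.read()))           # {'sku': 'book', 'price': 29.0}

server.shutdown()
```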
Stateless services architecture
Services should not hold any state. State and data should be managed independently of the services, that is, externally in a data store such as a distributed cache or a database. By delegating state externally, services free up resources to serve more requests with high reliability. The following diagram shows an example of stateful services on the left-hand side, where state is maintained within each service through an in-memory cache or session provider, whereas the stateless services on the right-hand side manage state and data externally.
Figure 1.11 – Stateful (left) versus stateless (right)
Session affinity should not be enabled; sticky sessions prevent you from getting the full benefits of load balancing, scalability, and even traffic distribution.
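Here is a minimal sketch of a stateless request handler in Python; the plain dictionary stands in for an external distributed cache (for example, Redis or Azure Cache for Redis in a real deployment), and the function and key names are illustrative:

```python
import uuid

# Stand-in for an external distributed cache; a plain dict keeps the sketch
# runnable anywhere, but in production this would be an external store.
session_store = {}


def add_to_cart(session_id, item):
    """Stateless handler: all session state lives in the external store, so any
    service instance behind the load balancer can serve any request."""
    session_id = session_id or str(uuid.uuid4())
    cart = session_store.setdefault(session_id, {"items": []})
    cart["items"].append(item)
    return session_id


# Two requests that could be handled by two different service instances.
sid = add_to_cart(None, "book")
add_to_cart(sid, "pen")
print(session_store[sid])   # {'items': ['book', 'pen']}
```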
Event-driven architecture
The main features of event-driven architectures are listed as follows:
- In an event-driven architecture, communication between modules, generally known as publisher-subscriber communication, is primarily asynchronous and achieved via events. Producers and consumers are totally decoupled from each other; the structure of the event is the only contract exchanged between them.
- There can be multiple consumers of the same event, each taking care of its own specific operations; ideally, they won't even be aware of each other. Producers can keep pushing events without worrying about the availability of consumers.
- Publishers publish events via a messaging infrastructure such as queues or a service bus. Once an event has been published, the messaging infrastructure is responsible for sending the event to eligible subscribers.
Figure 1.12 – Event-driven architecture
This architecture is best suited to scenarios that are asynchronous in nature. For example, long-running operations can be queued for processing; a client might poll for the status or even subscribe to a completion event.
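The following Python sketch mimics the pattern with an in-memory stand-in for the messaging infrastructure (a queue or a service bus would play this role in a real system); the event name and payload are invented for illustration:

```python
from collections import defaultdict

# In-memory stand-in for the messaging infrastructure.
subscribers = defaultdict(list)


def subscribe(event_type, handler):
    subscribers[event_type].append(handler)


def publish(event_type, payload):
    # The publisher neither knows nor cares who consumes the event.
    for handler in subscribers[event_type]:
        handler(payload)


# Two independent consumers of the same event, unaware of each other.
subscribe("order_placed", lambda e: print(f"Billing: invoice order {e['order_id']}"))
subscribe("order_placed", lambda e: print(f"Shipping: schedule order {e['order_id']}"))

# The event's structure is the only contract between producer and consumers.
publish("order_placed", {"order_id": "A-100", "amount": 49.99})
```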
Resiliency architecture
As the communication between components increases, so does the possibility of failures. A system should be designed to recover from any kind of failure. We will cover a few strategies for building a fault-tolerant system that can heal itself in the case of failures.
If you are familiar with Azure, you'll know that applications, services, and data should be replicated globally across at least two Azure regions to withstand planned downtime as well as unplanned transient or permanent failures, as shown in the following diagram. In these scenarios, it is wise to choose Azure App Service to host web applications and REST APIs, and a globally distributed database service, such as Azure Cosmos DB, for data. Choosing Azure paired regions helps with business continuity and disaster recovery (BCDR), as at least one region in each pair is prioritized for recovery if an outage affects multiple regions.
Figure 1.13 – Resiliency architecture
Now, let's see how to tackle different types of faults.
Transient faults can occur in any type of communication or service. You need to have a strategy to recover from transient faults, such as the following:
- Identify the operation and the type of transient fault, and then determine an appropriate retry count and interval (see the retry sketch after this list).
- Avoid anti-patterns such as endless retry loops; cap the number of retries and use circuit breakers instead.
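Here is a minimal Python sketch of such a retry strategy with a capped retry count and exponential backoff between attempts; the flaky operation is simulated, and the retry count and delays are illustrative:

```python
import random
import time


def call_with_retries(operation, max_retries=3, base_delay=0.5):
    """Retry an operation prone to transient faults a finite number of times,
    backing off exponentially between attempts."""
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_retries:
                raise                          # give up; never retry endlessly
            time.sleep(base_delay * (2 ** attempt))


def flaky_call():
    # Simulated dependency that fails transiently about half the time.
    if random.random() < 0.5:
        raise ConnectionError("transient network fault")
    return "ok"


print(call_with_retries(flaky_call))
```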
If a failure is not transient, you should respond to the failure gracefully by choosing some of the following options:
- Failing over (see the failover sketch after this list)
- Compensating for any failed operations
- Throttling/blocking the bad client/actor
- Using leader election to select a new leader in the case of a failure
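As a rough illustration of the first option, the following Python sketch fails over to a secondary (paired) region when the call to the primary region fails with a non-transient error; the two functions simply simulate the regions:

```python
def fetch_with_failover(fetch_primary, fetch_secondary):
    """If the primary region fails with a non-transient error,
    fail over to the secondary (paired) region."""
    try:
        return fetch_primary()
    except RuntimeError as error:
        print(f"Primary region failed ({error}); failing over")
        return fetch_secondary()


def primary():
    raise RuntimeError("region outage")      # simulated permanent failure


def secondary():
    return {"source": "secondary-region", "status": "ok"}


print(fetch_with_failover(primary, secondary))
```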
Here, telemetry plays a big role; you should have custom metrics to keep tabs on the health of every component. Alerts can be raised when a custom event occurs or a specific metric reaches a certain threshold.
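As a simple illustration, the following Python sketch raises an alert when a custom error-rate metric crosses a threshold; in a real system, the metric would be emitted to a telemetry service and the alert wired to a notification channel, and the threshold here is an arbitrary example:

```python
# Illustrative custom metric with a threshold-based alert.
ERROR_RATE_THRESHOLD = 0.05


def raise_alert(message):
    print(f"ALERT: {message}")   # stand-in for paging or notification


def record_error_rate(errors, requests):
    error_rate = errors / requests
    if error_rate >= ERROR_RATE_THRESHOLD:
        raise_alert(f"Error rate {error_rate:.1%} breached the "
                    f"{ERROR_RATE_THRESHOLD:.0%} threshold")
    return error_rate


record_error_rate(errors=12, requests=200)   # 6% -> triggers the alert
```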
With this, we are done with our coverage of common enterprise architectures. Next, we will look at the requirements of enterprise applications and their different architectures through the lens of the design principles and common architectures that we learned about earlier.