There are a few principles and architectures that are commonly practiced when designing enterprise applications. First and foremost, the goal of any architecture is to support business needs at the lowest possible cost (cost being time and resources). A business wants software to enable it rather than act as a bottleneck. In today's world, availability, reliability, and performance are the three key performance indicators (KPIs) of any system.
In this section, we will first look at the issues with monolithic architecture and then we will see how to avoid them using widely adopted and proven architectures for developing enterprise applications.
Consider a classical monolithic e-commerce website application, such as the one shown in the following diagram, with all the business providers and functionality in a single app and data being stored in a classical SQL database:
Figure 1.8 – Monolithic application
The monolithic architecture was widely adopted 15-20 years ago, but plenty of problems arose for software engineering teams when systems grew and business needs expanded over time. Let's look at some of the common issues with this approach.
Common issues with monolithic apps
Let's have a look at the scaling issues:
- In a monolithic app, the only way to scale horizontally is to replicate the entire application on more compute, even when only one feature needs the extra capacity. This leads to higher operational costs and unoptimized resource utilization. Sometimes, scaling becomes impossible because different features have conflicting resource needs.
- Because most features share a single storage instance, lock contention can lead to high latency, and there are physical limits on how far a single storage instance can scale.
Here are some issues associated with availability, reliability, and performance:
- Any changes in the system will require the redeployment of all components, leading to downtime and low availability.
- Any non-persistent state, such as sessions stored in a web app, will be lost after every deployment. This will lead to the abandonment of all workflows that were triggered by users.
- Any bugs in a module, such as memory leaks or security bugs, make all the modules vulnerable and have the potential to impact the whole system.
- Due to the highly coupled nature and sharing of resources within modules, there will always be unoptimized use of resources, leading to high latency in the system.
Lastly, let's see what the impact on the business and engineering teams is:
- The impact of a change is difficult to quantify and needs extensive testing. Hence, it slows down the rate of delivery to production. Even a small change will require the entire system to be deployed again.
- In a single highly coupled system, there will always be physical limits on collaboration across teams to deliver any feature.
- New scenarios such as mobile apps, chatbots, and analysis engines will take more effort as there are no independent reusable components or services.
- Continuous deployment is almost impossible.
Let's try to solve these common problems by adopting some proven principles/architectures.
Separation of concerns/single responsibility architecture
Software should be divided into components or modules based on the kind of work they perform. Every module or component should have a single responsibility. Interaction between components should be via interfaces or messaging systems. Let's look at the n-tier and microservices architectures and how each takes care of the separation of concerns.
N-tier architecture
N-tier architecture divides the application of a system into three (or n) tiers:
- Presentation (known as the UX layer, the UI layer, or the work surface)
- Business (known as the business rules layer or the services layer)
- Data (known as the data storage and access layer)
Figure 1.9 – N-tier architecture
These tiers can be owned/managed/deployed separately. For example, multiple presentation layers, such as web, mobile, and bot layers, can leverage the same business and data tier.
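The separation between tiers can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Product`, `ProductRepository`, `PricingService`, `render_web`, and `render_bot` are hypothetical, and the in-memory repository stands in for a real database. The point it shows is that two presentation layers reuse the same business and data tiers through an interface.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass
class Product:
    sku: str
    price: float


class ProductRepository(Protocol):
    """Data tier contract (hypothetical): how the business tier reads products."""

    def get(self, sku: str) -> Product: ...


class InMemoryProductRepository:
    """Data tier: an in-memory stand-in for a real database."""

    def __init__(self, products: list[Product]) -> None:
        self._by_sku = {p.sku: p for p in products}

    def get(self, sku: str) -> Product:
        return self._by_sku[sku]


class PricingService:
    """Business tier: owns the pricing rule and depends only on the repository contract."""

    def __init__(self, repository: ProductRepository, tax_rate: float) -> None:
        self._repository = repository
        self._tax_rate = tax_rate

    def price_with_tax(self, sku: str) -> float:
        product = self._repository.get(sku)
        return round(product.price * (1 + self._tax_rate), 2)


def render_web(service: PricingService, sku: str) -> str:
    """Presentation tier, web flavor."""
    return f"<span>{service.price_with_tax(sku):.2f}</span>"


def render_bot(service: PricingService, sku: str) -> str:
    """Presentation tier, chatbot flavor, reusing the same business tier."""
    return f"The price of {sku} is {service.price_with_tax(sku):.2f}"
```

Swapping `InMemoryProductRepository` for a database-backed class would not require any change to the business or presentation code, which is the property the tiers are meant to provide.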
Microservices architecture
Microservices architecture consists of small, loosely coupled, independent, and autonomous services. Let's look at their benefits:
- Services can be deployed and scaled independently. An issue in one service will have a local impact and can be fixed by just deploying the impacted service. There is no need to share a technology or framework.
- Services communicate with each other via well-defined APIs or a messaging system such as Azure Service Bus:
Figure 1.10 – Microservices architecture
As seen in the preceding figure, each service can be owned by an independent team and can have its own release cycle. Services are responsible for managing their own data stores. Scenarios demanding lower latency can be optimized by bringing in a cache or high-performance NoSQL stores.
Domain-driven architecture
Each logical module should not have a direct dependency on another module. Each module or component should serve a single domain.
Modeling services around a domain prevents service explosion. Modules should be loosely coupled, and modules that are likely to change together can be grouped together.
Stateless services architecture
Services should not have any state. State and data should be managed independently from services, that is, externally. By delegating state externally, services will have the resources to serve more requests with high reliability.
Session affinity should not be enabled as it leads to sticky session issues and will stop you from getting the benefits of load balancing, scalability, and the distribution of traffic.
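As a rough illustration of delegating state externally, the following Python sketch keeps cart contents in a shared store rather than inside a service instance. `ExternalSessionStore` and `CartService` are hypothetical names, and the dictionary-backed store is only a stand-in for an external system such as a distributed cache.

```python
from __future__ import annotations


class ExternalSessionStore:
    """Hypothetical stand-in for an external store such as a distributed cache."""

    def __init__(self) -> None:
        self._data: dict[str, list[str]] = {}

    def load(self, session_id: str) -> list[str]:
        # Return a copy so callers never share mutable state with the store.
        return list(self._data.get(session_id, []))

    def save(self, session_id: str, items: list[str]) -> None:
        self._data[session_id] = list(items)


class CartService:
    """Stateless service: keeps no per-user data between calls."""

    def __init__(self, store: ExternalSessionStore) -> None:
        self._store = store

    def add_item(self, session_id: str, sku: str) -> list[str]:
        items = self._store.load(session_id)
        items.append(sku)
        self._store.save(session_id, items)
        return items


# Two instances behind a load balancer can serve the same session in turn.
store = ExternalSessionStore()
instance_a = CartService(store)
instance_b = CartService(store)

instance_a.add_item("session-1", "book")
cart = instance_b.add_item("session-1", "pen")
```

Because neither instance holds the cart itself, any request can be routed to any instance, and no session affinity is needed.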
Event-driven architecture
The main features of event-driven architecture are as follows:
- In event-driven architecture, communication between modules (generally known as publisher-subscriber communication) is primarily asynchronous and achieved via events. Producers and consumers are totally decoupled from each other. The structure of the event is the only contract that is exchanged between them.
- There can be multiple consumers of the same event taking care of their specific operations; ideally, they won't even be aware of each other. Producers can continuously push events without worrying about the availability of the consumers.
- Publishers publish events via a messaging infrastructure such as queues or a service bus. Once an event is published, the messaging infrastructure is responsible for sending the event to eligible subscribers:
Figure 1.11 – Event-driven architecture
This architecture is best suited for scenarios that are asynchronous in nature. For example, long-running operations can be queued for processing. A client might poll for a status or even act as a subscriber for an event.
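The publisher-subscriber flow described above can be sketched with a small in-process bus. This is a simplified model under stated assumptions, not a real messaging infrastructure: `Event`, `InMemoryBus`, and the `OrderPlaced` event name are hypothetical, and `deliver_pending` plays the role that a queue or service bus would play in dispatching events to subscribers.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Event:
    """The event structure is the only contract shared by producers and consumers."""

    name: str
    payload: dict = field(default_factory=dict)


Handler = Callable[[Event], None]


class InMemoryBus:
    """Hypothetical in-process stand-in for a messaging infrastructure."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = {}
        self._pending: deque[Event] = deque()

    def subscribe(self, event_name: str, handler: Handler) -> None:
        self._subscribers.setdefault(event_name, []).append(handler)

    def publish(self, event: Event) -> None:
        # The producer only enqueues; it does not wait for any consumer.
        self._pending.append(event)

    def deliver_pending(self) -> int:
        """Dispatch queued events to eligible subscribers; return the delivery count."""
        deliveries = 0
        while self._pending:
            event = self._pending.popleft()
            for handler in self._subscribers.get(event.name, []):
                handler(event)
                deliveries += 1
        return deliveries


bus = InMemoryBus()
emails: list[str] = []
shipments: list[str] = []

# Two independent consumers of the same event, unaware of each other.
bus.subscribe("OrderPlaced", lambda e: emails.append(e.payload["order_id"]))
bus.subscribe("OrderPlaced", lambda e: shipments.append(e.payload["order_id"]))

bus.publish(Event("OrderPlaced", {"order_id": "42"}))
delivered = bus.deliver_pending()
```

In a real deployment, publishing and delivery would happen in different processes, which is what lets producers keep pushing events while a consumer is unavailable.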
Data storage and access architecture
Data storage and access architecture plays a vital role in the scaling, availability, and reliability of an overall system:
- A service should decide the type of data storage depending on the needs of the operation.
- Data should be partitioned and modeled according to the needs of the given operation. Hot partitions should be avoided at all costs. Replication should be opted for if you need more than one type of structure from the same data.
- The correct consistency model should be chosen for lower latency. For example, an operation that can afford to have stale data for some time should use weak/eventual consistency. Operations that have the potential to change the state and need real-time data should opt for stronger consistency.
- Caching data that is appropriate to services helps the performance of services. Areas should be identified where data can be cached. Depending on the given need, an in-memory cache or an out-of-process (distributed) cache can be chosen.
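One common way to apply the caching guidance is the cache-aside pattern with a time-to-live, sketched below in Python. The `CacheAside` class and its parameters are hypothetical, and the dictionary is an in-memory cache; an out-of-process cache would follow the same read path with a network call in place of the dictionary lookup. Entries are allowed to be stale for up to `ttl_seconds`, which fits operations that can tolerate weak or eventual consistency.

```python
from __future__ import annotations

import time
from typing import Callable


class CacheAside:
    """Read-through helper: serve from the cache, fall back to the loader on a miss."""

    def __init__(
        self,
        loader: Callable[[str], str],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock  # Injectable so that expiry can be tested without sleeping.
        self._entries: dict[str, tuple[float, str]] = {}

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]  # Cache hit that has not expired yet.
        value = self._loader(key)  # Miss or expired: go to the data store.
        self._entries[key] = (now + self._ttl, value)
        return value
```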
Resiliency architecture
As the communication between components increases, so does the possibility of failures. A system should be designed to recover from any kind of failure. We will cover a few strategies for building a fault-tolerant system that can heal itself in the case of failures.
If you are familiar with Azure, you'll know that applications, services, and data should be replicated globally in at least two Azure regions to withstand planned downtime and unplanned transient or permanent failures. In these scenarios, it is wise to host web applications on Azure App Service, use REST APIs, and choose a globally distributed database service such as Azure Cosmos DB. Choosing Azure paired regions will help with business continuity and disaster recovery (BCDR), as at least one region in each pair will be prioritized for recovery if an outage affects multiple regions. Now, let's see how to tackle different types of faults.
Transient faults can occur in any type of communication or service. You need to have a strategy to recover from transient faults, such as the following:
- Identify the operation and type of a transient fault, then determine the appropriate retry count and interval.
- Avoid anti-patterns such as endless retry mechanisms by using a finite number of retries or circuit breakers.
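A bounded retry with exponential backoff might look like the following Python sketch. The names `TransientError` and `call_with_retry`, and the default count and delay, are assumptions made for illustration; appropriate values depend on the operation and the kind of fault. A circuit breaker is a complementary technique and is not shown here.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for faults that are worth retrying (timeouts, throttling)."""


def call_with_retry(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying transient faults a finite number of times."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # Give up: the retry budget is exhausted.
            # Exponential backoff: base_delay, then 2x, then 4x, and so on.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Only `TransientError` is retried; any other exception propagates immediately, so non-transient failures are not masked by repeated attempts.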
If a failure is not transient, you should respond to the failure gracefully by choosing some of the following options:
- Failing over
- Compensating for any failed operations
- Throttling/blocking the bad client/actor
- Using a leader election to select a leader in the case of a failure
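Of the options above, compensation is perhaps the easiest to illustrate in code. The Python sketch below runs a sequence of steps and, if one fails, undoes the steps that already completed, in reverse order. `run_with_compensation` and the order-processing steps are hypothetical; a production system would also need to handle failures inside the compensating actions themselves.

```python
from __future__ import annotations

from typing import Callable

# Each step pairs an action with the compensation that undoes it.
Step = tuple[Callable[[], None], Callable[[], None]]


def run_with_compensation(steps: list[Step]) -> None:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    undo_stack: list[Callable[[], None]] = []
    try:
        for action, compensate in steps:
            action()
            undo_stack.append(compensate)  # Only completed steps are undone.
    except Exception:
        while undo_stack:
            undo_stack.pop()()
        raise


log: list[str] = []


def charge_card() -> None:
    raise RuntimeError("payment declined")


steps: list[Step] = [
    (lambda: log.append("reserve stock"), lambda: log.append("release stock")),
    (charge_card, lambda: log.append("refund card")),
]

try:
    run_with_compensation(steps)
except RuntimeError:
    log.append("order failed")
```

Here the stock reservation is released because it completed, while the card is never refunded because the charge itself did not go through.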
Telemetry plays a big role here; you should have custom metrics to keep tabs on the health of every component. Alerts can be raised when a custom event occurs or a specific metric reaches a certain threshold.
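A threshold-based alert on a custom metric can be sketched as follows. `MetricMonitor` is a hypothetical helper, not part of any telemetry product; it averages the most recent samples so that a single spike does not necessarily trigger or clear the alert by itself.

```python
from collections import deque


class MetricMonitor:
    """Tracks a rolling window of samples and flags when the average is too high."""

    def __init__(self, threshold: float, window: int) -> None:
        self._threshold = threshold
        self._samples: deque = deque(maxlen=window)  # Old samples fall off automatically.

    def record(self, value: float) -> bool:
        """Record a sample; return True when the window average exceeds the threshold."""
        self._samples.append(value)
        average = sum(self._samples) / len(self._samples)
        return average > self._threshold


# Example: alert when the average latency of the last 3 requests exceeds 200 ms.
monitor = MetricMonitor(threshold=200.0, window=3)
alerts = [monitor.record(latency_ms) for latency_ms in [100, 150, 500, 100, 100, 100]]
```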
Evolution and operations architecture
Evolution and operations play a vital role in continuous integration, deployment, staged feature rollout, and reducing downtime and costs:
- Services should be deployed independently.
- Designing an ecosystem that can scale enables a business to grow and change over time.
- A loosely coupled system is best for a business, as any change or feature can be delivered with good velocity and quality. Changes can be managed and scoped to individual components.
- Elasticity in scale leads to the better management of resources, which in turn reduces operation costs.
- A continuous build and release pipeline alongside a blue-green deployment strategy can help in identifying issues early in a system. This also enables the testing of certain hypotheses with a reduced amount of production traffic.
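Sending a reduced share of production traffic to a new deployment can be sketched with deterministic, hash-based routing. The functions `bucket_for` and `route` and the environment labels `blue` (current) and `green` (new) are assumptions for illustration; in practice this decision usually lives in a load balancer or traffic manager rather than in application code.

```python
import hashlib


def bucket_for(user_id: str) -> int:
    """Map a user to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def route(user_id: str, green_percent: int) -> str:
    """Send roughly green_percent of users to the new ("green") environment."""
    if not 0 <= green_percent <= 100:
        raise ValueError("green_percent must be between 0 and 100")
    return "green" if bucket_for(user_id) < green_percent else "blue"
```

Because the bucket depends only on the user ID, a given user keeps landing on the same environment, and raising `green_percent` only ever moves users from `blue` to `green`.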
With this, we are done with our coverage of common enterprise architectures. Next, we will look at enterprise application requirements and different architectures through the lens of the design principles and common architectures we have learned about.