Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon

Tech Guides - Design Patterns

3 Articles
article-image-developers-guide-to-software-architecture-patterns
Sugandha Lahoti
06 Aug 2018
11 min read
Save for later

Developer's guide to Software architecture patterns

Sugandha Lahoti
06 Aug 2018
11 min read
As we all know, patterns are a kind of simplified and smarter solution for a repetitive concern or recurring challenge in any field of importance. In the field of software engineering, there are primarily many designs, integration, and architecture patterns. In this article, we will cover the need for software patterns and describe the most prominent and dominant software architecture patterns. This article is an excerpt from Architectural Patterns by Pethuru Raj, Anupama Raman, and Harihara Subramanian. Why software patterns? There is a bevy of noteworthy transformations happening in the IT space, especially in software engineering. The complexity of recent software solutions is continuously going up due to the continued evolution of the business expectations. With complex software, not only does the software development activity become very difficult, but also the software maintenance and enhancement tasks become tedious and time-consuming. Software patterns come as a soothing factor for software architects, developers, and operators. Types of software patterns Several newer types of patterns are emerging in order to cater to different demands. This section throws some light on these. An architecture pattern expresses a fundamental structural organization or schema for complex systems. It provides a set of predefined subsystems, specifies their unique responsibilities, and includes the decision-enabling rules and guidelines for organizing the relationships between them. The architecture pattern for a software system illustrates the macro-level structure for the whole software solution. A design pattern provides a scheme for refining the subsystems or components of a system, or the relationships between them. It describes a commonly recurring structure of communicating components that solves a general design problem within a particular context. The design pattern for a software system prescribes the ways and means of building the software components. There are other patterns, too. The dawn of the big data era mandates for distributed computing. The monolithic and massive nature of enterprise-scale applications demands microservices-centric applications. Here, application services need to be found and integrated in order to give an integrated result and view. Thus, there are integration-enabled patterns. Similarly, there are patterns for simplifying software deployment and delivery. Other complex actions are being addressed through the smart leverage of simple as well as composite patterns. Software architecture patterns Let's look at some of the prominent and dominant software architecture patterns. Object-oriented architecture (OOA) Objects are the fundamental and foundational building blocks for all kinds of software applications. Therefore, the object-oriented architectural style has become the dominant one for producing object-oriented software applications. Ultimately, a software system is viewed as a dynamic collection of cooperating objects, instead of a set of routines or procedural instructions. We know that there are proven object-oriented programming methods and enabling languages, such as C++, Java, and so on. The properties of inheritance, polymorphism, encapsulation, and composition being provided by OOA come in handy in producing highly modular (highly cohesive and loosely coupled), usable and reusable software applications. The object-oriented style is suitable if we want to encapsulate logic and data together in reusable components. Also, the complex business logic that requires abstraction and dynamic behavior can effectively use this OOA. Component-based assembly (CBD) architecture Monolithic and massive applications can be partitioned into multiple interactive and smaller components. When components are found, bound, and composed, we get the full-fledged software applications.  CBA does not focus on issues such as communication protocols and shared states. Components are reusable, replaceable, substitutable, extensible, independent, and so on. Design patterns such as the dependency injection (DI) pattern or the service locator pattern can be used to manage dependencies between components and promote loose coupling and reuse. Such patterns are often used to build composite applications that combine and reuse components across multiple applications. Aspect-oriented programming (AOP) aspects are another popular application building block. By deft maneuvering of this unit of development, different applications can be built and deployed. The AOP style aims to increase modularity by allowing the separation of cross-cutting concerns. AOP includes programming methods and tools that support the modularization of concerns at the level of the source code. Agent-oriented software engineering (AOSE) is a programming paradigm where the construction of the software is centered on the concept of software agents. In contrast to the proven object-oriented programming, which has objects (providing methods with variable parameters) at its core, agent-oriented programming has externally specified agents with interfaces and messaging capabilities at its core. They can be thought of as abstractions of objects. Exchanged messages are interpreted by receiving agents, in a way specific to its class of agents. Domain-driven design (DDD) architecture Domain-driven design is an object-oriented approach to designing software based on the business domain, its elements and behaviors, and the relationships between them. It aims to enable software systems that are a correct realization of the underlying business domain by defining a domain model expressed in the language of business domain experts. The domain model can be viewed as a framework from which solutions can then be readied and rationalized. DDD is good if we have a complex domain and we wish to improve communication and understanding within the development team. DDD can also be an ideal approach if we have large and complex enterprise data scenarios that are difficult to manage using the existing techniques. Client/server architecture This pattern segregates the system into two main applications, where the client makes requests to the server. In many cases, the server is a database with application logic represented as stored procedures. This pattern helps to design distributed systems that involve a client system and a server system and a connecting network. The main benefits of the client/server architecture pattern are: Higher security: All data gets stored on the server, which generally offers a greater control of security than client machines. Centralized data access: Because data is stored only on the server, access and updates to the data are far easier to administer than in other architectural styles. Ease of maintenance: The server system can be a single machine or a cluster of multiple machines. The server application and the database can be made to run on a single machine or replicated across multiple machines to ensure easy scalability and high availability. However, the traditional two-tier client/server architecture pattern has numerous disadvantages. Firstly, the tendency of keeping both application and data on a server can negatively impact system extensibility and scalability. The server can be a single point of failure. The reliability is the main worry here. To address these issues, the client-server architecture has evolved into the more general three-tier (or N-tier) architecture. This multi-tier architecture not only surmounts the issues just mentioned but also brings forth a set of new benefits. Multi-tier distributed computing architecture The two-tier architecture is neither flexible nor extensible. Hence, multi-tier distributed computing architecture has attracted a lot of attention. The application components can be deployed in multiple machines (these can be co-located and geographically distributed). Application components can be integrated through messages or remote procedure calls (RPCs), remote method invocations (RMIs), common object request broker architecture (CORBA), enterprise Java beans (EJBs), and so on. The distributed deployment of application services ensures high availability, scalability, manageability, and so on. Web, cloud, mobile, and other customer-facing applications are deployed using this architecture. Thus, based on the business requirements and the application complexity, IT teams can choose the simple two-tier client/server architecture or the advanced N-tier distributed architecture to deploy their applications. These patterns are for simplifying the deployment and delivery of software applications to their subscribers and users. Layered/tiered architecture This pattern is an improvement over the client/server architecture pattern. This is the most commonly used architectural pattern. Typically, an enterprise software application comprises three or more layers: presentation/user interface layer, business logic layer, and data persistence layer. The presentation layer is primarily usded for user interface applications (thick clients) or web browsers (thin clients). With the fast proliferation of mobile devices, mobile browsers are also being attached to the presentation layer. Such tiered segregation comes in handy in managing and maintaining each layer accordingly. The power of plug-in and play gets realized with this approach. Additional layers can be fit in as needed. There are model view controller (MVC) pattern-compliant frameworks hugely simplifying enterprise-grade and web-scale applications. MVC is a web application architecture pattern. The main advantage of the layered architecture is the separation of concerns. That is, each layer can focus solely on its role and responsibility. The layered and tiered pattern makes the application: Maintainable Testable Easy to assign specific and separate roles Easy to update and enhance layers separately This architecture pattern is good for developing web-scale, production-grade, and cloud-hosted applications quickly and in a risk-free fashion. When there are business and technology changes, this layered architecture comes in handy in embedding newer things in order to meet varying business requirements. Event-driven architecture (EDA) The world is eventually becoming event-driven. That is, applications have to be sensitive and responsive proactively, pre-emptively, and precisely. Whenever there is an event happening, applications have to receive the event information and plunge into the necessary activities immediately. The request and reply notion paves the way for the fire and forgets tenet. The communication becomes asynchronous. There is no need for the participating applications to be available online all the time. EDA is typically based on an asynchronous message-driven communication model to propagate information throughout an enterprise. It supports a more natural alignment with an organization's operational model by describing business activities as series of events. EDA does not bind functionally disparate systems and teams into the same centralized management model. EDA ultimately leads to highly decoupled systems. The common issues being introduced by system dependencies are getting eliminated through the adoption of the proven and potential EDA. We have seen various forms of events used in different areas. There are business and technical events. Systems update their status and condition emitting events to be captured and subjected to a variety of investigations in order to precisely understand the prevailing situations. The submission of web forms and clicking on some hypertexts generate events to be captured. Incremental database synchronization mechanisms, RFID readings, email messages, short message service (SMS), instant messaging, and so on are events not to be taken lightly. There are event processing engines, message-oriented middleware (MoM) solutions such as message queues and brokers to collect and stock event data and messages. Millions of events can be collected, parsed, and delivered through multiple topics through these MoM solutions. As event sources/producers publish notifications, event receivers can choose to listen to or filter out specific events and make proactive decisions in real-time on what to do next. EDA style is built on the fundamental aspects of event notifications to facilitate immediate information dissemination and reactive business process execution. In an EDA environment, information can be propagated to all the services and applications in real-time. The EDA pattern enables highly reactive enterprise applications. Real-time analytics is the new normal with the surging popularity of the EDA pattern. Service-oriented architecture (SOA) With the arrival of service paradigms, software packages and libraries are being developed as a collection of services. Services are capable of running independently of the underlying technology. Also, services can be implemented using any programming and script languages. Services are self-defined, autonomous, and interoperable, publicly discoverable, assessable, accessible, reusable, and compostable. Services interact with one another through messaging. There are service providers/developers and consumers/clients. Every service has two parts: the interface and the implementation. The interface is the single point of contact for requesting services. Interfaces give the required separation between services. All kinds of deficiencies and differences of service implementation get hidden by the service interface. Precisely speaking, SOA enables application functionality to be provided as a set of services, and the creation of personal as well as professional applications that make use of software services. In short, SOA is for service-enablement and service-based integration of monolithic and massive applications. The complexity of enterprise process/application integration gets moderated through the smart leverage of the service paradigm. To summarize, we detailed the prominent and dominant software architecture patterns and how they are used for producing and running any kind of enterprise-class and production-grade software applications. To know more about patterns associated with object-oriented, component-based, client-server, and cloud architectures, grab the book Architectural Patterns. Why we need Design Patterns? Implementing 5 Common Design Patterns in JavaScript (ES8) An Introduction to Node.js Design Patterns
Read more
  • 0
  • 0
  • 12773

article-image-famous-gang-of-four-design-patterns
Sugandha Lahoti
10 Jul 2018
14 min read
Save for later

Meet the famous 'Gang of Four' design patterns

Sugandha Lahoti
10 Jul 2018
14 min read
A design pattern is a reusable solution to a recurring problem in software design. It is not a finished piece of code but a template that helps to solve a particular problem or family of problems. In this article, we will talk about the Gang of Four design patterns. The gang of four, authors Erich Gamma, Richard Helm, Ralph Johnson and John Vlissides, initiated the concept of Design Pattern in Software development. These authors are collectively known as Gang of Four (GOF). We are going to focus on the design patterns from the Scala point of view. All different design patterns can be grouped into the following types: Creational Structural Behavioral These three groups contain the famous Gang of Four design patterns.  In the next few subsections, we will explain the main characteristics of the listed groups and briefly present the actual design patterns that fall under them. This article is an excerpt from Scala Design Patterns - Second Edition by Ivan Nikolov. In this book, you will learn how to write efficient, clean, and reusable code with Scala. Creational design patterns The creational design patterns deal with object creation mechanisms. Their purpose is to create objects in a way that is suitable to the current situation, which could lead to unnecessary complexity and the need for extra knowledge if they were not there. The main ideas behind the creational design patterns are as follows: Knowledge encapsulation about the concrete classes Hiding details about the actual creation and how objects are combined We will be focusing on the following creational design patterns in this article: The abstract factory design pattern The factory method design pattern The lazy initialization design pattern The singleton design pattern The object pool design pattern The builder design pattern The prototype design pattern The following few sections give a brief definition of what these patterns are. The abstract factory design pattern This is used to encapsulate a group of individual factories that have a common theme. When used, the developer creates a specific implementation of the abstract factory and uses its methods in the same way as in the factory design pattern to create objects. It can be thought of as another layer of abstraction that helps to instantiate classes. The factory method design pattern This design pattern deals with the creation of objects without explicitly specifying the actual class that the instance will have—it could be something that is decided at runtime based on many factors. Some of these factors can include operating systems, different data types, or input parameters. It gives developers the peace of mind of just calling a method rather than invoking a concrete constructor. The lazy initialization design pattern This design pattern is an approach to delay the creation of an object or the evaluation of a value until the first time it is needed. It is much more simplified in Scala than it is in an object-oriented language such as Java. The singleton design pattern This design pattern restricts the creation of a specific class to just one object. If more than one class in the application tries to use such an instance, then this same instance is returned for everyone. This is another design pattern that can be easily achieved with the use of basic Scala features. The object pool design pattern This design pattern uses a pool of objects that are already instantiated and ready for use. Whenever someone requires an object from the pool, it is returned, and after the user is finished with it, it puts it back into the pool manually or automatically. A common use for pools are database connections, which generally are expensive to create; hence, they are created once and then served to the application on request. The builder design pattern The builder design pattern is extremely useful for objects with many possible constructor parameters that would otherwise require developers to create many overrides for the different scenarios an object could be created in. This is different to the factory design pattern, which aims to enable polymorphism. Many of the modern libraries today employ this design pattern. As we will see later, Scala can achieve this pattern really easily. The prototype design pattern This design pattern allows object creation using a clone() method from an already created instance. It can be used in cases when a specific resource is expensive to create or when the abstract factory pattern is not desired. Structural design patterns Structural design patterns exist in order to help establish the relationships between different entities in order to form larger structures. They define how each component should be structured so that it has very flexible interconnecting modules that can work together in a larger system. The main features of structural design patterns include the following: The use of the composition to combine the implementations of multiple objects Help build a large system made of various components by maintaining a high level of flexibility In this article, we will focus on the following structural design patterns: The adapter design pattern The decorator design pattern The bridge design pattern The composite design pattern The facade design pattern The flyweight design pattern The proxy design pattern The next subsections will put some light on what these patterns are about. The adapter design pattern The adapter design pattern allows the interface of an existing class to be used from another interface. Imagine that there is a client who expects your class to expose a doWork() method. You might have the implementation ready in another class, but the method is called differently and is incompatible. It might require extra parameters too. This could also be a library that the developer doesn't have access to for modifications. This is where the adapter can help by wrapping the functionality and exposing the required methods. The adapter is useful for integrating the existing components. In Scala, the adapter design pattern can be easily achieved using implicit classes. The decorator design pattern Decorators are a flexible alternative to sub classing. They allow developers to extend the functionality of an object without affecting other instances of the same class. This is achieved by wrapping an object of the extended class into one that extends the same class and overrides the methods whose functionality is supposed to be changed. Decorators in Scala can be built much more easily using another design pattern called stackable traits. The bridge design pattern The purpose of the bridge design pattern is to decouple an abstraction from its implementation so that the two can vary independently. It is useful when the class and its functionality vary a lot. The bridge reminds us of the adapter pattern, but the difference is that the adapter pattern is used when something is already there and you cannot change it, while the bridge design pattern is used when things are being built. It helps us to avoid ending up with multiple concrete classes that will be exposed to the client. You will get a clearer understanding when we delve deeper into the topic, but for now, let's imagine that we want to have a FileReader class that supports multiple different platforms. The bridge will help us end up with FileReader, which will use a different implementation, depending on the platform. In Scala, we can use self-types in order to implement a bridge design pattern. The composite design pattern The composite is a partitioning design pattern that represents a group of objects that are to be treated as only one object. It allows developers to treat individual objects and compositions uniformly and to build complex hierarchies without complicating the source code. An example of composite could be a tree structure where a node can contain other nodes, and so on. The facade design pattern The purpose of the facade design pattern is to hide the complexity of a system and its implementation details by providing the client with a simpler interface to use. This also helps to make the code more readable and to reduce the dependencies of the outside code. It works as a wrapper around the system that is being simplified and, of course, it can be used in conjunction with some of the other design patterns mentioned previously. The flyweight design pattern The flyweight design pattern provides an object that is used to minimize memory usage by sharing it throughout the application. This object should contain as much data as possible. A common example given is a word processor, where each character's graphical representation is shared with the other same characters. The local information then is only the position of the character, which is stored internally. The proxy design pattern The proxy design pattern allows developers to provide an interface to other objects by wrapping them. They can also provide additional functionality, for example, security or thread-safety. Proxies can be used together with the flyweight pattern, where the references to shared objects are wrapped inside proxy objects. Behavioral design patterns Behavioral design patterns increase communication flexibility between objects based on the specific ways they interact with each other. Here, creational patterns mostly describe a moment in time during creation, structural patterns describe a more or less static structure, and behavioral patterns describe a process or flow. They simplify this flow and make it more understandable. The main features of behavioral design patterns are as follows: What is being described is a process or flow The flows are simplified and made understandable They accomplish tasks that would be difficult or impossible to achieve with objects In this article, we will focus our attention on the following behavioral design patterns: The value object design pattern The null object design pattern The strategy design pattern The command design pattern The chain of responsibility design pattern The interpreter design pattern The iterator design pattern The mediator design pattern The memento design pattern The observer design pattern The state design pattern The template method design pattern The visitor design pattern The following subsections will give brief definitions of the aforementioned behavioral design patterns. The value object design pattern Value objects are immutable and their equality is based not on their identity, but on their fields being equal. They can be used as data transfer objects, and they can represent dates, colors, money amounts, numbers, and more. Their immutability makes them really useful in multithreaded programming. The Scala programming language promotes immutability, and value objects are something that naturally occur there. The null object design pattern Null objects represent the absence of a value and they define a neutral behavior. This approach removes the need to check for null references and makes the code much more concise. Scala adds the concept of optional values, which can replace this pattern completely. The strategy design pattern The strategy design pattern allows algorithms to be selected at runtime. It defines a family of interchangeable encapsulated algorithms and exposes a common interface to the client. Which algorithm is chosen could depend on various factors that are determined while the application runs. In Scala, we can simply pass a function as a parameter to a method, and depending on the function, a different action will be performed. The command design pattern This design pattern represents an object that is used to store information about an action that needs to be triggered at a later time. The information includes the following: The method name The owner of the method Parameter values The client then decides which commands need to be executed and when by the invoker. This design pattern can easily be implemented in Scala using the by-name parameters feature of the language. The chain of responsibility design pattern The chain of responsibility is a design pattern where the sender of a request is decoupled from its receiver. This way, it makes it possible for multiple objects to handle the request and to keep logic nicely separated. The receivers form a chain where they pass the request and, if possible, they process it, and if not, they pass it to the next receiver. There are variations where a handler might dispatch the request to multiple other handlers at the same time. This somehow reminds us of function composition, which in Scala can be achieved using the stackable traits design pattern. The interpreter design pattern The interpreter design pattern is based on the ability to characterize a well-known domain with a language with a strict grammar. It defines classes for each grammar rule in order to interpret sentences in the given language. These classes are likely to represent hierarchies as grammar is usually hierarchical as well. Interpreters can be used in different parsers, for example, SQL or other languages. The iterator design pattern The iterator design pattern is when an iterator is used to traverse a container and access its elements. It helps to decouple containers from the algorithms performed on them. What an iterator should provide is sequential access to the elements of an aggregate object without exposing the internal representation of the iterated collection. The mediator design pattern This pattern encapsulates the communication between different classes in an application. Instead of interacting directly with each other, objects communicate through the mediator, which reduces the dependencies between them, lowers the coupling, and makes the overall application easier to read and maintain. The memento design pattern This pattern provides the ability to roll back an object to its previous state. It is implemented with three objects—originator, caretaker, and memento. The originator is the object with the internal state; the caretaker will modify the originator, and a memento is an object that contains the state that the originator returns. The originator knows how to handle a memento in order to restore its previous state. The observer design pattern This design pattern allows the creation of publish/subscribe systems. There is a special object called subject that automatically notifies all the observers when there are any changes in the state. This design pattern is popular in various GUI toolkits and generally where event handling is needed. It is also related to reactive programming, which is enabled by libraries such as Akka. We will see an example of this towards the end of this book. The state design pattern This design pattern is similar to the strategy design pattern, and it uses a state object to encapsulate different behavior for the same object. It improves the code's readability and maintainability by avoiding the use of large conditional statements. The template method design pattern This design pattern defines the skeleton of an algorithm in a method and then passes some of the actual steps to the subclasses. It allows developers to alter some of the steps of an algorithm without having to modify its structure. An example of this could be a method in an abstract class that calls other abstract methods, which will be defined in the children. The visitor design pattern The visitor design pattern represents an operation to be performed on the elements of an object structure. It allows developers to define a new operation without changing the original classes. Scala can minimize the verbosity of this pattern compared to the pure object-oriented way of implementing it by passing functions to methods. Choosing a design pattern As we already saw, there are a huge number of design patterns. In many cases, they are suitable to be used in combinations as well. Unfortunately, there is no definite answer regarding how to choose the concept of designing our code. There are many factors that could affect the final decision, and you should ask yourselves the following questions: Is this piece of code going to be fairly static or will it change in the future? Do we have to dynamically decide what algorithms to use? Is our code going to be used by others? Do we have an agreed interface? What libraries are we planning to use, if any? Are there any special performance requirements or limitations? This is by no means an exhaustive list of questions. There is a huge amount of factors that could dictate our decision in how we build our systems. It is, however, really important to have a clear specification, and if something seems missing, it should always be checked first. By now, we have a fair idea about what a design pattern is and how it can affect the way we write our code. We've iterated through the most famous Gang of Four design patterns out there, and we have outlined the main differences between them. To know more on how to incorporate functional patterns effectively in real-life applications, read our book Scala Design Patterns - Second Edition. Implementing 5 Common Design Patterns in JavaScript (ES8) An Introduction to Node.js Design Patterns
Read more
  • 0
  • 0
  • 20766

article-image-common-big-data-design-patterns
Sugandha Lahoti
08 Jul 2018
17 min read
Save for later

Common big data design patterns

Sugandha Lahoti
08 Jul 2018
17 min read
Design patterns have provided many ways to simplify the development of software applications. Now that organizations are beginning to tackle applications that leverage new sources and types of big data, design patterns for big data are needed. These big data design patterns aim to reduce complexity, boost the performance of integration and improve the results of working with new and larger forms of data. This article intends to introduce readers to the common big data design patterns based on various data layers such as data sources and ingestion layer, data storage layer and data access layer. This article is an excerpt from Architectural Patterns by Pethuru Raj, Anupama Raman, and Harihara Subramanian. In this book, you will learn the importance of architectural and design patterns in business-critical applications. Data sources and ingestion layer Enterprise big data systems face a variety of data sources with non-relevant information (noise) alongside relevant (signal) data. Noise ratio is very high compared to signals, and so filtering the noise from the pertinent information, handling high volumes, and the velocity of data is significant. This is the responsibility of the ingestion layer. The common challenges in the ingestion layers are as follows: Multiple data source load and prioritization Ingested data indexing and tagging Data validation and cleansing Data transformation and compression The preceding diagram depicts the building blocks of the ingestion layer and its various components. We need patterns to address the challenges of data sources to ingestion layer communication that takes care of performance, scalability, and availability requirements. In this section, we will discuss the following ingestion and streaming patterns and how they help to address the challenges in ingestion layers. We will also touch upon some common workload patterns as well, including: Multisource extractor Multidestination Protocol converter Just-in-time (JIT) transformation Real-time streaming pattern Multisource extractor An approach to ingesting multiple data types from multiple data sources efficiently is termed a Multisource extractor. Efficiency represents many factors, such as data velocity, data size, data frequency, and managing various data formats over an unreliable network, mixed network bandwidth, different technologies, and systems: The multisource extractor system ensures high availability and distribution. It also confirms that the vast volume of data gets segregated into multiple batches across different nodes. The single node implementation is still helpful for lower volumes from a handful of clients, and of course, for a significant amount of data from multiple clients processed in batches. Partitioning into small volumes in clusters produces excellent results. Data enrichers help to do initial data aggregation and data cleansing. Enrichers ensure file transfer reliability, validations, noise reduction, compression, and transformation from native formats to standard formats. Collection agent nodes represent intermediary cluster systems, which helps final data processing and data loading to the destination systems. The following are the benefits of the multisource extractor: Provides reasonable speed for storing and consuming the data Better data prioritization and processing Drives improved business decisions Decoupled and independent from data production to data consumption Data semantics and detection of changed data Scaleable and fault tolerance system The following are the impacts of the multisource extractor: Difficult or impossible to achieve near real-time data processing Need to maintain multiple copies in enrichers and collection agents, leading to data redundancy and mammoth data volume in each node High availability trade-off with high costs to manage system capacity growth Infrastructure and configuration complexity increases to maintain batch processing Multidestination pattern In multisourcing, we saw the raw data ingestion to HDFS, but in most common cases the enterprise needs to ingest raw data not only to new HDFS systems but also to their existing traditional data storage, such as Informatica or other analytics platforms. In such cases, the additional number of data streams leads to many challenges, such as storage overflow, data errors (also known as data regret), an increase in time to transfer and process data, and so on. The multidestination pattern is considered as a better approach to overcome all of the challenges mentioned previously. This pattern is very similar to multisourcing until it is ready to integrate with multiple destinations (refer to the following diagram). The router publishes the improved data and then broadcasts it to the subscriber destinations (already registered with a publishing agent on the router). Enrichers can act as publishers as well as subscribers: Deploying routers in the cluster environment is also recommended for high volumes and a large number of subscribers. The following are the benefits of the multidestination pattern: Highly scalable, flexible, fast, resilient to data failure, and cost-effective Organization can start to ingest data into multiple data stores, including its existing RDBMS as well as NoSQL data stores Allows you to use simple query language, such as Hive and Pig, along with traditional analytics Provides the ability to partition the data for flexible access and decentralized processing Possibility of decentralized computation in the data nodes Due to replication on HDFS nodes, there are no data regrets Self-reliant data nodes can add more nodes without any delay The following are the impacts of the multidestination pattern: Needs complex or additional infrastructure to manage distributed nodes Needs to manage distributed data in secured networks to ensure data security Needs enforcement, governance, and stringent practices to manage the integrity and consistency of data Protocol converter This is a mediatory approach to provide an abstraction for the incoming data of various systems. The protocol converter pattern provides an efficient way to ingest a variety of unstructured data from multiple data sources and different protocols. The message exchanger handles synchronous and asynchronous messages from various protocol and handlers as represented in the following diagram. It performs various mediator functions, such as file handling, web services message handling, stream handling, serialization, and so on: In the protocol converter pattern, the ingestion layer holds responsibilities such as identifying the various channels of incoming events, determining incoming data structures, providing mediated service for multiple protocols into suitable sinks, providing one standard way of representing incoming messages, providing handlers to manage various request types, and providing abstraction from the incoming protocol layers. Just-In-Time (JIT) transformation pattern The JIT transformation pattern is the best fit in situations where raw data needs to be preloaded in the data stores before the transformation and processing can happen. In this kind of business case, this pattern runs independent preprocessing batch jobs that clean, validate, corelate, and transform, and then store the transformed information into the same data store (HDFS/NoSQL); that is, it can coexist with the raw data: The preceding diagram depicts the datastore with raw data storage along with transformed datasets. Please note that the data enricher of the multi-data source pattern is absent in this pattern and more than one batch job can run in parallel to transform the data as required in the big data storage, such as HDFS, Mongo DB, and so on. Real-time streaming pattern Most modern businesses need continuous and real-time processing of unstructured data for their enterprise big data applications. Real-time streaming implementations need to have the following characteristics: Minimize latency by using large in-memory Event processors are atomic and independent of each other and so are easily scalable Provide API for parsing the real-time information Independent deployable script for any node and no centralized master node implementation The real-time streaming pattern suggests introducing an optimum number of event processing nodes to consume different input data from the various data sources and introducing listeners to process the generated events (from event processing nodes) in the event processing engine: Event processing engines (event processors) have a sizeable in-memory capacity, and the event processors get triggered by a specific event. The trigger or alert is responsible for publishing the results of the in-memory big data analytics to the enterprise business process engines and, in turn, get redirected to various publishing channels (mobile, CIO dashboards, and so on). Big data workload patterns Workload patterns help to address data workload challenges associated with different domains and business cases efficiently. The big data design pattern manifests itself in the solution construct, and so the workload challenges can be mapped with the right architectural constructs and thus service the workload. The following diagram depicts a snapshot of the most common workload patterns and their associated architectural constructs: Workload design patterns help to simplify and decompose the business use cases into workloads. Then those workloads can be methodically mapped to the various building blocks of the big data solution architecture. Data storage layer Data storage layer is responsible for acquiring all the data that are gathered from various data sources and it is also liable for converting (if needed) the collected data to a format that can be analyzed. The following sections discuss more on data storage layer patterns. ACID versus BASE versus CAP Traditional RDBMS follows atomicity, consistency, isolation, and durability (ACID) to provide reliability for any user of the database. However, searching high volumes of big data and retrieving data from those volumes consumes an enormous amount of time if the storage enforces ACID rules. So, big data follows basically available, soft state, eventually consistent (BASE), a phenomenon for undertaking any search in big data space. Database theory suggests that the NoSQL big database may predominantly satisfy two properties and relax standards on the third, and those properties are consistency, availability, and partition tolerance (CAP). With the ACID, BASE, and CAP paradigms, the big data storage design patterns have gained momentum and purpose. We will look at those patterns in some detail in this section. The patterns are: Façade pattern NoSQL pattern Polyglot pattern Façade pattern This pattern provides a way to use existing or traditional existing data warehouses along with big data storage (such as Hadoop). It can act as a façade for the enterprise data warehouses and business intelligence tools. In the façade pattern, the data from the different data sources get aggregated into HDFS before any transformation, or even before loading to the traditional existing data warehouses: The façade pattern allows structured data storage even after being ingested to HDFS in the form of structured storage in an RDBMS, or in NoSQL databases, or in a memory cache. The façade pattern ensures reduced data size, as only the necessary data resides in the structured storage, as well as faster access from the storage. NoSQL pattern This pattern entails getting NoSQL alternatives in place of traditional RDBMS to facilitate the rapid access and querying of big data. The NoSQL database stores data in a columnar, non-relational style. It can store data on local disks as well as in HDFS, as it is HDFS aware. Thus, data can be distributed across data nodes and fetched very quickly. Let's look at four types of NoSQL databases in brief: Column-oriented DBMS: Simply called a columnar store or big table data store, it has a massive number of columns for each tuple. Each column has a column key. Column family qualifiers represent related columns so that the columns and the qualifiers are retrievable, as each column has a column key as well. These data stores are suitable for fast writes. Key-value pair database: A key-value database is a data store that, when presented with a simple string (key), returns an arbitrarily large data (value). The key is bound to the value until it gets a new value assigned into or from a database. The key-value data store does not need to have a query language. It provides a way to add and remove key-value pairs. A key-value store is a dictionary kind of data store, where it has a list of words and each word represents one or more definitions. Graph database: This is a representation of a system that contains a sequence of nodes and relationships that creates a graph when combined. A graph represents three data fields: nodes, relationships, and properties. Some types of graph store are referred to as triple stores because of their node-relationship-node structure. You may be familiar with applications that provide evaluations of similar or likely characteristics as part of the search (for example, a user bought this item also bought... is a good illustration of graph store implementations). Document database: We can represent a graph data store as a tree structure. Document trees have a single root element or sometimes even multiple root elements as well. Note that there is a sequence of branches, sub-branches, and values beneath the root element. Each branch can have an expression or relative path to determine the traversal path from the origin node (root) and to any given branch, sub-branch, or value. Each branch may have a value associated with that branch. Sometimes the existence of a branch of the tree has a specific meaning, and sometimes a branch must have a given value to be interpreted correctly. The following table summarizes some of the NoSQL use cases, providers, tools and scenarios that might need NoSQL pattern considerations. Most of this pattern implementation is already part of various vendor implementations, and they come as out-of-the-box implementations and as plug and play so that any enterprise can start leveraging the same quickly. NoSQL DB to Use Scenario Vendor / Application / Tools Columnar database Application that needs to fetch entire related columnar family based on a given string: for example, search engines SAP HANA / IBM DB2 BLU / ExtremeDB / EXASOL / IBM Informix / MS SQL Server / MonetDB Key Value Pair database Needle in haystack applications (refer to the Big data workload patterns given in this section) Redis / Oracle NoSQL DB / Linux DBM / Dynamo / Cassandra Graph database Recommendation engine: application that provides evaluation of Similar to / Like: for example, User that bought this item also bought ArangoDB / Cayley / DataStax / Neo4j / Oracle Spatial and Graph / Apache Orient DB / Teradata Aster Document database Applications that evaluate churn management of social media data or non-enterprise data Couch DB / Apache Elastic Search / Informix / Jackrabbit / Mongo DB / Apache SOLR Polyglot pattern Traditional (RDBMS) and multiple storage types (files, CMS, and so on) coexist with big data types (NoSQL/HDFS) to solve business problems. Most modern business cases need the coexistence of legacy databases. At the same time, they would need to adopt the latest big data techniques as well. Replacing the entire system is not viable and is also impractical. The polyglot pattern provides an efficient way to combine and use multiple types of storage mechanisms, such as Hadoop, and RDBMS. Big data appliances coexist in a storage solution: The preceding diagram represents the polyglot pattern way of storing data in different storage types, such as RDBMS, key-value stores, NoSQL database, CMS systems, and so on. Unlike the traditional way of storing all the information in one single data source, polyglot facilitates any data coming from all applications across multiple sources (RDBMS, CMS, Hadoop, and so on) into different storage mechanisms, such as in-memory, RDBMS, HDFS, CMS, and so on. Data access layer Data access in traditional databases involves JDBC connections and HTTP access for documents. However, in big data, the data access with conventional method does take too much time to fetch even with cache implementations, as the volume of the data is so high. So we need a mechanism to fetch the data efficiently and quickly, with a reduced development life cycle, lower maintenance cost, and so on. Data access patterns mainly focus on accessing big data resources of two primary types: End-to-end user-driven API (access through simple queries) Developer API (access provision through API methods) In this section, we will discuss the following data access patterns that held efficient data access, improved performance, reduced development life cycles, and low maintenance costs for broader data access: Connector pattern Lightweight stateless pattern Service locator pattern Near real-time pattern Stage transform pattern The preceding diagram represents the big data architecture layouts where the big data access patterns help data access. We discuss the whole of that mechanism in detail in the following sections. Connector pattern The developer API approach entails fast data transfer and data access services through APIs. It creates optimized data sets for efficient loading and analysis. Some of the big data appliances abstract data in NoSQL DBs even though the underlying data is in HDFS, or a custom implementation of a filesystem so that the data access is very efficient and fast. The connector pattern entails providing developer API and SQL like query language to access the data and so gain significantly reduced development time. As we saw in the earlier diagram, big data appliances come with connector pattern implementation. The big data appliance itself is a complete big data ecosystem and supports virtualization, redundancy, replication using protocols (RAID), and some appliances host NoSQL databases as well. The preceding diagram shows a sample connector implementation for Oracle big data appliances. The data connector can connect to Hadoop and the big data appliance as well. It is an example of a custom implementation that we described earlier to facilitate faster data access with less development time. Lightweight stateless pattern This pattern entails providing data access through web services, and so it is independent of platform or language implementations. The data is fetched through restful HTTP calls, making this pattern the most sought after in cloud deployments. WebHDFS and HttpFS are examples of lightweight stateless pattern implementation for HDFS HTTP access. It uses the HTTP REST protocol. The HDFS system exposes the REST API (web services) for consumers who analyze big data. This pattern reduces the cost of ownership (pay-as-you-go) for the enterprise, as the implementations can be part of an integration Platform as a Service (iPaaS): The preceding diagram depicts a sample implementation for HDFS storage that exposes HTTP access through the HTTP web interface. Near real-time pattern For any enterprise to implement real-time data access or near real-time data access, the key challenges to be addressed are: Rapid determination of data: Ensure rapid determination of data and make swift decisions (within a few seconds, not in minutes) before the data becomes meaningless Rapid analysis: Ability to analyze the data in real time and spot anomalies and relate them to business events, provide visualization, and generate alerts at the moment that the data arrived Some examples of systems that would need real-time data analysis are: Radar systems Customer services applications ATMs Social media platforms Intrusion detection systems Storm and in-memory applications such as Oracle Coherence, Hazelcast IMDG, SAP HANA, TIBCO, Software AG (Terracotta), VMware, and Pivotal GemFire XD are some of the in-memory computing vendor/technology platforms that can implement near real-time data access pattern applications: As shown in the preceding diagram, with multi-cache implementation at the ingestion phase, and with filtered, sorted data in multiple storage destinations (here one of the destinations is a cache), one can achieve near real-time access. The cache can be of a NoSQL database, or it can be any in-memory implementations tool, as mentioned earlier. The preceding diagram depicts a typical implementation of a log search with SOLR as a search engine. Stage transform pattern In the big data world, a massive volume of data can get into the data store. However, all of the data is not required or meaningful in every business case. The stage transform pattern provides a mechanism for reducing the data scanned and fetches only relevant data. HDFS has raw data and business-specific data in a NoSQL database that can provide application-oriented structures and fetch only the relevant data in the required format: Combining the stage transform pattern and the NoSQL pattern is the recommended approach in cases where a reduced data scan is the primary requirement. The preceding diagram depicts one such case for a recommendation engine where we need a significant reduction in the amount of data scanned for an improved customer experience. The implementation of the virtualization of data from HDFS to a NoSQL database, integrated with a big data appliance, is a highly recommended mechanism for rapid or accelerated data fetch. We discussed big data design patterns by layers such as data sources and ingestion layer, data storage layer and data access layer. To know more about patterns associated with object-oriented, component-based, client-server, and cloud architectures, read our book Architectural Patterns. Why we need Design Patterns? Implementing 5 Common Design Patterns in JavaScript (ES8) An Introduction to Node.js Design Patterns
Read more
  • 0
  • 0
  • 36941
Unlock access to the largest independent learning library in Tech for FREE!
Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
Renews at $19.99/month. Cancel anytime