Big data use case patterns
Many technical scenarios share similar patterns, so it is useful to map each scenario to an architectural pattern. Once these patterns are understood, they become the fundamental building blocks of solutions. The following sections discuss five such patterns.
Note
These solutions are not always optimal; the right choice may depend on the domain, the type of data, and other factors. The examples are meant to help visualize a problem and guide you toward a solution.
Big data as a storage pattern
Big data systems can be used as a storage layer or as a data warehouse, where data from multiple sources, even of different types, can be stored and used later. The usage scenario and use case are as follows:
- Usage scenario:
  - Data is continuously generated in large volumes
  - Data needs preprocessing before it is loaded into the target system
- Use case:
  - Captured machine data awaiting cleansing can be merged into one or more large files and loaded into Hadoop for computation
  - Unstructured data from multiple sources is captured for later analysis of emerging patterns
  - Data loaded into Hadoop is processed and filtered; depending on the data, the final store can be a data warehouse, Hadoop itself, or a NoSQL system
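To make the merge step concrete, here is a minimal Python sketch that batches many small machine-data files into a few large ones before they are loaded into Hadoop (HDFS handles a small number of large files more efficiently than many small ones). The function names, the `merged-NNNNN.dat` naming scheme, and the 128 MB target size are assumptions for illustration, not part of any Hadoop API.

```python
import shutil
from pathlib import Path
from typing import List, Sequence, Tuple

# Assumption: target ~128 MB per merged file, a common HDFS block size; tune per cluster.
DEFAULT_TARGET_BYTES = 128 * 1024 * 1024


def plan_batches(files: Sequence[Tuple[str, int]],
                 target_bytes: int = DEFAULT_TARGET_BYTES) -> List[List[str]]:
    """Group (name, size) pairs, in order, into batches of at most target_bytes.

    A single file larger than target_bytes is placed in a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in files:
        # Close the current batch if adding this file would overflow it.
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


def merge_small_files(paths: Sequence[Path], out_dir: Path,
                      target_bytes: int = DEFAULT_TARGET_BYTES) -> List[Path]:
    """Concatenate small input files into a few large files ready for loading."""
    out_dir.mkdir(parents=True, exist_ok=True)
    by_name = {str(p): p for p in paths}
    plan = plan_batches([(str(p), p.stat().st_size) for p in paths], target_bytes)
    outputs: List[Path] = []
    for i, batch in enumerate(plan):
        out_path = out_dir / f"merged-{i:05d}.dat"  # hypothetical naming scheme
        with out_path.open("wb") as out:
            for name in batch:
                with by_name[name].open("rb") as src:
                    shutil.copyfileobj(src, out)  # streams data instead of reading it all at once
        outputs.append(out_path)
    return outputs
```

The merged files could then be copied into HDFS, for example with `hdfs dfs -put merged-*.dat /data/raw/` (the target directory here is hypothetical).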
The storage pattern is shown in the following figure:
Big data as a data transformation pattern
Big data systems can be designed to perform transformation as part of data loading and cleansing, and because the work runs in parallel, many transformations complete faster than on traditional systems. Transformation is the middle phase of Extract–Transform–Load (ETL), which covers data ingestion and cleansing. The usage scenario and use case are as follows:
- Usage scenario:
  - A large volume of raw data needs to be preprocessed
  - Data types include both structured and unstructured data
- Use case:
  - ETL (Extract–Transform–Load) tools such as Pentaho and Talend have evolved to leverage big data. In Hadoop, ELT (Extract–Load–Transform) is also gaining popularity: loading raw data into Hadoop is fast, and cleansing can then run as a parallel process that cleans and transforms the input more quickly
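The ELT idea can be sketched in a few lines of Python, assuming the raw data has already been loaded and split into partitions of text lines in a hypothetical `sensor_id,value,unit` format. Each partition is cleansed independently, which is what allows the transform step to run in parallel. The sketch uses a thread pool so it runs anywhere; for CPU-bound work on one machine, `ProcessPoolExecutor` offers the same interface, and on a cluster a framework such as MapReduce or Spark would distribute the partitions instead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List, Optional, Tuple

Record = Tuple[str, float]  # (sensor_id, temperature in Celsius)


def clean_line(line: str) -> Optional[Record]:
    """Parse one raw 'sensor_id,value,unit' line; return None if it is malformed."""
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != 3 or not parts[0]:
        return None
    sensor_id, raw_value, unit = parts
    try:
        value = float(raw_value)
    except ValueError:
        return None
    if unit.upper() == "F":
        value = (value - 32) * 5 / 9  # normalize Fahrenheit to Celsius
    elif unit.upper() != "C":
        return None  # unknown unit: drop the record
    return sensor_id, round(value, 2)


def transform_partition(lines: List[str]) -> List[Record]:
    """Transform step for one partition: parse, validate, and normalize."""
    return [r for r in map(clean_line, lines) if r is not None]


def transform_in_parallel(partitions: Iterable[List[str]], workers: int = 4) -> List[Record]:
    """Cleanse every partition concurrently and gather the results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(transform_partition, partitions))
    return [rec for part in results for rec in part]
```

`Executor.map` returns results in the order of its inputs, so the output preserves partition order even though partitions may finish at different times.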
The data transformation pattern is shown in the following figure:
Big data for a data analysis pattern
Data analytics is of broad interest in big data systems, where huge amounts of data can be analyzed to generate statistical reports and insights that help businesses understand patterns in their data. The usage scenario and use case are as follows:
- Usage scenario:
  - Faster response time for detecting patterns
  - Analysis of unstructured data
- Use case:
  - Fast turnaround for machine data analysis (for example, analysis of seismic data)
  - Pattern detection across structured and unstructured data (for example, fraud analysis)
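The fraud-analysis example can be illustrated with the map, shuffle, and reduce phases that Hadoop MapReduce runs across a cluster, simulated here in a single Python process. The rule applied (flag any card used in more than `max_countries` distinct countries) and the transaction record shape are hypothetical; real fraud detection combines many such signals.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Txn = Dict[str, str]  # hypothetical record shape: {"card": ..., "country": ...}


def map_phase(txns: Iterable[Txn]) -> Iterator[Tuple[str, str]]:
    """Map: emit one (key, value) = (card, country) pair per transaction."""
    for t in txns:
        yield t["card"], t["country"]


def shuffle_phase(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Shuffle: group values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[str]], max_countries: int) -> List[str]:
    """Reduce: flag cards seen in more distinct countries than max_countries."""
    return sorted(card for card, countries in groups.items()
                  if len(set(countries)) > max_countries)


def suspicious_cards(txns: Iterable[Txn], max_countries: int = 2) -> List[str]:
    return reduce_phase(shuffle_phase(map_phase(txns)), max_countries)
```

Because each reducer only needs the values for its own keys, the reduce phase parallelizes naturally once the shuffle has grouped the data.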
Big data for data in a real-time pattern
When integrated with streaming libraries and systems, big data systems can handle large-scale real-time data processing. Real-time processing of large, complex workloads poses many challenges, such as performance, scalability, availability, resource management, and low latency. Streaming technologies such as Storm and Spark Streaming can be integrated with YARN. The usage scenario and use case are as follows:
- Usage scenario:
  - Deciding which action to take based on continuously changing data in real time
- Use case:
  - Automated process control based on real-time data from manufacturing equipment
  - Real-time changes to plant operations based on events from business systems such as Enterprise Resource Planning (ERP) systems
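At its core, the process-control use case is a continuous computation over a sliding window of events. The following single-process Python sketch keeps a time-based window of sensor readings and invokes a callback when the window average exceeds a threshold. The class, its parameters, and the `on_alert` hook are hypothetical; in a deployment, an engine such as Storm or Spark Streaming would run this kind of logic across partitions of the stream.

```python
from collections import deque
from typing import Callable, Deque, Tuple


class WindowedMonitor:
    """Sliding time window over sensor readings; calls on_alert(timestamp, average)
    whenever the window average exceeds threshold (hypothetical control hook)."""

    def __init__(self, window_seconds: float, threshold: float,
                 on_alert: Callable[[float, float], None]) -> None:
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.on_alert = on_alert
        self._events: Deque[Tuple[float, float]] = deque()
        self._sum = 0.0  # running sum of values currently in the window

    def process(self, timestamp: float, value: float) -> float:
        """Ingest one event (timestamps assumed non-decreasing); return the window average."""
        self._events.append((timestamp, value))
        self._sum += value
        # Evict readings that have fallen out of the window.
        while self._events and self._events[0][0] <= timestamp - self.window_seconds:
            _, old = self._events.popleft()
            self._sum -= old
        avg = self._sum / len(self._events)
        if avg > self.threshold:
            self.on_alert(timestamp, avg)
        return avg
```

Keeping a running sum makes each update amortized O(1), since every event is added and evicted exactly once.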
The data in a real-time pattern is shown in the following figure:
Big data for a low latency caching pattern
As a special case, big data systems can be tuned for low latency when reads far outnumber updates: frequently read data can be held in memory, so it is fetched faster and the overhead of repeated reads from slower storage is avoided. The usage scenario and use case are as follows:
- Usage scenario:
  - Reads far outnumber writes
  - Reads require very low latency and a guaranteed response
  - Distributed, location-based data caching
- Use case:
  - Order promising solutions
  - Cloud-based identity and single sign-on (SSO)
  - Low-latency, real-time personalized offers on mobile devices
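A read-heavy workload like those above is typically served by keeping hot data in memory in front of a slower store. The following Python sketch shows a read-through cache with least-recently-used (LRU) eviction and a per-entry time-to-live (TTL). The class name, the `loader` callback standing in for the backing store, and the default capacity and TTL are assumptions for illustration; a distributed deployment would shard such a cache across nodes.

```python
import time
from collections import OrderedDict
from typing import Callable, Generic, Hashable, Tuple, TypeVar

V = TypeVar("V")


class ReadThroughCache(Generic[V]):
    """In-memory LRU cache with per-entry TTL in front of a slower backing store."""

    def __init__(self, loader: Callable[[Hashable], V], capacity: int = 1024,
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader  # hypothetical: fetches a value from the backing store
        self._capacity = capacity
        self._ttl = ttl_seconds
        self._clock = clock
        self._data: "OrderedDict[Hashable, Tuple[float, V]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> V:
        now = self._clock()
        entry = self._data.get(key)
        if entry is not None and entry[0] > now:  # present and not yet expired
            self._data.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return entry[1]
        # Miss or expired entry: read through to the backing store.
        self.misses += 1
        value = self._loader(key)
        self._data[key] = (now + self._ttl, value)
        self._data.move_to_end(key)
        if len(self._data) > self._capacity:
            self._data.popitem(last=False)  # evict the least recently used entry
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop a key after a (rare) write so the next read reloads it."""
        self._data.pop(key, None)
```

Because writes are rare in this pattern, calling `invalidate` on each write is usually enough to keep entries fresh, and the TTL bounds staleness if an invalidation is missed.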
The low-latency caching pattern is shown in the following figure:
Some widely used technology stacks, organized by layer and framework, are shown in the following image: