Now that we have a good idea of what big data is and what its characteristics are, let's dig into big data modeling. Say we have a dataset that we classify as big data. Before doing any analysis on it, we need an idea of what the data looks like. The goal of data modeling is to formally explore the nature of the data so that you can figure out what kind of storage you need and what kind of processing you can do on it.
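As a rough illustration of what "exploring the nature of the data" can mean in practice, here is a minimal sketch that counts which fields appear in a set of records and which value types they carry. The `profile` function and the sample records are hypothetical, invented for this example. A real big data pipeline would run this kind of profiling on a sample or in a distributed engine rather than in a single Python loop.

```python
from collections import defaultdict


def profile(records):
    """Count, for each field, how often each value type occurs."""
    summary = defaultdict(lambda: defaultdict(int))
    for record in records:
        for field, value in record.items():
            summary[field][type(value).__name__] += 1
    # Convert nested defaultdicts to plain dicts for easy inspection.
    return {field: dict(types) for field, types in summary.items()}


# Hypothetical sample records (assumption: semi-structured, dict-like rows).
sample = [
    {"id": 1, "name": "Ada", "signup": "2021-03-04"},
    {"id": 2, "name": "Lin"},
    {"id": "3", "name": "Sam", "signup": None},
]

report = profile(sample)
for field, types in report.items():
    print(field, types)
```

Even on this tiny sample, the report shows that `id` arrives with mixed types and that `signup` is sometimes missing or empty, which is the kind of finding that shapes storage and processing decisions.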
Data modeling is a technique for making data meaningful: it defines and categorizes the data and establishes official definitions and descriptors so that all information systems in a company can use it.
There are at least two primary reasons for performing data modeling:
- Strategic data modeling facilitates the overall information systems development strategy
- Data modeling can help in the development of new databases
Data modeling for strategic planning means defining what kind of data your company's processes will need, while modeling in the context of analysis focuses on representing data that already exists and finding ways to classify it. In the case of big data, that process usually requires finding similarities between data from disparate sources and confirming that they do, in fact, describe the same thing. In either case, the end goal is a representation of your data that can be replicated in your database architecture.
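To make the idea of reconciling disparate sources concrete, here is a small sketch that maps records from two sources onto one shared model and then checks whether they describe the same entity. Everything here is an assumption made for illustration: the `Customer` model, the source field names, and the normalization rules are hypothetical, not taken from any particular system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Hypothetical shared model that both sources are mapped onto."""
    customer_id: str
    email: str


# Assumed field mappings: source field name -> shared model field name.
CRM_FIELDS = {"cust_id": "customer_id", "email_address": "email"}
SHOP_FIELDS = {"userId": "customer_id", "mail": "email"}


def to_customer(record, field_map):
    """Rename known fields, drop unknown ones, and normalize the values."""
    renamed = {field_map[k]: v for k, v in record.items() if k in field_map}
    return Customer(
        customer_id=str(renamed["customer_id"]).strip(),
        email=renamed["email"].strip().lower(),
    )


# Hypothetical rows from two different systems.
crm_row = {"cust_id": 42, "email_address": "Ada@Example.com "}
shop_row = {"userId": "42", "mail": "ada@example.com", "cart_size": 3}

a = to_customer(crm_row, CRM_FIELDS)
b = to_customer(shop_row, SHOP_FIELDS)

# After normalization the two rows compare equal, so they describe one entity.
same_entity = a == b
print(a, b, same_entity)
```

The shared `Customer` class plays the role of the data model: once both sources are expressed in it, the same structure can be carried over to a table or collection definition in the database.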