Hands-On Big Data Modeling

Effective database design techniques for data architects and business intelligence professionals

Product type Paperback
Published in Nov 2018
Publisher Packt
ISBN-13 9781788620901
Length 306 pages
Edition 1st Edition
Authors (3): James Lee, Tao Wei, Suresh Kumar Mukhiya
Table of Contents

Preface
1. Introduction to Big Data and Data Management
2. Data Modeling and Management Platforms
3. Defining Data Models
4. Categorizing Data Models
5. Structures of Data Models
6. Modeling Structured Data
7. Modeling with Unstructured Data
8. Modeling with Streaming Data
9. Streaming Sensor Data
10. Concept and Approaches of Big-Data Management
11. DBMS to BDMS
12. Modeling Bitcoin Data Points with Python
13. Modeling Twitter Feeds Using Python
14. Modeling Weather Data Points with Python
15. Modeling IMDb Data Points with Python
16. Other Books You May Enjoy

Introduction to big data modeling

Now that we have a good idea of what big data is and what its characteristics are, let's dig into big data modeling. Say we have a dataset that we classify as big data. Before doing any analysis on it, we need an idea of what the data looks like. The goal of data modeling is to formally explore the nature of the data so that you can figure out what kind of storage you need and what kind of processing you can do on it.
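As a rough illustration of getting "an idea of how the data looks", the following minimal sketch profiles a handful of records, counting how often each field appears and which value types it holds. The `profile_records` function and the sample records are hypothetical and invented for this example; they are not taken from the book.

```python
from collections import defaultdict


def profile_records(records):
    """Summarize how often each field appears and which value types it holds."""
    profile = defaultdict(lambda: {"count": 0, "types": set()})
    for record in records:
        for field, value in record.items():
            profile[field]["count"] += 1
            profile[field]["types"].add(type(value).__name__)
    # Convert the type sets to sorted lists so the summary is stable and printable.
    return {
        field: {"count": info["count"], "types": sorted(info["types"])}
        for field, info in profile.items()
    }


# Hypothetical sample records (assumed data, for illustration only).
sample = [
    {"id": 1, "name": "Ada", "score": 91.5},
    {"id": 2, "name": "Lin", "score": "n/a"},  # inconsistent type for "score"
    {"id": 3, "name": "Sam"},                  # "score" is missing entirely
]

summary = profile_records(sample)
for field, info in summary.items():
    print(field, info)
```

A summary like this hints at storage and processing decisions: a field that is sometimes missing or that mixes types may need cleaning, or a more flexible schema, before analysis.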

Data modeling is a technique that gives meaningful insight into data by defining and categorizing it, and by establishing official definitions and descriptors so that the data can be used by all information systems in a company.

There are at least two primary reasons for performing data modeling:

  • Strategic data modeling facilitates the overall information systems development strategy
  • Data modeling can help in the development of new databases

Data modeling for strategic planning means defining what kind of data you will need for your company's processes, while modeling in the context of analysis focuses on representing data that already exists and finding ways to classify it. In the case of big data, that process will probably require finding similarities between data from disparate sources and confirming that they do, in fact, describe the same thing. In either case, the end goal is a representation of your data that can be replicated in your database architecture.
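One way to picture "finding similarities between data from disparate sources" is to map each source's field names onto a shared vocabulary and then compare the normalized records. The sketch below assumes two hypothetical sources (a CRM export and a billing export) with invented field names and values; real matching usually needs fuzzier rules than exact equality.

```python
def normalize(record, field_map):
    """Rename source-specific fields to shared names and normalize text values."""
    normalized = {}
    for source_field, common_field in field_map.items():
        value = record.get(source_field)
        if isinstance(value, str):
            value = value.strip().lower()
        normalized[common_field] = value
    return normalized


def same_entity(a, b, keys):
    """Treat two normalized records as the same entity if all key fields match."""
    return all(a.get(k) is not None and a.get(k) == b.get(k) for k in keys)


# Hypothetical records from two sources (assumed names and values).
crm_record = {"FirstName": "Maria ", "Surname": "Lopez", "Email": "M.Lopez@example.com"}
billing_record = {"given": "maria", "family": "LOPEZ", "contact_email": "m.lopez@example.com"}

# Assumed mappings from each source's fields to a common representation.
CRM_MAP = {"FirstName": "first_name", "Surname": "last_name", "Email": "email"}
BILLING_MAP = {"given": "first_name", "family": "last_name", "contact_email": "email"}

a = normalize(crm_record, CRM_MAP)
b = normalize(billing_record, BILLING_MAP)
match = same_entity(a, b, keys=("first_name", "last_name", "email"))
print(match)
```

The shared field names play the role of the common representation: once both sources are expressed in it, they can be stored and queried in the same structure.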

Uses of models

In this section, we discuss why we need data models and the main benefits we get from studying existing ones. A high-level data model illustrates the core concepts and principles of an organization in a simple way, using simple graphical images and short descriptions. One of its biggest advantages is that it helps us arrive at common terminology and definitions for those concepts and principles.

A database model, by contrast, shows the logical structure of a database, including the relationships and constraints that determine how data can be stored and accessed.

Let's consider a simple student score-recording system. A student has a first name, a last name, and a unique identifier. Each student is associated with an institution, and has a start date and other associated data. A model can represent this more clearly than a paragraph of text, which is harder to follow.

Let's convert it into a model:

Model 1.1
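As a textual counterpart to the diagram, here is a minimal sketch of how the description might map to two related tables, using Python's built-in `sqlite3` module. The table names, column names, and sample rows are assumptions made for illustration and are not taken from Model 1.1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# Assumed schema: one table per entity, linked by institution_id.
conn.executescript("""
CREATE TABLE institution (
    institution_id INTEGER PRIMARY KEY,
    name           TEXT NOT NULL
);
CREATE TABLE student (
    student_id     INTEGER PRIMARY KEY,   -- unique identifier
    first_name     TEXT NOT NULL,
    last_name      TEXT NOT NULL,
    start_date     TEXT NOT NULL,
    institution_id INTEGER NOT NULL REFERENCES institution(institution_id)
);
""")

insert_student = (
    "INSERT INTO student (first_name, last_name, start_date, institution_id) "
    "VALUES (?, ?, ?, ?)"
)

# Hypothetical sample data.
conn.execute("INSERT INTO institution (institution_id, name) VALUES (1, 'Example Institute')")
conn.execute(insert_student, ("Ada", "Smith", "2018-09-01", 1))

rows = conn.execute(
    "SELECT s.first_name, s.last_name, i.name "
    "FROM student AS s JOIN institution AS i ON i.institution_id = s.institution_id"
).fetchall()
print(rows)

# A student pointing at a nonexistent institution violates the model's constraint.
try:
    conn.execute(insert_student, ("Bo", "Lee", "2018-09-01", 99))
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```

The foreign key is what encodes "each student is associated with an institution": the database itself rejects a student row that refers to an institution that does not exist.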

Now, let's consider Model 1.1. It clearly shows the relationship between students and the institutions that provide their courses, and how this data is stored in multiple tables. It's easier to understand than a paragraph. Let's analyze this model and see what benefits it offers compared with a textual representation:

  • Gaining insight: A detailed model shows the process from various angles. In Model 1.1, for example, we can see how students are associated with the provider institutions, the different types of plans, and when a course starts. To start with data modeling, it is important to understand the following:
    • How the business works, so that you can follow the data flow inside the organization.
    • What type of data is gathered and stored in the organization.
    • The business processes and relationships. This knowledge guides us in building data and relationships in a data model.
  • Discussion: The detailed data model can be used for discussions with the stakeholders.
  • Knowledge transfer: The model can be used as a source of documentation for instructing people or developers. Data modeling is a form of documentation, both for business stakeholders and for technical experts. By providing a common vocabulary that different job roles can share, and by giving newcomers a well-thought-out business glossary, it greatly enhances your ability to document and convey information about your business. The model can also be used as a training aid.
  • Verification: The process models are analyzed to find errors in systems or procedures. If your requirements gathering was complete and included the merging of data from multiple sources, as well as query and reporting obligations, you gain business intelligence opportunities that did not exist when your data sat in silos or in haphazardly designed databases.
  • Performance analysis: A detailed model made from the data can be used to analyze the performance of the system by employing several available techniques, such as simulations and dry runs of the model.
  • Specification: A relevant model generated from an organization's data can be used to create a Software Requirement Specification (SRS) document that serves as a roadmap shared between developers and end-user stakeholders.
  • Configuration: The models constructed from data can be applied to configure a system. A detailed model constructed with precision shows the relationship between modules and how a module can communicate with another module. This information can be used by any organization to enforce interoperability among the modules and module configuration parameters, and reduce redundancies.