Hands-On Big Data Modeling

Hands-On Big Data Modeling: Effective database design techniques for data architects and business intelligence professionals

James Lee, Wei, Kumar Mukhiya
eBook Nov 2018 306 pages 1st Edition

Introduction to Big Data and Data Management

This chapter addresses the concept of big data, its sources, and its types. In addition, the chapter provides a theoretical foundation for data modeling and data management. Readers will also get their hands dirty by setting up a platform on which big data can be used. The major topics discussed in this chapter are summarized as follows:

  • Discover the concept of big data and its origins
  • Learn about the various characteristics of big data
  • Discuss and explore various challenges in big data mining
  • Get familiar with big data modeling and its uses
  • Understand what big data management is and its importance and implications
  • Set up a big data platform on a local machine

The concept of big data

Digital systems are progressively intertwined with real-world activities. As a consequence, vast amounts of data are recorded and reported by information systems. Over the last 50 years, information systems and their capabilities to capture, curate, store, share, transfer, analyze, and visualize data have grown exponentially. Alongside these technological advances, people and organizations depend more and more on computerized devices and on information sources on the internet. The IDC Digital Universe Study of May 2010 illustrates the spectacular growth of data. It estimated that the amount of stored digital information (on personal computers, digital cameras, servers, and sensors) exceeds 1 zettabyte, and predicted that the digital universe would grow to 35 zettabytes by 2020. The IDC study characterizes 35 zettabytes as a stack of DVDs reaching halfway to Mars. This is what we refer to as the data explosion.
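To make the projection concrete, the implied compound annual growth rate can be computed as (end / start)^(1 / years) - 1. The Python sketch below assumes a starting size of 1 zettabyte and a ten-year horizon (2010 to 2020); both inputs are simplifying assumptions chosen for illustration rather than exact figures from the study.

```python
def implied_cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that turns `start` into `end` over `years` years."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1 / years) - 1


def project(start: float, rate: float, years: int) -> float:
    """Size reached after compounding `start` at `rate` per year for `years` years."""
    return start * (1 + rate) ** years


if __name__ == "__main__":
    # Assumed inputs: about 1 ZB at the start, 35 ZB projected ten years later.
    rate = implied_cagr(start=1.0, end=35.0, years=10)
    print(f"Implied annual growth: {rate:.1%}")
    # Sanity check: compounding the rate forward should land back on 35 ZB.
    print(f"Check: {project(1.0, rate, 10):.1f} ZB after 10 years")
```

Under these assumptions, the stored data would have to grow by roughly 43% every year, which is why the term data explosion is apt.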

Most of the data stored in the digital universe is unstructured, and organizations face challenges in capturing, curating, and analyzing it. One of the most challenging tasks for today's organizations is to extract information and value from the data stored in their information systems. This data, which is highly complex and too voluminous to be handled by a traditional DBMS, is called big data.

Big data is a term for datasets so massive and complex that they become troublesome to process using on-hand database-management tools or traditional data-processing applications. In today's market, the term big data tends to refer to the use of user-behavior analytics, predictive analytics, or certain other advanced data-analysis methods that extract value from this new data ecosystem.

Whether it is day-to-day data, business data, or basic data, if it represents a massive volume of data, either structured or unstructured, it is relevant to the organization. However, it is not only the size of the data that matters; it is how the organization uses it to extract the deeper insights that drive better business and strategic decisions. This voluminous data can be used to determine the quality of research, enhance process flow in an organization, prevent a particular disease, link legal citations, or combat crime. Big data is everywhere, and with the right tools it can make business analytics more effective.

Interesting insights regarding big data

Some interesting facts related to big data and its management and analysis are listed here, and more are presented in the Further reading section. The facts are taken from the source mentioned in the Further reading section.

  • Almost 91% of the world's marketing leaders use customer data as big data to make business decisions.
  • Interestingly, 90% of the world's total data has been generated within the last two years.
  • 87% of people agree that recording and distributing the right data is important for effectively measuring Return on Investment (ROI) in their own company.
  • 86% of people are willing to pay more for a great customer experience with a brand.
  • 75% of companies claim they will expand investments in big data within the next year.
  • About 70% of big data is created by individuals, but enterprises are responsible for storing and managing 80% of it.
  • 70% of businesses agree that their marketing efforts are under higher scrutiny.

Characteristics of big data

We explored the popularity of big data in the preceding section, but it is also important to know what kinds of data can be categorized or labeled as big data. In this section, we are going to explore the main characteristics of big data. Most books on the market describe six of them, discussed as follows:

  • Volume: Big data implies massive amounts of data. The size of the data plays a key role in determining the value that can be extracted from it, and it is also the key factor in deciding whether a chunk of data can be considered big. Hence, volume is one of the most important attributes of big data.
Every minute, 204,000,000 emails are sent, 200,000 photos are uploaded, and 1,800,000 likes are generated on Facebook; on YouTube, 1,300,000 videos are viewed and 72 hours of video are uploaded.

The point of aggregating such massive volumes of data is that businesses and organizations collect and leverage it to improve their products and services, whether in safety, dependability, healthcare, or governance. In brief, the idea is to turn this abundant, voluminous data into some form of business advantage.
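Per-minute rates like the ones quoted for volume are easier to appreciate at a daily scale. The following Python sketch multiplies each figure by the number of minutes in a day; it assumes the rates stay constant around the clock, which real traffic does not, so the results are rough orders of magnitude only. The dictionary keys are illustrative labels, not an official dataset.

```python
MINUTES_PER_DAY = 24 * 60

# Per-minute activity figures as quoted in the Volume bullet.
PER_MINUTE = {
    "emails sent": 204_000_000,
    "Facebook photos uploaded": 200_000,
    "Facebook likes": 1_800_000,
    "YouTube videos viewed": 1_300_000,
    "hours of YouTube video uploaded": 72,
}


def per_day(per_minute: dict[str, int]) -> dict[str, int]:
    """Scale per-minute counts to per-day counts, assuming a constant rate all day."""
    return {name: count * MINUTES_PER_DAY for name, count in per_minute.items()}


if __name__ == "__main__":
    for name, count in per_day(PER_MINUTE).items():
        print(f"{name:>32}: {count:,} per day")
```

For example, 204,000,000 emails per minute works out to about 294 billion emails per day.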

  • Velocity: It relates to the increasing speed at which big data is created, stored, and analyzed. Processing data in real time, at the same rate at which it is generated, is a key goal of big data analytics. The term velocity generally refers to how fast data is produced and processed to meet demand; keeping up with that speed is what unlocks the real potential of the data. The flow of data is massive and continuous. Data can be stored and processed in different ways, including batch processing, near-real-time processing, real-time processing, and streaming:

    • Real-time processing refers to the ability to capture, store, and process the data in real time and trigger immediate action, potentially saving lives.
    • Batch processing refers to feeding a large amount of data into large machines and processing for days at a time. It is still very common today.
  • Variety: It refers to the many sources and types of data, whether structured, semi-structured, or unstructured. We will discuss these types of big data further in Chapter 5, Structures of Data Models. When we think of data variety, we think of the additional complexity that results from the many kinds of data that we need to store, process, and combine. Data is more heterogeneous these days, including BLOB image data, enterprise data, network data, video data, text data, geographic maps, computer-generated or simulated data, and social media data. We can categorize the variety of data into several dimensions, some of which are explained as follows:

    • Structural variety: This refers to the representation of the data; for example, a satellite image of wildfires from NASA is completely different from tweets sent out by people who are seeing the fire spread.
    • Media variety: Data is delivered in various media, such as text, audio, or video. This is referred to as media variety.
    • Semantic variety: Semantic variety comes from different assumptions and conditions applied to the data. For example, a person's age can be measured using a qualitative approach (infant, juvenile, or adult) or a quantitative approach (a number of years).
  • Veracity: It refers to the quality of the data, and is sometimes also described as its validity. Big data can be noisy and uncertain, full of biases and abnormalities, and imprecise. Data is of no value if it is not accurate: the results of big data analysis are only as good as the data being analyzed. This creates challenges in keeping track of data quality, that is, what has been captured, where the data came from, and how it was analyzed prior to its use.

  • Valence: It refers to connectedness. The more connected the data is, the higher its valence. A high-valence dataset is denser, which makes many regular analytical techniques very inefficient.

  • Value: The term, in general, refers to the valuable insights gained from the ability to investigate and identify new patterns and trends from high-volume and cross-platform systems. The idea behind processing all this big data in the first place is to bring value to the query at hand. The final output of all the tasks is the value.
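The difference between batch and real-time (streaming) processing described under velocity can be illustrated with a minimal, single-machine Python sketch. The batch function needs the complete dataset before it can answer, whereas the streaming version keeps a small running state, updates it per event, and can react to an abnormal value the moment it arrives. The heart-rate readings, the alert threshold, and all function names are hypothetical; production streaming systems add distribution, fault tolerance, and windowing on top of this idea.

```python
from dataclasses import dataclass


def batch_average(readings: list[float]) -> float:
    """Batch style: wait until the whole dataset is collected, then process it in one pass."""
    return sum(readings) / len(readings)


@dataclass
class RunningAverage:
    """Streaming style: keep only a small state and update it as each event arrives."""
    count: int = 0
    mean: float = 0.0

    def update(self, value: float) -> float:
        self.count += 1
        # Incremental mean update, so past values never need to be stored.
        self.mean += (value - self.mean) / self.count
        return self.mean


def process_stream(readings, threshold: float) -> tuple[float, list[int]]:
    """Consume readings one at a time, flagging any that exceed `threshold` immediately."""
    running = RunningAverage()
    alerts: list[int] = []
    for position, value in enumerate(readings):
        running.update(value)
        if value > threshold:
            # A real-time system would act here (for example, notify a clinician).
            alerts.append(position)
    return running.mean, alerts


if __name__ == "__main__":
    # Hypothetical heart-rate readings (beats per minute) from a wearable sensor.
    readings = [72.0, 75.0, 71.0, 140.0, 74.0]
    print("batch mean:", batch_average(readings))
    mean, alerts = process_stream(iter(readings), threshold=120.0)
    print("streaming mean:", round(mean, 2), "alerts at positions:", alerts)
```

Keeping only a count and a running mean means memory use stays constant no matter how long the stream runs, which is what makes the streaming style suited to high-velocity data.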

Here's a summed-up representation of the preceding content:

Sources and types of big data

We learned that big data is omnipresent and that it can benefit enterprises in one or many ways. Even with the high prevalence of big data from existing hardware and software, enterprises are still struggling to process, store, analyze, and manage it using traditional data-mining tools and techniques. In this section, we are going to explore the sources of this complex and dynamic data and how we can consume it.

We can separate the sources of the data into three major categories. The following diagram shows the three major sources of big data:

Let's look into the three major sources one by one:

  • Logs generated by a machine: Much of big data is generated by machines, such as real-time sensors in industrial machinery or vehicles, logs that track user behavior, environmental sensors, personal health trackers, and other sensors. Most of this machine-created data can be grouped into the following subcategories:
    • Click-log stream data: This is the data that is captured every time a user clicks any link on a website. A detailed analysis of this data can reveal information related to customer behavior and deep interactions of the users with the current website, as well as customers' buying patterns.
    • Gaming events log data: A user performs a set of tasks when playing an online game, and every move the user makes can be stored. Analyzing this data can help in understanding how end users progress through a gaming portfolio.
    • Sensors log data: Sensor log data comes from sources such as radio-frequency identification (RFID) tags, smart meters, smartwatches, medical devices such as heart-rate monitors, and Global Positioning System (GPS) receivers. This data can be recorded and then used to analyze the actual status of the subject being monitored.
    • Weblog event data: Servers, cloud infrastructures, applications, networks, and so on are used extensively, and they record all kinds of data about their events and operation. When stored, this data can amount to massive volumes, and it can be useful for monitoring service-level agreements or predicting security breaches.
    • Point-of-sale event-log data: Almost every product these days has a unique barcode. When a cashier in a retail shop or department store scans a product's barcode at the point of sale, all the data associated with the product is generated and can be captured. This data can be analyzed to understand a retailer's selling patterns.
  • Person: People generate a lot of big data through social media, status updates, tweets, photos, and media uploads. Most of these logs are generated through a user's interactions with a network, such as the internet, and they reveal how the user communicates with that network. These interaction logs can reveal deep content-interaction models that are useful for understanding user behavior. This analysis can be used to train a model that presents personalized recommendations of web items, such as the next news article to read or, most likely, products to consider buying. Similar research topics, including sentiment analysis and topic analysis, are very active in industry today. Most of this data is unstructured, as it has no proper format or well-defined structure. It usually comes as plain text, Portable Document Format (PDF) files, comma-separated values (CSV) files, or JSON files.
  • Organization: Organizations provide a massive amount of data in the form of transaction information in databases and structured data stored in data warehouses. This data is highly structured. Organizations store it in some type of RDBMS, such as Oracle or MS Access, and query it with SQL. The data resides in a fixed format, in the fields of a table. This organization-generated data is consumed and processed with ICT tools for business intelligence and market analysis.
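The three sources differ largely in how much structure their data carries. The following sketch, using only Python's standard library, reads one hypothetical sample of each kind: a CSV extract standing in for organizational data, a JSON click event standing in for a machine log (real log formats vary widely), and a free-text post standing in for person-generated content. All sample values and function names are illustrative assumptions.

```python
import csv
import io
import json
from collections import Counter

# Hypothetical samples, one per source category.
ORGANIZATION_CSV = """order_id,customer_id,amount
1001,C17,25.50
1002,C42,12.00
"""

MACHINE_LOG_JSON = '{"event": "click", "user": "C17", "url": "/products/42", "ts": "2018-11-30T10:15:00Z"}'

PERSON_TEXT = "Loving the new phone! Battery life is great, camera is great too."


def read_structured(text: str) -> list[dict[str, str]]:
    """Structured: fixed columns, so every row has the same fields."""
    return list(csv.DictReader(io.StringIO(text)))


def read_semi_structured(line: str) -> dict:
    """Semi-structured: self-describing keys, but fields may vary per record."""
    return json.loads(line)


def read_unstructured(text: str) -> Counter:
    """Unstructured: no schema, so we impose one ourselves, here a simple word count."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return Counter(w for w in words if w)


if __name__ == "__main__":
    print(read_structured(ORGANIZATION_CSV))
    print(read_semi_structured(MACHINE_LOG_JSON)["url"])
    print(read_unstructured(PERSON_TEXT)["great"])
```

Note how the effort shifts: the CSV rows arrive with a schema, the JSON event describes itself, and the free text only yields structure after we decide how to extract it.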

Challenges of big data

There are certain key aspects that make big data very challenging. In this section, we'll discuss some of them:

  • Heterogeneity: Human beings consume a great deal of diverse information and tolerate that diversity well. In fact, the nuance and richness of natural language provide valuable depth. However, machine-analysis algorithms expect consistent, homogeneous data and can't understand nuance. As a consequence, data must be carefully structured as a first step in (or prior to) data analysis. Computer systems work most efficiently when they can store many items that are all identical in size and structure. Efficient representation, access, and analysis of semi-structured data require further work.
  • Personal privacy: A lot of personal information is captured, stored, analyzed, and processed by internet service providers (ISPs), mobile network operators, supermarkets, local transportation, educational institutions, and medical and financial service organizations, including hospitals, banks, insurance companies, and credit card agencies. A great deal of information is also stored on platforms such as Facebook, YouTube, and Google. This shows that privacy is an issue whose importance, particularly to the customer, grows as the value of big data becomes more apparent. Mining algorithms use this personal data to personalize news content, manage ads, and gain other e-commerce advantages. This is clearly a violation of personal privacy.
  • Scale: As the name suggests, big data is massive. As its size increases, underlying issues arise in storage, retrieval, processing, transformation, and analysis. As mentioned in the introduction, data volume is scaling much faster than compute resources, while CPU speeds have remained largely static.
  • Timeliness: This is concerned with speed: the larger the data to be processed, the longer it takes to analyze. In many scenarios, the results of the analysis are required in real time or immediately. This creates an extra challenge when building a system that must process big data in a timely manner.
  • Securing big data: Security is also a big concern for both enterprises and individuals. Big data stores can be attractive targets for hackers and advanced persistent threats (APTs). Security is therefore an essential part of a big data architecture, which must define ways to store information and provide access to it securely.
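One common way to address the heterogeneity challenge is to map every incoming record onto a single fixed schema before analysis, which gives algorithms the uniform, identically structured items they work best with. The sketch below shows the idea in Python; the Reading schema, the alternative field names (sensor_id/id, temperature_c/temp_f), and the Fahrenheit-to-Celsius conversion are hypothetical examples, not a prescribed design.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Reading:
    """Hypothetical target schema: every record has the same fields, types, and units."""
    sensor_id: str
    temperature_c: float


def normalize(record: dict[str, Any]) -> Reading:
    """Map one heterogeneous input record onto the uniform Reading schema.

    The field names and units handled here are illustrative examples of the
    kind of variation different sources might produce.
    """
    sensor_id = str(record.get("sensor_id") or record.get("id") or "unknown")
    if "temperature_c" in record:
        temp_c = float(record["temperature_c"])
    elif "temp_f" in record:
        # Convert Fahrenheit to Celsius so all records share one unit.
        temp_c = (float(record["temp_f"]) - 32) * 5 / 9
    else:
        raise ValueError(f"no temperature field in {record!r}")
    return Reading(sensor_id=sensor_id, temperature_c=round(temp_c, 2))


if __name__ == "__main__":
    raw = [
        {"sensor_id": "s1", "temperature_c": 21.5},
        {"id": "s2", "temp_f": "70.7"},  # different key names, unit, and value type
    ]
    for record in raw:
        print(normalize(record))
```

Records that cannot be mapped raise an error instead of being silently guessed, which keeps data-quality problems visible before analysis begins.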

Key benefits

  • Create effective models that get the most out of big data
  • Apply your knowledge to datasets from Twitter and weather data to learn big data
  • Tackle different data modeling challenges with expert techniques presented in this book

Description

Modeling and managing data is a central focus of all big data projects. In fact, a database is considered to be effective only if you have a logical and sophisticated data model. This book will help you develop practical skills in modeling your own big data projects and improve the performance of analytical queries for your specific business requirements. To start with, you’ll get a quick introduction to big data and understand the different data modeling and data management platforms for big data. Then you’ll work with structured and semi-structured data with the help of real-life examples. Once you’ve got to grips with the basics, you’ll use the SQL Developer Data Modeler to create your own data models containing different file types such as CSV, XML, and JSON. You’ll also learn to create graph data models and explore data modeling with streaming data using real-world datasets. By the end of this book, you’ll be able to design and develop efficient data models for varying data sizes easily and efficiently.

Who is this book for?

This book is great for programmers, geologists, biologists, and every professional who deals with spatial data. If you want to learn how to handle GIS, GPS, and remote sensing data, then this book is for you. Basic knowledge of R and QGIS would be helpful.

What you will learn

  • Get insights into big data and discover various data models
  • Explore conceptual, logical, and big data models
  • Understand how to model data containing different file types
  • Run through data modeling with examples of Twitter, Bitcoin, IMDB and weather data modeling
  • Create data models such as Graph Data and Vector Space
  • Model structured and unstructured data using Python and R

Product Details

Publication date : Nov 30, 2018
Length: 306 pages
Edition : 1st
Language : English
ISBN-13 : 9781788626088


Table of Contents

16 Chapters
Introduction to Big Data and Data Management
Data Modeling and Management Platforms
Defining Data Models
Categorizing Data Models
Structures of Data Models
Modeling Structured Data
Modeling with Unstructured Data
Modeling with Streaming Data
Streaming Sensor Data
Concept and Approaches of Big-Data Management
DBMS to BDMS
Modeling Bitcoin Data Points with Python
Modeling Twitter Feeds Using Python
Modeling Weather Data Points with Python
Modeling IMDb Data Points with Python
Other Books You May Enjoy
