IBM Cloud Pak for Data

An enterprise platform to operationalize data, analytics, and AI

Product type Paperback
Published in Nov 2021
Publisher Packt
ISBN-13 9781800562127
Length 336 pages
Edition 1st Edition
Authors (3):
Hemanth Manda
Sriram Srinivasan
Deepak Rangarao
Table of Contents

Preface
Section 1: The Basics
  Chapter 1: The AI Ladder – IBM's Prescriptive Approach
  Chapter 2: Cloud Pak for Data: A Brief Introduction
Section 2: Product Capabilities
  Chapter 3: Collect – Making Data Simple and Accessible
  Chapter 4: Organize – Creating a Trusted Analytics Foundation
  Chapter 5: Analyzing: Building, Deploying, and Scaling Models with Trust and Transparency
  Chapter 6: Multi-Cloud Strategy and Cloud Satellite
  Chapter 7: IBM and Partner Extension Services
  Chapter 8: Customer Use Cases
Section 3: Technical Details
  Chapter 9: Technical Overview, Management, and Administration
  Chapter 10: Security and Compliance
  Chapter 11: Storage
  Chapter 12: Multi-Tenancy
Other Books You May Enjoy

Collect – making data simple and accessible

The Collect layer is about placing your data in the appropriate persistence store so that you can efficiently collect and access all your data assets. A well-architected "Collect" rung allows an organization to choose the data store that fits the use case and user persona: Hadoop for data exploration by data scientists, OLAP for delivering operational reports through business intelligence or other enterprise visualization tools, NoSQL databases such as MongoDB for rapid application development, or some mixture of them all. With the Common SQL Engine, you have the flexibility to deliver this in a single, integrated manner.

IBM offers some of the best database technology in the world for addressing every type of data workload, from Online Transaction Processing (OLTP) to Online Analytical Processing (OLAP) to Hadoop to fast data. This allows customers to adapt quickly as their business and application needs change. Furthermore, IBM layers a Common SQL Engine across all its persistence stores so that you can write SQL once and run it against your persistence store of choice, whether that is IBM Db2 or an open source persistence store such as MongoDB or Hadoop. This makes applications portable and saves enterprises the significant time and money that would typically be spent on rewriting queries for different flavors of persistence. It also gives end users a better experience and a faster time to value.
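The "write SQL once" idea can be illustrated with a small sketch. This is not the Common SQL Engine or any IBM API; it uses Python's built-in sqlite3 module, with two in-memory databases standing in for two different persistence stores. The table name, column names, store names, and helper functions are all hypothetical and chosen only for illustration.

```python
import sqlite3

# One query text, written once. The table and column names are hypothetical.
PORTABLE_QUERY = """
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    ORDER BY region
"""


def make_store(rows):
    """Create an in-memory SQLite database standing in for one persistence store."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def run_everywhere(stores, query=PORTABLE_QUERY):
    """Run the same SQL text against every store and collect results by store name."""
    return {name: conn.execute(query).fetchall() for name, conn in stores.items()}


# Two stand-in stores holding different data but sharing one table shape.
stores = {
    "operational": make_store([("east", 10.0), ("west", 5.0), ("east", 2.5)]),
    "analytics": make_store([("north", 7.0), ("west", 1.0)]),
}
results = run_everywhere(stores)
```

The application code holds a single query string and never branches on which store it is talking to, which is the portability benefit the paragraph above describes. A real engine would additionally have to translate between genuinely different backends, which this sketch does not attempt.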

IBM's Db2 technology is enabled for natural language queries, which allows non-SQL users to search through their OLTP store using natural language. Also, Db2 supports Augmented Data Exploration (ADE), which allows users to access the database and visualize their datasets through automation (as opposed to querying data using SQL).

To summarize, Collect is about capturing newly created data of all types and then bringing it together across various silos and locations to make it accessible for further use (up the AI ladder). At IBM, the Collect rung of the AI ladder is characterized by three key attributes:

  • Empower: IT architects and developers in enterprises are empowered as they are offered a complete set of fit-for-purpose data capabilities that can handle all types of workloads in a self-service manner. This covers all workloads and data types, be it structured or unstructured, open source or proprietary, on-premises or in the cloud. It's a single portfolio that covers all your data needs.
  • Simplify: One of the key tenets of simplicity is enabling self-service, and this is realized rather quickly in a containerized platform built using cloud-native principles. For one, provisioning new data stores involves a simple click of a button. In-place upgrades equate to zero downtime, and scaling up and down is a breeze, ensuring that enterprises can quickly react to business needs in a matter of minutes as opposed to waiting for weeks or months. Last but not least, IBM is infusing AI into its data stores to enable augmented data exploration and other automation processes.
  • Integrate: This attribute focuses on the need to make data accessible and to integrate well with the other rungs of the AI ladder. Data virtualization, in conjunction with data governance, enables customers to access a multitude of datasets in a single view, with a consistent glossary of business terms and associated lineage, all at their fingertips. This democratizes enterprise data, accelerating AI initiatives and driving automation in the business.

The following diagram summarizes the key facets of the Collect rung of the AI ladder:

Figure 1.3 – Collect – making data simple and accessible
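
The "single view" over multiple datasets mentioned under Integrate can be sketched in a few lines. This is only an analogy, not IBM's data virtualization service: it uses SQLite's ATTACH DATABASE statement through Python's sqlite3 module so that one connection can join tables living in two separate databases. The schema name crm and all table and column names are hypothetical.

```python
import sqlite3

# The main in-memory database stands in for the first data silo.
conn = sqlite3.connect(":memory:")
# A second, separate in-memory database stands in for another silo.
conn.execute("ATTACH DATABASE ':memory:' AS crm")

# Each silo owns its own table (hypothetical schemas).
conn.execute("CREATE TABLE main.orders (customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE crm.customers (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO main.orders VALUES (?, ?)",
    [(1, 20.0), (2, 5.0), (1, 7.5)],
)
conn.executemany(
    "INSERT INTO crm.customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)

# One query spans both silos without copying data from one into the other.
single_view = conn.execute(
    """
    SELECT c.name, SUM(o.amount) AS total
    FROM main.orders AS o
    JOIN crm.customers AS c ON c.id = o.customer_id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
```

The consumer sees one result set and does not need to know that orders and customers are stored separately. Governance features such as a business glossary and lineage are outside the scope of this sketch.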

Our portfolio of capabilities, all of which support the Collect rung, can be categorized into four workload domains in the market:

  1. First, there's the traditional operational database. This is your system of record, your point of sale, and your transactional database.
  2. Analytics databases are in high demand as the amount of data is exploding. Everyone is looking for new ways to analyze data at scale quickly, all the way from traditional reporting to preparing data for training and scoring AI models.
  3. Big data is the third domain. The traditional data lake, built on Hadoop at petabyte scale, is slowly giving way to architectures that separate storage from compute, with Cloud Object Storage and Spark playing key roles. The market demand for data lakes is clearly on an upward trajectory.
  4. Finally, IoT is quickly transforming several industries, and the fast data area is becoming an area of interest. This is the market of the future, and IBM is addressing requirements in this space through real-time data analysis.

Next, we will explore the importance of organizing data and what it entails.
