Collect – making data simple and accessible
The Collect layer is about putting data in the appropriate persistence store so that all of an organization's data assets can be collected and accessed efficiently. A well-architected Collect rung lets an organization choose the right data store for each use case and user persona: Hadoop for data exploration by data scientists, OLAP for delivering operational reports through business intelligence or other enterprise visualization tools, NoSQL databases such as MongoDB for rapid application development, or some mixture of them all. With the Common SQL Engine, you have the flexibility to deliver all of this in a single, integrated manner.
IBM offers some of the best database technology in the world for addressing every type of data workload, from Online Transactional Processing (OLTP) to Online Analytical Processing (OLAP) to Hadoop to fast data, allowing customers to adapt quickly as their business and application needs change. Furthermore, IBM layers a Common SQL Engine across all of its persistence stores, so you can write SQL once and run it against the persistence store of your choice, whether that is IBM Db2 or an open source store such as MongoDB or Hadoop. This makes applications portable and saves enterprises the significant time and money typically spent rewriting queries for different flavors of persistence, while also providing a better experience for end users and a faster time to value.
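The "write SQL once" idea can be sketched in a few lines. In the sketch below, Python's standard DB-API plays the role of a common SQL layer, and two in-memory SQLite databases are purely illustrative stand-ins for different persistence stores (they are not Db2, MongoDB, or Hadoop drivers); the table and query are hypothetical. The point is only that a single query string is reused unchanged across stores:

```python
# Illustrative sketch: the DB-API stands in for a common SQL layer, and
# sqlite3 in-memory databases stand in for distinct persistence stores.
import sqlite3

def new_store(rows):
    """Create an in-memory store pre-loaded with sample sales rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn

# Two independent stores standing in for, say, an OLTP and an OLAP system.
oltp = new_store([("east", 100.0), ("west", 250.0)])
olap = new_store([("east", 300.0), ("west", 150.0)])

# The query is written once and reused unchanged against each store.
QUERY = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"

for name, conn in [("oltp", oltp), ("olap", olap)]:
    print(name, conn.execute(QUERY).fetchall())
```

A real common SQL engine does far more (dialect translation, pushdown, security), but the application-side benefit is the same: the query text does not change when the backing store does.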
IBM's Db2 technology is enabled for natural language queries, which allows non-SQL users to search through their OLTP store using natural language. Also, Db2 supports Augmented Data Exploration (ADE), which allows users to access the database and visualize their datasets through automation (as opposed to querying data using SQL).
To summarize, Collect is all about capturing newly created data of all types and bringing it together across silos and locations to make it accessible for further use (up the AI ladder). At IBM, the Collect rung of the AI ladder is characterized by three key attributes:
- Empower: IT architects and developers in enterprises are empowered as they are offered a complete set of fit-for-purpose data capabilities that can handle all types of workloads in a self-service manner. This covers all workloads and data types, be it structured or unstructured, open source or proprietary, on-premises or in the cloud. It's a single portfolio that covers all your data needs.
- Simplify: One of the key tenets of simplicity is enabling self-service, and this is realized rather quickly in a containerized platform built using cloud-native principles. For one, provisioning new data stores involves a simple click of a button. In-place upgrades equate to zero downtime, and scaling up and down is a breeze, ensuring that enterprises can quickly react to business needs in a matter of minutes as opposed to waiting for weeks or months. Last but not least, IBM is infusing AI into its data stores to enable augmented data exploration and other automation processes.
- Integrate: This attribute focuses on making data accessible and integrating it well with the other rungs of the AI ladder. Data virtualization, in conjunction with data governance, enables customers to access a multitude of datasets in a single view, with a consistent glossary of business terms and associated lineage at their fingertips. This democratizes enterprise data, accelerating AI initiatives and driving automation across the business. The following diagram summarizes the key facets of the Collect rung of the AI ladder:
Our portfolio of capabilities, all of which support the Collect rung, can be categorized into four workload domains in the market:
- First, there's the traditional operational database. This is your system of record, your point-of-sale, and your transactional database.
- Analytics databases are in high demand as the amount of data is exploding. Everyone is looking for new ways to analyze data at scale quickly, all the way from traditional reporting to preparing data for training and scoring AI models.
- Big data. The Hadoop-based data lake at petabyte scale is slowly giving way to architectures that separate storage and compute, with Cloud Object Storage and Spark playing key roles. Market demand for data lakes is clearly on an upward trajectory.
- Finally, IoT is quickly transforming several industries, making fast data a growing area of interest. This is the market of the future, and IBM is addressing requirements in this space through real-time data analysis.
Next, we will explore the importance of organizing data and what it entails.