Human-intensive processes only work to make sense of data that doesn't scale as data volume increases. For example, people used to put each and every record of the company record in some sort of spreadsheet, and then it is very difficult to find and analyze that information once the volume or velocity of information increases.
Traditional systems use batch jobs, scheduling them on daily, weekly, or monthly bases to migrate data into separate servers or into data warehouses. This data has schema and is categorized as structured data. It will then go through the processing and analysis cycle to create datasets and extract meaningful information. These data warehouses are optimized for reporting and analytics purposes only. This is the core concept of business intelligence (BI) systems. BI systems store this data in relational database systems. The following diagram illustrates an architecture of a traditional BI system:
Business intelligence system architecture
The main issue with this approach is the latency rate. The reports made available for the decision makers are not real-time and are mostly days or weeks old, dependent on fewer sources having static data models.
This is an era of technological advancement and everything is moving very rapidly. The faster you respond to your customer, the happier your customer will be and it will help your business to grow. Nowadays, the source of information is not just your transactional databases or a few other data models; there are many other data sources that can affect your business directly and if you don't capture them and include them in your analysis, it will hit you hard. These sources include blog posts and web reviews, social media sources, posts, tweets, photos, videos, and audio. And it is not just these sources; logs generated by sensors, commonly known as IoTs (Internet of Things), in your smartphone, appliances, robot and autonomous systems, smart grids, and any devices that are collecting data, can now be utilized to study different behaviors and patterns, something that was unimaginable in the past.
It is a fact that every type of data is increasing, but especially IoTs, which generate logs of each and every event automatically and continuously. For example, a report shared by Intel states that an autonomous car generates and consumes 4 TB of data each day, and this is from just an hour of driving. This is just one source of information; if we consider all the previously mentioned sources, such as blogs and social media, it will not just make a few terabytes. Here, we are talking about exabytes, zettabytes, or even yottabytes of data.
We have talked about data that is increasing, but it is not just that; the types of data are also growing. Fewer than 20% of the types have definite schema. The other 80% is just raw data without any specific pattern which cannot reside in traditional relational database systems. This 80% of data includes videos, photos, and textual data including posts, articles, and emails.
Now, if we consider all these characteristics of data and try to implement this in a traditional BI solution, it will only be able to utilize 20% of your data, because the other 80% is just raw data, making it outreached for relational database systems. In today's world, people have realized that the data that they considered to be of no use can actually make a big difference to decision making and to understanding different behaviors. Traditional business solutions, however, are not the correct approach to analyze data with these characteristics, as they mostly work with definite schema and on batch jobs that produce results after a day, week or month.