With big data arriving from many origins, in different formats and models, and at different speeds, it is no surprise that we need to ingest this data into a fast, scalable storage system that is flexible enough to serve many current and future analytical processes. Traditional storage systems do not meet these demands for streaming and batch processing applications. That is where the concept of a data lake comes in.
A data lake is a centralized repository in which we can store structured and unstructured data at any scale. We can store our data as it is, without having to structure or preprocess it first, and then run many types of analytics on it, from visualizations and batch reporting to real-time analytics.
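To make the "store as-is, structure on read" idea concrete, here is a minimal sketch using PySpark; the paths (`landing/events`, `datalake/raw/events`) and the JSON input are illustrative assumptions, and in practice the lake would live on cloud object storage rather than a local directory.

```python
from pyspark.sql import SparkSession

# Start a local Spark session. In a real deployment the data lake paths
# below would point at cloud object storage (assumed local paths here).
spark = SparkSession.builder.appName("data-lake-sketch").getOrCreate()

# Land raw JSON events in the lake exactly as they arrive --
# no schema enforcement or preprocessing at write time.
raw = spark.read.json("landing/events/*.json")
raw.write.mode("append").parquet("datalake/raw/events")

# Later, any analytical process can read the same files and impose
# structure on read (schema-on-read), e.g. a simple aggregation.
events = spark.read.parquet("datalake/raw/events")
events.groupBy("event_type").count().show()
```

The same stored files can serve batch reports, ad hoc queries, or streaming jobs, which is what makes the centralized repository flexible enough for analytical processes that do not exist yet.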