A data lake is a single repository into which large volumes of potentially valuable data are loaded from multiple sources. Some of those sources are IoT devices; others are internal company systems holding production, purchasing, or customer service records. The idea is to put all of this varied data in one place so that it can be accessed through a unified interface. With Hadoop, the data lake would be stored in HDFS and most likely accessed through Hive or Spark.
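To make the "one place, one interface" idea concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: a local temporary directory stands in for HDFS, a simple generator stands in for a query engine such as Hive or Spark, and the names `land`, `scan`, and `demo`, the source names, and the sample records are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def land(lake_root: Path, source: str, records: list[dict]) -> Path:
    """Append raw records from one source to the lake, one JSON object per line."""
    # Assumption: one folder per source; a real lake on HDFS would use its own layout.
    target = lake_root / source / "part-0000.jsonl"
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("a", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")
    return target


def scan(lake_root: Path):
    """Yield (source, record) pairs from every file in the lake."""
    # The single read path plays the role of the unified interface.
    for path in sorted(lake_root.glob("*/*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                yield path.parent.name, json.loads(line)


def demo() -> list[tuple[str, dict]]:
    """Land three made-up sources in a temporary lake and read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        lake = Path(tmp)
        land(lake, "iot_sensors", [{"device": "d1", "temp_c": 21.5}])
        land(lake, "purchasing", [{"po": 1001, "amount": 250.0}])
        land(lake, "customer_service", [{"ticket": 7, "status": "open"}])
        return list(scan(lake))
```

Calling `demo()` returns the records from all three sources through the same `scan` function, each tagged with the source it came from. Note that the records keep their original, differing shapes; nothing in this sketch organizes or validates them.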
Managing data lakes
When data lakes turn into data swamps
Swamps form when water flows into an area, collects, and stagnates, and algae cover the surface. When a data lake has a mass of raw data flowing in but no organization and little usage of it to mix up the...