Background of data lakes
Let's start with the definition of a data lake:
A data lake is an environment where you collect and store (vast amounts of) raw data in its original format.
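To make the definition concrete, here is a minimal Python sketch. A local temporary directory stands in for whatever storage a real lake would use. The names `land_raw` and `read_json_lines` and the `raw/<source>/` folder layout are hypothetical and chosen only for illustration. The sketch shows two things. First, data is written exactly as received, byte for byte. Second, a structure (here, newline-delimited JSON) is applied only when the data is read for a specific use.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical layout: <lake_root>/raw/<source>/<filename>. These names are
# illustrative assumptions, not a standard data lake structure.
def land_raw(lake_root: Path, source: str, filename: str, payload: bytes) -> Path:
    """Store the payload byte-for-byte: no parsing, cleaning, or schema."""
    target = lake_root / "raw" / source / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target

def read_json_lines(path: Path) -> list[dict]:
    """Impose a structure only at read time, once a use case is known."""
    text = path.read_text(encoding="utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]

# Usage: land a newline-delimited JSON export unchanged, then interpret it later.
with tempfile.TemporaryDirectory() as tmp:
    raw = b'{"order_id": 1, "amount": 9.5}\n{"order_id": 2, "amount": 3.0}\n'
    stored = land_raw(Path(tmp), "webshop", "orders.jsonl", raw)
    unchanged = stored.read_bytes() == raw  # original format is preserved
    orders = read_json_lines(stored)
    total = sum(o["amount"] for o in orders)
```

The same stored file could later be read in a completely different way, for example as plain text for auditing. Because nothing was transformed on the way in, that option stays open.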
The term data lake comes from a water analogy. You can make money with water, just as you can make money with data. But you need to store the water somewhere until you find a use case for it, and you don't necessarily know beforehand what that use case will be. This means you need a cheap and easy way to store the water. Putting all your water in bottles is ideal when you want to sell it as drinking water. But if a house is on fire, pouring water out of bottles to extinguish it would be close to useless. So you wouldn't bottle the water until you actually started selling it as drinking water.
Data is analogous to water. When you store it for later use, but you don't necessarily know all the (possible) use cases (yet...