As explained briefly in the previous sections, the current state of enterprise data in an organization can be summarized in bullet points as follows:
- Conventional DW (Data Warehouse) / BI (Business Intelligence):
  - Refined, cleansed data is transferred from production business applications using ETL.
  - Data older than a certain period has usually been moved to storage that is hard to retrieve from, such as magnetic tape.
  - Some of its notable deficiencies are as follows:
    - Only a subset of production data exists in the DW, in a cleansed format; adding any new element to the DW takes effort.
    - Only a subset of that data remains in the DW; the rest is transferred to permanent storage.
    - Analysis is usually slow, and the DW is optimized only for queries that are, to an extent, predefined.
- Siloed Big Data:
  - Some departments have taken the right step by building big data solutions. But departments generally don't collaborate with each other, so this big data becomes siloed and doesn't deliver the value that true enterprise-wide big data would.
  - Some of its deficiencies are as follows:
    - Because of its siloed nature, the analyst is again constrained and cannot mix and match data between departments.
    - A good amount of money has been spent to build, maintain, and manage these solutions, and over time this is usually not sustainable.
- Myriad of non-connected applications:
  - There are a good number of applications, both on premises and in the cloud.
  - Apart from churning out structured data, these applications also produce unstructured data.
  - Some of the deficiencies are as follows:
    - The applications don't talk to each other.
    - Even when they do, data scientists are not able to use the data effectively to transform the enterprise in a meaningful way.
    - Technology is replicated across business applications to handle many of the same aspects.
We wouldn't say that creating or investing in a Data lake is a silver bullet that solves all the aforementioned deficiencies. But it is definitely a step in the right direction, and every enterprise should at least spend some time discussing whether one is indeed required. If the answer is yes, don't deliberate over it too much; take the next step toward implementation.
A Data lake is an enterprise initiative: it has to be built with the consent of all the stakeholders and with buy-in from the top executives. It can help find ways to improve the processes by which the enterprise does business. It can also help higher management know more about the business and increase the success rate of the decision-making process.