Building data lakes to tame the variety and volume of big data
Along with the rise of new data types and increasing data volumes, we have seen an increase in the ways that organizations look to draw insights from data. Machine learning in particular has become a popular tool for analytics, enabling organizations to automatically extract metadata from unstructured data sources, which can then be analyzed with traditional analytic tools:
- Creating automated transcripts of call center audio recordings
- Using natural language processing (NLP) algorithms to extract sentiment data from text
- Identifying objects, people, and expressions in an image
As we saw in the previous section, enterprise data warehouses have been the go-to repositories for storing highly structured tabular data sourced from traditional run-the-business transactional applications. But the lack of a well-defined tabular structure makes unstructured and semi-structured data unsuitable for storing in...