The Big Data ecosystem
For a beginner, the landscape can be utterly confusing. There is a vast arena of technologies and an equally varied set of use cases. There is no single go-to solution; every use case calls for a custom solution, and this sprawling technology stack and lack of standardization make Big Data a difficult path for developers to tread. A multitude of technologies exist that can draw meaningful insight out of data of this magnitude.
Let's begin with the basics. The environment for building any data analytics application should provide for the following:
- Storing data
- Enriching or processing data
- Data analysis and visualization
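To make the three responsibilities above concrete, here is a minimal sketch in plain Python, using only the standard library. It is an illustration under stated assumptions, not a recommended architecture: the function names (`store`, `enrich`, `analyze`, `render`), the JSON-lines file, and the sample events are all hypothetical, and a real deployment would swap each stage for a dedicated tool.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path

# Hypothetical sample data: HTTP-style events from two made-up services.
SAMPLE_EVENTS = [
    {"service": "checkout", "status": 500},
    {"service": "checkout", "status": 200},
    {"service": "search", "status": 503},
    {"service": "checkout", "status": 502},
]


def store(records, path):
    """Storage stage: persist raw records as JSON lines."""
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")


def enrich(path):
    """Processing stage: read records back and add a derived field."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            record = json.loads(line)
            record["is_error"] = record["status"] >= 500
            yield record


def analyze(records):
    """Analysis stage: count error events per service."""
    return Counter(r["service"] for r in records if r["is_error"])


def render(counts):
    """Visualization stage: a crude text bar chart, largest count first."""
    return [f"{name}: {'#' * n}" for name, n in counts.most_common()]


def run_pipeline(raw_records):
    """Chain the stages: store, then enrich, then analyze."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "events.jsonl"
        store(raw_records, path)
        # The Counter is fully built before the temporary directory is removed.
        return analyze(enrich(path))


if __name__ == "__main__":
    for line in render(run_pipeline(SAMPLE_EVENTS)):
        print(line)
```

Keeping each stage behind its own function mirrors the way the ecosystem is divided: the storage, processing, and analysis layers can each be replaced independently as data volume grows.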
At a more specialized level, dedicated Big Data tools and technologies are available: for instance, ETL tools such as Talend and Pentaho; batch processing with Pig, Hive, and MapReduce; real-time processing with Storm, Spark, and so on; and the list goes on. Here's a pictorial representation of the vast Big Data technology landscape, as per Forbes:
It clearly depicts the various segments and verticals within the Big Data technology canvas:
- Platforms such as Hadoop and NoSQL
- Analytics such as HDP, CDH, EMC, Greenplum, DataStax, and more
- Infrastructure such as Teradata, VoltDB, MarkLogic, and more
- Infrastructure as a Service (IaaS) such as AWS, Azure, and more
- Structured databases such as Oracle, SQL Server, DB2, and more
- Data as a Service (DaaS) such as INRIX, LexisNexis, Factual, and more
Beyond that, there are a score of segments related to specific problem areas, such as Business Intelligence (BI), analytics and visualization, advertisement and media, log data and vertical apps, and so on.