A structured life is a good life
When learning about the benefits of Spark and big data, you may have heard discussions about structured data versus semi-structured data versus unstructured data. Spark supports the use of structured, semi-structured, and unstructured data, and it also provides the basis for their consistent treatment. The only constraint is that the data should be record-based. Provided they are record-based, datasets can be transformed, enriched, and manipulated in the same way, regardless of their organization.
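To make the idea of record-based treatment concrete, here is a minimal sketch in plain Python rather than Spark itself. The sample inputs, the helper names (`csv_records`, `json_records`, `text_records`) and the `enrich` function are all hypothetical and exist only for illustration. The point is that once each source is reduced to a collection of records, the same transformation can be applied to every one of them.

```python
import csv
import io
import json

# Hypothetical sample inputs: one record per line in every case.
structured = "id,city\n1,London\n2,Paris\n"
semi_structured = '{"id": 3, "city": "Berlin"}\n{"id": 4}\n'
unstructured = "Rome was mentioned in the report\nno location given\n"

def csv_records(text):
    # Structured: every record has the same named fields.
    return list(csv.DictReader(io.StringIO(text)))

def json_records(text):
    # Semi-structured: fields are named but may vary per record.
    return [json.loads(line) for line in text.splitlines() if line.strip()]

def text_records(text):
    # Unstructured: each line is kept as a single raw field.
    return [{"raw": line} for line in text.splitlines() if line.strip()]

def enrich(record, source):
    # The same enrichment is applied to a record from any source.
    return {**record, "source": source, "field_count": len(record)}

records = (
    [enrich(r, "csv") for r in csv_records(structured)]
    + [enrich(r, "json") for r in json_records(semi_structured)]
    + [enrich(r, "text") for r in text_records(unstructured)]
)

for record in records:
    print(record)
```

In Spark, the same pattern would typically be expressed as transformations over a distributed collection of records; the plain lists above merely stand in for that collection.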
However, having unstructured data does not necessitate taking an unstructured approach. Having identified techniques for exploring datasets in the previous chapter, it would be tempting to dive straight into stashing data somewhere accessible and immediately commencing simple profiling analytics. In real-life situations, this activity often takes precedence over due diligence. Once again, we would encourage you to consider several key...