Prioritizing quality over quantity
Since Hadoop came along in 2006 and significantly reduced the cost of storing big data, data engineers have often focused on how much data they can bring in centrally, on the assumption that it will be used to create value later. But by prioritizing quantity over quality, many organizations found that the effort required to actually use this data was so high that in practice they couldn’t justify it. That left them with dark data: poorly managed data that generated no value while increasing the risk of misuse and leaks.
In fact, a report from Seagate found that only 32% of the data available to an organization is ever utilized. That leaves the remaining 68% incurring costs, both monetary and in increased risk, without generating any value.17
It’s the data producers who are responsible for the quality of their data. But the data consumers are still responsible for supporting that investment in quality. Let’s define the roles and responsibilities of both of those groups...