DaaP
Historically, data has been treated as a backend concern. It was used by the middle tier and then surfaced to the frontend. Applications did little with the data beyond aggregating it and presenting it with better visuals. Relational database systems also ensured that data adhered to a schema and format and that all mandatory fields were populated. As a result, applications received quality data and had to perform only minimal quality checks. With semi-structured data, this equation changes. Semi-structured data does not comply with fixed schemas or with rules for how data is formatted and populated. Advanced analytics, ML, and big data analytics need a lot of processing on the data before it’s consumed by any algorithm or application. ML algorithms generally produce more accurate output as the volume of quality data increases.
In a paper published in 2001 (https://homl.info/6), Microsoft researchers Michele Banko and Eric Brill showed that different ML algorithms performed...