Building a web-scale news scanner
What makes data science different from statistics is its emphasis on scalable processing to overcome complex issues surrounding the quality and variety of the collected data. While statisticians work on samples of clean datasets, perhaps coming from a relational database, data scientists, in contrast, work at scale with unstructured data coming from a variety of sources. While the former focus on building models with high degrees of precision and accuracy, the latter often focus on constructing rich, integrated datasets that enable the discovery of less strictly defined insights. The data science journey usually involves torturing the initial sources of data, joining datasets that were theoretically not meant to be joined, enriching content with publicly available information, experimenting, exploring, discovering, trying, failing, and trying again. No matter the technical or mathematical skills, the main difference between an average and an expert data...