Uninformed data
The following technique could be seen as something of a game changer in how most modern data scientists work. While it is common to work with structured and unstructured text, it is less common to work on raw binary data the reason being the gap between computer science and data science. Textual processing is limited to a standard set of operations that most will be familiar with, that is, acquiring, parsing and storing, and so on. Instead of restricting ourselves to these operations, we will work directly with audio transforming and enrich the uninformed signal data into informed transcription. In doing this, we enable a new type of data pipeline that is analogous to teaching a computer to hear the voice from audio files.
A second (breakthrough) idea that we encourage here is a shift in thinking around how data scientists engage with Hadoop and big data nowadays. While many still consider these technologies as just yet another database, we want to showcase the vast array...