"Language is a process of free creation; its laws and principles are fixed, but the manner in which the principles of generation are used is free and infinitely varied. Even the interpretation and use of words involves a process of free creation."
– Noam Chomsky
Not all information exists in tables. From Wikipedia to social media, there are billions of written words that we would like our computers to process and extract information from. The subfield of machine learning that deals with textual data goes by names such as Text Mining and Natural Language Processing (NLP). These different names reflect the fact that the field draws on multiple disciplines: computer science and statistics on the one hand, and linguistics on the other. I'd argue that the influence of linguistics was stronger when the field was in its infancy...