Representing text
Language is one of the most complex aspects of our existence. We use language to communicate our thoughts and choices. Every language is defined with a list of characters called the alphabet, a vocabulary, and a set of rules called grammar. Yet it is not a trivial task to understand and learn a language. Languages are complex and have fuzzy grammatical rules and structures.
Text is a representation of language that helps us communicate and share. This makes it a perfect area of research to expand the horizons of what artificial intelligence can achieve. Text is a type of unstructured data that cannot directly be used by any of the known algorithms. Machine learning and deep learning algorithms in general work with numbers, matrices, vectors, and so on. This, in turn, raises the question: how can we represent text for different language-related tasks?
Bag of Words
As we mentioned earlier, every language consists of a defined list of characters (alphabet...