Text fraud detection
Fraud is no longer limited to traditional transaction fraud. Many websites, for instance, rely on user reviews of services such as restaurants, hotels or tourist attractions, and monetize those reviews in different ways. If users lose trust in the reviews, for example because a business owner deliberately manipulates them in favor of his or her own business, the website will find it hard to regain that trust and to remain profitable. Hence, it is important to detect such manipulation.
How can autoencoders help us with this? As before, the idea is to learn the representation of a normal review on a website, and then find the reviews that do not fit that representation. The difficulty with text data is that it needs some preprocessing before it can be fed to the model. We will illustrate this with an example, which will also serve as a motivation for the different ways of modelling text that will be discussed in the next chapters.
From unstructured text data to a matrix
An issue with...