First, we'll look at suspicious behavior detection, where the goal is to learn known patterns of fraud. This corresponds to modeling known knowns.
Fraud detection in insurance claims
Dataset
We'll work with a publicly available dataset describing insurance transactions, taken from the Oracle Database online documentation at http://docs.oracle.com/cd/B28359_01/datamine.111/b28129/anomalies.htm.
The dataset describes insurance claims on vehicle incidents for an undisclosed insurance company. It contains 15,430 claims; each claim consists of 33 attributes, describing the following components:
- Customer demographic details (Age, Sex, MaritalStatus, and so on)
- Purchased policy (PolicyType, VehicleCategory...