Failing Benford's Law
So far, we've seen several datasets, all of which conform to Benford's Law, most of them quite strongly. We haven't yet seen a dataset that does not conform to this distribution of initial digits. What would a failing dataset look like?
There are many ways to end up with data that doesn't conform. Linear data, for example, such as values spread evenly across a range, would have a much more uniform distribution of initial digits. We can also simulate fraudulent data easily, and in the process, learn how much noise a dataset can absorb before it stops conforming to Benford's Law.
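To see why linear data fails, here is a minimal sketch in plain Clojure (no Incanter). It compares the first-digit proportions of the evenly spaced integers 1 to 999 with the proportions Benford's Law predicts, log10(1 + 1/d) for digit d. The helper names first-digit, benford-expected, and digit-proportions are hypothetical, written only for this illustration, and the sketch assumes every value is a positive integer.

```clojure
;; Hypothetical helper: leading digit of a positive integer.
(defn first-digit [n]
  (Character/digit (first (str n)) 10))

(defn benford-expected
  "Proportion of values that Benford's Law predicts will start with digit d."
  [d]
  (Math/log10 (+ 1.0 (/ 1.0 d))))

(defn digit-proportions
  "Observed proportion of xs whose first digit is 1 through 9, keyed by digit."
  [xs]
  (let [counts (frequencies (map first-digit xs))
        total  (count xs)]
    (into (sorted-map)
          (for [d (range 1 10)]
            [d (/ (double (get counts d 0)) total)]))))

;; Evenly spaced (linear) data: every integer from 1 to 999.
(def linear-data (range 1 1000))

(doseq [[d p] (digit-proportions linear-data)]
  (println d (format "observed %.3f  Benford %.3f" p (benford-expected d))))
```

Each first digit covers exactly 111 of the 999 values (1 number below 10, 10 below 100, 100 below 1000), so every observed proportion is about 0.111, while Benford's Law expects about 0.301 for the digit 1 and only about 0.046 for the digit 9.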
We'll start this experiment with the population data that we looked at earlier and progressively introduce more and more junk into it. At each step, we'll replace randomly chosen items in the dataset with random values and rerun incanter.stats/benford-test on the result. When the test finally fails, we can note how many items we've replaced and how far the new distribution is from the one Benford's Law predicts.
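The procedure can be sketched in plain Clojure under a few stated assumptions. It does not call incanter.stats/benford-test; instead it uses a hand-rolled Pearson chi-square statistic against Benford's expected counts and rejects at the 0.05 level (critical value 15.507 for 8 degrees of freedom). The population data is replaced by a hypothetical log-uniform stand-in, benford-like, and the junk values are assumed to be uniform integers from 1 to 999,999. All function names here are hypothetical, not the recipe's own.

```clojure
;; Hypothetical helpers (repeated so this block stands alone).
(defn first-digit [n]
  (Character/digit (first (str n)) 10))

(defn benford-expected [d]
  (Math/log10 (+ 1.0 (/ 1.0 d))))

(defn chi-square-benford
  "Pearson chi-square statistic of xs' first digits against Benford's Law.
  A stand-in for incanter.stats/benford-test, not its implementation."
  [xs]
  (let [n      (count xs)
        counts (frequencies (map first-digit xs))]
    (reduce + (for [d (range 1 10)
                    :let [expected (* n (benford-expected d))
                          diff     (- (get counts d 0) expected)]]
                (/ (* diff diff) expected)))))

(def critical-value
  "Assumed rejection threshold: chi-square, 8 degrees of freedom, 0.05 level."
  15.507)

(defn- shuffled-indices
  "All indices 0..n-1 in a random order drawn from rng."
  [^java.util.Random rng n]
  (let [al (java.util.ArrayList. ^java.util.Collection (range n))]
    (java.util.Collections/shuffle al rng)
    (vec al)))

(defn junk-until-failure
  "Cumulatively replaces batch items at a time with uniform junk until the
  chi-square test rejects Benford's Law. Returns {:replaced k :chi-square x},
  or nil if the data never fails."
  [seed xs batch]
  (let [rng   (java.util.Random. seed)
        n     (count xs)
        order (shuffled-indices rng n)]
    (loop [data (vec xs), k 0]
      (let [x2 (chi-square-benford data)]
        (cond
          (> x2 critical-value) {:replaced k :chi-square x2}
          (>= k n)              nil
          :else
          (let [idxs  (subvec order k (min n (+ k batch)))
                ;; Overwrite the next batch of positions with junk in 1..999999.
                data' (reduce (fn [d i] (assoc d i (inc (.nextInt rng 999999))))
                              data idxs)]
            (recur data' (+ k (count idxs)))))))))

;; Hypothetical stand-in for the population data: log-uniform values in
;; [1, 10^6), which follow Benford's Law closely.
(def benford-like
  (let [rng (java.util.Random. 42)]
    (vec (repeatedly 2000 #(long (Math/pow 10 (* 6 (.nextDouble rng))))))))

(println (junk-until-failure 7 benford-like 20))
```

Following one fixed random order keeps the replacements cumulative, so each step adds junk on top of the previous one rather than starting over, and the returned :replaced count is the point at which the data stopped passing.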
The primary function is shown as follows...