Using a full data model/partial data model approach to address missing data
It is common in data mining for one category of customers to be more prone to missing data than others. In fact, some categories of customers are guaranteed to have certain data missing. For instance, suppose that in running your cell phone business you have found that the time elapsed between phone upgrades is useful in estimating when a customer's next upgrade will occur. A newly acquired customer will have no prior phone history in the data set, so this value cannot be calculated for them, and it would be risky to assume that your new customers behave like your established customers.
How, then, do you estimate the average months between new phones for these customers? One approach is to sidestep the problem and build separate models for your new customers and your established customers. In this recipe, we will learn how to diagnose the pattern of missing data and determine whether this technique applies.
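To make the idea concrete outside of any particular tool, here is a minimal sketch in Python with pandas. It is an illustration only, not the method this recipe walks through: the toy data and the column names (segment, avg_months_between_phones, monthly_spend) are assumptions made up for the example. It first checks whether missing values are concentrated in one customer category, and then splits the data into a set for a full data model and a set for a partial data model.

```python
import pandas as pd

# Hypothetical toy data; every column name and value here is an assumption
# made for illustration, not taken from a real data set.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6],
    "segment": ["established", "established", "established",
                "new", "new", "new"],
    "avg_months_between_phones": [18.0, 24.0, 20.0, None, None, None],
    "monthly_spend": [55.0, 70.0, 62.0, 40.0, 45.0, 38.0],
})

# Diagnose the pattern: what share of each segment is missing the field?
is_missing = customers["avg_months_between_phones"].isna()
missing_rate = is_missing.groupby(customers["segment"]).mean()
print(missing_rate)

# Full data model: customers with upgrade history, all fields available.
full_model_data = customers[~is_missing]

# Partial data model: customers without upgrade history; the field that is
# always missing for them is dropped rather than imputed.
partial_model_data = customers[is_missing].drop(
    columns=["avg_months_between_phones"]
)
```

If the missing rate is close to 1 for one segment and close to 0 for the others, as in this toy data, the missingness is structural rather than random, and building a separate model per group is a reasonable option to consider instead of imputing the value.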
Getting ready
We will...