CHAID theory
CHAID (Chi-squared Automatic Interaction Detection) is a decision tree model and one of the oldest and most popular data mining techniques. At its core, CHAID is based on the Chi-square test: it chooses the predictor that has the strongest association with the target field (the one that gives the largest Chi-square statistic) and then divides the sample on that predictor. Strictly speaking, CHAID compares the Bonferroni-adjusted p-values of these statistics rather than their raw values, so a predictor with many categories does not win just because its statistic has more degrees of freedom. It can use either the Pearson or the likelihood-ratio Chi-square statistic. In the figure shown next, notice that the Educated_fulltime field is the most important predictor, because it is the field with the largest Chi-square statistic. In fact, the Chi-square statistic and the percentages are exactly the same for the CHAID model and for the Chi-square test. In addition, if categories of a predictor do not differ significantly from each other with respect to the target, CHAID merges them. Because CHAID generates non-binary trees (a node can be split into more than two branches), it tends to create wider trees:
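The selection and merging steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a production CHAID implementation: it ranks categorical predictors by the raw Pearson (or likelihood-ratio) statistic, as in the figure, rather than by adjusted p-values. It also merges categories greedily for a binary target by comparing each pair's statistic with 3.841, the 5% critical value for one degree of freedom, and it ignores the adjacent-only merging rule for ordinal predictors. All function names and the toy fields `education`, `region`, and `response` are hypothetical.

```python
import math
from collections import Counter


def _table(xs, ys):
    """Observed cell counts plus row/column totals for two categorical lists."""
    return Counter(zip(xs, ys)), Counter(xs), Counter(ys), len(xs)


def pearson_chi_square(xs, ys):
    """Pearson statistic: sum over all cells of (O - E)^2 / E, with its df."""
    observed, rows, cols, n = _table(xs, ys)
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = rows[r] * cols[c] / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat, (len(rows) - 1) * (len(cols) - 1)


def likelihood_ratio_chi_square(xs, ys):
    """Likelihood-ratio statistic G^2 = 2 * sum O * ln(O / E); empty cells add 0."""
    observed, rows, cols, n = _table(xs, ys)
    stat = 0.0
    for (r, c), o in observed.items():
        stat += o * math.log(o / (rows[r] * cols[c] / n))
    return 2 * stat, (len(rows) - 1) * (len(cols) - 1)


def best_predictor(records, predictors, target, statistic=pearson_chi_square):
    """Return (name, statistic) for the predictor with the largest Chi-square."""
    ys = [rec[target] for rec in records]
    scores = {p: statistic([rec[p] for rec in records], ys)[0] for p in predictors}
    name = max(scores, key=scores.get)
    return name, scores[name]


def merge_categories(xs, ys, critical=3.841):
    """Greedily merge the pair of predictor categories whose two-row sub-table
    has the smallest Pearson statistic, while it stays below `critical`.
    3.841 is the 5% critical value for df = 1, i.e. it assumes a binary target."""
    groups = [frozenset([c]) for c in sorted(set(xs))]
    while len(groups) > 1:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                # Relabel rows as group i vs group j and keep only those records.
                rows, targets = [], []
                for x, y in zip(xs, ys):
                    if x in groups[i] or x in groups[j]:
                        rows.append(i if x in groups[i] else j)
                        targets.append(y)
                stat, _ = pearson_chi_square(rows, targets)
                if best is None or stat < best[0]:
                    best = (stat, i, j)
        stat, i, j = best
        if stat >= critical:
            break  # every remaining pair differs significantly
        merged = groups[i] | groups[j]
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
    return groups


if __name__ == "__main__":
    # Hypothetical toy data: `education` is associated with `response`, `region` is not.
    records = [
        {"education": e, "region": r, "response": y}
        for e, r, y in zip(
            ["yes", "yes", "yes", "yes", "no", "no", "no", "no"],
            ["east", "west", "east", "west", "east", "west", "east", "west"],
            ["Y", "Y", "Y", "N", "N", "N", "N", "Y"],
        )
    ]
    print(best_predictor(records, ["education", "region"], "response"))  # ('education', 2.0)
```

In the toy data, `region` has identical response rates in both of its categories, so its statistic is 0 and `education` is chosen as the split. In `merge_categories`, two categories with the same target distribution produce a statistic of 0 and are merged first, which mirrors the way CHAID collapses categories that do not differ significantly.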
How CHAID processes different types of input variables
When using CHAID, or any algorithm, it...