Derive – Formula
Deriving a field as a Formula is extremely common, especially when working with continuous fields. Some of the most common ways of deriving a field as a formula are creating total scores, change scores, and ratios. Our data file does not contain many continuous fields; however, it does include several income-related fields (capital gains, capital losses, and dividends) that might offer some insight. Since we are predicting income, we cannot use the actual values of capital gains, capital losses, or dividends: all of these fields contribute directly to a person's overall income, so including them would create a biased model. Rather than using actual investment dollars, however, we can investigate whether having investments relates to income. To determine whether someone has investments, we first need to create a temporary field, which we will call Stock_numbers, that simply adds up capital gains, capital losses, and dividends. If someone...
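As a rough illustration of this formula, the Python sketch below computes Stock_numbers for each record. The field names (capital_gains, capital_losses, dividends), the Record type, and the sample values are hypothetical stand-ins rather than the actual names or values in the data file; in practice the sum would be entered as the Derive node's formula rather than written in Python.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical field names; the real data file may label these differently.
    capital_gains: float
    capital_losses: float
    dividends: float


def stock_numbers(record: Record) -> float:
    """Temporary Stock_numbers field: the plain sum of the three
    investment-related fields, as described in the text."""
    return record.capital_gains + record.capital_losses + record.dividends


def derive_stock_numbers(records: list[Record]) -> list[float]:
    """Apply the formula to each record independently, one value per row."""
    return [stock_numbers(r) for r in records]


if __name__ == "__main__":
    # Made-up sample rows for illustration only.
    sample = [
        Record(capital_gains=0.0, capital_losses=0.0, dividends=0.0),
        Record(capital_gains=5178.0, capital_losses=0.0, dividends=120.0),
    ]
    print(derive_stock_numbers(sample))  # [0.0, 5298.0]
```

A formula-based derive evaluates the same expression separately for every record, which is why the sketch simply maps the sum over the list of rows.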