Data and predictability – the oracle problem
Once you’ve randomly created a wiki page with a bunch of comments and tags, how do you know what the correct page should look like? In practice, you dump the text to something called wikimarkup and, as the test runs, generate what the wikimarkup should be. That generated expectation is an oracle, which Dr. Cem Kaner describes (https://kaner.com/?p=190) as a tool that helps you decide whether the program passed your test:
Figure 1.16 – Example insurance app
Google’s mortgage calculator, for example, takes four inputs: Loan amount, Loan term, Interest, and Include taxes & fees, and spits out the monthly payment. It might be possible to loop through thousands of inputs and get the answers. To know they are correct, you might have someone else code up a second version of the system. Comparing the answers doesn’t prove correctness (nothing can), but it might at least demonstrate that if a mistake were made, it was reasonable to make such a mistake.
When we’ve made such automated oracles, we generally try to keep them as independent of the software under test as possible. Have a second person write the oracle, someone with a different background, using a different programming language. This prevents the “made the same round-off error” sorts of mistakes. In our experience, when an oracle makes the same mistake as the software, the cause is often an interpretation error in the requirements, or something the requirements left blank. Truly random data will tend to help find the hidden equivalence classes in the data.
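As a rough illustration of comparing a calculator against an independently written oracle, here is a minimal Python sketch. Everything in it is an assumption made for the example: the function names are hypothetical, the "software under test" uses the standard fixed-rate amortization formula (which may not be what Google's calculator does), and taxes and fees are ignored. The oracle deliberately avoids that formula; it simulates the loan month by month and searches for the payment that brings the balance to zero.

```python
import random


def payment_closed_form(principal, annual_rate_pct, years):
    """Stand-in for the software under test: closed-form amortization formula."""
    months = years * 12
    monthly_rate = annual_rate_pct / 100 / 12
    if monthly_rate == 0:
        return principal / months
    return principal * monthly_rate / (1 - (1 + monthly_rate) ** -months)


def remaining_balance(principal, annual_rate_pct, years, payment):
    """Simulate the loan month by month; return the balance after the last payment."""
    monthly_rate = annual_rate_pct / 100 / 12
    balance = principal
    for _ in range(years * 12):
        balance = balance * (1 + monthly_rate) - payment
    return balance


def payment_oracle(principal, annual_rate_pct, years):
    """Oracle: bisect on the payment until the simulated balance reaches zero."""
    low = 0.0
    # Paying principal plus one month of interest clears the loan in month one.
    high = principal * (1 + annual_rate_pct / 100 / 12)
    for _ in range(60):
        mid = (low + high) / 2
        if remaining_balance(principal, annual_rate_pct, years, mid) > 0:
            low = mid   # payment too small, debt remains
        else:
            high = mid  # payment large enough (or too large)
    return (low + high) / 2


def find_disagreements(trials=200, tolerance=0.01, seed=1):
    """Feed random inputs to both versions; collect cases that differ by more than a cent."""
    rng = random.Random(seed)
    disagreements = []
    for _ in range(trials):
        principal = rng.randint(10_000, 2_000_000)
        rate = round(rng.uniform(0.0, 15.0), 3)
        years = rng.choice([10, 15, 20, 30])
        actual = payment_closed_form(principal, rate, years)
        expected = payment_oracle(principal, rate, years)
        if abs(actual - expected) > tolerance:
            disagreements.append((principal, rate, years, actual, expected))
    return disagreements


if __name__ == "__main__":
    print(find_disagreements())
```

An empty list means the two versions agreed on every sampled input. It does not mean either one is right; if both authors misread the same requirement, both will be wrong together.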
Oracles can come from anywhere. Your knowledge, dictionary spellings, the knowledge that the (E)dit menu should be to the right of (F)ile, prior experience, the specification… all of these can be oracles. Oracles can also be incorrect. Well-educated candidates who work the palindrome problem often cite palindrome sentences, such as “Was it a car or a cat I saw?”, and expect the spaces and punctuation to be ignored. Their oracle spots a problem, but the customer wants to just reverse the text and compare, so the sentence “should” fail.
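The disagreement between the two palindrome oracles can be made concrete with a short Python sketch. Both function names are hypothetical; the "strict" version is one reading of "just reverse the text and compare," and the "lenient" version is one reading of what the candidate expects.

```python
def is_palindrome_strict(text):
    """The customer's rule: reverse the exact text and compare."""
    return text == text[::-1]


def is_palindrome_lenient(text):
    """The candidate's oracle: ignore case, spaces, and punctuation."""
    letters = [ch.lower() for ch in text if ch.isalnum()]
    return letters == letters[::-1]


sentence = "Was it a car or a cat I saw?"
print(is_palindrome_strict(sentence))   # False
print(is_palindrome_lenient(sentence))  # True
```

Neither function is buggy. They encode different expectations, which is why a tester needs to know which oracle the customer is actually using.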
A fallible method for solving a problem is sometimes called a heuristic. Heuristics, or, as we joke, “Heusseristics,” are integral to testing because we combine a large variety of techniques and timing aspects to figure out if the software is correct.
The final problem this chapter will introduce is test data. The act of running a test generally pollutes the database the test runs against. In many cases, running the test a second time will produce a different result. Clearing out the database before each run is tempting; it is the approach we usually take for programmer unit tests or micro-tests. Yet there are often problems that only occur as data builds up over time. To save time, some companies like to run more than one test, or have more than one tester working, at the same time, and these tests can step on each other. That means separating the data used in testing, tracking it, and coming up with a strategy that balances simple, repeatable tests against powerful, longer-running ones can make or break a test effort. We’ll come back to test data in Chapter 7.
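One way to separate test data without wiping the database is to tag every record a test creates with an identifier unique to that run. The Python sketch below shows the idea under stated assumptions: the `pages` table, its columns, and the function names are hypothetical, and an in-memory SQLite database stands in for whatever shared database the tests really use.

```python
import sqlite3
import uuid


def make_database():
    """Create a throwaway database with a hypothetical 'pages' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (title TEXT, run_id TEXT)")
    return conn


def run_test(conn):
    """Insert one page under a fresh run ID; return how many pages this run can see."""
    run_id = uuid.uuid4().hex
    conn.execute("INSERT INTO pages VALUES (?, ?)", ("Test page", run_id))
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM pages WHERE run_id = ?", (run_id,)
    ).fetchone()
    return count


conn = make_database()
first = run_test(conn)
second = run_test(conn)
(total,) = conn.execute("SELECT COUNT(*) FROM pages").fetchone()

print(first, second, total)  # 1 1 2
```

Each run sees only its own row, so the second run gets the same answer as the first, while the table as a whole keeps growing. That keeps the checks repeatable and still lets data build up for the problems that only appear over time.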