Unit test and unit code design
Remember that first example – FizzBuzz. It took values from the command line and spat results out to the screen. To test it, we would have to test the entire system as a system. To automate that, we could do… something. It is possible, for example, to write a program that calls FizzBuzz from the command line, passing in inputs from a file, writing the results to a file, and comparing actual to expected results. That is what most customer-facing test automation looks like. That kind of automation is clunky, awkward, and painful.
Or, using the hexagonal model, we can break the program into independent components and test them. You saw this in the FizzBuzz example, where we essentially had three elements:
- The main routine: accept the input, call get_total_result, and print the result.
- get_total_result: loop from 1 to the input, call calc_individual_result for each number, add it to the total, and return the total.
- calc_individual_result: the business logic of divisible by 3, divisible by 5, or return the value goes here.
This works well for trivial examples started from scratch. Legacy code, however, tends to have the following aspects:
- “God” objects that mix data and algorithms
- Connections to databases, sometimes with database names hard-coded
- Writing to filesystems
- Calls to external APIs that might or might not be production systems
- Objects that are extremely hard to set up
- Objects that “load” from a database
All these things conspire to make it difficult, if not seemingly impossible, to isolate code in order to test it.
In his book, Working Effectively with Legacy Code (https://www.oreilly.com/library/view/working-effectively-with/0131177052/), Michael Feathers suggests finding seams in code – that is, logical ways to break methods up. The most common way to do this is to add new functionality as an independent method and test just that method. When you approach the edges, such as writing to a filesystem or the screen, you can act as we did in the FizzBuzz example – make a calculator method and test that. Sometimes, the program interacts with a database, doing different things depending on what it finds. What then?
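To make the idea concrete, here is a hedged sketch in Python; the legacy routine and its names (print_discount_report, apply_discount) are invented for illustration, not taken from Feathers' book:

```python
# Hypothetical legacy routine: file I/O and business logic tangled together.
def print_discount_report(path):
    with open(path) as handle:
        for line in handle:
            customer, amount = line.strip().split(",")
            # The new rule is added as its own method (the seam).
            print(f"{customer}: {apply_discount(float(amount))}")

def apply_discount(amount):
    """Pure logic with no file or screen access, so it can be tested alone."""
    return round(amount * 0.9, 2) if amount > 100 else amount

def test_apply_discount():
    assert apply_discount(200.0) == 180.0
    assert apply_discount(50.0) == 50.0
```

The report function still talks to the filesystem and the screen, but the discount rule itself can now be verified with a fast, in-memory test.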
Joe Armstrong, the creator of Erlang, once explained that this is a natural consequence of a sort of naive adoption of object-oriented programming (OOP). As Armstrong put it, “Because the problem with object-oriented languages is they’ve got all this implicit environment that they carry around with them. You wanted a banana but what you got was a gorilla holding the banana and the entire jungle.” (Joe Armstrong, 2020, Banana Gorilla Jungle – OOP, https://medium.com/codemonday/banana-gorilla-jungle-oop-5052b2e4d588.)
In practice, when these problems surface, the programmers will simply abandon TDD. This is especially true if anything less than the entire team is brought into the work, as the non-TDD team members will appear to make progress while “breaking the build” for anyone running tests before checking in code. As less and less of the code base contains tests, there will be more and more debugging, uncertainty about what code is executed, and, ultimately, attempts to “test quality in” by using the system end-to-end.
Our friends in the software quality world will say, “You can’t test in quality.” Strictly speaking, this is true – testing only shows the presence of (some of the) errors. In practice, however, there is a software method we call “code, test, debug, fix, recode, retest, redebug, refix, rerecode, reretest…” Another term for this is “code it (at least) twice,” which amounts to delivering software late that might barely be good enough. Half of these problems come from poor components; the other half comes from a lack of understanding of the requirements.
The alternative is to make clean lines between components, which, again, is where naive OOP tends to fall short. You want to test the banana, but you will end up needing to create the entire jungle. In real software, that means you need a real database to run a unit test, that database needs to query a full data warehouse with predictable data, and suddenly running unit tests takes hours and requires expensive cloud resources.
The common solution to this is a set of patterns called test doubles, which we will explore using the example of a video game.
Using test doubles to create seams
Most of us are familiar with video games that can save the state – the player’s statistics. These include their level, experience points, class, equipment, and so on. That state is likely stored in an object that includes both data (the statistics) and an algorithm. A simple enough algorithm reads from and writes to a disk or database. In a naive implementation, a programmer would not be able to separate the two. The lowest level of automated test might be calling the write_to_db() function and then executing a query on the data to see if the database is populated correctly. Again, this will be a slow test – dozens to hundreds of times slower than one that could run in memory. To run it, we have to have a real database running, and, if there is an error, we would need to work to figure out where the problem is and isolate it. This style of thinking also leads to multiple implementations of the same feature, as we might need a write_to_cloud() function when we migrate to Amazon Web Services, a write_to_API() function, and so on.
In an object-oriented system, the database is likely represented by an object. There might be a base database object, with child classes derived from it to write to every major database, such as Sybase, Oracle, Microsoft’s SQL Server, IBM’s DB2, and so on. Given a base class, it is possible to create a subclass of the database that does nothing at all. A stub might return the same value every time, say a success flag. A mock, on the other hand, might have some predefined behavior, returning success under some conditions and failure under others. A mock can also record how many times it is called. All of these fall into the broad category of test doubles. When testing an object that connects to other objects, the programmer passes in the test double instead of the real collaborator.
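As a sketch of the idea in Python (the class names, the Player object, and the table layout are invented for illustration):

```python
class Database:
    """Hypothetical base class; real subclasses write to Oracle, SQL Server, DB2, and so on."""
    def insert(self, table, row):
        raise NotImplementedError

class StubDatabase(Database):
    """A stub: does nothing and returns the same value every time."""
    def insert(self, table, row):
        return True  # always report success

class MockDatabase(Database):
    """A mock: predefined behavior plus a record of every call it receives."""
    def __init__(self, fail_tables=()):
        self.calls = []
        self.fail_tables = set(fail_tables)

    def insert(self, table, row):
        self.calls.append((table, row))
        return table not in self.fail_tables

class Player:
    """The object under test; its database is passed in rather than created inside."""
    def __init__(self, name, level, database):
        self.name, self.level, self.database = name, level, database

    def save(self):
        return self.database.insert("players", {"name": self.name, "level": self.level})

def test_save_records_one_insert():
    db = MockDatabase()
    assert Player("Grog", 7, db).save() is True
    assert db.calls == [("players", {"name": "Grog", "level": 7})]
```

Because the database is passed in rather than created inside Player, the test runs entirely in memory and can also assert exactly what the object tried to write.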
Let’s talk about how that works.
At runtime, our program calls the user->write() method, passing in a writer object. The writer object includes the strategy to use (database, text file, or API), along with the actual connection. Here are the steps we follow to peel apart our program and test the internals as units:
- To test the main classes, we mock out the writer object and ask, “When write and read are called, how often are you, the writer, called, and with what data?”
- To test the writer, we mock out the database connection and ask what information is sent to the database object. In this case, it would be INSERT or SELECT statements to save or load an object.
- A strategy pattern takes information and returns objects. In our case, it might take the text name of the object and the connection information. For a database, this might be MS SQL SERVER and a connection string. For a text file, it might be “Text file” and “C:\Matt.txt”. To test the strategy, pass in information, get the object back, and ask the object about itself (see the sketch after this list).
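Here is how the second and third steps might look in Python. The writer, the connection classes, and the strategy function are assumptions sketched for illustration, not the actual classes from the example:

```python
class RecordingConnection:
    """Test double: records every SQL statement it is asked to execute."""
    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)

class SqlServerConnection:
    """Stand-in for a real connection built from a connection string."""
    def __init__(self, connection_string):
        self.connection_string = connection_string

    def execute(self, sql):
        pass  # a real implementation would send sql to SQL Server

class TextFileConnection:
    """Stand-in for a connection that appends statements to a text file."""
    def __init__(self, path):
        self.path = path

    def execute(self, sql):
        pass  # a real implementation would append sql to the file

class Writer:
    """Turns a user into an INSERT statement and hands it to its connection."""
    def __init__(self, connection):
        self.connection = connection

    def write(self, user):
        self.connection.execute(f"INSERT INTO users (name) VALUES ('{user['name']}')")

def make_connection(kind, info):
    """The strategy: take a text name and connection information, return an object."""
    if kind == "MS SQL SERVER":
        return SqlServerConnection(info)
    if kind == "Text file":
        return TextFileConnection(info)
    raise ValueError(f"unknown connection kind: {kind}")

# Step two: mock out the connection and ask what information was sent to it.
def test_writer_sends_insert():
    connection = RecordingConnection()
    Writer(connection).write({"name": "Matt"})
    assert connection.statements == ["INSERT INTO users (name) VALUES ('Matt')"]

# Step three: pass information in, get the object back, and ask it about itself.
def test_strategy_returns_the_right_connection():
    connection = make_connection("Text file", r"C:\Matt.txt")
    assert isinstance(connection, TextFileConnection)
    assert connection.path == r"C:\Matt.txt"
```

The test double records the SQL it receives, so the tests can inspect exactly what the writer sent without any database at all.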
Using test doubles in this way results in code that is separate and can be tested separately. Sometimes, the code already exists, and you have the whole-jungle problem. Other times, you just don’t have the right base classes in place as a foundation. Most modern programming languages have freely available mocking libraries that allow you to swap out objects for mocks at runtime.
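In Python, for instance, the standard library’s unittest.mock module can build the double at runtime; the save_player function below is a hypothetical piece of code under test:

```python
from unittest.mock import MagicMock

def save_player(name, level, database):
    """Hypothetical code under test: delegates persistence to its database object."""
    return database.insert("players", {"name": name, "level": level})

def test_save_player_uses_the_database():
    database = MagicMock()               # the library builds the test double at runtime
    database.insert.return_value = True  # stub behavior: always report success
    assert save_player("Grog", 7, database) is True
    database.insert.assert_called_once_with("players", {"name": "Grog", "level": 7})
```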
It’s worth noting that one of the first public uses of mocking was for a cache. With a cache, you have a small amount of commonly used information somewhere fast (perhaps in memory) and access to a much larger but slower store somewhere else (perhaps on disk). If the cache wasn’t working correctly, it might always read from disk, which would “work” but eliminate the value added by the cache. In this case, whether the read_from_disk() method is called is important; it should not be called if the data exists in memory. Thus, the “how many times were you called, and how” recording functionality of a mock becomes a part of the testing process.
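A hedged sketch of that kind of test follows; the Cache class and its read_from_disk collaborator are illustrative, not any particular library’s API:

```python
from unittest.mock import MagicMock

class Cache:
    """Illustrative cache: a small in-memory store in front of a slow disk read."""
    def __init__(self, read_from_disk):
        self.read_from_disk = read_from_disk
        self.memory = {}

    def get(self, key):
        if key not in self.memory:                # only hit the disk on a miss
            self.memory[key] = self.read_from_disk(key)
        return self.memory[key]

def test_second_read_never_touches_the_disk():
    read_from_disk = MagicMock(return_value="player data")
    cache = Cache(read_from_disk)
    cache.get("grog")
    cache.get("grog")                             # should be served from memory
    assert read_from_disk.call_count == 1         # the mock records how often it was called
```

If the cache were broken and always went to disk, call_count would be 2 and the test would fail, even though the returned data would still look correct.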