Software testing

Software testing consists of the dynamic evaluation of the behavior of a program on a finite set of test cases, suitably selected from the usually infinite execution domain, against the expected behavior. The key concepts of this definition are described as follows:

Dynamic: The System Under Test (SUT) is executed with specific input values to find failures in its behavior. Thus, executing the actual SUT checks not only that the design and code are correct, but also the environment, such as the libraries, the operating system, network support, and so on.
Finite: Exhaustive testing is not possible or practical for most real programs. They usually have a large number of allowable inputs to each operation, plus even more invalid or unexpected inputs, and the possible sequences of operations are usually infinite as well. Testers must therefore choose a number of tests that can be run in the available time.
Selected: Since there is a huge or infinite set of possible tests and we can afford to run only a small fraction of them, the key challenge of testing is how to select the tests that are most likely to expose failures in the system.
Expected: After each test execution, it must be decided whether the observed behavior of the system was a failure or not.

Software testing is a broad term encompassing a wide spectrum of different concepts. There is no universal classification for all the different forms of testing available in the literature. For the sake of clarity, in this book we classify the different forms of testing along three axes, namely testing levels (unit, integration, system, and acceptance), testing methods (black-box, white-box, and non-functional testing), and testing types (manual and automated).

The next sections provide more details about all of these concepts, which are summarized in the following diagram:

Taxonomy of software testing in three categories: levels, methods, and types

For example, as we will discover, a JUnit test that exercises a method in a class according to its functional behavior can be seen as an automated unit black-box test. When a final consumer uses a software product to validate whether it works as expected, according to the preceding taxonomy we can see this as a manual black-box acceptance test. Note that not every possible combination of these three axes is meaningful. For instance, non-functional tests (for example, performance tests) are typically carried out automatically and at the system level (it would be very unusual to run them manually or at the unit level).

Testing levels

Depending on the size of the SUT and the scenario in which it is exercised, testing can be carried out at different levels. In this book, we classify testing into four levels:

Unit testing: Here, individual program units are tested. Unit testing should focus on the functionality of objects or methods.
Integration testing: Here, units are combined to create composite components. Integration testing should focus on testing component interfaces.
System testing: Here, all of the components are integrated and the system is tested as a whole.
Acceptance testing: Here, consumers decide whether or not the system is ready to be deployed in the consumer environment. It can be seen as high-level functional testing performed at the system level by final users or customers.

There is no universal classification of the many different forms of testing. Regarding testing levels, in this book, we use the aforementioned classification of four levels. Nevertheless, other levels or approaches are present in the literature (for example, system integration testing or regression testing). The last part of this section reviews these other testing approaches.

The first three levels (unit, integration, and system) are typically carried out during the development phases of the software life cycle. These tests are typically performed by software engineers in different roles (that is, programmers, testers, QA team, and so on). The objective of these tests is the verification of the system. On the other hand, the fourth level (acceptance) is a type of user testing, in which potential or real users are usually involved (validation). The following picture provides a graphical description of these concepts:

Testing levels and their relationship with verification and validation (V&V)

Unit testing

Unit testing is a method by which individual pieces of source code are tested to verify that the design for that unit has been correctly implemented. A unit test case executes the following four phases in sequence:

Setup: The test case initializes the test fixture, that is, the "before" picture required for the SUT to exhibit the expected behavior.
Exercise: The test case interacts with the SUT, getting some outcome from it as a result. The SUT usually queries another component, named the Depended-On Component (DOC).
Verify: The test case determines whether the expected outcome has been obtained using assertions (also known as predicates).
Teardown: The test case tears down the test fixture to put the SUT back into the initial state.
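
The four phases above can be sketched in plain Java, without any testing framework. This is only an illustrative sketch: the `Cart` class, its methods, and the `CartTest` helper are hypothetical names invented for the example, and a real test would normally rely on a framework such as JUnit to run each phase.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical SUT: a tiny shopping cart that sums item prices (in cents).
class Cart {
    private final List<Integer> prices = new ArrayList<>();

    void add(int priceInCents) {
        prices.add(priceInCents);
    }

    int total() {
        int sum = 0;
        for (int price : prices) {
            sum += price;
        }
        return sum;
    }

    void clear() {
        prices.clear();
    }
}

// Hypothetical hand-written test case showing the four phases.
class CartTest {
    // Returns true when the verification succeeds.
    static boolean totalIsSumOfPrices() {
        // Setup: build the fixture the SUT needs.
        Cart cart = new Cart();
        cart.add(250);
        cart.add(100);
        try {
            // Exercise: interact with the SUT and capture the outcome.
            int actual = cart.total();
            // Verify: compare the outcome with the expected value.
            return actual == 350;
        } finally {
            // Teardown: put the fixture back into its initial state.
            cart.clear();
        }
    }
}
```

Placing the teardown in a `finally` block is a design choice of this sketch: it ensures the fixture is cleaned up even if the exercise phase throws an exception.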

These phases and their relationship with the SUT and DOC are illustrated as follows:

Unit test generic structure

Unit testing is done with the unit under test in isolation, that is, without interacting with its DOCs. To that aim, test doubles are employed to replace any components on which the SUT depends. There are several kinds of test doubles:

A dummy object simply satisfies the real object's API but is never actually used. The typical use case for a dummy object is passing it as a parameter to meet a method signature.
A fake object replaces the real object with a simpler implementation, for example, an in-memory database.
A stub object replaces the real object providing hard-coded values as responses.
A mock object also replaces the real object, but this time with programmed expectations as responses.
A spy object is a partial mock object, meaning that some of its methods are programmed with expectations, but the others use the real object's implementation.
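
As a rough illustration of two of these kinds, the following sketch hand-writes a stub and a mock for the same DOC. All names here (`RateService`, `Converter`, `StubRateService`, `MockRateService`) are hypothetical and exist only for this example. In practice, a mocking library would usually generate such doubles.

```java
// Hypothetical DOC: a service that returns an exchange rate.
interface RateService {
    double rate(String currency);
}

// Hypothetical SUT: converts an amount using its DOC.
class Converter {
    private final RateService rates;

    Converter(RateService rates) {
        this.rates = rates;
    }

    double toEuros(double amount, String currency) {
        return amount * rates.rate(currency);
    }
}

// Stub: always answers with the same hard-coded value.
class StubRateService implements RateService {
    public double rate(String currency) {
        return 0.5;
    }
}

// Mock: programmed with an expectation (exactly one call for the
// expected currency) that the test can check afterwards.
class MockRateService implements RateService {
    private final String expectedCurrency;
    private int matchingCalls = 0;
    private int otherCalls = 0;

    MockRateService(String expectedCurrency) {
        this.expectedCurrency = expectedCurrency;
    }

    public double rate(String currency) {
        if (expectedCurrency.equals(currency)) {
            matchingCalls++;
        } else {
            otherCalls++;
        }
        return 2.0;
    }

    // True only if the programmed expectation was met.
    boolean verify() {
        return matchingCalls == 1 && otherCalls == 0;
    }
}
```

With the stub, a test checks only the value returned by the SUT. With the mock, it can additionally check how the SUT interacted with its DOC by calling `verify()` after the exercise phase.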

Integration testing

Integration testing should expose defects in the interfaces and in the interaction between integrated components or modules. There are different strategies for performing integration testing. These strategies describe the order in which units are to be integrated, presuming that the units have been separately tested. Examples of common integration strategies are the following:

Top-down integration: This strategy starts with the main unit (module), that is, the root of the procedural tree. Any lower-level module that is called by the main unit should be substituted by a test double. Once testers are convinced that the main unit logic is correct, the stubs are gradually replaced with the actual code. This process is repeated for the rest of the lower-level units in the procedural tree. The main advantage of this approach is that defects are more easily found.
Bottom-up integration: This strategy starts the testing process with the most elementary units. Larger subsystems are assembled from the tested components. The main advantage of this type is that test doubles are not needed.
Ad hoc integration: The components are integrated in the natural order in which they are finished. This allows early testing of the system. Test doubles are usually required.
Backbone integration: A skeleton of components is built and others are gradually integrated. The main disadvantage of this approach is the creation of the backbone, which can be labor-intensive.

Another strategy commonly referred to in the literature is big-bang integration. In this strategy, testers wait until all or most of the units are developed and integrated. As a result, all the failures are found at the same time, making it very difficult and time-consuming to correct the underlying faults. If possible, this strategy should be avoided.

System testing

System testing during development involves integrating components to create a version of the system and then testing the integrated system. It verifies that the components are compatible, interact correctly, and transfer the right data at the right time, typically across its user interfaces. It obviously overlaps with integration testing, but the difference here is that system testing should involve all the system components together with the final user (typically impersonated).

There is a special type of system testing called end-to-end testing. In this approach, the final user is typically impersonated, that is, simulated using automation techniques.

Testing methods

Testing methods (or strategies) define the way test cases are designed. They can be responsibility-based (black-box), implementation-based (white-box), or non-functional. Black-box techniques design test cases on the basis of the specified functionality of the item to be tested. White-box ones rely on source code analysis to develop test cases. Hybrid (grey-box) techniques design test cases using both responsibility-based and implementation-based approaches.

Black-box testing

Black-box testing (also known as functional or behavioral testing) is based on requirements with no knowledge of the internal program structure or data. Black-box testing relies on the specification of the system or the component that is being tested to derive test cases. The system is a black-box whose behavior can only be determined by studying its inputs and the related outputs. There are a lot of specific black-box testing techniques; some of the most well-known ones are described as follows:

Systematic testing: This refers to a complete testing approach in which the SUT is shown to conform exhaustively to a specification, up to the testing assumptions. It generates test cases only in the limiting sense that each domain point is a singleton sub-domain. Inside this category, some of the most commonly used techniques are equivalence partitioning and boundary value analysis, and also logic-based techniques, such as cause-effect graphing, decision tables, or pairwise testing.
Random testing: This is literally the antithesis of systematic testing (the sampling is over the entire input domain). Fuzz testing is a form of black-box random testing, which randomly mutates well-formed inputs and tests the program on the resulting data. It delivers randomly sequenced and/or structurally bad data to a system to see if failures occur.
Graphical User Interface (GUI) testing: This is the process of ensuring that software with a graphical interface for interacting with the user meets its specification. GUI testing is event-driven (for example, mouse movements or menu selections) and provides a frontend to the underlying application code through messages or method calls. GUI testing at the unit level is typically applied at the button level. GUI testing at the system level exercises the event-driven nature of the SUT.
Model-based testing (MBT): This is a testing strategy in which test cases are derived in part from a model that describes some (if not all) aspects of the SUT. MBT is a form of black-box testing because tests are generated from a model, which is derived from the requirements documentation. It can be done at different levels (unit, integration, or system).
Smoke testing: This is the process of ensuring the critical functionality of the SUT. A smoke test case is the first to be run by testers before accepting a build for further testing. Failure of a smoke test case means that the software build is refused. The name smoke testing derives from electrical system testing, whereby the first test was to switch the system on and see if it smoked.

Sanity testing: This is the process of ensuring the basic functionality of the SUT. Like smoke tests, sanity tests are performed at the beginning of the test process, but their objective is different. Sanity tests are supposed to ensure that the basic features of the SUT continue working as expected (that is, the rationality of the SUT), before conducting more exhaustive tests.

Smoke and sanity testing are terms that are often confused in the software testing community. It is commonly accepted that both kinds of tests are performed to avoid wasting effort on rigorous testing when these tests fail. The main difference between them is their target (critical versus basic functionality).
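
To make equivalence partitioning and boundary value analysis more concrete, here is a small sketch written against a made-up specification: ages below 18 are classified as child, ages from 18 to 64 as adult, ages of 65 and above as senior, and negative ages are rejected. The `Fares` class and the specification itself are assumptions made for this example only. The test inputs are chosen from the specification alone, without looking at how `category` is implemented.

```java
// Hypothetical SUT implementing the assumed specification.
class Fares {
    static String category(int age) {
        if (age < 0) {
            throw new IllegalArgumentException("age must be non-negative");
        }
        if (age < 18) {
            return "child";
        }
        if (age < 65) {
            return "adult";
        }
        return "senior";
    }
}

class FaresBlackBoxTest {
    // Equivalence partitioning: one representative input per valid partition.
    static boolean partitions() {
        return Fares.category(10).equals("child")
            && Fares.category(40).equals("adult")
            && Fares.category(80).equals("senior");
    }

    // Boundary value analysis: inputs on each side of every partition edge.
    static boolean boundaries() {
        return Fares.category(0).equals("child")
            && Fares.category(17).equals("child")
            && Fares.category(18).equals("adult")
            && Fares.category(64).equals("adult")
            && Fares.category(65).equals("senior");
    }

    // The invalid partition: a negative age must be rejected.
    static boolean rejectsInvalidAge() {
        try {
            Fares.category(-1);
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }
}
```

Three representative inputs cover the valid partitions, while the boundary inputs target the places where off-by-one mistakes (for example, `<=` written instead of `<`) would most likely show up.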

White-box testing

White-box testing (also known as structural testing) is based on knowledge of the internal logic of an application's code. It determines whether the program-code structure and logic are faulty. White-box test cases are accurate only if the tester knows what the program is supposed to do.

Black-box testing uses only the specification to identify test cases, while white-box testing uses the program source code (implementation) as the basis of test case identification. Both approaches, used in conjunction, are necessary in order to select a good set of test cases for the SUT. Some of the most significant white-box techniques are as follows:

Code coverage defines the degree to which the source code has been tested, for example, in terms of the percentage of lines of code (LOC). There are several criteria for code coverage:
1. Statement coverage: Line-of-code coverage granularity.
2. Decision (branch) coverage: Control structure (for example, if-else) coverage granularity.
3. Condition coverage: Boolean expression (true-false) coverage granularity.
4. Path coverage: Every-possible-route coverage granularity.
5. Function coverage: Program function coverage granularity.
6. Entry/exit coverage: Call and return coverage granularity.
Fault injection is the process of injecting faults into software to determine how well (or badly) some SUT behaves. Defects can be said to propagate, and in that case, their effects are visible in program states beyond the state in which the error existed (a fault became a failure).
Mutation testing validates tests and their data by running them against many copies of the SUT containing different, single, and deliberately inserted changes. Mutation testing helps to identify omissions in the code.
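
The idea behind mutation testing can be sketched by hand. In the following example, the "SUT" is a function that returns the larger of two integers, and the mutant replaces the `>` operator with `<`. All names (`MutationDemo`, `weakSuite`, `strongSuite`) are invented for the illustration, and real mutation testing tools generate and run mutants automatically rather than requiring them to be written out.

```java
import java.util.function.IntBinaryOperator;

class MutationDemo {
    // Original implementation under test: returns the larger argument.
    static final IntBinaryOperator ORIGINAL = (a, b) -> a > b ? a : b;

    // Mutant: a single deliberate change, > replaced by <.
    static final IntBinaryOperator MUTANT = (a, b) -> a < b ? a : b;

    // A weak test suite: it only checks equal arguments.
    static boolean weakSuite(IntBinaryOperator max) {
        return max.applyAsInt(3, 3) == 3;
    }

    // A stronger suite: it also checks distinct arguments in both orders.
    static boolean strongSuite(IntBinaryOperator max) {
        return max.applyAsInt(3, 3) == 3
            && max.applyAsInt(5, 2) == 5
            && max.applyAsInt(2, 5) == 5;
    }
}
```

Both suites pass on the original code. The weak suite also passes on the mutant, so the mutant survives, which signals that the suite's test data is insufficient. The strong suite fails on the mutant, so the mutant is said to be killed.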

Non-functional testing

The non-functional aspects of a system can require considerable effort to test. Within this group, different kinds of testing can be found, for example, performance testing, which is conducted to evaluate the compliance of a SUT with specified performance requirements. These requirements usually include constraints about the time behavior and resource usage. Performance testing may measure response time with a single user exercising the system or with multiple users exercising the system. Load testing is focused on increasing the load on the system to some stated or implied maximum load, to verify that the system can handle the defined system boundaries. Volume testing is often considered synonymous with load testing, yet volume testing focuses on data. Stress testing exercises the system beyond normal operational capacity to the extent that the system fails, identifying the actual boundaries at which the system breaks. The aim of stress testing is to observe how the system fails and where the bottlenecks are.

Security testing tries to ensure the following concepts: confidentiality (protection against the disclosure of information), integrity (ensuring the correctness of the information), authentication (ensuring the identity of the user), authorization (determining that a user is allowed to receive a service or perform an operation), availability (ensuring that the system performs its functionality when required), and non-repudiation (ensuring that an action that happened cannot be denied). Authorized attempts to evaluate the security of a system infrastructure are often known as penetration testing.

Usability testing focuses on finding user interface problems, which may make the software difficult to use or may cause users to misinterpret the output. Accessibility testing is the technique of making sure that a product complies with accessibility requirements (accessibility being the ability to access the system functionality).

Testing types

There are two main types of software testing, depending on how it is carried out:

Manual testing: This is the process in which the assessment of the SUT is done by a human, typically a software engineer or the final consumer. Within this type of testing, we find so-called exploratory testing, in which human testers evaluate the system by freely investigating it, guided by their personal perception.
Automated testing: This is the process of assessing the SUT in which the testing process (test execution, reporting, and so on) is carried out with special software and infrastructure for testing. Elfriede Dustin, in her book Implementing Automated Software Testing: How to Save Time and Lower Costs While Raising Quality (2009), defined Automated Software Testing (AST) as the:

Application and implementation of software technology throughout the entire software testing life cycle with the goal to improve efficiencies and effectiveness.

The main benefits of AST are: anticipated cost savings, shortened test duration, heightened thoroughness of the tests performed, improvement of test accuracy, improvement of result reporting as well as statistical processing, and subsequent reporting.

Automated tests are typically executed on build servers in the context of Continuous Integration (CI) processes. More details about this are provided in Chapter 7, Testing Management.

AST is most effective when implemented within a framework. Testing frameworks may be defined as a set of abstract concepts, processes, procedures and environments in which automated tests will be designed, created, and implemented. This framework definition includes the physical structures used for test creation and implementation, as well as the logical interactions among those components.

Strictly speaking, that definition of a framework is not very far from what we understand by a library. In order to make the difference clearer, consider the following quote from the well-known software engineering guru Martin Fowler:

A library is essentially a set of functions that you can call, these days usually organized into classes. Each call does some work and returns control to the client. A framework embodies some abstract design, with more behavior built in. In order to use it you need to insert your behavior into various places in the framework either by subclassing or by plugging in your own classes. The framework's code then calls your code at these points.

Visual explanation of the difference between library and framework
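
The distinction in the quote above can also be sketched in code. In this minimal example, `TextLibrary` plays the role of a library (our code calls it and gets control back), while `MiniTestCase` plays the role of a tiny framework (we plug in behavior by subclassing, and the framework decides when to call it). All of these names are hypothetical and are not part of any real library or framework.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Library style: the client calls the function and control returns to it.
class TextLibrary {
    static String shout(String text) {
        return text.toUpperCase(Locale.ROOT) + "!";
    }
}

// Framework style: the framework owns the control flow and calls our code.
abstract class MiniTestCase {
    final List<String> events = new ArrayList<>();

    // Extension points where user behavior is inserted.
    protected void setUp() { }

    protected abstract void test();

    protected void tearDown() { }

    // Fixed control flow, defined once by the framework.
    final void run() {
        setUp();
        try {
            test();
        } finally {
            tearDown();
        }
    }
}

// User code: plugs behavior into the framework by subclassing.
class MyCase extends MiniTestCase {
    @Override
    protected void setUp() {
        events.add("setUp");
    }

    @Override
    protected void test() {
        // Inside our own code we are free to call a library.
        events.add(TextLibrary.shout("test"));
    }

    @Override
    protected void tearDown() {
        events.add("tearDown");
    }
}
```

Calling `new MyCase().run()` never invokes `setUp`, `test`, or `tearDown` directly from user code. The `run` method in the framework calls them in its own order, which is the inversion of control that the quote describes.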

Frameworks are becoming more and more important in modern software development. They provide a capability highly desired in software-intensive systems: reusability. This way, large applications will end up consisting of layers of frameworks that cooperate with each other.

Other testing approaches

As introduced at the beginning of this section, there is no universal definition for the different forms of testing. In this section, we review some of the most common varieties of testing available in the literature that have not been covered so far. For instance, when the testing process is performed to determine whether the system meets its specifications, it is known as conformance testing. When a new feature or functionality is introduced to a system (we can call it a build), the testing of this new feature is known as progression testing. In addition to that, to check that the newly introduced changes do not affect the correctness of the rest of the system, the existing test cases are exercised. This approach is commonly known as regression testing.

When the system interacts with any external or third-party system, another kind of testing can be performed, known as system integration testing. This kind of testing verifies that the system is properly integrated with any external systems.

User or customer testing is a stage in the testing process in which users or customers provide input and advice for system testing. Acceptance testing is a type of user testing, but there can also be different types of user testing:

Alpha testing: This takes place at developers' sites, working together with the software's consumers, before it is released to external users or customers.
Beta testing: This takes place at customers' sites and involves testing by a group of customers who use the system at their own locations and provide feedback, before the system is released to other customers.
Operational testing: This is performed by the end user in their normal operating environment.

Finally, release testing refers to the process of testing a particular release of a system, performed by a separate team outside the development team. The primary goal of the release testing process is to convince the supplier of the system that it is good enough for use.