Let us see the important elements of the CI process.
Elements of CI
Version control system
This is the most basic and the most important requirement for implementing CI. A Version Control System, sometimes also called a Revision Control System, is a tool to manage your code history. It can be centralized or distributed. Well-known centralized version control systems include SVN and IBM Rational ClearCase. In the distributed segment, we have tools like Git and Mercurial.
Ideally, everything that is required to build software must be version controlled. A version control tool offers many features, such as tagging, branching, and so on.
Branching strategy
When using a Version Control System, keep the branching to a minimum. A few companies have only one main branch, and all the development activity happens on that. Nevertheless, most of the companies follow some branching strategies. This is because there is always a possibility that a part of the team may work on one release, while others may work on another release. Other times, there is a need to support the older release versions. Such scenarios always lead companies to use multiple branches.
GitFlow is another way of managing your code using multiple branches. In this method, the Master/Production branch is kept clean and contains only releasable, ready-to-ship code. All development happens on the Feature branches, with the Integration branch serving as a common place to integrate all the features. The following diagram is a simplified version of GitFlow:
GitFlow branching model
The following diagram illustrates the full version of GitFlow. We have a Master/Production branch that contains only production-ready code. The Feature branches are where all of the development takes place. The Integration branch is where the code gets integrated and tested for quality. In addition, we have Release branches that are cut from the Integration branch whenever there is a stable release. All bug fixes related to a release happen in the Release branches. There is also a Hotfix branch that is cut from the Master/Production branch whenever a hotfix is needed:
CI tool
What is a CI tool? Well, it is nothing more than an orchestrator. A CI tool is at the center of the CI system, connected to the Version Control System, build tools, Binary Repository Manager tool, testing and production environments, quality analysis tool, test automation tool, and so on. There are many CI tools: Build Forge, Bamboo, and TeamCity, to name a few. But the prime focus of our book is Jenkins:
A CI tool provides options to create pipelines. Each pipeline has its own purpose. Some pipelines take care of CI, some take care of testing, some take care of deployments, and so on. Technically, a pipeline is a flow of jobs. Each job is a set of tasks that run sequentially. Scripting is an integral part of a CI tool and performs various kinds of tasks. The tasks may be as simple as copying a folder or file from one location to another, or as complex as Perl scripts that monitor machines for file modifications. Nevertheless, scripting is being replaced by the growing number of plugins available in Jenkins. You no longer need to write a script to build Java code; there are plugins available for it. All you need to do is install and configure a plugin to get the job done. Technically, plugins are nothing but small modules written in Java. They relieve the developer of the burden of scripting. We will learn more about pipelines in the upcoming chapters.
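The idea of a pipeline as a flow of jobs, where each job is a set of tasks that run sequentially, can be sketched in a few lines of Python. This is only an illustration of the concept, not how Jenkins implements pipelines; the names `Job` and `run_pipeline` and the choice to stop at the first failure are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Job:
    """A named job: an ordered list of tasks that run one after another."""
    name: str
    # Each task is a callable that returns True on success (an assumption of this sketch).
    tasks: List[Callable[[], bool]] = field(default_factory=list)

    def run(self) -> bool:
        # Tasks run sequentially; the first failing task fails the whole job.
        for task in self.tasks:
            if not task():
                return False
        return True


def run_pipeline(jobs: List[Job]) -> List[str]:
    """Run jobs in order and return the names of the jobs that succeeded.

    The pipeline stops at the first failing job, so later jobs never run.
    """
    completed = []
    for job in jobs:
        if not job.run():
            break
        completed.append(job.name)
    return completed


if __name__ == "__main__":
    pipeline = [
        Job("build", [lambda: True]),
        Job("test", [lambda: True, lambda: False]),  # the second task fails
        Job("deploy", [lambda: True]),
    ]
    print(run_pipeline(pipeline))  # prints ['build']
```

In this sketch the deploy job never runs because the test job fails, which mirrors how a failed stage normally halts the flow.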
Self-triggered builds
The next important thing to understand is the self-triggered automated build. Build automation is simply a series of automated steps that compile the code and generate executables. Build automation can rely on build tools such as Ant and Maven. The self-triggered automated build is the most important part of a CI system. There are two main factors that call for an automated build mechanism:
- Speed.
- Catching integration or code issues as early as possible.
There are projects where 100 to 200 builds happen per day. In such cases, speed is an important factor. Automating the builds saves a lot of time. Things become even more interesting if the build is triggered automatically, without any manual intervention. Triggering a build on every code change saves further time.
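One common way to make a build self-triggered is to poll the Version Control System and start a build whenever the latest revision changes. The following is a minimal sketch of that idea, not the mechanism of any particular CI tool. `BuildTrigger`, `get_revision`, and `start_build` are hypothetical names; in a real setup they would wrap calls to the VCS and the build tool.

```python
from typing import Callable, Optional


class BuildTrigger:
    """Starts a build whenever the observed revision differs from the last built one."""

    def __init__(
        self,
        get_revision: Callable[[], str],
        start_build: Callable[[str], None],
    ) -> None:
        self._get_revision = get_revision  # hypothetical: returns the latest revision id
        self._start_build = start_build    # hypothetical: starts a build for that revision
        self._last_built: Optional[str] = None

    def poll(self) -> bool:
        """Check once for a new revision; return True if a build was started."""
        revision = self._get_revision()
        if revision == self._last_built:
            return False  # nothing changed since the last build
        self._start_build(revision)
        self._last_built = revision
        return True


if __name__ == "__main__":
    revisions = iter(["r1", "r1", "r2"])  # simulated VCS state on three polls
    trigger = BuildTrigger(lambda: next(revisions), lambda rev: print("building", rev))
    for _ in range(3):
        trigger.poll()
```

Calling `poll` on a schedule gives a polling trigger; a push-style trigger would instead call `start_build` directly from a commit notification.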
When builds are frequent and fast, errors (build errors, compilation errors, or integration errors) are more likely to be found, and they are found earlier in the SDLC:
Code coverage
Code coverage is the amount of code (as a percentage) that is exercised by your test cases. The metrics that you see in your coverage reports are typically those defined in the following table:
Type of coverage | Description
Function | The number of functions called out of the total number of functions defined
Statement | The number of statements in the program that are actually executed out of the total number
Branch | The number of branches of the control structures executed out of the total number of branches
Condition | The number of Boolean sub-expressions that are tested for both a true and a false value
Line | The number of lines of source code that are tested out of the total number of lines in the code
The coverage percentage is calculated by dividing the number of items tested by the number of items found and multiplying the result by 100. The following screenshot illustrates the code coverage report from SonarQube:
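The calculation described above can be sketched as follows. The function name `coverage_percent` is hypothetical, and rejecting a total of zero items is a choice made for this example; coverage tools differ in how they report that case.

```python
def coverage_percent(items_tested: int, items_found: int) -> float:
    """Return coverage as a percentage: items tested divided by items found, times 100.

    "Items" can be functions, statements, branches, conditions, or lines.
    """
    if items_found <= 0:
        # Assumption of this sketch: coverage is undefined when nothing was found.
        raise ValueError("items_found must be positive")
    if not 0 <= items_tested <= items_found:
        raise ValueError("items_tested must be between 0 and items_found")
    return 100.0 * items_tested / items_found


if __name__ == "__main__":
    # 45 of 60 lines exercised by the tests
    print(coverage_percent(45, 60))  # prints 75.0
```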
Code coverage tools
You might find several options to create coverage reports, depending on the language(s) you use. Some of the popular tools are listed as follows:
Language | Tools
Java | Atlassian Clover, Cobertura, JaCoCo
C#/.NET | OpenCover, dotCover
C++ | OpenCppCoverage, gcov
Python | Coverage.py
Ruby | SimpleCov
Static code analysis
Static code analysis, often described as a form of white-box testing, examines the structural qualities of the code. For example, it answers how robust or maintainable the code is. Static code analysis is performed without actually executing the program. It differs from functional testing, which looks into the functional aspects of the software and is dynamic.
Static code analysis is the evaluation of software's inner structures. For example, is there a piece of code used repetitively? Does the code contain lots of commented lines? How complex is the code? Using the metrics defined by a user, an analysis report is generated that shows the code quality regarding maintainability. It doesn't question the code's functionality.
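To make the idea concrete, here is a toy sketch that computes two such metrics, the share of comment lines and the number of repeated code lines, purely by reading the source text. It is not how SonarQube or any real analyzer works; the function name `analyze_source`, the metric names, and the assumption that comments start with `#` are all choices made for this example.

```python
from collections import Counter
from typing import Dict


def analyze_source(source: str) -> Dict[str, float]:
    """Compute simple structural metrics from source text without executing it."""
    lines = [line.strip() for line in source.splitlines()]
    non_blank = [line for line in lines if line]
    # Assumption: a comment line is one that starts with '#'.
    comment_lines = [line for line in non_blank if line.startswith("#")]
    code_lines = [line for line in non_blank if not line.startswith("#")]

    # Count every code line whose exact text appears more than once.
    counts = Counter(code_lines)
    duplicated = sum(n for n in counts.values() if n > 1)

    return {
        "non_blank_lines": len(non_blank),
        "comment_ratio": len(comment_lines) / len(non_blank) if non_blank else 0.0,
        "duplicated_lines": duplicated,
    }


if __name__ == "__main__":
    sample = "# setup\nx = 1\ny = 2\nx = 1\n\n# done\n"
    print(analyze_source(sample))
```

Real tools work on a parsed representation of the code rather than raw lines, which lets them measure complexity and detect duplication that is not textually identical.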
Some static code analysis tools, such as SonarQube, come with a dashboard that shows various metrics and statistics for each run. Usually, as part of CI, the static code analysis is triggered every time a build runs. As discussed in the previous sections, static code analysis can also be run before a developer checks in their code. Hence, low-quality code can be stopped right at the initial stage.
These tools support many languages, such as Java, C/C++, Objective-C, C#, PHP, Flex, Groovy, JavaScript, Python, PL/SQL, COBOL, and so on. The following screenshots illustrate the static code analysis report using SonarQube:
Automated testing
Testing is an important part of an SDLC. To maintain quality software, it is necessary that the software solution goes through various test scenarios. Giving less importance to testing can result in customer dissatisfaction and a delayed product.
Since manual testing is time-consuming and repetitive, automating the testing process can significantly increase the speed of software delivery. However, automating the testing process is a bit more difficult than automating the build, release, and deployment processes. It usually takes a lot of effort to automate nearly all the test cases used in a project. It is an activity that matures over time.
Hence, when beginning to automate the testing, we need to take a few factors into consideration. Test cases that are of great value and easy to automate must be considered first. For example, automate the testing where the steps are the same, although they run with different data every time. Further, automate the testing where software functionality is tested on various platforms. Also, automate the testing that involves a software application running with different configurations.
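A test whose steps stay the same while the data changes is a natural first candidate for automation. The sketch below shows one way to write such a data-driven test with Python's standard `unittest` module. The function under test, `apply_discount`, and the test data are hypothetical and exist only to illustrate the pattern.

```python
import unittest


def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test: reduce price by the given percentage."""
    return round(price * (1 - percent / 100), 2)


class DiscountTest(unittest.TestCase):
    # Each tuple is one data set: (price, percent, expected result).
    CASES = [
        (100.0, 0, 100.0),
        (100.0, 25, 75.0),
        (80.0, 50, 40.0),
    ]

    def test_apply_discount(self):
        # The steps are identical for every data set; only the inputs change.
        for price, percent, expected in self.CASES:
            with self.subTest(price=price, percent=percent):
                self.assertEqual(apply_discount(price, percent), expected)
```

Saved in a file, this can be run with `python -m unittest`. Using `subTest` reports each failing data set separately instead of stopping at the first one, and adding a new scenario only requires a new tuple in `CASES`.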
Previously, the software world was dominated mostly by desktop applications. Automating the testing of a GUI-based system was quite difficult. This called for scripting languages in which the manual mouse and keyboard entries were scripted and executed to test the GUI application. Today, however, the software world is largely dominated by web and mobile applications, which are easier to test through an automated approach using a test automation tool.
Once the code is built, packaged, and deployed, testing should run automatically to validate the software. Traditionally, the process is to have separate environments for SIT, UAT, PT, and pre-production. First, the release goes through SIT, which stands for system integration testing. Here, testing is performed on the integrated code to check that it functions as a whole. If the integration testing passes, the code is deployed to the next environment, UAT, where it goes through user acceptance testing. Lastly, it is deployed to PT, where it goes through performance testing. In this way, the testing is prioritized.
It is not always possible to automate all the testing, but the idea is to automate whatever testing can be automated. The method discussed above requires many environments and a higher number of automated deployments into them. To avoid this, we can use another method in which the build is deployed to a single environment, the basic tests are run there, and the long-running tests are then triggered manually.
Binary repository tools
As part of the SDLC, the source code is continuously built into binary artifacts using CI. Therefore, there should be a place to store these built packages for later use. That place is a binary repository tool. But what is a binary repository tool?
A binary repository tool is a Version Control System for binary files. Do not confuse this with the Version Control System discussed in the previous sections. That system is responsible for versioning the source code, whereas a binary repository tool versions binary files, such as .rar, .war, .exe, .msi, and so on. Along with managing built artifacts, a binary repository tool can also manage third-party binaries that are required for a build. For example, Maven downloads the plugins required to build the code into a local folder. Rather than downloading the plugins again and again, they can be managed using a repository tool:
From the above illustration, you can see that as soon as a build is created and passes all the checks, the built artifact is uploaded to the binary repository tool. From here, developers and testers can manually pick up the artifacts, deploy them, and test them. Or, if automated deployment is in place, the built artifacts are automatically deployed to the respective test environment. So, what are the advantages of using a binary repository?
A binary repository tool does the following:
- It stores every built artifact as soon as it is generated. There are many advantages to storing the build artifacts. One of the most important is that they sit in a centralized location from where they can be accessed when needed.
- It can store third-party binaries, such as plugins and modules, that are required by the build tools. Hence, the build tool need not download the plugins every time a build runs. The repository tool is connected to the online source and keeps its plugin repository up to date.
- It records what was built, when, and by whom.
- It provides staging-like environments to manage releases better. This also helps in speeding up the CI process.
- In a CI environment, builds are very frequent, and each build generates a package. Since all the built packages are in one place, developers are at liberty to choose what to promote, and what not to promote, to higher environments.
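The responsibilities listed above, storing each artifact centrally and recording what was built, when, and by whom, can be sketched with a small in-memory store. This is an illustration only, not the API of any real repository manager. `ArtifactStore`, `ArtifactRecord`, and their methods are hypothetical names, and treating an uploaded version as immutable is an assumption of the example.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ArtifactRecord:
    """Metadata kept for every stored artifact: what, when, and by whom."""
    name: str
    version: str
    sha256: str
    created_by: str
    created_at: datetime


class ArtifactStore:
    """A toy, in-memory stand-in for a binary repository."""

    def __init__(self) -> None:
        self._blobs: Dict[Tuple[str, str], bytes] = {}
        self._records: Dict[Tuple[str, str], ArtifactRecord] = {}

    def upload(self, name: str, version: str, content: bytes, created_by: str) -> ArtifactRecord:
        key = (name, version)
        if key in self._records:
            # Assumption of this sketch: a published version is never overwritten.
            raise ValueError(f"{name} {version} already exists")
        record = ArtifactRecord(
            name=name,
            version=version,
            sha256=hashlib.sha256(content).hexdigest(),
            created_by=created_by,
            created_at=datetime.now(timezone.utc),
        )
        self._blobs[key] = content
        self._records[key] = record
        return record

    def download(self, name: str, version: str) -> bytes:
        return self._blobs[(name, version)]

    def versions(self, name: str) -> List[str]:
        """List stored versions of an artifact in upload order."""
        return [v for (n, v) in self._records if n == name]


if __name__ == "__main__":
    store = ArtifactStore()
    store.upload("app.war", "1.0.0", b"first build", created_by="build-41")
    store.upload("app.war", "1.1.0", b"second build", created_by="build-42")
    print(store.versions("app.war"))  # prints ['1.0.0', '1.1.0']
```

Because every version stays available, choosing what to promote to a higher environment becomes a matter of picking a version from the list.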
Automated packaging
A build may have many components. Take, for example, a build that has a .rar file as an output. Along with that, it has some Unix configuration files, release notes, some executables, and some database changes. All of these components need to be kept together. The task of creating a single archive or a single medium out of many components is called packaging. Again, this can be automated using the CI tools and can save a lot of time.
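As a minimal sketch of automated packaging, the following Python function bundles several components into one gzip-compressed tar archive using the standard `tarfile` module. The function name `package` and the file names in the usage example are hypothetical, and a CI tool would normally run a step like this as part of a job.

```python
import tarfile
from pathlib import Path
from typing import Iterable


def package(components: Iterable[Path], archive_path: Path) -> Path:
    """Bundle the given files and directories into one .tar.gz archive."""
    with tarfile.open(str(archive_path), "w:gz") as archive:
        for component in components:
            # arcname drops the leading directories, so each component
            # sits at the top level of the archive under its own name.
            archive.add(str(component), arcname=component.name)
    return archive_path


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical build outputs created only for this demonstration.
        (root / "release-notes.txt").write_text("Release 1.0")
        (root / "app.cfg").write_text("port=8080")
        bundle = package(
            [root / "release-notes.txt", root / "app.cfg"],
            root / "release-1.0.tar.gz",
        )
        with tarfile.open(str(bundle)) as archive:
            print(sorted(archive.getnames()))
```

Directories passed as components are added recursively, so a folder of database scripts can be packaged the same way as a single file.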