Admitting that our code is difficult to work with is one thing, but to move past that and write good code, we need to understand why code is bad. Let’s identify the technical issues.
Bad variable names
Good code is self-describing and safe to change. Bad code is not.
Names are the most critical factor in deciding whether code will be easy to work with or not. Good names tell the reader clearly what to expect. Bad names do not. Variables should be named according to what they contain. They should answer “why would I want to use this data? What will it tell me?”
A string variable that has been named string
is badly named. All we know is that it is a string. This does not tell us what is in the variable or why we would want to use it. If that string represented a surname, then by simply calling it surname
, we would have helped future readers of our code understand our intentions much better. They would be able to easily see that this variable holds a surname and should not be used for any other purpose.
The two-letter variable names we saw in the listing in Figure 1.1 represented a limitation of the BASIC language. It was not possible to do better at the time, but as we could see, they were not helpful. It is much harder to understand what sn
means than surname
, if that’s what the variable stores. To carry that even further, if we decide to hold a surname in a variable named x
, we have made things really difficult for readers of our code. They now have two problems to solve:
- They have to reverse-engineer the code to work out that
x
is used to hold a surname
- They have to mentally map
x
with the concept of surname every time that they use it
It is so much easier when we use descriptive names for all our data, such as local variables, method parameters, and object fields. In terms of more general guidelines, the following Google style guide is a good source: https://google.github.io/styleguide/javaguide.html#s5-naming.
Best practice for naming variables
Describe the data contained, not the data type.
We now have a better idea of how to go about naming variables. Now, let’s look at how to name functions, methods, and classes properly.
Bad function, method, and class names
The names of functions, methods, and classes all follow a similar pattern. In good code, function names tell us why we should call that function. They describe what they will do for us as users of that function. The focus is on the outcome – what will have happened by the time the function returns. We do not describe how that function is implemented. This is important. It allows us to change our implementation of that function later if that becomes advantageous, and the name will still describe the outcome clearly.
A function named calculateTotalPrice
is clear about what it is going to do for us. It will calculate the total price. It won’t have any surprising side effects. It won’t try and do anything else. It will do what it says it will. If we abbreviate that name to ctp
, then it becomes much less clear. If we call it func1
, then it tells us absolutely nothing at all that is useful.
Bad names force us to reverse-engineer every decision made every time we read the code. We have to pore through the code to try and find out what it is used for. We should not have to do this. Names should be abstractions. A good name will speed up our ability to understand code by condensing a bigger-picture understanding into a few words.
You can think of the function name as a heading. The code inside the function is the body of text. It works just the same way that the text you’re reading now has a heading, Recognizing bad code, which gives us a general idea of the content in the paragraphs that follow. From reading the heading, we expect the paragraphs to be about recognizing bad code, nothing more and nothing less.
We want to be able to skim-read our software through its headings – the function, method, class, and variable names – so that we can focus on what we want to do now, rather than relearning what was done in the past.
Method names are treated identically to function names. They both describe an action to be taken. Similarly, you can apply the same rules for function names to method names.
Best practice for method and function names
Describe the outcome, not the implementation.
Again, class names follow descriptive rules. A class often represents a single concept, so its name should describe that concept. If a class represents the user profile data in our system, then a class name of UserProfile
will help readers of our code to understand that.
A name’s length depends on namespacing
One further tip applies to all names with regard to their length. The name should be fully descriptive but its length depends on a few factors. We can choose shorter names when one of the following applies:
- The named variable has a small scope of only a few lines
- The class name itself provides the bulk of the description
- The name exists within some other namespace, such as a class name
Let’s look at a code example for each case to make this clear.
The following code calculates the total of a list of values, using a short variable name, total
:
int calculateTotal(List<Integer> values) {
int total = 0;
for ( Integer v : values ) {
total += v;
}
return total ;
}
This works well because it is clear that total
represents the total of all values. We do not need a name that is any longer given the context around it in the code. Perhaps an even better example lies in the v
loop variable. It has a one-line scope, and within that scope, it is quite clear that v
represents the current value within the loop. We could use a longer name such as currentValue
instead. However, does this add any clarity? Not really.
In the following method, we have a parameter with the short name gc
:
private void draw(GraphicsContext gc) {
// code using gc omitted
}
The reason we can choose such a short name is that the GraphicsContext
class carries most of the description already. If this were a more general-purpose class, such as String
, for example, then this short name technique would be unhelpful.
In this final code example, we are using the short method name of draw()
:
public class ProfileImage {
public void draw(WebResponse wr) {
// Code omitted
}
}
The class name here is highly descriptive. The ProfileImage
class name we’ve used in our system is one that is commonly used to describe the avatar or photograph that shows on a user’s profile page. The draw()
method is responsible for writing the image data to a WebResponse
object. We could choose a longer method name, such as drawProfileImage()
, but that simply repeats information that has already been made clear given the name of the class. Details such as this are what give Java its reputation for being verbose, which I feel is unfair; it is often us Java programmers who are verbose, rather than Java itself.
We’ve seen how properly naming things makes our code easier to understand. Let’s take a look at the next big problem that we see in bad code – using constructs that make logic errors more likely.
Error-prone constructs
Another tell-tale sign of bad code is that it uses error-prone constructs and designs. There are always several ways of doing the same thing in code. Some of them provide more scope to introduce mistakes than others. It therefore makes sense to choose ways of coding that actively avoid errors.
Let’s compare two different versions of a function to calculate a total value and analyze where errors might creep in:
int calculateTotal(List<Integer> values) {
int total = 0;
for ( int i=0; i<values.size(); i++) {
total += values.get(i);
}
return total ;
}
The previous listing is a simple method that will take a list of integers and return their total. It’s the sort of code that has been around since Java 1.0.2. It works, yet it is error prone. In order for this code to be correct, we need to get several things right:
- Making sure that
total
is initialized to 0
and not some other value
- Making sure that our
i
loop index is initialized to 0
- Making sure that we use
<
and not <=
or ==
in our loop comparison
- Making sure that we increment the
i
loop index by exactly one
- Making sure that we add the value from the current index in the list to
total
Experienced programmers do tend to get all this right first time. My point is that there is a possibility of getting any or all of these things wrong. I’ve seen mistakes made where <=
has been used instead of <
and the code fails with an ArrayIndexOutOfBounds
exception as a result. Another easy mistake is to use =
in the line that adds to the total value instead of +=
. This has the effect of returning only the last value, not the total. I have even made that mistake as a pure typo – I honestly thought I had typed the right thing but I was typing quickly and I hadn’t.
It is clearly much better for us to avoid these kinds of errors entirely. If an error cannot happen, then it will not happen. This is a process I call designing out errors. It is a fundamental clean-code practice. To see how we could do this to our previous example, let’s look at the following code:
int calculateTotal(List<Integer> values) {
return values.stream().mapToInt(v -> v).sum();
}
This code does the same thing, yet it is inherently safer. We have no total
variable, so we cannot initialize that incorrectly, nor can we forget to add values to it. We have no loop and so no loop index variable. We cannot use the wrong comparison for the loop end and so cannot get an ArrayIndexOutOfBounds
exception. There is simply far less that can go wrong in this implementation of the code. It generally makes the code clearer to read as well. This, in turn, helps with onboarding new developers, code reviews, adding new features, and pair programming.
Whenever we have a choice to use code with fewer parts that could go wrong, we should choose that approach. We can make life easier for ourselves and our colleagues by choosing to keep our code as error-free and simple as possible. We can use more robust constructs to give bugs fewer places to hide.
It is worth mentioning that both versions of the code have an integer overflow bug. If we add integers together whose total is beyond the allowable range of -2147483648 to 2147483647, then the code will produce the wrong result. The point still stands, however: the later version has fewer places where things can go wrong. Structurally, it is simpler code.
Now that we have seen how to avoid the kinds of errors that are typical of bad code, let’s turn to other problem areas: coupling and cohesion.
Coupling and cohesion
If we have a number of Java classes, coupling describes the relationship between those classes, while cohesion describes the relationships between the methods inside each one.
Our software designs become easier to work with once we get the amounts of coupling and cohesion right. We will learn techniques to help us do this in Chapter 7, Driving Design–TDD and SOLID. For now, let’s understand the problems that we will face when we get this wrong, starting with the problem of low cohesion.
Low cohesion inside a class
Low cohesion describes code that has many different ideas all lumped together in it in a single place. The following UML class diagram shows an example of a class with low cohesion among its methods:
Figure 1.2 – Low cohesion
The code in this class attempts to combine too many responsibilities. They are not all obviously related – we are writing to a database, sending out welcome emails, and rendering web pages. This large variety of responsibilities makes our class harder to understand and harder to change. Consider the different reasons we may need to change this class:
- Changes to the database technology
- Changes to the web view layout
- Changes to the web template engine technology
- Changes to the email template engine technology
- Changes to the news feed generation algorithm
There are many reasons why we would need to change the code in this class. It is always better to give classes a more precise focus, so that there are fewer reasons to change them. Ideally, any given piece of code should only have one reason to be changed.
Understanding code with low cohesion is hard. We are forced to understand many different ideas at once. Internally, the code is very interconnected. Changing one method often forces a change in others because of this. Using the class is difficult, as we need to construct it with all its dependencies. In our example, we have a mixture of templating engines, a database, and code for creating a web page. This also makes the class very difficult to test. We need to set up all these things before we can run test methods against that class. Reuse is limited with a class like this. The class is very tightly bound to the mix of features that are rolled into it.
High coupling between classes
High coupling describes where one class needs to connect to several others before it can be used. This makes it difficult to use in isolation. We need those supporting classes to be set up and working correctly before we can use our class. For the same reason, we cannot fully understand that class without understanding the many interactions it has. As an example, the following UML class diagram shows classes with a high degree of coupling between each other:
Figure 1.3 – High coupling
In this fictitious example of a sales tracking system, several of the classes need to interact with each other. The User
class in the middle couples to four other classes: Inventory
, EmailService
, SalesAppointment
, and SalesReport
. This makes it harder to use and test than a class that couples to fewer other classes. Is the coupling here too high? Maybe not, but we can imagine other designs that would reduce it. The main thing is to be aware of the degree of coupling that classes have in our designs. As soon as we spot classes with many connections to others, we know we are going to have a problem understanding, maintaining, and testing them.
We’ve seen how the technical elements of high coupling and low cohesion make our code difficult to work with, but there is a social aspect to bad code as well. Let’s consider the effect bad code has on the development team.