While a theoretical understanding of what patterns and anti-patterns are and how they might generally be helpful is great, you are no doubt looking for more from this book than just a theoretical discussion.
In this section, we will go through the value of learning from mistakes, both your own and other people’s, and show you how we are going to use anti-patterns to sharpen your architecture chops in the context of a concrete example.
How great architects learn from mistakes
We work in an industry where failure is a normal occurrence. The Standish Group CHAOS report, which is the most commonly cited source on these matters, in 2020 estimated that 66% of all technology projects globally ended in partial or complete failure.
This is a little bit better than when I started my career more than 20 years ago. Then, the figure stood in the low 70s. However, while things have improved, they have only improved a little bit, despite agile development, cloud computing, artificial intelligence, and great software platforms such as Salesforce.
This is disheartening, but only proves the point of one of my personal heroes, Fred Brooks, that “The complexity of software is an essential property, not an accidental one,” which unfortunately means that we will never find a silver bullet to solve all problems in software architecture and design. Instead, we are faced with the hard work of learning how to manage this complexity in a reasonable way. To do so is the principal job of an architect.
It stands to reason that in an area with high failure rates and irreducible complexity, we need to have good rules and guidelines to keep us on the right path. That is what patterns and best practices are for. They are indispensable, but they are not enough.
To become great at our profession and to be able to design software that bucks the industry trend, we need to learn not just from our own failures but also from the vast repository of failed projects and bad practices we see all around us.
That usually isn’t hard to do. Many times, when the architect is brought into a project, it is because there is already an issue to fix. Using these occurrences as learning opportunities and analyzing them with that view in mind can be greatly rewarding.
However, there is a step further to go in this direction, which is what anti-patterns offer. They capture the ways in which things frequently go wrong, in a form that allows both post-hoc learning and real-time intervention.
The thing is that while projects go wrong, they don’t do so randomly. There are systematic patterns that repeat time and again. Learning how things go wrong in systematic ways can give you an entirely new set of responses in your toolbox that you can deploy to help make your project one of the 34% that don’t fail.
We will start that learning journey with an explanatory example.
An example: The Hero anti-pattern
There is no better way to start learning than using an example. We won’t have the chance to cover many general management-level anti-patterns in this book, so I will use one of the classics in this genre to show you how the template works and how to read it to get the most out of it.
First, we will present the anti-pattern and then provide an analysis of what we can learn from it.
Hero (development life cycle and deployment planning)
Tom is the project manager for a large greenfield Salesforce implementation in the manufacturing industry working with Sales and Service Cloud for 2,000 users. The project is meant to be quite simple, a basic MVP to get the platform off the ground, and that is how it has been scoped and staffed. The project is meant to go into full production after six months with four months of implementation followed by test, training, rollout, and hypercare.
The first three months of implementation fly by and everything on the project management dashboard stays green. The team makes steady progress and relations with the external consultancy that is helping provide specialist resources remain good.
However, when Tom delivers the first round of business reviews, things start to change quickly. It turns out that a lot of detailed requirements have been missed from the initial scope and that many edge cases aren’t covered by the current configuration. The feedback from the business is that they won’t be able to go live with the system unless a large list of additional functionality is included.
Tom goes to the steering committee to ask for more budget and a schedule extension to accommodate the business demands. The steering committee grants the request for an increased budget but tells him that the schedule is immovable. He must find a way to get it done within the current timeline.
Tom replans the project from the ground up. He can just make it all fit by compressing the testing and cutover plan if he adds some new resources from the external partner and asks for some overtime from his existing team. He sends out the new plan to the team along with a rousing email calling on everyone to rise to the challenge.
Over the course of the next month, the project slips again, and Tom’s new plan is looking less and less likely to succeed. It’s not that anything big goes wrong, but lots of little things just aren’t completed on time or require rework because of misunderstandings. In particular, the new consultants he has brought in from the external partner seem to make a lot of basic mistakes.
Tom calls his boss, the Senior Director for IT, to tell her about the situation and ask for help in getting an extension to the schedule. She tells him that the schedule has been committed to the board of directors of the company and that heads will roll if it is not met. This is the time for the team to pull out all the stops and get it done, she says.
Tom goes back to his team to relay the news and once again calls for everyone to give it everything to get things over the line. Unfortunately, most people are already working as hard as their situations allow. In addition, relations with the external partner have soured and they are not willing to put in additional hours without additional funding, which Tom does not have in the budget.
There are some bright spots, however. In particular, two young members of the technical staff, Kayleigh and Negash, prove willing to go above and beyond in order to get things done. Over the final month of delivery, they work 24/7 on the project with Tom cheering them on.
Figure 1.3 – The dangerous feeling one might have when engaging in the Hero anti-pattern
Between the two of them, they manage to clear away enough of the new features and change requests during the final stretch that Tom feels growing confidence that he will be able to meet enough of the requests for the project launch to not be a disaster. There will be exceptions, but he can find a way of managing those later. As long as the impending go-live goes well, the project can still succeed.
However, User Acceptance Testing (UAT) throws a spanner in the works as major quality issues are discovered. The steering committee holds a crisis meeting that ends up concluding that the go-live will have to be postponed for a week. The team will have to work flat out during this period to fix the issues.
While everyone pitches in, the responsibility falls disproportionately on Kayleigh and Negash, who are both starting to show the strain of the continuous effort. Tom gives them encouragement at every chance and singles them out for public praise. He also promises them a cash bonus and extra holidays when the project is done.
The day for retesting arrives and while many issues have been fixed satisfactorily, there are quite a few remaining issues, including a good number that had previously been fixed and are now recurring.
The steering committee holds another crisis meeting and they take the decision to go ahead with the launch despite the issues. These issues will need to be fixed during the hypercare period, but they can be tolerated for a short amount of time.
The next few weeks of Tom’s, Kayleigh’s, and Negash’s lives happen in a blur of constant motion. They are pulled from escalation to escalation as issues occur, are fixed, and reoccur. Kayleigh and Negash start buckling under the pressure, but with no alternative resources knowing the configuration, they are effectively forced to carry on.
Eventually, the issues settle down. The important bugs are fixed, the business puts in place manual workarounds for the things that were missed, and life starts to get back to normal. Tom calls the team for a victory celebration, but it is a muted affair.
After taking their extra holidays, Kayleigh and Negash both accept offers from big consulting companies, leaving the company with no one to support large chunks of functionality on its newly implemented platform.
Problem
The Hero anti-pattern generally purports to fix an urgent delivery problem that has occurred either in a project context, as in our example, or during normal operations. When it occurs in normal operational mode, this is often in a context where firefighting issues with the system has become a run-of-the-mill occurrence.
Usually, the problem occurs in a context characterized by some of the following factors:
- There are limited resources to carry out the work needed to fix the urgent problem and there are good reasons why new resources cannot be brought in at this time.
- The project has a tight schedule that is perceived to be set in stone or the issue is live, critical, and affecting important business users adversely in ways that cause a lot of noise.
- Knowledge about the problem is concentrated in a small number of heads, that is to say, a few people, such as Kayleigh and Negash, who volunteered to take on the role, or, frequently, a lead developer who is the only one with the technical knowledge to fix the issue at the given time.
- The situation is considered somehow special: either this is a new project and there isn’t a precedent, or the issue is considered so unique that you can’t really plan for it.
- The crisis element is often quite visible in situations that foster the Hero anti-pattern. Sometimes, important parts of the company’s future success or even survival are brought into play.
These factors can all make the problem seem more important to fix in a limited time scale and make the Hero option seem attractive.
Proposed solution
The Hero anti-pattern proposes to solve the problem described in the preceding section by allowing an individual or a small group to take on too much responsibility for resolving it, in effect by working as much as is required, even at some cost to themselves, to get things done.
This can be attractive both to management and to the people involved for a variety of reasons:
- The effort does tend to produce some results in the short term, giving a sense of momentum and success.
- Everyone, or at least nearly everyone, wants to be a hero and be singled out for praise and rewards. To some people, that is worth the inconvenience of the additional effort.
- It is always possible to imagine that the current situation is somehow unique and not reflective of a deeper problem with process or culture within the organization, thereby justifying what is done as exceptional.
- Even if we acknowledge that there are underlying issues, often these can be put out of mind as something to be dealt with later. Of course, in organizations that rely on the Hero anti-pattern, later never comes.
There are several common variants of the Hero anti-pattern that are worth mentioning:
- Superman, a variant where someone, usually a senior technical person, is glorified and held up as the only person who can fix serious issues with a given system. Often, this myth becomes self-perpetuating.
- Rookies, the variant seen in the example, where junior team members take on extra responsibilities in an effort to step up to the challenge that is being presented to them.
- No Time for Knowledge Transfer, a situation where heroics are required by a seemingly never-ending time crunch that makes it impossible for the hero or heroes to transfer required knowledge to others.
While this anti-pattern is clearly seductive, and many of us have fallen prey to it several times over the course of our careers, it almost invariably has negative long-term consequences, which we’ll explore next.
Results
While the Hero anti-pattern tends to give good short-term results, which is a major source of its enduring appeal, there is a long list of negative results that tend to accumulate over time in organizations that rely on this anti-pattern to get things done.
Some of the most common negative results include the following:
- The creation of a single point of failure that increases risks to an organization substantially, should the Hero fall under the proverbial bus, and gives the Hero a lot of leverage in negotiations with the organization.
- The Hero, over time, will start to feel the pressure, as Kayleigh and Negash did in our example, but will have very limited options to change the situation. This situation is highly conducive to burnout, which brings with it all the problems of the first point as well as the risk of the Hero making serious errors due to the strain.
- Heroes don’t scale. That is to say, the organization won’t be able to deploy projects at a bandwidth that is wider than what the Hero can accommodate. This can be seriously limiting to new initiatives in some cases.
- Heroes aren’t replicable. You can’t easily replicate the Hero or their special powers and therefore you have limited options for creating a predictable and repeatable process.
- Heroes can accumulate serious technical debt, which may often go unmanaged, because they must do things quickly, under pressure, and without real supervision. This can lead to major maintenance issues in the long term.
- There is low transparency into the process by which Heroes get things done, leading to a lack of predictability and manageability.
- Heroes don’t have time to explain how things were implemented, so there is often poor or entirely missing documentation.
- The rest of the team may feel disempowered, overlooked, and demotivated as all the attention goes to the Heroes, with little opportunity for others to make contributions in a non-heroic way.
You don’t necessarily see all these negative outcomes in all instances of this anti-pattern, and this list unfortunately isn’t exhaustive either. But hopefully, this is enough to make you think twice about applying this anti-pattern and look at better options, which we’ll explore next.
Better solutions
The fundamental problem with the Hero anti-pattern is that you are relying on specific individuals, with often hidden knowledge, working hard – usually too hard – to get things done rather than on repeatable, transparent, and manageable processes that will allow you to continue to deliver, even as the context and the people involved change.
The primary way to get away from the Hero anti-pattern is therefore to work on your processes and spread skills and knowledge systematically across the team. In our example, there were potential issues with scope management, with the initial discovery work, with governance and its understanding of the real issues on the ground, and with the way the project had been structured to go live with a big bang rather than in small increments.
What specific interventions will provide the most leverage will vary a lot between organizations, but some good places to look include the following:
- Moving towards a DevOps culture with smaller incremental releases that have lower risk profiles
- Having multi-functional teams with frequent direct collaboration and peer review to spread knowledge around
- Encouraging and rewarding leads and specialists more for mentoring and bringing up juniors rather than for putting out the latest fire
- Incorporating better risk management and governance in projects to have the right contingencies in place when things go wrong, as they inevitably will
- Challenging the cultural norms that put primacy on delivering big dramatic wins against the odds, rather than on making steady, undramatic, but repeatable progress on a regular basis
- Emphasizing roles and processes, not individuals, when planning, building, and operating systems, especially when communicating with the wider stakeholder community
- Making the costs of the Hero anti-pattern visible by capturing the technical debt, the risk, and the lost opportunity to replicate efforts that the organization incurs by relying on this pattern
- Ensuring that detailed requirements and edge cases are planned for when beginning the project, which reduces the probability that you will need a hero
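As one illustration of making the risk visible, the following is a minimal sketch rather than a prescribed method. It assumes you can export some record of who changed which part of the system as simple (component, person) pairs. The component names, the extra teammate, the function names, and the 80% threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def knowledge_concentration(changes):
    """Return, per component, the top contributor and their share of changes.

    `changes` is an iterable of (component, person) pairs, one per change.
    """
    per_component = {}
    for component, person in changes:
        per_component.setdefault(component, Counter())[person] += 1

    report = {}
    for component, counts in per_component.items():
        person, top = counts.most_common(1)[0]
        report[component] = (person, top / sum(counts.values()))
    return report


def hero_risks(changes, threshold=0.8):
    """List components where one person made at least `threshold` of all changes."""
    return sorted(
        component
        for component, (_, share) in knowledge_concentration(changes).items()
        if share >= threshold
    )


# Hypothetical change log; not taken from the example project.
sample_changes = [
    ("Case routing", "Kayleigh"),
    ("Case routing", "Kayleigh"),
    ("Case routing", "Kayleigh"),
    ("Case routing", "Kayleigh"),
    ("Quote approval", "Negash"),
    ("Quote approval", "Negash"),
    ("Quote approval", "Teammate"),
    ("Lead assignment", "Kayleigh"),
    ("Lead assignment", "Negash"),
    ("Lead assignment", "Teammate"),
]

# Components that currently depend on a single person.
print(hero_risks(sample_changes))
```

A report like this does not fix anything by itself, but it gives the steering committee a concrete list of areas where knowledge transfer or peer review should be prioritized.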
In truth, it is not always possible to completely avoid the Hero anti-pattern. Sometimes, things are on fire and there is only one person around who can fix it. What we need to recognize is that this is not a good situation, but an anti-pattern that we need to work hard to fix so that it doesn’t recur. The more you do this, the less you’ll have to rely on heroes and the fewer fires you’ll have to put out.
Having covered our first example of a real anti-pattern, we will go on to analyze it a little more deeply to see how we can maximize our learning from it.
Analyzing the example
The Hero anti-pattern is a classic and most seasoned IT professionals will have encountered it during their careers. However, interesting as it is, in this book, we are also looking to pull out the larger patterns we can learn from our examples to hone our architecture skills.
Throughout this book, we will do this by having a section towards the end of a chapter that extracts key learning points for you to take on your future architecture journey. We do this with an eye to real-life practice, and we also list learning points specifically for those of you who are on the track towards the CTA exam.
Considering the Hero anti-pattern, a few learning points you might extract for real-life practice are as follows:
- When you are faced with a crisis that calls for extraordinary effort on the part of some or all of the team, take the time to step back and consider the process failures that led to this situation. Capture this for future use.
- Relying on a small number of extremely skilled individuals can be dangerous in the long run, even if it’s useful right now.
- The pressure you might feel towards going above and beyond may reflect a culture that doesn’t have its priorities right from a technical point of view. You may want to challenge that if possible.
- Go out of your way to empower and bring up junior staff to avoid being in the position where you have to be the hero.
- Be diligent about advocating for good governance both at the project and technical levels as well as capturing and remedying the technical debt that accumulates from “special” situations.
Looking ahead to the CTA review board, you can note the following lessons:
- Be careful about suggesting big bang delivery approaches. They can be the right choice, but frequently they can lead to the issues highlighted in the example. Prefer using agile, incremental approaches unless there is a specific reason not to.
- Ensure that you include the necessary governance functions, including a Project Management Office (PMO), a steering committee, a design authority, and maybe a change advisory board. In this example, much could have been avoided if the latter two had been in place.
- Be explicit about risks and risk management. Include risks upfront and be ready to talk about how to manage and mitigate them.
While we will be able to directly pull out many learning points, we also encourage you to go further with this method and see how much more you can get out of each anti-pattern. Learning from anti-patterns is a continuous and very rewarding activity for aspiring architects.
We have now achieved a foundational understanding of what anti-patterns are and how they can help us achieve greater mastery of Salesforce architecture. It only remains to summarize our progress before we dive into the deep end by looking at anti-patterns in the system architecture domain.