Strategies
Once you have inherited a legacy system, what do you do with it and how do you approach any issues? Most books on software and system design assume a greenfield project where you can build as you please. Legacy systems have constraints on what you can do and how you can do it.
This section suggests some strategies for managing these systems.
Ignore It
What is it?
I'm not being facetious, but I've had to include this as a possible strategy as it's the most common! If you choose to ignore your legacy system, then you should be aware that you've still made an active decision – which is to make no changes and not improve knowledge about any issues. You're not fooling anyone if you say it's on a long-term plan either.
Ignoring a legacy system is very tempting because of the issues and problems we discussed in previous sections. A lack of documentation and lost knowledge will make any project or investigation hard to even start and the politics surrounding it might make you unpopular.
Advantages
The only advantage to ignoring a legacy system is that you'll avoid any political battles in the short term and can get on with other work.
Disadvantages
The risks of ignoring the system are huge. We've already seen some of them, such as regulatory changes and zombie technologies. We must also remember that hardware degrades and old processes might no longer work as expected. We could argue that the biggest risk is the lost opportunity of not improving processes that could make the organization more efficient.
Investigation
What is it?
Performing an investigation of your legacy system's structural components, runtime attributes and usage – that is, what makes up the system (hardware, software and configuration); how the system runs (inputs, outputs, and so on); and the use cases it is involved with.
Although this is a strategy (as you may discover that much of your system requires no further treatment or can be decommissioned), it should also be viewed as a stage to be performed before the other strategies.
Note
I worked on a project to rationalize a system that was made up of a set of components linked together by some manual processes. One process involved a user who received a file via email, which they then uploaded into an internal web application. This application performed some actions on the file and then dumped it to a central location. The users then loaded it into another system. On investigation, it turned out that all the web application was doing was changing the line endings of the file from Unix to Windows. However, the system this was loaded into would cope with either style of line endings. We had no idea why the intermediate step was first introduced (presumably, it was required at one point and then became part of the 'ritual') but it meant we could drop that part of the process and turn a sub-system off. There were good cost savings as the department was being charged a reasonable sum by centralized IT to host the web application.
You might perform a detailed investigation before deciding upon a strategy or this may have already been decided for you (we'll discuss migrations later, which are often cost- or politically-driven). Even if your strategy is predetermined, the investigation phase is important but is often passed over by keen workers desperate to show progress – this is a mistake.
Advantages
This is a good way to start working with a legacy system and shouldn't incur much political opposition as no decisions on actions are taken yet. There is also a good chance that, after an investigation, you'll change your preferred strategy – often because you've discovered value in the system you weren't aware of.
Disadvantages
You can become stuck in 'analysis paralysis' if the system is complex and opaque, so you should time-box your initial investigation. It can also be difficult to spend the required time performing the investigation properly if other commitments with higher priority repeatedly bump the work (investigations are usually ranked as low priority).
Advice on Implementation
An investigation should involve all the planning steps that are detailed in Part II and you should aim to generate a set of artefacts that can be used to facilitate a discussion of the system and future maintenance and upgrades.
When performing the investigation, you should also ask the following:
What is being done?
How is it being done?
Why is it being done?
Is it necessary?
Is there an alternative?
These are leading questions (looking for possible de-commissioning) but you are trying to get a deeper understanding of the system rather than just capturing it. It will also help you understand any value that it contains.
Maintenance
What is it?
All IT systems need maintenance, unless you intend to decommission any unused systems immediately. By maintenance I'm referring to keeping the system in its current state and correcting problems, rather than introducing functional or non-functional changes and improvements. If you intend to introduce improvements, then this would be considered upgrading, migrating, or possibly incremental improvements.
Rather than trying to improve the system, you just want to stop the system from degrading. Real systems degrade with time so monitoring and maintenance procedures are necessary to stop this from occurring. Examples of maintenance might include:
Operating system patches
Essential third-party security patches
Replacing failing hardware such as hard disks with sector errors
Removing unused users
Rebuilding indexes to maintain performance as data is modified
Deleting temporary files to stop a file system filling
You are simply trying to 'sweat the assets' and avoid any catastrophic issues. Hopefully, when the system was built, a set of maintenance actions was documented but this may not be the case; they might have been lost or evolved over time. You may require a large investigation stage, and if maintenance has been neglected, a substantial stabilization task (covered later) before resuming with regular maintenance.
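As a minimal sketch of one item from the list above (deleting temporary files to stop a file system filling), a routine maintenance task might look something like the following. The function name, the age threshold, and the dry-run default are assumptions made for illustration; a real system's retention rules should come from its documented maintenance actions, where they exist.

```python
import time
from pathlib import Path


def delete_stale_files(directory: Path, max_age_days: float, dry_run: bool = True) -> list[Path]:
    """Return (and, unless dry_run is set, delete) files older than max_age_days."""
    cutoff = time.time() - max_age_days * 24 * 60 * 60
    stale = []
    for path in sorted(directory.iterdir()):
        # Only plain files directly inside the directory are considered;
        # subdirectories and symbolic links are left alone.
        if path.is_file() and not path.is_symlink() and path.stat().st_mtime < cutoff:
            stale.append(path)
            if not dry_run:
                path.unlink()
    return stale
```

Defaulting to a dry run means the first execution against an unfamiliar legacy system only reports what it would remove, which fits the cautious approach needed when the original maintenance procedures have been lost.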
Advantages
There are many benefits to maintaining a system rather than upgrading it. For example, you may work in an environment where systems require large amounts of security investigation or sign-off for functional provability. NASA is rumored to have kept systems from the 1970s running the space shuttle program because the overhead of proving reliability was too high to ever perform major upgrades or outright replacement.
Keeping a system functionally static may also help to avoid many of the political issues that we listed in the previous section.
Disadvantages
When a system is receiving only maintenance, all the subcomponents will gradually age, as they are not being upgraded. As hardware components approach their expected lifespan, they may fail and will need to be replaced, but replacements can be difficult to find for old systems. Eventually, an upgrade will become unavoidable, but this is best done at leisure rather than in a panic after a failure.
Software ages in a different way and it may no longer be supported, compatible, or known by new staff members.
Many projects in the maintenance phase end up morphing into other projects. It's tempting to gradually add features into a tactical system while a strategic system is being developed, which often means the strategic system never catches up.
Advice on Implementation
Scheduling investigation, stabilization, and maintenance as individual tasks will help with planning and estimation. Telling the business owner that these are separate will help with budgets and timescales (more politics). (By business owner, I'm referring to the individual or group that is the sponsor for the project. They are usually the budget holder and therefore have a lot of influence over the project. This includes organizations in the public sector and not just commercial businesses.)
You should resist allowing a business owner to 'just add a little feature' if you've agreed to maintenance only, as any scope creep will cause issues. Adding features involves another strategy, which is approached differently. I would advise getting stakeholders to agree to this strategy up-front so that, if new features are requested, you can justify extra resources.
Upgrade
What is it?
This strategy involves upgrading each component of a system with the latest, supported version. This may include:
Operating system (for example, Windows 2008->20XX)
Versions of Java/CLR (for example, JDK1.3->JDK1.8)
Database (for example, Oracle X -> Oracle X+1)
Processor architecture (for example, newer/faster/more processors)
Communications infrastructure (for example, 100 Mbit/s NIC -> 1 Gbit/s NIC)
Messaging infrastructure
Third-party products such as CRM/BPM systems
The aim is for little functional or non-functional change (although, for some third-party business-process components, this may be unavoidable).
The hardware and software development life cycles make it difficult for vendors to support old versions. Therefore, if your organization is using very old third-party software or hardware, the original manufacturer may no longer provide support, forcing you to upgrade. Even if your organization wrote the main applications you are using, there will still be frameworks and infrastructure affected by this.
A strategy that involves new functionality may involve an upgrade step first.
Advantages
Upgrading current components allows users to continue using a product they are familiar with. The product should be easier to maintain as it will be better supported. Components that are upgraded to their latest versions are more likely to comply with regulations and standards, and they are often viewed as 'best practice'.
There are important cost implications both for support (out-of-date products often incur higher support costs) and in comparison to buying new products (upgrading software should be cheaper than buying entirely new software, and upgrading hardware components should be cheaper than buying an entire new system).
Disadvantages
Upgrades are rarely as simple or straightforward as they should be – here are a few issues you might face:
Dependencies between components
Performing an upgrade on a single component is often impossible. You may want to upgrade one specific component (perhaps due to known issues, such as security) but there are often dependencies between components that force you to upgrade others. These dependencies are both vertical and horizontal.
Note
Recently, I had to upgrade a piece of third-party software due to the version entering end of life. This software came with a much later version of a database driver, which didn't work well with the old version of the database we were using, so we needed to upgrade it (horizontal dependency). The latest version of the database required a lot more memory and would only work correctly in a 64-bit configuration. This required us to upgrade the operating system on the database machine (vertical dependency) and the combination of the new database and OS meant that we needed new hardware, which we virtualized to manage better.
An upgrade to a single piece of software necessitated upgrades across multiple components right down to the hardware and introduced extra layers.
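The knock-on effect described in the note can be sketched as a walk over a dependency graph. The component names and the "forces an upgrade of" relationships below are hypothetical, loosely modeled on that anecdote; in practice they would come out of your investigation.

```python
from collections import deque

# Hypothetical "upgrading X forces upgrading Y" relationships (illustrative only).
FORCES_UPGRADE_OF = {
    "third-party application": ["database driver"],
    "database driver": ["database"],
    "database": ["operating system"],
    "operating system": ["hardware"],
}


def upgrade_scope(start: str, forces: dict[str, list[str]]) -> list[str]:
    """Return every component that must be upgraded if `start` is, in discovery order."""
    seen = [start]
    queue = deque([start])
    while queue:
        component = queue.popleft()
        for forced in forces.get(component, []):
            if forced not in seen:
                seen.append(forced)
                queue.append(forced)
    return seen
```

Calling `upgrade_scope("third-party application", FORCES_UPGRADE_OF)` lists all five components, which gives a more honest picture of the work than the single item on the original request.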
Issues with data
I mentioned decaying data in the Common Issues section, and bad data creates specific problems for upgrades.
It is common for newer versions of software to assume that the data is of higher quality than was required or guaranteed by earlier versions. The operators of a system will use it in a way not originally envisioned by its creators and will insert data that is not specifically excluded. For example, adding comments to a telephone field because the software didn't enforce numeric values only. When the software vendors tested the upgrade process, they probably used sensible and expected values rather than the values found in the real world.
This unexpected data may cause repeated upgrade failures or, even worse, might create unexpected behavior in the system.
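One way to reduce this risk is to audit the data before attempting the upgrade. The sketch below looks for telephone values that a stricter version might reject. The pattern for an acceptable number and the `(record_id, phone)` row shape are assumptions for illustration; the real rules would have to come from the new version's documentation or from testing.

```python
import re

# Assumption: an acceptable number is 5 to 20 digits, spaces, hyphens or
# parentheses, optionally preceded by a plus sign.
PHONE_PATTERN = re.compile(r"^\+?[0-9 ()\-]{5,20}$")


def find_suspect_phone_values(rows):
    """Return the (record_id, phone) pairs whose phone value fails the pattern.

    Empty or missing values are skipped here; they are a separate problem.
    """
    suspects = []
    for record_id, phone in rows:
        value = (phone or "").strip()
        if value and not PHONE_PATTERN.match(value):
            suspects.append((record_id, phone))
    return suspects
```

Running a check like this against a copy of production data, rather than sensible test values, should surface the comments-in-a-phone-field cases before the upgrade process trips over them.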
Several versions at once
If the legacy system has been in maintenance only (or ignore) mode for a while, then components of the system may be several versions behind in the upgrade cycle. It may not be possible to directly upgrade software components from your production version to the latest and you may have to move to an intermediate version (assuming you can even get hold of one). This can introduce a large amount of testing. See later sections about using virtualization to help with upgrading, as you'll want to take snapshots at these intermediate points so you can roll back to positions other than your starting point.
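If the vendor documents which direct upgrades are supported, the route through intermediate versions can be planned up front. The following is a minimal sketch using a breadth-first search; the version numbers and the table of supported upgrades are invented for illustration and do not describe any real product.

```python
from collections import deque

# Hypothetical vendor-supported direct upgrades: version -> versions reachable in one step.
SUPPORTED_UPGRADES = {
    "5": ["6", "7"],
    "6": ["7", "8"],
    "7": ["8", "9"],
    "8": ["9", "10"],
    "9": ["10"],
}


def shortest_upgrade_path(current, target, supported):
    """Breadth-first search for the fewest supported hops; returns None if there is no route."""
    queue = deque([[current]])
    visited = {current}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for next_version in supported.get(path[-1], []):
            if next_version not in visited:
                visited.add(next_version)
                queue.append(path + [next_version])
    return None
```

Each intermediate version in the returned path is a candidate point for a snapshot and a round of testing, so the length of the path is a rough first indicator of the effort involved.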
Sometimes, if the version you are upgrading is old enough, then you are really performing a migration and you might want to skip the intermediate versions. In this case, you would perform a fresh install and then migrate and import the data.
When you're asked to upgrade a component, you should be very careful to consider the true time required. Upgrading an item in a legacy system may be time-consuming and involve work in unforeseen areas. It can be very different to upgrading part of a non-legacy system. Please also remember that, after an upgrade, you'll still need to maintain the system – don't upgrade and forget.
Advice on Implementation
I've already listed some specific issues with upgrading and some advice for dealing with them. More generally, I would always suggest following the advice given in later sections of Chapter 3, Safely Making Changes. Just to reiterate, upgrades are always more complex and difficult than they first appear, and they need to be approached in a structured way.
Migration
What is it?
This is like upgrading, except the components are moved to a new technology or provider rather than later versions of the current technology. This can be applied to any part of the system across both the hardware and software stack. This may include:
Operating systems (for example, Solaris -> Linux)
Vendors of Java/CLR (for example, Sun JDK -> JRockit JDK)
Database vendors (for example, Oracle -> Sybase)
Processor architecture (for example, Intel -> SPARC)
Communications infrastructure (for example, Cisco -> Juniper)
Third-party products such as CRM/BPM systems (for example, Microsoft CRM -> Salesforce) and so forth.
Advantages
There are many reasons why an organization might choose to migrate rather than upgrade (or just maintain). For example, another vendor or product now has a better technical solution, or the original vendor may have ceased trading entirely. With legacy systems, it is very common for better technologies to have emerged and technology teams will want to take advantage of this.
There could be significant cost savings to be found by moving to a different product or consolidating to a single product across an organization. A good example can be found in databases where a site-wide license is purchased. There will be huge pressure to use this single product across the whole organization and stop paying maintenance fees for others.
Disadvantages
Migrations may be driven by technical or business requirements. It can be very frustrating for technical teams if the reason for migrating is cost, and equally annoying for the business owner if migrations are driven by technical desires. These kinds of conflicts are political risks and are dangerous to a project.
Many of the same issues I mentioned with upgrades are relevant to migrations but are often more severe. Problems with data aren't just due to bad data but also incompatible formats and missing, required data. You often need to enrich the data (make stuff up) to get a migration to work.
Like an upgrade, a migration may also necessitate modifications to components around a software/hardware stack.
Both upgrades and migrations will require operational changes in processes and in actions performed by the end user. The training overhead is likely to be greater for migrations than upgrades.
Advice on Implementation
There can be other drivers for the selection of technologies or requirements for migrations, such as finding suitable skills in the job market or support contract simplification. It is important for you to find out exactly why decisions are made.
The effort involved in migrating data between systems should not be underestimated. Although modifying formats can be challenging, it is missing data that will cause the most issues. You really need to make sure that these items are identified early and that you get agreement from the other stakeholders as to the resolution. It may not be possible to fill in missing data and you should make sure that you have a record of exactly what is 'made up' in order to get the system running. For example, if a new system insists that a phone number is included (and you don't have access to these) you should make sure that any 'fake' number you insert is obvious.
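The phone number example could be handled along the following lines. This is only a sketch: the placeholder value, the record layout, and the `MigrationLog` class are assumptions for illustration, and the actual placeholder should be agreed with the stakeholders.

```python
from dataclasses import dataclass, field

# Assumption: this value is obviously fake and is accepted by the new system.
PLACEHOLDER_PHONE = "0000000000"


@dataclass
class MigrationLog:
    """Keeps a record of every value invented to satisfy the new system."""
    invented: list = field(default_factory=list)

    def record(self, record_id, field_name, value):
        self.invented.append((record_id, field_name, value))


def fill_missing_phone(record: dict, log: MigrationLog) -> dict:
    """Return a copy of the record, with a logged placeholder if the phone is missing."""
    migrated = dict(record)
    if not (migrated.get("phone") or "").strip():
        migrated["phone"] = PLACEHOLDER_PHONE
        log.record(migrated["id"], "phone", PLACEHOLDER_PHONE)
    return migrated
```

The log is as important as the placeholder itself: it is the record of exactly what was made up, and it can be handed to the business owner for later correction.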
Lastly, please don't forget to modify any documentation to bring it into line with your upgraded/migrated system. It's very frustrating to think you have some relevant documentation only to find out that it refers to a much older version. It makes the user unsure as to whether it can be trusted or is useful.
Incremental Improvements
What is it?
In this strategy, you keep the basic infrastructure and system architecture the same (probably with some upgrades or migrations first) and then either add new components or add functionality to current components. Functional additions may be driven by internal requirements to improve or expand the product or external factors such as regulation. The changes could also be driven by non-functional requirements such as coping with an increased number of users (there is a debate about what separates a functional and non-functional requirement, which we won't go into here!).
Legacy systems are frequently systems that have had several incremental improvements made over their lifetime. It can be amazing to try to track down the earliest unchanged source code file, the earliest date in a comment, or the most ancient piece of hardware in a system that's been changed beyond all recognition (I was recently told that a certain modern, 3D, popular football game still had all the 2D sprites in it from a version 10 years earlier that no one had ever gotten around to removing).
Advantages
Incremental improvement allows you to give end users specific functionality on a per-requirement basis. They can see a real, defined benefit rather than lots of background improvements that make little difference to their jobs. Hopefully, you can deliver this in regular, small deployments.
Disadvantages
There is often huge pressure from business owners to "just stick it in" and get the required functionality as soon as possible (are these deadlines for an external reason or just made up?) but this leads to the dreaded technical debt. It's important to refactor but NOT JUST THE CODE BASE. If the usage changes considerably, then you might also need to change the way that software components are hosted, run, and communicate. You need to apply refactoring techniques to the frameworks and infrastructure right down to the hardware. If you are starting with hardware changes, then this might also work up through the stack as well.
Advice on Implementation
I would strongly advise you to perform any upgrades required to bring the components to their latest versions before adding any functionality. This may be opposed by the business owner if incremental improvement is viewed as a cheap option ("just stick it in"). This may be particularly true if this is driven by external factors, such as regulation, where the organization won't see any tangible benefit.
If you are writing code to add features or fix old bugs, I suggest first creating a new baseline for the code. This involves getting all the files and simply formatting them and organizing them in your preferred project structure. The new baseline should not have any functional modifications and the files should be checked into source control and labelled.
This means that any functional modifications you make from this point will show as clean and simple diffs. Without a new, clean baseline, you will find that any diff you run will include formatting and structural changes (such as file moves). You should also perform this baseline formatting on configuration files such as XML. It is amazing how inconsistent the formatting of files can become over time, especially with many people working on them, and also how formatting fashions change.
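To illustrate the effect of a baseline, the sketch below applies a few formatting-only rules and then compares diffs with and without them. The rules chosen here (line endings, trailing whitespace, a single final newline) are assumptions; in practice you would use whatever formatter suits your language and project.

```python
import difflib


def normalize(text: str) -> str:
    """Formatting-only rules: LF line endings, no trailing whitespace, one final newline."""
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    stripped = [line.rstrip() for line in lines]
    return "\n".join(stripped).rstrip("\n") + "\n"


def changed_lines(before: str, after: str) -> list[str]:
    """Return only the removed ('- ') and added ('+ ') lines between two texts."""
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


# A legacy file with trailing spaces and Windows line endings.
legacy = "def total(a, b):   \r\n    return a+b\r\n"
baseline = normalize(legacy)

# A functional change made after the baseline was checked in.
modified = baseline.replace("return a+b", "return a+b+1")
```

Here `changed_lines(baseline, modified)` reports only the one line that changed functionally, whereas `changed_lines(legacy, modified)` also reports the first line because of its trailing whitespace, which is the noise the baseline is meant to remove.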
Replacement
What is it?
This is a complete re-write that aims for no reuse beyond business knowledge. In practice, it is likely that some of the original system will be reused and, certainly, the data will be migrated. However, the intention is to replace as much as possible with a top-down approach; that is, not refactoring and rewriting the system from the code upward but re-implementing from the requirements down.
Advantages
This is often the preferred option for the technology team, as this gives maximum scope to use new technologies and techniques. It also avoids having to learn the idiosyncrasies of the legacy system and understand supplanted technologies. It also allows them to use familiar languages, tools, and equipment.
Disadvantages
However, you must ask yourself if you really understand everything the legacy system does or can do. We should remember the problems of lost, hidden, and implicit knowledge and understand that a replacement is very tempting but often incredibly hard to achieve. Do you really understand the business requirements or is this driven by the technology team's desires? All the issues listed for data in the Migration section are also true for a replacement.
The legacy system will have to be maintained while a replacement is being developed and new features might even have to be added due to external drivers. I have seen legacy systems that have had so much incremental improvement while a replacement was being developed, that the replacement never went live – it was constantly chasing a moving target.
Other costs to consider are those for the complete retraining of users and operations staff.
Advice on Implementation
The biggest and most common mistake people make when replacing legacy systems is to not understand all the functionality they are trying to replace. This is often because they assume the legacy system has little or no value, but this is a mistake – the system is legacy because it has value and you need to understand what this is.
You should pay attention to the data in the old system, as you will almost certainly need to import this into your new system. You don't want to get to the release stage of your replacement and realize that there is a large dataset that can't be imported or is not dealt with.
You should consider trying to run a replacement system and its legacy system in parallel configuration rather than a big-bang release. I'll go into this in more detail in the next section.
A Special Note on Decommissioning
Decommissioning is the process of shutting down a legacy system. This is often combined with a 'replacement' project/strategy, as once the new system is released, then the legacy system should be turned off. This sounds simple but often isn't.
Issues
You will not want to run multiple, overlapping systems of different ages and technologies but that is a very common outcome. Often, a replacement system will not cover all the functionality of the legacy system it supposedly replaces, and the legacy system will be left running to perform a small subset of this original functionality. This means the organization has all the maintenance issues and costs of the legacy system, as well as those of the new system (a friend recently commented that he has never known a system to be fully decommissioned and his organization was filled with almost dead, zombie systems). This includes multiple teams with overlapping responsibilities, and this invariably leads to complex politics.
Advice on Implementation
You need a specific plan for decommissioning – do not just assume you can turn it off. I would suggest, at a minimum, the following steps:
Firstly, you must make sure all the stakeholders are committed to the decommissioning – there may be many hidden agendas. Please refer to the Stakeholders section.
Secondly, you need to make sure you understand all the external connections and dependencies. You are interested in external systems that are tightly coupled with your legacy system. These may need special treatment for compatibility. Please refer to the Architectural Sketches and Further Analysis sections.
Lastly, you need to decide on your actual strategy for moving from legacy to replacement systems. This may include:
The Big Bang
This involves developing and then deploying an entirely new system and turning off the old one at the same time. This has the advantage of making sure the legacy system is deactivated. However, if your analysis missed some important functionality or dependency, you might find yourself having to turn the entire system back on again. This might get repeated for multiple features until you run out of development budget – at which point, you are left with both systems forever.
You also must make sure that all users are completely trained for the new system on day one. This can be difficult and if you are forced into a rollback, there could be chaos.
The Parallel Run
This is similar to the Big Bang, in that you have developed (or possibly bought in) a complete replacement system, but you deliberately run both systems for a period of time and gradually move users and functionality from one system to the other in small increments. This has advantages in terms of user training and impact and means that a problematic feature can be individually dealt with rather than a large and embarrassing rollback.
However, it's possible (due to time or budget constraints) to not fully move the system over or to miss important features. Either will force you to leave the entire legacy system running, even though only a small subsection is now being used.
Agile Component-Based Replacement
Rather than creating and deploying an entire system, the individual components within it are created, deployed and decommissioned on a one-by-one basis. This reduces any single impact and means that any unknown functionality is handled in the same way as any other known functionality. This can be unpopular with project managers, who demand a fixed timescale and cost estimate, but is more likely to deliver the functionality that users want.
It can work out much cheaper than a Big Bang release, as only the functionality that is used is replicated. You simply need to select components/functionality, replicate it in your new framework, and repeat until there is nothing left. If there is a large amount of unused functionality, you will save yourself from unnecessary re-implementation. You need to be careful of hidden or lost knowledge and track the project carefully to make sure that the legacy system is deactivated at the end. The problem here is defining what the end point is.
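One way to keep track of progress, and to make the end point explicit, is to put a thin routing layer in front of both systems. The class below is a hypothetical sketch of that idea, not a prescribed design; the component names and handlers in the usage note are invented.

```python
class ComponentRouter:
    """Routes each named piece of functionality to the legacy or the replacement implementation."""

    def __init__(self, legacy_handlers):
        self._legacy = dict(legacy_handlers)
        self._replacement = {}

    def replace(self, name, handler):
        """Switch one component over to its new implementation."""
        if name not in self._legacy:
            raise KeyError(f"unknown component: {name}")
        self._replacement[name] = handler

    def call(self, name, *args):
        """Invoke the replacement if one is registered, otherwise the legacy handler."""
        handler = self._replacement.get(name, self._legacy[name])
        return handler(*args)

    def remaining_legacy(self):
        """Components still served by the legacy system, in sorted order."""
        return sorted(set(self._legacy) - set(self._replacement))
```

For example, a router created with legacy handlers for "invoice" and "report" would, after `replace("invoice", new_handler)`, report `["report"]` from `remaining_legacy()`. When that list is empty, every known component has been moved and the legacy system is a candidate for being turned off, subject to the usual caution about hidden or lost knowledge.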
Conclusion
You might (and probably should) mix and match the suggested strategies somewhat. It is also possible to treat each component in a different way; that is, maintain some, upgrade those that need it, and replace the ones that need completely new behavior. You need to understand the business's motivation and set expectations accordingly for timescales and cost. As developers, we usually favor re-writing code, whereas users often want little to no impact (they have jobs to get on with) and the organization's management cares about cost (and, very often, the focus is on short-to-medium-term costs). Choosing a strategy is difficult and involves many trade-offs.