Managing risk
In my experience, most unsuccessful projects fail because they don’t properly deal with project risk. Project risk refers to the potential that a team will fail to meet some or all of a project’s objectives. Risk is defined as the product of an event’s likelihood of occurrence and its severity. Risk is always about the unknown. There are many different kinds of project risk. For example:
- Resource risk
- Technical risk
- Schedule risk
- Business risk
Because risks are about the unknown, risk mitigation activities – known as spikes in agile literature – are work undertaken to uncover information that reduces risk. For example, a technical risk might be that the selected bus architecture does not have sufficient bandwidth to meet the system performance requirements. A spike to address the risk might measure the bus under stress similar to what is expected for the product. Another technical risk might be the introduction of new development technology, such as SysML, to a project. A resulting spike might be to bring in an outside trainer and mentor for the project.
The most important thing you want to avoid is ignoring risk. It is common, for example, for projects to have “aggressive schedules” (that is to say, “unachievable”) and for project leaders and members to ignore obvious signs of impending doom. It is far better to address the schedule risk by identifying and addressing likely causes of schedule slippage and replanning the schedule.
Purpose
The purpose of the Managing risk recipe is to improve the likelihood of project success.
Inputs and preconditions
Project risk management begins early and should be an ongoing activity throughout the project. Initially, a project vision, preliminary plan, or roadmap serves as the starting point for risk management.
Outputs and postconditions
Intermediate outputs include a risk management plan (sometimes called a risk list) and the work effort resulting from it, allocated into the release and iteration plans. The risk management plan provides not only the name of the risk but also important information about it. The longer-term result is a more successful project outcome than would be achieved without risk management.
How to do it
Figure 1.16 shows how risks are identified, put into the risk management plan, and result in spikes. Figure 1.17 shows how, as spikes are performed in the iterations, the risk management plan is updated:
Figure 1.16: Managing risk
Figure 1.17: Reducing risk
Identify a potential source of risk
This is how it starts, but risk identification shouldn’t just be done at the outset of the project. At least once per iteration, typically during the project retrospective activity, the team should look for new risks that have arisen as the project has progressed. Thus, the workflow in Figure 1.16 isn’t performed just once but many times during the execution of the project. In addition, it sometimes happens that risks disappear if their underlying causes are removed, so you might end up removing risk items, or at least marking them as avoided, during these risk reassessments.
Characterize risk
The name of the risk isn’t enough. We certainly need a description of how the risk might manifest and what it means. We also need to know how likely the negative outcome is to manifest (likelihood) and how bad it is should that occur (severity). Some outcomes have a minor impact, while others may be show-stoppers.
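As a minimal sketch of what a characterized risk might look like as a record, the following Python dataclass holds a headline, a description, a likelihood, and a severity. The class name, the field names, the 0-to-1 scale, and the example values are all illustrative assumptions, not part of the recipe.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """One characterized risk; names and the 0..1 scale are assumptions."""

    headline: str
    description: str
    likelihood: float  # chance the negative outcome occurs, 0.0 to 1.0
    severity: float    # how bad it is if it occurs, 0.0 to 1.0

    def __post_init__(self) -> None:
        # Reject values outside the assumed 0..1 scale.
        for name in ("likelihood", "severity"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")


# Hypothetical values for the bus bandwidth risk mentioned earlier.
bus_risk = Risk(
    headline="Bus bandwidth",
    description="The selected bus may not meet the performance requirements.",
    likelihood=0.5,
    severity=0.9,
)
print(bus_risk.headline, bus_risk.likelihood, bus_risk.severity)
```

Validating the scale at construction time is one design choice among several; a team that rates risks on a 1-to-5 scale would adjust the check accordingly.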
Add to risk list in priority order
The risk management plan maintains the list sorted by risk magnitude. If you have quantified both the risk’s likelihood and severity, then risk magnitude is the product of those two values. The idea is that the higher-priority risks should receive more attention and be addressed earlier than the lower-priority risks.
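The prioritization described here can be sketched in a few lines of Python. The function name, the use of percentages, and the three entries are hypothetical and serve only to show the product-and-sort idea.

```python
def risk_magnitude(likelihood_pct: float, severity_pct: float) -> float:
    """Return risk magnitude as a percentage: likelihood times severity."""
    return likelihood_pct * severity_pct / 100


# Hypothetical entries: (headline, likelihood %, severity %).
risk_list = [
    ("New modeling language", 50, 40),
    ("Bus bandwidth", 60, 90),
    ("Aggressive schedule", 100, 30),
]

# Highest magnitude first, so the riskiest items get attention earliest.
prioritized = sorted(
    risk_list,
    key=lambda risk: risk_magnitude(risk[1], risk[2]),
    reverse=True,
)

for headline, likelihood, severity in prioritized:
    print(f"{headline}: {risk_magnitude(likelihood, severity):.0f}%")
```

With these made-up values, the bus bandwidth risk (54%) sorts ahead of the schedule risk (30%) and the modeling language risk (20%).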
Identify a spike to address risk
A spike is work that is done to reduce either the likelihood or the severity of the risk outcome, generally the former. We can address knowledge gaps with training; we can address bus performance problems with a faster bus; we can solve schedule risks with featurecide. Featurecide is the removal of features of low or questionable stakeholder value, or work items that you just don’t have the bandwidth to address. Whatever the approach, a spike seeks to reduce risk, so it is important that the spike uncovers or addresses the risk’s underlying cause.
Create a work item for a spike
Work items come in many flavors. Usually, we think of use cases or user stories (functionality) as work items. But work items can refer to any work activity, as we discussed in the earlier recipe for backlog management. Specifically, in this case, spikes are important work items to be put into the product backlog.
Allocate a spike work item to an iteration plan
As previously discussed, work items must be allocated to iterations to result in a release plan.
Perform a spike
This action means performing the identified experiment or activity. If the activity is to get training, then complete the training session. If it is to perform a lab-based throughput test, then do that.
Assess the outcome
Following the spike, it is important to assess the outcome. Was the risk reduced? Is a change in the plan, approach, or technology warranted?
Update the risk management plan
The risk management plan must be updated with the outcome of the spike.
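One way to picture this update step is as a small function that records the spike outcome against a risk entry and changes its state when the risk was reduced. This is only a sketch: the `RiskEntry` class, the `record_spike_outcome` function, and the note text are hypothetical, and the state names are borrowed from the State column of the example risk list.

```python
from dataclasses import dataclass, field


@dataclass
class RiskEntry:
    """A row in the risk list; the fields shown are an assumed subset."""

    risk_id: int
    headline: str
    state: str = "Open"
    history: list[str] = field(default_factory=list)


def record_spike_outcome(entry: RiskEntry, risk_reduced: bool, note: str) -> None:
    """Log the spike outcome and mark the risk Mitigated if it was reduced."""
    entry.history.append(note)
    if risk_reduced:
        entry.state = "Mitigated"


motor = RiskEntry(risk_id=1, headline="Robustness of the main motor")
record_spike_outcome(motor, risk_reduced=True, note="Vendor motor variant passed lab test")
print(motor.state, motor.history)
```

Keeping a history of notes alongside the state is an assumption made here so that the reasoning behind each state change is not lost.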
Replan
If appropriate, adjust the plan in accordance with the outcome of the spike. For example, if a proposed technology cannot meet the project needs, then a new technology or approach must be selected and the plan must be updated to reflect that.
Example
Here is an example risk management plan, captured as a spreadsheet of information. Rather than show the increasing level of detail in the table step by step, we’ll just show the end state (Table 1.3) to illustrate a typical outcome from the workflow shown in Figure 1.16.
It can be sorted by the State and Risk Magnitude columns to simplify its use:
Risk Management Plan (Risk List)

| Risk ID | Headline | Description | Type | Impact | Probability | Risk magnitude | State | Precision | Raised on | Iteration # | Impacted stakeholder | Owner | Mitigation strategy (spike) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Robustness of the main motor | The system must be able to maintain 2,000 W for up to 5 minutes and sustain 1,000 W for 4 hours, with an MTBF of 20,000 hours. The current motor is unsuitable. | Technical | 80% | 90% | 72% | Open | High | 1/5/2020 | 1 | Maintainer, user | Sam | Meet with motor vendors to see if 1) they have an existing motor that meets our needs, or 2) they can design a motor within budget to meet the need. |
| 2 | Agile MBSE impact | The team is using both agile and MBSE for the first time. The concern is that this may lead to poor technical choices. | Technical | 80% | 80% | 64% | Open | Medium | 1/4/2020 | 0 | User, buyer, product owner | Jill | Bring in a consultant from A Priori Systems for training and mentoring. |
| 3 | Robustness of USB connection | Users will be inserting and removing the USB while under movement stress, so it is likely to break. | Technical | 40% | 80% | 32% | Open | Medium | 2/16/2020 | 3 | User, manufacturing | Joe | Standard USB connectors are too weak. We need to mock up a more robust physical design. |
| 4 | Aggressive schedule | Customer schedule is optimistic. We need to address this either by changing the expectations or figuring out how to satisfy the schedule. | Schedule | 40% | 100% | 40% | Mitigated | Low | 12/5/2019 | 0 | Buyer | Susan | Iteration 0, work with the customer to see if the project can be delivered in phases, or if ambitious features can be cut. |
| 5 | Motor response lag time | To simulate short high-intensity efforts, the change in resistance must be fast enough to simulate the riding experience. | Technical | 20% | 20% | 4% | Open | High | 12/19/2019 | 6 | User | Sam | Do a response time study with professional riders to evaluate the acceptability of the current solution. |
| 6 | Team availability | Key team members have yet to come off the Aerobike project and are delayed by an estimated 6 months. | Resource | 60% | 75% | 45% | Obsolete | Low | 3/1/2020 | 0 | Product owner, buyer | | See if the existing project can be sped up. If not, work on a contingency plan to either hire more or delay the project start. |
Table 1.3: Example risk list
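To illustrate sorting by the State and Risk magnitude columns, here is a small Python sketch using the impact, probability, and state values from Table 1.3. The ordering of states (Open first, then Mitigated, then Obsolete) is an assumption made for this example; the recipe does not prescribe one.

```python
# Rows from Table 1.3: (risk ID, headline, impact %, probability %, state).
table_1_3 = [
    (1, "Robustness of the main motor", 80, 90, "Open"),
    (2, "Agile MBSE impact", 80, 80, "Open"),
    (3, "Robustness of USB connection", 40, 80, "Open"),
    (4, "Aggressive schedule", 40, 100, "Mitigated"),
    (5, "Motor response lag time", 20, 20, "Open"),
    (6, "Team availability", 60, 75, "Obsolete"),
]

# Assumed display order for states; adjust to suit your own plan.
STATE_ORDER = {"Open": 0, "Mitigated": 1, "Obsolete": 2}


def sort_key(row):
    """Sort by state first, then by descending risk magnitude."""
    risk_id, headline, impact, probability, state = row
    magnitude = impact * probability / 100
    return (STATE_ORDER[state], -magnitude)


sorted_rows = sorted(table_1_3, key=sort_key)
sorted_ids = [row[0] for row in sorted_rows]
print(sorted_ids)
```

Under that assumed state ordering, the risk IDs come out as 1, 2, 3, 5, 4, 6: the open risks in descending magnitude, followed by the mitigated and obsolete ones.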
For an example of the risk mitigation workflow in Figure 1.17, let’s consider the first two risks in Table 1.3.
Perform a spike
For Risk 2, “Agile MBSE impact,” the identified spike is “Bring in a consultant from A Priori Systems for training and mentoring.” We hire a consultant from A Priori Systems. The consultant then trains the team on agile MBSE, gives each member a copy of the book Agile Systems Engineering, and mentors the team through the first three iterations. This spike is initiated in Iteration 0, and the mentoring lasts through Iteration 3.
For Risk 1, “Robustness of the main motor,” the identified spike is “Meet with motor vendors to see if 1) they have an existing motor that meets our needs, or 2) they can design a motor within our budget to meet the need.” Working with our team, the application engineer from the vendor assesses the horsepower, torque, and reliability needs and then finds a version of the motor that is available within our cost envelope. The problem is resolved.
Assess outcome
The outcome of the spike for Risk 2 is assessed in four steps. First, the engineers attending the agile MBSE workshop provide an evaluation of the effectiveness of the workshop. While not giving universally high marks, the team is very satisfied overall with their understanding of the approach and how to perform the work. Then, the retrospectives for the next three iterations look at expected versus actual outcomes and find that the team is performing well. The assessment of the risk is that it has been successfully mitigated.
For Risk 1, the assessment of the outcome is done by the lead electronics engineer. He obtains five instances of the suggested motor variant and stress-tests them in the lab. He is satisfied that the risk has been successfully mitigated and that the engineering can proceed.
Update the risk management plan
The risk management plan is updated to reflect the outcomes as they occur. Table 1.4 shows the updated State field, in which the states of both risks are now Mitigated:
Risk Management Plan (Risk List)

| Risk ID | Headline | Description | Type | Impact | Probability | Risk magnitude | State | Precision | Raised on | Iteration # | Impacted stakeholder | Owner | Mitigation strategy (spike) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Robustness of the main motor | The system must be able to maintain 2,000 W for up to 5 minutes and sustain 1,000 W for 4 hours, with an MTBF of 20,000 hours. The current motor is unsuitable. | Technical | 80% | 90% | 72% | Mitigated and updated motor selection to the appropriate variant | High | 1/5/2020 | 1 | Maintainer, user | Sam | Meet with the motor vendors to see if 1) they have an existing motor that meets our needs, or 2) they can design a motor within our OEM costing to meet the need. |
| 2 | Agile MBSE impact | The team is using both agile and MBSE for the first time. The concern is that this may lead to poor technical choices. | Technical | 80% | 80% | 64% | Mitigated, updated modeling tool for Rhapsody, and MBSE workflows updated. | Medium | 1/4/2020 | 0 | User, buyer, product owner | Jill | Bring in a consultant from A Priori Systems for training and mentoring. |
Table 1.4: Updated risk plan (Partial)
Replan
In this example, the risks are successfully mitigated and the changes are noted in the State field. For Risk 1, a more appropriate motor is selected with help from the motor vendor. For Risk 2, the tooling was updated to better reflect the modeling needs of the project, and minor tweaks were made to the detailed MBSE workflows.