Software Risk Management
Dillon Hasselmann
Department of Software Engineering
University of Wisconsin – Platteville
Platteville, WI 53818
Abstract
Risk management is vital to a project's survival. Issues will inevitably arise during development, and they must be dealt with. Risk identification is a systematic and organized approach to determining the actual risks to a project. There are many factors to look at and multiple methods that can be used to uncover risks in a project. Once the risks are identified, they have to be analyzed to figure out what causes them and what can be done if they become real problems. The last component of risk management is actually dealing with the problems when they arise. There is a multitude of techniques for managing risk, and every project requires its own approach. Project leaders who adopt these ideas will be far more successful than those who do not pay attention to the risks.
Defining Risk
In order to understand how to manage risk, one must first understand what a risk is. The IEEE defines risk as “the likelihood of an event, hazard, threat, or situation occurring and its undesirable consequences” [2]. Risks can be further broken down by looking at their properties. They can be thought of as having three characteristics: uncertainty, associated loss, and manageability. A risk has uncertainty because no one knows the true probability of the risk occurring or whether it is actually going to become a major problem. The associated loss of a risk is the estimated impact or damage the risk will have on the project. A risk is manageable because human action and intervention can influence both the uncertainty (the probability of occurrence) and the associated loss of a risk, either positively or negatively.
Defining Risk Management
The main goal of risk management is to identify and respond to potential problems with enough time to avoid a crisis [2]. This can be accomplished using a simple four-step process. The first step is to identify the risk factors. There are a number of ways to do this, which are addressed in the following sections. The second step is to analyze the factors. This is done by estimating the uncertainty and associated loss (two of the three risk characteristics), and it aids in prioritizing the order in which risks should be dealt with. The third step is to develop treatment options to deal with the risks should they arise as problems. The last step, monitoring risks on an ongoing basis, is the most important and also requires the most effort. As a project grows and becomes more complex, new unforeseen risks may surface and others may fade away. Ongoing analysis helps the team address new risks and set aside risks which are no longer a threat to the project.
Issues
Risk management can be an asset to any project, but there are reasons that not every project uses this process. First of all, the tasks involved in risk management are complex. There is mathematics at play, and not everyone can reason in terms of probabilities and put a numeric value on damages and loss. Also, it takes effort to identify risks and come up with possible solutions, especially for very large projects, but the bulk of the effort comes with the ongoing monitoring. One cannot simply do all the work upfront and then forgo risk management for the rest of the project, because as the project evolves, risks appear and disappear, and these changes need to be taken into account for the process to be effective. Another issue is that people have limited insight, meaning they cannot predict the future. “Crystal balls are rarely found on sale” [2]. After all, if someone knew what issues would pop up, there would be no use for risk management. A third issue is that a number of projects operate in survival mode, which means that their only goal is to have a completed project despite costs and other obstacles [2]. Typically morale reaches rock bottom and developers are already trying to address a large number of problems. Thus the staff does not really want to turn their attention and stretched resources to problems which have not occurred yet and may never happen at all. The last issue which affects the difficulty of risk management is the culture of an organization, or how the company promotes the handling of risk. Many companies focus on optimizing how projects get done, such as streamlining the cost and budget of development, rather than attempting to avoid problems [2]. This is ironic because avoiding problems usually ends up saving both money and time. This happens because some companies do not want to even think about bad things happening and go into denial about a project having major risks [2]. 
This could be because the developers do not want to upset the executive board, or because management is oblivious to the technical difficulties of a problem and assumes that its crack team of programmers can get anything done. This can lead to the company seeing anyone who suggests that something can or will go wrong as a troublemaker. The company goes after the bearer of bad news, which discourages people from coming up with ways to handle risk, and thus risk management becomes nearly impossible in that particular company.
Risk Identification
Risk management can be thought of as a series of three processes, the first being risk identification. Risk identification is an organized approach to determining what poses a legitimate threat to a project. It is not a brainstorming session where everyone in the room throws out random ideas, such as “what if the sun burns out?” It should be taken seriously if risk management is going to be at all useful. Once risks become problems, they could ultimately be the downfall of any project if they are not properly dealt with and mitigated. It is also worth mentioning that no one should be afraid to speak up about risks which may seem too dire or unthinkable to happen, such as the death of an important developer.
Paradigms
Risk identification can be done in a multitude of ways. One simple method is to think in terms of two paradigms: direct and indirect [2]. With the direct paradigm, it is useful to think in terms of causes (defects, bad developers, terrible coding standards) and work up to areas of impact (quality, cost overruns, time delay). In this way, it is easy to see that, for example, a bug left undiscovered until deployment can lead to a costly late-night fix. On the other hand, it can be useful to think in terms of effects and work backwards to the causes. This is the indirect paradigm. Both are useful, and choosing between them is a matter of personal taste. It may even be helpful to apply both to a project. While these are great starting blocks for finding risks, there are many other ways to look for risks in a project.
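The two paradigms can be pictured as reading the same cause-and-effect links in opposite directions. The following is a minimal sketch in Python, not a method taken from [2]. The causes and impact areas are the examples named above, but which cause is linked to which impact is an assumption made purely for illustration, as are the names CAUSE_TO_IMPACTS, impacts_of and causes_of.

```python
from collections import defaultdict

# Hypothetical links from causes to areas of impact. The individual
# pairings are assumptions for illustration, not data from the source.
CAUSE_TO_IMPACTS = {
    "defects": ["quality", "cost overruns"],
    "bad developers": ["quality", "time delay"],
    "terrible coding standards": ["quality", "time delay"],
}


def impacts_of(cause):
    """Direct paradigm: start from a cause and list the areas it affects."""
    return list(CAUSE_TO_IMPACTS.get(cause, []))


def causes_of(impact):
    """Indirect paradigm: start from an effect and work back to its causes."""
    effect_to_causes = defaultdict(list)
    for cause, impacts in CAUSE_TO_IMPACTS.items():
        for affected in impacts:
            effect_to_causes[affected].append(cause)
    return effect_to_causes.get(impact, [])


print(impacts_of("defects"))    # direct: cause -> impacts
print(causes_of("time delay"))  # indirect: impact -> causes
```

Applying both functions to the same table mirrors the suggestion that both paradigms can be used on one project.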
Sources
There are many sources of risks and it is beneficial to know where to look for them. First of all, one can look at traditional or folk knowledge [4]. These are stereotypes that many people hold about the type of project at hand and the amount of work that will be involved. For example, a common stereotype among developers is that embedded systems development is far more difficult than other types of development because assembly is involved and the hardware is very different from traditional desktop hardware. Note that many stereotypes could be true and thus can be a valuable source of information about the types of risks involved in a project. Another source of information is other developers [4]. There are many different types of projects, but there is enough overlap that risks present in one project are likely to be found in another. By looking at what others have done and the problems they faced, one can learn more about the types of problems to expect. A third source of information is common sense [4]. This is where developers look at their past experiences and identify issues and problems from previous projects that they believe could also show up on their current project. One last source of information is the results of tests [4]. This involves obtaining a product similar to the one being developed and toying around with it to find out more about it. Just looking at how the product works can help identify risks, and it is even more helpful if one finds defects in the obtained product.
Common Risk Factors
There are many breeding grounds for risks within the project environment. One of these is the organization in which development occurs, or in other words, the company the engineers work for [2]. Factors such as the structure of the organization, its maturity (how long the company has been in business) and its leadership can add risks to a project or remove them. Another risk factor is estimation, or more accurately, the lack thereof [2]. Inaccurate estimation can lead to inadequate time and resources being allocated to a project, which can doom it before development even starts. An additional risk factor is monitoring, which typically involves finding risks [2]. Not finding risks is a risk factor itself. Development methodology is another risk factor which is present on nearly all projects [2]. Not every development process suits every project. What works using the waterfall method may not work with the agile process and vice versa. What the team is comfortable with should be taken into account as well. The tools used to do the development are another major factor for risk [2]. This includes the workstations as well as any IDEs used to compile code. The reliability of the software is another source of risk [2]. The software needs to be able to run in the environment that it is designed for and, in some cases, to run for long periods of time. Many development teams ignore this factor and find out that their product breaks very easily soon after deployment. One last risk factor is the personnel on the development team [2]. Such issues include how well the team members work together and their qualifications. One important aspect to note is that a strong leader can help to mitigate risk in this particular area.
Types of Risks
There are five common types of risks, and for each type there are specific methods that are best suited to identifying it. The first type is schedule risks. These are risks which typically involve long delays or, in some cases, cause the project never to be completed at all. One of the best methods for finding these is to make an activity network [2]. This is a network of tasks in which each task is linked to the tasks that can begin only after it is completed. For example, work on a database cannot begin until the connection protocol is completed. This makes it easy to spot risks, because a task which must be done before several others can be started is a possible source of delay, as is a task which needs to wait for two or more tasks to be completed before work can be started on it. Another common type, and the bane of many projects, is cost risks [2]. These typically occur when estimates are poor, requirements start to pile up, or when there are unreasonable budgets (companies outbidding each other, for example). Requirements risks are another major risk category. These are very important because building the right system is crucial, or else development is pointless [2]. Requirements risks show up when there are incorrect requirements, inconsistent requirements, infeasible or very difficult requirements, unverifiable requirements, and unclear requirements. Quality risks are another important type of risk to assess. They are very common and straightforward (fairly easy to detect), yet they plague many projects [2]. 
These risks include unreliable software, meaning it breaks constantly; unusable software, which means that it is hard for the end user to figure out how to use the product; unmaintainable software, meaning that it is difficult to fix defects in the product, typically due to bad coding practices; non-portable software, meaning that it can only be used in one or two environments; and lastly non-expandable software, which means that it is difficult to add features to the product. The last risk type is operational risks [2]. These happen when the deployment environment is different from the testing/development environment and it affects the behavior of the product. The team assumes that if the product works in testing, it should work anywhere, but if this turns out to be false, then the whole project falls flat.
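The activity network described above for schedule risks can be sketched in a few lines of Python. This is only an illustration under stated assumptions: apart from the database and connection protocol example, the task names are hypothetical, and treating "several" as "two or more" (the threshold parameter) is an assumption rather than a rule from [2].

```python
# Hypothetical activity network: each task maps to the tasks that must
# be completed before it can start. Only the database/connection
# protocol dependency comes from the text; the rest is invented.
PREREQUISITES = {
    "connection protocol": [],
    "database": ["connection protocol"],
    "user interface": ["connection protocol"],
    "reporting": ["database", "user interface"],
}


def dependents(prerequisites):
    """Invert the network: for each task, list the tasks waiting on it."""
    waiting = {task: [] for task in prerequisites}
    for task, needs in prerequisites.items():
        for needed in needs:
            waiting.setdefault(needed, []).append(task)
    return waiting


def schedule_risk_points(prerequisites, threshold=2):
    """Return (blockers, joins) as sorted lists of task names.

    blockers: tasks that at least `threshold` other tasks wait on
    joins:    tasks that wait on at least `threshold` other tasks
    """
    waiting = dependents(prerequisites)
    blockers = sorted(t for t, deps in waiting.items() if len(deps) >= threshold)
    joins = sorted(t for t, needs in prerequisites.items() if len(needs) >= threshold)
    return blockers, joins


blockers, joins = schedule_risk_points(PREREQUISITES)
print("possible sources of delay (blockers):", blockers)  # ['connection protocol']
print("tasks waiting on several others (joins):", joins)  # ['reporting']
```

A fuller analysis would also weigh task durations, but even this simple count of incoming and outgoing links points at the tasks the text singles out as possible sources of delay.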
Risk Analysis: Risk Exposure
Risk analysis is the second major element of risk management. This step is characterized by looking closely at the risks identified in the first step and determining what causes them and what can be done to mitigate them should they become problematic [2]. Risk identification and risk analysis can be done concurrently if desired, which means that each risk is analyzed as soon as it is identified, rather than finding all the risks upfront and then moving on to the analysis phase. One technique that can be used is to compute risk exposure. This is a mathematical approach to risk analysis. Risk exposure is computed by taking the product of the estimated probability of occurrence and the estimated cost [1]. This is by no means an exact way of looking at risks, but it can help with prioritization. As a rule of thumb, it is best not to spend more than the risk exposure amount on preventing a specific risk, or else more money is spent on prevention than the expected cost of the damage. Generally speaking, it is best to focus first on risks which have both high probability and high cost, then on risks with high probability but low cost, and afterwards on the high-cost, low-probability risks. Those with both low probability and low cost can generally be dealt with last.
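As a worked illustration, the exposure calculation and the suggested ordering can be sketched in Python. The risks, probabilities, costs and the two cutoffs that separate "high" from "low" are all hypothetical values chosen for this example; neither [1] nor [2] prescribes them.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    probability: float  # estimated probability of occurrence, 0.0 to 1.0
    cost: float         # estimated cost if the risk becomes a problem

    @property
    def exposure(self):
        """Risk exposure = estimated probability * estimated cost."""
        return self.probability * self.cost


def priority_group(risk, prob_cutoff=0.5, cost_cutoff=10_000):
    """Rank groups in the order given in the text (cutoffs are assumptions).

    0: high probability, high cost
    1: high probability, low cost
    2: low probability, high cost
    3: low probability, low cost
    """
    high_prob = risk.probability >= prob_cutoff
    high_cost = risk.cost >= cost_cutoff
    if high_prob and high_cost:
        return 0
    if high_prob:
        return 1
    if high_cost:
        return 2
    return 3


def prioritize(risks):
    """Sort by group first, then by descending exposure within a group."""
    return sorted(risks, key=lambda r: (priority_group(r), -r.exposure))


# Hypothetical risks with made-up estimates.
risks = [
    Risk("key developer leaves", probability=0.125, cost=50_000),
    Risk("requirements pile up", probability=0.75, cost=20_000),
    Risk("build server outage", probability=0.75, cost=2_000),
    Risk("minor rework", probability=0.25, cost=1_000),
]

for risk in prioritize(risks):
    print(f"{risk.name}: exposure {risk.exposure:.2f}")
```

With these numbers, "requirements pile up" has an exposure of 15,000 and is handled first, while "key developer leaves" has the second-highest exposure (6,250) but falls into the third group because its probability is low. The exposure figure also serves as the spending ceiling from the rule of thumb above: paying more than 15,000 to prevent the first risk would cost more than the expected damage.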
Nonmathematical Approaches
There are other, less mathematical methods of analyzing risks. The first of these is the creation of a risk list [2]. This is simply a list of all the risks that were found in the identification phase. It is easy and anyone could do it, but it offers no solutions. At the very least, it shows that developers are aware of the risks, even though they do not know how they will be handled. A better alternative is the risk action list [2]. This is a risk list with potential solutions to mitigate each problem. The ideal way to carry out risk analysis is by means of a risk registry or watch list. An example of this can be found below in Table 1. It consists of a number of elements, the first of which is the trigger event. This describes what will happen when the risk starts to become a problem. The next element is the action taken to stop the problem from growing. The last element is the people responsible for carrying out the action. Accountability is an excellent way of making sure that something gets done.
Table 1: An example of a risk registry or watch list.
Trigger Event / Action Taken / Responsible Person(s)
Code does not compile / Conduct code reviews to find the errors / Bob
Engineers unfamiliar with implementation language / Training by a certain date / Team leader
Budget overrun / Talk with executive board to receive additional funding / Jack, Jill
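A watch list like Table 1 maps naturally onto a small data structure. The sketch below, in Python, stores the three rows of the table and looks them up by trigger event or by responsible person. The class and function names (WatchListEntry, respond_to, actions_for) are hypothetical and only illustrate one possible layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WatchListEntry:
    trigger_event: str   # what is observed when the risk becomes a problem
    action: str          # what is done to stop the problem from growing
    responsible: tuple   # who is accountable for carrying out the action


# The three rows of Table 1.
WATCH_LIST = [
    WatchListEntry(
        "Code does not compile",
        "Conduct code reviews to find the errors",
        ("Bob",),
    ),
    WatchListEntry(
        "Engineers unfamiliar with implementation language",
        "Training by a certain date",
        ("Team leader",),
    ),
    WatchListEntry(
        "Budget overrun",
        "Talk with executive board to receive additional funding",
        ("Jack", "Jill"),
    ),
]


def respond_to(trigger_event, watch_list=WATCH_LIST):
    """Return the entry whose trigger event was observed, or None."""
    for entry in watch_list:
        if entry.trigger_event == trigger_event:
            return entry
    return None


def actions_for(person, watch_list=WATCH_LIST):
    """List the actions a given person is accountable for."""
    return [entry.action for entry in watch_list if person in entry.responsible]


entry = respond_to("Budget overrun")
print(entry.action, "->", ", ".join(entry.responsible))
print(actions_for("Bob"))
```

Keeping the responsible people in the same record as the trigger and the action reflects the point about accountability: every trigger has a named owner.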
Risk Treatment
The final step of risk management is risk treatment. This step is characterized by actively carrying out solutions to problems [2]. It differs from risk analysis in that analysis is about coming up with plans to deal with problems, while treatment is about actively applying a solution to a problem. There are five major techniques for carrying out risk treatment. Oftentimes one is enough to solve a problem, but some problems do not suit a particular treatment, in which case another should be used. The first of these techniques is risk avoidance. This can be done by selecting a lower-risk requirement over one with higher risk [2]. This is a great solution because it is easy to do and, in some cases, eliminates a problem entirely. However, it involves a modification to the requirements, and great care must be exercised to make sure that the result still satisfies the customer’s needs. Another treatment technique is risk acceptance. This means doing nothing: being aware of the problem yet choosing to accept its potential consequences [2]. This may be the only treatment option when the project has to be done a certain way or must satisfy a specific, hard-to-implement requirement. A possible drawback to this approach is that the consequences of the problem may be far worse than anticipated. Problem control and prevention is a more elegant and practical approach to risk treatment. This is done by being aware of potential problems and taking measures to make sure they do not bring the entire project down [2]. One downside to this technique is that it requires a level of effort and commitment to the control and prevention plan. It needs constant monitoring to make sure that the risk is being prevented properly. Risk transfer is an additional and useful treatment option. This is done by transferring the responsibilities of one party to another party [2]. 
The idea is that a task is being handled by the right people, those who know what they are doing or are experts. An example of this would be outsourcing, a very common business practice today. The last treatment technique is refinement of knowledge. This is an ongoing activity whose sole purpose is to reduce uncertainty [2]. The idea behind this practice is that it reduces risk, or at the very least prepares oneself for dealing with problems if they arise. Some examples include prototyping, modeling, benchmarking and studying. Table 2 below contains an example with two risks and an example of each treatment option for both.