Canadian utility Hydro-Québec offers a new approach for identifying the factors that contribute most to overall risk and using the results to prioritize dam safety work.
By Marc Smith, Claude Marche, and Benoît Robert
It is impossible to predict risk with certainty because two of the main characteristics of risk are complexity and uncertainty.^{1} Even if a dam is judged to be safe, residual risk exists when people or property are in the flood zone. Dam failures, although not common, have serious consequences to people, property, and the environment. Therefore, establishing measures for reducing risk is an important responsibility of a dam owner.
Risk analysis involves breaking a complex system down into its fundamental components, then determining potential failure mechanisms and the physical processes that could cause each mechanism. Risk analysis methodologies commonly applied to dams include use of event trees and fault trees. An event tree is a visual representation of all the events that can occur in a system; a fault tree is a graphical tool that shows the relations between the various elements of a system to compute the reliability of that system. Use of either method allows the detailed analysis of potential failure mechanisms and provides qualitative insight as to how a series of events leading to dam failure might unfold. They also can be used quantitatively, with the help of probabilities, to assess the reliability of the system.
However, these methodologies present shortcomings related to representing dam risk in absolute terms. First, it is difficult to determine the safety of a dam according to rigid standards requiring a strictly binary (“safe” or “unsafe”) outcome based on an absolute value of risk.^{2} Second, use of these trees does not allow the determination of the most critical component of the series of events that could potentially lead to dam failure. Third, it is difficult to model, using these methods, the potential positive effects of rehabilitation measures on risk. Fourth, these methods do not take into account interactions between the various failure mechanisms. Although assumed to be independent, such mechanisms typically are strongly interrelated. For example, one initiating factor can trigger several failure mechanisms, and the occurrence of one mechanism can promote other mechanisms.
The interactions between failure mechanisms must be considered to accurately assess the overall risk of the dam system. Also, a more global approach is required to identify the factors contributing most to overall risk and to justify rehabilitation measures according to their potential to reduce risk.
One way to overcome some weaknesses in existing tools and capitalize on their merits is to use a Bayesian network. A Bayesian network is a graphical and mathematical tool showing the cause-and-effect relationships between the components of a system. To illustrate the use of Bayesian networks for dam safety analysis, we tested their application for analysis of the geotechnical and hydrological risks related to an embankment dam in Asia. We also used these networks to determine the risk related to reliability of the electrical/mechanical components of the spillway at this dam. The results allowed selection of the rehabilitation method that presented the greatest improvement to dam safety with the lowest cost.
Understanding interactions between failure mechanisms
To demonstrate the interactions between failure mechanisms, consider an example where risk is represented by the probability of failure of an embankment dam. The example will be analyzed using event trees, then Bayesian networks.
Event tree
A strong earthquake can cause differential settlement and cracking in an embankment dam, possibly leading to internal erosion and catastrophic failure.^{3} Such an earthquake also can trigger slides, followed by a loss of freeboard and failure due to overtopping. The event trees in Figure 1 on page 42 show these two failure mechanisms. The probability of failure for either mechanism is the product of the conditional probabilities of all the individual events associated with that mechanism. The total probability of failure for this dam is equal to the sum of the probability of failure of the two mechanisms.
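As a minimal sketch of the arithmetic described above, the following Python snippet multiplies the conditional probabilities along each branch and then sums the two mechanisms. The event names loosely follow the text, but every probability value is hypothetical and chosen only for illustration; none is taken from Figure 1.

```python
from math import prod

# Hypothetical conditional probabilities along each event-tree branch.
# Each value stands for P(event | all preceding events on that branch).
settlement_branch = {
    "strong earthquake": 0.01,
    "settlement and cracking": 0.2,
    "internal erosion": 0.1,
    "failure": 0.5,
}
sliding_branch = {
    "strong earthquake": 0.01,
    "slide": 0.1,
    "loss of freeboard": 0.3,
    "failure by overtopping": 0.5,
}

def branch_probability(branch):
    """Return the product of the conditional probabilities on one branch."""
    return prod(branch.values())

p_settlement = branch_probability(settlement_branch)
p_sliding = branch_probability(sliding_branch)

# Event-tree convention: the total is the sum of the two mechanisms.
p_total = p_settlement + p_sliding

print(f"settlement mechanism: {p_settlement:.2e}")
print(f"sliding mechanism:    {p_sliding:.2e}")
print(f"total (simple sum):   {p_total:.2e}")
```

The simple sum treats the two mechanisms as separate paths that do not influence each other.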
However, analysis of the overall risk to the dam must take into account the interactions between these failure mechanisms. For example, settlement can increase the risk of overtopping, and cracks resulting from settlement can lower the stability of the dam. In addition, a higher reservoir level contributes to the overtopping risk and influences the potential for both sliding and internal erosion by increasing pore pressures in the embankment. The reservoir level is controlled by the inflow and spillway operation. A strong earthquake also can affect the reliability of the spillway.
Therefore, a single initiator (an earthquake) can trigger different failure mechanisms (settlement and sliding), and realization of one failure mechanism (settlement leading to cracking, followed by internal erosion) can promote another (sliding followed by overtopping). These interrelations can be referred to as common-cause or common-mode effects, where a shared condition affects different failure mechanisms or where one mechanism affects another.
Figure 1: These generic event trees represent the risk to an embankment dam based on two failure mechanisms: settlement and sliding caused by a strong earthquake.
The total probability of failure for a dam also can serve as the basis for comparing the acceptability of the overall risk and the effectiveness of risk-reduction measures. However, the probabilities related to each failure mechanism need to be combined by taking into account that they are not mutually exclusive and therefore not simply additive. Common-cause or common-mode effects among multiple failure modes may be more the rule than the exception in dam safety.^{4} These interrelations are difficult to describe using event trees because, with this method, the different failure mechanisms are assumed to proceed independently, starting from an initial event. Also, as the number of events increases, the number of branches can grow to a point where the event tree becomes an inefficient way to represent overall risk.
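The point about non-additive probabilities can be illustrated with a small numerical sketch. It assumes, purely for illustration, that a high reservoir level is a common cause raising the likelihood of both overtopping and internal erosion, and that the two mechanisms are independent once the reservoir level is known. All numbers are hypothetical.

```python
# Hypothetical numbers, chosen only to illustrate a common-cause effect.
P_HIGH_LEVEL = 0.1                                # P(reservoir level is high)
P_OVERTOP_GIVEN = {True: 0.05, False: 0.001}      # P(overtopping | level high?)
P_EROSION_GIVEN = {True: 0.04, False: 0.002}      # P(internal erosion | level high?)

def marginal(p_given):
    """Marginal probability of a mechanism, averaged over reservoir level."""
    return P_HIGH_LEVEL * p_given[True] + (1 - P_HIGH_LEVEL) * p_given[False]

p_overtop = marginal(P_OVERTOP_GIVEN)
p_erosion = marginal(P_EROSION_GIVEN)

# 1) Simple sum, as if the mechanisms were mutually exclusive.
p_naive_sum = p_overtop + p_erosion

# 2) Union assuming the mechanisms are fully independent.
p_independent = p_overtop + p_erosion - p_overtop * p_erosion

# 3) Union accounting for the shared cause: condition on the reservoir
#    level, combine the mechanisms within each case, then average.
p_common_cause = 0.0
for high, p_level in ((True, P_HIGH_LEVEL), (False, 1 - P_HIGH_LEVEL)):
    a = P_OVERTOP_GIVEN[high]
    b = P_EROSION_GIVEN[high]
    p_common_cause += p_level * (a + b - a * b)

print(f"simple sum:        {p_naive_sum:.6f}")
print(f"independent union: {p_independent:.6f}")
print(f"common-cause union:{p_common_cause:.6f}")
```

With these assumed values, the three results differ, which is why the way the mechanisms are combined matters when the total is used for comparison.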
Dam safety problems do not always fall neatly into customary loading categories. It is easy to overlook the interactive influences between the failure mechanisms.^{2} The cause-and-effect relationships defining these mechanisms also are strongly related.
Bayesian network
Bayesian networks can be used to analyze dam risk in a global manner by describing the interrelationships between failure mechanisms. The use of Bayesian networks will help answer the following questions:

– What are the most significant factors contributing to overall risk?
– How should rehabilitation work be prioritized?
For a Bayesian network, the components of a system are determined using cause-and-effect logic and are represented by variables that form nodes. The dependencies between nodes are represented by arrows going from cause to effect. Such a series of nodes and arrows describing a system forms a causal model. The strength of these dependencies is quantified by conditional probability tables that underlie the causal model. These tables include the probabilities of occurrence of effects, given their causes. The causal model and the associated probabilities constitute a Bayesian network.
Figure 2 on page 46 shows a Bayesian network indicating the probability of failure of a generic embankment dam. For clarity, only the causal model is shown. The individual events of the event trees in Figure 1 are represented in this network. However, the interactions between the failure mechanisms are now taken into account. Moreover, important variables in terms of risk of overtopping and internal erosion – namely reservoir level, inflow, and spillway operation – are included. The probability of occurrence of failure in Figure 2 combines the probabilities of the two failure modes (settlement and sliding) and considers the effects of the interacting failure mechanisms.
Each variable in the Bayesian network is defined by states that can include numerical values (reservoir level less than 100 meters, 100 to 110 meters, and greater than 110 meters) or literal descriptors (such as “yes” and “no” for internal erosion and “deep” and “surface” for cracking). These states are part of the conditional probability tables that underlie the causal model.
For each variable, there is a conditional probability table that contains a conditional probability for every state of that variable given every combination of states of its causes. These probabilities are determined using data analysis, models of the phenomenon, and input from experts.^{5}
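A conditional probability table of this kind can be sketched as a simple lookup structure. The example below assumes, for illustration only, that internal erosion has two causes, cracking and reservoir level, and reuses the state names quoted above. The probability values and the choice of causes are hypothetical, not taken from Figure 2.

```python
from itertools import product

# States quoted in the text; the parent/child arrangement is an assumption.
CRACKING_STATES = ("surface", "deep")
LEVEL_STATES = ("less than 100 m", "100 to 110 m", "greater than 110 m")

# Hypothetical P(internal erosion = "yes" | cracking, reservoir level).
p_yes = {
    ("surface", "less than 100 m"): 0.01,
    ("surface", "100 to 110 m"): 0.02,
    ("surface", "greater than 110 m"): 0.05,
    ("deep", "less than 100 m"): 0.05,
    ("deep", "100 to 110 m"): 0.15,
    ("deep", "greater than 110 m"): 0.30,
}

# One row per combination of cause states; each row covers every state
# of the variable itself ("yes" and "no").
cpt_internal_erosion = {
    combo: {"yes": p_yes[combo], "no": 1.0 - p_yes[combo]}
    for combo in product(CRACKING_STATES, LEVEL_STATES)
}

def is_valid_cpt(cpt, tol=1e-9):
    """Check that every row holds probabilities in [0, 1] that sum to 1."""
    for row in cpt.values():
        if any(p < 0.0 or p > 1.0 for p in row.values()):
            return False
        if abs(sum(row.values()) - 1.0) > tol:
            return False
    return True

print(len(cpt_internal_erosion), "rows, valid:", is_valid_cpt(cpt_internal_erosion))
print(cpt_internal_erosion[("deep", "greater than 110 m")])
```

The number of rows is the product of the number of states of each cause, which is why tables grow quickly as causes are added to a variable.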
One main function of Bayesian networks is the realization of inferences, which is the updating of conditional probabilities for some variables given new information about other variables. For example, the effect on the probability of failure of new evidence, such as the observation of deep cracking, is propagated numerically in the network using software – Hugin from Hugin Expert A.S. in Denmark – and algorithms based on Bayes’ theorem, which may be considered the mathematical expression of learning by experience.^{6} This approach is used to draw conclusions from observations. In the context of dam safety, these observations can result from changes in the behavior of a dam or from implementation of risk-reduction measures. In the example above, the observation of deep cracking will increase the probability of failure of the dam.
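The idea of inference can be sketched on a three-node chain (earthquake, cracking, failure). The snippet below uses brute-force enumeration of all state combinations, which is practical only for very small networks and is not a description of how Hugin works internally. The network structure, the "none" cracking state, and all numbers are hypothetical.

```python
from itertools import product

# Hypothetical probabilities for a chain: earthquake -> cracking -> failure.
P_QUAKE = {"yes": 0.02, "no": 0.98}
P_CRACK = {  # P(cracking | earthquake)
    "yes": {"deep": 0.30, "surface": 0.50, "none": 0.20},
    "no": {"deep": 0.01, "surface": 0.09, "none": 0.90},
}
P_FAIL_YES = {"deep": 0.10, "surface": 0.01, "none": 0.001}  # P(failure | cracking)

def joint(quake, crack, fail):
    """Joint probability of one complete assignment of the three variables."""
    p_fail = P_FAIL_YES[crack] if fail == "yes" else 1.0 - P_FAIL_YES[crack]
    return P_QUAKE[quake] * P_CRACK[quake][crack] * p_fail

def query(target, evidence=None):
    """Return P(target | evidence) by summing over all assignments."""
    evidence = evidence or {}
    numerator = denominator = 0.0
    for quake, crack, fail in product(P_QUAKE, P_FAIL_YES, ("yes", "no")):
        world = {"quake": quake, "crack": crack, "fail": fail}
        if any(world[k] != v for k, v in evidence.items()):
            continue  # inconsistent with what has been observed
        p = joint(quake, crack, fail)
        denominator += p
        if all(world[k] == v for k, v in target.items()):
            numerator += p
    return numerator / denominator

p_fail_prior = query({"fail": "yes"})
p_fail_deep = query({"fail": "yes"}, {"crack": "deep"})
p_quake_deep = query({"quake": "yes"}, {"crack": "deep"})

print(f"P(failure)                 = {p_fail_prior:.6f}")
print(f"P(failure | deep cracking) = {p_fail_deep:.6f}")
print(f"P(earthquake | deep crack) = {p_quake_deep:.6f}")
```

The last query runs against the direction of the arrows, from an observed effect back to its likely cause, which is the kind of updating Bayes' theorem provides.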
Diagnoses provided through the use of Bayes’ theorem include: determination of the most likely cause of a potential dam failure and identification of the most significant component of overall risk. A generic example of a diagnosis would be determining, between overtopping and internal erosion, the most likely cause of failure. This can be done by comparing the conditional probabilities calculated. For example, in determining the effect of sliding on the probability of failure, the Bayesian network in Figure 2 takes into account the interactions between all the failure mechanisms.
With this method, the probability of failure can be used as a common denominator for comparing the relative effectiveness of risk-reduction measures.^{7} For example, possible risk-reduction measures could include constructing a filtering berm or upgrading a spillway. The effect of the first measure would be to cancel the risk of internal erosion. The effect of the second measure would be to keep the spillway operational after an earthquake. Prioritization is then achieved by assessing the effects of these measures on the probability of failure, that is, by comparing the probability of failure without internal erosion with the probability of failure with the spillway operational.
Application of the method
We used a Bayesian network to solve a risk analysis problem for a dam in Asia that impounds water for irrigation and provides flood protection. The dam and spillway were of questionable integrity and were in urgent need of repair.
The 25-meter-high, 860-meter-long clay embankment dam does not have internal drainage and was founded on untreated soil. Construction occurred sporadically over ten years, based on the material and human resources available. Fill was placed without the use of heavy machinery or quality control procedures. The spillway, near the left abutment of the dam, is equipped with three gates that are lifted manually using a winch on a mobile gantry crane. An electrical motor is available to facilitate gate lifting.
A dam safety assessment showed that internal erosion had initiated and was progressing due to the absence of filters in the dam and foundation. The dam owner lowered the reservoir level to slow the progression of internal erosion. However, this greatly reduced the water available for irrigation, negatively affecting the local farmers and economy. Repairs were needed so the reservoir level could be returned to its full capacity.
Insufficient spillway capacity and a lack of adequate freeboard represent a significant overtopping risk for the dam during typhoons because of the heavy precipitation and waves caused by strong winds. Moreover, the mechanical/electrical components of the spillway are unreliable. Also, the gantry crane becomes unstable during strong winds and cannot be operated.
Rehab work was needed to increase the safety of the civil works and to reduce the risks to the population living downstream. Such measures also would restore irrigation and flood-control capabilities. Four options were considered:

– Rehabilitate the existing gates and lifting mechanism;
– Add a fourth gate to increase spilling capacity;
– Construct a filtering berm to control ongoing internal erosion; and
– Build a parapet wall on the dam crest to increase storage volume and flood routing capabilities.
These measures could not be implemented quickly because of economic constraints, so it was essential to prioritize the options. Considering the importance of this rehab project for the local population and the numerous interrelations between the failure mechanisms, an assessment of the overall risk was required. This would make it possible to select the option offering the maximum risk-reduction potential at the lowest cost.
Development of a Bayesian network for this dam requires three steps. First, a global causal model is constructed to establish the cause-and-effect relationships between the failure mechanisms and the main variables defining the risk in terms of probability of failure. Second, some variables of this global model are divided into more specific components for a detailed analysis. Third, probability values are assessed to express the strength and degree of uncertainty of the relationships in the causal model.
The resulting Bayesian network is used to identify the most critical variables related to the probability of failure. It is used to prioritize rehabilitation options in terms of their potential positive effects on the probability of failure.
Causal model
Failure of this dam was analyzed by considering internal erosion and overtopping (see Figure 3 on page 48). These failure mechanisms are affected by the reservoir level, which depends on precipitation and spillway operation. Wind speed affects both spillway operation and the risk of overtopping.
Some variables in the global causal model in Figure 3 (spillway gates lifting, overtopping, and internal erosion) were divided into more specific components for a detailed analysis. Figure 4 on page 50 shows the results, which form the Bayesian network for this dam.
In general, the spillway is operational if the gates and lifting mechanism are functioning. The latter depends on the gantry crane and the electrical or manual winches. The three gates were considered separately to model partial opening of the spillway. The gantry crane also is affected by wind speed.
Overtopping depends on both wave runup caused by strong winds and reservoir level. The latter depends on spillway operation and precipitation.
Internal erosion requires that soil particles be carried away from the dam or its foundation, which can occur in the presence of an unfiltered exit, erodible soil, and a sufficiently high hydraulic gradient. The latter depends on the reservoir level and the presence of pervious zones.
Probabilities
Each variable in Figure 4 on page 50 is related to a conditional probability table that expresses the strength and degree of uncertainty of the causal dependencies pertaining to this variable. The conditional probability values were determined by a risk analysis team that included personnel familiar with the site. The evaluations were based on data, knowledge, and models that could be applied to the problem at hand. The assessment process is summarized in the following paragraphs.
The probability values for the lifting mechanism were determined using a fault tree. The lifting mechanism is always functional if the gantry crane is functional and the electrical or manual winch is functional. These logical relations are included in the Bayesian network so as to consider the reliability of the mechanical/electrical components of the spillway in a more global manner that includes the other failure mechanisms and the effect of wind speed on the gantry crane.
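The fault-tree logic stated above (gantry crane AND either winch) can be sketched as follows. The reliability values are hypothetical, and the probability calculation assumes the three components fail independently of one another. That simplification is made only for this sketch; in the network described here, wind speed also affects the gantry crane.

```python
def lifting_mechanism_works(crane_ok, electric_winch_ok, manual_winch_ok):
    """Fault-tree logic: the crane AND at least one winch must function."""
    return crane_ok and (electric_winch_ok or manual_winch_ok)

def p_lifting_mechanism(p_crane, p_electric, p_manual):
    """Probability that the lifting mechanism is functional.

    Assumes independent components (an illustrative simplification).
    """
    # OR gate: the winch function is lost only if both winches fail.
    p_winch = 1.0 - (1.0 - p_electric) * (1.0 - p_manual)
    # AND gate: both the crane and the winch function are required.
    return p_crane * p_winch

# Hypothetical component reliabilities.
p_functional = p_lifting_mechanism(p_crane=0.9, p_electric=0.7, p_manual=0.8)
print(f"P(lifting mechanism functional) = {p_functional:.3f}")

# The Boolean form gives the deterministic rows of a conditional
# probability table, one per combination of component states.
print(lifting_mechanism_works(True, False, True))   # crane and manual winch
print(lifting_mechanism_works(False, True, True))   # crane unavailable
```

Expressed this way, the same logic can populate a conditional probability table whose entries are all 0 or 1, which is one way a fault tree can be embedded in a larger network.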
Other approaches also were used to determine conditional probability values. When large data sets were available, the probabilities were computed using statistics. This option was used for meteorological data, which helped determine probabilities related to wind speed and precipitation. Mathematical models – such as hydrological and flood routing calculations – were used to determine reservoir level for various return periods of interest. In this study, these models took into account precipitation and the number of gates in operation (zero to three).
Other probabilities, such as those related to internal erosion, were determined using expert judgement based on geotechnical models (such as seepage analyses) and knowledge of the specific characteristics of the dam and its behavior, derived from site inspections.
Risk reduction measures identified
Using the causal model shown in Figure 4 on page 50 and the underlying probabilities, a Bayesian network can be developed to identify the variable contributing the most to overall risk. Here, the probability of failure represents the overall risk and the decision basis for selecting the optimal risk-reduction measure. The probability of failure serves as the common denominator with which the negative effect of the observation or non-observation of one or more variables is calculated.
A variable with the most negative effect on the probability of failure is not necessarily the most critical. The probability of occurrence of that variable also should be considered. The variables contributing most to overall risk are those with both a greater negative effect and greater probability of occurrence. Together, these two parameters define the criticality of a variable or group of variables, which is calculated by multiplying the negative effect by the probability of occurrence of the variable.
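The criticality calculation can be sketched as below. The sketch assumes that the "negative effect" of a variable is measured as the increase in the probability of failure when that variable is observed in its adverse state, which is one plausible reading of the text. The variable names echo the article, but every number is hypothetical.

```python
# Hypothetical baseline probability of failure with no evidence entered.
P_FAILURE_BASELINE = 0.02

# For each adverse state: (P(failure | state observed), P(state occurs)).
# All values are illustrative assumptions.
variables = {
    "gantry crane not functional": (0.05, 0.30),
    "unfiltered exit present": (0.03, 0.40),
    "extreme precipitation": (0.20, 0.01),
}

def criticality(p_failure_given_state, p_state, baseline=P_FAILURE_BASELINE):
    """Negative effect on the probability of failure times P(state)."""
    negative_effect = p_failure_given_state - baseline
    return negative_effect * p_state

scores = {name: criticality(*values) for name, values in variables.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    effect = variables[name][0] - P_FAILURE_BASELINE
    print(f"{name:30s} effect={effect:.3f} criticality={scores[name]:.4f}")
```

In this made-up example, extreme precipitation has the largest negative effect but the lowest criticality because it is rare, which mirrors the distinction drawn in the paragraph above.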
Inferences and calculations were made for every variable in Figure 4 on page 50 using specialized software. These calculations indicate that overtopping would be the most probable failure mechanism. The most critical variable related to overtopping is the gantry crane, which is affected by strong winds and the low reliability of its components. A functional failure of the gantry crane causes a non-operational spillway, which greatly increases the overtopping risk. Therefore, rehabilitation of the gantry crane would be the most efficient risk-reduction measure.
Prioritizing dam safety work needed
The main objective of structural rehabilitation work is to decrease the probability of failure. The most efficient options are directed to critical variables, to maximize the potential positive effect.
However, rehabilitation measures always carry a cost. In most circumstances, a dam owner would optimize its investment by prioritizing the measures and determining the option offering a combination of the highest risk-reduction potential for the lowest cost. The priority index is calculated by dividing the potential positive effect of the rehabilitation option by its projected cost.
Rehabilitation of the gates and lifting mechanism (option 1) includes the mechanical/electrical components as well as the gantry crane. In this case, all the equipment needs to be replaced because of the low reliability of every component. As a first approximation, the effect of this intervention would be a fully operational spillway. The probability of failure with the spillway being fully operational is then compared to the overall probability of failure to compute the positive effect. This process is repeated for the three other options.
Increasing spillway capacity (option 2) involves adding a gate and includes rehabilitation of the existing spillway. This would significantly decrease the risk of overtopping.
Construction of a filtering berm on the downstream toe of the dam (option 3) provides filtration to every seepage exit, decreasing the risk of an unfiltered exit.
Construction of a parapet wall (option 4) reduces the risk of overtopping but could temporarily increase reservoir level during flood events, thus negatively affecting the internal erosion risk. The net effect of this option on overall risk can be taken into account by modifying the Bayesian network shown in Figure 4 by adding a variable for the parapet wall.
The analysis indicated that construction of a filtering berm (option 3) would reduce the probability of failure by 35 percent, at a cost of about $1 million. Rehabilitation of the existing spillway (option 1) would reduce the probability of failure by 45 percent but would cost about $1.7 million. The priority index is higher for option 3 than for option 1. Options 2 and 4 had much lower priority index values. Therefore, the optimal risk-reduction measure in technical and monetary terms is construction of a filtering berm.
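The comparison between options 1 and 3 can be reproduced with a short calculation using the figures quoted above. The sketch assumes the positive effect is expressed as the percentage reduction in the probability of failure and the cost in millions of dollars, so the index is in percent per million dollars. The article does not state the units of the index, and options 2 and 4 are omitted because their values are not given.

```python
# Figures quoted in the text: (reduction in probability of failure in
# percent, approximate cost in millions of dollars).
options = {
    "option 1: rehabilitate gates and lifting mechanism": (45.0, 1.7),
    "option 3: construct filtering berm": (35.0, 1.0),
}

def priority_index(risk_reduction_percent, cost_millions):
    """Potential positive effect divided by projected cost."""
    return risk_reduction_percent / cost_millions

indices = {
    name: priority_index(reduction, cost)
    for name, (reduction, cost) in options.items()
}
best_option = max(indices, key=indices.get)

for name, value in indices.items():
    print(f"{name}: {value:.1f} percent per $ million")
print("highest priority:", best_option)
```

Under these assumed units, option 1 offers the larger reduction in absolute terms, but option 3 delivers more reduction per dollar.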
This analysis covers the technical side of the problem, but there also are social, environmental, and legal aspects. For example, construction of a filtering berm may be the optimal riskreduction measure, but local legislation could specify a minimum spilling capacity. Therefore, a spillway upgrade could be compulsory.
In addition, some rehab measures can have negative outcomes. Increased spilling capacity would provide more safety for the structure but could endanger the downstream population when the spillway operates during a large flood. In this case, the probability of failure and consequences given a failure or large outflows must be considered jointly.
Mr. Smith may be reached at Hydro-Québec, 75 Rene-Levesque, Montreal, Québec H2Z 1A4 Canada; (1) 514-289-2211, extension 5162; E-mail: smith.marc@hydro.qc.ca. Messrs. Marche and Robert may be reached at Ecole Polytechnique de Montreal, CP 6079, Succ. Centre-ville, Montreal, Québec H3C 3A7 Canada; (1) 514-340-4711, extension 4801 (Marche) or 4226 (Robert); E-mail: claude.marche@polymtl.ca or benoit.robert@polymtl.ca.
Notes
1. Denis, H., Comprendre et Gérer les Risques Sociotechnologiques Majeurs, Éditions de l’École Polytechnique de Montréal, Montréal, Québec, Canada, 1998.
2. Vick, S.G., “Engineering Application of Dam Safety Risk Analysis,” Proceedings of the 20th International Congress on Large Dams, International Commission on Large Dams, Paris, France, 2000, pages 325-335.
3. Pells, S., and R. Fell, “Damage and Cracking of Embankment Dams by Earthquake and the Implications for Internal Erosion and Piping,” Proceedings of the 21st International Congress on Large Dams, International Commission on Large Dams, Paris, France, 2003, pages 259-274.
4. Vick, S.G., Degrees of Belief, ASCE Press, Reston, Va., 2002.
5. Hartford, D.N.D., and G.B. Baecher, Risk and Uncertainty in Dam Safety, Thomas Telford, London, 2004.
6. Hacking, I., Probability and Inductive Logic, Cambridge University Press, Cambridge, Mass., 2001.
7. Smith, Marc, “Réduction du Risque Barrage par l’utilisation des Réseaux Bayésiens,” PhD thesis, École Polytechnique de Montréal, Montréal, Québec, Canada, 2005.
Marc Smith is an engineer with Hydro-Québec. Claude Marche and Benoît Robert are professors at École Polytechnique de Montréal. The authors developed the new approach detailed in this article as part of a research project on dam safety risk analysis.
Peer Reviewed
This article has been evaluated and edited in accordance with reviews conducted by two or more professionals who have relevant expertise. These peer reviewers judge manuscripts for technical accuracy, usefulness, and overall importance within the hydroelectric industry.