

SLIDE 1

Managing Uncertainty in the Context of Risk Acceptance Decision-Making at NASA: Thinking Beyond “The Model”

Presented at the “Rigorous Test and Evaluation for Defense, Aerospace, and National Security" Workshop

Crystal City, VA April 11, 2016 Homayoon Dezfuli, Ph.D. Technical Fellow, System Safety Office of Safety and Mission Assurance NASA Headquarters

SLIDE 2

Acknowledgments

  • Opinions expressed in this presentation are not necessarily those of NASA

  • Most of the present discussion is based on work performed by the Office of Safety and Mission Assurance in conjunction with:

NASA System Safety Handbook, Volume 1 (NASA/SP-2010-580)

NASA System Safety Handbook, Volume 2 (NASA/SP-2014-612)

NASA’s initiatives to formalize its processes for risk acceptance

SLIDE 3

Overview

  • In the NASA risk management context, “risk” means “potential for falling short of performance requirements”

E.g., a particular value of Probability of Loss of Crew (P(LOC)) might be a safety performance requirement (threshold for maximum acceptable risk)

The risk is the probability that the “actual” P(LOC) > the threshold

Roughly analogous to MIL-HDBK-189C consumer risk: the probability of accepting a system when the true reliability is below the technical requirement

  • In a mission context, the scope of performance requirements spans the domains of safety, technical, cost, and schedule

  • Specifying acceptable levels of performance for a given system is a question of requirements setting and relates to policy decisions (not a topic of this presentation)

  • Uncertainty about what the “actual” performance of a system is, or will be, relates to epistemic uncertainty, and is a topic of this presentation

  • At issue is the need to make sure that the decision maker (DM) is adequately apprised of all the relevant uncertainty when making risk acceptance decisions

For the above example, in order to justify a risk acceptance decision, DM needs assurance (enough confidence) that P(LOC) < “threshold”
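This confidence framing can be sketched numerically. In the toy example below (the lognormal distribution family, the parameter values, and the function name are assumptions for illustration, not NASA practice), epistemic uncertainty about P(LOC) is represented by a lognormal distribution, and the DM's confidence is the probability mass below the threshold:

```python
import math
from statistics import NormalDist

def confidence_below_threshold(median_ploc, error_factor, threshold):
    """P[actual P(LOC) < threshold] under a lognormal epistemic
    distribution; error_factor is the ratio of the 95th percentile to
    the median, a common way to parameterize lognormal uncertainty in PRA."""
    sigma = math.log(error_factor) / 1.645  # 1.645 = z-score of the 95th percentile
    mu = math.log(median_ploc)
    z = (math.log(threshold) - mu) / sigma
    return NormalDist().cdf(z)

# Hypothetical numbers: median P(LOC) of 1/270, error factor of 2,
# and a 1/200 threshold for maximum acceptable risk:
conf = confidence_below_threshold(1 / 270, 2.0, 1 / 200)
# The complement, 1 - conf, is the "risk" in the slide's sense:
# the probability that the actual P(LOC) exceeds the threshold.
```

With these illustrative numbers the DM would have roughly 76% confidence that the threshold is met; whether that is enough assurance is exactly the risk acceptance question.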

SLIDE 4

How Safe Is Safe Enough?

  • The trigger for dealing with the issue of “adequate safety” was the NASA Aerospace Safety Advisory Panel (ASAP) Recommendation 2009-01-02a:

“The ASAP recommends that NASA stipulate directly the acceptable risk levels— including confidence intervals for the various categories of activities (e.g., cargo flights, human flights)—to guide managers and engineers in evaluating “how safe is safe enough.”

  • NASA accepted the ASAP recommendation and committed to establishing safety thresholds and goals for human space flight

Safety threshold expresses an initial minimum tolerable level of safety

Safety goal expresses expectations about the safety growth of the system in the long term

  • Additionally, because of spaceflight’s high risk, NASA also recognized an ethical obligation to pursue safety improvements wherever practicable

In other words, NASA systems should be As Safe As Reasonably Practicable (ASARP)

The ASARP principle applies regardless of meeting safety thresholds and goals

  • Threshold and goal values, as well as the level of ASARP application, are a function of risk tolerances

SLIDE 5

Adequate Safety

Adequate safety has two components: meeting minimum levels of safety, and being As Safe As Reasonably Practicable (ASARP).

Meeting minimum levels of safety:

  • Establish safety thresholds, safety goals, safety growth profiles
  • Establish safety performance margins to account for UU risk
  • Levy safety performance requirements and associated verification procedures (e.g., Probabilistic Risk Assessment (PRA), tests)
  • Conduct verifications

Being ASARP:

  • Analyze a range of alternatives during major design, product realization, operations, and sustainment decisions (i.e., risk-informed decision making (RIDM))
  • Prioritize safety during decision making
  • Implement design-for-safety strategies (e.g., hazard elimination, hazard control (e.g., Design for Minimum Risk (DFMR)), failure tolerance (e.g., redundancy/diversity), safing, emergency operations)
  • Analyze and test (e.g., Hazard Analysis, Failure Modes & Effects Analysis and Critical Items List, PRA, qualification/acceptance testing)
  • Monitor and respond to performance (e.g., precursor analysis, Problem Reporting and Corrective Action (PRACA), closed-loop risk management)
  • Adhere to appropriate codes and standards
  • Etc.
SLIDE 6

Risk Models

  • Risk model development (synthetic analysis) attempts to forecast performance within a probabilistic framework that accounts for known, quantifiable sources of epistemic uncertainty.

[Diagram: many uncertain inputs feed Performance Parameters 1…m, which feed Performance Measures 1…n, yielding Performance Measure Values for Decision Alternative i across the safety, technical, cost, and schedule domains]
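The pipeline in the diagram can be illustrated with a minimal Monte Carlo sketch (the three-phase series model, the beta distributions, and all numbers below are invented for illustration): uncertain inputs are sampled, pushed through a simple performance model, and aggregated into a distribution over a performance measure.

```python
import random

random.seed(0)  # reproducible illustration

def sample_ploc():
    """One Monte Carlo sample of the performance measure P(LOC),
    from a toy series model of three uncertain mission-phase reliabilities."""
    r_ascent = random.betavariate(990, 10)  # hypothetical ascent reliability
    r_orbit = random.betavariate(998, 2)    # hypothetical on-orbit reliability
    r_entry = random.betavariate(995, 5)    # hypothetical entry reliability
    return 1.0 - r_ascent * r_orbit * r_entry

samples = [sample_ploc() for _ in range(20_000)]
mean_ploc = sum(samples) / len(samples)
# Fraction of the epistemic distribution above an assumed 0.03 requirement:
frac_above = sum(s > 0.03 for s in samples) / len(samples)
```

The point of the slide stands in the sketch too: this machinery only captures known, quantifiable epistemic uncertainty, and scenarios outside the model contribute nothing to `samples`.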

SLIDE 7

Real World vs. Models

  • Risk models must be constantly and critically re-examined for consistency with the system configuration/operation, and updated with relevant information (e.g., accident precursor analysis…) to ensure the closest correlation and fastest convergence between the “real world” and the “risk model”

SLIDE 8

The Gap

  • However, in NASA contexts there is typically a gap between the real world and the model that initially dominates the risk picture and does not converge until long after most major decisions have been made

Executing first-of-a-kind missions with first-of-a-kind hardware

Employing systems that operate at the edge of engineering capability

  • This gap is the domain of so-called Unknown and/or Underappreciated (UU) risks

  • UU risks live outside the model due to:

Model incompleteness

Being outside the scope of the model

Violating the model assumptions

Remaining latent in the system until revealed by operational failures, precursor analysis, etc.

Tending to be most significant early in the system life cycle

Disproportionately reflecting complex intra-system and environmental interactions

SLIDE 9

How Significant is the Gap?

[Figure: the “model gap” between modeled risk and actual risk]

UU scenarios have historically represented a significant fraction of actual risk, especially for new systems

slide-10
SLIDE 10

Launch System Reliability Trends


Source: Morse et al., “Modeling Launch Vehicle Reliability Growth as Defect Elimination,” AIAA Space Conference and Exhibition (2010).
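The defect-elimination idea named in the source can be sketched as follows (this is a generic toy version, not the cited paper's model, and the defect probabilities are invented): each latent design defect independently causes failure with some per-flight probability and is eliminated once it manifests, so expected failure probability declines with flight number.

```python
def failure_prob(flight_n, defect_probs):
    """Approximate P(failure on flight n): each defect i, if it has not
    yet manifested (probability (1 - p_i)**(n - 1)), contributes p_i.
    Uses a rare-event sum approximation, capped at 1."""
    surviving = [p * (1 - p) ** (flight_n - 1) for p in defect_probs]
    return min(1.0, sum(surviving))

defects = [0.02, 0.01, 0.01, 0.005]  # hypothetical initial defect set
early = failure_prob(1, defects)     # first-flight risk (all defects present)
late = failure_prob(50, defects)     # risk after substantial flight history
```

Under these assumptions the risk on flight 50 is roughly half the first-flight risk, which is the qualitative shape of the reliability growth trends the slide refers to.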

SLIDE 11

Results of Retrospective Analysis of Space Shuttle Risk

[Figure: backward-look Shuttle PRA results, P(LOC) vs. chronological flight number, plotted with and without accounting for revealed LOC accidents. RK is the risk from known scenarios; RU is the contribution of UU scenarios to the P(LOC) level; actual risk = known + UU scenarios]

Source: Hamlin, Teri L., et al., “Shuttle Risk Progression: Use of the Shuttle Probabilistic Risk Assessment (PRA) to Show Reliability Growth” (AIAA, 2010), downloadable from http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20110004917_2011004008.pdf

SLIDE 12
Accounting for Unknown/Underappreciated (UU) Risks

  • The Aerospace Safety Advisory Panel (ASAP) and others have identified the need to consider the gap between known risk and actual risk when applying NASA safety thresholds and goals

  • We use the concept of safety performance margin to account for UU risks

The margin is based on historical discrepancies between initially-calculated and eventually-demonstrated safety performance

It provides a rational basis for deriving probabilistic requirements on known risk

[Figure: a risk acceptance threshold on actual risk, reduced by the safety performance margin, yields a derived requirement on known risk]
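The margin logic can be sketched as follows (the multiplicative form and the 2x factor are assumptions for illustration; NASA's actual margin derivation is not reproduced here): if history suggests actual risk tends to exceed initially-calculated risk by some factor, hold the modeled (known) risk below the actual-risk threshold divided by that factor.

```python
def known_risk_requirement(actual_risk_threshold, uu_margin_factor):
    """Derived requirement on modeled (known) risk, assuming
    actual risk ~ uu_margin_factor x known risk."""
    return actual_risk_threshold / uu_margin_factor

# Hypothetical numbers: a 1-in-200 threshold on actual P(LOC) and an
# assumed 2x margin for UU risk give a 1-in-400 requirement on known risk:
req = known_risk_requirement(1 / 200, 2.0)
```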

SLIDE 13

The Case for the “Safety Case”

  • The “safety case” goes beyond traditional system-centric risk analysis to address the totality of the “uncertainty story” about the actual safety performance of the system

Presented and defended by the provider at key decision points

Provides the DM with a rational basis for identifying assurance deficits (inadequacies in the evidentiary support of the safety claims)

Involves serious consideration of things that live outside traditional risk models (e.g., organizational and management factors)

  • In order to be adequately informed, risk acceptance decision-making must go beyond the risk analysis

  • A holistic “safety case” must be made that the system is adequately safe: a coherent and evidentiary statement of how safe we are (or will be) at a given stage of the life cycle

Substantiation that UU risks are adequately managed via application of the ASARP principle:

  • Minimize the presence of UU scenarios (e.g., via margin, programmatic commitments)
  • Maximize discovery of UU hazards (e.g., via testing, liberal instrumentation, monitoring and trending, anomaly investigation, Precursor Analysis, use of best safety analysis techniques)
  • Provide broad-coverage safety features (e.g., abort capability, safe haven, rescue)

Substantiation that the known risk (calculated by PRA) is within the specified safety performance requirement

  • Known risks are managed by applying controls designed to mitigate identified accident scenarios

SLIDE 14

The Model as Evidence

  • The risk model counts as (major) evidence in the safety case
  • But how good is it? To what extent can the DM rely on it?
  • NASA-STD-7009, Standard for Models and Simulations (M&S), presents a framework for assessing the credibility of models and simulations in the context of the uses to which they are put

  • The credibility assessment is presented with the model and model results, as an integral part of the case

SLIDE 15

Summary

  • In general:

When a system is being acquired or licensed, someone (acquirer, licensing authority) is making a risk-acceptance decision…

  • Potentially affecting a range of stakeholders (public, workers, …) in different ways (safety, technical, cost, schedule …)

The decision is informed by some combination of modeling, analysis, experience with the subject system, experience with related technology, and, in some cases, a sense of the provider’s (or the applicant’s) capability

The responsible decision-maker has to have a sense of the uncertainties affecting the decision, including the limitations of the model

  • At NASA:

The challenge is to execute high-stakes, first-of-a-kind missions that are subject to significant uncertainty in all domains (safety, technical, cost, schedule)

  • Actual risks are typically not accurately knowable

As in other complex, high-stakes undertakings, modeling is a vital ingredient in the development process

But responsible risk-acceptance decision-making requires the decision-maker to think beyond the model
