
2015 ICEAA Professional Development & Training Workshop, June 09-12, 2015, San Diego, California. A Common Risk Factor Method to Estimate Correlations between Distributions. Presented by: Marc Greenberg, Cost Analysis Division (CAD)


  1. 2015 ICEAA Professional Development & Training Workshop, June 09-12, 2015 • San Diego, California. A “Common Risk Factor” Method to Estimate Correlations between Distributions. Presented by: Marc Greenberg, Cost Analysis Division (CAD), National Aeronautics and Space Administration

  2. Outline
     • Correlation Overview
     • Why Propose Another Correlation Method?
     • Underlying Basis for “Common Risk Factor” Method
       – Concept of Mutual Information
       – Using the Unit Square to Estimate Mutual Information
     • “Common Risk Factor” Method (for pair of activities)
       – Apply 7 Steps to Estimate Correlation between 2 Distributions
     • Examples
       – Correlation of Durations for Two Morning Commutes
       – Correlation of Costs for Two WBS Elements of a Spacecraft
     • Conclusion, Other Potential Applications & Future Work

  3. Correlation Overview (1 of 3) (a)
     • What is Correlation?
       – A statistical measure of association between two variables.
       – It measures how strongly the variables are related, or change, with each other.
         • If two variables tend to move up or down together, they are said to be positively correlated.
         • If they tend to move in opposite directions, they are said to be negatively correlated.
       – The most common statistic for measuring association is the Pearson (linear) correlation coefficient, r_P.
       – Another is the Spearman (rank) correlation coefficient, r_S.
         • Used in Crystal Ball and @Risk
     (a) Source: Correlations in Cost Risk Analysis, Ray Covert, MCR LLC, 2006 Annual SCEA Conference, June 2006

  4. Correlation Overview (2 of 3) (a)
     • Functional Correlation:
       – Captured through mathematical relationships within the cost model
     • Applied Correlation:
       – Specified by the analyst and implemented within the cost model
       – Correlations (or dependencies) between the uncertainties of WBS CERs are generally determined subjectively
         • However, as we collect more data, more and more of these correlations are determined using historical data
     • Whether functional, applied or both types of correlation, total variance can be calculated using the following:
       \[ \sigma_{Total}^{2} = \sum_{k=1}^{n} \sigma_{k}^{2} + 2 \sum_{k=2}^{n} \sum_{j=1}^{k-1} r_{jk} \, \sigma_{j} \, \sigma_{k} \]
       where r_{jk} is the correlation between uncertainties of WBS CERs j and k
     (a) Source: Joint Agency Cost Schedule Risk and Uncertainty Handbook (Sec. 3.2 & Appendix A), 12 March 2014
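The total variance calculation on slide 4 can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `total_variance`, the three standard deviations, and the 0.3 correlation are all hypothetical values chosen only to show how correlation inflates the total.

```python
import math


def total_variance(sigmas, corr):
    """Sum of variances plus twice the sum of r_jk * sigma_j * sigma_k over j < k.

    sigmas: standard deviations of the WBS CER uncertainties.
    corr:   symmetric correlation matrix (list of lists), corr[j][k] = r_jk.
    """
    n = len(sigmas)
    variance = sum(s * s for s in sigmas)        # first term: sum of sigma_k^2
    for k in range(1, n):                        # k = 2..n in 1-based notation
        for j in range(k):                       # j = 1..k-1 in 1-based notation
            variance += 2.0 * corr[j][k] * sigmas[j] * sigmas[k]
    return variance


# Hypothetical inputs (assumptions for illustration only).
sigmas = [10.0, 20.0, 30.0]
independent = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
correlated = [[1.0, 0.3, 0.3], [0.3, 1.0, 0.3], [0.3, 0.3, 1.0]]

var_independent = total_variance(sigmas, independent)
var_correlated = total_variance(sigmas, correlated)
print(var_independent, math.sqrt(var_independent))
print(var_correlated, math.sqrt(var_correlated))
```

With these assumed inputs, ignoring correlation understates the total variance, which is why the choice of r matters to the overall risk estimate.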

  5. Correlation Overview (3 of 3) (a)
     Currently, there are 2 general paths to obtain r:
     • Statistical (Data Available: CADRE, CERs)
       – Retro-ICE
       – Residual Analysis
       – Effective r
     • Non-Statistical (No Data: Educated Guess)
       – Causal Guess
       – N-Effect Guess
       – Knee in curve (Steve Book Method)
     Example: [Figure: scatter plot, “Regressed Residuals for 2 CERs (X and Y) for 8 Programs (Correlation = Spearman's Rho = 0.88)”]
     Example:
       Strength   Positive   Negative
       None       0          0
       Weak       0.3        -0.3
       Medium     0.5        -0.5
       Strong     0.9        -0.9
       Perfect    1          -1
     (a) Schematic from Correlations in Cost Risk Analysis, Ray Covert, MCR LLC, 2006 Annual SCEA Conference, June 2006

  6. Why Propose Another Correlation Method?
     1. For statistical methods, lack of data makes it difficult to calculate a robust Pearson’s R or Spearman’s Rho.
        – Example: Residuals from the previous slide produce Rho = 0.88. However, the residuals exhibit an “influential observation.”
     2. For non-statistical methods, there can be many issues:
        – The “N-Effect” and “Knee-in-the-Curve” methods are not inherently intuitive to the non-practitioner.
        – Although the “Causal Guess” method is simple and intuitive, the analyst and/or subject matter expert are still guessing.
        – Whenever the parameters of 2 uncertainty distributions lack basis, the correlation between them is difficult to justify.
     Unlike these other methods, the Common Risk Factor Method provides correlation between 2 uncertainties based upon common root causes. Applying this method may lessen the degree of subjectivity in the estimate.

  7. Correlation Overview (Revisited) (a)
     Currently, there are 2 general paths to obtain r. This presentation proposes a fourth Non-Statistical method to obtain r:
     • Statistical (Data Available: CADRE, CERs)
       – Retro-ICE
       – Residual Analysis
       – Effective r
     • Non-Statistical (No Data: Educated Guess)
       – Causal Guess
       – N-Effect Guess
       – Knee in curve (Steve Book Method)
       – Probabilistic: Common Risk Factor Method
     “Common Risk Factor Method” Notional Example (Output Only):
     Given Tasks 1 & 2 each have an apprentice welder, we expect added uncertainty in the duration of Tasks 1 & 2 due to the lack of skills for each “untested” welder.
     • Task 1: Max Duration will go up by 5 days due to adding P/T welder to team
     • Task 2: Max Duration will go up by 10 days due to adding F/T welder to team
     • Tasks 1 and 2 Correlation = 0.40, partly driven by common skillset in each task.
     [Figure: diagram labeled “Labor Skillset during Task 1” and “Labor Skillset during Task 2”]
     (a) Schematic from Correlations in Cost Risk Analysis, Ray Covert, MCR LLC, 2006 Annual SCEA Conference, June 2006

  8. Outline
     • Correlation Overview
     • Why Propose Another Correlation Method?
     • Underlying Basis for “Common Risk Factor” Method
       – Concept of Mutual Information
       – Using the Unit Square to Estimate Mutual Information
     • “Common Risk Factor” Method (for pair of activities)
       – Apply 7 Steps to Estimate Correlation between 2 Distributions
     • Examples
       – Correlation of Durations for Two Morning Commutes
       – Correlation of Costs for Two WBS Elements of a Spacecraft
     • Conclusion, Other Potential Applications & Future Work

  9. Concept of Mutual Information
     • Whenever two objects share common features, these features can be perceived as “mutual information”.
       – Binary strings: x = 0 0 0 1 0 1 1 1 and y = 1 0 1 1 1 0 0 0. 2 of the 8 pairs are the same, so mutual information = 2 / 8, or 0.25, or 25%.
       – Orange juice: 16 oz. of OJ versus 8 oz. of OJ. The “least common denominator” is 8 oz. of OJ, so mutual information = 8 / 16, or 0.50, or 50%.
     • Mutual information can also be applied to risk factors that are common among a pair of uncertainty distributions.
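The two slide-9 examples can be reproduced with a short Python sketch. The helper names `matching_fraction` and `min_max_ratio` are hypothetical labels introduced here for illustration; the slide itself only gives the arithmetic.

```python
def matching_fraction(x, y):
    """Share of positions at which two equal-length strings agree."""
    if len(x) != len(y):
        raise ValueError("strings must have the same length")
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches / len(x)


def min_max_ratio(x, y):
    """Smaller quantity divided by the larger one (the shared portion)."""
    return min(x, y) / max(x, y)


# Binary strings from the slide: 2 of the 8 positions agree.
print(matching_fraction("00010111", "10111000"))  # 0.25

# Orange juice from the slide: 8 oz. is common to both quantities.
print(min_max_ratio(16, 8))  # 0.5
```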

  10. Mutual Information between 2 groupings
      Weighted Ave: Mutual Information = Σ Weight × (Minimum (X, Y) / Maximum (X, Y))

      Group X / Group Y pair   Minimum (X, Y)   Maximum (X, Y)   Mutual Information   Weight            Wtd Mutual Information
      16 oz. and 8 oz.         8                16               8 / 16 = 0.50        16 / 32 = 0.50    0.50 x 0.50 = 0.25
      12 oz. and 4 oz.         4                12               4 / 12 = 0.333       12 / 32 = 0.375   0.333 x 0.375 = 0.125
      4 oz. and 4 oz.          4                4                4 / 4 = 1.00         4 / 32 = 0.125    1.00 x 0.125 = 0.125
      ------------------------------------------------------------------------------------------------------------------------
      Sum:                                      32                                                      0.50
      ------------------------------------------------------------------------------------------------------------------------
      Mutual Information between Group X and Y = 0.50
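The weighted average on slide 10 can be sketched as follows. This is an illustrative reading of the table, assuming each weight is that pair's maximum divided by the sum of all maxima (16 / 32, 12 / 32, 4 / 32); the function name is hypothetical, and the order of values within each pair is arbitrary because minimum and maximum are symmetric.

```python
def weighted_mutual_information(pairs):
    """Weighted average of min/max ratios, weighting each pair by its maximum.

    pairs: list of (x, y) quantities, one tuple per grouping.
    """
    total_of_maxima = sum(max(x, y) for x, y in pairs)
    result = 0.0
    for x, y in pairs:
        weight = max(x, y) / total_of_maxima      # e.g. 16 / 32 = 0.50
        mutual = min(x, y) / max(x, y)            # e.g. 8 / 16 = 0.50
        result += weight * mutual
    return result


# Quantities (in oz.) taken from the slide's table.
pairs = [(16, 8), (12, 4), (4, 4)]
print(round(weighted_mutual_information(pairs), 3))  # 0.5
```

Under this weighting the result equals the sum of the minima divided by the sum of the maxima (16 / 32 here), which gives a quick check on the table's total.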

  11. The Unit Square: Meeting Times Example (a)
      Example Problem: A boy and a girl plan to meet at the park between 9 and 10 am (1.0 hour). Neither individual will wait more than 12 minutes (0.20 of an hour) for the other. If all times within the hour are equally likely for each person, and if their times of arrival are independent, find the probability that they will meet.
      Solution (Part 1 of 2): X and Y are uniform RVs
      The boy’s actions can be depicted as a single continuous random variable X that takes all values over an interval a to b with equal likelihood. This distribution, called a uniform distribution, has a density function of the form f(x) = 1 / (b - a) for a ≤ x ≤ b, and f(x) = 0 otherwise.
      Similarly, the girl’s actions can be depicted as a single continuous random variable Y that takes all values over an interval a to b with equal likelihood.
      In this example, the interval is from 0.0 to 1.0 hour. Therefore a = 0.0 and b = 1.0. Notation for this uniform distribution is U [0, 1].
      (a) K. Van Steen, PhD, Probability and Statistics, Chapter 2: Random Variables and Associated Functions

  12. The Unit Square: Meeting Times Example
      Solution (Part 2 of 2): Model Frequency when |X – Y| < 0.20
      Neither person will wait more than 0.20 of an hour. This can be modeled as a simulation where a “meeting” occurs only when |X – Y| < 0.20.

      Simulation of Joint Density Function of Uniformly Distributed Random Variables
      Iteration   rv (X)   rv (Y)   |X - Y|   |X - Y| < 0.2?
      1           0.142    0.318    0.176     1
      2           0.368    0.733    0.365     0
      3           0.786    0.647    0.138     1
      4           0.375    0.902    0.528     0
      5           0.549    0.935    0.386     0
      6           0.336    0.775    0.439     0
      7           0.613    0.726    0.113     1
      :           :        :        :         :
      9998        0.157    0.186    0.029     1
      9999        0.384    0.991    0.607     0
      10000       0.045    0.399    0.354     0
      Total =                                 3630

      [Figure: scatter plot, “Probability of |X - Y| < 0.20 on Unit Square”; x-axis: Random Variable X, y-axis: Random Variable Y, both from 0.00 to 1.00]

      This simulation indicates that out of 10,000 trials, the boy and girl meet 3,630 times.
      Probability they will meet = 0.363 or 36%
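The meeting-times simulation can be approximated with Python's standard library. This is a minimal sketch rather than the tool used for the slides: the function names, the seed, and the trial count in the usage line are assumptions, and a different random stream will not reproduce the exact count of 3,630. The closed-form check uses the geometry of the unit square: the two corner triangles where |X - Y| ≥ 0.20 have combined area (1 - 0.20)² = 0.64, leaving 0.36.

```python
import random


def simulate_meeting_probability(wait=0.20, trials=10_000, seed=2015):
    """Estimate P(|X - Y| < wait) for independent X, Y ~ U[0, 1]."""
    rng = random.Random(seed)          # seeded so repeated runs give the same estimate
    meetings = 0
    for _ in range(trials):
        x = rng.random()               # boy's arrival time, as a fraction of the hour
        y = rng.random()               # girl's arrival time, as a fraction of the hour
        if abs(x - y) < wait:
            meetings += 1
    return meetings / trials


def exact_meeting_probability(wait=0.20):
    """Area of the band |X - Y| < wait inside the unit square."""
    return 1.0 - (1.0 - wait) ** 2


print(simulate_meeting_probability())
print(exact_meeting_probability())
```

The simulated estimate should land close to the exact value of 0.36, in line with the 0.363 reported on the slide.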
