
Generalized Degrees of Freedom (GDF), 9 June 2015, Dr. Shu-Ping Hu



  1. Generalized Degrees of Freedom (GDF)
  9 June 2015, Dr. Shu-Ping Hu
  Offices: Los Angeles, Washington, D.C., Boston, Chantilly, Huntsville, Dayton, Santa Barbara, Albuquerque, Colorado Springs, Goddard Space Flight Center, Johnson Space Center, Ogden, Patuxent River, Washington Navy Yard, Ft. Meade, Ft. Monmouth, Dahlgren, Quantico, Cleveland, Montgomery, Silver Spring, San Diego, Tampa, Tacoma, Aberdeen, Oklahoma City, Eglin AFB, San Antonio, New Orleans, Denver, Vandenberg AFB
  PRT-191, 30 Mar 2015, Approved for Public Release

  2. Outline
  - Constrained Process (Background Info)
  - Objectives
  - Error Terms (Additive vs. Multiplicative)
  - Multiplicative-Error Models
  - ZMPE CER Unbiased?
  - SPE Comparison: ZMPE vs. MUPE
  - Definitions of DF and GDF
  - Calculate Fit Statistics Using GDF
  - Examples
  - Conclusions
  Note: SPE is standard percent error and MUPE stands for minimum-unbiased-percent error. Other acronyms are explained on the next page.

  3. Constrained Process (1/2): Introduction
  - Solver (an Excel add-in program) is a popular tool used to generate nonlinear cost estimating relationships (CERs), especially when constraints are specified. A few examples are given below:
    - Minimizing the sum of squared percentage errors under the zero-percentage-bias constraint (i.e., the ZMPE CER)
    - Minimizing the sum of squared residuals under the zero-percentage-bias constraint (i.e., the mean of the percentage errors is zero)
    - Minimizing the sum of squared percentage errors or residuals in log space under the zero-bias constraint (i.e., the mean of the residuals is zero) using the Balance-Adjustment Factor (BAF)^1
  - In the above examples, we may not have the degrees of freedom (DF) given by the traditional definition, which assumes no constraints are specified
  1. Book, S., "Significant Reasons to Eschew Log-Log OLS Regression when Deriving Estimating Relationships," 2012 ISPA/SCEA Joint Annual Conference, Orlando, FL, 26-29 June.

  4. Constrained Process (2/2): Suggestions
  - Do not abuse Solver
    - Do not specify constraints excessively just because it is easy to do so in Solver
  - Explore different starting points to see if the solution stabilizes when using Solver
    - Solver can be sensitive to starting points; different starting points may lead to different solutions
    - Solver can be trapped in local minima, especially when fitting complicated or ZMPE equations
  - Specify "meaningful" constraints
    - Make sure the constraints are necessary, logical, and statistically sound, as DF can be reduced by additional constraints
  - Calculate the DF properly when constraints are specified
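The multi-start advice above applies to any local optimizer, not just Solver. A minimal sketch with SciPy, minimizing the sum of squared percentage errors for an illustrative y = a*x^b CER from several starting points and keeping the best converged solution (the data, model form, and starting points are all assumptions for the demo):

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative data: a power-form CER with multiplicative noise.
rng = np.random.default_rng(0)
x = rng.uniform(1, 100, 20)
y = 3.0 * x**0.8 * rng.lognormal(0, 0.2, 20)

def sum_sq_pct_error(params):
    a, b = params
    f = a * x**b
    return np.sum(((y - f) / f) ** 2)

# Multi-start: run the local optimizer from several starting points;
# if the solutions disagree, the objective likely has local minima.
starts = [(1.0, 0.5), (5.0, 1.0), (0.5, 1.5), (10.0, 0.2)]
results = [minimize(sum_sq_pct_error, s0, method="Nelder-Mead") for s0 in starts]
best = min((r for r in results if r.success), key=lambda r: r.fun)
print("best parameters:", best.x, "objective:", best.fun)
```

Comparing `best.fun` across the runs is the numeric analogue of "see if the solution stabilizes."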

  5. Objectives
  - Explain why degrees of freedom (DF) should be adjusted if constraints are specified in the curve-fitting process
  - Recommend a Generalized Degrees of Freedom (GDF) measure to compute fit statistics properly for constraint-driven equations
  - Explain why ZMPE's standard error underestimates the spread of the CER error distribution
    - We will illustrate how to calculate the standard error properly for ZMPE CERs

  6. Additive Error Term: Y = f(X) + ε
  - Linear case: y = f(x) + ε; fit by ordinary least squares (OLS)
  - Power case: y = aX^b + ε; this requires nonlinear regression
  - The error distribution is independent of the scale of the project: cost variation does not grow with the size of the project
  [Figure: Y vs. X scatter plots for the linear and power forms, each with a constant-width error band around f(x)]
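As the slide notes, the power form with an additive error term must be fit by nonlinear regression in unit space. A minimal sketch with SciPy's `curve_fit` on simulated data (the parameter values and noise level are illustrative):

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)
x = np.linspace(1, 50, 30)
true_a, true_b = 2.0, 0.7
# Additive error: the scatter has constant variance and does not grow
# with the level of the function.
y = true_a * x**true_b + rng.normal(0, 0.5, x.size)

def power_model(x, a, b):
    return a * x**b

# Nonlinear least squares; p0 is the starting guess for (a, b).
popt, pcov = curve_fit(power_model, x, y, p0=(1.0, 1.0))
a_hat, b_hat = popt
print("a_hat:", a_hat, "b_hat:", b_hat)
```

With an additive error of constant variance, unweighted nonlinear least squares is the appropriate fit; taking logs first would distort the error assumption.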

  7. Multiplicative Error Term: Y = f(X)*ε
  - The multiplicative error assumption is appropriate when:
    - Errors in the dependent variable are believed to be proportional to the level of the function (the value of the variable)
    - The dependent variable ranges over more than one order of magnitude
  - Power case: y = ax^b * ε; this equation is linear in log space
  - Linear case: y = (a + bx) * ε; this requires nonlinear regression
  - Cost variance is proportional to the scale of the project
  [Figure: Y vs. X for both forms, with upper and lower bounds fanning out around f(x) as X grows]

  8. Multiplicative Error Model: Y = f(X)*ε
  - Log-Error: ε ~ LN(0, σ²)
    - Least squares in log space; error = log(Y) − log f(X)
    - Minimize the sum of squared errors; the process is done in log space
    - If f(x) is linear in log space, the result is termed a log-linear or LOLS CER
  - MUPE: E(ε) = 1, V(ε) = σ²
    - Least squares in weighted space; error = (Y − f(X)) / f(X)
      Note: E((Y − f(X))/f(X)) = 0 and V((Y − f(X))/f(X)) = σ², the variance of the error term
    - Minimize the sum of squared percentage errors iteratively, i.e., minimize Σᵢ {(yᵢ − f(xᵢ)) / f_{k-1}(xᵢ)}², where k is the iteration number
    - MUPE (an iterative, weighted least squares) has zero sample bias
  - ZMPE: E(ε) = 1, V(ε) = σ²
    - Least squares in weighted space; error = (Y − f(X)) / f(X)
    - Minimize the sum of squared percentage errors subject to the constraint Σᵢ (yᵢ − f(xᵢ)) / f(xᵢ) = 0
    - ZMPE is a constrained minimization process; the average sample bias is eliminated by the constraint
  We will focus on MUPE/ZMPE equations in this paper.
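The MUPE and ZMPE procedures above can be sketched as follows, with SciPy standing in for Solver; the power-form CER and simulated data are assumptions for illustration:

```python
import numpy as np
from scipy.optimize import least_squares, minimize

rng = np.random.default_rng(2)
x = rng.uniform(1, 100, 25)
y = 4.0 * x**0.6 * rng.lognormal(0, 0.25, 25)

def f(params):
    a, b = params
    return a * x**b

# MUPE: minimize sum_i {(y_i - f(x_i)) / f_{k-1}(x_i)}^2 iteratively,
# freezing the denominator at the previous iteration's fit.
params = np.array([1.0, 1.0])
for _ in range(50):
    denom = f(params)
    res = least_squares(lambda p: (y - f(p)) / denom, params)
    if np.allclose(res.x, params, rtol=1e-8):
        params = res.x
        break
    params = res.x
mupe = params

# ZMPE: minimize the sum of squared percentage errors subject to the
# zero-percentage-bias constraint sum_i (y_i - f(x_i))/f(x_i) = 0.
pct = lambda p: (y - f(p)) / f(p)
zmpe = minimize(lambda p: np.sum(pct(p) ** 2), mupe,
                constraints={"type": "eq", "fun": lambda p: np.sum(pct(p))},
                method="SLSQP").x
print("MUPE:", mupe, "ZMPE:", zmpe)
```

Starting ZMPE from the MUPE solution gives a nearly feasible starting point, since MUPE's converged percentage errors already sum to approximately zero.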

  9. ZMPE CER Unbiased? Don't Know
  - Both MUPE and ZMPE methods have zero percentage bias (ZPB) for the sample data points:
    (1/n) Σᵢ (yᵢ − ŷᵢ)/ŷᵢ = 0, where yᵢ = actual value and ŷᵢ = predicted value
  - For MUPE, this condition is achieved through the iterative minimization process; for ZMPE, ZPB is obtained by using a constraint
  - Does the "ZPB" property imply that the CER is unbiased? Not necessarily
    - If a CER is unbiased, then E(Ŷ) = E(Y) = f(X, β)
    - The ZPB constraint can be applied to any proposed methodology (i.e., objective function), but there is no guarantee that the CER result will be unbiased; namely, the condition E(Ŷ) = f(X, β) may not be satisfied
  - MUPE is the best linear unbiased estimator (BLUE) for linear models
    - For linear CERs, e.g., Y = (a + bX₁ + cX₂)*ε, the MUPE method produces unbiased estimates of the parameters and the function mean; it also provides smaller variances for the parameters and for any linear function of the parameters
  - MUPE's parameter estimators are the quasi maximum likelihood estimators (QMLE) of the parameters; MUPE also provides consistent estimates of the parameters
  - ZMPE CERs, however, do not have statistical properties readily available

  10. SPE Comparison: ZMPE vs. MUPE (1/5)
  - The standard percent error (SPE) for Y = f(X)*ε is given by
    SPE = sqrt( Σᵢ ((yᵢ − ŷᵢ)/ŷᵢ)² / (n − p) )
    where n = sample size, p = total number of estimated parameters, yᵢ = actual value, and ŷᵢ = predicted value
    - SPE is the CER's standard error of estimate, which is used to measure the model's overall error of estimation; it is the one-sigma spread of the MUPE or ZMPE CER
    - SPE² (i.e., MSE) is used to estimate σ², the variance of ε
  - SPE(ZMPE) ≤ SPE(MUPE); the equal sign holds only for simple factor equations
    - ZMPE always produces a smaller SPE than MUPE except for simple factor CERs (Book, 2006)
  - Is a smaller SPE better? No, not necessarily. If it were, we should develop MPE CERs, which are proven to over-estimate (see Hu and Sjovold, 1994)
  - Beware of using SPE alone for selecting CERs; we should also evaluate other useful statistics (see Hu, 2010)
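The SPE defined above is straightforward to compute from actuals and predictions. A minimal sketch (the data points are illustrative):

```python
import numpy as np

def spe(y, y_hat, p):
    """Standard percent error: one-sigma spread of a multiplicative-error CER.

    y: actual values, y_hat: CER predictions, p: number of estimated parameters.
    SPE^2 (the MSE in percentage terms) estimates sigma^2, the variance of the
    multiplicative error term.
    """
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    n = y.size
    pct_err = (y - y_hat) / y_hat
    return np.sqrt(np.sum(pct_err**2) / (n - p))

# Illustrative: 6 data points fit by a 2-parameter CER.
y = [10.0, 22.0, 31.0, 44.0, 58.0, 70.0]
y_hat = [11.0, 20.0, 33.0, 42.0, 60.0, 68.0]
print(spe(y, y_hat, p=2))  # ≈ 0.081, i.e., about an 8.1% one-sigma spread
```

Note the divisor n − p: this is where the degrees-of-freedom question matters, since an understated p (or ignored constraints) shrinks the reported SPE.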

  11. SPE Comparison: ZMPE vs. MUPE (2/5)
  E(SPE²(ZMPE)) ≤ E(SPE²(MUPE)) = σ²
  Q: Is ZMPE's SPE² (i.e., MSE) an unbiased estimator of σ²? No
  (I) When the CER is linear:
  - MUPE's SSE = Σᵢ wᵢ(yᵢ − ŷᵢ)² = Z'(I − H)Z
    - MUPE can be converted to OLS in weighted space; wᵢ (= 1/ŷᵢ²) is the weighting factor of the i-th observation
    - Z is the new vector variable in the weighted space (zᵢ = yᵢ√wᵢ); V(Z) = Iσ², and H is Z's hat matrix. See Morrison (1983) and Draper and Smith (1981).
  - For MUPE CERs: E(SSE) = σ²(n − p), so E(SSE/(n − p)) = E(MSE) = E(SPE²(MUPE)) = σ²
    - This equation is true regardless of the distribution type
    - This is an approximation if the CER is nonlinear
  - ZMPE's SPE underestimates the true σ, except for simple factor CERs
    - Since SPE²(ZMPE) ≤ SPE²(MUPE), E(SPE²(ZMPE)) ≤ E(SPE²(MUPE)) = σ²; the equal sign holds only for simple factor equations
  Caution: ZMPE's SPE underestimates the spread of the CER error distribution.
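The step E(SSE) = σ²(n − p) rests on SSE = Z'(I − H)Z together with the identity trace(I − H) = n − p for a hat matrix built from p regressor columns. A quick numerical check of the trace identity under assumed regressors:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 12, 2
# Weighted-space design matrix with p columns (illustrative regressors).
X = np.column_stack([np.ones(n), rng.uniform(1, 10, n)])
# Hat matrix H = X (X'X)^{-1} X'.
H = X @ np.linalg.inv(X.T @ X) @ X.T
# For Z with V(Z) = I*sigma^2, E(Z'(I - H)Z) = sigma^2 * trace(I - H),
# and trace(I - H) = n - p, which gives E(SSE) = sigma^2 * (n - p).
trace_residual = np.trace(np.eye(n) - H)
print(trace_residual)  # n - p = 10
```

Each additional parameter, and by the same logic each binding constraint, shifts this trace; that is the motivation for adjusting the degrees of freedom in the GDF measure.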
