

Input Distributions
Reading: Chapter 6 in Law
Peter J. Haas
CS 590M: Simulation, Spring Semester 2020

Outline:
◮ Overview
◮ Probability Theory for Choosing Distributions
◮ Data-Driven Approaches
◮ Other Approaches to Input Distributions


Overview

To specify a simulation model, we need to define clock-setting "input" sequences. Examples:
◮ Interarrival-time sequences
◮ Processing-time sequences for a production system
◮ Asset-value sequences for a financial model

Even if we assume i.i.d. sequences for simplicity:
◮ What type of distribution should we use (gamma, Weibull, normal, exponential, ...)?
◮ Given a type of probability distribution (i.e., a "distribution family"), what parameter values should we use?

Two approaches: probability theory and historical data.

Probability Theory for Choosing Distributions

Theoretical Justification for Normal Random Variables

Suppose that X can be expressed as a sum of random variables: X = Y1 + Y2 + · · · + Yn. In great generality, versions of the CLT imply that X ∼ N(µ, σ²) (approximately) for large n, where µ = E[X] and σ² = Var[X].
◮ The Yi's need not be i.i.d., just not "too dependent" and not "too non-identical"

Q: Examples where the CLT breaks down?

Moral: If X is the sum of a large number of other random quantities, then X can be approximately modeled as a normal random variable.

Theoretical Justification for Lognormal Random Variables

Suppose that X can be expressed as a product of random variables: X = Z1 Z2 · · · Zn.
◮ Example: value of a financial asset

Then log(X) = Y1 + Y2 + · · · + Yn, where Yi = log(Zi). By the prior discussion, log(X) ∼ N(µ, σ²) (approximately) for large n, i.e., X ∼ exp(N(µ, σ²)), so that X is approximately lognormally distributed.

Moral: If X is the product of a large number of other random quantities, then X can be approximately modeled as a lognormal random variable.

Theoretical Justification for the Poisson Arrival Process

Suppose that the arrival process is a superposition of arrivals from a variety of statistically independent sources (source 1, source 2, ..., source n, all feeding the same queue).

The Palm-Khintchine Theorem says that the superposition of n i.i.d. sources looks approximately Poisson as n becomes large.
◮ Can relax the i.i.d. assumption

Moral: A Poisson process is often a reasonable model of arrival processes. (But beware of long-range dependence.)

Theoretical Justification for the Weibull Distribution

Suppose that X can be expressed as a minimum of nonnegative random variables: X = min{Y1, ..., Yn}.
◮ Example: lifetime of a complex system

Extreme-value theory (Gnedenko's Theorem) says that if the Yi's are i.i.d., then X has approximately a Weibull distribution when n is large: P(X ≤ x) = 1 − e^(−(λx)^α)
◮ α is the shape parameter and λ is the scale parameter

Moral: If X is the lifetime of a complicated component, then X can be approximately modeled as a Weibull random variable.
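A minimal sketch of the first two morals above (all names and parameters are my own, not from the slides): a sum of many independent but non-identical uniforms looks normal, and a product of many positive factors looks lognormal, since log() turns the product into a sum.

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 200, 20_000

# Sum: Yi ~ Uniform(0, i), independent but deliberately non-identical
Y = rng.uniform(0.0, np.arange(1.0, n + 1.0), size=(reps, n))
X_sum = Y.sum(axis=1)

# Product: Zi = 1 + small random "return", as for a financial asset
Z = 1.0 + rng.uniform(-0.02, 0.02, size=(reps, n))
X_prod = Z.prod(axis=1)

def skewness(x):
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 3)  # roughly 0 for a normal-shaped sample

print("skewness of sum          :", round(skewness(X_sum), 3))
print("skewness of log(product) :", round(skewness(np.log(X_prod)), 3))
```

Both skewness values come out near zero, consistent with a normal shape for the sum and for the log of the product.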
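A minimal sketch of the Palm-Khintchine effect (construction mine): merge n independent renewal streams whose interarrival times are decidedly non-exponential, here Uniform(0.5, 1.5), and check that the merged stream's interarrival times look exponential, i.e., squared coefficient of variation near 1.

```python
import numpy as np

rng = np.random.default_rng(7)
n_sources, arrivals_per_source = 100, 500

# Arrival epochs of each source: cumulative sums of its uniform gaps
gaps = rng.uniform(0.5, 1.5, size=(n_sources, arrivals_per_source))
epochs = np.sort(np.cumsum(gaps, axis=1).ravel())  # superposed stream

merged = np.diff(epochs)
print("squared CV of merged gaps:", round(merged.var() / merged.mean() ** 2, 3))
# Each source alone has squared CV = (1/12) / 1^2 = 0.083, far from 1;
# the superposition comes out close to 1, as for a Poisson process.
```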
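A minimal sketch of the Weibull moral (the component-lifetime model is mine): the minimum of many i.i.d. nonnegative lifetimes is approximately Weibull. With Gamma(2) component lifetimes, whose CDF behaves like x²/2 near 0, the fitted Weibull shape should come out near α = 2.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps = 500, 20_000

Y = rng.gamma(shape=2.0, scale=1.0, size=(reps, n))
X = Y.min(axis=1)  # system fails when its first component fails

# Fit a Weibull with the location pinned at 0 (lifetimes are nonnegative)
alpha_hat, loc, scale = stats.weibull_min.fit(X, floc=0.0)
print("fitted Weibull shape:", round(alpha_hat, 2))  # expect roughly 2
```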

Data-Driven Approaches

Controlling the Variability

Squared coefficient of variation: ρ²(X) = Var(X) / (E[X])²

Case 1: X ∼ exp(λ)
◮ ρ²(X) = 1

Case 2: X ∼ gamma(λ, α), with density f_X(x) = λ e^(−λx) (λx)^(α−1) / Γ(α)
◮ ρ²(X) = (α/λ²) / (α/λ)² = 1/α

Three scenarios to play with:
◮ α < 1: more variable than exponential
◮ α = 1: exponential
◮ α > 1: less variable than exponential

Goodness-of-Fit Software

GoF software applies a set of goodness-of-fit tests to select the distribution family with the "best fit" to the data:
◮ chi-square, Kolmogorov-Smirnov, Epps-Singleton, Anderson-Darling, ...

GoF software must be used with caution [Nelson (2013), pp. 113–116]:
◮ Low power: poor discrimination between different distributions
◮ Sequential testing: test properties are mathematically ill-defined
◮ Discourages sensitivity analysis: unwary users stop with the "best" fit
◮ Can obscure non-i.i.d. features: e.g., trends, autocorrelation
◮ Fails on big data: all tests reject on large real-world datasets
◮ Over-reliance on summary statistics: should also plot the data
◮ Ex: Q-Q plots [Law, pp. 339–344] better indicate departures from a candidate distribution (lower/upper tails, heavy/light tails)

Feature Matching to Data

Pragmatic approach: match key features in the data.

Ex: Use a gamma dist'n and match the first two empirical moments
◮ Hence match the empirical coefficient of variation (see "Controlling the Variability" above)
◮ Note: nothing about "process physics" implies gamma; we use it for its convenience and flexibility in modeling a range of variability

Can use even more flexible distribution families:
◮ Ex: Johnson translation system (4 parameters)
◮ Ex: Generalized lambda distribution (4 parameters)
◮ Useful when extreme values are not important to the simulation
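A minimal sketch of controlling variability (parameters mine): hold the mean of a gamma distribution fixed and dial the variability through α, verifying ρ²(X) = 1/α empirically.

```python
import numpy as np

rng = np.random.default_rng(3)
mean = 2.0  # target E[X] = alpha / lambda, so lambda = alpha / mean

for alpha in (0.5, 1.0, 4.0):
    lam = alpha / mean
    x = rng.gamma(shape=alpha, scale=1.0 / lam, size=200_000)
    cv2 = x.var() / x.mean() ** 2
    print(f"alpha={alpha}: mean={x.mean():.2f}  squared CV={cv2:.2f}  (1/alpha={1/alpha:.2f})")
```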
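A minimal sketch of the "fails on big data" caveat (construction mine): data that are nearly, but not exactly, exponential typically pass a KS test at n = 500 yet are decisively rejected at n = 500,000, even if the exponential model is perfectly adequate for simulation purposes. (The p-values here are only approximate, since the rate is estimated from the same data.)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)

def ks_pvalue(n):
    x = rng.gamma(shape=0.95, scale=1.0, size=n)  # truth: close to exponential
    mean_hat = x.mean()                           # fitted exponential scale
    return stats.kstest(x, "expon", args=(0.0, mean_hat)).pvalue

for n in (500, 5_000, 500_000):
    print(f"n={n:>7}: KS p-value vs fitted exponential = {ks_pvalue(n):.4f}")
```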

Estimating Parameters: Maximum Likelihood Method

Ex: Poisson arrival process to a queue
◮ Exponential interarrival times: 3.0, 1.0, 4.0, 3.0, 8.0
◮ Goal: estimate λ
◮ For a continuous dist'n, the likelihood of an observation = its pdf value
◮ Given a value of λ, the likelihood of observation xi is λ e^(−λ xi)
◮ Joint likelihood: L(λ) = λ^5 e^(−19λ), since 3 + 1 + 4 + 3 + 8 = 19
◮ Joint log-likelihood: L̃(λ) = 5 log λ − 19λ
◮ Estimate: λ̂ = 5/19 ≈ 0.26

Q: Why is this a reasonable estimator?

Maximum Likelihood, Continued

Ex: Geometric distribution [Law, Problem 6.26]
◮ Number of failures before the first success in Bernoulli trials, success probability = p
◮ P{X = k} = (1 − p)^k p for k ≥ 0
◮ How to estimate p given four observations of X? X = 3, 5, 2, 8
◮ Given a value of p, the likelihood of observation k is (1 − p)^k p
◮ Joint likelihood for all four observations: L(p) = (1 − p)^(3+5+2+8) p^4 = (1 − p)^18 p^4
◮ Choose the estimator p̂ to maximize L(p) ("best explains the data")
◮ Equivalently, maximize L̃(p) = log L(p) = 18 log(1 − p) + 4 log p, giving p̂ = 4/22 = 2/11

Maximum Likelihood, Continued

Maximizing the likelihood function:
◮ Simple case: solve ∂L̃n/∂αi (α̂1, ..., α̂k) = 0 for i = 1, 2, ..., k
◮ Harder cases: the maximum occurs on a boundary, or there are constraints (Kuhn-Tucker conditions)
◮ Hardest cases (typical in practice): solve numerically

Why bother?
◮ MLEs maximize asymptotic statistical efficiency
◮ For large n, MLEs "squeeze maximal info from the sample" (i.e., smallest variance ⇒ narrowest confidence intervals)

Maximum Likelihood, Continued

General setup (continuous i.i.d. case):
◮ Given X1, ..., Xn, i.i.d. samples from pdf f(·; α1, ..., αk)
◮ The MLEs α̂1, ..., α̂k maximize the likelihood function Ln(α1, ..., αk) = ∏_{i=1}^{n} f(Xi; α1, ..., αk) or, equivalently, the log-likelihood function L̃n(α1, ..., αk) = ∑_{i=1}^{n} log f(Xi; α1, ..., αk)
◮ For the discrete case, use the pmf instead of the pdf
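A minimal sketch checking the two worked MLEs above numerically: the exponential data give λ̂ = n / Σxi = 5/19, and the geometric data give p̂ = n / (n + Σki) = 4/22.

```python
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([3.0, 1.0, 4.0, 3.0, 8.0])  # exponential interarrival times
k = np.array([3, 5, 2, 8])               # geometric observations

def nll_exp(lam):   # negative log-likelihood, exponential
    return -(len(x) * np.log(lam) - lam * x.sum())

def nll_geom(p):    # negative log-likelihood, geometric
    return -(k.sum() * np.log(1.0 - p) + len(k) * np.log(p))

lam_hat = minimize_scalar(nll_exp, bounds=(1e-6, 10.0), method="bounded").x
p_hat = minimize_scalar(nll_geom, bounds=(1e-6, 1.0 - 1e-6), method="bounded").x
print(f"lambda-hat = {lam_hat:.4f} (closed form {5/19:.4f})")
print(f"p-hat      = {p_hat:.4f} (closed form {4/22:.4f})")
```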
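A minimal sketch of the "solve numerically" case (data synthetic, setup mine): the gamma shape parameter has no closed-form MLE, so we minimize the negative log-likelihood with a generic optimizer.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(5)
true_alpha, true_lam = 2.5, 0.8
data = rng.gamma(shape=true_alpha, scale=1.0 / true_lam, size=2_000)

def neg_loglik(params):
    alpha, lam = params
    if alpha <= 0.0 or lam <= 0.0:
        return np.inf
    # log of the slides' gamma pdf: f(x) = lam e^(-lam x) (lam x)^(alpha-1) / Gamma(alpha)
    ll = np.sum(alpha * np.log(lam) + (alpha - 1.0) * np.log(data)
                - lam * data - gammaln(alpha))
    return -ll

res = minimize(neg_loglik, x0=[1.0, 1.0], method="Nelder-Mead")
print("MLE (alpha, lambda):", np.round(res.x, 2), "  truth:", (true_alpha, true_lam))
```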

Estimating Parameters: Method of Moments

Idea: equate k sample moments to k true moments and solve.

Example: Exponential distribution
◮ k = 1 and E[X] = 1/λ
◮ So equate first moments: X̄n = 1/λ̂ ⇒ λ̂ = 1/X̄n

Example: Gamma distribution
◮ k = 2, E[X] = α/λ, and Var[X] = α/λ²
◮ Equate moments: X̄n = α̂/λ̂ and s²n = α̂/λ̂² ⇒ α̂ = X̄n²/s²n and λ̂ = X̄n/s²n

Can match other statistics as well, e.g., quantiles [Nelson 2013, p. 115].

Bayesian Parameter Estimation

View the unknown parameter θ as a random variable with prior distribution f(θ)
◮ Encapsulates prior knowledge about θ before seeing the data
◮ Ex: if we know that θ ∈ [5, 10], set f = Uniform(5, 10)

After seeing data Y, use Bayes' Rule to compute the posterior

    f(θ | Y) = f(Y | θ) f(θ) / c,

where f(Y | θ) is the likelihood of Y under θ and c is a normalizing constant.

Estimate θ by the mean (or mode) of the posterior
◮ f(θ | Y) is usually very complex
◮ Use Markov chain Monte Carlo (MCMC) to estimate the mean (see HW 2)

Bayesian Parameter Estimation, Continued

Simple example (see refresher handout):
◮ Goal: estimate the success probability θ of a Bernoulli distribution
◮ Assume a Beta(α, β) prior on θ
◮ Observe n Bernoulli trials with Y successes
◮ Y | θ has a Binomial(n, θ) likelihood
◮ The posterior, given Y = y, is Beta(α + y, β + n − y) (Beta is the "conjugate prior" to the binomial)
◮ θ̂ = mean of the posterior = (α + y)/(α + β + n)

Note: maximum likelihood is a special case of Bayesian estimation (with a uniform prior and MAP estimation).
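A minimal sketch of the gamma method-of-moments estimators from the slide (data synthetic): α̂ = X̄n²/s²n and λ̂ = X̄n/s²n.

```python
import numpy as np

rng = np.random.default_rng(9)
data = rng.gamma(shape=3.0, scale=1.0 / 1.5, size=5_000)  # true alpha=3, lambda=1.5

xbar = data.mean()
s2 = data.var(ddof=1)        # sample variance
alpha_hat = xbar ** 2 / s2
lam_hat = xbar / s2
print(f"alpha-hat = {alpha_hat:.2f}, lambda-hat = {lam_hat:.2f}")  # ~3.0, ~1.5
```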
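A minimal sketch of the Beta-Binomial example (prior and data values are mine). The conjugate posterior is available in closed form here, and a tiny Metropolis sampler recovers the same posterior mean, illustrating the MCMC route used when the posterior is intractable.

```python
import numpy as np

alpha, beta, n, y = 2.0, 2.0, 50, 18  # Beta(2,2) prior; 18 successes in 50 trials

# Closed form: posterior is Beta(alpha + y, beta + n - y)
print("closed-form posterior mean:", round((alpha + y) / (alpha + beta + n), 4))

def log_post(theta):  # unnormalized log posterior
    if not 0.0 < theta < 1.0:
        return -np.inf
    return (alpha + y - 1.0) * np.log(theta) + (beta + n - y - 1.0) * np.log(1.0 - theta)

rng = np.random.default_rng(0)
theta, samples = 0.5, []
for _ in range(50_000):
    proposal = theta + rng.normal(0.0, 0.1)  # symmetric random-walk step
    if np.log(rng.uniform()) < log_post(proposal) - log_post(theta):
        theta = proposal                     # accept
    samples.append(theta)
print("Metropolis posterior mean :", round(float(np.mean(samples[5_000:])), 4))
```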
