CS626 Data Analysis and Simulation


  1. CS626 Data Analysis and Simulation
     Instructor: Peter Kemper, R 104A, phone 221-3462, email: kemper@cs.wm.edu
     Today: Recap before midterm 1

  2. Big Picture: Model-based Analysis of Systems
     [Diagram: the modeling cycle. Labels: portion/facet of the real world, perception,
     real-world problem, description, transfer/transformation, formal model (probability
     model, stochastic process, rewards, solution), qualitative and quantitative properties,
     formal / computer-aided analysis, presentation, decision, solution to the real-world
     problem.]

  3. Reminder
     This is not a pipe! ... and this is not a serpentine accumulator in a production line!
     [Slide shows pictures: a model or image of a thing is not the thing itself.]

  4. System - Model - Study
     Model vs. system:
     - Model: a largely simplified formal/mathematical/stochastic model, implemented in
       software in a fully controlled environment.
     - System: a set of physical devices interacting in space-time in a largely
       uncontrolled, not fully understood environment.
     A model
     - includes some of the rules of how the system operates and excludes others,
     - includes some aspects of the real world as random variables, ignores others or
       assumes them to be constant,
     - is parameterized with respect to certain design variables.
     A study
     - has an objective, a clear question,
     - delivers values that are probabilities, like R(0,t) (interpretation?),
     - evaluates the effects of different design choices.

  5. CS 626 Topics
     From data to stochastic input models:
     - input modeling,
     - probability, distributions,
     - exploratory data analysis, statistical tests,
     - stochastic processes, Markov processes: DTMC, CTMC,
     - phase-type distributions, MAPs, MAP fitting,
     - tools: R for data analysis, the KPC toolbox for MAP fitting.
     Simulation modeling:
     - simulation,
     - output data analysis,
     - verification, validation,
     - trace-driven simulation,
     - debugging of simulation models,
     - tools for simulation: Mobius (+ Traviando).
     Applications:
     - reliability analysis, dependability modeling of a LEO satellite,
     - modeling traffic in computer networks,
     - emulation: testing, debugging, training in automated material handling systems.

  6. From Data to Stochastic Input Models
     Probability:
     - axiomatic definition,
     - frequentist definition.

  7. Frequency Definition of Probability
     If our experiment is repeated over and over again, then the proportion of time that
     event E occurs will just be P(E).
     Frequency definition of probability:
         P(E) = lim_{m -> infinity} m(E) / m
     where m(E) is the number of times event E occurs and m is the number of trials.
     Notes:
     - The random experiment can be repeated under identical conditions.
     - If it is repeated indefinitely, the relative frequency of occurrence of an event
       converges to a constant.
     - The law of large numbers states that this limit does exist.
     - For small m, the relative frequency m(E)/m can show strong fluctuations.
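     As a small illustration (not from the original deck): a minimal R sketch, using R since
     the course lists it as its data-analysis tool, that estimates P(E) by the relative
     frequency m(E)/m for a hypothetical experiment of rolling a fair die, with E = "the
     roll is a 6", so P(E) = 1/6.

       # Relative frequency m(E)/m converging to P(E) = 1/6 (hypothetical die example)
       set.seed(1)
       m       <- 10000                             # number of trials
       rolls   <- sample(1:6, m, replace = TRUE)    # repeated identical experiments
       relfreq <- cumsum(rolls == 6) / seq_len(m)   # m(E)/m after each trial

       plot(relfreq, type = "l", xlab = "m (number of trials)", ylab = "m(E)/m")
       abline(h = 1/6, lty = 2)                     # the limiting value P(E)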

  8. Axiomatic Definition of Probability
     Definition: For each event E of the sample space S, we assume that a number P(E) is
     defined that satisfies Kolmogorov's axioms:
     1) 0 <= P(E) <= 1,
     2) P(S) = 1,
     3) for any sequence of mutually exclusive events E_1, E_2, ...,
        P(E_1 ∪ E_2 ∪ ...) = P(E_1) + P(E_2) + ... .

  9. Outline on Problem Solving (Goodman & Hedetniemi 77)
     1) Identify the sample space S.
        - All elements must be mutually exclusive and collectively exhaustive.
        - All possible outcomes of the experiment should be listed separately.
        (Root of "tricky" problems: often ambiguity or an inexact formulation of the model
        of a physical situation.)
     2) Assign probabilities
        - to all elements of S, consistent with Kolmogorov's axioms.
        (In practice: estimates based on experience, analysis, or common assumptions.)
     3) Identify events of interest.
        - Recast statements as subsets of S.
        - Use laws (the algebra of events) for simplifications.
        - Use visualizations for clarification.
     4) Compute the desired probabilities.
        - Use the axioms and laws; it is often helpful to express the event of interest as
          a union of mutually exclusive events and sum up their probabilities.
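     A minimal R sketch of these four steps on a hypothetical example (two fair dice,
     event E = "the sum is 7"); the example itself is not from the deck:

       S <- expand.grid(d1 = 1:6, d2 = 1:6)   # 1) sample space: 36 mutually exclusive outcomes
       p <- rep(1/36, nrow(S))                # 2) assign probabilities (equally likely)
       E <- with(S, d1 + d2 == 7)             # 3) event of interest as a subset of S
       sum(p[E])                              # 4) sum over mutually exclusive outcomes: 1/6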

  10. More Relations
     What is the probability of a union of events? What is the probability of a union of a
     set of events? Is there a better way to calculate this? The sum of disjoint products
     (SDP) formula.
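     The formulas referred to here appeared as images in the deck; for reference, the
     standard identities are sketched below in LaTeX (not copied from the slide):

       % union of two events (inclusion-exclusion)
       P(E \cup F) = P(E) + P(F) - P(EF)

       % union of n events (inclusion-exclusion)
       P\Big(\bigcup_{i=1}^{n} E_i\Big) = \sum_{i} P(E_i) - \sum_{i<j} P(E_i E_j)
           + \sum_{i<j<k} P(E_i E_j E_k) - \dots + (-1)^{n+1} P(E_1 E_2 \cdots E_n)

       % sum of disjoint products: rewrite the union as a union of mutually exclusive events
       P\Big(\bigcup_{i=1}^{n} E_i\Big) = P(E_1) + P(\bar{E}_1 E_2) + P(\bar{E}_1 \bar{E}_2 E_3)
           + \dots + P(\bar{E}_1 \cdots \bar{E}_{n-1} E_n)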

  11. Conditional Probabilities
     [Venn diagram: event E given that F happens; within F, only the overlap EF remains
     possible.]
     Definition: The conditional probability of E given F is
         P(E | F) = P(EF) / P(F)
     if P(F) > 0, and it is undefined otherwise.
     Interpretation: Given that F has happened, only outcomes in EF are still possible for
     E, so the original probability P(EF) is scaled by 1/P(F).
     Multiplication rule: P(EF) = P(F) P(E | F).
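     A minimal R sketch of the definition, estimating P(E | F) = P(EF)/P(F) by simulation
     for a hypothetical example (two dice, E = "the sum is at least 10", F = "the first die
     shows 6", so P(E | F) = 1/2):

       set.seed(9)
       d1 <- sample(1:6, 1e5, replace = TRUE)
       d2 <- sample(1:6, 1e5, replace = TRUE)
       sumHigh  <- (d1 + d2) >= 10                  # event E
       firstSix <- d1 == 6                          # event F

       mean(sumHigh & firstSix) / mean(firstSix)    # P(EF) / P(F), approximately 0.5
       mean(sumHigh[firstSix])                      # same: frequency of E among trials where F occurred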

  12. Independent Events
     Definition: Two events E and F are independent if P(EF) = P(E) P(F).
     This also means P(E | F) = P(E) when P(F) > 0.
     In English: E and F are independent if knowledge that F has occurred does not affect
     the probability that E occurs.
     Notes:
     - If E and F are independent, then so are E and F^c, E^c and F, and E^c and F^c.
     - The definition generalizes from 2 to n events; e.g. for n = 3, every subset must be
       independent.
     - Mutually exclusive vs. independent: these are different properties.

  13. About Independent Events
     Venn diagrams: for events A and B that are neither empty nor equal to S,
     1) if A ⊂ B, then A and B cannot be independent;
     2) if A ∩ B = ∅, then A and B cannot be independent.
     Tree diagrams of sequential sample spaces: throw a coin twice. The joint sample space
     is the cross product of the individual sample spaces: (H,H), (H,T), (T,H), (T,T).
     The first and second throws are independent.
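     A minimal R sketch of the coin example, building the joint sample space as a cross
     product and checking the independence of the two throws (the original slide used a
     tree diagram instead):

       S <- expand.grid(first = c("H", "T"), second = c("H", "T"))  # (H,H),(H,T),(T,H),(T,T)
       p <- rep(1/4, nrow(S))                                       # fair coin, equally likely

       P_first_H  <- sum(p[S$first == "H"])                         # 1/2
       P_second_H <- sum(p[S$second == "H"])                        # 1/2
       P_both_H   <- sum(p[S$first == "H" & S$second == "H"])       # 1/4 = P_first_H * P_second_H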

  14. Joint and Pairwise Independence
     A ball is drawn from an urn containing four balls numbered 1, 2, 3, 4. Then we have
     events that are pairwise independent, but not jointly independent.
     A sequence of experiments results in either a success or a failure, where E_i, i >= 1,
     denotes a success in experiment i. If for all i_1, i_2, ..., i_n
         P(E_{i_1} E_{i_2} ... E_{i_n}) = P(E_{i_1}) P(E_{i_2}) ... P(E_{i_n}),
     we say the sequence of experiments consists of independent trials.
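     The concrete events were given on the slide itself; the classic construction of this
     kind is sketched below in R. The choice A = {1,2}, B = {1,3}, C = {1,4} is an
     assumption for illustration, not necessarily the deck's exact events.

       p <- rep(1/4, 4)                       # P({k}) = 1/4 for each ball k
       A <- c(1, 2); B <- c(1, 3); C <- c(1, 4)
       P <- function(E) sum(p[E])             # probability of an event given as a set of balls

       P(intersect(A, B)) == P(A) * P(B)      # TRUE: 1/4 = 1/2 * 1/2
       P(intersect(A, C)) == P(A) * P(C)      # TRUE
       P(intersect(B, C)) == P(B) * P(C)      # TRUE: pairwise independent
       ABC <- Reduce(intersect, list(A, B, C))
       P(ABC) == P(A) * P(B) * P(C)           # FALSE: 1/4 != 1/8, not jointly independent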

  15. Independence Is a Very Important Property
     Independence simplifies calculations significantly, which makes it a very popular
     assumption in
     - theoretical results,
     - input modeling, workload modeling,
     - statistical tests,
     - output analysis of simulation models: confidence intervals for the estimate of the
       mean,
     - ...
     However, independence need not be present in real data:
     - data traffic in networks is often correlated,
     - so is the output data of a (simulated) system, i.e. the response of a system to some
       workload.
     Ways to investigate independence (see the sketch below):
     - graphics: correlation plot;
     - tests: chi-square test for vectors, rank von Neumann test, runs test;
     - see Law/Kelton, Chap. 6.3 and Chap. 7.4.1.
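     A minimal R sketch of such checks on simulated data. It uses acf() as the correlation
     plot and a Ljung-Box test (Box.test) as a stand-in for the tests listed above, which
     is an assumption on my part rather than the deck's exact procedure:

       set.seed(42)
       x_iid <- rexp(500, rate = 1)                         # independent observations
       x_dep <- as.numeric(arima.sim(list(ar = 0.8), 500))  # autocorrelated observations

       acf(x_iid, main = "correlation plot: i.i.d. data")   # correlations near zero beyond lag 0
       acf(x_dep, main = "correlation plot: AR(1) data")    # slowly decaying correlations

       Box.test(x_iid, lag = 10, type = "Ljung-Box")        # large p-value: no evidence against independence
       Box.test(x_dep, lag = 10, type = "Ljung-Box")        # tiny p-value: dependence detected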

  16. Bayes' Formula
     Let F_1, F_2, ..., F_n be events of S, all mutually exclusive and collectively
     exhaustive.
     Theorem of total probability (also called the rule of elimination):
         P(E) = P(E | F_1) P(F_1) + ... + P(E | F_n) P(F_n).
     Bayes' formula helps us to determine which F_j happened, given that we observed E:
         P(F_j | E) = P(E | F_j) P(F_j) / ( P(E | F_1) P(F_1) + ... + P(E | F_n) P(F_n) ).
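     A small numeric sketch in R with hypothetical values (three mutually exclusive,
     collectively exhaustive causes F_1, F_2, F_3 and an observed event E; the numbers are
     made up purely for illustration):

       P_F   <- c(0.5, 0.3, 0.2)     # prior probabilities P(F_j), sum to 1
       P_EgF <- c(0.01, 0.05, 0.20)  # conditional probabilities P(E | F_j)

       P_E   <- sum(P_EgF * P_F)     # theorem of total probability
       P_FgE <- P_EgF * P_F / P_E    # Bayes' formula: P(F_j | E)
       P_FgE                         # posterior probabilities, sum to 1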

  17. Random Variable (RV)
     Definition: A random variable X on a probability space (S, F, P) is a function
     X : S -> R that assigns a real number X(s) to each sample point s ∈ S, such that for
     every real number x, the set of sample points {s | X(s) <= x} is an event, that is, a
     member of F.
     RVs can be discrete or continuous.
     More concepts:
     - cumulative distribution function,
     - density,
     - moments E[X^i], centralized moments, variance, skewness, kurtosis.
     Particular examples:
     - normal distribution,
     - Poisson distribution,
     - exponential distribution,
     - Pareto distribution.
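     A minimal R sketch connecting these concepts to R's built-in distribution functions
     (d* for the density, p* for the CDF, r* for sampling), using the exponential
     distribution as an arbitrary example:

       x    <- seq(0, 5, by = 0.01)
       dens <- dexp(x, rate = 2)         # density f(x)
       cdf  <- pexp(x, rate = 2)         # cumulative distribution function F(x) = P(X <= x)

       set.seed(7)
       X <- rexp(10000, rate = 2)        # sample from the RV
       mean(X)                           # estimate of E[X]   (exactly 1/2 for rate = 2)
       var(X)                            # estimate of Var(X) (exactly 1/4)
       mean((X - mean(X))^3) / sd(X)^3   # sample skewness: standardized third centralized moment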

  18. Parameterization of Distributions
     Parameters are of three basic types.
     Location:
     - specifies an x-axis location point of a distribution's range of values,
     - usually the midpoint (e.g. the mean of a normal distribution) or the lower end point
       of the distribution's range,
     - sometimes called a shift parameter, since changing its value shifts the distribution
       to the left or right, e.g. for Y = X + γ.
     Scale:
     - determines the scale (unit) of measurement of the values in the range of the
       distribution (e.g. the standard deviation σ of a normal distribution),
     - changing its value compresses or expands the distribution but does not alter its
       basic form, e.g. for Y = βX.
     Shape:
     - determines the basic form/shape of a distribution,
     - changing its value alters a distribution's properties (e.g. skewness) more
       fundamentally than a change in location or scale.
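     A minimal R sketch of the three parameter types, using a gamma distribution (whose
     "shape" argument is a genuine shape parameter) and arbitrarily chosen values for the
     shift γ and scale β:

       set.seed(3)
       X <- rgamma(10000, shape = 2, scale = 1)

       gamma_shift <- 5; beta_scale <- 3
       Y_loc   <- X + gamma_shift                       # location: Y = X + γ, same form, shifted right
       Y_scale <- beta_scale * X                        # scale: Y = βX, stretched, same basic form
       Y_shape <- rgamma(10000, shape = 8, scale = 1)   # different shape: skewness changes

       hist(X); hist(Y_loc); hist(Y_scale); hist(Y_shape)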

  19. Properties of Mean, Variance and Covariance
     For any random variables X, Y, Z and constant c:
     - X is a stochastic (random) variable with distribution function
       F_X(x) = P(X <= x) = ∫_{-∞}^{x} f_X(y) dy, where f_X is the density of X.
     - Expected value: E(X) = ∫ y f_X(y) dy.
     - E(cX) = c E(X) and E(X + Y) = E(X) + E(Y).
     - If X and Y are independent, i.e. P(X = x, Y = y) = P(X = x) P(Y = y), then
       E(XY) = E(X) E(Y).
     - Variance: var(X) = E((X - E(X))^2).
     - var(aX + b) = a^2 var(X).
     - var(X + Y) = var(X) + var(Y) + 2 cov(X, Y).
     - Covariance: cov(X, Y) = E((X - E(X))(Y - E(Y))).
     - Correlation: ρ(X, Y) = cov(X, Y) / (σ_X σ_Y).
     - If X and Y are independent, then cov(X, Y) = 0.
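     A quick numeric check of some of these identities in R on simulated data (a sketch:
     the equalities hold only up to sampling error here):

       set.seed(11)
       X <- rnorm(1e5)
       Y <- 0.5 * X + rnorm(1e5)            # a pair of correlated random variables
       a <- 2; b <- 3

       var(a * X + b)                       # approx. a^2 * var(X) = 4
       var(X + Y)                           # approx. var(X) + var(Y) + 2 * cov(X, Y)
       var(X) + var(Y) + 2 * cov(X, Y)
       cov(X, Y) / (sd(X) * sd(Y))          # correlation, same value as cor(X, Y)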

  20. Proposition 2.4
     X_1, ..., X_n are independently and identically distributed with expected value µ and
     variance σ^2.
     Confidence intervals for the estimate of the mean: the (1 - α) confidence interval
     about the estimate µ̂ of the mean can be expressed as
         µ̂ - t_{N-1}(1 - α/2) s / √N  <=  µ  <=  µ̂ + t_{N-1}(1 - α/2) s / √N
     where
     - t_{N-1}(1 - α/2) is the 100(1 - α/2)-th percentile of Student's t distribution with
       N - 1 degrees of freedom (values of this distribution can be found in tables),
     - s = √(s^2) is the sample standard deviation,
     - N is the number of observations.
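     A minimal R sketch of this interval on hypothetical data, computed once directly from
     the t quantile and once with t.test for comparison:

       set.seed(5)
       x     <- rexp(50, rate = 1)          # N = 50 i.i.d. observations
       N     <- length(x)
       alpha <- 0.05
       xbar  <- mean(x)                     # estimate of the mean
       s     <- sd(x)                       # sample standard deviation
       half  <- qt(1 - alpha / 2, df = N - 1) * s / sqrt(N)

       c(lower = xbar - half, upper = xbar + half)
       t.test(x, conf.level = 1 - alpha)$conf.int   # same interval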
