
Statistics for Machine Learning. Prof. Seungchul Lee, Industrial AI Lab.


  1. Statistics for Machine Learning. Prof. Seungchul Lee, Industrial AI Lab.

  2. Statistics and Probability
     [Figure: diagram linking "data" and "model", with arrows labeled "statistics" and "probability"]

  3. Populations and Samples
     • A population includes all the elements from a set of data
     • A parameter is a quantity computed from a population
       – mean, $\mu$
       – variance, $\sigma^2$
     • A sample is a subset of the population
       – one or more observations
     • A statistic is a quantity computed from a sample
       – sample mean, $\bar{x}$
       – sample variance, $s^2$
       – sample correlation, $S_{xy}$

  4. How to Generate Random Numbers
     • Data are sampled from a population, process, or generative model
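As a minimal sketch of sampling from a generative model, the snippet below draws pseudo-random numbers with NumPy. The two models (a standard normal and a uniform on [0, 1)), the sample size, and the seed are assumptions chosen for illustration; the slides do not say which generator or distributions were used.

```python
import numpy as np

# Seeded generator so the draws are reproducible (the seed value is arbitrary).
rng = np.random.default_rng(0)

# Hypothetical generative models: a standard normal and a uniform on [0, 1).
normal_draws = rng.normal(loc=0.0, scale=1.0, size=1000)
uniform_draws = rng.uniform(low=0.0, high=1.0, size=1000)

# Each array is one sample (a set of realizations) from its model.
print(normal_draws[:3], uniform_draws[:3])
```

Changing the seed gives a different set of realizations from the same model.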

  5. Histogram
     • Graphical representation of the data distribution ⇒ gives a rough sense of the density of the data
     [Figure: histogram, with counts/frequency on the vertical axis and bins on the horizontal axis]
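The binning behind a histogram can be sketched with `numpy.histogram`. The data here are synthetic normal draws and the choice of 20 bins is arbitrary; both are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=1000)  # stand-in data set

# counts[i] is the number of points in [edges[i], edges[i+1]);
# the last bin also includes its right edge.
counts, edges = np.histogram(data, bins=20)

# density=True rescales the bar heights so the total bar area is 1,
# which makes the histogram comparable to a probability density.
density, _ = np.histogram(data, bins=20, density=True)

print(counts)
print(np.round(density, 3))
```

Counts answer "how many points per bin", while the density version is the one to compare against a probability density function.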

  6. Inference
     • The true population or process is modeled probabilistically
     • Sampling supplies us with realizations from the probability model
     • We compute a statistic from those realizations, but must recognize that we could just as easily have obtained a different set of realizations

  7. Inference

  8. Inference
     • We want to infer the characteristics of the true probability model from our one sample
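The point of the Inference slides, that a statistic computed from one sample is itself random, can be illustrated by repeating the sampling. The normal model with mean 5 and standard deviation 2, and the helper name `sample_mean`, are hypothetical choices for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mean(n):
    """Draw one sample of size n from the assumed model and return its mean."""
    return rng.normal(loc=5.0, scale=2.0, size=n).mean()

# Five independent samples of the same size give five different estimates
# of the same underlying population mean.
estimates = np.array([sample_mean(50) for _ in range(5)])
print(estimates)
```

In practice only one such estimate is available, which is why its variability has to be reasoned about rather than observed.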

  9. The Law of Large Numbers
     • The sample mean converges to the population mean as the sample size gets large
     • Holds for any probability distribution with a finite mean
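A small numerical check of this convergence, assuming an exponential population with mean 2 (an arbitrary, deliberately non-Gaussian choice that is not taken from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
population_mean = 2.0  # mean of an exponential with scale 2 (assumed population)

draws = rng.exponential(scale=population_mean, size=100_000)

# running_mean[k - 1] is the sample mean of the first k draws.
running_mean = np.cumsum(draws) / np.arange(1, draws.size + 1)

for n in (10, 1_000, 100_000):
    print(n, running_mean[n - 1])
```

The printed means should drift toward 2 as the sample size grows, even though the individual draws are heavily skewed.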

  10. Sample Mean and Sample Size
      • Sample mean and sample variance
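For reference, the usual definitions of these two statistics for a sample of size $n$ are written out below. The notation $x_1, \dots, x_n$ is an assumption, and the $n - 1$ divisor is one common (unbiased) convention; the slide's own formulas are not preserved in the transcript.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2
```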

  11. The Central Limit Theorem
      • The sample mean (not the individual samples) is approximately normally distributed as the sample size $n \to \infty$
      • More samples provide more confidence (less uncertainty)
      • Note: this holds regardless of the population distribution, provided its variance is finite
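Stated in symbols, assuming independent, identically distributed draws with population mean $\mu$ and finite variance $\sigma^2$ (this notation is assumed here, not taken from the slide):

```latex
\mathbb{E}[\bar{x}] = \mu,
\qquad
\operatorname{Var}(\bar{x}) = \frac{\sigma^2}{n},
\qquad
\frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \;\xrightarrow{d}\; \mathcal{N}(0, 1)
\quad \text{as } n \to \infty
```

The $\sigma^2 / n$ term is what makes larger samples less uncertain: the spread of the sample mean shrinks like $1/\sqrt{n}$.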

  12. Uniform Distribution: $x \sim U(0, 1)$
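A quick sanity check of a uniform population, assuming the unit interval [0, 1): its theoretical mean is 1/2 and its variance is 1/12. The sample size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200_000)

# Compare the empirical values with the theoretical 1/2 and 1/12.
print(x.mean(), 1 / 2)
print(x.var(), 1 / 12)
```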

  13. Sample Size

  14. Variance Gets Smaller as $n$ Gets Larger
      • The distribution of the sample mean appears approximately Gaussian
      • Numerically demonstrate that the sample mean follows a Gaussian distribution
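One way to run such a numerical demonstration is sketched below, assuming a uniform population on [0, 1). The number of repetitions (`trials`) and the helper `sample_means` are hypothetical choices, not the slides' own code. For a Gaussian, about 68% of values fall within one standard deviation of the mean, so that fraction serves as a rough normality check.

```python
import numpy as np

rng = np.random.default_rng(0)
trials = 10_000  # number of repeated samples per sample size (arbitrary choice)

def sample_means(n):
    """Means of `trials` independent samples, each of n uniform draws on [0, 1)."""
    return rng.uniform(0.0, 1.0, size=(trials, n)).mean(axis=1)

for n in (1, 10, 100):
    means = sample_means(n)
    # Theory: the sample mean has mean 1/2 and variance (1/12) / n.
    sd_theory = np.sqrt((1 / 12) / n)
    within_one_sd = np.mean(np.abs(means - 0.5) < sd_theory)
    print(n, means.var(), sd_theory**2, within_one_sd)
```

The empirical variance should track (1/12)/n, and the within-one-standard-deviation fraction should move from the uniform value at n = 1 toward the Gaussian value as n grows.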

  15. Multivariate Statistics
      • $n$ observations $x_1, x_2, \cdots, x_n$

  16. Correlation of Two Random Variables
      • Correlation
        – Strength of the linear relationship between two variables, $x$ and $y$
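The standard (Pearson) definition of this quantity is given below, together with its usual sample estimate. The symbols $\rho_{xy}$ and $\hat{\rho}_{xy}$ and the paired observations $(x_i, y_i)$ are assumed notation; the slide's own formula is not preserved in the transcript.

```latex
\rho_{xy} = \frac{\operatorname{Cov}(x, y)}{\sigma_x \sigma_y},
\qquad -1 \le \rho_{xy} \le 1,
\qquad
\hat{\rho}_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```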

  17. Correlation of Two Random Variables
      • Assume [equation lost in extraction]

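  18. Correlation Coefficient
      • +1 → points lie close to a straight line with positive slope
      • −1 → points lie close to a straight line with negative slope
      • Indicates how close the data are to a straight line, but
        – gives no information on the magnitude of the slope
        – does not tell us anything about causality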
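The "no information on slope" point can be checked directly with `numpy.corrcoef`. The three lines below are made-up examples: exact straight lines with very different slopes still give a coefficient of +1 or −1.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 50)

# Exact straight lines with different slopes and intercepts.
r_steep = np.corrcoef(x, 10.0 * x + 1.0)[0, 1]
r_shallow = np.corrcoef(x, 0.1 * x + 1.0)[0, 1]
r_negative = np.corrcoef(x, -3.0 * x)[0, 1]

# Only the sign of the slope shows up in the coefficient, not its size.
print(r_steep, r_shallow, r_negative)
```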

  19. Correlation Coefficient

  20. Correlation Coefficient

  21. Correlation Coefficient Plot
      • Plots correlation coefficients among pairs of variables
      • http://rpsychologist.com/d3/correlation/
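The numbers behind such a plot are the pairwise correlation coefficients, which can be collected in a matrix. The sketch below builds three synthetic variables (`a`, `b`, `c` are hypothetical) and computes that matrix; the plotting step is left out.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
a = rng.normal(size=n)
b = a + 0.5 * rng.normal(size=n)   # constructed to be positively related to a
c = rng.normal(size=n)             # generated independently of a and b

# rowvar=False: each column is a variable, each row an observation.
R = np.corrcoef(np.column_stack([a, b, c]), rowvar=False)
print(np.round(R, 2))
```

The matrix is symmetric with ones on the diagonal, so a plot only needs the entries above (or below) the diagonal.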

  22. Covariance Matrix
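A minimal sketch of computing a sample covariance matrix, assuming the data are arranged with one observation per row and one variable per column (the layout, the synthetic data, and the $n - 1$ divisor are assumptions; the slide's own definition is not preserved):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))      # 200 observations of 3 variables (synthetic)

# Center each column, then average the outer products with an n - 1 divisor.
Xc = X - X.mean(axis=0)
S_manual = Xc.T @ Xc / (X.shape[0] - 1)

# np.cov uses the same n - 1 divisor by default; rowvar=False treats
# columns as variables.
S_numpy = np.cov(X, rowvar=False)

print(np.round(S_numpy, 3))
```

The diagonal holds the sample variances of the individual variables, and the off-diagonal entries hold the pairwise covariances.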
