Statistical Machine Learning Lecture 03: Statistics Refresher

  1. Statistical Machine Learning Lecture 03: Statistics Refresher Kristian Kersting TU Darmstadt Summer Term 2020 K. Kersting based on Slides from J. Peters · Statistical Machine Learning · Summer Term 2020 1 / 64

  2. Today’s Objectives. Make you remember your sweetest high school dreams: statistics and probabilities. This topic is harder than most of the remaining chapters, but you will need it to continue! Covered topics: random variables (discrete and continuous); distributions (discrete and continuous); expected values and moments; joint distributions, conditional distributions, and independence.

  3. Outline: 1. Random Variables and Common Distributions (Random Variables, Discrete Distributions, Continuous Distributions); 2. Basic Rules of Probability; 3. Expectations, Variance and Moments; 4. Exponential Family; 5. Information and Entropy; 6. Wrap-Up.

  4. Section 1: Random Variables and Common Distributions.

  5. Random Variables. What is a random variable? It is a random number determined by chance; more formally, it is drawn according to a probability distribution. Typical random variables in statistical learning are input data, output data, and noise. What is a probability distribution? It describes the probability (density) that the random variable will be equal to a certain value. The probability distribution can be given by the physics of an experiment (e.g., throwing dice).

  6. Random Variables. Important concept: the data-generating model. E.g., what is the data-generating model for (i) throwing dice, (ii) regression, (iii) classification, (iv) visual perception? Problem: on which time scale is a distribution observed?

  7. Uniform Distribution. All data is equally probable within a bounded region $R$: $p(x) = \frac{1}{R}$, where the $R$ in the denominator denotes the size (length, area, or volume) of the region. The uniform distribution plays an important role in entropy methods and information theory.
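As a small illustration of the uniform density, the sketch below assumes the simplest case: the region is a one-dimensional interval $[a, b]$, so its size is $b - a$. The function name `uniform_pdf` and the bounds are hypothetical and chosen only for this example.

```python
def uniform_pdf(x, a, b):
    """Density of a uniform distribution on the interval [a, b] (assumed 1-D region)."""
    if a <= x <= b:
        return 1.0 / (b - a)  # constant density 1/R with R = b - a
    return 0.0                # zero density outside the region


a, b = 2.0, 6.0                 # hypothetical bounds, region size R = 4
density = uniform_pdf(3.0, a, b)
total_mass = density * (b - a)  # constant density times region size

print(density, total_mass)
```

Because the density is constant, multiplying it by the size of the region recovers a total probability of 1.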

  8. Discrete Distributions. The random variables take on discrete values. E.g., when throwing a die, the possible values form a finite set: $x_i \in \{1, 2, 3, 4, 5, 6\}$. E.g., the number of sand grains at the beach takes values in a countably infinite set: $x_i \in \mathbb{N}$.

  9. Discrete Distributions. The probabilities sum to 1: $\sum_i p(x_i) = 1$. Discrete distributions are particularly important in classification and decision making. A discrete distribution is described by a probability mass function (or frequency function), which is a normalized histogram.
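The "normalized histogram" view of a probability mass function can be sketched as follows. This is only an illustration under assumptions: it simulates a fair six-sided die with Python's `random` module, and the sample size and seed are arbitrary.

```python
import random
from collections import Counter

random.seed(0)                               # fixed seed so the run is repeatable
rolls = [random.randint(1, 6) for _ in range(10_000)]

counts = Counter(rolls)                      # raw histogram: face -> count
pmf = {face: counts[face] / len(rolls)       # normalize counts by the sample size
       for face in range(1, 7)}

for face, p in pmf.items():
    print(face, round(p, 3))
```

Dividing each count by the number of rolls guarantees that the estimated probabilities sum to 1, and for a fair die each estimate should land close to 1/6.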

  10. Bernoulli Distribution. A Bernoulli random variable only takes on two values, for example 0 and 1: $x \in \{0, 1\}$, with $p(x = 1 \mid \mu) = \mu$. The distribution is $\mathrm{Bern}(x \mid \mu) = \mu^{x} (1 - \mu)^{1 - x}$, with $\mathbb{E}[x] = \mu$ and $\mathrm{var}[x] = \mu (1 - \mu)$. The only parameter of a Bernoulli distribution is $\mu$, i.e., it is completely defined using only this parameter.
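The Bernoulli mean and variance can be checked by summing over the two possible values. The following is a minimal sketch; the function name `bernoulli_pmf` and the value $\mu = 0.3$ are assumptions made for illustration.

```python
def bernoulli_pmf(x, mu):
    """Bern(x | mu) = mu**x * (1 - mu)**(1 - x) for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("a Bernoulli variable only takes the values 0 and 1")
    return mu**x * (1 - mu) ** (1 - x)


mu = 0.3  # hypothetical parameter value

# Expectation and variance as explicit sums over the support {0, 1}.
mean = sum(x * bernoulli_pmf(x, mu) for x in (0, 1))
var = sum((x - mean) ** 2 * bernoulli_pmf(x, mu) for x in (0, 1))

print(mean, var)
```

The sums should agree with the closed forms $\mu$ and $\mu(1 - \mu)$ up to floating-point rounding.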

  11. Bernoulli Distribution. Bernoulli distributions are often modeled with sigmoidal nonlinearities in statistical learning.
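One common way to read this statement is logistic regression, where a sigmoid maps a real-valued activation to the Bernoulli parameter $\mu$. The slide does not name a specific model, so the linear activation, the function names, and the numbers below are assumptions used purely as a sketch.

```python
import math


def sigmoid(a):
    """Numerically stable logistic sigmoid, mapping any real a into [0, 1]."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    z = math.exp(a)  # avoids overflow of exp(-a) for very negative a
    return z / (1.0 + z)


def bernoulli_param(features, weights, bias):
    """Hypothetical model: mu = sigmoid(w . x + b), as in logistic regression."""
    activation = sum(w * f for w, f in zip(weights, features)) + bias
    return sigmoid(activation)


mu = bernoulli_param(features=[1.0, -2.0], weights=[0.5, 0.25], bias=0.1)
print(mu)  # a valid Bernoulli parameter strictly between 0 and 1
```

The sigmoid is a convenient choice here because its output can always be interpreted as a probability, whatever the value of the activation.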

  12. Binomial Distribution. A binomial variable counts the successes in a sequence of $N$ repeated Bernoulli trials. One interpretation is "what is the probability of getting $m \in \mathbb{N}$ heads in $N$ trials?" The distribution is $\mathrm{Bin}(m \mid N, \mu) = \binom{N}{m} \mu^{m} (1 - \mu)^{N - m}$, with $\mathbb{E}[m] = \sum_{m=0}^{N} m \, \mathrm{Bin}(m \mid N, \mu) = N \mu$ and $\mathrm{var}[m] = \sum_{m=0}^{N} (m - \mathbb{E}[m])^{2} \, \mathrm{Bin}(m \mid N, \mu) = N \mu (1 - \mu)$.
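The sums defining the binomial mean and variance can be evaluated directly. This is a minimal sketch; the name `binomial_pmf` and the values $N = 10$, $\mu = 0.25$ are illustrative choices, and `math.comb` requires Python 3.8 or newer.

```python
import math


def binomial_pmf(m, n, mu):
    """Bin(m | N, mu) = C(N, m) * mu**m * (1 - mu)**(N - m)."""
    return math.comb(n, m) * mu**m * (1 - mu) ** (n - m)


n, mu = 10, 0.25  # illustrative parameters
pmf = [binomial_pmf(m, n, mu) for m in range(n + 1)]

# Mean and variance as explicit sums over m = 0, ..., N.
mean = sum(m * p for m, p in enumerate(pmf))
var = sum((m - mean) ** 2 * p for m, p in enumerate(pmf))

print(sum(pmf), mean, var)
```

For these parameters the closed forms give $N\mu = 2.5$ and $N\mu(1 - \mu) = 1.875$, which the sums should reproduce up to rounding.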

  13. Binomial Distribution. The binomial distribution is completely defined by $N$ (the number of samples) and $\mu$ (the probability that one sample is equal to 1). Binomial variables are important, for example, in density estimation: "What is the probability that $k$ out of $n$ data points fall into region $R$?"

  14. Binomial Distribution. [Figure: plot of $\mathrm{Bin}(m \mid 10, 0.25)$.]

  15. Multinoulli Distribution. Multinoulli variables, also called categorical variables in some literature, are a generalization of Bernoulli variables to multiple outputs (e.g., multiple classes). They use a 1-of-$K$ coding scheme (also called one-hot encoding), e.g., $x = (0, 0, 1, 0, 0, 0)^{\top}$. The distribution is $p(x \mid \mu) = \prod_{k=1}^{K} \mu_k^{x_k}$, with $\forall k: \mu_k \geq 0$ and $\sum_{k=1}^{K} \mu_k = 1$. Its expectation is $\mathbb{E}[x \mid \mu] = \sum_{x} p(x \mid \mu) \, x = (\mu_1, \ldots, \mu_K)^{\top}$, and it is normalized: $\sum_{x} p(x \mid \mu) = \sum_{k=1}^{K} \mu_k = 1$.
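The normalization and expectation of a one-hot coded variable can be verified by enumerating all $K$ one-hot vectors. The sketch below is illustrative only: the function name `multinoulli_pmf` and the parameter vector are assumptions.

```python
def multinoulli_pmf(x, mu):
    """p(x | mu) = prod_k mu_k**x_k for a one-hot vector x."""
    p = 1.0
    for x_k, mu_k in zip(x, mu):
        p *= mu_k**x_k  # only the active component contributes its mu_k
    return p


mu = [0.1, 0.2, 0.3, 0.4]  # hypothetical class probabilities, summing to 1
K = len(mu)

# All K one-hot vectors, e.g. [0, 0, 1, 0] for class index 2.
one_hots = [[1 if i == k else 0 for i in range(K)] for k in range(K)]

total = sum(multinoulli_pmf(x, mu) for x in one_hots)
expectation = [sum(multinoulli_pmf(x, mu) * x[i] for x in one_hots)
               for i in range(K)]

print(total, expectation)
```

Since exactly one component of $x$ is 1, the product collapses to the single $\mu_k$ of the active class, which is why the expectation equals the parameter vector itself.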

  16. Multinomial Distribution. $N$ independent trials can each result in one of $K$ types of outcome. What is the probability that in $N$ trials, the frequencies of the $K$ classes are $m_1, m_2, \ldots, m_K$? The distribution is $\mathrm{Mult}(m_1, m_2, \ldots, m_K \mid \mu, N) = \binom{N}{m_1 \, m_2 \, \ldots \, m_K} \prod_{k=1}^{K} \mu_k^{m_k}$, with $\mathbb{E}[m_k] = N \mu_k$, $\mathrm{var}[m_k] = N \mu_k (1 - \mu_k)$, and $\mathrm{cov}[m_j, m_k] = -N \mu_j \mu_k$ for $j \neq k$.
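The multinomial moments can be checked by brute-force enumeration for a small case. This is a sketch under assumptions: $K = 3$, $N = 4$, and the parameter vector are arbitrary, and `multinomial_pmf` is a hypothetical helper name.

```python
import math
from itertools import product


def multinomial_pmf(counts, mu):
    """Mult(m_1..m_K | mu, N) = N! / (m_1! ... m_K!) * prod_k mu_k**m_k."""
    coeff = math.factorial(sum(counts))
    for m in counts:
        coeff //= math.factorial(m)  # exact integer division at every step
    p = float(coeff)
    for m, mu_k in zip(counts, mu):
        p *= mu_k**m
    return p


N, mu = 4, [0.2, 0.3, 0.5]  # illustrative values
K = len(mu)

# Enumerate every count vector (m_1, ..., m_K) whose entries sum to N.
outcomes = [c for c in product(range(N + 1), repeat=K) if sum(c) == N]
probs = {c: multinomial_pmf(c, mu) for c in outcomes}

mean = [sum(p * c[k] for c, p in probs.items()) for k in range(K)]
var_0 = sum(p * (c[0] - mean[0]) ** 2 for c, p in probs.items())
cov_01 = sum(p * (c[0] - mean[0]) * (c[1] - mean[1]) for c, p in probs.items())

print(mean, var_0, cov_01)
```

The negative covariance reflects the constraint that the counts must sum to $N$: more outcomes of one class leave fewer trials for the others.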

  17. Multinomial Distribution. The multinomial distribution plays an important role in multi-class classification ($N = 1$).

  18. Poisson Distribution. The Poisson distribution is the limit of the binomial distribution as the number of trials $N$ goes to infinity and the probability of success on each trial, $\mu$, goes to zero, such that $N \mu = \lambda$ stays constant: $p(m \mid \lambda) = \frac{\lambda^{m}}{m!} e^{-\lambda}$, where $m$ is the number of "successes". For example, Poisson distributions are an important model for the firing characteristics of biological neurons. They are also used as an approximation to binomial variables with a small success probability $\mu$.
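The limiting relationship between the binomial and Poisson distributions can be observed numerically. The sketch below holds $N\mu = \lambda$ fixed while $N$ grows; the values $\lambda = 2$ and $m = 3$ and the helper names are assumptions chosen for illustration.

```python
import math


def binomial_pmf(m, n, mu):
    """Bin(m | N, mu) = C(N, m) * mu**m * (1 - mu)**(N - m)."""
    return math.comb(n, m) * mu**m * (1 - mu) ** (n - m)


def poisson_pmf(m, lam):
    """p(m | lambda) = lambda**m / m! * exp(-lambda)."""
    return lam**m / math.factorial(m) * math.exp(-lam)


lam, m = 2.0, 3  # illustrative rate and number of successes
errors = []
for n in (10, 100, 1000, 10000):
    # Keep N * mu = lambda fixed while N grows and mu shrinks.
    gap = abs(binomial_pmf(m, n, lam / n) - poisson_pmf(m, lam))
    errors.append(gap)
    print(n, gap)
```

The gap between the two probabilities should shrink as $N$ increases, which is the sense in which the Poisson distribution approximates a binomial with many trials and a small success probability.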

  19. Poisson Distribution. Example: what is the probability of firing of a Purkinje neuron in the cerebellum in a 10 ms time interval? We know that the average firing rate of these neurons is about 40 Hz, so $\lambda = 40\,\mathrm{Hz} \times 0.01\,\mathrm{s} = 0.4$. Note that this approximation only works if the number of spikes is low in the given time interval.
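The neuron example can be worked through numerically. The slide does not spell out the final answer, so the sketch below makes one assumption explicit: "probability of firing" is read as the probability of at least one spike in the 10 ms window.

```python
import math

rate_hz = 40.0   # average firing rate from the example
window_s = 0.01  # 10 ms observation window
lam = rate_hz * window_s  # expected number of spikes in the window

p_no_spike = math.exp(-lam)              # Poisson p(m = 0 | lambda)
p_exactly_one = lam * math.exp(-lam)     # Poisson p(m = 1 | lambda)
p_at_least_one = 1.0 - p_no_spike        # assumed meaning of "firing"

print(lam, p_no_spike, p_exactly_one, p_at_least_one)
```

Under this reading, the neuron fires at least once in roughly a third of such windows, and most of that probability comes from windows with exactly one spike.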

  20. Continuous Distributions. The random variables take on continuous values. Continuous distributions can be seen as discrete distributions where the number of discrete values goes to infinity while the probability of each value goes to zero. A continuous distribution is described by a probability density function, which integrates to 1: $\int_{-\infty}^{+\infty} p(x) \, dx = 1$. Continuous distributions are particularly important in regression and unsupervised learning. A lot of machine learning is centered around how to better model a density function.

  21. Continuous Distributions. Example of a probability density function $p(x)$: $P(a < x < b) = \int_{a}^{b} p(x) \, dx$.
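The interval probability $P(a < x < b)$ can be approximated by numerical integration of a density. As an assumption for illustration, the sketch below uses the standard normal density and a simple midpoint rule; neither choice comes from the slide, and the infinite integration range is truncated to $[-10, 10]$.

```python
import math


def normal_pdf(x):
    """Standard normal density (an illustrative choice of p(x))."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


total = integrate(normal_pdf, -10.0, 10.0)  # approximates the integral over the real line
prob = integrate(normal_pdf, -1.0, 1.0)     # P(-1 < x < 1)

print(total, prob)
```

The first integral should come out very close to 1, and the second gives the familiar value of about 0.68 for one standard deviation around the mean.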
