SLIDE 1

Statistical Machine Learning

Lecture 03: Statistics Refresher

Kristian Kersting TU Darmstadt

Summer Term 2020

  • K. Kersting, based on slides from J. Peters · Statistical Machine Learning · Summer Term 2020

SLIDE 2

Today’s Objectives

Make you remember your sweetest high school dreams: statistics & probabilities. This topic is harder than most of the remaining chapters, but you will need it to continue!

Covered topics:

  • Random variables: discrete & continuous
  • Distributions: discrete & continuous
  • Expected values and moments
  • Joint distributions, conditional distributions, independence

SLIDE 3

Outline

  • 1. Random Variables and Common Distributions

    • Random Variables
    • Discrete Distributions
    • Continuous Distributions

  • 2. Basic Rules of Probability
  • 3. Expectations, Variance and Moments
  • 4. Exponential Family
  • 5. Information and Entropy
  • 6. Wrap-Up
SLIDE 4
  • 1. Random Variables and Common Distributions

Outline

  • 1. Random Variables and Common Distributions

    • Random Variables
    • Discrete Distributions
    • Continuous Distributions

  • 2. Basic Rules of Probability
  • 3. Expectations, Variance and Moments
  • 4. Exponential Family
  • 5. Information and Entropy
  • 6. Wrap-Up
SLIDE 5
  • 1. Random Variables and Common Distributions : Random Variables

Random Variables

What is a random variable?

  • It is a random number determined by chance
  • More formally, it is drawn according to a probability distribution
  • Typical random variables in statistical learning: input data, output data, noise

What is a probability distribution?

  • It describes the probability (density) that the random variable will be equal to a certain value
  • The probability distribution can be given by the physics of an experiment (e.g., throwing dice)

SLIDE 6
  • 1. Random Variables and Common Distributions : Random Variables

Random Variables

Important concept: The data generating model

E.g., what is the data generating model for: i) throwing dice, ii) regression, iii) classification, iv) visual perception?

Problem: On which time scale is a distribution observed?

SLIDE 7
  • 1. Random Variables and Common Distributions : Random Variables

Uniform Distribution

All data is equally probable within a bounded region R:

p(x) = 1/|R|

The uniform distribution plays an important role in entropy methods and information theory.

SLIDE 8
  • 1. Random Variables and Common Distributions : Discrete Distributions

Discrete Distributions

The random variables take on discrete values

E.g., when throwing a die, the possible values are (countably finite set):

xi ∈ {1, 2, 3, 4, 5, 6}

E.g., the number of sand grains at the beach (countably infinite set):

xi ∈ N

SLIDE 9
  • 1. Random Variables and Common Distributions : Discrete Distributions

Discrete Distributions

The probabilities sum to 1:

Σᵢ p(xᵢ) = 1

  • Discrete distributions are particularly important in classification and decision making
  • A discrete distribution is described by a probability mass function (or frequency function), which is a normalized histogram

SLIDE 10
  • 1. Random Variables and Common Distributions : Discrete Distributions

Bernoulli Distribution

A Bernoulli random variable only takes on two values, for example 0 and 1:

x ∈ {0, 1},    p(x = 1|µ) = µ

Bern(x|µ) = µ^x (1 − µ)^(1−x)

E[x] = µ,    var[x] = µ(1 − µ)

The only parameter of a Bernoulli distribution is µ, i.e., it is completely defined using only this parameter.
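As a quick check (a minimal sketch, not from the slides; the value µ = 0.3 is an arbitrary choice), sampling a Bernoulli variable reproduces the stated mean and variance:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 0.3                                    # illustrative value, not from the slides

x = rng.binomial(n=1, p=mu, size=100_000)   # a Bernoulli variable is a Binomial with N = 1

print(x.mean())   # ≈ E[x] = µ = 0.3
print(x.var())    # ≈ var[x] = µ(1 − µ) = 0.21
```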

SLIDE 11
  • 1. Random Variables and Common Distributions : Discrete Distributions

Bernoulli Distribution

Bernoulli distributions are often modeled with sigmoidal nonlinearities in statistical learning

SLIDE 12
  • 1. Random Variables and Common Distributions : Discrete Distributions

Binomial Distribution

A Binomial variable counts the successes in a sequence of N repeated Bernoulli trials. One interpretation is "what is the probability of getting m heads in N trials?"

Bin(m|N, µ) = (N choose m) µ^m (1 − µ)^(N−m)

E[m] = Σ_{m=0}^{N} m Bin(m|N, µ) = Nµ

var[m] = Σ_{m=0}^{N} (m − E[m])² Bin(m|N, µ) = Nµ(1 − µ)
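A small sketch of these formulas using scipy (my own code; it reuses the Bin(m|10, 0.25) parameters from the example shown two slides later):

```python
import numpy as np
from scipy.stats import binom

N, mu = 10, 0.25                     # same parameters as the Bin(m|10, 0.25) example
m = np.arange(N + 1)
pmf = binom.pmf(m, N, mu)            # Bin(m|N, µ) = (N choose m) µ^m (1 − µ)^(N−m)

print(pmf.sum())                                            # probabilities sum to 1
print((m * pmf).sum(), N * mu)                              # E[m] = Nµ = 2.5
print(((m - N * mu) ** 2 * pmf).sum(), N * mu * (1 - mu))   # var[m] = Nµ(1 − µ) = 1.875
```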

SLIDE 13
  • 1. Random Variables and Common Distributions : Discrete Distributions

Binomial Distribution

The Binomial distribution is completely defined by N, the number of samples, and µ, the probability that one sample is equal to 1.

Binomial variables are important, for example, in density estimation: "What is the probability that k out of n data points fall into region R?"

SLIDE 14
  • 1. Random Variables and Common Distributions : Discrete Distributions

Binomial Distribution

Figure: histogram of Bin(m|10, 0.25)

SLIDE 15
  • 1. Random Variables and Common Distributions : Discrete Distributions

Multinoulli Distribution

Multinoulli variables, also called categorical variables in some literature, are a generalization of binomial variables to multiple outputs (e.g., multiple classes).

1-of-K coding scheme (also called one-hot encoding):

x = (0, 0, 1, 0, 0, 0)⊺

p(x|µ) = Π_{k=1}^{K} µk^{xk},    with µk ≥ 0 for all k and Σ_{k=1}^{K} µk = 1

E[x|µ] = Σ_x p(x|µ) x = (µ1, . . . , µK)⊺

Σ_x p(x|µ) = Σ_{k=1}^{K} µk = 1
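A minimal sketch (my own, with an arbitrary µ) of the 1-of-K encoding and of E[x|µ] = µ:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.1, 0.2, 0.4, 0.2, 0.05, 0.05])   # illustrative µ with Σ_k µ_k = 1

# Draw categorical samples and encode them with the 1-of-K (one-hot) scheme
ks = rng.choice(len(mu), p=mu, size=100_000)
X = np.eye(len(mu))[ks]                           # each row looks like (0, 0, 1, 0, 0, 0)

print(X.mean(axis=0))                             # ≈ E[x|µ] = (µ1, …, µK)
```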

SLIDE 16
  • 1. Random Variables and Common Distributions : Discrete Distributions

Multinomial Distribution

Each of N independent trials can result in one of K types of outcome. What is the probability that in N trials the frequencies of the K classes are m1, m2, . . . , mK?

Mult(m1, m2, . . . , mK|µ, N) = (N choose m1, m2, . . . , mK) Π_{k=1}^{K} µk^{mk}

E[mk] = Nµk,    var[mk] = Nµk(1 − µk),    cov[mj, mk] = −Nµjµk  (j ≠ k)
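A short sampling check of these moments (my own sketch; N and µ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
N, mu = 20, np.array([0.5, 0.3, 0.2])             # illustrative parameters

M = rng.multinomial(N, mu, size=200_000)          # each row (m1, m2, m3) sums to N

print(M.mean(axis=0), N * mu)                     # E[m_k] = Nµ_k
print(M.var(axis=0), N * mu * (1 - mu))           # var[m_k] = Nµ_k(1 − µ_k)
print(np.cov(M[:, 0], M[:, 1])[0, 1], -N * mu[0] * mu[1])   # cov[m_j, m_k] = −Nµ_jµ_k
```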

SLIDE 17
  • 1. Random Variables and Common Distributions : Discrete Distributions

Multinomial Distribution

The multinomial distribution plays an important role in multi-class classification (N = 1)

SLIDE 18
  • 1. Random Variables and Common Distributions : Discrete Distributions

Poisson Distribution

The Poisson distribution is the limit of the binomial distribution where the number of trials N goes to infinity and the probability of success on each trial, µ, goes to zero, such that Nµ = λ is a constant:

p(m|λ) = (λ^m / m!) e^(−λ)

where m is the number of "successes". For example, Poisson distributions are an important model for the firing characteristics of biological neurons. They are also used as an approximation to binomial variables with small success probability.

SLIDE 19
  • 1. Random Variables and Common Distributions : Discrete Distributions

Poisson Distribution

Example: What is the probability that a Purkinje neuron in the cerebellum fires in a 10 ms time interval? We know that the average firing rate of these neurons is about 40 Hz, so λ = 40 Hz × 0.01 s = 0.4. Note that this approximation only works if the number of spikes in the given time interval is low.
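A possible sketch of this example with scipy; reading "probability of firing" as the probability of at least one spike in the window is my interpretation, not stated on the slide:

```python
from scipy.stats import poisson

lam = 40 * 0.01                          # λ = 40 Hz × 0.01 s = 0.4, as in the example

for m in range(4):
    print(m, poisson.pmf(m, lam))        # p(m|λ) = λ^m e^(−λ) / m!

print(1 - poisson.pmf(0, lam))           # probability of at least one spike ≈ 0.33
```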

SLIDE 20
  • 1. Random Variables and Common Distributions : Continuous Distributions

Continuous Distributions

  • The random variables take on continuous values
  • Continuous distributions are discrete distributions where the number of discrete values goes to infinity, while the probability of each value goes to zero
  • A continuous distribution is described by a probability density function, which integrates to 1:

∫_{−∞}^{+∞} p(x) dx = 1

  • Continuous distributions are particularly important in regression and unsupervised learning
  • A lot of machine learning is centered around how to better model a density function

SLIDE 21
  • 1. Random Variables and Common Distributions : Continuous Distributions

Example of a probability density function p(x)

P(a < x < b) = ∫_a^b p(x) dx

SLIDE 22
  • 1. Random Variables and Common Distributions : Continuous Distributions

The Gaussian Distribution

p(x) = N(x|µ, σ²) = 1/(2πσ²)^(1/2) exp{ −(x − µ)²/(2σ²) }

SLIDE 23
  • 1. Random Variables and Common Distributions : Continuous Distributions

Central Limit Theorem

Why are Gaussians SO important? The distribution of the sum of N i.i.d. (independent and identically distributed) random variables becomes increasingly Gaussian as N grows

SLIDE 24
  • 1. Random Variables and Common Distributions : Continuous Distributions

Central Limit Theorem

  • Example: the mean of N uniform [0, 1] random variables
  • Gaussians are often a good model of data
  • Working with Gaussians leads to analytic solutions for complex operations
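A minimal sketch of the example (my own code): the mean of N uniform [0, 1] variables concentrates and its histogram looks increasingly Gaussian as N grows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Mean of N uniform [0, 1] variables for increasing N
for N in (1, 2, 10):
    means = rng.uniform(0.0, 1.0, size=(100_000, N)).mean(axis=1)
    print(N, means.mean(), means.var())   # mean stays at 1/2, variance shrinks as 1/(12N)
```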
SLIDE 25
  • 1. Random Variables and Common Distributions : Continuous Distributions

The Multivariate Gaussian Distribution

p(x) = N(x|µ, Σ) = 1/((2π)^(D/2) |Σ|^(1/2)) exp{ −½ (x − µ)⊺ Σ⁻¹ (x − µ) }

SLIDE 26
  • 1. Random Variables and Common Distributions : Continuous Distributions

The Multivariate Gaussian Distribution

p(x) = N(x|µ, Σ) = 1/((2π)^(D/2) |Σ|^(1/2)) exp{ −½ (x − µ)⊺ Σ⁻¹ (x − µ) }

  • To clear up some confusion: for a chosen vector x, N(x|µ, Σ) is a real number, the probability density at x (which can be greater than 1; only the integral of the probability density function needs to be 1). The mean µ is just a specific vector amongst all possible vectors. The covariance matrix Σ tells us how two dimensions of a vector are related to each other.
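A small sketch illustrating that a density value can exceed 1 (the 2-D parameters are my own, not from the slides):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.zeros(2)
Sigma = 0.05 * np.eye(2)      # a "narrow" 2-D Gaussian

# The density at the mean exceeds 1; only the integral over all x must equal 1
print(multivariate_normal(mean=mu, cov=Sigma).pdf(mu))   # ≈ 3.18
```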

SLIDE 27
  • 1. Random Variables and Common Distributions : Continuous Distributions

Geometry of the Multivariate Gaussian

∆² = (x − µ)⊺ Σ⁻¹ (x − µ)

Σ⁻¹ = ∑_{i=1}^{D} (1/λᵢ) uᵢ uᵢ⊺

∆² = ∑_{i=1}^{D} yᵢ²/λᵢ,    with yᵢ = uᵢ⊺ (x − µ)

∆² is the Mahalanobis distance.
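A short check (my own sketch, with arbitrary numbers) that the eigendecomposition form gives the same Mahalanobis distance as the direct formula:

```python
import numpy as np

Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
mu = np.zeros(2)
x = np.array([1.0, -1.0])

# Mahalanobis distance directly ...
d2 = (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)

# ... and via the eigendecomposition Σ⁻¹ = Σ_i (1/λ_i) u_i u_i⊺
lam, U = np.linalg.eigh(Sigma)
y = U.T @ (x - mu)                 # y_i = u_i⊺ (x − µ)
print(d2, np.sum(y ** 2 / lam))    # both give the same ∆²
```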

SLIDE 28
  • 2. Basic Rules of Probability

Outline

  • 1. Random Variables and Common Distributions

    • Random Variables
    • Discrete Distributions
    • Continuous Distributions

  • 2. Basic Rules of Probability
  • 3. Expectations, Variance and Moments
  • 4. Exponential Family
  • 5. Information and Entropy
  • 6. Wrap-Up
SLIDE 29
  • 2. Basic Rules of Probability

Joint distribution: p(x, y)

Marginal distribution: p(y) = ∫ p(x, y) dx

Conditional distribution: p(y|x) = p(x, y) / p(x)

SLIDE 30
  • 2. Basic Rules of Probability

Probabilistic independence: p(x, y) = p(x) p(y)

Chain rule of probability:

p(x1, . . . , xn) = p(x1|x2, . . . , xn) p(x2, . . . , xn)
                  = p(x1|x2, . . . , xn) p(x2|x3, . . . , xn) · · · p(xn−1|xn) p(xn)

SLIDE 31
  • 2. Basic Rules of Probability

Bayes Rule

p(y|x) = p(x|y) p(y) / p(x)

posterior ∝ likelihood × prior,    posterior: p(y|x),    likelihood: p(x|y),    prior: p(y)

p(x) = ∫ p(x, y) dy = ∫ p(x|y) p(y) dy
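A minimal discrete sketch of Bayes' rule (the prior and likelihood numbers are hypothetical, not from the slides):

```python
import numpy as np

# Class y ∈ {0, 1}, observation x ∈ {0, 1}
p_y = np.array([0.7, 0.3])                  # prior p(y)
p_x_given_y = np.array([[0.9, 0.1],         # p(x|y=0)
                        [0.4, 0.6]])        # p(x|y=1)

x = 1                                                # observed value
evidence = np.sum(p_x_given_y[:, x] * p_y)           # p(x) = Σ_y p(x|y) p(y)
posterior = p_x_given_y[:, x] * p_y / evidence       # p(y|x) ∝ likelihood × prior

print(posterior, posterior.sum())                    # normalized posterior over y
```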
SLIDE 32
  • 2. Basic Rules of Probability

Partitioned Gaussian Distributions

p(x) = N(x|µ, Σ)

x = (xa, xb)⊺,    µ = (µa, µb)⊺

Σ = [ Σaa  Σab ;  Σba  Σbb ],    Λ ≡ Σ⁻¹ = [ Λaa  Λab ;  Λba  Λbb ]

  • Λ is the precision matrix.
SLIDE 33
  • 2. Basic Rules of Probability

Partitioned Conditionals and Marginals

SLIDE 34
  • 2. Basic Rules of Probability

Partitioned Conditionals and Marginals

p(xa|xb) = N(xa | µa|b, Σa|b)

Σa|b = Λaa⁻¹ = Σaa − Σab Σbb⁻¹ Σba

µa|b = Σa|b {Λaa µa − Λab (xb − µb)} = µa + Σab Σbb⁻¹ (xb − µb)

p(xa) = ∫ p(xa, xb) dxb = N(xa|µa, Σaa)

Important result: If the joint distribution p(xa, xb) is Gaussian, then the conditional distributions p(xa|xb) and p(xb|xa) are also Gaussians. Moreover, the marginal distributions p(xa) and p(xb) are also Gaussians
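A small numerical sketch of the conditional formulas for a 2-D joint Gaussian (the numbers are my own):

```python
import numpy as np

# Joint Gaussian over (x_a, x_b), both scalar
mu = np.array([1.0, 2.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 2.0]])
xb = 3.0

Sigma_ab, Sigma_bb = Sigma[0, 1], Sigma[1, 1]
mu_cond = mu[0] + Sigma_ab / Sigma_bb * (xb - mu[1])     # µ_a + Σ_ab Σ_bb⁻¹ (x_b − µ_b)
var_cond = Sigma[0, 0] - Sigma_ab / Sigma_bb * Sigma_ab  # Σ_aa − Σ_ab Σ_bb⁻¹ Σ_ba

print(mu_cond, var_cond)   # p(x_a|x_b) = N(1.4, 0.68); the marginal p(x_a) is N(µ_a, Σ_aa) = N(1, 1)
```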

SLIDE 35
  • 3. Expectations, Variance and Moments

Outline

  • 1. Random Variables and Common Distributions

    • Random Variables
    • Discrete Distributions
    • Continuous Distributions

  • 2. Basic Rules of Probability
  • 3. Expectations, Variance and Moments
  • 4. Exponential Family
  • 5. Information and Entropy
  • 6. Wrap-Up
SLIDE 36
  • 3. Expectations, Variance and Moments

Expectations

Expectation:

E_{x∼p(x)}[f(x)] = E_x[f] = E[f] = Σ_x p(x) f(x)        (discrete case)
                                 = ∫ p(x) f(x) dx        (continuous case)

Conditional expectation:

E_{x∼p(x|y)}[f(x)] = E_x[f|y] = Σ_x p(x|y) f(x)          (discrete case)
                              = ∫ p(x|y) f(x) dx         (continuous case)

SLIDE 37
  • 3. Expectations, Variance and Moments

Expectations

Approximate expectation:

E[f] = ∫ f(x) p(x) dx ≈ (1/N) Σ_{n=1}^{N} f(xₙ)

We sample N points from the distribution p(x) and compute the function at those points. The probability of computing f (xn) for a certain point xn is given by the probability of sampling p(xn)

This result is very important! When there is no analytical solution, we can use this to approximate integrals by sampling!
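A minimal Monte Carlo sketch (my own choice of p(x) and f):

```python
import numpy as np

rng = np.random.default_rng(0)

# Monte Carlo estimate of E[f(x)] for x ~ N(0, 1) and f(x) = x²
N = 100_000
x = rng.normal(0.0, 1.0, size=N)     # sample N points from p(x)
print(np.mean(x ** 2))               # (1/N) Σ_n f(x_n) ≈ 1, the analytic value of E[x²]
```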

SLIDE 38
  • 3. Expectations, Variance and Moments

Expectations

Example: What is the expectation of the following distribution?

SLIDE 39
  • 3. Expectations, Variance and Moments

Expectations

Some rules of expectation

E[ax] = a E[x]
E[x + y] = E[x] + E[y]
E[xy] = E[x] E[y]   only if x and y are statistically independent!
E[Σᵢ aᵢxᵢ] = Σᵢ aᵢ E[xᵢ]

Expectation of functions:

E[g(x)] = ∫ g(x) p(x) dx

In general, E[g(x)] ≠ g(E[x])
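A tiny sketch of the last point, E[g(x)] ≠ g(E[x]), using g(x) = x² (my own illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=100_000)   # samples from a standard normal

# With g(x) = x²: E[g(x)] ≈ 1, but g(E[x]) ≈ 0
print(np.mean(x ** 2))
print(np.mean(x) ** 2)
```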

SLIDE 40
  • 3. Expectations, Variance and Moments

Variance and Covariance

Variances give a measure of dispersion, the expected spread of the variable in relation to its mean:

var[x] = E[(x − E[x])²] = E[x²] − E[x]²

SLIDE 41
  • 3. Expectations, Variance and Moments

Variance and Covariance

Covariances give a measure of correlation, how much two variables change together:

cov[x, y] = E_{x,y}[(x − E[x])(y − E[y])] = E_{x,y}[xy] − E_x[x] E_y[y]

For random vectors:

cov[x, y] = E_{x,y}[(x − E[x])(y − E[y])⊺] = E_{x,y}[(x − E[x])(y⊺ − E[y⊺])] = E_{x,y}[xy⊺] − E_x[x] E_y[y⊺]

SLIDE 42
  • 3. Expectations, Variance and Moments

Variance and Covariance

Note the very important rule:

E[xx⊺] = E_x[x] E_x[x⊺] + cov[x, x] = µµ⊺ + Σ
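A quick sampling check of this rule (my own sketch with arbitrary µ and Σ):

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, -2.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])
X = rng.multivariate_normal(mu, Sigma, size=200_000)

print(X.T @ X / len(X))            # sample estimate of E[xx⊺]
print(np.outer(mu, mu) + Sigma)    # µµ⊺ + Σ
```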

SLIDE 43
  • 3. Expectations, Variance and Moments

Moments of Random Variables

Definition of a moment: mₙ = E[xⁿ]

Definition of a central moment: cmₙ = E[(x − µ)ⁿ]

  • cm₂: variance
  • cm₃: skewness (measure of asymmetry)
  • cm₄: kurtosis (measure of heavy-tailedness and light-tailedness)
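A small sketch estimating central moments from samples (the exponential example distribution is my own choice; note that skewness and kurtosis are often reported as standardized versions of cm₃ and cm₄):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=200_000)   # a right-skewed example distribution

mu = x.mean()
cm = lambda n: np.mean((x - mu) ** n)          # cm_n = E[(x − µ)^n]

print(cm(2))   # variance (1 for a unit-scale exponential)
print(cm(3))   # third central moment, positive because the distribution is right-skewed
print(cm(4))   # fourth central moment, related to kurtosis
```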

SLIDE 44
  • 3. Expectations, Variance and Moments

Moments of the Multivariate Gaussian

E[x] = 1/((2π)^(D/2) |Σ|^(1/2)) ∫ exp{ −½ (x − µ)⊺ Σ⁻¹ (x − µ) } x dx

     = 1/((2π)^(D/2) |Σ|^(1/2)) ∫ exp{ −½ z⊺ Σ⁻¹ z } (z + µ) dz        (substituting z = x − µ)

The term linear in z vanishes by symmetry (the exponential is an even function of z), hence E[x] = µ.

SLIDE 45
  • 3. Expectations, Variance and Moments

Moments of the Multivariate Gaussian

E[xx⊺] = µµ⊺ + Σ

cov[x] = cov[x, x] = E[(x − E[x])(x − E[x])⊺] = Σ

SLIDE 46
  • 4. Exponential Family

Outline

  • 1. Random Variables and Common Distributions

    • Random Variables
    • Discrete Distributions
    • Continuous Distributions

  • 2. Basic Rules of Probability
  • 3. Expectations, Variance and Moments
  • 4. Exponential Family
  • 5. Information and Entropy
  • 6. Wrap-Up
SLIDE 47
  • 4. Exponential Family

The exponential family is a large class of distributions that are analytically appealing, because taking their logarithm decomposes them into simple terms. All distributions from this family are uni-modal.

p(x|η) = h(x) g(η) exp{η⊺u(x)}

where η is the natural parameter and

g(η) ∫ h(x) exp{η⊺u(x)} dx = 1

hence g can be interpreted as a normalization coefficient.

SLIDE 48
  • 4. Exponential Family

Exponential Family - Bernoulli Distribution

The Bernoulli distribution:

p(x|µ) = Bern(x|µ) = µ^x (1 − µ)^(1−x)
                   = exp{ x ln µ + (1 − x) ln(1 − µ) }
                   = (1 − µ) exp{ ln(µ/(1 − µ)) x }

Comparing with the general form, we see that

η = ln(µ/(1 − µ)),    µ = σ(η) = 1/(1 + exp(−η))    (logistic sigmoid)
SLIDE 49
  • 4. Exponential Family

Exponential Family - Bernoulli Distribution

Hence, the Bernoulli distribution can be written as

p(x|η) = σ(−η) exp(ηx)

where u(x) = x, h(x) = 1, g(η) = 1 − σ(η) = σ(−η)
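A tiny sketch (my own) verifying that the exponential-family form with the natural parameter η reproduces Bern(x|µ):

```python
import numpy as np

mu = 0.3                                   # illustrative parameter
eta = np.log(mu / (1 - mu))                # natural parameter η = ln(µ/(1 − µ))
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

for x in (0, 1):
    bern = mu**x * (1 - mu)**(1 - x)           # Bern(x|µ) = µ^x (1 − µ)^(1−x)
    expfam = sigmoid(-eta) * np.exp(eta * x)   # σ(−η) exp(ηx)
    print(x, bern, expfam)                     # the two forms agree
```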

SLIDE 50
  • 4. Exponential Family

Exponential Family - Multinoulli Distribution

The Multinoulli distribution also belongs to the exponential family:

p(x|µ) = Π_{k=1}^{M} µk^{xk} = exp{ Σ_{k=1}^{M} xk ln µk } = h(x) g(η) exp{η⊺u(x)}

where x = (x1, . . . , xM)⊺, η = (η1, . . . , ηM)⊺, ηk = ln µk, u(x) = x, h(x) = 1, g(η) = 1.

Note that the parameters ηk have to be chosen such that p(x|µ) is a valid probability distribution. In particular, they must satisfy

Σ_x p(x|µ) = 1  ⇒  Σ_{k=1}^{M} µk = 1

SLIDE 51
  • 4. Exponential Family

Exponential Family - Multinoulli Distribution

Let µM = 1 − Σ_{k=1}^{M−1} µk, which ensures that the distribution is well defined. We can rewrite p(x|µ) and observe that

ηk = ln( µk / (1 − Σ_{j=1}^{M−1} µj) ),    µk = exp(ηk) / (1 + Σ_{j=1}^{M−1} exp(ηj))    (softmax)

Here the parameters ηk can be chosen independently, since 0 ≤ µk ≤ 1 and Σ_{k=1}^{M−1} µk ≤ 1.
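A small sketch (my own, with arbitrary µ) of the mapping to natural parameters and the softmax-style inverse:

```python
import numpy as np

mu = np.array([0.1, 0.2, 0.3, 0.4])        # illustrative class probabilities, M = 4
M = len(mu)

# Natural parameters for the first M−1 classes; µ_M is fixed by normalization
eta = np.log(mu[:M - 1] / (1 - mu[:M - 1].sum()))

# Softmax-style inverse mapping recovers µ_1, …, µ_{M−1}
mu_rec = np.exp(eta) / (1 + np.exp(eta).sum())
print(mu_rec, 1 - mu_rec.sum())            # recovers (0.1, 0.2, 0.3) and µ_M = 0.4
```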

SLIDE 52
  • 4. Exponential Family

Exponential Family - Multinoulli Distribution

The Multinoulli distribution can then be written as

p(x|µ) = h(x) g(η) exp{η⊺u(x)}

where η = (η1, . . . , ηM−1, 0)⊺, u(x) = x, h(x) = 1, and

g(η) = ( 1 + Σ_{k=1}^{M−1} exp(ηk) )⁻¹

SLIDE 53
  • 4. Exponential Family

Exponential Family - Gaussian Distribution

The Gaussian distribution can be rewritten as

p(x|µ, σ²) = 1/(2πσ²)^(1/2) exp{ −(x − µ)²/(2σ²) }
           = 1/(2πσ²)^(1/2) exp{ −x²/(2σ²) + (µ/σ²) x − µ²/(2σ²) }
           = h(x) g(η) exp{η⊺u(x)}

where

η = ( −1/(2σ²), µ/σ² )⊺,    u(x) = ( x², x )⊺,    h(x) = 1,

g(η) = (−η1/π)^(1/2) exp( η2²/(4η1) )

SLIDE 54
  • 5. Information and Entropy

Outline

  • 1. Random Variables and Common Distributions

    • Random Variables
    • Discrete Distributions
    • Continuous Distributions

  • 2. Basic Rules of Probability
  • 3. Expectations, Variance and Moments
  • 4. Exponential Family
  • 5. Information and Entropy
  • 6. Wrap-Up
SLIDE 55
  • 5. Information and Entropy

Information Theory - Core Questions

Classical question: How can we represent information compactly, i.e., using as few bits as possible?

  • Compressing text, e.g., with GZIP
  • Compressing pictures as in JPEG, movies as in MPEG
  • Compressing sound using MP3

Classical question: How can we transmit or store data reliably?

  • ECC memory
  • Error correction on CDs
  • Communication with space probes

SLIDE 56
  • 5. Information and Entropy

Information Theory - Core Questions

Machine learning questions:

  • How can we measure complexity?
  • How can we measure "distances" between probability distributions?
  • How can we reconstruct data?

We are not covering all questions here... :)

SLIDE 57
  • 5. Information and Entropy

What is Information?

  • All letters in the English alphabet have very different probabilities pᵢ of occurring
  • What is the number of bits you need to represent 27 characters? ⌈log₂ 27⌉ = ⌈4.75⌉ = 5 bits
  • How can we measure the information in a single character? h(pᵢ) = −log₂ pᵢ. Events with a low probability correspond to high information content
  • So, what is the average information in a character in an English text?

H(p) = E[h(·)] = Σᵢ pᵢ h(pᵢ) = −Σᵢ pᵢ log₂ pᵢ ≈ 4.1

This quantity is called the entropy. On average, with the right encoding, we can represent each letter with about 4.1 bits instead of 4.75.
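A minimal entropy sketch (my own; the skewed distribution is a toy example, not real English letter frequencies):

```python
import numpy as np

def entropy(p):
    """H(p) = −Σ_i p_i log2 p_i (zero-probability symbols contribute nothing)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

print(entropy(np.full(27, 1 / 27)))                 # uniform over 27 symbols: log2(27) ≈ 4.75 bits
print(entropy(np.array([0.5] + [0.5 / 26] * 26)))   # a skewed toy distribution has lower entropy
```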

SLIDE 58
  • 5. Information and Entropy

Entropy of Distributions

What is the “difference” between these distributions?

SLIDE 59
  • 5. Information and Entropy

Kullback-Leibler Divergence

The Kullback-Leibler divergence (KL divergence) is a similarity measure between two distributions, defined as

KL(p||q) = −∫ p(x) ln q(x) dx − ( −∫ p(x) ln p(x) dx ) = −∫ p(x) ln( q(x)/p(x) ) dx

It represents the average additional number of bits required to specify a symbol x, given that its assumed probability distribution is the estimate q(x) and not the true one p(x). (Strictly, it is measured in bits when the logarithm is base 2; with the natural logarithm above, the unit is nats.)
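A small sketch of the discrete KL divergence (my own code and numbers), also showing the asymmetry discussed on the next slide:

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p||q) = Σ_i p_i ln(p_i / q_i) for discrete distributions (in nats)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

# Two hypothetical distributions over three outcomes
p = np.array([0.5, 0.4, 0.1])
q = np.array([0.3, 0.3, 0.4])

print(kl_divergence(p, q), kl_divergence(q, p))   # non-negative and not symmetric
```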

SLIDE 60
  • 5. Information and Entropy

Kullback-Leibler Divergence

Some properties:

  • It is not a distance, since it is not symmetric: in general KL(p||q) ≠ KL(q||p)
  • It is non-negative: KL(p||q) ≥ 0
  • If p(x) = q(x) for all x, then KL(p||q) = 0

There are other measures of similarity, but as we will see later in the course, the KL divergence is deeply connected with maximum likelihood estimation.

SLIDE 61
  • 6. Wrap-Up

Outline

  • 1. Random Variables and Common Distributions

    • Random Variables
    • Discrete Distributions
    • Continuous Distributions

  • 2. Basic Rules of Probability
  • 3. Expectations, Variance and Moments
  • 4. Exponential Family
  • 5. Information and Entropy
  • 6. Wrap-Up
SLIDE 62
  • 6. Wrap-Up

You now know:

  • What random variables are (both continuous and discrete)
  • What probability distributions are
  • Some basic rules of probability theory
  • What expectation and variance are
  • What a Gaussian distribution is and why it is so important
  • What information and entropy are
  • How to measure the similarity between two probability distributions

SLIDE 63
  • 6. Wrap-Up

Self-Test Questions

  • What is a random variable? What is a distribution?
  • What is a Binomial distribution? How does a Poisson distribution relate to Binomial distributions?
  • What is a Gaussian distribution?
  • What is an expectation?
  • What is a joint distribution? What is a conditional distribution?
  • What is a distribution with a lot of information?
  • How can we measure the difference between distributions?

SLIDE 64
  • 6. Wrap-Up

Homework

Reading Assignment for next lecture

Bishop appendix E
