10-701 Machine Learning Recitation 2: Probability / Statistics

Dougal Sutherland, 9/24/2013

Sample spaces
• Start with a sample space Ω for an "experiment"
  – The set of all possible outcomes
  – Flipping a coin: {H, T}
  – Flipping a coin three times: {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
  – A person's age: the positive integers
  – A person's height: the positive reals
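A small sketch (in Python, chosen here because the slides themselves contain no code) that enumerates the three-flip sample space, so the count of $2^3 = 8$ outcomes can be checked directly:

```python
from itertools import product

# Sample space for three coin flips: every length-3 string over {H, T}.
omega = ["".join(flips) for flips in product("HT", repeat=3)]
print(omega)       # ['HHH', 'HHT', 'HTH', 'HTT', 'THH', 'THT', 'TTH', 'TTT']
print(len(omega))  # 8 == 2 ** 3
```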

Events
• An event E is a subset of Ω
  – We can apply the usual set operations to events
• We don't need to allow every arbitrary subset as an event
• We just need a σ-algebra: a collection B of subsets of Ω that
  – contains ∅,
  – is closed under complements, and
  – is closed under countable unions
• In practice, we usually don't need to worry about this
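For a finite Ω the σ-algebra conditions can be checked mechanically. This is an illustrative sketch only: `is_sigma_algebra` is a hypothetical helper, and it relies on the fact that on a finite Ω the collection is finite, so closure under countable unions reduces to closure under pairwise unions.

```python
def is_sigma_algebra(omega, collection):
    """Check the sigma-algebra conditions for a collection of subsets of a FINITE omega.

    On a finite sample space the collection is finite, so closure under
    countable unions reduces to closure under pairwise unions.
    """
    omega = frozenset(omega)
    sets = {frozenset(s) for s in collection}
    if not all(s <= omega for s in sets):
        return False  # every member must be a subset of omega
    if frozenset() not in sets:
        return False  # must contain the empty set
    if any((omega - s) not in sets for s in sets):
        return False  # must be closed under complements
    return all((a | b) in sets for a in sets for b in sets)  # closed under unions


omega = {"H", "T"}
print(is_sigma_algebra(omega, [set(), omega]))                # True: the trivial sigma-algebra
print(is_sigma_algebra(omega, [set(), {"H"}, {"T"}, omega]))  # True: the power set
print(is_sigma_algebra(omega, [set(), {"H"}, omega]))         # False: {"T"} = {"H"}^c is missing
```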

Probability axioms
A probability function P is a function from events in our σ-algebra to real numbers satisfying:
1. Nonnegativity: $P(E) \ge 0$
2. Unit measure: $P(\Omega) = 1$
3. σ-additivity: $P(E_1 \cup E_2 \cup \cdots) = \sum_{i=1}^{\infty} P(E_i)$ if the $E_i$ are mutually exclusive: $E_i \cap E_j = \emptyset$ for $i \ne j$
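A minimal sketch of checking the three axioms, assuming a finite Ω whose σ-algebra is the full power set. The helpers `power_set`, `satisfies_axioms`, and `table` are hypothetical names for illustration. On a finite Ω only finitely many disjoint events can be nonempty, so σ-additivity reduces to additivity over pairs of disjoint events (the pair ∅, ∅ also forces $P(\emptyset) = 0$).

```python
from fractions import Fraction
from itertools import combinations


def power_set(omega):
    """All events (subsets) of a finite sample space, as frozensets."""
    items = sorted(omega)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def satisfies_axioms(omega, P):
    """Check the axioms for P, a dict mapping every event (frozenset) to a number."""
    events = power_set(omega)
    nonnegative = all(P[e] >= 0 for e in events)
    unit_measure = P[frozenset(omega)] == 1
    # Finite omega: sigma-additivity reduces to additivity for disjoint pairs.
    additive = all(P[a | b] == P[a] + P[b] for a in events for b in events if not (a & b))
    return nonnegative and unit_measure and additive


omega = {"H", "T"}


def table(p_heads):
    """Assign P to each of the four events of {H, T}, given P({H})."""
    return {frozenset(): 0, frozenset({"H"}): p_heads,
            frozenset({"T"}): 1 - p_heads, frozenset(omega): 1}


print(satisfies_axioms(omega, table(Fraction(1, 3))))  # True
```

Exact `Fraction` arithmetic is used so that the equality checks in the additivity test are not disturbed by floating-point rounding.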

Consequences of axioms
• $P(A^c) = 1 - P(A)$, since $P(A) + P(A^c) = P(A \cup A^c) = P(\Omega) = 1$ (ax. 3, as $A$ and $A^c$ are disjoint; then ax. 2)
• $P(A) \le 1$, since $P(A^c) = 1 - P(A) \ge 0$
• $P(\emptyset) = 0$, since $P(\emptyset) = P(\Omega^c) = 1 - P(\Omega) = 1 - 1 = 0$

Possible probability functions
• For Ω = {H, T}:
  – We know $P(\emptyset) = 0$ and $P(\Omega) = 1$ always
  – {∅, Ω} is a valid σ-algebra, so we could stop there
  – But we can presumably observe H vs. T, so we also include {H} and {T}:
    • $P(\{H\}) = p$ can be anything in $[0, 1]$
    • $P(\{T\}) = P(\{H\}^c) = 1 - p$

More consequences of axioms
• $P(B \cap A^c) = P(B) - P(A \cap B)$
• $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
  – Corollary: $P(A \cap B) \ge P(A) + P(B) - 1$
• $A \subseteq B$ implies $P(A) \le P(B)$
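These identities can be spot-checked with exact arithmetic on a concrete space. The fair six-sided die and the events `A` (even roll) and `B` (roll of at least 4) are assumed examples, not taken from the slides.

```python
from fractions import Fraction

omega = range(1, 7)  # assumed example: a fair six-sided die


def P(event):
    """Uniform probability on the die: |E| / 6, as an exact fraction."""
    return Fraction(len(set(event) & set(omega)), 6)


A = {2, 4, 6}  # even roll
B = {4, 5, 6}  # roll of at least 4
Ac = set(omega) - A

assert P(B & Ac) == P(B) - P(A & B)        # P(B ∩ A^c) = P(B) - P(A ∩ B)
assert P(A | B) == P(A) + P(B) - P(A & B)  # inclusion-exclusion
assert P(A & B) >= P(A) + P(B) - 1         # the corollary's lower bound
assert P({4}) <= P(A)                      # {4} ⊆ A implies P({4}) <= P(A)
print(P(A | B))  # 2/3
```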

Interpretation of probabilities
• Frequentist:
  – The long-run proportion of times the event happens, where $X_1, X_2, \ldots$ are the outcomes of repeated trials: $P(E) = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}(X_i \in E)$
• Bayesian:
  – Quantification of beliefs
  – The axioms can be derived from a certain set of "common sense" descriptions of how beliefs should work (Cox's Theorem)
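The frequentist formula can be illustrated by simulation: the relative frequency of an event over n independent draws should settle near its probability as n grows. This is a sketch under assumed choices (a fair die, the event "roll is even", and a fixed seed for reproducibility); `relative_frequency` is a hypothetical helper.

```python
import random


def relative_frequency(event, sample, n, seed=0):
    """Fraction of n i.i.d. draws X_1..X_n (generated by `sample`) that land in `event`."""
    rng = random.Random(seed)
    hits = sum(sample(rng) in event for _ in range(n))
    return hits / n


def roll(rng):
    """Assumed example: one roll of a fair six-sided die."""
    return rng.randint(1, 6)


# E = "roll is even", so P(E) = 1/2; the estimates should approach 0.5 as n grows.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency({2, 4, 6}, roll, n))
```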

Defining probabilities
• We don't want to have to check the axioms for every probability function
• One general pattern:
  – Let $\{\omega_1, \omega_2, \ldots\}$ be a countable set of "atomic events" (mutually exclusive, covering all of Ω)
  – Define corresponding nonnegative $p_i$ that sum to 1
  – Then a valid probability function is $P(E) = \sum_{i : \omega_i \in E} p_i$
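The atomic-event pattern translates directly into code: validate the weights once, and every event's probability is a sum of weights. `make_probability` is a hypothetical helper and the loaded die is an assumed example.

```python
from fractions import Fraction


def make_probability(weights):
    """Build P(E) = sum of p_i over atomic outcomes omega_i in E.

    `weights` maps each atomic outcome to a nonnegative p_i; the p_i must sum to 1.
    """
    if any(p < 0 for p in weights.values()) or sum(weights.values()) != 1:
        raise ValueError("weights must be nonnegative and sum to 1")
    return lambda event: sum((weights[w] for w in event), Fraction(0))


# Hypothetical loaded die: 6 comes up with probability 1/2, the rest share the other 1/2.
weights = {k: Fraction(1, 10) for k in range(1, 6)}
weights[6] = Fraction(1, 2)
P = make_probability(weights)
print(P({2, 4, 6}))  # 7/10
```

Because the weights are checked once up front, any function built this way satisfies the axioms without checking them event by event.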

Conditioning
• Basically, we just change the sample space
  – $P(E_1 \mid E_2)$ takes $P(E_1)$, defined on sample space Ω, and restricts it to sample space $E_2$: $P(E_1 \mid E_2) = P(E_1 \cap E_2) / P(E_2)$
• If $P(E_2) = 0$ (in particular, if $E_2$ is empty), this isn't well-defined
• If $E_1 \cap E_2 = \emptyset$, then $P(E_1 \mid E_2) = 0$
[Figure: Venn diagram of events E1 and E2 inside Ω]
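A short sketch of the conditioning formula on an assumed fair die. The hypothetical helper `conditional` refuses to divide when $P(E_2) = 0$, mirroring the "not well-defined" case.

```python
from fractions import Fraction


def conditional(P, e1, e2):
    """P(E1 | E2) = P(E1 ∩ E2) / P(E2); undefined when P(E2) = 0."""
    p2 = P(e2)
    if p2 == 0:
        raise ValueError("P(E2) = 0, so P(E1 | E2) is not defined")
    return P(e1 & e2) / p2


def P(event):
    """Assumed example: uniform probability on a fair die's faces {1, ..., 6}."""
    return Fraction(len(event), 6)


even, high = {2, 4, 6}, {4, 5, 6}
print(conditional(P, even, high))  # 2/3: of {4, 5, 6}, two outcomes are even
print(conditional(P, {1}, high))   # 0: {1} and {4, 5, 6} are disjoint
```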

Independence
• Events A and B are independent ($A \perp B$) if $P(A \cap B) = P(A)\,P(B)$
• Equivalently (when $P(A) > 0$ and $P(B) > 0$): $P(A \mid B) = P(A)$ and $P(B \mid A) = P(B)$
• If $A \perp B$, then $A^c \perp B$, $A \perp B^c$, and $A^c \perp B^c$
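Independence can be tested exactly on a small space. The two fair dice and the events `A`, `B`, `C` below are assumed examples; exact fractions make the `==` comparison safe.

```python
from fractions import Fraction
from itertools import product

# Assumed example: two fair dice, all 36 ordered outcomes equally likely.
omega = set(product(range(1, 7), repeat=2))


def P(event):
    return Fraction(len(event), len(omega))


def independent(a, b):
    """A ⟂ B iff P(A ∩ B) = P(A) P(B)."""
    return P(a & b) == P(a) * P(b)


A = {w for w in omega if w[0] % 2 == 0}  # first die is even
B = {w for w in omega if w[1] >= 5}      # second die shows 5 or 6
C = {w for w in omega if sum(w) >= 10}   # total is at least 10

print(independent(A, B))                  # True
print(independent(omega - A, omega - B))  # True: complements stay independent
print(independent(A, C))                  # False
```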

Independence of several events
• The last example had $A \perp B$, $B \perp Z$, $A \perp Z$ (pairwise independent), but A, B, Z are not all "mutually independent."
• $P(A \cap B \cap Z) = P(A)\,P(B)\,P(Z)$ also isn't enough on its own.
• The definition: for any subcollection $i_1, \ldots, i_k$,
  $P\left(\bigcap_{j=1}^{k} A_{i_j}\right) = \prod_{j=1}^{k} P(A_{i_j})$
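The definition can be checked by brute force over every subcollection. The events below are a standard textbook construction, not necessarily the recitation's "last example": two fair coin flips with A = first flip is H, B = second flip is H, Z = the two flips match. `mutually_independent` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import combinations, product

# Two fair coin flips (assumed example), all four outcomes equally likely.
omega = set(product("HT", repeat=2))


def P(event):
    return Fraction(len(event), len(omega))


A = {w for w in omega if w[0] == "H"}   # first flip is heads
B = {w for w in omega if w[1] == "H"}   # second flip is heads
Z = {w for w in omega if w[0] == w[1]}  # the two flips match


def mutually_independent(events):
    """Check P(∩ A_ij) = ∏ P(A_ij) for every subcollection of size >= 2."""
    for k in range(2, len(events) + 1):
        for sub in combinations(events, k):
            inter, prod_p = set(omega), Fraction(1)
            for e in sub:
                inter &= e
                prod_p *= P(e)
            if P(inter) != prod_p:
                return False
    return True


print([mutually_independent(pair) for pair in combinations([A, B, Z], 2)])  # [True, True, True]
print(mutually_independent([A, B, Z]))  # False: P(A ∩ B ∩ Z) = 1/4, but the product is 1/8
```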

Random variables
• So far we've been talking only about events
• Usually we work with random variables
• Technically: a function on the sample space
  – Whether a coin flip was heads: $X(\omega) = 1$ if $\omega = H$, $0$ if $\omega = T$
  – Number of heads in a sequence: a function from $\Omega = \{H, T\}^n$ to $\{0, 1, \ldots, n\}$
• Normally, the function maps into $\mathbb{R}^d$ or $\mathbb{Z}^d$
  – Though it can be anything
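To make "a random variable is just a function on Ω" concrete, here is a sketch of both examples for $n = 3$ flips; `is_heads` and `num_heads` are hypothetical names.

```python
from itertools import product

n = 3
omega = ["".join(w) for w in product("HT", repeat=n)]  # Ω = {H, T}^3 as strings


def is_heads(flip):
    """X(ω) = 1 if ω = H, 0 if ω = T, for a single flip."""
    return 1 if flip == "H" else 0


def num_heads(w):
    """A function from {H, T}^n to {0, 1, ..., n}: the number of heads in w."""
    return sum(is_heads(flip) for flip in w)


print({w: num_heads(w) for w in omega})
# {'HHH': 3, 'HHT': 2, 'HTH': 2, 'HTT': 1, 'THH': 2, 'THT': 1, 'TTH': 1, 'TTT': 0}
```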

Probability mass functions
• Discrete (not "discreet") RVs: take values in a countable subset of the reals
  – e.g. X = number of heads in a sequence of coin flips
• Each value naturally defines an atomic event
  – e.g. {X = 0}, {X = 1}, …, {X = n}
• Probability mass function: a function from each value to the probability of that value (basically a table)
  – e.g. $P_X(k) = P(\{X = k\})$
  – Nonnegative, sums to 1
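A sketch that builds the PMF table for the number of heads in three flips, assuming a fair coin so all eight outcomes are equally likely. Each entry adds up the probability of the atomic event $\{X = k\}$.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

n = 3
omega = list(product("HT", repeat=n))  # 8 equally likely outcomes (fair coin assumed)


def X(w):
    """Number of heads in the outcome w."""
    return w.count("H")


# P_X(k) = P({X = k}): add up the probability of every outcome mapped to k.
counts = Counter(X(w) for w in omega)
pmf = {k: Fraction(c, len(omega)) for k, c in sorted(counts.items())}
print(pmf)  # {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
assert all(p >= 0 for p in pmf.values()) and sum(pmf.values()) == 1
```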

Jointly distributed random variables
• If X and Y have a joint distribution, then they're components of the random vector concat(X, Y)
• The joint PMF is just a multidimensional table
• The marginal of X is the distribution of X ignoring Y: $P(X) = \sum_{Y} P(X, Y)$
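A sketch of a joint PMF stored as a 2-D table, with each marginal obtained by summing over the variable being ignored. The table's numbers are hypothetical, chosen only so that they sum to 1.

```python
from fractions import Fraction

# Hypothetical joint PMF P(X, Y): rows are values of X, columns are values of Y.
x_vals, y_vals = [0, 1], ["a", "b", "c"]
joint = [
    [Fraction(1, 10), Fraction(2, 10), Fraction(1, 10)],  # X = 0
    [Fraction(3, 10), Fraction(1, 10), Fraction(2, 10)],  # X = 1
]

# Marginals: sum the table over the variable being ignored.
p_x = {x: sum(row) for x, row in zip(x_vals, joint)}
p_y = {y: sum(row[j] for row in joint) for j, y in enumerate(y_vals)}
print(p_x)  # {0: Fraction(2, 5), 1: Fraction(3, 5)}
print(p_y)  # {'a': Fraction(2, 5), 'b': Fraction(3, 10), 'c': Fraction(3, 10)}
```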

Conditioning, independence of RVs
• Conditioning and independence for RVs are basically the same as for events:
  – $P(A \mid B) = P(A, B) / P(B)$
    • but now we're talking about functions rather than scalars
  – $P(A \mid B) = P(B \mid A)\,P(A) / P(B)$ (Bayes' rule)
  – $A \perp B$ if $P(A, B) = P(A)\,P(B)$
    • also as functions, i.e. true for every value of A and B
• i.i.d.: "independent and identically distributed"
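The "functions, not scalars" point can be seen on a small hypothetical joint table: $P(X \mid Y)$ is a whole table with one conditional distribution per value of Y, Bayes' rule holds entry by entry, and independence has to hold for every pair of values.

```python
from fractions import Fraction

# Hypothetical joint table: joint[x][y] = P(X = x, Y = y).
joint = {
    0: {"a": Fraction(1, 10), "b": Fraction(2, 10), "c": Fraction(1, 10)},
    1: {"a": Fraction(3, 10), "b": Fraction(1, 10), "c": Fraction(2, 10)},
}
p_x = {x: sum(row.values()) for x, row in joint.items()}
p_y = {y: sum(joint[x][y] for x in joint) for y in joint[0]}

# P(X | Y) is a table: one conditional distribution over X for each value y.
p_x_given_y = {y: {x: joint[x][y] / p_y[y] for x in joint} for y in p_y}
p_y_given_x = {x: {y: joint[x][y] / p_x[x] for y in p_y} for x in joint}

# Bayes' rule, P(X | Y) = P(Y | X) P(X) / P(Y), checked for every (x, y).
assert all(p_x_given_y[y][x] == p_y_given_x[x][y] * p_x[x] / p_y[y] for x in joint for y in p_y)

# X ⟂ Y requires P(X, Y) = P(X) P(Y) for *every* pair of values.
independent = all(joint[x][y] == p_x[x] * p_y[y] for x in joint for y in p_y)
print(p_x_given_y["a"])  # {0: Fraction(1, 4), 1: Fraction(3, 4)}
print(independent)       # False: e.g. P(0, a) = 1/10 but P(0) P(a) = 4/25
```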

Cumulative distribution functions
• The CDF is $F_X(x) = P(X \le x)$
• Useful for a lot of theoretical things, e.g. for i.i.d. $X_1, \ldots, X_n$:
  $P\left(\max_{1 \le i \le n} X_i \le x\right) = P\big((X_1 \le x) \cap \cdots \cap (X_n \le x)\big) = P(X_1 \le x) \cdots P(X_n \le x) = \big(P(X_1 \le x)\big)^n$
• $F(-\infty) = 0$, $F(\infty) = 1$, and F is nondecreasing
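The max identity can be sanity-checked by simulation, assuming i.i.d. Uniform(0, 1) draws. For that distribution $F(x) = x$ on $[0, 1]$, so the identity predicts $P(\max_i X_i \le x) = x^n$. `empirical_cdf_of_max` is a hypothetical helper with a fixed seed.

```python
import random


def empirical_cdf_of_max(n, x, trials=100_000, seed=0):
    """Estimate P(max_i X_i <= x) for n i.i.d. Uniform(0, 1) draws (assumed distribution)."""
    rng = random.Random(seed)
    hits = sum(max(rng.random() for _ in range(n)) <= x for _ in range(trials))
    return hits / trials


# For Uniform(0, 1), F(x) = x on [0, 1], so the identity gives F_max(x) = x ** n.
n, x = 5, 0.8
print(empirical_cdf_of_max(n, x), x ** n)  # both should be close to 0.32768
```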

CDFs ¡for ¡con2nuous ¡RVs ¡ • Can’t ¡do ¡a ¡mass ¡func2on: ¡P( X = x ) ¡= ¡0 ¡for ¡any ¡ x ¡ • S2ll ¡can ¡do ¡F X ( x ) ¡= ¡P( X ¡≤ ¡ x ) ¡the ¡same ¡way ¡ • F ¡is ¡con2nuous ¡if ¡a ¡con2nuous ¡RV; ¡ ¡ right-‑con2nuous ¡if ¡mixed ¡ • Joint ¡CDF: ¡ ¡P( X ¡≤ ¡ x , ¡ Y ¡≤ ¡ y ) ¡

Probability density functions
• The density f is the derivative of the CDF: $P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$
• Nonnegative, but can be > 1
• Integrates to 1
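A numeric illustration that a density can exceed 1 while still integrating to 1. It assumes a Gaussian with a small standard deviation (σ = 0.1, so the peak is about 3.99) and uses a hand-written trapezoid rule rather than any particular library routine.

```python
import math


def normal_pdf(x, mu=0.0, sigma=0.1):
    """Gaussian density; with a small sigma its peak 1/(sigma*sqrt(2*pi)) exceeds 1."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def integrate(f, a, b, steps=100_000):
    """Trapezoid-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * (0.5 * f(a) + 0.5 * f(b) + sum(f(a + i * h) for i in range(1, steps)))


print(normal_pdf(0.0))               # about 3.99: a density value above 1
print(integrate(normal_pdf, -1, 1))  # about 1.0: essentially all the mass (±10 sigma)
print(integrate(normal_pdf, -1, 0))  # about 0.5: approximately F(0) = P(X <= 0)
```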
