
Probability and Statistics for Computer Science - PowerPoint PPT Presentation



  1. Probability and Statistics for Computer Science
  "Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write." H. G. Wells (Credit: Wikipedia)
  Hongye Liu, Teaching Assistant Prof, CS361, UIUC, 03.12.2020

  2. Midterm 1
  - Grading is done. Grades will be published today.
  - It will be curved; it was relatively harder than last semester.
  - You're welcome to come to office hours today to discuss it.

  3. Midterm 1
  [Histogram of Midterm 1 scores (x-axis: Score, 60 to 140; y-axis: Frequency), with the mean, median, and standard deviation annotated and compared against last semester.]

  4. Last time
  - Review of variance, sample mean
  - Sum and difference of normally distributed variables
  - Hypothesis test of equality of two sample means
  - Chi-square test

  5. Contents
  - Review of statistical inference
  - Inferring a probability model from data
  - Maximum likelihood estimate
  - Confidence interval for MLE
  - Bayesian inference

  6. Categories of statistical inference
  Statistical inference includes:
  - Drawing conclusions from samples
  - Assessing the significance of evidence for a hypothesis
  - Inferring the parameters of a probabilistic model from data

  7. Contents
  - Review of statistical inference
  - Inferring a probability model from data
  - Maximum likelihood estimate
  - Confidence interval for MLE
  - Bayesian inference

  8. Motivation: binomial example
  - Suppose we have a coin with unknown probability of coming up heads
  - We toss it N times and observe k heads
  - We know that this data comes from a binomial distribution
  - What is your best estimate of the probability of coming up heads?
  Credit: David Varodayan

  9. Motivation: geometric example
  - Suppose we have a die with unknown probability of coming up six
  - We roll it and it comes up six for the first time on the kth roll
  - We know that this data comes from a geometric distribution
  - What is your best estimate of the probability of coming up six?
  Credit: David Varodayan

  10. Motivation: Poisson example
  - Suppose we have data on the number of babies born each hour in a large hospital:
    hour:         1,   2,   ..., N
    # of babies:  k_1, k_2, ..., k_N
  - We can assume the data comes from a Poisson distribution
  - What is your best estimate of the intensity λ?
  Credit: David Varodayan

  11. The parameter estimation problem
  - Suppose we have a dataset that we know comes from a distribution (i.e. binomial, geometric, Poisson, etc.)
  - What is the best estimate of the parameters (θ or θs) of the distribution?
  - Examples:
    - For binomial and geometric distributions, θ = p (probability of success)
    - For Poisson and exponential distributions, θ = λ (intensity)
    - For normal distributions, θ could be μ or σ²

  12. Maximum likelihood estimation (MLE)
  - We write the probability of seeing the data D given parameter θ as
      L(θ) = P(D | θ)
  - The likelihood function L(θ) is not a probability distribution
  - The maximum likelihood estimate (MLE) of θ is
      θ̂ = argmax_θ L(θ)

  13. Why is L(θ) not a probability distribution?
  A. It doesn't give the probability of all the possible θ values.
  B. We don't know whether the sum or integral of L(θ) over all possible θ values is one or not.
  C. Both.

  14. Why is L(θ) not a probability distribution?
  A. It doesn't give the probability of all the possible θ values.
  B. We don't know whether the sum or integral of L(θ) over all possible θ values is one or not.
  C. Both.
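One way to see that L(θ) is not a probability distribution over θ is to integrate it numerically. The sketch below uses assumed example data (N = 10 tosses, k = 7 heads, not from the slides) and shows the integral of the binomial likelihood over θ ∈ [0, 1] comes out to 1/(N+1), not 1:

```python
import math

def binomial_likelihood(theta, N, k):
    """Binomial likelihood L(theta) = C(N, k) * theta^k * (1 - theta)^(N - k)."""
    return math.comb(N, k) * theta**k * (1 - theta)**(N - k)

# Assumed data: N = 10 coin tosses, k = 7 heads.
N, k = 10, 7

# Riemann sum approximating the integral of L(theta) over theta in [0, 1].
step = 1e-4
integral = sum(binomial_likelihood(i * step, N, k) * step for i in range(10000))

print(round(integral, 4))  # about 0.0909 = 1/(N + 1), not 1
```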

  15. Likelihood function: binomial example
  - Suppose we have a coin with unknown probability of coming up heads
  - We toss it N times and observe k heads
  - We know that this data comes from a binomial distribution
  - What is the likelihood function L(θ) = P(D | θ)?

  16. Likelihood function: binomial example
  - Suppose we have a coin with unknown probability of coming up heads
  - We toss it N times and observe k heads
  - We know that this data comes from a binomial distribution
  - What is the likelihood function L(θ) = P(D | θ)?
      L(θ) = C(N, k) θ^k (1 − θ)^(N−k)

  17. MLE derivation: binomial example
      L(θ) = C(N, k) θ^k (1 − θ)^(N−k)
  In order to find θ̂ = argmax_θ L(θ), we set
      dL(θ)/dθ = 0.

  18. MLE derivation: binomial example
      L(θ) = C(N, k) θ^k (1 − θ)^(N−k)

  19. MLE derivation: binomial example
      L(θ) = C(N, k) θ^k (1 − θ)^(N−k)
      dL(θ)/dθ = C(N, k) (k θ^(k−1) (1 − θ)^(N−k) − θ^k (N − k)(1 − θ)^(N−k−1)) = 0

  20. MLE derivation: binomial example
      L(θ) = C(N, k) θ^k (1 − θ)^(N−k)
      dL(θ)/dθ = C(N, k) (k θ^(k−1) (1 − θ)^(N−k) − θ^k (N − k)(1 − θ)^(N−k−1)) = 0
      k θ^(k−1) (1 − θ)^(N−k) = θ^k (N − k)(1 − θ)^(N−k−1)

  21. MLE derivation: binomial example
      L(θ) = C(N, k) θ^k (1 − θ)^(N−k)
      dL(θ)/dθ = C(N, k) (k θ^(k−1) (1 − θ)^(N−k) − θ^k (N − k)(1 − θ)^(N−k−1)) = 0
      k θ^(k−1) (1 − θ)^(N−k) = θ^k (N − k)(1 − θ)^(N−k−1)
      k − kθ = Nθ − kθ

  22. MLE derivation: binomial example
      L(θ) = C(N, k) θ^k (1 − θ)^(N−k)
      dL(θ)/dθ = C(N, k) (k θ^(k−1) (1 − θ)^(N−k) − θ^k (N − k)(1 − θ)^(N−k−1)) = 0
      k θ^(k−1) (1 − θ)^(N−k) = θ^k (N − k)(1 − θ)^(N−k−1)
      k − kθ = Nθ − kθ
      θ̂ = k / N   (the MLE of p)
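The closed form θ̂ = k/N can be checked numerically. A minimal sketch, assuming example data N = 10 tosses and k = 7 heads (illustrative values, not from the slides), does a grid search for the maximizer of L(θ):

```python
import math

def binomial_likelihood(theta, N, k):
    """L(theta) = C(N, k) * theta^k * (1 - theta)^(N - k)."""
    return math.comb(N, k) * theta**k * (1 - theta)**(N - k)

# Assumed data: N = 10 tosses, k = 7 heads.
N, k = 10, 7

# Grid search over theta in (0, 1); the maximizer should match k/N.
thetas = [i / 1000 for i in range(1, 1000)]
theta_hat = max(thetas, key=lambda t: binomial_likelihood(t, N, k))

print(theta_hat, k / N)  # both 0.7
```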

  23. Likelihood function: geometric example
  - Suppose we have a die with unknown probability of coming up six
  - We roll it and it comes up six for the first time on the kth roll
  - We know that this data comes from a geometric distribution
  - What is the likelihood function L(θ) = P(D | θ)? Assume θ is p.

  24. MLE derivation: geometric example
      L(θ) = (1 − θ)^(k−1) θ

  25. MLE derivation: geometric example
      L(θ) = (1 − θ)^(k−1) θ
      dL(θ)/dθ = (1 − θ)^(k−1) − (k − 1)(1 − θ)^(k−2) θ = 0

  26. MLE derivation: geometric example
      L(θ) = (1 − θ)^(k−1) θ
      dL(θ)/dθ = (1 − θ)^(k−1) − (k − 1)(1 − θ)^(k−2) θ = 0
      (1 − θ)^(k−1) = (k − 1)(1 − θ)^(k−2) θ

  27. MLE derivation: geometric example
      L(θ) = (1 − θ)^(k−1) θ
      dL(θ)/dθ = (1 − θ)^(k−1) − (k − 1)(1 − θ)^(k−2) θ = 0
      (1 − θ)^(k−1) = (k − 1)(1 − θ)^(k−2) θ
      1 − θ = kθ − θ

  28. MLE derivation: geometric example
      L(θ) = (1 − θ)^(k−1) θ
      dL(θ)/dθ = (1 − θ)^(k−1) − (k − 1)(1 − θ)^(k−2) θ = 0
      (1 − θ)^(k−1) = (k − 1)(1 − θ)^(k−2) θ
      1 − θ = kθ − θ
      θ̂ = 1 / k   (the MLE of p)
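As with the binomial case, θ̂ = 1/k can be verified by a grid search. A sketch assuming the first six appears on roll k = 6 (an illustrative value, not from the slides):

```python
def geometric_likelihood(theta, k):
    """L(theta) = (1 - theta)^(k - 1) * theta: first success on roll k."""
    return (1 - theta)**(k - 1) * theta

# Assumed observation: the die first comes up six on the 6th roll.
k = 6

# Grid search over theta in (0, 1); the maximizer should sit near 1/k.
thetas = [i / 10000 for i in range(1, 10000)]
theta_hat = max(thetas, key=lambda t: geometric_likelihood(t, k))

print(theta_hat)  # close to 1/6 ≈ 0.1667
```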

  29. MLE with data from IID trials
  - If the dataset D = {x_i} comes from IID trials:
      L(θ) = P(D | θ) = ∏_{x_i ∈ D} P(x_i | θ)
  - Each x_i is one observed result from an IID trial

  30. Q: MLE with data from IID trials
  - If the dataset D = {x_i} comes from IID trials:
      L(θ) = P(D | θ) = ∏_{x_i ∈ D} P(x_i | θ)
  - Why is the above function defined by the product?
    A. IID samples are independent
    B. Each trial has an identical probability function
    C. Both.

  31. Q: MLE with data from IID trials
  - If the dataset D = {x_i} comes from IID trials:
      L(θ) = P(D | θ) = ∏_{x_i ∈ D} P(x_i | θ)
  - Why is the above function defined by the product?
    A. IID samples are independent
    B. Each trial has an identical probability function
    C. Both.

  32. MLE with data from IID trials
  - If the dataset D = {x_i} comes from IID trials:
      L(θ) = P(D | θ) = ∏_{x_i ∈ D} P(x_i | θ)
  - The likelihood function is hard to differentiate in general, except for the binomial and geometric cases.
  - Clever trick: take the (natural) log

  33. Log-likelihood function
  - Since log is a strictly increasing function:
      θ̂ = argmax_θ L(θ) = argmax_θ log L(θ)
  - So we can aim to maximize the log-likelihood function:
      log L(θ) = log P(D | θ) = log ∏_{x_i ∈ D} P(x_i | θ) = Σ_{x_i ∈ D} log P(x_i | θ)
  - The log-likelihood function is usually much easier to differentiate
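Tying this back to the Poisson motivation: for IID counts the log-likelihood is a sum of per-observation terms, and maximizing it recovers λ̂ = sample mean (the standard Poisson MLE, stated here rather than derived on these slides). A sketch with made-up hourly counts:

```python
import math

def poisson_log_likelihood(lam, data):
    """log L(lambda) = sum_i [k_i*log(lambda) - lambda - log(k_i!)] for IID Poisson counts."""
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in data)

# Assumed data: number of babies born in each of six hours (illustrative values).
data = [3, 5, 2, 4, 6, 3]

# Grid search over lambda; the maximizer should match the sample mean.
lams = [i / 1000 for i in range(1, 10001)]
lam_hat = max(lams, key=lambda l: poisson_log_likelihood(l, data))

print(lam_hat, sum(data) / len(data))  # both about 3.833
```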
