Introduction to Probability Theory 1 - Clayton Greenberg

SLIDE 1

Introduction to Probability Theory 1

Clayton Greenberg
CoLi, CS, MMCI, LSV, CRC 1102 (IDeaL) B4
October 22, 2014

Slide 1 of 24

SLIDE 2

Key concepts

  • rules of probability
  • exponents
  • logarithms
  • surprisal
  • chain rule
  • Bayes’ rule
  • random variables
  • expectation
  • variance
  • entropy
  • mutual information
  • relative entropy
  • machine learning tasks
  • supervision
  • normal distributions
  • linear regression

SLIDE 3

Schedule

22.10.2014   Calculate the probability of a given parse
23.10.2014   Solve the medical test Bayes’ Rule problem
27.10.2014   Create a code for simplified Polynesian
29.10.2014   Identify types of machine learning problems
31.10.2014   Find a regression line for 2D data

SLIDE 4

Textbook recommendations

Christopher D. Manning and Hinrich Schütze. Foundations of statistical natural language processing. MIT Press, 1999.

Dan Jurafsky and James H. Martin. Speech & language processing. 2nd edition. Prentice Hall, 2008.

Steven Bird, Ewan Klein, and Edward Loper. Natural language processing with Python. O'Reilly Media, Inc., 2009.

SLIDE 5

Probabilistic outcomes

Ω = {H, T}
Ω = Z
Ω = {1, 2, 3, 4, 5, 6}
Ω = Vocabulary

SLIDE 6

Probabilistic events

  • An event A is a set of outcomes.
  • A has “occurred” or “taken place” if one of its member outcomes is observed.
  • Ω is the certain event.
  • ∅ is the impossible event.
  • There are 2^|Ω| events for a |Ω|-outcome process.

SLIDE 7

Probabilistic events example

Process: roll a fair, three-sided die
Ω = {1, 2, 3}
Events = P(Ω) = {∅, {1}, {2}, {3}, {1,2}, {2,3}, {1,3}, Ω}

Event A: “roll a 2”: {2}
Event B: “at least 2”: {2,3}
Event C: “not a 2”: {1,3}
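
The event set for the three-sided die can be enumerated mechanically. The sketch below builds every subset of Ω (its power set) and counts them. The helper name `power_set` is an illustrative choice, not something from the slides.

```python
from itertools import chain, combinations

def power_set(outcomes):
    """Return every subset (event) of a finite outcome set as a frozenset."""
    items = sorted(outcomes)
    subsets = chain.from_iterable(
        combinations(items, size) for size in range(len(items) + 1)
    )
    return [frozenset(subset) for subset in subsets]

omega = {1, 2, 3}          # the three-sided die
events = power_set(omega)

print(len(events))         # number of events, equal to 2 ** len(omega)
for event in events:
    print(sorted(event))   # the empty list stands for the impossible event
```

Each event is stored as a `frozenset` so that events can themselves be collected into sets or used as dictionary keys.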

SLIDE 8

Three definitions of probability

Formal:
Simple case: p(A) = |A| / |Ω|
Informal: probability = what you want / what is possible

Probability is a property of events.
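
As a small sketch of the simple case, the code below applies p(A) = |A| / |Ω| to the three-sided die events A, B, and C, assuming every outcome is equally likely. The function name `probability` is hypothetical, and `Fraction` is used only to keep the results exact.

```python
from fractions import Fraction

def probability(event, omega):
    """Simple-case probability |A| / |Omega| for equally likely outcomes."""
    if not event <= omega:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(omega))

omega = {1, 2, 3}
event_a = {2}       # "roll a 2"
event_b = {2, 3}    # "at least 2"
event_c = {1, 3}    # "not a 2"

p_a = probability(event_a, omega)
p_b = probability(event_b, omega)
p_c = probability(event_c, omega)
print(p_a, p_b, p_c)
```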

SLIDE 9

Experimental values

To experimentally estimate probability:

  1. Run the process many times, T.
  2. Count how many times the event A occurs, N.
  3. p(A) ≈ N / T = p̂(A)

Suppose you flip a coin 1000 times and get heads 651 times.
Then, p̂(H) = 0.651.
If the coin is fair, p = 0.5.
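
The three-step estimate can be sketched as a simulation. The code below is a minimal illustration: the pseudo-random generator stands in for a real coin, and the seed and the number of trials are arbitrary assumptions.

```python
import random

def estimate_probability(trial, num_trials, rng):
    """Run `trial` T times, count N successes, and return N / T."""
    successes = sum(1 for _ in range(num_trials) if trial(rng))
    return successes / num_trials

def fair_coin_is_heads(rng):
    """Simulated fair coin: heads with probability 0.5."""
    return rng.random() < 0.5

rng = random.Random(0)  # fixed, arbitrary seed so that reruns give the same value
estimate = estimate_probability(fair_coin_is_heads, 10_000, rng)
print(estimate)

# The slide's own numbers: N = 651 heads in T = 1000 flips.
slide_estimate = 651 / 1000
print(slide_estimate)
```

For a fair coin the simulated estimate should land near 0.5, and it tends to get closer as T grows.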

SLIDE 10

Axioms of probability

  1. probabilities are non-negative real numbers
  2. p(Ω) = 1
  3. A ∩ B = ∅ implies p(A ∪ B) = p(A) + p(B)

From these you can derive:

  • p(∅) = 0
  • A ⊆ B implies that p(A) ≤ p(B)
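
The axioms and the two derived facts can be spot-checked on a concrete case. The sketch below assumes a fair six-sided die with equally likely outcomes. The events are arbitrary examples, and a check on one example is an illustration, not a proof.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))  # assumed example: one fair six-sided die

def p(event):
    """Probability of an event when all outcomes in omega are equally likely."""
    return Fraction(len(event & omega), len(omega))

a = frozenset({1, 2})
b = frozenset({5})
c = frozenset({1, 2, 3})        # a is a subset of c

non_negative = all(p(frozenset({o})) >= 0 for o in omega)   # axiom 1
certain = p(omega) == 1                                      # axiom 2
additive = a.isdisjoint(b) and p(a | b) == p(a) + p(b)       # axiom 3
impossible = p(frozenset()) == 0                             # derived
monotone = a <= c and p(a) <= p(c)                           # derived

print(non_negative, certain, additive, impossible, monotone)
```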

SLIDE 11

Two events together

Joint probability: p(A and B) or p(A, B)
Independent: p(A, B) = p(A) p(B)

p(A or B) = p(A) + p(B) - p(A, B)
Mutually exclusive: p(A, B) = 0
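
A short sketch of these relations on a fair six-sided die follows. The events `even`, `low`, and `odd_low` are examples picked for illustration and are not from the slides.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))  # assumed example: one fair six-sided die

def p(event):
    """Probability of an event when all outcomes in omega are equally likely."""
    return Fraction(len(event & omega), len(omega))

even = frozenset({2, 4, 6})
low = frozenset({1, 2})
odd_low = frozenset({1, 3})

joint = p(even & low)                            # p(A, B)
independent = joint == p(even) * p(low)          # does p(A, B) = p(A) p(B) hold?
union = p(even | low)                            # p(A or B), computed directly
inclusion_exclusion = p(even) + p(low) - joint   # p(A) + p(B) - p(A, B)
exclusive = p(even & odd_low) == 0               # no shared outcomes

print(joint, independent, union, inclusion_exclusion, exclusive)
```

Computing the union both ways shows why the joint probability is subtracted: outcomes in both events would otherwise be counted twice.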

SLIDE 12

Marginal probability

  • p(A) = sum of probabilities of mutually exclusive outcomes in A.
  • In math, p(A) = Σ_{ω ∈ A} p(ω)
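
When outcomes are not equally likely, the probability of an event is still the sum over its member outcomes. The sketch below uses a hypothetical loaded three-sided die. The weights are made up for illustration and do not come from the slides.

```python
from fractions import Fraction

# Hypothetical loaded three-sided die; these weights are an assumption.
outcome_probs = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4)}

def event_probability(event, outcome_probs):
    """Sum the probabilities of the mutually exclusive outcomes in the event."""
    return sum((outcome_probs[outcome] for outcome in event), Fraction(0))

p_at_least_2 = event_probability({2, 3}, outcome_probs)
p_certain = event_probability(set(outcome_probs), outcome_probs)
print(p_at_least_2, p_certain)
```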

SLIDE 13

Logarithms review

log_2(8) = 3 or 2^3 = 8
Logarithms are exponents

Surprisal(A) = -log(p(A))
usually, the base is 2

Surprisal of heads on a fair coin: -log_2(1/2) = 1 bit
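
Surprisal can be computed directly from the definition. This is a minimal sketch. The function name and the default base of 2 follow the slide's convention, and the input check is an added assumption.

```python
import math

def surprisal(probability, base=2):
    """Negative logarithm of a probability; in bits when base is 2."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -math.log(probability) / math.log(base)

heads_bits = surprisal(1 / 2)    # fair coin
rare_bits = surprisal(1 / 8)     # a less likely event is more surprising
print(heads_bits, rare_bits)
```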

SLIDE 14

Properties of exponents

  • x^0 = 1
  • 1^x = 1
  • 0^x = 0, for all x > 0
  • 0^0 is undefined
  • x^(-y) = 1 / x^y
  • x^(1/2) = √x
  • x^a x^b = x^(a+b)
  • x^a / x^b = x^(a-b)
  • (x^a)^b = x^(a b)
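
These identities can be spot-checked numerically. The sketch below uses arbitrary positive test values and floating-point comparison, so it illustrates the rules and does not prove them.

```python
import math

x, y, a, b = 2.0, 3.0, 1.5, 2.5  # arbitrary positive test values

checks = {
    "x^0 = 1": x ** 0 == 1,
    "1^x = 1": 1 ** x == 1,
    "0^x = 0 (x > 0)": 0 ** x == 0,
    "x^(-y) = 1 / x^y": math.isclose(x ** -y, 1 / x ** y),
    "x^(1/2) = sqrt(x)": math.isclose(x ** 0.5, math.sqrt(x)),
    "x^a x^b = x^(a+b)": math.isclose(x ** a * x ** b, x ** (a + b)),
    "x^a / x^b = x^(a-b)": math.isclose(x ** a / x ** b, x ** (a - b)),
    "(x^a)^b = x^(a b)": math.isclose((x ** a) ** b, x ** (a * b)),
}

for rule, holds in checks.items():
    print(rule, holds)

# Python evaluates 0 ** 0 as 1 by programming convention,
# although the slide treats 0^0 as undefined.
zero_to_zero = 0 ** 0
```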

SLIDE 15

Properties of logarithms

  • log_x(1) = 0
  • log(x) is undefined, for x ≤ 0
  • log_y(x) = log(x)/log(y)
  • -log(x) = log(1/x)
  • b^(log_b(x)) = x
  • log(a b) = log(a) + log(b)
  • log(a/b) = log(a) - log(b)
  • log(a^b) = b log(a)
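
The logarithm rules can be checked the same way. The values below are arbitrary, `math.log(x, y)` is Python's logarithm of x to base y, and the comparisons are approximate because of floating-point rounding.

```python
import math

x, y, a, b = 8.0, 2.0, 3.0, 5.0  # arbitrary positive test values

checks = {
    "log_x(1) = 0": math.log(1, x) == 0,
    "log_y(x) = log(x) / log(y)": math.isclose(
        math.log(x, y), math.log(x) / math.log(y)
    ),
    "-log(x) = log(1/x)": math.isclose(-math.log(x), math.log(1 / x)),
    "b^(log_b(x)) = x": math.isclose(b ** math.log(x, b), x),
    "log(a b) = log(a) + log(b)": math.isclose(
        math.log(a * b), math.log(a) + math.log(b)
    ),
    "log(a/b) = log(a) - log(b)": math.isclose(
        math.log(a / b), math.log(a) - math.log(b)
    ),
    "log(a^b) = b log(a)": math.isclose(math.log(a ** b), b * math.log(a)),
}

for rule, holds in checks.items():
    print(rule, holds)

# log(x) is undefined for x <= 0: Python raises ValueError for such input.
try:
    math.log(0)
    log_of_zero_defined = True
except ValueError:
    log_of_zero_defined = False
```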

SLIDE 16

Review of grammar symbols

S → NP VP
NP → Det N
NP → NP PP
PP → P NP
VP → V NP
VP → VP PP

SLIDE 17

Part of speech tag reference

SLIDE 18

Structural ambiguity 1

SLIDE 19

Structural ambiguity 2

SLIDE 20

Derivation

S → NP VP
NP → Det N
NP → NP PP
PP → P NP
VP → V NP
VP → VP PP
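
A derivation rewrites the start symbol step by step using these rules. The sketch below applies a chosen sequence of rules, each time rewriting the leftmost occurrence of the rule's left-hand side. The sequence shown is one illustrative choice, not a derivation taken from the slides.

```python
def derive(start, rule_sequence):
    """Apply rules in order, each rewriting the leftmost occurrence of its left-hand side."""
    form = [start]
    steps = [list(form)]
    for lhs, rhs in rule_sequence:
        index = form.index(lhs)          # raises ValueError if lhs is not present
        form[index:index + 1] = rhs      # replace one symbol with the rule's right side
        steps.append(list(form))
    return steps

# Illustrative rule sequence using rules from the grammar above.
rule_sequence = [
    ("S", ["NP", "VP"]),
    ("NP", ["Det", "N"]),
    ("VP", ["V", "NP"]),
    ("NP", ["Det", "N"]),
]

steps = derive("S", rule_sequence)
for step in steps:
    print(" ".join(step))
```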

SLIDE 21

Probability of grammaticality

First derivation:
  S → NP VP (1.0)
  NP → Det N (0.8)
  VP → V NP (0.7)
  NP → NP PP (0.2)
  NP → Det N (0.8)
  PP → P NP (1.0)
  Product: 0.0896

Second derivation:
  S → NP VP (1.0)
  NP → Det N (0.8)
  VP → VP PP (0.3)
  VP → V NP (0.7)
  NP → Det N (0.8)
  PP → P NP (1.0)
  Product: 0.1344
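
The two products can be reproduced by multiplying the rule probabilities in each list. In this sketch the rule strings are labels only, and the probabilities are copied from the slide. `math.prod` requires Python 3.8 or later.

```python
import math

first_derivation = [
    ("S -> NP VP", 1.0),
    ("NP -> Det N", 0.8),
    ("VP -> V NP", 0.7),
    ("NP -> NP PP", 0.2),
    ("NP -> Det N", 0.8),
    ("PP -> P NP", 1.0),
]

second_derivation = [
    ("S -> NP VP", 1.0),
    ("NP -> Det N", 0.8),
    ("VP -> VP PP", 0.3),
    ("VP -> V NP", 0.7),
    ("NP -> Det N", 0.8),
    ("PP -> P NP", 1.0),
]

def derivation_probability(rules):
    """Multiply the probabilities of all rules used in a derivation."""
    return math.prod(probability for _rule, probability in rules)

first_product = derivation_probability(first_derivation)
second_product = derivation_probability(second_derivation)
print(round(first_product, 4), round(second_product, 4))
```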

SLIDE 22

A classic sentence

The letter was delivered.
The letter written to John was delivered.
The letter sent to John was delivered.
The letter sent to John fell on the floor.
The letter sent to John fell.

The horse raced past the barn fell.

SLIDE 23

A simple (wrong) grammar

S → NP VP (1.0)
NP → Det N’ (0.8)
NP → NP PP (0.2)
N’ → N (0.9)
N’ → N VP (0.1)
PP → P NP (1.0)
VP → VP PP (0.4)
VP → V (0.6)
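
One way to hold such a grammar in code is a list of (left-hand side, right-hand side, probability) triples. The sketch below also checks the usual convention for probabilistic grammars, which the slide does not state explicitly: the probabilities of all rules sharing a left-hand side should sum to 1. The data layout is an illustrative assumption.

```python
import math
from collections import defaultdict

# Rules and probabilities copied from the slide; N' is written with an ASCII apostrophe.
rules = [
    ("S", ("NP", "VP"), 1.0),
    ("NP", ("Det", "N'"), 0.8),
    ("NP", ("NP", "PP"), 0.2),
    ("N'", ("N",), 0.9),
    ("N'", ("N", "VP"), 0.1),
    ("PP", ("P", "NP"), 1.0),
    ("VP", ("VP", "PP"), 0.4),
    ("VP", ("V",), 0.6),
]

# Total probability mass per left-hand side.
totals = defaultdict(float)
for lhs, _rhs, probability in rules:
    totals[lhs] += probability

normalized = {lhs: math.isclose(total, 1.0) for lhs, total in totals.items()}
for lhs, ok in normalized.items():
    print(lhs, ok)
```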

SLIDE 24

Exercises

  1. Memorize:
     1. probability = what you want / what is possible
     2. “and” = × (times) [if independent]
     3. “or” = + (plus) [if mutually exclusive]
     4. logarithms = exponents
     5. surprisal = the negative logarithm of probability

  2. Calculate the probability of the parse on slide 23:
     “The horse raced past the barn fell.”
