Non-Parametric Methods; Simulations March 6, 2020 Data Science - - PowerPoint PPT Presentation


Non-Parametric Methods; Simulations March 6, 2020 Data Science CSCI 1951A Brown University Instructor: Ellie Pavlick HTAs: Josh Levin, Diane Mutako, Sol Zitter Announcements Today Non-Parametric Methods Simulations (example using


slide-1
SLIDE 1

Non-Parametric Methods; Simulations

March 6, 2020 Data Science CSCI 1951A Brown University Instructor: Ellie Pavlick HTAs: Josh Levin, Diane Mutako, Sol Zitter

slide-2
SLIDE 2

Announcements

slide-3
SLIDE 3

Today

  • Non-Parametric Methods
  • Simulations (example using Gaussian Mixture Models)

slide-5
SLIDE 5

Parametric vs. Non-Parametric

[Figure: scatter plot; axes labeled cholesterol and eucalyptus]

Given x, predict y

slide-6
SLIDE 6

Parametric vs. Non-Parametric

y = mx + b + e

[Figure: scatter plot; axes labeled cholesterol and eucalyptus]

Given x, predict y

slide-8
SLIDE 8

Clicker Question!

slide-9
SLIDE 9

Parametric vs. Non-Parametric

[Figure: scatter plot; axes labeled cholesterol and eucalyptus]

Given x, predict y

y = mx + b + e

Thoughts?

slide-10
SLIDE 10

Parametric vs. Non-Parametric

[Figure: scatter plot; axes labeled cholesterol and eucalyptus]

Given x, predict y

y = mx + b + e

Nearest Neighbors!
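
As a rough illustration of the contrast on this slide, the sketch below fits the parametric line y = mx + b by least squares and, separately, predicts by averaging the k nearest neighbors. The toy data, the helper names `fit_line` and `knn_predict`, and the choice k = 3 are assumptions made for illustration; none of them come from the lecture.

```python
def fit_line(xs, ys):
    """Least-squares estimates of m and b in y = m*x + b.

    Parametric: the whole model is two numbers, however much data there is.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


def knn_predict(xs, ys, x_new, k=3):
    """Average the y values of the k training points closest to x_new.

    Non-parametric: the "model" is the training data itself.
    """
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x_new))[:k]
    return sum(y for _, y in nearest) / k


# Hypothetical, clearly non-linear toy data (y = x squared).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1, 4, 9, 16, 25, 36, 49, 64]

m, b = fit_line(xs, ys)
line_guess = m * 4.5 + b                    # prediction from the fitted line
knn_guess = knn_predict(xs, ys, 4.5, k=3)   # prediction from nearby points
print(line_guess, knn_guess)
```

The line is forced to be straight even though the data curve, while the nearest-neighbor guess only looks at points near x = 4.5. The price is that every prediction has to keep and search the full training set.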

slide-14
SLIDE 14

Clicker Question!

slide-15
SLIDE 15

Non-Parametric Models

  • “Non-parametric” models: no assumptions about the number of parameters in the model or the particular form of the model
  • Pros:
    • Can work well with small data
    • Or when you have very complex distributions and you aren’t sure what assumptions can be made
  • Cons:
    • Size of model can increase with size of data
    • Slow to compute (randomized/iterative processes)
    • Fewer assumptions -> weaker conclusions (higher p-values)

slide-19
SLIDE 19

Law of Large Numbers

  • If you perform the same experiment a large number of times, the average of the results will converge to the expected value
  • Assumes that errors are “random” and uncorrelated, so they will balance out over time

https://en.wikipedia.org/wiki/Law_of_large_numbers

$\bar{X}_n = \frac{1}{n}(X_1 + \cdots + X_n)$

$\bar{X}_n \to \mu$ as $n \to \infty$

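
A small simulation can make the convergence concrete. The sketch below averages rolls of a fair six-sided die, whose expected value is 3.5. The die example, the function name `sample_mean_of_die_rolls`, and the fixed seed are assumptions chosen for illustration.

```python
import random


def sample_mean_of_die_rolls(n_rolls, seed=0):
    """Average of n_rolls independent rolls of a fair six-sided die."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0
    for _ in range(n_rolls):
        total += rng.randint(1, 6)  # uniform on 1..6, expected value 3.5
    return total / n_rolls


# The sample mean should drift toward 3.5 as the number of rolls grows.
for n in (10, 1_000, 100_000):
    print(n, sample_mean_of_die_rolls(n))
```
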
slide-20
SLIDE 20

Central Limit Theorem

  • Given $X_1, \ldots, X_n$
  • Not only does $\bar{X}_n \to \mu$ as $n \to \infty$
  • But the distribution of $\bar{X}_n$ approaches a normal distribution

slide-21
SLIDE 21

Central Limit Theorem

[Figure: histogram over bins 1 to 12; bar label 1.00]

slide-22
SLIDE 22

Central Limit Theorem

[Figure: histogram over bins 1 to 12; bar labels 3.00, 2.00]

slide-23
SLIDE 23

Central Limit Theorem

[Figure: histogram over bins 1 to 12; bar labels 1.00, 1.00, 6.00, 2.00]

slide-24
SLIDE 24

Central Limit Theorem

[Figure: histogram over bins 1 to 12; bar labels 10.00, 20.00, 40.00, 20.00, 10.00]

slide-25
SLIDE 25

Central Limit Theorem

[Figure: histogram over bins 1 to 12; bar labels 10.00, 20.00, 40.00, 20.00, 10.00]

I.e. test statistics are often normally distributed…

slide-26
SLIDE 26

Central Limit Theorem

[Figure: histogram over bins 1 to 12; bar labels 10.00, 20.00, 40.00, 20.00, 10.00]

Can apply statistical methods designed for normal distributions even when underlying distribution is not normal
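
To see this claim in action, the sketch below draws repeated samples from a skewed exponential population and looks at how the sample means are distributed. The exponential population, the sample sizes, and the name `sample_means` are assumptions picked for illustration.

```python
import random
import statistics


def sample_means(n_samples, sample_size, seed=0):
    """Means of repeated samples drawn from a skewed (exponential) population."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means


means = sample_means(n_samples=2_000, sample_size=50)
center = statistics.fmean(means)   # should sit near the population mean, 1.0
spread = statistics.stdev(means)   # should sit near 1 / sqrt(50)

# For a normal distribution, roughly 68% of values lie within one
# standard deviation of the center.
within_one_sd = sum(abs(m - center) <= spread for m in means) / len(means)
print(center, spread, within_one_sd)
```

The population itself is far from bell-shaped, yet the share of sample means within one standard deviation of the center should land close to the normal distribution's value.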

slide-27
SLIDE 27

Central Limit Theorem

Every year, I compute the mean grade in my class. I never change the material or my methods for evaluating because, lazy. Over the 439 years that I have been teaching this class, this has resulted in the distribution below. Which of these is most like the typical distribution of grades in any given year?

[Figure: histogram of the yearly mean grades; x-axis 10 to 100, y-axis 10 to 40]

(a) [Figure: histogram; x-axis 10 to 100, y-axis 10 to 40]
(b) [Figure: histogram; x-axis 10 to 100, y-axis 10 to 40]
(c) can’t say, could be either

slide-28
SLIDE 28

Central Limit Theorem

Every year, I compute the mean grade in my class. I never change the material or my methods for evaluating because, lazy. Over the 439 years that I have been teaching this class, this has resulted in the distribution below. Which of these is most like the typical distribution of grades in any given year?

[Figure: histogram of the yearly mean grades; x-axis 10 to 100, y-axis 10 to 40]

(a) [Figure: histogram; x-axis 10 to 100, y-axis 10 to 40]
(b) [Figure: histogram; x-axis 10 to 100, y-axis 10 to 40]
(c) can’t say, could be either

Central Limit Theorem: repeated measurements of the mean will be approximately normally distributed. It doesn’t assume that the population over which you are taking the mean is normally distributed.

slide-29
SLIDE 29

Test for population means

http://www.censusscope.org/us/chart_age.html

Distribution of ages in the US Hypothesis: Mean age is 35.

$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$

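
The statistic above can be computed in a few lines. In this sketch the `ages` list is made-up sample data, not the census data behind the slide, and `one_sample_t` is a hypothetical helper name; only the hypothesized mean of 35 comes from the slide.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """t = (x_bar - mu0) / (s / sqrt(n)) for a single sample."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return (x_bar - mu0) / (s / math.sqrt(n))


# Hypothetical ages, used only to exercise the formula.
ages = [22, 35, 41, 29, 58, 33, 47, 19, 64, 38]
t = one_sample_t(ages, mu0=35)
print(t)
```

Turning t into a p-value still needs the t distribution with n - 1 degrees of freedom, which is where the parametric assumptions enter.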
slide-30
SLIDE 30

Clicker Question!

slide-31
SLIDE 31

Test for population medians?

$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$

slide-32
SLIDE 32

Non-Parametric Hypothesis Testing

  • We still want to determine the probability of the test statistic under the null hypothesis…
  • …but we don’t have an analytic solution, maybe because
    • Theoretical distribution is unknown, complex, or hard to write down
    • Assumptions about analytic solution are suspect (e.g. sample size not large enough)

slide-33
SLIDE 33

Non-Parametric Hypothesis Testing

  • We still want to determine the probability of the test statistic under the null hypothesis…
  • …but we don’t have an analytic solution, maybe because
    • Theoretical distribution is unknown, complex, or hard to write down
    • Assumptions about analytic solution are suspect (e.g. sample size not large enough)

???

slide-34
SLIDE 34

Non-Parametric Hypothesis Testing

  • We still want to determine the probability of the test statistic under the null hypothesis…
  • …but we don’t have an analytic solution, maybe because
    • Theoretical distribution is unknown, complex, or hard to write down
    • Assumptions about analytic solution are suspect (e.g. sample size not large enough)

Possible test statistics: median, 90th percentile, annotator agreement, model accuracy, whatever cool metric you made up that you care about.

slide-35
SLIDE 35

Bootstrapping

  • Resample (with replacement) in order to approximate the distribution of the test statistic
  • Compute the test statistic over each sample
  • Repeat some large number of times (say 10,000)
  • View distribution of computed test statistics
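
The bootstrapping steps on this slide can be sketched as follows. The helper name `bootstrap_statistics`, the sample values, and the percentile summary at the end are assumptions for illustration; the median stands in for whichever test statistic you care about.

```python
import random
import statistics


def bootstrap_statistics(data, statistic, n_resamples=10_000, seed=0):
    """Recompute `statistic` on many resamples of `data` drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    results = []
    for _ in range(n_resamples):
        resample = rng.choices(data, k=n)  # same size as data, with replacement
        results.append(statistic(resample))
    return results


# Hypothetical sample (hours of sleep), used only as example input.
sleep_hours = [7, 5, 8, 6, 6, 7, 7, 7, 8, 7]
medians = sorted(bootstrap_statistics(sleep_hours, statistics.median))

# One simple way to "view" the distribution: its middle 95%.
low = medians[int(0.025 * len(medians))]
high = medians[int(0.975 * len(medians)) - 1]
print(low, high)
```

Any function of a sample can be passed as `statistic`, which is the appeal when no analytic distribution is available.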

slide-36
SLIDE 36
slide-38
SLIDE 38

Permutation Test

Ha: CS students sleep less than the rest of Brown students

slide-40
SLIDE 40

Permutation Test

H0: CS students sleep the same amount as everyone else
Ha: CS students sleep less than the rest of Brown students

slide-41
SLIDE 41

Permutation Test

H0: CS students sleep the same amount as everyone else
Ha: CS students sleep less than the rest of Brown students

CS Students (mean 6.4): 7 5 8 6 6
Brown Overall (mean 7.2): 7 7 7 8 7

slide-42
SLIDE 42

Permutation Test

H0: CS students sleep the same amount as everyone else
Ha: CS students sleep less than the rest of Brown students

CS Students (mean 6.4): 7 5 8 6 6
Brown Overall (mean 7.2): 7 7 7 8 7

Assuming these are samples from the same population

slide-43
SLIDE 43

Permutation Test

H0: CS students sleep the same amount as everyone else
Ha: CS students sleep less than the rest of Brown students

CS Students: mean 7.2
Brown Overall: mean 6.4
Values (group labels reassigned): 7 5 8 6 6 7 7 7 8 7

slide-44
SLIDE 44

Permutation Test

H0: CS students sleep the same amount as everyone else
Ha: CS students sleep less than the rest of Brown students

CS Students: mean 6.8
Brown Overall: mean 6.8
Values (group labels reassigned): 7 5 8 6 6 7 7 7 8 7
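
The relabeling idea on these slides can be sketched in code using the ten sleep values shown. This version enumerates every way of choosing which five values count as "CS Students" instead of sampling random shuffles, which is feasible only because the data set is tiny. The function name `permutation_p_value` and the one-sided comparison are assumptions for illustration.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Exact one-sided permutation test of mean(group_a) < mean(group_b).

    Returns the observed difference in means and the share of relabelings
    whose difference is at least as extreme (as low) as the observed one.
    """
    pooled = group_a + group_b
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def mean_diff(sum_a):
        return sum_a / n_a - (total - sum_a) / n_b

    observed = mean_diff(sum(group_a))
    extreme = 0
    n_splits = 0
    for indices in combinations(range(len(pooled)), n_a):
        n_splits += 1
        diff = mean_diff(sum(pooled[i] for i in indices))
        if diff <= observed + 1e-12:  # small tolerance for float comparison
            extreme += 1
    return observed, extreme / n_splits


cs_students = [7, 5, 8, 6, 6]     # mean 6.4, from the slide
brown_overall = [7, 7, 7, 8, 7]   # mean 7.2, from the slide
observed, p_value = permutation_p_value(cs_students, brown_overall)
print(observed, p_value)
```

With larger samples the number of relabelings explodes, and the usual approach is to shuffle the labels at random some large number of times instead.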

slide-45
SLIDE 45

Permutation Test

H0: CS students sleep the same amount as everyone else
Ha: CS students sleep less than the rest of Brown students

CS Students (mean 6.4): 7 5 8 6 6

slide-47
SLIDE 47
slide-48
SLIDE 48

Today

  • Non-Parametric Methods
  • Simulations (example using Gaussian Mixture Models)

slide-49
SLIDE 49

Simulations

H0: I swear there are two types of TAs: nice ones and mean ones. If you get a mean one, you fail, otherwise you pass. Your work doesn’t really factor in at all.

slide-50
SLIDE 50

Simulations

H0: I swear there are two types of TAs: nice ones and mean ones. If you get a mean one, you fail, otherwise you pass. Your work doesn’t really factor in at all.

???

slide-51
SLIDE 51

Simulations

H0: I swear there are two types of TAs: nice ones and mean ones. If you get a mean one, you fail, otherwise you pass. Your work doesn’t really factor in at all.

if (TA is nice):
    student passes (grade of 90)
else:
    student fails (grade of 60)
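
The rule on this slide can be turned into a runnable simulation. In the sketch below, each student draws a TA at random and then gets a grade centered on 90 or 60. The Gaussian noise around each center (in the spirit of the Gaussian mixture example named in the agenda), its width `noise_sd`, the probability `p_mean_ta`, and the function name are all assumptions for illustration.

```python
import random


def simulate_grades(n_students, p_mean_ta, noise_sd=3.0, seed=0):
    """Simulate grades under H0: the grade depends only on the TA's type.

    p_mean_ta is the (assumed) probability that a student gets a mean TA.
    """
    rng = random.Random(seed)
    grades = []
    for _ in range(n_students):
        ta_is_nice = rng.random() >= p_mean_ta
        center = 90 if ta_is_nice else 60   # pass or fail, as on the slide
        grades.append(rng.gauss(center, noise_sd))
    return grades


grades = simulate_grades(n_students=1_000, p_mean_ta=0.3)
share_failing = sum(g < 75 for g in grades) / len(grades)
print(share_failing)
```

Simulated grades like these can then be compared with the observed grades to judge how plausible the hypothesis is.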

slide-52
SLIDE 52

Clicker Question!

slide-54
SLIDE 54

Simulations

H0: I swear there are two types of TAs: nice ones and mean ones. If you get a mean one, you fail, otherwise you pass. Your work doesn’t really factor in at all.

[Figure: grade distribution with a peak at 60%, labeled p]

slide-55
SLIDE 55

Simulations

H0: I swear there are two types of TAs: nice ones and mean ones. If you get a mean one, you fail, otherwise you pass. Your work doesn’t really factor in at all.

[Figure: grade distribution with peaks at 60% and 90%; weights labeled p and 1-p]

slide-56
SLIDE 56

Simulations

H0: I swear there are two types of TAs: nice ones and mean ones. If you get a mean one, you fail, otherwise you pass. Your work doesn’t really factor in at all.

[Figure: grade distribution with peaks at 60% and 90%; weights labeled p and 1-p]

Observed grades

slide-58
SLIDE 58

Clicker Question!