[PPT] - Non-Parametric Methods; Simulations March 6, 2020 Data Science PowerPoint Presentation

SLIDE 1

Non-Parametric Methods; Simulations

March 6, 2020
Data Science CSCI 1951A
Brown University
Instructor: Ellie Pavlick
HTAs: Josh Levin, Diane Mutako, Sol Zitter

SLIDE 2

Announcements

SLIDE 3

Today

Non-Parametric Methods
Simulations (example using Gaussian Mixture Models)

SLIDE 4

Today

Non-Parametric Methods
Simulations (example using Gaussian Mixture Models)

SLIDE 5

Parametric vs. Non-Parametric

[Plot: cholesterol vs. eucalyptus]
Given x, predict y

SLIDE 6

Parametric vs. Non-Parametric

y = mx + b + e

[Plot: cholesterol vs. eucalyptus]
Given x, predict y

SLIDE 7

Parametric vs. Non-Parametric

y = mx + b + e

[Plot: cholesterol vs. eucalyptus]
Given x, predict y

SLIDE 8

Clicker Question!

SLIDE 9

Parametric vs. Non-Parametric

[Plot: cholesterol vs. eucalyptus]
Given x, predict y

y = mx + b + e

Thoughts?

SLIDE 10

Parametric vs. Non-Parametric

[Plot: cholesterol vs. eucalyptus]
Given x, predict y

y = mx + b + e

Nearest Neighbors!

SLIDE 11

Parametric vs. Non-Parametric

[Plot: cholesterol vs. eucalyptus]
Given x, predict y

y = mx + b + e

Nearest Neighbors!

SLIDE 12

Parametric vs. Non-Parametric

[Plot: cholesterol vs. eucalyptus]
Given x, predict y

y = mx + b + e

Nearest Neighbors!

SLIDE 13

Parametric vs. Non-Parametric

[Plot: cholesterol vs. eucalyptus]
Given x, predict y

y = mx + b + e

Nearest Neighbors!
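
The contrast between the parametric line fit (y = mx + b + e) and nearest neighbors can be sketched roughly as follows. This is a minimal illustration, not the course's implementation: the data values and the helper names `knn_predict` and `linear_fit_predict` are made up for the example, and the nearest-neighbor prediction is assumed to be the average y of the k closest training points.

```python
import numpy as np

def knn_predict(x_train, y_train, x_query, k=3):
    """Non-parametric: average the y values of the k training points closest to x_query."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    distances = np.abs(x_train - x_query)
    nearest = np.argsort(distances)[:k]   # indices of the k closest points
    return y_train[nearest].mean()

def linear_fit_predict(x_train, y_train, x_query):
    """Parametric: fit y = m*x + b by least squares, then evaluate at x_query."""
    m, b = np.polyfit(x_train, y_train, deg=1)
    return m * x_query + b

# Hypothetical, deliberately non-linear data (not from the slides)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.0, 4.0, 9.0, 16.0, 25.0, 36.0])

print("nearest neighbors:", knn_predict(x, y, 3.5, k=2))
print("linear model:     ", linear_fit_predict(x, y, 3.5))
```

The linear model summarizes the data in two numbers (m and b), while the nearest-neighbor predictor has to keep all of the training data around, which is the "size of model can increase with size of data" trade-off.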

SLIDE 14

Clicker Question!

SLIDE 15

“Non-parametric” models: No assumptions about the number of parameters in the model or the particular form of the model

Pros:
Can work well with small data
Or when you have very complex distributions and you aren’t sure what assumptions can be made

Cons:
Size of model can increase with size of data
Slow to compute (randomized/iterative processes)
Fewer assumptions -> weaker conclusions (higher p-values)

Non-Parametric Models

SLIDE 16

“Non-parametric” models: No assumptions about the number of parameters in the model or the particular form of the model

Pros:
Can work well with small data
Or when you have very complex distributions and you aren’t sure what assumptions can be made

Cons:
Size of model can increase with size of data
Slow to compute (randomized/iterative processes)
Fewer assumptions -> weaker conclusions (higher p-values)

Non-Parametric Models

SLIDE 17

“Non-parametric” models: No assumptions about the number of parameters in the model or the particular form of the model

Pros:
Can work well with small data
Or when you have very complex distributions and you aren’t sure what assumptions can be made

Cons:
Size of model can increase with size of data
Slow to compute (randomized/iterative processes)
Fewer assumptions -> weaker conclusions (higher p-values)

Non-Parametric Models

SLIDE 18

“Non-parametric” models: No assumptions about the number of parameters in the model or the particular form of the model

Pros:
Can work well with small data
Or when you have very complex distributions and you aren’t sure what assumptions can be made

Cons:
Size of model can increase with size of data
Slow to compute (randomized/iterative processes)
Fewer assumptions -> weaker conclusions (higher p-values)

Non-Parametric Models

SLIDE 19

Law of Large Numbers

If you perform the same experiment a large number of times, the average will converge to the expected value

Assumes that errors are “random” and uncorrelated, so will balance out over time

https://en.wikipedia.org/wiki/Law_of_large_numbers

$\bar{X}_n = \frac{1}{n}(X_1 + \cdots + X_n)$
$\bar{X}_n \to \mu$ as $n \to \infty$
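
A quick way to see the Law of Large Numbers is to simulate it. The sketch below assumes the "experiment" is a fair six-sided die roll (an illustrative choice, not from the slides), whose expected value is 3.5, and tracks the running average as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed experiment: roll a fair six-sided die; the expected value is 3.5
rolls = rng.integers(1, 7, size=100_000)

# running_mean[n-1] is the average of the first n rolls
running_mean = np.cumsum(rolls) / np.arange(1, rolls.size + 1)

for n in (10, 1_000, 100_000):
    print(f"n = {n:>7}: average = {running_mean[n - 1]:.4f}")
```

The averages for small n can be far from 3.5; the average over all rolls should sit close to it.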

SLIDE 20

Central Limit Theorem

Given $X_1, \ldots, X_n$
Not only does $\bar{X}_n \to \mu$ as $n \to \infty$
But the distribution of $\bar{X}_n$ approaches a normal distribution

SLIDE 21

Central Limit Theorem

[Figure: histogram, x-axis 1 to 12; bar label: 1.00]

SLIDE 22

Central Limit Theorem

[Figure: histogram, x-axis 1 to 12; bar labels: 3.00, 2.00]

SLIDE 23

Central Limit Theorem

[Figure: histogram, x-axis 1 to 12; bar labels: 1.00, 1.00, 6.00, 2.00]

SLIDE 24

Central Limit Theorem

[Figure: histogram, x-axis 1 to 12; bar labels: 10.00, 20.00, 40.00, 20.00, 10.00]

SLIDE 25

Central Limit Theorem

[Figure: histogram, x-axis 1 to 12; bar labels: 10.00, 20.00, 40.00, 20.00, 10.00]

I.e., test statistics are often normally distributed…

SLIDE 26

Central Limit Theorem

[Figure: histogram, x-axis 1 to 12; bar labels: 10.00, 20.00, 40.00, 20.00, 10.00]

Can apply statistical methods designed for normal distributions even when underlying distribution is not normal
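
The Central Limit Theorem claim can be checked with a small simulation. This sketch assumes an exponential population (chosen only because it is clearly not normal; it is not the distribution in the slides) and a hypothetical helper `skewness` to measure asymmetry: a normal distribution has skewness 0, while the exponential distribution has skewness 2.

```python
import numpy as np

rng = np.random.default_rng(0)

n_experiments = 10_000   # how many times we repeat the experiment
sample_size = 50         # observations averaged in each experiment

# Assumed population: exponential with mean 1 (strongly right-skewed)
samples = rng.exponential(scale=1.0, size=(n_experiments, sample_size))
sample_means = samples.mean(axis=1)

def skewness(values):
    """Third standardized moment; roughly 0 for a symmetric, normal-looking distribution."""
    centered = values - values.mean()
    return (centered ** 3).mean() / values.std() ** 3

print("skewness of raw observations:", skewness(samples.ravel()))
print("skewness of sample means:    ", skewness(sample_means))
print("mean of sample means:        ", sample_means.mean())
print("std of sample means:         ", sample_means.std())
```

The sample means should be far less skewed than the raw observations, centered near the population mean of 1, with spread near 1/sqrt(50).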

SLIDE 27

[Figure: histogram, x-axis 10 to 100]

Every year, I compute the mean grade in my class. I never change the material or my methods for evaluating because, lazy. Over the 439 years that I have been teaching this class, this has resulted in the below distribution. Which of these is most like the typical distribution on any given year?

[Figure: histogram, x-axis 10 to 100]

(a)

[Figure: histogram, x-axis 10 to 100]

(b)
(c) can’t say, could be either

Central Limit Theorem

SLIDE 28

[Figure: histogram, x-axis 10 to 100]

Every year, I compute the mean grade in my class. I never change the material or my methods for evaluating because, lazy. Over the 439 years that I have been teaching this class, this has resulted in the below distribution. Which of these is most like the typical distribution on any given year?

[Figure: histogram, x-axis 10 to 100]

(a)

[Figure: histogram, x-axis 10 to 100]

(b)
(c) can’t say, could be either

Central Limit Theorem: repeated measures of the mean will be normally distributed; it doesn’t assume the population over which you are taking the mean is normally distributed.

Central Limit Theorem

SLIDE 29

Test for population means

http://www.censusscope.org/us/chart_age.html

Distribution of ages in the US
Hypothesis: Mean age is 35.

$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
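
As a rough sketch, the t statistic above can be computed directly from a sample. The `ages` list is invented for illustration (it is not census data), the helper name `one_sample_t` is hypothetical, and s is assumed to be the sample standard deviation (n - 1 in the denominator).

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t = (x_bar - mu0) / (s / sqrt(n)), with s the sample standard deviation."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)   # uses n - 1 in the denominator
    return (x_bar - mu0) / (s / math.sqrt(n))

# Made-up sample of ages, for illustration only
ages = [23, 35, 41, 29, 52, 38, 31, 47]

# Hypothesis from the slide: mean age is 35
t = one_sample_t(ages, mu0=35)
print("t statistic:", t)
```

The resulting t would then be compared against a t distribution with n - 1 degrees of freedom to get a p-value.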

SLIDE 30

Clicker Question!

SLIDE 31

Test for population medians?

$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$

SLIDE 32

We still want to determine the probability of the test statistic under the null hypothesis…

…but we don’t have an analytic solution, maybe because

Theoretical distribution is unknown, complex, or hard to write down
Assumptions about analytic solution are suspect (e.g. sample size not large enough)

Non-Parametric Hypothesis Testing

SLIDE 33

We still want to determine the probability of the test statistic under the null hypothesis…

…but we don’t have an analytic solution, maybe because

Theoretical distribution is unknown, complex, or hard to write down
Assumptions about analytic solution are suspect (e.g. sample size not large enough)

Non-Parametric Hypothesis Testing

???

SLIDE 34

We still want to determine the probability of the test statistic under the null hypothesis…

…but we don’t have an analytic solution, maybe because

Theoretical distribution is unknown, complex, or hard to write down
Assumptions about analytic solution are suspect (e.g. sample size not large enough)

Non-Parametric Hypothesis Testing

median, 90th percentile, annotator agreement, model accuracy, whatever cool metric you made up that you care about.

SLIDE 35

Resample (with replacement) in order to approximate the distribution of the test statistic
Compute the test statistic over each sample
Repeat some large number of times (say 10,000)
View distribution of computed test statistics

Bootstrapping
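
The bootstrapping steps above can be sketched as follows. This is a minimal version under assumptions: the `ages` data are made up, the test statistic is taken to be the median, and the function name `bootstrap_statistic` is hypothetical. Each resample is assumed to have the same size as the original sample, which is the usual convention.

```python
import numpy as np

def bootstrap_statistic(data, statistic, n_resamples=10_000, seed=0):
    """Return the statistic computed on n_resamples resamples drawn with replacement."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    results = np.empty(n_resamples)
    for i in range(n_resamples):
        # Resample with replacement, same size as the original sample
        resample = rng.choice(data, size=data.size, replace=True)
        results[i] = statistic(resample)
    return results

# Made-up sample, for illustration only
ages = np.array([23, 35, 41, 29, 52, 38, 31, 47, 19, 64, 33, 27])

medians = bootstrap_statistic(ages, np.median)

# "View" the distribution: here summarized by a 95% percentile interval
low, high = np.percentile(medians, [2.5, 97.5])
print("observed median:", np.median(ages))
print("95% bootstrap interval:", low, high)
```

Any statistic can be passed in place of `np.median` (a 90th percentile, an accuracy score, and so on), which is the point of the method.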

SLIDE 36

SLIDE 37

Resample (with replacement) in order to approximate the distribution of the test statistic
Compute the test statistic over each sample
Repeat some large number of times (say 10,000)
View distribution of computed test statistics

Bootstrapping

SLIDE 38

Permutation Test

Ha: CS students sleep less than the rest of Brown students

SLIDE 39

Permutation Test

Ha: CS students sleep less than the rest of Brown students

SLIDE 40

Permutation Test

H0: CS students sleep the same amount as everyone else
Ha: CS students sleep less than the rest of Brown students

SLIDE 41

Permutation Test

H0: CS students sleep the same amount as everyone else
Ha: CS students sleep less than the rest of Brown students

CS Students (mean 6.4): 7 5 8 6 6
Brown Overall (mean 7.2): 7 7 7 8 7

SLIDE 42

Permutation Test

H0: CS students sleep the same amount as everyone else
Ha: CS students sleep less than the rest of Brown students

CS Students (mean 6.4): 7 5 8 6 6
Brown Overall (mean 7.2): 7 7 7 8 7

assuming these are samples from the same population

SLIDE 43

Permutation Test

H0: CS students sleep the same amount as everyone else
Ha: CS students sleep less than the rest of Brown students

CS Students: mean 7.2
Brown Overall: mean 6.4
Values: 7 5 8 6 6 7 7 7 8 7

SLIDE 44

Permutation Test

H0: CS students sleep the same amount as everyone else
Ha: CS students sleep less than the rest of Brown students

CS Students: mean 6.8
Brown Overall: mean 6.8
Values: 7 5 8 6 6 7 7 7 8 7
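
A permutation test on these ten values might look roughly like this. The sketch assumes the first five values (7 5 8 6 6, mean 6.4) are the CS students and the last five (7 7 7 8 7, mean 7.2) are the rest, that the test statistic is the difference in group means, and that labels are shuffled at random 10,000 times; the function name `permutation_test` is hypothetical.

```python
import numpy as np

cs = np.array([7, 5, 8, 6, 6])     # CS Students, mean 6.4
rest = np.array([7, 7, 7, 8, 7])   # Brown Overall, mean 7.2

def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """One-sided test: how often does shuffling labels give a difference
    in means at least as negative as the observed one?"""
    rng = np.random.default_rng(seed)
    observed = group_a.mean() - group_b.mean()
    pooled = np.concatenate([group_a, group_b])
    count = 0
    for _ in range(n_permutations):
        # Under H0 the labels are arbitrary, so reassign them at random
        shuffled = rng.permutation(pooled)
        diff = shuffled[:group_a.size].mean() - shuffled[group_a.size:].mean()
        if diff <= observed + 1e-12:   # small tolerance for float rounding
            count += 1
    return observed, count / n_permutations

observed, p_value = permutation_test(cs, rest)
print("observed difference in means:", observed)
print("approximate one-sided p-value:", p_value)
```

The p-value is the fraction of shuffles in which the "CS" group sleeps at least as much less than the other group as in the observed data.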

SLIDE 45

Permutation Test

H0: CS students sleep the same amount as everyone else
Ha: CS students sleep less than the rest of Brown students

CS Students (mean 6.4): 7 5 8 6 6

SLIDE 46

Permutation Test

H0: CS students sleep the same amount as everyone else
Ha: CS students sleep less than the rest of Brown students

CS Students (mean 6.4): 7 5 8 6 6

SLIDE 47

SLIDE 48

Today

Non-Parametric Methods
Simulations (example using Gaussian Mixture Models)

SLIDE 49

Simulations

H0: I swear there are two types of TAs: nice ones and mean ones. If you get a mean one, you fail, otherwise you pass. Your work doesn’t really factor in at all.

SLIDE 50

Simulations

H0: I swear there are two types of TAs: nice ones and mean ones. If you get a mean one, you fail, otherwise you pass. Your work doesn’t really factor in at all.

???

SLIDE 51

Simulations

H0: I swear there are two types of TAs: nice ones and mean ones. If you get a mean one, you fail, otherwise you pass. Your work doesn’t really factor in at all.

if (TA is nice):
    student passes (grade of 90)
else:
    student fails (grade of 60)
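
The rule above can be turned into a small simulation of grades under H0. This is a sketch under assumptions: the probability of drawing a nice TA (called `p_nice` here, a hypothetical name) is not given on the slide, so 0.5 is used purely as a placeholder, and the function name `simulate_grades` is also made up.

```python
import numpy as np

def simulate_grades(n_students, p_nice, seed=0):
    """Simulate grades under H0: a nice TA gives 90, a mean TA gives 60."""
    rng = np.random.default_rng(seed)
    # Each student independently gets a nice TA with probability p_nice
    ta_is_nice = rng.random(n_students) < p_nice
    return np.where(ta_is_nice, 90, 60)

# p_nice = 0.5 is an assumed placeholder value, not from the slides
grades = simulate_grades(n_students=1_000, p_nice=0.5)

print("fraction passing:", (grades == 90).mean())
print("mean grade:      ", grades.mean())
```

The simulated grades can then be compared against observed grades to judge whether H0 is plausible.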

SLIDE 52

Clicker Question!

SLIDE 53

Simulations

H0: I swear there are two types of TAs: nice ones and mean ones. If you get a mean one, you fail, otherwise you pass. Your work doesn’t really factor in at all.

SLIDE 54

Simulations

H0: I swear there are two types of TAs: nice ones and mean ones. If you get a mean one, you fail, otherwise you pass. Your work doesn’t really factor in at all.

[Figure labels: 60%; p]

SLIDE 55

Simulations

H0: I swear there are two types of TAs: nice ones and mean ones. If you get a mean one, you fail, otherwise you pass. Your work doesn’t really factor in at all.

[Figure labels: 60%, 90%; 1-p, p]

SLIDE 56

Simulations

H0: I swear there are two types of TAs: nice ones and mean ones. If you get a mean one, you fail, otherwise you pass. Your work doesn’t really factor in at all.

[Figure labels: 60%, 90%; 1-p, p]

Observed grades

SLIDE 57

Simulations

H0: I swear there are two types of TAs: nice ones and mean ones. If you get a mean one, you fail, otherwise you pass. Your work doesn’t really factor in at all.

[Figure labels: 60%, 90%; 1-p, p]

Observed grades

SLIDE 58