Non-Parametric Methods; Simulations
March 6, 2020 Data Science CSCI 1951A Brown University Instructor: Ellie Pavlick HTAs: Josh Levin, Diane Mutako, Sol Zitter
Announcements
Today:
Non-Parametric Methods
Simulations (example using … Models)
[Figure: scatter plot; axes: cholesterol, eucalyptus]
Given x, predict y. Thoughts?
Nearest Neighbors!
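A minimal sketch of nearest-neighbors prediction, assuming a one-dimensional x, absolute distance |x_i - x_new|, and a prediction equal to the average y of the k closest training points (k-nearest-neighbors regression). The slides do not fix k or the distance, and the function name `knn_predict` and the toy data are hypothetical. Note that no formula is fit: the training data itself is the model, which is what makes the method non-parametric.

```python
def knn_predict(train_x, train_y, x_new, k=3):
    """Predict y at x_new as the average y of the k training points whose
    x is closest to x_new (k-nearest-neighbors regression)."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort training indices by distance to x_new and keep the k closest.
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x_new))[:k]
    # Average the y values of those neighbors.
    return sum(train_y[i] for i in nearest) / k


# Hypothetical toy data, not from the slides.
xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
print(knn_predict(xs, ys, 2.2, k=3))  # neighbors x = 2, 3, 1 -> 2.0
```

Smaller k follows the data more closely (and noise with it); larger k smooths the prediction.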
… the model or the particular form of the model
… assumptions can be made
Law of large numbers: if an experiment is repeated a large number of times, the average of the results will converge to the expected value
(… and uncorrelated, so will balance out)
https://en.wikipedia.org/wiki/Law_of_large_numbers
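The law of large numbers can be watched in a short simulation. A minimal sketch, assuming a fair six-sided die as the experiment (expected value 3.5); the die, the function name `running_means`, and the number of rolls are illustrative choices, not taken from the slides.

```python
import random


def running_means(n_rolls, seed=0):
    """Roll a fair six-sided die n_rolls times and return the running
    average after each roll."""
    rng = random.Random(seed)
    total = 0
    means = []
    for i in range(1, n_rolls + 1):
        total += rng.randint(1, 6)  # uniform integer in 1..6 inclusive
        means.append(total / i)
    return means


means = running_means(100_000)
# Early averages jump around; later ones settle near the expected value 3.5.
print(means[9], means[999], means[-1])
```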
[Figures: a sequence of histograms over the values 1 to 12, rebuilt as more samples are added; the final histogram is symmetric and peaked in the middle]
I.e., test statistics are normally distributed…
We can apply statistical methods designed for normal distributions even when the underlying distribution is not normal.
Every year, I compute the mean grade in my class. I never change the material or my methods for evaluating because, lazy. Over the 439 years that I have been teaching this class, this has resulted in the distribution below. Which of these is most likely the typical distribution of grades in any given year?
[Figure: distribution of the yearly mean grade; x-axis: grade, 10 to 100]
[Figures: candidate distributions of grades in a single year; x-axis: grade, 10 to 100]
Central Limit Theorem: repeated measurements of the mean (over large enough samples) will be approximately normally distributed. This does not assume that the population over which you are taking the mean is normally distributed.
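The Central Limit Theorem can be checked by simulation. A minimal sketch, assuming an exponential population with mean 1 and standard deviation 1, chosen only because it is clearly skewed and not normal; the sample size of 50, the 10,000 repeats, and the name `sample_means` are arbitrary illustrative choices.

```python
import math
import random
import statistics


def sample_means(sample_size, n_repeats, seed=0):
    """Draw n_repeats samples of size sample_size from a skewed exponential
    population (mean 1, sd 1) and return the mean of each sample."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_repeats)
    ]


means = sample_means(sample_size=50, n_repeats=10_000)
# CLT prediction: sample means are approximately Normal(1, 1 / sqrt(50)).
predicted_sd = 1 / math.sqrt(50)
print(statistics.fmean(means), statistics.stdev(means), predicted_sd)
# Fraction of sample means within 1.96 predicted SDs of 1 (about 0.95 if normal).
coverage = sum(abs(m - 1) < 1.96 * predicted_sd for m in means) / len(means)
print(coverage)
```

A histogram of `means` looks bell-shaped even though each individual draw comes from a lopsided distribution.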
[Figure: distribution of ages in the US (source: http://www.censusscope.org/us/chart_age.html)]
Hypothesis: Mean age is 35.
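Because of the CLT, the sample mean of ages is approximately normal even though the age distribution itself is not, so a normal-based test of "mean age is 35" is reasonable for a large sample. A minimal sketch of a two-sided one-sample z-test, assuming a simple random sample of ages; the slides do not name a specific test, and the function name `z_test_mean` is hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev


def z_test_mean(sample, mu0):
    """Two-sided one-sample z-test of H0: population mean == mu0, using the
    CLT normal approximation for the sample mean (needs a large sample)."""
    n = len(sample)
    se = stdev(sample) / math.sqrt(n)  # estimated standard error of the mean
    z = (fmean(sample) - mu0) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided tail probability
    return z, p
```

For example, `z_test_mean(ages, 35)` on a real sample `ages` returns the z statistic and its p-value; with made-up ages `[30, 50] * 25` (mean 40) it gives z = 3.5 and p below 0.001.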
… statistic under the null hypothesis… because:
… hard to write down
… (e.g. sample size not large enough)
… approximate the distribution of the test statistic
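When the null distribution of a statistic is hard to write down, we can approximate it by simulating many datasets under H0 and computing the statistic on each. A minimal sketch, using a hypothetical example not in the slides: H0 is "the coin is fair", and the statistic is the longest run of heads in 20 flips, whose exact distribution is awkward to derive by hand. The function names and simulation counts are illustrative.

```python
import random


def longest_run(flips):
    """Length of the longest streak of consecutive True values (heads)."""
    best = current = 0
    for f in flips:
        current = current + 1 if f else 0
        best = max(best, current)
    return best


def simulated_p_value(observed_run, n_flips=20, n_sims=20_000, seed=0):
    """Approximate P(longest run >= observed_run) under H0 (fair coin)
    by simulating the statistic instead of deriving its distribution."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        flips = [rng.random() < 0.5 for _ in range(n_flips)]
        if longest_run(flips) >= observed_run:
            hits += 1
    return hits / n_sims


print(simulated_p_value(6))
```

More simulations give a closer approximation; the estimate's standard error shrinks like 1/sqrt(n_sims).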
H0: CS students sleep the same amount as everyone else
Ha: CS students sleep less than the rest of Brown students

Observed hours of sleep (10 students):
CS Students: 7 5 8 6 6 (mean 6.4)
Brown Overall: 7 7 7 8 7 (mean 7.2)

Assuming these are samples from the same population (H0), shuffle the group labels and recompute the means:
One shuffle: CS Students 7.2, Brown Overall 6.4
Another shuffle: CS Students 6.8, Brown Overall 6.8

Compare the observed CS mean (6.4) to the distribution of CS means obtained from the shuffles.
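The shuffling procedure above is a permutation test. A minimal sketch on the ten sleep values from the slide, assuming a one-sided test (Ha: CS students sleep less) and using the group mean as the statistic; the function names are hypothetical. With only 10 students, all C(10, 5) = 252 relabelings can be enumerated exactly; the random-shuffle version shows what one would do when the data are too large to enumerate.

```python
import itertools
import random

# Hours of sleep from the slide.
cs = [7, 5, 8, 6, 6]    # CS Students, mean 6.4
rest = [7, 7, 7, 8, 7]  # Brown Overall, mean 7.2


def mean(xs):
    return sum(xs) / len(xs)


def exact_permutation_p_value(group_a, group_b):
    """One-sided p-value: fraction of all relabelings whose group-A mean is
    at most the observed group-A mean."""
    pooled = group_a + group_b
    observed = mean(group_a)
    count = total = 0
    # Every way to choose which positions receive the "group A" label.
    for idx in itertools.combinations(range(len(pooled)), len(group_a)):
        total += 1
        if mean([pooled[i] for i in idx]) <= observed:
            count += 1
    return count / total


def monte_carlo_permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Same test, approximated by randomly shuffling the labels."""
    rng = random.Random(seed)
    pooled = group_a + group_b
    n_a = len(group_a)
    observed = mean(group_a)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        if mean(pooled[:n_a]) <= observed:
            hits += 1
    return hits / n_shuffles


print(exact_permutation_p_value(cs, rest))  # 40 of 252 relabelings, about 0.159
print(monte_carlo_permutation_p_value(cs, rest))
```

Under these assumptions p is about 0.16, so at a 0.05 significance level these ten students would not be enough to reject H0.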
Simulations (example using … Models)
H0: I swear there are two types of TAs: nice ones and mean ones. Your work doesn't really factor in at all.
if (TA is nice): student passes (grade of 90)
else: student fails (grade of 60)
[Figure: the two possible grades, 60% and 90%, with probabilities 1-p and p]
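The TA hypothesis is a generative model: it says exactly how each grade is produced, so we can simulate data under H0 instead of deriving the test statistic's distribution. A minimal sketch, assuming the TA is nice with probability p (grade 90) and mean otherwise (grade 60), and using the class average as the statistic; p = 0.5, the class size of 30, and the function names are hypothetical.

```python
import random
import statistics


def simulate_grade(p, rng):
    """One student's grade under the H0 generative model: the TA is nice
    with probability p (grade 90), otherwise mean (grade 60)."""
    return 90 if rng.random() < p else 60


def simulate_class_averages(p, class_size, n_classes, seed=0):
    """Approximate the distribution of a class's average grade under H0
    by generating many hypothetical classes."""
    rng = random.Random(seed)
    return [
        statistics.fmean(simulate_grade(p, rng) for _ in range(class_size))
        for _ in range(n_classes)
    ]


averages = simulate_class_averages(p=0.5, class_size=30, n_classes=5_000)
# Under H0 the expected class average is 60 + 30 * p.
print(statistics.fmean(averages))
```

Given a real observed class average, the fraction of simulated averages at least that extreme estimates a p-value for H0.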