Welcome to the course! Statistical Thinking in Python II



slide-1
SLIDE 1

STATISTICAL THINKING IN PYTHON II

Welcome to the course!

slide-2
SLIDE 2

Statistical Thinking in Python II

You will be able to…

  • Estimate parameters
  • Compute confidence intervals
  • Perform linear regressions
  • Test hypotheses

with real data!

slide-3
SLIDE 3

Statistical Thinking in Python II

slide-4
SLIDE 4

Statistical Thinking in Python II

We use hacker statistics

  • Literally simulate probability
  • Broadly applicable with a few principles
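A minimal sketch of what "literally simulating probability" can look like. The coin-flip question, the number of trials, and the seed are illustrative assumptions, not taken from the slides.

```python
import numpy as np

# Hypothetical question: how likely are 4 heads in 4 fair coin flips?
rng = np.random.default_rng(42)   # seed chosen arbitrarily for repeatability
n_trials = 100_000

# Each row is one simulated experiment of 4 flips; True means heads
flips = rng.random((n_trials, 4)) < 0.5

# Fraction of experiments in which all 4 flips came up heads
all_heads = np.sum(flips, axis=1) == 4
p_estimate = np.mean(all_heads)

print(p_estimate)  # should land close to the exact value 1/16
```

The same pattern (simulate many times, count how often the outcome occurs) carries over to questions that have no convenient closed-form answer.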
slide-5
SLIDE 5

Statistical Thinking in Python II

Statistical analysis of the beak of the finch

Species shown: Geospiza fortis, Geospiza scandens

Source: John Gould, public domain

slide-6
SLIDE 6

STATISTICAL THINKING IN PYTHON II

Let's start thinking statistically!

slide-7
SLIDE 7

STATISTICAL THINKING IN PYTHON II

Optimal parameters

slide-8
SLIDE 8

Statistical Thinking in Python II

Histogram of Michelson's measurements

Data: Michelson, 1880

slide-9
SLIDE 9

Statistical Thinking in Python II

Data: Michelson, 1880

CDF of Michelson's measurements

slide-10
SLIDE 10

Statistical Thinking in Python II

Checking Normality of Michelson data

In [1]: import numpy as np
In [2]: import matplotlib.pyplot as plt
In [3]: mean = np.mean(michelson_speed_of_light)
In [4]: std = np.std(michelson_speed_of_light)
In [5]: samples = np.random.normal(mean, std, size=10000)
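The snippet above stops after drawing the samples. One way to finish the comparison is to compute an empirical CDF (ECDF) for both the data and the samples. In this sketch the `ecdf` helper is an assumed name, and `michelson_speed_of_light` is replaced by synthetic stand-in values (location and scale chosen arbitrarily), since the real measurements are not included here.

```python
import numpy as np

def ecdf(data):
    """Return sorted values x and cumulative fractions y of an empirical CDF."""
    x = np.sort(data)
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

rng = np.random.default_rng(42)

# Synthetic stand-in for the real measurements (NOT Michelson's data)
michelson_speed_of_light = rng.normal(299850.0, 80.0, size=100)

# Parameters estimated from the data, as on the slide
mean = np.mean(michelson_speed_of_light)
std = np.std(michelson_speed_of_light)

# Samples from a Normal distribution with those parameters
samples = rng.normal(mean, std, size=10000)

# ECDF of the data and of the simulated (theoretical) samples
x, y = ecdf(michelson_speed_of_light)
x_theor, y_theor = ecdf(samples)
```

Plotting `x_theor, y_theor` as a line and `x, y` as points (for example with `plt.plot(x, y, marker='.', linestyle='none')`) shows how closely the data follow the Normal model.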

slide-11
SLIDE 11

Statistical Thinking in Python II

Data: Michelson, 1880

CDF of Michelson's measurements

slide-12
SLIDE 12

Statistical Thinking in Python II

Data: Michelson, 1880

CDF with bad estimate of st. dev.

slide-13
SLIDE 13

Statistical Thinking in Python II

Data: Michelson, 1880

CDF with bad estimate of mean

slide-14
SLIDE 14

Statistical Thinking in Python II

Optimal parameters

  • Parameter values that bring the model in closest agreement with the data
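To make "closest agreement" concrete, the sketch below compares a Normal CDF against the data's ECDF for three parameter choices. The largest vertical gap between the two curves is only one possible measure of agreement and is an assumption of this example, as are the synthetic data, the helper names, and the "bad" parameter values.

```python
import math
import numpy as np

def normal_cdf(x, mean, std):
    """Normal CDF evaluated at each value in x (via the error function)."""
    z = (np.asarray(x) - mean) / (std * math.sqrt(2.0))
    return np.array([0.5 * (1.0 + math.erf(v)) for v in z])

def max_cdf_gap(data, mean, std):
    """Largest vertical gap between the data's ECDF and the Normal CDF."""
    x = np.sort(data)
    y = np.arange(1, len(x) + 1) / len(x)
    return np.max(np.abs(y - normal_cdf(x, mean, std)))

rng = np.random.default_rng(1)
data = rng.normal(0.0, 1.0, size=1000)   # hypothetical, Normally distributed data

mean, std = np.mean(data), np.std(data)  # parameters computed from the data

gap_opt = max_cdf_gap(data, mean, std)            # parameters from the data
gap_bad_std = max_cdf_gap(data, mean, 2.0 * std)  # standard deviation too large
gap_bad_mean = max_cdf_gap(data, mean + std, std) # mean shifted by one std

print(gap_opt, gap_bad_std, gap_bad_mean)
```

For a Normal model, the mean and standard deviation computed from the data are the maximum likelihood estimates, which is why they track the ECDF far better than the deliberately bad values.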

slide-15
SLIDE 15

Statistical Thinking in Python II

Mass of MA large mouth bass

CDF for "optimal" parameters of a bad model

Source: Mass. Dept. of Environmental Protection

slide-16
SLIDE 16

Statistical Thinking in Python II

Packages to do statistical inference

  • scipy.stats
  • statsmodels
  • hacker stats with numpy

Knife image: D-M Commons, CC BY-SA 3.0

slide-17
SLIDE 17

STATISTICAL THINKING IN PYTHON II

Let’s practice!

slide-18
SLIDE 18

STATISTICAL THINKING IN PYTHON II

Linear regression by least squares

slide-19
SLIDE 19

Statistical Thinking in Python II

2008 US swing state election results

Data retrieved from Data.gov (https://www.data.gov/)

slide-20
SLIDE 20

Statistical Thinking in Python II

2008 US swing state election results

Data retrieved from Data.gov (https://www.data.gov/)

Plot annotations: slope, intercept

slide-21
SLIDE 21

Statistical Thinking in Python II

2008 US swing state election results

Data retrieved from Data.gov (https://www.data.gov/)

slide-22
SLIDE 22

Statistical Thinking in Python II

Residuals

Data retrieved from Data.gov (https://www.data.gov/)

Plot annotation: residual

slide-23
SLIDE 23

Statistical Thinking in Python II

Least squares

  • The process of finding the parameters for which the sum of the squares of the residuals is minimal
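The definition can be checked numerically. This sketch computes the sum of squared residuals for a line and confirms that nudging the fitted slope or intercept makes it larger. The data are synthetic, and `sum_sq_residuals` is a helper name introduced here for illustration.

```python
import numpy as np

def sum_sq_residuals(x, y, slope, intercept):
    """Sum of squared vertical distances between the data and the line."""
    residuals = y - (slope * x + intercept)
    return np.sum(residuals ** 2)

rng = np.random.default_rng(0)

# Hypothetical data: a noisy straight line
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(0, 1, size=50)

# Least-squares fit of a first-degree polynomial
slope, intercept = np.polyfit(x, y, 1)

rss_best = sum_sq_residuals(x, y, slope, intercept)
rss_steeper = sum_sq_residuals(x, y, slope * 1.1, intercept)   # slope nudged
rss_shifted = sum_sq_residuals(x, y, slope, intercept + 0.5)   # intercept nudged

print(rss_best, rss_steeper, rss_shifted)
```

Any other choice of slope and intercept gives a sum at least as large as `rss_best`, which is what "minimal" means in the definition.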

slide-24
SLIDE 24

Statistical Thinking in Python II

Least squares with np.polyfit()

In [1]: slope, intercept = np.polyfit(total_votes, dem_share, 1)
In [2]: slope
Out[2]: 4.0370717009465555e-05
In [3]: intercept
Out[3]: 40.113911968641744
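For a straight line, the least-squares parameters also have a closed form, which gives a quick cross-check on `np.polyfit()`. The sketch below uses small made-up arrays, not the election data, and its variable names are illustrative.

```python
import numpy as np

# Hypothetical data (not the swing-state election results)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])

# Closed-form least-squares solution for a straight line
x_mean, y_mean = np.mean(x), np.mean(y)
slope_cf = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept_cf = y_mean - slope_cf * x_mean

# Same fit via np.polyfit with degree 1 (returns highest power first)
slope_pf, intercept_pf = np.polyfit(x, y, 1)

print(slope_cf, intercept_cf)
print(slope_pf, intercept_pf)

# Evaluate the fitted line at new x values to draw or predict with it
x_line = np.array([0.0, 7.0])
y_line = slope_pf * x_line + intercept_pf
```

The two approaches should agree up to floating-point error. `np.polyfit()` has the advantage of also handling higher-degree polynomials.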

slide-25
SLIDE 25

STATISTICAL THINKING IN PYTHON II

Let’s practice!

slide-26
SLIDE 26

STATISTICAL THINKING IN PYTHON II

The importance of EDA: Anscombe's quartet

slide-27
SLIDE 27

Statistical Thinking in Python II

Anscombe's quartet

Data: Anscombe, The American Statistician, 1973

slide-28
SLIDE 28

Statistical Thinking in Python II

Anscombe's quartet

Data: Anscombe, The American Statistician, 1973

slide-29
SLIDE 29

Statistical Thinking in Python II

Anscombe's quartet

Data: Anscombe, The American Statistician, 1973

slide-30
SLIDE 30

Statistical Thinking in Python II

Anscombe's quartet

Data: Anscombe, The American Statistician, 1973

slide-31
SLIDE 31

Statistical Thinking in Python II

Anscombe's quartet

Data: Anscombe, The American Statistician, 1973

slide-32
SLIDE 32

Statistical Thinking in Python II

Look before you leap!

  • Do graphical EDA first
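The point of the quartet can be reproduced in a few lines: all four data sets give nearly the same fitted line even though their scatter plots look very different. The values below are transcribed from Anscombe (1973) and are worth checking against the original table before reuse. The dictionary layout is an assumption of this sketch.

```python
import numpy as np

# Anscombe's quartet; data sets I-III share the same x values
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                   7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                   6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
             5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Fit a least-squares line to each data set
fits = {}
for name, (x, y) in quartet.items():
    slope, intercept = np.polyfit(np.array(x, dtype=float), np.array(y), 1)
    fits[name] = (slope, intercept)
    print(name, round(slope, 3), round(intercept, 3), round(np.mean(y), 2))
```

Because the summary numbers are nearly identical, only a plot of each data set reveals the curvature, the outlier, and the single high-leverage point.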
slide-33
SLIDE 33

Statistical Thinking in Python II

Anscombe's quartet

Data: Anscombe, The American Statistician, 1973

slide-34
SLIDE 34

Statistical Thinking in Python II

Anscombe's quartet

Data: Anscombe, The American Statistician, 1973

slide-35
SLIDE 35

Statistical Thinking in Python II

Anscombe's quartet

Data: Anscombe, The American Statistician, 1973

slide-36
SLIDE 36

Statistical Thinking in Python II

Anscombe's quartet

Data: Anscombe, The American Statistician, 1973

slide-37
SLIDE 37

STATISTICAL THINKING IN PYTHON II

Let’s practice!