
slide-1
SLIDE 1

Unit 1: Introduction to data Lecture 1: Data collection, observational studies, and experiments Statistics 101

Nicole Dalzell Duke University May 13, 2015

slide-2
SLIDE 2

Welcome to Stat 101!

1. Welcome to Stat 101!
  • Introduction to Inference
  • Populations and Samples
  • Sampling from a population
  • Sampling bias
  • Observational studies and experiments
  • Cereal breakfast
  • Observations and variables
  • Principles of experimental design
  • Recap

2. Syllabus & policies
  • Logistics
  • Goals and topics
  • Details
  • Support
  • Policies
  • Tips

3. To do

Sta 101 U1 - L1: Data coll., obs. studies, experiments N.Dalzell– Duke University

slide-4
SLIDE 4

Welcome to Stat 101!

Welcome!

Professor: Nicole Dalzell
Our class (put in some basic stats about major and count)

slide-11
SLIDE 11

Welcome to Stat 101! Introduction to Inference

So...what is statistics?

Statistics is the art and science of learning from data.
Data are a set of measurements taken on a set of individual units.

Steps for Statistical Inference / Scientific Inquiry

1. Identify a hypothesis or research question
2. Collect relevant data
3. Analyze the data
4. Form a conclusion
5. Communicate the results
6. Present your data

slide-13
SLIDE 13

Welcome to Stat 101! Introduction to Inference

Step 1 : Identify a Hypothesis or Research Question

A well-formed hypothesis will clearly identify a population and associated parameters of interest.

  • Population: the group of individuals or subjects about whom we can make inferences.
  • Parameters: “true” values of characteristics in the population we want to study.

How many babies born in 2013 have first names that begin with the letter “j”?

Population? Parameter?

slide-14
SLIDE 14

Welcome to Stat 101! Introduction to Inference

Step 2: Collect the data

  • Each year the Social Security Administration collects and releases data on how many babies are given a certain name.
  • They released these data for years 1880 to 2013 for each gender.
  • For privacy reasons they restrict the list of names to those with at least 5 occurrences.

We often store and present such data in data sets, composed of variables measured on individual cases.

slide-15
SLIDE 15

Welcome to Stat 101! Introduction to Inference

Data Sets

data set or data matrix (each column is a variable, each row is an observation)

      type      price   · · ·   weight
1     small     15.9    · · ·   2705
2     midsize   33.9    · · ·   3560    ← observation
...   ...       ...     ...     ...
54    midsize   26.7    · · ·   3245

slide-16
SLIDE 16

Welcome to Stat 101! Introduction to Inference

Baby Names Data Set

Besides looking at the frequency of first initials, what else could we learn from this data set?


slide-17
SLIDE 17

Welcome to Stat 101! Introduction to Inference

Visualize the Data: Rank Table

Top Baby Names in 2012

Rank  Male       Female
1     Jacob      Sophia
2     Mason      Emma
3     Ethan      Isabella
4     Noah       Olivia
5     William    Ava
6     Liam       Emily
7     Michael    Abigail
8     Jayden     Mia
9     Alexander  Madison
10    Aiden      Elizabeth

http://www.ssa.gov/oact/babynames

slide-18
SLIDE 18

Welcome to Stat 101! Introduction to Inference

Visualize the Data: Rank Table

Top Baby Names in 2013

Rank  Male       Female
1     Noah       Sophia
2     Liam       Emma
3     Jacob      Olivia
4     Mason      Isabella
5     William    Ava
6     Ethan      Mia
7     Michael    Emily
8     Alexander  Abigail
9     Jayden     Madison
10    Daniel     Elizabeth

http://www.ssa.gov/oact/babynames

slide-19
SLIDE 19

Welcome to Stat 101! Introduction to Inference

Visualize the Data: Time Dependencies

How has the popularity of a name changed over time?
http://www.babynamewizard.com/voyager#prefix=&sw=both&exact=false

slide-20
SLIDE 20

Welcome to Stat 101! Introduction to Inference

Visualize the Data: Time Dependencies

http://www.babynamewizard.com

slide-24
SLIDE 24

Welcome to Stat 101! Introduction to Inference

What about the first initials?

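1. Obtain data from the Social Security website: name, gender, frequency.

   # Assumes the file has no header row, so the columns are named explicitly
   d <- read.csv("yob2012.txt", header = FALSE,
                 col.names = c("name", "gender", "frequency"))

2. Use an R function (substring) to extract the initial of the name.

   # First character of each name
   d$initial <- substring(d$name, 1, 1)

3. Make a barplot of the initials, by gender if desired.

   barplot(table(d$initial))                    # all names
   barplot(table(d$initial[d$gender == "M"]))   # male names
   barplot(table(d$initial[d$gender == "F"]))   # female names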
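Note that `table(d$initial)` counts rows of the file (one per name and gender), not babies. A minimal sketch of how one might total the babies per initial instead, assuming columns called `name`, `gender` and `frequency` (names chosen here for illustration) and a tiny made-up data frame standing in for the real file:

```r
# Toy stand-in for the SSA file; the rows and counts are made up for illustration.
d <- data.frame(
  name      = c("Jacob", "Jayden", "Emma", "Julia"),
  gender    = c("M", "M", "F", "F"),
  frequency = c(10, 5, 8, 2),
  stringsAsFactors = FALSE
)

# First letter of each name
d$initial <- substring(d$name, 1, 1)

# Number of name/gender rows per initial (what table() gives)
names_per_initial <- table(d$initial)

# Number of babies per initial: sum the frequency column within each initial
babies_per_initial <- tapply(d$frequency, d$initial, sum)

# Same idea, split by gender (rows = initial, columns = gender)
babies_by_gender <- xtabs(frequency ~ initial + gender, data = d)
```

Passing `babies_per_initial` to `barplot()` would then show babies per initial rather than names per initial.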


slide-25
SLIDE 25

Welcome to Stat 101! Introduction to Inference

[Figure: bar plot titled "Initials − All names in 2012", one bar per initial from A to Z.]

slide-26
SLIDE 26

Welcome to Stat 101! Introduction to Inference

[Figure: two bar plots, one bar per initial from A to Z, titled "Initials − All names in 2012 (M)" and "Initials − All names in 2012 (F)".]

slide-27
SLIDE 27

Welcome to Stat 101! Introduction to Inference

Step 4: Form a conclusion

  • In 2012, there were about 3,000 babies given a first name that began with “j”, based on the data from the Social Security database.
  • The list of babies from the Social Security data set is a sample, a group of individuals taken from the entire population.
  • The number of individuals in the sample is usually denoted with the letter n.
  • A statistic is any function of the data collected in the sample (e.g., mean, median, etc.).
  • So, the count of the babies in the Social Security data set for 2012 who have first initial “j” is a statistic.
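To make the terms concrete, here is a small sketch that computes a sample size n and two statistics from a sample. The five names are made up for illustration and are not taken from the Social Security data.

```r
# A made-up sample of five first names (illustration only)
sample_names <- c("Jacob", "Emma", "Julia", "Noah", "James")

# Sample size
n <- length(sample_names)

# First initial of each name in the sample
initials <- substring(sample_names, 1, 1)

# Two statistics: both are functions of the sample data alone
count_j <- sum(initials == "J")   # how many sampled names start with J
prop_j  <- count_j / n            # proportion of sampled names starting with J
```

The corresponding parameters would be the same count and proportion computed over the whole population, which we usually cannot observe.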

slide-30
SLIDE 30

Welcome to Stat 101! Populations and Samples

Data Collection

  • Be aware that there exist “bad” samples.
  • “There are three kinds of lies: lies, damned lies, and statistics.”
  • If poor sampling techniques are used, then the observed statistics will not be applicable to the true population of interest.

Example Data Collection:

  • Raise your hand if you have been on an airplane in the past two years.
  • What does this tell us about how many 17-30 year olds have ridden an airplane in the past two years?

slide-33
SLIDE 33

Welcome to Stat 101! Sampling from a population

Census

Wouldn’t it be better to just include everyone and “sample” the entire population, i.e. conduct a census?

  • Some individuals are hard to locate or hard to measure. And these difficult-to-find people may have certain characteristics that distinguish them from the rest of the population.
  • Populations rarely stand still. Even if you could take a census, the population changes constantly, so it’s never possible to get a perfect measure.

http://www.npr.org/templates/story/story.php?storyId=125380052

slide-38
SLIDE 38

Welcome to Stat 101! Sampling from a population

Exploratory analysis to inference

Sampling is natural.

  • Think about sampling something you are cooking: you taste (examine) a small part of what you’re cooking to get an idea about the dish as a whole.
  • When you taste a spoonful of soup and decide the spoonful you tasted isn’t salty enough, that’s exploratory analysis.
  • If you generalize and conclude that your entire soup needs salt, that’s an inference.
  • For your inference to be valid, the spoonful you tasted (the sample) needs to be representative of the entire pot (the population).

If your spoonful comes only from the surface and the salt is collected at the bottom of the pot, what you tasted is probably not representative of the whole pot.
If you first stir the soup thoroughly before you taste, your spoonful will more likely be representative of the whole pot.


slide-39
SLIDE 39

Welcome to Stat 101! Sampling bias

Landon vs. FDR

A historical example of a biased sample yielding misleading results:
In 1936, Landon sought the Republican presidential nomination opposing the re-election of FDR.

slide-40
SLIDE 40

Welcome to Stat 101! Sampling bias

The Literary Digest Poll

  • The Literary Digest polled about 10 million Americans, and got responses from about 2.4 million.
  • The poll showed that Landon would likely be the overwhelming winner and FDR would get only 43% of the votes.
  • Election result: FDR won, with 62% of the votes.
  • The magazine was completely discredited because of the poll, and was soon discontinued.

slide-41
SLIDE 41

Welcome to Stat 101! Sampling bias

The Literary Digest Poll - what went wrong?

The magazine had surveyed

  • its own readers,
  • registered automobile owners, and
  • registered telephone users.

These groups had incomes well above the national average of the day (remember, this is Great Depression era), which resulted in lists of voters far more likely to support Republicans than a truly typical voter of the time, i.e. the sample was not representative of the American population at the time.

slide-42
SLIDE 42

Welcome to Stat 101! Sampling bias

Large samples are preferable, but...

  • The Literary Digest election poll was based on a sample size of 2.4 million, which is huge, but since the sample was biased, the sample did not yield an accurate prediction.
  • Back to the soup analogy: If the soup is not well stirred, it doesn’t matter how large a spoon you have, it will still not taste right. If the soup is well stirred, a small spoon will suffice to test the soup.
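The point that a huge biased sample can do worse than a small random one can be illustrated with a short simulation. This is only a sketch: the population size, the inclusion probabilities, and the seed are assumptions made up for illustration, not figures from the 1936 poll.

```r
set.seed(101)  # arbitrary seed so the sketch is reproducible

# Hypothetical population of 1,000,000 voters; about 62% support candidate A
N <- 1e6
supports_A <- rbinom(N, 1, 0.62)

# Assumed bias mechanism: supporters of A are less likely to be on the
# sampling list (20% chance) than non-supporters (50% chance)
on_list_prob <- ifelse(supports_A == 1, 0.2, 0.5)

# Large but biased sample: everyone who happens to land on the list
biased_sample <- supports_A[runif(N) < on_list_prob]

# Small simple random sample of 1,000 voters
random_sample <- sample(supports_A, 1000)

true_prop   <- mean(supports_A)
biased_prop <- mean(biased_sample)
random_prop <- mean(random_sample)

# Compare the three proportions side by side
round(c(true = true_prop, biased = biased_prop, random = random_prop), 3)
```

Under these assumptions the biased sample is hundreds of times larger than the random one, yet its estimate falls well below the true proportion, while the small random sample lands close to it.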

slide-47
SLIDE 47

Welcome to Stat 101! Sampling bias

A few sources of bias

  • Non-response: If only a (non-random) fraction of the randomly sampled people choose to respond to a survey, the sample may no longer be representative of the population.
  • Voluntary response: Occurs when the sample consists of people who volunteer to respond because they have strong opinions on the issue, and hence is not representative of the population.

edition.com, Aug 29, 2013

  • Convenience sample: Individuals who are easily accessible are more likely to be included in the sample.

What type of bias do reviews on Amazon.com have? What about reviews on RateMyProfessor.com?

slide-49
SLIDE 49

Welcome to Stat 101! Sampling bias

Participation question

A school district is considering whether it will no longer allow high school students to park at school after two recent accidents where students were severely injured. As a first step, they survey parents by mail, asking them whether or not the parents would object to this policy change. Of 6,000 surveys that go out, 1,200 are returned. Of these 1,200 surveys that were completed, 960 agreed with the policy change and 240 disagreed. Which of the following statements are true?

  • I. Some of the mailings may have never reached the parents.
  • II. The school district has strong support from parents to move forward with the policy approval.
  • III. It is possible that the majority of the parents of high school students disagree with the policy change.
  • IV. The survey results are unlikely to be biased because all parents were mailed a survey.

(a) Only I
(b) I and II
(c) I and III
(d) III and IV
(e) Only IV
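A quick way to reason about statements II and III is to work out the response rate and how many parents' opinions are simply unknown. The sketch below uses only the numbers given in the question; the variable names are illustrative.

```r
# Numbers from the question
mailed    <- 6000
returned  <- 1200
agreed    <- 960
disagreed <- 240

response_rate       <- returned / mailed   # share of surveys that came back
agree_of_returned   <- agreed / returned   # support among respondents
agree_of_all_mailed <- agreed / mailed     # parents known to agree

# Parents whose opinion is unknown because they did not respond
unknown <- mailed - returned

# Largest possible share disagreeing, if every non-respondent disagreed
max_disagree_share <- (disagreed + unknown) / mailed
```

Only 20% of the surveys were returned. Although 80% of respondents agreed, that is just 16% of all parents surveyed, so the data cannot rule out that as many as 84% disagree.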

slide-51
SLIDE 51

Welcome to Stat 101! Sampling bias

A picture’s worth a lot, but...

A lot of the time we only have part of the story.
BabyCenter: “Our data comes from nearly half a million parents who shared their baby’s name with us in 2014.”
http://www.babycenter.com/top-baby-names-2014

1. The Netflix effect
  • Orange is the New Black: Galina, Piper, Nicky, Alex, Gloria
  • House of Cards: Garrett, Claire, Robin, Wright
2. A blizzard of Frozen names (Elsa, Hans, Kristin)

Are we comfortable making decisions about these name trends based on this data?

The “name Elsa soared 29 percent on our list of names for baby girls”. Is this sample statistic enough for us to conclude that the population parameter of the percent of newborn girls in the United States who are named Elsa has increased from 2013 to 2014?

slide-52
SLIDE 52

Welcome to Stat 101! Observational studies and experiments

Causality versus Correlation

1. Has the popularity of Frozen caused an increase in the number of baby girls in 2014 that were named Elsa?
   Causal effect
2. Is the popularity of Frozen related to the increase in the number of baby girls in 2014 that were named Elsa?
   Correlation, or relationship

We collect our data differently depending on the type of relationship (causal or correlation) that we are interested in.

slide-54
SLIDE 54

Welcome to Stat 101! Observational studies and experiments

Observational studies and experiments

An experimental study is a controlled study in which the researchers impose treatments upon the subjects.

  • Subjects are assigned to control and treatment groups using random assignment.
  • Experiments are the preferred method of data collection because often results can be attributed as causal, i.e., we can conclude that the treatments caused the response of the study.
  • In some cases experiments are not feasible or ethical.

An observational study is a study in which the researchers did not assign the subjects to treatments.

  • Observational studies retain the notion of treatment and control groups.
  • Observational studies still require the researcher to clearly define a research question. This requires identification of the response variable that they will measure on each subject in the study.
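Random assignment in an experiment can be carried out with a few lines of R. This is a minimal sketch with 20 hypothetical subject IDs and two equally sized groups; the IDs, group sizes, and seed are assumptions for illustration.

```r
set.seed(2015)  # arbitrary seed for reproducibility

# Hypothetical subject IDs; a real study would use its own roster
subjects <- paste0("S", 1:20)

# Shuffle ten "treatment" and ten "control" labels so that chance alone
# decides which subject ends up in which group
group <- sample(rep(c("treatment", "control"), each = 10))

assignment <- data.frame(subject = subjects, group = group)

# Check the group sizes
table(assignment$group)
```

In an observational study this step is absent: the groups are whatever the subjects chose or happened to be in.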

slide-56
SLIDE 56

Welcome to Stat 101! Observational studies and experiments

Experimental vs Observational Datasets (cont.)

Example: We want to consider the effect of drinking alcohol during pregnancy on rates of Fetal Alcohol Syndrome.

  • Research question (population/parameter)?
  • Should we use experimental or observational data?
  • What potential biases should we be cautious of?

Response Bias
Non-response Bias
Undercoverage Bias

slide-59
SLIDE 59

Welcome to Stat 101! Observational studies and experiments

Observational studies and experiments (Recap)

  • Observational study: Researchers collect data in a way that does not directly interfere with how the data arise, i.e. they merely “observe”, and can only establish an association between the explanatory and response variables.
  • Experiment: Researchers randomly assign subjects to various treatments in order to establish causal connections between the explanatory and response variables.
  • If you’re going to walk away with one thing from this class, let it be “correlation does not imply causation”.

http://xkcd.com/552/

slide-60
SLIDE 60

Welcome to Stat 101! Cereal breakfast

slide-64
SLIDE 64

Welcome to Stat 101! Cereal breakfast

What type of study is this, observational study or an experiment?

“Girls who regularly ate breakfast, particularly one that includes cereal, were slimmer than those who skipped the morning meal, according to a study that tracked nearly 2,400 girls for 10 years. [...] As part of the survey, the girls were asked once a year what they had eaten during the previous three days.”

This is an observational study since the researchers merely observed the behavior of the girls (subjects) as opposed to imposing treatments on them.

What is the conclusion of the study? There is an association between girls eating breakfast and being slimmer.

Who sponsored the study? General Mills.

slide-68
SLIDE 68

Welcome to Stat 101! Cereal breakfast

3 possible explanations:

1. Eating breakfast causes girls to be thinner.
2. Being thin causes girls to eat breakfast.
3. A third variable is responsible for both. What could it be?

Extraneous variables that affect both the explanatory and the response variable, and that make it seem like there is a relationship between the two, are called confounding variables.
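A small simulation can make the third explanation concrete. The sketch below is not from the study: the confounder (`active`, a hypothetical physical-activity flag), the probabilities, and the BMI values are all invented for illustration. In the simulated data, breakfast has no effect on BMI at all, yet breakfast eaters look slimmer overall because active girls are both more likely to eat breakfast and more likely to be slim.

```python
import random
from statistics import mean

def simulate(n=10_000, seed=0):
    """Simulate girls whose BMI depends only on a hidden 'active' flag."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        active = rng.random() < 0.5                      # hypothetical confounder
        eats_breakfast = rng.random() < (0.8 if active else 0.3)
        bmi = rng.gauss(21.0 if active else 24.0, 2.0)   # breakfast plays no role here
        rows.append((active, eats_breakfast, bmi))
    return rows

def mean_bmi_gap(rows):
    """Mean BMI of breakfast skippers minus mean BMI of breakfast eaters."""
    eaters = [bmi for _, eats, bmi in rows if eats]
    skippers = [bmi for _, eats, bmi in rows if not eats]
    return mean(skippers) - mean(eaters)

rows = simulate()
overall_gap = mean_bmi_gap(rows)
gap_active = mean_bmi_gap([r for r in rows if r[0]])
gap_inactive = mean_bmi_gap([r for r in rows if not r[0]])

print("gap, all girls:     ", round(overall_gap, 2))
print("gap, active only:   ", round(gap_active, 2))
print("gap, inactive only: ", round(gap_inactive, 2))
```

Comparing within each level of the confounder removes most of the apparent gap, which is why an association seen in an observational study cannot by itself be read as a causal effect.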

slide-69
SLIDE 69

Welcome to Stat 101! Observations and variables

Types of variables

all variables
  numerical
    continuous (measured)
    discrete (counted)
  categorical
    regular categorical (unordered categories)
    ordinal (ordered categories)

slide-82
SLIDE 82

Welcome to Stat 101! Observations and variables

Types of variables (cont.)

type: small, midsize or large (categorical, ordinal)
price: average price in $1000’s (numerical, continuous)
mpgCity: city mileage per gallon (numerical, continuous)
drivetrain: front, rear, 4WD (categorical)
passengers: passenger capacity (numerical, discrete)
weight: car weight in pounds (numerical, continuous)
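The type of a variable determines which summaries make sense for it. As a rough sketch in Python (the course labs use RStudio, so this is only an illustration), the classification above can be stored in a lookup table and used to pick a summary: a mean for numerical variables, counts per category for categorical ones. The `summarize` helper and the sample values are hypothetical.

```python
from collections import Counter
from statistics import mean

# Classification of the car variables as given on the slide.
VARIABLE_TYPES = {
    "type": ("categorical", "ordinal"),
    "price": ("numerical", "continuous"),
    "mpgCity": ("numerical", "continuous"),
    "drivetrain": ("categorical", "regular"),
    "passengers": ("numerical", "discrete"),
    "weight": ("numerical", "continuous"),
}

def summarize(name, values):
    """Return a mean for numerical variables, category counts otherwise."""
    kind, _subtype = VARIABLE_TYPES[name]
    if kind == "numerical":
        return mean(values)
    return dict(Counter(values))

# Made-up values, for illustration only.
print(summarize("price", [10.0, 20.0]))
print(summarize("drivetrain", ["front", "front", "rear"]))
```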

slide-83
SLIDE 83

Welcome to Stat 101! Principles of experimental design

Principles of experimental design

1. Control: Compare treatment of interest to a control group.
2. Randomize: Randomly assign subjects to treatments.
3. Replicate: Within a study, replicate by collecting a sufficiently large sample. Or replicate the entire study.
4. Block: If there are variables that are known or suspected to affect the response variable, first group subjects into blocks based on these variables, and then randomize cases within each block to treatment groups.

slide-88
SLIDE 88

Welcome to Stat 101! Principles of experimental design

More on blocking

We would like to design an experiment to investigate whether energy gels make you run faster:

Treatment: energy gel
Control: no energy gel

It is suspected that energy gels might affect pro and amateur athletes differently, therefore we block for pro status:

Divide the sample into pro and amateur athletes.
Randomly assign pro athletes to treatment and control groups.
Randomly assign amateur athletes to treatment and control groups.
Pro/amateur status is equally represented in the resulting treatment and control groups.

Why is this important? Can you think of other variables to block for?
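The blocking procedure for the energy gel experiment can be sketched in a few lines of Python. This is a minimal illustration, not course code: the function name `blocked_assignment`, the group sizes (10 pros, 30 amateurs), and the even split within each block are assumptions made for the example.

```python
import random

def blocked_assignment(subjects, block_of, seed=0):
    """Randomly split each block in half between treatment and control."""
    rng = random.Random(seed)

    # Step 1: group subjects into blocks (here: pro vs. amateur).
    blocks = {}
    for subject in subjects:
        blocks.setdefault(block_of(subject), []).append(subject)

    # Step 2: randomize separately within each block.
    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        for subject in shuffled[:half]:
            assignment[subject] = "treatment"   # energy gel
        for subject in shuffled[half:]:
            assignment[subject] = "control"     # no energy gel
    return assignment

# Hypothetical sample: 10 pro and 30 amateur runners.
runners = [("pro", i) for i in range(10)] + [("amateur", i) for i in range(30)]
assignment = blocked_assignment(runners, block_of=lambda runner: runner[0])

for status in ("pro", "amateur"):
    treated = sum(1 for r, g in assignment.items() if r[0] == status and g == "treatment")
    print(status, "in treatment:", treated)
```

Because the shuffle happens inside each block, both groups end up with the same pro/amateur mix, so a difference in running times between the groups cannot be explained by pro status.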

slide-90
SLIDE 90

Welcome to Stat 101! Principles of experimental design

Participation question

A study is designed to test the effect of light level and noise level on exam performance of students. The researcher also believes that light and noise levels might have different effects on males and females, so wants to make sure both genders are represented equally under different conditions. Which of the below is correct?

(a) There are 3 explanatory variables (light, noise, gender) and 1 response variable (exam performance)
(b) There are 2 explanatory variables (light and noise), 1 blocking variable (gender), and 1 response variable (exam performance)
(c) There is 1 explanatory variable (gender) and 3 response variables (light, noise, exam performance)
(d) There are 2 blocking variables (light and noise), 1 explanatory variable (gender), and 1 response variable (exam performance)

slide-91
SLIDE 91

Welcome to Stat 101! Principles of experimental design

Difference between blocking and explanatory variables

Factors are conditions we can impose on the experimental units.

Blocking variables are characteristics that the experimental units come with, that we would like to control for.

Blocking is like stratifying, except used in experimental settings when randomly assigning, as opposed to when sampling.

slide-92
SLIDE 92

Welcome to Stat 101! Principles of experimental design

More experimental design terminology...

Placebo: fake treatment, often used as the control group for medical studies

Placebo effect: experimental units showing improvement simply because they believe they are receiving a special treatment

Blinding: when experimental units do not know whether they are in the control or treatment group

Double-blind: when both the experimental units and the researchers do not know who is in the control and who is in the treatment group

slide-94
SLIDE 94

Welcome to Stat 101! Recap

Participation question

What is the main difference between observational studies and experiments?

(a) Experiments take place in a lab while observational studies do not need to.
(b) In an observational study we only look at what happened in the past.
(c) Most experiments use random assignment while observational studies do not.
(d) Observational studies are completely useless since no causal inference can be made based on their findings.

slide-95
SLIDE 95

Welcome to Stat 101! Recap

Random assignment vs. random sampling

Random sampling, random assignment: Causal conclusion, generalized to the whole population. (ideal experiment)

Random sampling, no random assignment: No causal conclusion, correlation statement generalized to the whole population. (most observational studies)

No random sampling, random assignment: Causal conclusion, only for the sample. (most experiments)

No random sampling, no random assignment: No causal conclusion, correlation statement only for the sample. (bad observational studies)

Rows: random sampling gives generalizability; no random sampling gives no generalizability.
Columns: random assignment supports causation; no random assignment supports only correlation.
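The same two-by-two logic can be written as a small lookup, which may help when deciding what a given study is allowed to claim. This is only an illustrative sketch; the function name `study_scope` and the exact wording of the returned strings are choices made for the example.

```python
def study_scope(random_sampling: bool, random_assignment: bool) -> str:
    """Describe what kind of conclusion a study design supports."""
    # Random assignment decides causation vs. correlation.
    claim = "causal conclusion" if random_assignment else "correlation statement"
    # Random sampling decides whether the claim generalizes.
    reach = "generalized to the whole population" if random_sampling else "only for the sample"
    return f"{claim}, {reach}"

for sampling in (True, False):
    for assignment in (True, False):
        print(sampling, assignment, "->", study_scope(sampling, assignment))
```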

slide-96
SLIDE 96

Welcome to Stat 101! Recap

More...

Want more baby name analysis?

Freakonomics podcast: How Much Does Your Name Matter?
http://freakonomics.com/2013/04/08/how-much-does-your-name-matter-a-new-freakonomics-radio-podcast/

slide-98
SLIDE 98

Syllabus & policies Logistics

General Info

Instructor: Nicole Dalzell - nmd16@stat.duke.edu, Old Chemistry 214
Lecture: MTuWThF 11:30 AM - 12:45 PM, Old Chemistry 201
Lab: TuWTh 1 PM - 2 PM, Old Chemistry 101
Office hours (tentative): Monday 2:00 PM - 3:00 PM, Wednesday 3-4 PM, Friday 2-3 PM, or by appointment

slide-99
SLIDE 99

Syllabus & policies Logistics

Required materials

Textbook: OpenIntro Statistics, Diez, Barr, Çetinkaya-Rundel. CreateSpace, 2nd Edition, 2012. ISBN: 978-1478217206

Calculator (optional): You might need a four function calculator that can do square roots for this class. No limitation on the type of calculator you can use.

slide-100
SLIDE 100

Syllabus & policies Logistics

Webpage

https://stat.duke.edu/~nmd16/courses/Summer15/sta101.001-1/

slide-101
SLIDE 101

Syllabus & policies Goals and topics

[Diagram: course topics]
Design of studies
Exploratory data analysis
Probability
Inference: Bayesian inference; frequentist inference (CLT & simulation)
  numerical: one mean & median, two means & medians, many means
  categorical: one proportion, two proportions, many proportions
Modeling (numerical response): 1 explanatory, many explanatory

slide-102
SLIDE 102

Syllabus & policies Details

Course structure

Seven learning units.
Set of learning objectives and required and suggested readings, videos, etc. for each unit.
Prior to beginning the unit, complete the readings and familiarize yourselves with the learning objectives.
Begin a new unit with a readiness assessment: individual, then team.
Class time: split between lecture, discussion/application.
Computing labs.

slide-103
SLIDE 103

Syllabus & policies Details

Class - duration of unit

Slides will be posted on the course webpage (under schedule) on the day of the course.
Discussion of concepts as well as hands on activities and exercises to complement them.
Attend class to keep up with the pace and not fall behind + to contribute to application activities completed in teams.
You are responsible for all the material covered in all components of the course, not just the class.
Please ask questions in class, office-hours or by e-mail if you are struggling (or just curious), do not wait until just before an exam when it may be too late.

slide-104
SLIDE 104

Syllabus & policies Details

Participation questions: attendance and participation

Objective: Make you an active participant and help me pace the class.
On new material being discussed in class that day.
Credit for participation, regardless of whether you have the correct answer.
Up to two unexcused late arrivals or absences will not affect your participation grade.
While I might sometimes call on you during the class discussion, it is your responsibility to be an active participant without being called on.

slide-107
SLIDE 107

Syllabus & policies Details

Problem sets and labs

Problem sets:
Objective: Help you develop a more in-depth understanding of the material and help you prepare for exams and projects.
Individual: collaborate but don’t copy! Submit in class, show all work.

Labs:
Objective: Give you hands-on experience with data analysis using statistical software and provide you with tools for the projects.
In partners: turn in lab report on Sakai by the following day at 5 PM.

Lowest score dropped for both.

slide-108
SLIDE 108

Syllabus & policies Details

Project

Objective: Give you independent applied research experience using real data and statistical methods.
individual
statistical inference exploring the distributional characteristics of one variable or relationship between two variables
choose a research question, find data, analyze it, write up your results

slide-109
SLIDE 109

Syllabus & policies Details

Exams

Midterm: Wednesday, July 16th, in class
Final: Saturday, August 9th (9:00 AM - 12:00 PM) (Cumulative)

Exam dates cannot be changed. No make-up exams will be given. If you cannot take the exams on these dates you should drop this class.

You must bring a calculator to the exams (no cell phones, iPods, etc.) and you are also allowed to bring one sheet of notes (“cheat sheet”). This sheet must be no larger than 8½” × 11” and must be prepared by you (no photocopies). You may use both sides of the sheet.

slide-110
SLIDE 110

Syllabus & policies Details

Grading

In Class Participation/Activities: 5%
Quizzes: 5%
Problem sets: 15%
Labs: 10%
Project: 20%
Midterm: 20%
Final: 25%
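To see how these weights combine, here is a minimal Python sketch of a weighted course grade. It assumes each component is scored on a 0 to 100 scale and ignores details such as dropped lowest scores; the dictionary keys and the `course_grade` function are names chosen for the example, not part of any official grading tool.

```python
# Weights from the grading breakdown; they sum to 100%.
WEIGHTS = {
    "participation": 0.05,
    "quizzes": 0.05,
    "problem_sets": 0.15,
    "labs": 0.10,
    "project": 0.20,
    "midterm": 0.20,
    "final": 0.25,
}

def course_grade(scores):
    """Weighted average of component scores (each assumed to be on a 0-100 scale)."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

# Hypothetical student: perfect scores everywhere except a 60 on the final.
example = {name: 100 for name in WEIGHTS}
example["final"] = 60
print(round(course_grade(example), 2))
```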

slide-111
SLIDE 111

Syllabus & policies Support

Email

I will regularly send announcements by email, so make sure to check your email daily. While email is the quickest way to reach me outside of class, it is much more efficient to answer most statistical questions in person.


slide-112
SLIDE 112

Syllabus & policies Support

Discussion Forum on Sakai

Content related questions should be posted on the Discussion Forum on Sakai.
Title your questions according to the guidelines on the forum.
Check if your question has already been answered before posting a new question.
I will be answering questions on the forum daily and all students are expected to answer questions as well.
“Watch” the forums to be notified when a new question is posted.

slide-113
SLIDE 113

Syllabus & policies Support

Office hours

Instructor: Mondays and Wednesdays 2:00 - 3:00 PM
You are highly encouraged to stop by with any questions or comments about the class, or just to say hi and introduce yourself.
Most problem sets are due on Tuesday and Thursday. I recommend attempting all problems two days before they are due to make the most of office hours (and lab sessions).

slide-117
SLIDE 117

Syllabus & policies Policies

Policies

Late work policy for problem sets and lab reports:

late but submitted during class: lose 10% of points
after class on due date: lose 20% of points
next day: lose 40% of points
later than next day: lose all points

Late work policy for project: 10% off for each day (24-hour period) late.

No make-ups.

Regrade requests: within one week, no regrade for number of points deducted for a mistake, no regrade after the final.
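As a worked illustration of the late penalties, the sketch below applies them to a raw score. The status labels and function names are made up for the example, and it assumes that any started 24-hour period counts as a full day late for the project, which the policy does not spell out.

```python
import math

# Percent of points lost for problem sets and lab reports, by submission status.
PROBLEM_SET_PENALTY = {
    "on_time": 0,
    "late_in_class": 10,
    "after_class_on_due_date": 20,
    "next_day": 40,
    "later": 100,
}

def problem_set_score(raw_points, status):
    """Points kept on a problem set or lab report after the late penalty."""
    return raw_points * (100 - PROBLEM_SET_PENALTY[status]) / 100

def project_score(raw_points, hours_late):
    """Points kept on the project: 10% off per 24-hour period late.

    Assumption: a partial 24-hour period counts as a full day.
    """
    days_late = math.ceil(hours_late / 24) if hours_late > 0 else 0
    return raw_points * max(0, 100 - 10 * days_late) / 100

print(problem_set_score(50, "next_day"))
print(project_score(100, 25))
```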

slide-118
SLIDE 118

Syllabus & policies Policies

Academic Dishonesty

Any form of academic dishonesty will result in an immediate 0 on the given assignment and will be reported to the Office of Student Conduct. Additional penalties may also be assessed if deemed appropriate. If you have any questions about whether something is or is not allowed, ask me beforehand.

Some examples:
Use of disallowed materials (including any form of communication with classmates or accessing the web) during exams and readiness assessments.
Plagiarism of any kind.
Use of outside answer keys or solution manuals for the homework.

slide-125
SLIDE 125

Syllabus & policies Tips

Tips for success

1. Complete the reading before a new unit begins, and then review again after the unit is over.
2. Be an active participant during lectures and labs.
3. Ask questions - during class or office hours, or by email. Ask me and your classmates.
4. Do the problem sets - start early and make sure you attempt and understand all questions.
5. Start your project early and allow adequate time to complete it.
6. Give yourself plenty of time to prepare a good cheat sheet for exams. This requires going through the material and taking the time to review the concepts that you’re not comfortable with.
7. Do not procrastinate - don’t let a unit go by with unanswered questions as it will just make the following unit’s material even more difficult to follow.

slide-127
SLIDE 127

To do

To do

1. Download or purchase the textbook. www.openintro.org
2. Read the syllabus and let me know if you have any questions.
3. Start reviewing the resources for Unit 1. https://stat.duke.edu/~nmd16/courses/Summer15/sta101.001-1/resources.html
4. Complete Lab 0 - this is just an introduction to RStudio.