
Unit 1: Introduction to data
Lecture 1: Data collection, observational studies, and experiments
Statistics 101
Thomas Leininger
May 16, 2013

Thought for the day
“We are drowning in information but starved for knowledge... Uncontrolled and unorganized information is no longer a resource in an information society, instead it becomes the enemy.”
- John Naisbitt, Megatrends (1982)
Statistics 101 (Thomas Leininger) U1 - L1: Data coll., obs. studies, experiments May 16, 2013

Introduction to Data: Some terminology
Dr. Arbuthnot’s baptismal records

     year  boys  girls  B4500
 1   1629  5218  4683   TRUE
 2   1630  4858  4457   TRUE
 3   1631  4422  4102   FALSE
 4   1632  4994  4590   TRUE
 5   1633  5158  4839   TRUE
 6   1634  5035  4820   TRUE
 7   1635  5106  4928   TRUE
 8   1636  4917  4605   TRUE
 9   1637  4703  4457   TRUE
10   1638  5359  4952   TRUE

Terms to know:
- case
- variable
- numerical variable
- discrete variable
- continuous variable
- categorical variable (levels)
- ordinal variable
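To connect the terms to the table, here is a minimal Python sketch that stores the ten rows shown on the slide as cases and columns as variables. The slide does not define B4500; the code assumes it flags years with more than 4500 boys baptized, and checks that this assumption agrees with the ten rows shown.

```python
# First ten cases of Dr. Arbuthnot's baptismal records, copied from the slide.
names = ("year", "boys", "girls", "B4500")
records = [
    (1629, 5218, 4683, True),
    (1630, 4858, 4457, True),
    (1631, 4422, 4102, False),
    (1632, 4994, 4590, True),
    (1633, 5158, 4839, True),
    (1634, 5035, 4820, True),
    (1635, 5106, 4928, True),
    (1636, 4917, 4605, True),
    (1637, 4703, 4457, True),
    (1638, 5359, 4952, True),
]

# Each row is one case; each named column is one variable.
cases = [dict(zip(names, row)) for row in records]

# Assumption (not stated on the slide): B4500 means "boys > 4500".
recomputed = [case["boys"] > 4500 for case in cases]
matches = recomputed == [case["B4500"] for case in cases]

# B4500 is a categorical variable; its levels are the distinct values it takes.
levels = sorted({case["B4500"] for case in cases})

print(len(cases), "cases,", len(names), "variables")
print("B4500 agrees with boys > 4500:", matches)
print("levels of B4500:", levels)
```

Here boys and girls are counts, so they are numerical and discrete, while B4500 takes one of two levels and is categorical.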

Introduction to Data: Some terminology
Control vs. treatment groups

A pharmaceutical company has created a wonder drug to cure bone loss. In order to sell this drug to consumers, the FDA requires this company to perform several highly regulated experiments to prove the efficacy (and safety) of this new drug.

In this experiment, some patients will be randomly assigned to the control group, where they will receive a standard bone loss treatment. The other patients are all assigned to the treatment group, where they receive the new wonder drug.

If the treatment group experiences significantly better outcomes, the FDA will allow this company to sell its new drug.
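Random assignment can be sketched in a few lines of Python. This is an illustration only: the patient IDs, the helper name assign_groups, and the even split between groups are assumptions, since the slide does not say how many patients go to each group.

```python
import random

def assign_groups(patient_ids, seed=None):
    """Shuffle the patients and split them into control and treatment groups.

    Hypothetical helper: the half-and-half split is an assumption for this sketch.
    """
    rng = random.Random(seed)
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "treatment": shuffled[half:]}

# Hypothetical patient IDs P01 ... P20.
patients = [f"P{i:02d}" for i in range(1, 21)]
groups = assign_groups(patients, seed=101)

print("control:  ", sorted(groups["control"]))
print("treatment:", sorted(groups["treatment"]))
```

Because chance alone decides who lands in which group, neither the researchers nor the patients can steer healthier patients toward the new drug.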

Introduction to Data: Some terminology
Association and Independence
http://biojournalism.com/2012/08/correlation-vs-causation/

Overview of data collection principles: Anecdotal evidence
Anecdotal evidence and early smoking research

- Anti-smoking research started in the 1930s and 1940s when cigarette smoking became increasingly popular. While some smokers seemed to be sensitive to cigarette smoke, others were completely unaffected.
- Anti-smoking research was faced with resistance based on anecdotal evidence such as “My uncle smokes three packs a day and he’s in perfectly good health”, evidence based on a limited sample size that might not be representative of the population.
- It was concluded that “smoking is a complex human behavior, by its nature difficult to study, confounded by human variability.”
- In time, researchers were able to examine larger samples of cases (smokers), and trends showing that smoking has negative health impacts became much clearer.

Brandt, The Cigarette Century (2009), Basic Books.

Overview of data collection principles: Populations and samples

Research question: Can people become better, more efficient runners on their own, merely by running?
Population of interest:
Sample: Group of adult women who recently joined a running group
Population to which results can be generalized:

http://well.blogs.nytimes.com/2012/08/29/finding-your-ideal-running-form

Overview of data collection principles: Sampling methods
Census

Wouldn’t it be better to just include everyone and “sample” the entire population? This is called a census.

There are problems with taking a census:
- It can be difficult to complete a census: there always seem to be some individuals who are hard to locate or hard to measure. And there may be certain characteristics about those individuals who are hard to locate.
- Populations rarely stand still. Even if you could take a census, the population changes constantly, so it’s never possible to get a perfect measure.
- Taking a census may be more complex than sampling.

Overview of data collection principles: Sampling methods
http://www.npr.org/templates/story/story.php?storyId=125380052

Overview of data collection principles: Sampling methods
Exploratory analysis to inference

Sampling is natural... Think about sampling something you are cooking: you taste (examine) a small part of what you’re cooking to get an idea about the dish as a whole.

- When you taste a spoonful of soup and decide the spoonful you tasted isn’t salty enough, that’s exploratory analysis.
- If you generalize and conclude that your entire soup needs salt, that’s an inference.
- For your inference to be valid, the spoonful you tasted (the sample) needs to be representative of the entire pot (the population).
- If your spoonful comes only from the surface and the salt is collected at the bottom of the pot, what you tasted is probably not representative of the whole pot.
- If you first stir the soup thoroughly before you taste, your spoonful will more likely be representative of the whole pot.

Overview of data collection principles: Sampling methods
Simple random sample

Randomly select cases from the population so that each case is equally likely to be selected.

[Figure: dot plot of the population]
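A simple random sample can be drawn with Python's standard random module. The population of 150 numbered cases and the sample size of 15 are made-up values for illustration.

```python
import random

# Hypothetical population: 150 cases identified by the numbers 1..150.
population = list(range(1, 151))

# Fixed seed so the sketch is repeatable; any seed (or none) would do.
rng = random.Random(2013)

# random.sample draws without replacement, and every case is equally
# likely to end up in the sample.
sample = rng.sample(population, k=15)

print(sorted(sample))
```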

Overview of data collection principles: Sampling methods
Stratified sample

Strata are homogeneous (made up of similar cases); take a simple random sample from each stratum.

[Figure: dot plot of the population divided into Stratum 1 through Stratum 6]
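Stratified sampling can be sketched as one simple random sample per stratum. The helper name stratified_sample, the six strata of 20 cases each, and the choice of 3 cases per stratum are assumptions made for this example.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Take a simple random sample of `per_stratum` cases from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(cases, per_stratum) for name, cases in strata.items()}

# Hypothetical population: six strata with 20 cases each.
strata = {
    f"Stratum {i}": [f"s{i}-case{j}" for j in range(1, 21)]
    for i in range(1, 7)
}

picked = stratified_sample(strata, per_stratum=3, seed=1)

for name, cases in picked.items():
    print(name, cases)
```

Every stratum contributes cases, so no group can be missed by chance.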

Overview of data collection principles: Sampling methods
Cluster sample

Clusters are not necessarily homogeneous; take a random sample of clusters, then a simple random sample from each selected cluster. Usually preferred for economic reasons.

[Figure: dot plot of the population divided into Cluster 1 through Cluster 9]
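Cluster sampling works in two stages, which the following Python sketch makes explicit. The helper name cluster_sample, the nine clusters of varying size, and the choice of 3 clusters with 4 cases each are assumptions for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, per_cluster, seed=None):
    """Stage 1: randomly pick clusters. Stage 2: simple random sample in each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(per_cluster, len(clusters[name])))
        for name in chosen
    }

# Hypothetical population: nine clusters holding 6 to 14 cases.
clusters = {
    f"Cluster {i}": [f"c{i}-case{j}" for j in range(1, 6 + i)]
    for i in range(1, 10)
}

picked = cluster_sample(clusters, n_clusters=3, per_cluster=4, seed=7)

for name, cases in picked.items():
    print(name, cases)
```

Only the chosen clusters have to be visited, which is where the cost saving comes from; the trade-off is that unselected clusters contribute no cases at all.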

Overview of data collection principles: Sampling methods
Question

A city council has requested a household survey be conducted in a suburban area of their city. The area is broken into many distinct and unique neighborhoods, some including large homes, some with only apartments, and others a diverse mixture of housing structures. Which approach would likely be the least effective?

(a) Simple random sampling
(b) Cluster sampling
(c) Stratified sampling
(d) Blocked sampling
(e) Anecdotal sampling
