Unit 1: Introduction to data. Lecture 2: Exploratory data analysis

Statistics 101 (Nicole Dalzell), U1 - L2: EDA, May 14, 2015

Warm-Up and Data Basics

Review

Example Study: A researcher is interested in whether or not cats will choose to sleep less if they have toys to entertain themselves. She divides 250 cats (adults and kittens) into two rooms, with adult cats in one room and baby kittens in the other room. Within each room she erects a fence, randomly placing half the cats (or kittens) on each side of the fence. On one side of the fence she scatters a variety of cat toys. For 1 day, the researcher records the number of hours each cat spends sleeping.

What is the research question?
What are the explanatory and response variables?
Is this an experimental or observational study?
What are the controls and treatments?
Is blocking employed in this study?

Types of Variables

Example: Still our cat example:

Cat   Age       Toys  # of Naps  Weight (lbs)
1     adult     1     3          8
2     juvenile  1     5          9
3     adult     0     2          10.5
4     adult     1     8          12.25
...   ...       ...   ...        ...
250   adult     0     5          11.67

What types of variables are these: Age? Toys? # of Naps? Weight?
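The random placement described in the cat study can be sketched in Python. This is only an illustration: the slide gives the total of 250 cats but not how many are adults, so the 150/100 split, the integer cat IDs, the seed, and the helper name `assign_within_group` are all assumptions made for the example.

```python
import random


def assign_within_group(cat_ids, rng):
    """Randomly place half of one room's cats on each side of the fence."""
    shuffled = list(cat_ids)      # copy so the caller's list is untouched
    rng.shuffle(shuffled)         # random order, so each cat is equally likely on either side
    half = len(shuffled) // 2
    return {"toys": shuffled[:half], "no toys": shuffled[half:]}


rng = random.Random(2015)         # fixed seed only so the sketch is repeatable

# Hypothetical split of the 250 cats; the slide does not give these counts.
adults = list(range(1, 151))      # cats 1..150 go to the adult room
kittens = list(range(151, 251))   # cats 151..250 go to the kitten room

# Randomization is done separately inside each room.
rooms = {
    "adult": assign_within_group(adults, rng),
    "kitten": assign_within_group(kittens, rng),
}

for room, sides in rooms.items():
    print(room, {side: len(ids) for side, ids in sides.items()})
```

Randomizing separately inside each room keeps the two age groups evenly represented on both sides of the fence.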

Warm-Up and Data Basics: Sampling Methods

Population to sample

It is usually not feasible to collect information on the entire population due to high costs of data collection, so statisticians instead work with samples that are (hopefully) representative of the populations they come from.

[Figure: a sample drawn from a population]

We try to understand certain features of the population as a whole using summary statistics and graphs based on these samples.

Obtaining good samples

Almost all statistical methods are based on the notion of implied randomness.
If observational data are not collected in a random framework from a population, these statistical methods (the estimates and the errors associated with the estimates) are not reliable.
The most commonly used random sampling techniques are simple, stratified, and cluster sampling.

Simple random sample

Randomly select cases from the population; each case is equally likely to be selected.

Stratified sample

Strata are homogeneous; a simple random sample is taken from each stratum.

[Figure: population divided into Strata 1 to 6]
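The two sampling schemes just described can be sketched with Python's standard `random` module. The population here is made up for illustration: six strata of 20 labelled cases each, mirroring only the stratum count in the figure. The sample sizes, the seed, and the function names are likewise assumptions.

```python
import random


def simple_random_sample(population, k, rng):
    """Draw k cases without replacement; every case is equally likely."""
    return rng.sample(population, k)


def stratified_sample(strata, k_per_stratum, rng):
    """Draw a simple random sample of k_per_stratum cases from every stratum."""
    return {name: rng.sample(cases, k_per_stratum) for name, cases in strata.items()}


rng = random.Random(101)  # fixed seed only so the sketch is repeatable

# Hypothetical population: 6 strata with 20 cases each.
strata = {
    f"Stratum {s}": [f"s{s}-case{i}" for i in range(20)]
    for s in range(1, 7)
}
population = [case for cases in strata.values() for case in cases]

srs = simple_random_sample(population, 12, rng)
strat = stratified_sample(strata, 2, rng)

print("simple random sample:", srs)
for name, cases in strat.items():
    print(name, cases)
```

Both calls select 12 cases in total, but only the stratified version guarantees that every stratum contributes exactly two of them.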

Cluster sample

Clusters are not necessarily homogeneous; a simple random sample is taken from each cluster in a random sample of clusters. Usually preferred for economic reasons.

[Figure: population divided into Clusters 1 to 9]

Participation question

A city council has requested a household survey be conducted in a suburban area of their city. The area is broken into many distinct and unique neighborhoods, some including large homes, some with only apartments. Which approach would likely be the least effective?

(a) Simple random sampling
(b) Cluster sampling
(c) Stratified sampling

Warm-Up and Data Basics: Exploratory Data Analysis

Explore the Data

When you taste a spoonful of chili and decide it doesn’t taste spicy enough, that’s exploratory analysis.
For data analysis, we perform exploratory data analysis, or EDA, to identify trends and features that may be present in the data.
The distribution of a variable is a list of possible values the variable can take and how often it takes each of those values.
Distributions are critical to assessing the probability of events.
Plots are almost always useful for visualizing relationships and distributions in the data.

Visualizing numerical variables

Intensity map: Useful for displaying the spatial distribution.
Dot plot: Useful when individual values are of interest.
Histogram: Provides a view of the data density, and is especially convenient for describing the shape of the data distribution.
Box plot: Especially useful for displaying the median, quartiles, unusual observations, as well as the IQR.
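A distribution and the quantities a box plot displays can be computed in a few lines of Python. The sketch below uses only the five cat rows visible in the table from the cat example (the other 245 rows are elided in the slides), so the numbers describe those five rows and nothing more. The 1.5 × IQR fence for flagging unusual observations is a common box-plot convention, assumed here rather than taken from the slides.

```python
from collections import Counter
from statistics import median, quantiles

# Only the five rows shown in the cat table; the remaining rows are not given.
naps = [3, 5, 2, 8, 5]
weights = [8, 9, 10.5, 12.25, 11.67]

# Distribution of a variable: each value it takes and how often it takes it.
nap_distribution = Counter(naps)
print("naps distribution:", dict(nap_distribution))

# Quartiles with the "inclusive" method, which treats the data as the full range.
q1, q2, q3 = quantiles(weights, n=4, method="inclusive")
iqr = q3 - q1
print("median:", median(weights), "Q1:", q1, "Q3:", q3, "IQR:", iqr)

# Assumed convention: values beyond 1.5 * IQR from the quartiles are unusual.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
unusual = [w for w in weights if w < lower_fence or w > upper_fence]
print("unusual observations:", unusual)
```

Different software packages use slightly different quartile definitions, so the quartiles from another tool may not match these exactly on small samples.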

Why visualize?

Describe the spatial distribution of race/ethnicity in the US.
http://demographics.coopercenter.org/DotMap/index.html

And let’s take a closer look at Durham.

Scatterplot

Scatterplots are useful for visualizing the relationship between two numerical variables.
Do life expectancy and total fertility appear to be associated or independent?
Was the relationship the same throughout the years, or did it change?
http://www.gapminder.org/world

Cars: ... vs. weight

From the cars data:

[Figure: two scatterplots, miles per gallon (city rating) vs. weight (pounds) and price ($1000s) vs. weight (pounds)]

What do these scatterplots reveal about the data? How might they be useful?


