STAT 201 Defence against the Dark Arts (for the Life Sciences)

Instructor: Professor Lockhart
Office: 10561
E-mail: lockhart@sfu.ca
Phone: 3264
Web site: http://www.stat.sfu.ca/~lockhart

Text: The Basic Practice of Statistics, 3rd Edition, by David S. Moore, W.H. Freeman Publishers. On 2 hour reserve in library.
Lectures: Tuesday 10:30–12:10 without break. Thursday 10:30–11:20.
Help: go to Stat Workshop K9516 inside K9510.
Office Hours: Thursday 11:30–12:30; Friday 12:30–13:30.

Assignments: six – first five marked. Due: every second Tuesday by 4:30 PM in boxes outside Stat Workshop. First due on 21 Sept. Late assignments not accepted. Worth 20% of mark, based on best 4 of 5 marked. Returned through Stat Workshop.
Exams: Midterm worth 30% on 21 October 2004. CLOSED BOOK. Final: 15 December 2004. OPEN BOOK, worth 50%.
Make-up exams: medical note required. Missed midterm will be replaced by Final.

Web material: slides posted on the web when possible. Assignment questions posted. Solutions posted evening after assignment due. Midterm solutions to be posted eventually.
Extra material: perhaps but probably not.
Computing: some questions to be done using program JMP. Assignment lab accounts created. JMP available on PCs and Macs in assignment lab. JMP also available in Stat Workshop.

Outline

Topic                                     Time      Chapters
Univariate Descriptive Statistics         4 hours   Chapters 1, 2, 3
Bivariate Descriptive Statistics          6 hours   Chapters 4, 5, 6
Experimental Design                       2 hours   ?
Probability                               3 hours   ?
Binomial, Poisson distributions           3 hours   ?
Hypothesis Tests, confidence intervals    2 hours   ?
Midterm                                   1 hour
Hypothesis Tests, confidence intervals    2 hours   ?
Two Sample tests                          3 hours   ?
Inference in Regression                   3 hours   ?
1 and 2 way ANOVA                         3 hours   ?
Count data                                3 hours   ?

Definition: Defence against the Dark Arts is the science of Data.
How should it be collected?
How should it be summarized?
How should it be displayed?
How should it be interpreted?
Where are the pitfalls?

Jargon: usual structure of a data set.
Individuals, subjects, cases, experimental units: all jargon for the people, animals, plants or things on which measurements are made.
Variables: the things measured.

Example: case by variable presentation. Data on sea urchins:

Urchin ID   Age    Size
3997        6.91   57.5
991         0.91    9.5
2163        2.41   29.5
15          0.49    0.5
2202        2.41   30.5
2862        3.42   44.5
1575        1.41   24.5
293         0.49    2.5
358         0.49    3.5
...         ...    ...

Comment: 9 cases (of 250) shown, 3 variables.
Comment: Notice poor scientific form – no units listed for Age or Size in on-line source.

Example: weather in Central Park, New York for May

Day   Max Temp   Sunshine   Weather
1     72         5          18
2     75         4          1
3     65         4          NA
4     63         0          NA
...   ...        ...        ...

Comment: “Weather” is a code. ‘1’ means Fog. ‘18’ is not listed.
Comment: NA means not available.
Comment: ‘5’ for sunshine means partly cloudy. ‘0’ is clear.
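Coded variables and missing values need explicit handling before any summary is computed. The following is a minimal Python sketch, not part of the course software (the course uses JMP). The names `WEATHER_CODES`, `decode_weather` and `rows` are made up for illustration; the code table contains only the one meaning given in the comments above, and NA is represented as `None`.

```python
# Only the meaning of code 1 is stated in the slide; other codes stay unlisted.
WEATHER_CODES = {1: "Fog"}


def decode_weather(code):
    """Translate a weather code into a label without guessing unknown codes."""
    if code is None:  # NA: not available
        return "not available"
    return WEATHER_CODES.get(code, f"unlisted code {code}")


# (Day, Max Temp, Sunshine, Weather) for the four cases shown above.
rows = [(1, 72, 5, 18), (2, 75, 4, 1), (3, 65, 4, None), (4, 63, 0, None)]

decoded = [decode_weather(weather) for (_day, _temp, _sun, weather) in rows]
print(decoded)
```

Keeping "unlisted" and "not available" as separate labels avoids treating a code of unknown meaning as if it were missing data.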

Jargon: variable types.

Nominal: categories with no particular order. Examples: Variable Sex has 2 “levels”: Male and Female. Variable Eye colour has levels like “blue”, “hazel”, “brown”.

Ordinal: categories with an order. Examples: 5 point scales: “Paul Martin is doing a good job: Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree”. Sunshine in NY: 0 is most sunny, 10 is most cloudy.

Categorical: either Nominal or Ordinal. Also called “qualitative”.

Quantitative: numerical variable like value in $, age, height, weight.

Interval: quantitative variable for which distance from 1 to 2 is same as from 3 to 4.

Ratio: like Interval but with a natural value for 0.

Discrete: used both for Categorical variables and for variables with only integer values.

Continuous: values between integers are possible (in principle as finely measured as desired).

Examples: Mass is ratio, temperature in degrees Celsius is interval, number of murders in a week in Vancouver is quantitative but discrete, temperature is continuous.

Note: 5 point scales (“Likert” scales in Psychology) are often assigned numbers, say 1-5 or 0-4. But is the difference between “Strongly agree” and “agree” the same as between “agree” and “neutral”?

Why the jargon? Sometimes helps identify suitable methods of data presentation, summarization and analysis.

WARNING: many different forms of statistical jargon in use in different disciplines.
Social Sciences: use nominal, ordinal, interval and ratio.
Math Stat: use categorical, quantitative, discrete, continuous.

WARNING: all labels are sometimes open to debate. Is money “discrete”? (Integer number of pennies but huge number of possible values.)

Data Collection Exercise: VOLUNTARY

On blank sheet of paper please provide:
1) Height
2) Weight
3) Sex
4) Value of Coins in pocket / purse
5) SFU credits completed.

PLEASE DO NOT PUT YOUR NAME ON THIS. Give to me at end of class or put in box outside Stat Workshop.

PURPOSE: provide data set to display and summarize.

Univariate Descriptive Statistics

Displays: pie charts, bar graphs, box plots, histograms, density estimates, dot plots, stem-leaf plots, tables, lists.

Example: sea urchin sizes

[Figure: four panels showing Urchin Size (mm): Boxplot; Histogram (y-axis: Number of Urchins); Dot Plot; Density (y-axis: Density).]

Points:
1) Useful for quantitative variables.
2) Boxplot shows five-number summary: minimum, first quartile, median, third quartile, maximum.
3) Dot Plot illegible with 250 data points. (1 dot for each size plotted on line.)
4) Histogram, density plot serve similar purposes.
5) Density estimate extends below a size of 0: bad.
6) Histogram doesn’t show the clustering that the density plot shows.
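The five-number summary in point 2 can be computed by hand or with a few lines of code. The sketch below is in Python rather than JMP and uses only the nine urchin sizes listed in the earlier example, not the full 250. It assumes one common textbook convention for quartiles (the median of the values below, and above, the position of the overall median); software packages use several different quartile rules, so results from JMP or other programs may differ slightly. The function name `five_number_summary` is made up for illustration.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, first quartile, median, third quartile, maximum).

    Assumed convention: quartiles are the medians of the lower and upper
    halves of the sorted data, leaving out the middle value when n is odd.
    """
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return (xs[0], median(lower), median(xs), median(upper), xs[-1])


# The nine urchin sizes shown in the sea urchin example.
urchin_sizes = [57.5, 9.5, 29.5, 0.5, 30.5, 44.5, 24.5, 2.5, 3.5]

summary = five_number_summary(urchin_sizes)
print(summary)  # (0.5, 3.0, 24.5, 37.5, 57.5)
```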

Example: Categorical: Weather in Central Park

[Figure: Pie Chart and Bar Graph of the categories clear, partly cloudy, cloudy; bar graph y-axis runs from 0 to 10.]

Pie chart harder to read. General summary: Pie Charts are bad. More useful with more categories.
Ordering of categories important for nominal variables. Cloudiness is ordinal.

Pie charts: wedge has area proportional to # of individuals in category.

Bar chart: bar has height equal to # of individuals in category.

Density estimates not discussed in this course.

Histogram:
1) Divide range of values into intervals.
2) Count numbers of individuals in each interval.
3) Bar AREA is proportional to # of individuals in interval; width is length of interval.
4) Equal width bars best – then height proportional to # of individuals.
5) Label x-axis; include units.
6) Label y-axis.
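The histogram recipe above can be sketched in code. This Python example is an illustration only: the interval edges 0, 10, 30, 60 are chosen arbitrarily to show unequal widths, the data are the nine urchin sizes listed earlier, and `histogram_heights` is a made-up name. Heights are scaled so that each bar's area equals the fraction of individuals in its interval, which is one way of making area proportional to count.

```python
def histogram_heights(values, edges):
    """Count values per interval and return heights with area = fraction.

    Intervals are [left, right); the final interval also includes its
    right end point.
    """
    n = len(values)
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for x in values:
        if not edges[0] <= x <= edges[-1]:
            raise ValueError(f"{x} lies outside the histogram range")
        # First interval whose right edge exceeds x; last interval otherwise.
        k = next((j for j in range(n_bins) if x < edges[j + 1]), n_bins - 1)
        counts[k] += 1
    widths = [edges[j + 1] - edges[j] for j in range(n_bins)]
    heights = [c / (n * w) for c, w in zip(counts, widths)]
    return counts, widths, heights


urchin_sizes = [57.5, 9.5, 29.5, 0.5, 30.5, 44.5, 24.5, 2.5, 3.5]
edges = [0, 10, 30, 60]  # assumed, deliberately unequal widths

counts, widths, heights = histogram_heights(urchin_sizes, edges)
areas = [h * w for h, w in zip(heights, widths)]

print(counts)      # [4, 2, 3]
print(sum(areas))  # total area is 1, up to rounding
```

Note that the second and third bars come out the same height even though their counts differ (2 and 3), because the third interval is wider. Reading heights as counts is only safe when all bars have equal width.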

Example: Personal Income for BC (ages 15+). (For those with income.) Source: 2001 Census.

[Figure: histogram titled Adult Personal Income (BC); x-axis: Income ($000s), 0 to 100; y-axis from 0.00 to 0.03.]

Points:
1) Bar widths unequal – census tables given that way.
2) So take width times height to get area = fraction of population in that income group.
3) Last group on right open ended – artificially cut off at $100,000 by me.
4) Plot is “long-tailed to the right” or “skewed to the right”.
5) Based on 20% sample of 1,523,720 people aged 15+ in BC on census day, 2001.

Comparison of 1996, 2001.

[Figure: two histograms, 1996 Income and 2001 Income; x-axis 0 to 100; y-axis: Density.]

Summarizing the pictures.

Purposes: less space in text than a graph; precise numerical comparison between groups.

Summarizing a histogram:
Where is centre of the x-axis values? Jargon: location or centre.
How far do the x values extend on either side? Jargon: spread, variation, width.
Is the picture symmetric or does it extend farther to right than left?
Location and number of bumps.

Measures of location:

Mean, Arithmetic Mean, Average, Arithmetic Average: total of x-values divided by number of x values. Histogram balances at mean. (First Moment in physics.) Think of See-Saw: small kid far from centre balances big kid close to centre.

Formula: data $X_1, \ldots, X_n$.

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

Utility of summation notation in this course: NIL. But $\bar{X}$ is standard notation for average of X.

Median: number such that at least 1/2 of the X values are at least that large, and at least 1/2 of the X values are at least that small. Sort list: if n is odd, median is middle of sorted list. If n is even, take average of two middle values.

Numerical examples: ages in my family: 50, 50, 20, 15, 8, 8.

$$\bar{A} = \frac{50 + 50 + 20 + 15 + 8 + 8}{6} = \frac{151}{6} \approx 25.2$$

Median age: middle numbers are 15, 20. Halfway between is median = 17.5.

Mode: most common value. Not useful concept in most cases. Location of tallest bar in histogram (affected by definition of classes). Mode of ages is not unique: 50 or 8. Not useful summary of centre.
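The arithmetic above is easy to check by machine. A short Python sketch using the standard library `statistics` module (`multimode` needs Python 3.8 or later); the variable names are chosen here for illustration.

```python
from statistics import mean, median, multimode

ages = [50, 50, 20, 15, 8, 8]

average_age = mean(ages)   # total 151 divided by 6
median_age = median(ages)  # n is even: average of the two middle values 15 and 20
modes = multimode(ages)    # every value tied for most common

print(round(average_age, 1))  # 25.2
print(median_age)             # 17.5
print(modes)                  # [50, 8]
```

Two modes are returned because 50 and 8 each appear twice, which matches the remark that the mode of the ages is not unique.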

Comparison:

Advantages of mean:
1) If your average weekly income is $100 you know how you will do in the long run; not so if median weekly income is $100.
2) Same point: average and sample size tells you total.
3) Has simpler mathematical behaviour than median.

Advantages of median:
Not influenced by extreme members of list. Median income, for instance, gives more information about typical person.
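Both sides of this comparison can be seen in a small numerical experiment. The weekly incomes below are invented for illustration (they are not course data): five weeks averaging $100, then the same list with one extreme week swapped in.

```python
from statistics import mean, median

# Hypothetical weekly incomes in dollars; invented numbers.
weekly_incomes = [80, 90, 100, 110, 120]
with_extreme_week = [80, 90, 100, 110, 10120]

# Advantage of the mean: average times sample size recovers the total.
total_from_mean = mean(weekly_incomes) * len(weekly_incomes)

print(mean(weekly_incomes), median(weekly_incomes))        # 100 100
print(total_from_mean == sum(weekly_incomes))              # True
print(mean(with_extreme_week), median(with_extreme_week))  # 2100 100
```

One extreme week moves the mean from 100 to 2100 while the median stays at 100. That is why the median describes the typical week better, and also why the median alone cannot tell you the total.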
