linear models ii



  1. Linear Models II: Design of Experiments, Analysis of Variance and Multiple Regression
     http://bcf.isb-sib.ch/teaching/introStat/
     EMBnet Course – Introduction to Statistics for Biologists, Jan 2009

     The research process
     • Scientific question of interest
     • Decision on what data to collect (and how)
     • Collection and analysis of data
     • Conclusions, generalization
     • Communication and dissemination of results

  2. Generic question: Does a ‘treatment’ have an ‘effect’?
     Examples:
     • Does wine prevent cancer?
     • Does smoking cause lung cancer?
     • Does milk reduce osteoporosis?
     • Does physical exercise slow arteriosclerosis?
     • Does statin treatment lower blood lipids?

     Experimental design: why do we care?
     • Poor design costs:
       – time, money, ethical considerations
     • To ensure relevant data are collected, and can be analyzed to test the scientific hypothesis/question of interest
       – Decide in advance how data will be analyzed
       – ‘Designing the experiment’ = ‘Planning the analysis’
     • The design is about the science (biology)

  3. Planning an Experiment
     • What measurements to make (response)
     • What conditions to study (treatments)
     • What experimental material to use (units)
     A “good” experiment
     • tests what you want to test / estimates the effects you are interested in
     • controls for everything else (exclusion, blocking, adjustment) to avoid bias and confounding

     Example: Cancer diagnosis
     • Blood samples were taken from 25 cancer patients and a control group of 25 healthy people.
     • The healthy people were a consecutive series who came to the hospital as blood donors.
     • The laboratory analyzed the “positive” samples in March and the “negative” samples in April.
     • What can go wrong in this study?

  4. Example: Agricultural experiment
     • Response: crop yield
     • Treatments: two different sorts of potatoes are compared
     • Units: two pieces of land can be used
     [Figure: two fields, labelled Field 1 and Field 2]

     Example: two blocks
     [Figure: Type A, Block 1; Type B, Block 2]
     Is this a good design?

  5. Blocking and Replication
     [Figure: Block 1 and Block 2] 5 replicates for each treatment in the first block and 8 in the second.
     • Replication is needed to estimate the scale of random effects (measurement errors)
     • Fields are subdivided into smaller areas; the choice of potato sort to be planted is randomized inside the two blocks

     Addressing the question
     • A basic means to address this type of question involves comparing two groups of study subjects
       – Control group: provides a baseline for comparison
       – Treatment group: group receiving the ‘treatment’

  6. Types of variability
     • Planned systematic (difference between the conditions, wanted)
     • Chance variation (can handle this with statistical models)
     • Unplanned systematic differences (NOT wanted)
       – Can bias results
       – Can only be corrected for if they can be included in the model (adjusting)
       – e.g. time of measurements

     Confounding factors
     • Ideally, both the treatment and control groups are exactly alike in all respects (except for group membership)
     • A confounding factor (or confounder) is associated with both the group membership and the response
     • Example: strong association of gender and lung cancer, confounded by smoking
     • Unbalanced factors that are not associated with the response are not confounding
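The effect of an unplanned systematic difference can be illustrated with a small simulation. The sketch below is not from the course material: it assumes a hypothetical setup in which the treatment has no effect at all, but measurements made in a second batch (for example, a later month) are shifted by `batch_shift`. The function name and all numeric values are illustrative assumptions.

```python
import random
import statistics


def group_difference(n_per_group, batch_shift, randomize, seed=0):
    """Return mean(treated) - mean(control) when the true treatment effect is zero."""
    rng = random.Random(seed)
    n_units = 2 * n_per_group
    # Hypothetical batches: the first half of the units is measured in batch 0,
    # the second half in batch 1 (e.g. a different month).
    batch = [0 if i < n_per_group else 1 for i in range(n_units)]
    if randomize:
        # Randomized design: chance decides which units are treated.
        treated = set(rng.sample(range(n_units), n_per_group))
    else:
        # Confounded design: every treated unit is measured in batch 0.
        treated = set(range(n_per_group))
    # The response depends only on the batch plus noise, never on the treatment.
    response = [batch_shift * batch[i] + rng.gauss(0.0, 1.0) for i in range(n_units)]
    treated_mean = statistics.mean(response[i] for i in range(n_units) if i in treated)
    control_mean = statistics.mean(response[i] for i in range(n_units) if i not in treated)
    return treated_mean - control_mean


confounded = group_difference(500, 2.0, randomize=False)
randomized = group_difference(500, 2.0, randomize=True)
print(f"confounded design: apparent effect = {confounded:.2f}")
print(f"randomized design: apparent effect = {randomized:.2f}")
```

In the confounded design the batch shift shows up as an apparent treatment effect, while under randomization the estimate stays close to zero.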

  7. Replication, Randomization, Blocking
     • Replication: to reduce random variation of the test statistic, increases generalizability
     • Randomization: to remove bias
     • Blocking: to reduce unwanted variation
     • Idea here is that units within a block are similar to each other, but different between blocks
     • ‘Block what you can, randomize what you cannot’

     Experimental vs. Observational studies
     • Controlled experiment: subjects assigned to groups by the investigator
       – randomization: protects against bias in assignment to groups
       – blind, double-blind: protects against bias in outcome assessment/measurement
       – placebo: fake ‘treatment’
     • Observational study: subjects ‘assign’ themselves to groups
       – confounder: associated with both group membership and the outcome of interest
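How replication reduces the random variation of an estimate can be sketched by simulation. The example below is an illustration, not part of the slides: it assumes normally distributed responses with unit variance and a hypothetical true effect of 1, repeats a two-group experiment many times, and reports how much the estimated effect varies for 4 versus 16 replicates per group.

```python
import random
import statistics


def sd_of_estimated_effect(n_per_group, n_experiments=2000, seed=1):
    """Standard deviation of (mean treated - mean control) over repeated experiments."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_experiments):
        # Assumed model: control ~ N(0, 1), treated ~ N(1, 1).
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(1.0, 1.0) for _ in range(n_per_group)]
        estimates.append(statistics.mean(treated) - statistics.mean(control))
    return statistics.stdev(estimates)


sd_small = sd_of_estimated_effect(4)
sd_large = sd_of_estimated_effect(16)
print(f"4 replicates per group:  sd of estimate = {sd_small:.3f}")
print(f"16 replicates per group: sd of estimate = {sd_large:.3f}")
```

Under these assumptions the theoretical standard deviation is sqrt(2/n), so quadrupling the number of replicates roughly halves the spread of the estimate.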

  8. Observational studies
     • Advantages
       – often easier to carry out
       – don’t ‘interfere’ with the system, what you see is ‘natural’ rather than ‘artificial’
       – variation is biologically relevant, as it has been unaltered
       – sometimes manipulation is not possible
     • Drawbacks
       – confounders

     Hibernation example
     • General question: How do changes in an animal’s environment cause the animal to start hibernating?
     • What changes should be studied?
       – temperature
       – photoperiod (day length: long or short)
     • What measurement(s) to take?
       – nerve activity enzyme (Na+/K+-ATPase)
     • What animal to study?
       – golden hamster, 2 organs (brain, heart)

  9. Specific question
     • General question: How do changes in an animal’s environment cause the animal to start hibernating?
     • => Specific question: What is the effect of changing day length on the concentration of the sodium pump enzyme in two golden hamster organs?

     Sources of variability
     • Variability due to conditions of interest (wanted)
       – Day length (long vs. short)
       – Organ (heart vs. brain)
     • Variability in the response (NOT wanted): measurement error
       – Preparation of enzyme suspension
       – Instrument calibration
     • Variability in experimental units (NOT wanted)
       – Biological differences among hamsters
       – Environmental differences

  10. Basic designs: Completely randomized
      • Focus on 1 organ (heart, say)
      • Random assignment: use chance to assign hamsters to long and short days
      • ‘Random’ is not the same as ‘haphazard’
      • For balance, assign same number to short and long
      • Example (8 hamsters):
        Long: 4, 1, 7, 2
        Short: 3, 8, 5, 6

      Basic designs: Randomized block
      • Suppose that the hamsters came from 4 different litters, with 2 hamsters per litter
      • Expect hamsters from the same litter to be more similar than hamsters from different litters
      • Can take each pair of hamsters and randomly assign short or long to one member of each pair
      • Example (coin flip, say): S, L // L, S // S, L // S, L
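The two assignment schemes above can be sketched in a few lines. This is a minimal illustration, not the course's own code: the function names, the seed, and the litter membership (hamsters 1-2, 3-4, 5-6, 7-8) are assumptions made for the example.

```python
import random


def completely_randomized(units, treatments, rng):
    """Shuffle all units, then deal them out into equal-sized treatment groups."""
    if len(units) % len(treatments) != 0:
        raise ValueError("units cannot be split evenly among treatments")
    shuffled = list(units)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(treatments)
    return {
        treatment: sorted(shuffled[i * size:(i + 1) * size])
        for i, treatment in enumerate(treatments)
    }


def randomized_block(blocks, treatments, rng):
    """Within each block, give the units a random permutation of the treatments."""
    assignment = {}
    for block in blocks:
        if len(block) != len(treatments):
            raise ValueError("each block needs one unit per treatment")
        order = list(treatments)
        rng.shuffle(order)
        for unit, treatment in zip(block, order):
            assignment[unit] = treatment
    return assignment


rng = random.Random(2009)  # fixed seed so the assignment can be reproduced
hamsters = list(range(1, 9))
crd = completely_randomized(hamsters, ["Long", "Short"], rng)

# Hypothetical litter membership: 4 litters with 2 hamsters each.
litters = [(1, 2), (3, 4), (5, 6), (7, 8)]
rbd = randomized_block(litters, ["Long", "Short"], rng)

print("Completely randomized:", crd)
print("Randomized block:", rbd)
```

Using a seeded generator rather than picking animals by hand is one way to keep ‘random’ from sliding into ‘haphazard’, and it leaves a record of how the assignment was made.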

  11. Basic designs: Factorial crossing
      • Compare 2 (or more) sets of conditions in the same experiment: Long vs. Short and Heart vs. Brain
      • In this example, there are 4 combinations of conditions:
        – Long/Heart, Long/Brain, Short/Heart, Short/Brain
      • Example (2 coin flips, say):
        L/H: 7, 2
        L/B: 4, 1
        S/H: 3, 5
        S/B: 8, 6

      Basic designs: Split plot / repeated measures
      • First, randomly assign Long days to 4 hamsters and Short days to the other 4
      • Then, use each hamster twice: once to get Heart conc, and once to get Brain conc
      • This design has units of different sizes for each factor
        – for day length, the unit is a hamster
        – for organ, the unit is a part of a hamster
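The difference between the two layouts may be easier to see when both are generated side by side. The sketch below is an assumption-laden illustration (variable names and seed are invented): the factorial design gives each hamster exactly one day length/organ combination, while the split-plot design randomizes day length over hamsters and then records both organs from every hamster.

```python
import itertools
import random

rng = random.Random(11)  # arbitrary seed, for reproducibility only
hamsters = list(range(1, 9))
day_lengths = ["Long", "Short"]
organs = ["Heart", "Brain"]

# Factorial crossing: 4 combinations, 2 hamsters per combination,
# each hamster contributes a single measurement.
combos = list(itertools.product(day_lengths, organs))
shuffled = hamsters[:]
rng.shuffle(shuffled)
per_cell = len(hamsters) // len(combos)
factorial = {
    combo: sorted(shuffled[i * per_cell:(i + 1) * per_cell])
    for i, combo in enumerate(combos)
}

# Split plot / repeated measures: day length is randomized over whole hamsters,
# then every hamster yields one Heart and one Brain measurement.
shuffled = hamsters[:]
rng.shuffle(shuffled)
long_group = set(shuffled[:4])
split_plot = [
    (hamster, "Long" if hamster in long_group else "Short", organ)
    for hamster in hamsters
    for organ in organs
]

for combo, members in factorial.items():
    print("factorial", combo, members)
for row in split_plot:
    print("split plot", row)
```

The factorial layout yields 8 measurements from 8 independent units; the split-plot layout yields 16 measurements, but only 8 independent units for the day-length comparison, which is why the two factors have units of different sizes.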

  12. Summary
      • Optimize precision of the estimates among main comparisons of interest
      • Must satisfy scientific and physical constraints of the experiment
      • You can save a lot of time, money and heartache by consulting with an experienced analyst on design issues before any steps of the experiment have been carried out

      X categorical, Y continuous
      • We can visually inspect the dependence of the distribution of Y given X by a series of boxplots or stripcharts
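A boxplot per category displays, in essence, a five-number summary of Y within each level of X. The sketch below computes those summaries with the standard library only; it is an illustration, the function names are invented, and the toy values are made up purely to show the mechanics (they are not data from the course).

```python
import statistics
from collections import defaultdict


def five_number_summary(values):
    """Minimum, quartiles and maximum: the quantities a boxplot is built from."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}


def summarize_by_group(x, y):
    """Group the continuous values y by the categorical labels x and summarize each group."""
    groups = defaultdict(list)
    for label, value in zip(x, y):
        groups[label].append(value)
    return {label: five_number_summary(values) for label, values in groups.items()}


# Made-up toy values, for illustration only.
x = ["A"] * 5 + ["B"] * 5
y = [1, 2, 3, 4, 5, 11, 12, 13, 14, 15]
summary = summarize_by_group(x, y)
for label, stats in summary.items():
    print(label, stats)
```

Note that quartile conventions differ between tools, so the hinges drawn by a particular plotting package may not match the `inclusive` method exactly.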
