Quantitative and Computational skills: 58I Lab and Prof Skills II



SLIDE 1

Quantitative and Computational skills

58I Lab and Prof Skills II

SLIDE 2

Overview

Overview of Q&C strand
  • Approach: what are Q&C / Data skills
  • How it fits with stage 1: Stage 1 and 2 Roadmaps
  • Fit with other strands
  • Stage 1 Revision
Overview of the session content
Why script? Why R?
Independent study ideas

SLIDE 3

Overview of Q&C strand and fit with ED and BT

Exposure to, and practice in, a variety of techniques.
Can’t cover everything you’ll ever need, but topics are:

  • Foundational
  • Applicable to all
  • Universal
  • Highly transferable

=> in this module and beyond
=> option leaders for option-specific analyses

SLIDE 4

Learning Objectives

1. To be able to generate a testable hypothesis.
2. To design and conduct experiments to test this hypothesis, with appropriate controls.
3. To have practical experience of a range of techniques relevant to the discipline.
4. To work effectively within a team.
5. To be able to write a scientific report based on practical work.
6. To communicate scientific information and ideas in the form of a variety of media to a variety of audiences.
7. To use appropriate graphical methods to produce data figures with appropriately detailed legends.
8. To use relevant statistical or other analytical methods to analyse data.
9. To research scientific literature in a given area, and write an extended and well-structured account.

SLIDE 5

What are Data Skills: actions with data

  • Tidy (the mental model and the activity)
  • Import
  • Transform
  • Explore
  • Model (‘statistics’, ~15%)
  • Report
  • Simulate

SLIDE 6

Reproducible actions with data

Reproducibly

Tidy Import Transform Explore Model Report Simulate

SLIDE 7

Reproducibly

ROADMAP: Stage 1 MLO

Tidy Import Transform Explore Model Report

  • Import: from files, all but unusually complex (.txt, .xlsx, .csv, .sav, .dta); relative paths; separators; ...and more
  • Reproducibly: everything scripted; code commenting; organisation of analysis
  • Tidy: what ‘tidy’ data are, but little tidying
  • Transform: changing variable names and types; factor levels; wide to long reshaping
  • Explore: simple plots (histograms); normality testing; summary stats
  • Model: fundamental concepts in hypothesis testing; CI; linear models (t-tests, ANOVA, regression); chi-sq; Mann-Whitney; Wilcoxon; Kruskal-Wallis; correlation; multiple comparison. Selection: assumptions. Not really fit. “Significance, direction, magnitude”
  • Report: figures (legends, saving); not fully reproducibly
  • Little: ranking

Introductory Simulate

Abstraction

SLIDE 8

Stage 1

  • Aut: numeracy, early data skills
  • Spr: main teaching block, primarily data analysis
  • Sum: reinforcement and development

SLIDE 9

Stage 2

SLIDE 10

Reproducibly

Stage 2 MLO

Tidy Import Transform Explore Model Report

  • Identification and removal of outliers and NA
  • Stage 1 tests in LM framework (increased conceptual complexity); more LM; GLM (binomial and Poisson); odds ratios; deviance measures of fit; more on multiple comparisons; non-linear regression; ~mixed models; FDR; GWAS
  • Proportions; Z-score standardisation; coefficient of variation; log to base 2; subtraction of noise/background; scaling/reversing experimental steps; PCR relative quantification; RPKM quantification
  • Identifying non-independence and pseudoreplication in experimental design
  • Multi-panel figures; complex domain-specific viz; volcano plots

Introductory Intermediate Simulate

Abstraction Running and interpreting particular models
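
Several of the Stage 2 transformations listed above (subtraction of background, Z-score standardisation, coefficient of variation, log to base 2) are one-liners in R. The following is a minimal sketch only: the measurements, the `background` value and all variable names are made up for illustration and do not come from the module materials.

```r
# Hypothetical measurements (illustrative values, not real data)
signal <- c(12.1, 15.4, 9.8, 20.3, 14.0)
background <- 2.0   # assumed constant background reading

# Subtraction of noise/background
corrected <- signal - background

# Z-score standardisation: rescale to mean 0 and standard deviation 1
z <- (corrected - mean(corrected)) / sd(corrected)

# Coefficient of variation: standard deviation relative to the mean
cv <- sd(corrected) / mean(corrected)

# Log to base 2: a doubling on the original scale becomes a difference of 1
log2_corrected <- log2(corrected)
```

Base R's `scale()` function gives the same Z-scores as the manual calculation, returned as a one-column matrix.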

SLIDE 11

Emma Rand

[Diagram: the Reproducibly cycle (Tidy, Import, Transform, Explore, Model, Report, Simulate) shown for Stage 1, Stage 2 and Stage 4, with levels labelled Introductory, Intermediate and Advanced]

SLIDE 12

Stage 1 Revision: experiments and analysis

  • Response variable: something we measure. The dependent variable; the ‘y’s.
  • Predictor variables: some things we control, choose or set. The independent variable(s); the ‘x’s.
  • Relationship: the response can be explained by the predictors.

function(y ~ x)
function(y ~ x1 * x2)
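
As a concrete instance of the `function(y ~ x)` pattern, the sketch below uses `lm()` as the function. The data are simulated and the names `d`, `x1`, `x2` and `y` are placeholders chosen for illustration. It also shows that `x1 * x2` in a formula is shorthand for both main effects plus their interaction.

```r
set.seed(1)   # make the simulated data repeatable
n <- 40
d <- data.frame(
  x1 = factor(rep(c("a", "b"), each = n / 2)),
  x2 = factor(rep(c("low", "high"), times = n / 2))
)
d$y <- rnorm(n, mean = 10)   # simulated response, unrelated to the predictors

# One predictor: y explained by x1
mod1 <- lm(y ~ x1, data = d)

# Two predictors and their interaction
mod2 <- lm(y ~ x1 * x2, data = d)

# The same model written out in full
mod3 <- lm(y ~ x1 + x2 + x1:x2, data = d)

# The two ways of writing the model give identical coefficients
all.equal(coef(mod2), coef(mod3))
```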

SLIDE 13

Stage 1 Revision: Choice of analysis (test)?

The response variable can be explained by the predictor variables.

  • Relationship: what relationship links the predictors to the response? Linear?
  • Type of response variable: continuous or discrete? If continuous, normal?
  • Predictor variables: number (one, or more?); type (continuous? categories?)

function(y ~ x)
function(y ~ x1 * x2)

SLIDE 14

Revision: Stage 1 analyses (tests)

The response variable can be explained by the predictor variables.

  • Relationship: what relationship links the predictors to the response? Linear?
  • Type of response variable: continuous or discrete? If continuous, normal?
  • Predictor variables: one or two in number. Continuous: regression. Categories: t-tests and ANOVA.

function(y ~ x)
function(y ~ x1 * x2)

SLIDE 15

Why those analyses??

[Figure from Bolker, 2007]

SLIDE 16

Spring Quant & Comp Experimental Design

  • 1hr lecture - Introduction (ER)
  • 2hr Workshop - Revision and thinking about analysis before experimental design (ER)
  • 2x 2hr Workshop - Data analysis - Building from linear models to Generalised linear models (ER)
  • 2hr Workshop - Visualising data (ER)
  • 2hr Drop-in
  • 2hr Workshop - Problems in time (JWP)
  • 2hr Drop-ins for each Bioscience Technique strand

Autumn Bioscience techniques

Overview of Session content

SLIDE 17

Approach

A bit different from last year: No lectures!

  • Independent study in the form of prior learning: slides + short recordings
  • Workshop: workbook. You are not expected to do all of the workbook examples. Choose the examples from each section that best match your biological interests.
  • Independent study: practice, anything! More advanced examples? Other workshop examples? Examples from last year? RBloggers?

SLIDE 18

W01: Revision and thinking about analysis before experimental design

Something we measure can be explained by some things we control, choose or set.

SLIDE 19

W01: Revision and thinking about analysis before experimental design

Something we measure... Think about data in the broadest sense: how do we ensure we get ‘good’ data?

  • What and how to measure
  • Reliability
  • Precision
  • Transformations, normalisations
  • Do you need statistics?
  • Limits of interpretation
  • Independence of data points

SLIDE 20

From Bolker, 2007

SLIDE 21

W01: Revision and thinking about analysis before experimental design

By following the slides and applying the techniques to select examples from the workbook the successful student will be able to:

  • Recognise non-independence and pseudoreplication
  • Select appropriately, and apply some methods to make data comparable
  • Design experiments to take account of these

  • Slides: outline concepts about experimental design, non-independence, pseudoreplication and data comparability
  • Workbook: practice in recognising non-independence and pseudoreplication and in applying ‘normalisation’
  • Experimental Design and Bioscience Techniques: practice in designing experiments and analysing and presenting results
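
To illustrate what recognising pseudoreplication can look like in practice, here is a minimal sketch under an assumed design: six plants, three per treatment, with four leaves measured on each plant. The design, the simulated values and the names `leaves`, `plant`, `treatment` and `area` are all hypothetical. Leaves from the same plant are not independent, so one common remedy is to reduce the data to one value per plant before analysis.

```r
set.seed(2)   # make the simulated data repeatable

# Hypothetical design: 6 plants (3 per treatment), 4 leaves measured per plant
leaves <- data.frame(
  plant = rep(paste0("p", 1:6), each = 4),
  treatment = rep(c("control", "treated"), each = 12)
)
leaves$area <- rnorm(nrow(leaves), mean = 20, sd = 3)

# 24 rows, but analysing them as 24 independent points would be pseudoreplication
nrow(leaves)

# Average the leaves within each plant: the plant is the independent unit
plants <- aggregate(area ~ plant + treatment, data = leaves, FUN = mean)

# 6 rows, one per plant, which is the true sample size for comparing treatments
nrow(plants)
```

Averaging is only one option; models that keep every leaf and account for the grouping (the "~Mixed models" mentioned in the Stage 2 roadmap) are another.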

SLIDE 22

W02: Building from Linear Models to Generalised Linear Models Part 1

t-tests, ANOVA and regression are Linear Models! We revisit these tests in the framework of the General Linear Model, which is more extendable. We will learn to apply and interpret the lm() function. By following the slides and applying the techniques to select examples from the workbook the successful student will be able to:

  • Explain the link between t-tests, ANOVA and regression
  • Appropriately apply linear models using lm()
  • Interpret the results using summary() and anova() and relate them to the outputs of t.test() and aov()
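
The link between these functions can be checked directly. The sketch below uses simulated data with placeholder names (`d`, `group`, `y`); it is an illustration under those assumptions, not a workbook example. With two groups and equal variances assumed, `t.test()`, `lm()` and `aov()` all test the same difference in means.

```r
set.seed(3)   # make the simulated data repeatable
d <- data.frame(
  group = factor(rep(c("A", "B"), each = 10)),
  y = c(rnorm(10, mean = 5), rnorm(10, mean = 6))
)

# Classical two-sample t-test, assuming equal variances
tt <- t.test(y ~ group, data = d, var.equal = TRUE)

# The same comparison as a linear model
mod <- lm(y ~ group, data = d)
summary(mod)   # the 'groupB' row tests the difference between the group means
anova(mod)     # F test for group; with two groups, F equals t squared

# One-way ANOVA with aov() fits the same model
summary(aov(y ~ group, data = d))

# The p-value from t.test() matches the p-value for 'groupB' in the linear model
p_lm <- summary(mod)$coefficients["groupB", "Pr(>|t|)"]
all.equal(unname(tt$p.value), unname(p_lm))
```

Note the `var.equal = TRUE` argument: by default `t.test()` applies the Welch correction, which does not match the linear model exactly.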

SLIDE 23

W03: Building from Linear Models to Generalised Linear Models Part 2

We will learn to apply and interpret the glm() function for when your response variable is not continuous but a count or a binary outcome. By following the slides and applying the techniques to select examples from the workbook the successful student will be able to:

  • Explain the link between the general linear model and the generalised linear model
  • Recognise where a generalised linear model would be appropriate and apply glm()
  • Determine which effects are significant using summary() and anova()
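
A minimal sketch of `glm()` for the two response types mentioned above, a count and a binary outcome. The data are simulated and the names `d`, `dose`, `count` and `survived` are hypothetical; the coefficients used to simulate them are arbitrary choices for illustration.

```r
set.seed(4)   # make the simulated data repeatable
d <- data.frame(dose = rep(c(0, 1, 2, 3), each = 10))

# Simulated counts whose mean increases with dose
d$count <- rpois(nrow(d), lambda = exp(0.5 + 0.4 * d$dose))

# Simulated binary outcome whose probability decreases with dose
d$survived <- rbinom(nrow(d), size = 1, prob = plogis(1 - 0.8 * d$dose))

# Count response: Poisson GLM (log link by default)
mod_count <- glm(count ~ dose, family = poisson, data = d)

# Binary response: binomial GLM (logit link by default)
mod_binary <- glm(survived ~ dose, family = binomial, data = d)

# Which effects are significant?
summary(mod_count)
anova(mod_count, test = "Chisq")   # analysis of deviance table

# Binomial coefficients are on the log-odds scale; exponentiate for odds ratios
exp(coef(mod_binary))
```

The call looks almost identical to `lm()`; the `family` argument is what tells R how the response is distributed.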

SLIDE 24

The rationale for scripting analysis

Experiments (tests of ideas)

  • Explanatory variables: choose / set / manipulate
  • Response variables: measure

Experimental design, Analyse, Visualise, Interpret and report

Reproducibly: protocol, lab book
Reproducibly: scripting

SLIDE 25

Why R?

  • R caters to users who do not see themselves as programmers, but then allows them to slide gradually into programming
  • Community
  • Language for data analysis
  • Open source, free
  • Graphics
  • Reproducibility

SLIDE 26

Assessment

Opportunities to express competency in Experimental Design and Bioscience Techniques (and elsewhere)

Becoming Competent

  • Make it fun. Practise and engage with people.
  • The workshops are not a test. It is expected that you will make a lot of mistakes and need help.
  • Talk to each other, demonstrators and lecturers.
  • “There are two ways to write error free code and only the third way works.”
  • You can optionally stretch yourself by asking questions in class, creating additional figures, or doing ‘More advanced examples’ in some cases.

SLIDE 27

Recommended independent study

  • Revise: 17C and 8C (VLE or http://www-users.york.ac.uk/~er13/); variable types, data structures; Revision lecture L10 notes: any familiarisation will help
  • Play: Datacamp; put R and RStudio on your own PC/Mac; #biol58I; RBloggers; https://buzzrbeeline.blog/

SLIDE 28

Questions?
