Experimental epidemiology analyses with R and R commander - Lars T. Fadnes - PowerPoint presentation



slide-1
SLIDE 1

1

Experimental epidemiology analyses with R

and R commander

Lars T. Fadnes Centre for International Health University of Bergen

slide-3
SLIDE 3

3

How to install R commander?

  • install.packages("Rcmdr", dependencies=TRUE)
  • Download the necessary files and put them in a folder:

http://statistics.fadnes.net/epi

– Fieldtrials12.RData
– Exercises-for-the-exp-epi-2012-03.rtf

slide-4
SLIDE 4

4

Installation of RcmdrPlugin.Cih.Epi

  • Install R (if not done already)

– http://www.r-project.org/

  • Install package Rcmdr (if not done already)

– Packages → Install package(s) → scroll down to Rcmdr → click OK

  • Install package Epi

– Packages → Install package(s) → scroll down to Epi → click OK

  • Download RcmdrPlugin.Cih.Epi from

http://statistics.fadnes.net/epi/

  • Install package(s) from local zip file (choose the package you just downloaded)
  • Open the program

– Load package Rcmdr
– Tools → Load Rcmdr plug-in(s) → RcmdrPlugin.Cih.Epi

  • Click OK and Yes
  • Now you are ready
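The menu steps for the two CRAN packages can also be done from the R console. The following is a minimal sketch, not the course's official procedure: find_missing() is a hypothetical helper name, and the plugin path in the last comment is a placeholder for wherever you saved the downloaded file.

```r
# Sketch: install only the CRAN packages that are not yet available.
# find_missing() is a hypothetical helper, not part of R or Rcmdr.
find_missing <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

needed <- c("Rcmdr", "Epi")
to_install <- find_missing(needed)

# Installing needs an internet connection, so it is only attempted
# when R is used interactively.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install, dependencies = TRUE)
}

# A plugin downloaded as a local file can be installed with repos = NULL.
# The path below is a placeholder, not a real file name:
# install.packages("path/to/downloaded-plugin-file.zip", repos = NULL)
```

Checking first avoids reinstalling packages that are already present, for example on the UiB computers.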

slide-5
SLIDE 5

5

Aim for this session

  • Introduce a brilliant tool
  • Analyse a dataset with the tool
  • Guide to further knowledge
slide-6
SLIDE 6

6

What is R?

  • R is a free software environment that

includes a set of base packages for graphics, math, and statistics.

  • You can make use of specialized

packages contributed by R users or write your own new functions.

slide-7
SLIDE 7

7

Why R?

  • Very powerful
  • Developing extremely quickly
  • Working on different platforms (not only

Microsoft Windows…)

  • Free of all costs
slide-8
SLIDE 8

8

Why doesn’t everyone use R?

slide-9
SLIDE 9

9

What is R commander?

slide-10
SLIDE 10

11

Why R commander?

  • Powerful
  • Free of all costs
  • Working on different platforms (not only

Microsoft Windows…)

  • Easy to learn and to use…
slide-11
SLIDE 11

12

How to install R?

For Windows:

  • http://cran.r-project.org/bin/windows/base/

– Easy

  • If you are here at UiB, the IT department will install it for you if you ask them

For Linux (Ubuntu etc.):

  • Very easy
  • Just search for r-base-core in Synaptic Package Manager and add it
  • http://socserv.mcmaster.ca/jfox/Misc/Rcmdr/installation-notes.html

(also contains a good description of installation on Mac)

slide-12
SLIDE 12

13

How to install R commander?

  • Is already installed on UiB computers
  • If not:

install.packages("Rcmdr", dependencies=TRUE)

slide-13
SLIDE 13

14

Some things to note first

  • R is case-sensitive

– help, Help, HELP and HELF are different…
– Recommendation:

  • Choose one style and stick to it
  • If there is something you don’t know:

– There is a lot of good information on the web

  • Particularly for R
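A small illustration of case sensitivity, using made-up variable names and values:

```r
childage <- 18   # all lower case
ChildAge <- 24   # different capitalisation, so a different object

same_object <- identical(childage, ChildAge)
print(same_object)   # FALSE: R keeps these as two separate objects

# Function names are case-sensitive in the same way:
# mean() is a base R function, so this prints TRUE.
print(exists("mean"))
```

Sticking to one naming style (for example all lower case) makes this kind of mix-up much less likely.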
slide-14
SLIDE 14

15

How to start?

  • Open R

– Load packages → Rcmdr
– or write:

library(Rcmdr)
slide-15
SLIDE 15

16

Menu

  • File Menu:

– items for loading and saving script files;
– for saving output and the R workspace;
– and for exiting

  • Edit Menu:

– items (Cut, Copy, Paste, etc.) for editing the contents of the script and output windows.

  • Data

– Submenus containing menu items for reading and manipulating data.

  • Statistics

– Submenus containing menu items for a variety of basic statistical analyses.

slide-16
SLIDE 16

17

Menu

  • Graphs

– Menu items for creating simple statistical graphs.

  • Models

– Menu items and submenus for obtaining numerical summaries, confidence intervals, hypothesis tests, diagnostics, and graphs for a statistical model, and for adding diagnostic quantities, such as residuals, to the data set.

  • Distributions

– Probabilities, quantiles, and graphs of standard statistical distributions (to be used, for example, as a substitute for statistical tables) and samples from these distributions.

  • Tools

– Menu items for loading R packages unrelated to the Rcmdr package (e.g., to access data saved in another package), and for setting some options.

  • Help Menu

– items to obtain information about the R Commander (including this manual). As well, each R Commander dialog box has a Help button (see below).

slide-17
SLIDE 17

18

  • Script Window

– R commands generated by the R Commander
– You can also type R commands directly into the script window or the R Console
– The main purpose of the R Commander, however, is to avoid having to type commands.

  • Output Window

– Printed output

  • Messages Window

– Displays error messages, warnings, and notes

  • Graphics Device window

– When you create graphs, these will appear in a separate window outside of the main R Commander window.

slide-18
SLIDE 18

19

Available functions:

slide-20
SLIDE 20

21

Save dataset under your documents folder

Files and documents are available at http://statistics.fadnes.net/epi

slide-21
SLIDE 21

22

Let’s get started…

  • Change directory (under File)

– Find the folder where you placed your data file

  • Import data

– Give it the name: fieldtrials

  • Save workspace as

– Give a name to your file
– The file contains the dataset and any models you might have generated
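For those who prefer typing, the menu steps above roughly correspond to setwd(), load() and save() in R. The sketch below is only an illustration under stated assumptions: it uses a made-up toy data frame and a temporary folder so that it runs anywhere, whereas in the exercise you would point to your own folder and the real Fieldtrials12.RData file.

```r
# Toy stand-in for the course data (values are made up).
fieldtrials <- data.frame(
  id           = 1:4,
  treatmentarm = c("zinc", "placebo", "zinc", "placebo"),
  childage     = c(8, 14, 20, 26)
)

# A temporary folder keeps the sketch self-contained; in the exercise
# you would use setwd() to move to the folder holding your data file.
data_dir <- tempdir()
workspace_file <- file.path(data_dir, "toy-workspace.RData")

save(fieldtrials, file = workspace_file)  # store the data set in R format
rm(fieldtrials)                           # forget it in this session
load(workspace_file)                      # read it back from the file

str(fieldtrials)
```

save() stores only the objects you name; save.image() would store every object in the workspace, which is closer to what 'Save workspace as' does.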

slide-22
SLIDE 22

23

Data Types

  • Vectors

– Quantitative difference (one vs. two apples)
– Including continuous (numerical) variables
– Numeric variables are coded as vectors by default

  • Factors

– Qualitative difference (apples vs. pears)
– Categorical
– Text variables are coded as factors by default

  • Matrices, lists, arrays and data frames

http://www.statmethods.net/input/datatypes.html
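The difference between a numeric vector and a factor can be seen directly in the console. This is a small sketch with made-up values. Note that since R 4.0.0, data.frame() and read.table() keep text as character unless stringsAsFactors = TRUE is set, so the default described above may depend on your R version and on how the data are imported.

```r
apples <- c(1, 2, 5)                            # numeric vector
fruit  <- factor(c("apple", "pear", "apple"))   # factor (categories)

class(apples)    # "numeric"
class(fruit)     # "factor"
levels(fruit)    # "apple" "pear"

mean(apples)     # arithmetic makes sense for a numeric vector
table(fruit)     # counting makes sense for a factor
```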

slide-23
SLIDE 23

24

Variables in dataset - define the data types

id                  id number
gender              male/female (factor)
treatmentarm        Treatment (1=zinc, 0=placebo) (factor)
childage            Age of the child in months (vector)
breastfed           Is the child breastfed? (factor)
lentils             Does the child eat lentils? (0=no, 1=yes) (factor)
meat                Does the child eat meat? (0=no, 1=yes) (factor)
duration            Duration of diarrhoea in days (vector)
diarsev             Severe diarrhoea, ≥10 stools per day (factor)
fever               Did the child have fever at enrollment? (factor)
clusterzn2/4/8/16   Cluster variables identifying living areas

  • coded as vectors, but need to be transformed into factors
slide-24
SLIDE 24

25

How to save?

  • Save R workspace as…

– This will save your data (in the R format)

  • Save output as…

– This will save your output
– Another strategy is to cut and paste what you want to save

  • Always save the commands (syntax)

– Essential if you want to re-run the analyses later
– WordPad is a better option than Word etc. (it does not autocorrect, e.g. change letters to upper case)

slide-25
SLIDE 25

26

How to write and run a command?

  • Simply write the command in the script window, select it and click ’Submit’ or press Ctrl+R

slide-26
SLIDE 26

27

Nice to know:

  • When writing comments in the syntax,

start with the following sign #

– R will then not consider the line as a command

  • If you are uncertain about a function, use

google or help(name-of-function)
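A short sketch of how comments and help calls look in a script. The values are made up, and the help calls are themselves commented out here so that the snippet runs without opening a help page.

```r
# A line starting with # is a comment; R does not run it.
durations <- c(3, 5, 10)   # a comment can also follow a command

# To read about a function, use help() or the ? shorthand:
# help(median)
# ?median

typical_duration <- median(durations)
typical_duration   # 5
```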

slide-27
SLIDE 27

28

– The cluster variables contain numbers and are coded as vectors by default, but need to be recoded into factors (categorical variables for grouping etc.)
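In R syntax, this conversion is a call to factor(). The sketch below uses a made-up toy data frame and assumes that clusterzn2 is the name of one of the cluster variables.

```r
# Toy data: cluster codes are stored as plain numbers (made-up values).
fieldtrials <- data.frame(
  id         = 1:6,
  clusterzn2 = c(1, 2, 1, 2, 2, 1)
)

is.numeric(fieldtrials$clusterzn2)   # TRUE before the conversion

# Replace the numeric column with a factor version of itself.
fieldtrials$clusterzn2 <- factor(fieldtrials$clusterzn2)

is.factor(fieldtrials$clusterzn2)    # TRUE after the conversion
levels(fieldtrials$clusterzn2)       # "1" "2"
table(fieldtrials$clusterzn2)        # number of children per cluster
```

Once the variable is a factor, R treats the codes as group labels rather than quantities.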

slide-28
SLIDE 28

29

Now we’re ready to answer some scientific questions…

slide-29
SLIDE 29

30

Compute new variable

  • Child age in years
  • Child age is now given in months

  • childageyear = childage/ 12
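Written as an R command against a data frame, the computation could look like the sketch below. The toy values are made up; in the exercise the data frame would be the imported fieldtrials data set.

```r
# Toy data: child age in months (made-up values).
fieldtrials <- data.frame(childage = c(6, 12, 18, 30))

# New variable: child age in years.
fieldtrials$childageyear <- fieldtrials$childage / 12

fieldtrials$childageyear   # 0.5 1.0 1.5 2.5
```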
slide-30
SLIDE 30

31

Does the new variable look reasonable?

  • View data set
slide-31
SLIDE 31

32

Summarize variable

  • Calculate mean, median and standard

deviation for

– childageyear

  • for each intervention arm (treatmentarm)
  • Numerical summaries
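The same summaries can be computed with base R. This is a sketch on made-up toy data, and summarise_age() is a hypothetical helper name; R Commander generates its own command for the Numerical summaries dialog, which may look different.

```r
# Toy data: three children per arm (made-up values).
fieldtrials <- data.frame(
  treatmentarm = factor(c("zinc", "zinc", "zinc",
                          "placebo", "placebo", "placebo")),
  childageyear = c(0.5, 1.0, 1.5, 1.0, 2.0, 3.0)
)

# Hypothetical helper returning the three requested statistics.
summarise_age <- function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x))
}

# Split the ages by treatment arm and summarise each group.
age_by_arm <- sapply(
  split(fieldtrials$childageyear, fieldtrials$treatmentarm),
  summarise_age
)
age_by_arm
```

The result is a small table with one column per treatment arm and one row per statistic.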
slide-32
SLIDE 32

33

Doing calculations for subsets by generating new datasets

slide-33
SLIDE 33

34

  • Placebo:

treatmentarm == "placebo"

  • Zinc:

treatmentarm == "zinc"

Make the other subset by changing the syntax and running ('Submit') it:

zinc <- subset(fieldtrials3, subset=treatmentarm=="zinc")
placebo <- subset(fieldtrials3, subset=treatmentarm=="placebo")
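To see what subset() does, here is a self-contained sketch on a made-up toy data frame, named fieldtrials here rather than fieldtrials3.

```r
# Toy data: five children in two arms (made-up values).
fieldtrials <- data.frame(
  id           = 1:5,
  treatmentarm = factor(c("zinc", "placebo", "zinc", "placebo", "zinc"))
)

# Keep only the rows where the condition is TRUE.
zinc    <- subset(fieldtrials, subset = treatmentarm == "zinc")
placebo <- subset(fieldtrials, subset = treatmentarm == "placebo")

nrow(zinc)      # 3
nrow(placebo)   # 2
```

Each subset is a new data set in its own right, while the complete data set stays unchanged.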

slide-34
SLIDE 34

35

– You can now easily change between the datasets

slide-35
SLIDE 35

36

Make a histogram of childageyear

  • First for the 'zinc' dataset
  • Then for the 'placebo' dataset
  • Do they look similar?

Note:
– The histograms will appear in the R window (not inside R commander)
– Right-click on the graph to copy it as a metafile and paste it into a document, print it or save it
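The histogram can also be drawn with the hist() function. The sketch below uses made-up ages for a toy 'zinc' data set; the title and axis labels are illustrative choices.

```r
# Toy 'zinc' data set (made-up ages in years).
zinc <- data.frame(childageyear = c(0.6, 0.8, 1.1, 1.2, 1.4, 1.9, 2.3))

# When run as a script (not interactively), discard the plot instead of
# writing a file; in an interactive session a graphics window opens.
if (!interactive()) pdf(NULL)

age_hist <- hist(zinc$childageyear,
                 main = "Zinc arm",
                 xlab = "childageyear",
                 ylab = "frequency")

age_hist$counts   # number of children in each bar
```

Repeating the call with the 'placebo' data set gives the second histogram for comparison.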

slide-36
SLIDE 36

37

  • Does it look normally distributed?

[Figure: histogram of placebo$childageyear (x-axis) against frequency (y-axis)]

slide-37
SLIDE 37

38

Box plot

  • Box plot for childageyear

– By treatmentarm
– (first remember to select the complete fieldtrials dataset)

[Figure: box plot of childageyear (y-axis) by treatmentarm (x-axis: placebo, zinc)]
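A box plot by group can be written with a formula, read as 'childageyear by treatmentarm'. This is a sketch on made-up toy data.

```r
# Toy data: four children per arm (made-up ages in years).
fieldtrials <- data.frame(
  treatmentarm = factor(rep(c("placebo", "zinc"), each = 4)),
  childageyear = c(0.7, 1.1, 1.6, 2.2, 0.6, 1.0, 1.5, 2.4)
)

# When run as a script (not interactively), discard the plot instead of
# writing a file; in an interactive session a graphics window opens.
if (!interactive()) pdf(NULL)

age_box <- boxplot(childageyear ~ treatmentarm, data = fieldtrials,
                   xlab = "treatmentarm", ylab = "childageyear")

age_box$names   # "placebo" "zinc"
```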

slide-38
SLIDE 38

39

Is the baseline child age different in the zinc and placebo arms?

  • This can be checked with a robust test that does not assume a normal distribution

  • Check with a

– non-parametric test
» two-sample Wilcoxon test (rank-sum test, also known as the Mann-Whitney U test)
» Use the ’Exact’ test
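In R syntax the two-sample Wilcoxon test is wilcox.test(). The sketch below runs it on made-up toy data without tied values, since an exact p-value cannot be computed when there are ties.

```r
# Toy data: five children per arm, no tied ages (made-up values).
fieldtrials <- data.frame(
  treatmentarm = factor(rep(c("placebo", "zinc"), each = 5)),
  childageyear = c(0.70, 1.10, 1.60, 2.20, 2.50,
                   0.60, 1.00, 1.50, 1.90, 2.40)
)

# Two-sample Wilcoxon (rank-sum) test with an exact p-value.
age_test <- wilcox.test(childageyear ~ treatmentarm,
                        data = fieldtrials, exact = TRUE)

age_test$p.value   # a large p-value gives no evidence of a difference
```

With real data that contain ties, R falls back to a normal approximation and prints a warning.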

slide-39
SLIDE 39

40

Recoding variables

Diarrhoea duration can be recoded into an additional categorised variable (diarlong)

– Data → Manage variables

  • value = "factor"

– The new factor value can be either a number or a word

  • value, value, value = "factor"

– Several values, listed with commas

  • value:value = "factor"

– A range, from lowest to highest value

  • else

– all other values

  • NA

– missing
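As an illustration of what such a recoding produces, here is a base R sketch on made-up toy data. The cutoff of 7 days is a hypothetical choice, not taken from the exercise, and ifelse() is used as a stand-in for the command that the R Commander recode dialog generates.

```r
# Toy data: diarrhoea duration in days, one value missing (made-up).
fieldtrials <- data.frame(duration = c(2, 5, 7, 9, NA, 14))

# Hypothetical cutoff: 7 days or more counts as "long".
# Missing durations stay missing (NA) in the new variable.
fieldtrials$diarlong <- factor(
  ifelse(fieldtrials$duration >= 7, "long", "short"),
  levels = c("short", "long")
)

table(fieldtrials$diarlong, useNA = "ifany")
```

In the recode dialog, the same split would be written as one range directive per category, in the value:value = "factor" form described above.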

slide-41
SLIDE 41

42

Are there differences in syntax between R and R commander?

  • A few:

– Commands that extend over more than one line should have the second and subsequent lines indented by one or more spaces or tabs; all lines of a multiline command must be submitted simultaneously for execution.
– Commands that include an assignment arrow (<-) will not generate printed output, even if such output would normally appear had the command been entered in the R Console [the command print(x <- 10), for example]. On the other hand, assignments made with the equals sign (=) produce printed output even when they normally would not (e.g., x = 10).
– Commands that produce normally invisible output will occasionally cause output to be printed in the output window. This behaviour can be modified by editing the entries of the log-exceptions.txt file in the R Commander’s etc directory.
– Blocks of commands enclosed by braces, i.e., {}, are not handled properly unless each command is terminated with a semicolon (;). This is poor R style, and implies that the script window is of limited use as a programming editor. For serious R programming, it would be preferable to use the script editor provided by the Windows version of R itself, or, even better, a programming editor.
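A sketch of what the first and last points look like in practice, with made-up values. The code is also valid in plain R; the indentation and semicolons only matter when it is submitted from the R Commander script window.

```r
# A command spread over two lines: the continuation line is indented.
age_months <- c(6, 12,
  18, 30)

# A brace block with each command terminated by a semicolon.
{
  age_years <- age_months / 12;
  mean_age_years <- mean(age_years);
}

mean_age_years   # 1.375
```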

slide-42
SLIDE 42

43

True or false quiz

  • R commander only works on MS Windows?
  • R is case-sensitive (it distinguishes between lower-case and upper-case letters)?

  • R is built by a few people with a secret source-code?
  • R was the program that gave their shareholders most

profit last year?

  • There are a lot of enthusiastic people working with R

providing help to their peers in the R forum?

slide-43
SLIDE 43

44

R help forum:

  • http://r.789695.n4.nabble.com/
slide-44
SLIDE 44

45

Further reading:

  • The R Commander: A Basic-Statistics Graphical User Interface to R (John Fox, 2005)

– http://www.jstatsoft.org/v14/i09/paper

  • Getting started with the R Commander: a basic-statistics graphical

user interface to R

– http://socserv.mcmaster.ca/jfox/Getting-Started-with-the-Rcmdr.pdf

  • Quick-R: magnificent guide

– http://www.statmethods.net/

  • http://cran.r-project.org/manuals.html
slide-45
SLIDE 45

46

  • You have learnt some basic skills and can

now experiment with the program yourself

slide-46
SLIDE 46

47

Questions and comments
