Learning R: An Introduction for New Users & Advanced Techniques - PowerPoint PPT Presentation



Learning An Introduction for New Users & Advanced Techniques Professional Development Opportunity for the Flow Cytometry Core Facility July 27 th & August 24 th , 2018 LKG Consulting Email: consulting.lkg@gmail.com Website:


slide-1
SLIDE 1

Learning R

Professional Development Opportunity for the

Flow Cytometry Core Facility

July 27th & August 24th, 2018

LKG Consulting

Email: consulting.lkg@gmail.com Website: www.consultinglkg.com

An Introduction for New Users & Advanced Techniques

slide-2
SLIDE 2

The goal of this workshop is to introduce (or re-introduce) you to the R programming language through reference tools and interactive examples. At the end of this workshop you will NOT be an R master (sorry), but you will have a bunch of knowledge and tricks to help you on your ongoing R adventures.

slide-3
SLIDE 3

Laura Gray-Steinhauer

www.ualberta.ca/~lkgray
BSc in Mathematics, Statistics and Environmental Studies (UVIC, 2005)
MSc in Forest Biology and Management (UofA, 2008)
PhD in Forest Biology and Management (UofA, 2011)
Designated Professional Statistician with The Statistical Society of Canada (2014)
Research: Climate Change, Policy Evaluation, Adaptation, Mitigation, Risk management for forest resources, Conservation…

A little about me…

slide-4
SLIDE 4

Learning R: An Introduction for New Users & Advanced Techniques

Workshop Schedule

8:15 – 8:30 Arrive at the Lab & start up the computers
8:30 – 8:45 Welcome to the Workshop (housekeeping & today’s goals)

Introduction to R

8:45 – 9:05 Unit 1: Getting started in R (script files, working directories, RStudio, etc.)
9:05 – 9:30 Unit 2: Data Preparation in R (import/export, missing values, modes, classes, etc.)
9:30 – 10:30 Work period (questions are welcome)
10:30 – 10:45 Break
10:45 – 11:15 Unit 3: Data Management in R (tidyr and dplyr packages, etc.)
11:15 – 12:15 Work period (questions are welcome)
12:15 – 1:00 Lunch

Advanced R

1:00 – 1:30 Unit 4: Control Structures (loops, apply functions, etc.)
1:30 – 1:45 Unit 5: Functions (build your own)
1:45 – 2:45 Work period (questions are welcome)
2:45 – 3:00 Break
3:00 – 3:30 Unit 6: Graphics (ggplot2 package)
3:30 – 5:00 Work period (questions are welcome)
After 5:00 Enjoy your weekend!

slide-5
SLIDE 5

Workbook

  • Yours to keep!
  • Goes into MANY more examples than we will have time to go through in this course.
  • R code is identified by Century Gothic font (everything else is Arial).
  • Arbitrary object names are bold to indicate these could change depending on what you name your variables.
  • Referenced data is provided on my personal website: www.ualberta.ca/~lkgray
  • Please contact me to obtain permission to redistribute content outside of the workshop attendees: consulting.lkg@gmail.com

slide-6
SLIDE 6

Unit 1: Getting Started in R

Introduction to R

slide-7
SLIDE 7

R Project Website

https://cran.r-project.org/index.html

slide-8
SLIDE 8

R in its simplest form… a calculator

  • Open R Workspace and enter the following code into the Console window. What happens?

Working with numbers:
2+3
A=2+3
A
a   # oops, R is case sensitive
B=7
A+B
C=A+B
C

Working with vectors:
X=c(1,4,3,5,7)
Y=c(5,7,9,4,8)
mean(X)
sd(X)
X*10
Z=Y+3
Z
boxplot(X,Y,Z)
t.test(X,Y)
t.test(X,Z)

Working with tables & matrices:
K=as.data.frame(cbind(X,Y,Z))
X=X*10
K   # oops, nothing happened?
K$X=K$X*10
t(K)
plot(K)

slide-9
SLIDE 9

Script files (1st thing to set up)

  • #1 Feature – SAVABLE!
  • This means you can come back to your script files and re-run your code without having to retype it.
  • Always have a record of what you have done
  • Modifications are easy to execute
  • Get into the habit of creating or opening a script file as the first thing you do when you open R.

slide-10
SLIDE 10
  • Keeps all your work organized

Working Directory (2nd thing to set up)

slide-11
SLIDE 11

Working Directory (2nd thing to set up)

slide-12
SLIDE 12

Recommended for this workshop

Working Directory (2nd thing to set up)

  • Ensure forward slashes “/” are used without “+”

Tell R where to look

setwd("File path")
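A minimal sketch of setting and checking a working directory. The folder name `R_workshop` is hypothetical and is created inside R's temporary directory so the example runs anywhere; on your own machine you would pass your real folder path, written with forward slashes.

```r
# Hypothetical folder for illustration; created under tempdir() so it always exists.
workshop_dir <- file.path(tempdir(), "R_workshop")
dir.create(workshop_dir, showWarnings = FALSE)

old_dir <- setwd(workshop_dir)  # setwd() invisibly returns the previous directory
getwd()                         # confirm where R is now looking
list.files()                    # files R can see in the working directory

setwd(old_dir)                  # go back to where we started
```

Saving the value returned by `setwd()` makes it easy to return to the original folder afterwards.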

slide-13
SLIDE 13

The most powerful aspect of R…. Packages!

  • R is an open-source platform with open contributions (peer reviewed).
  • With most software you have to wait years for the latest tests and models to be incorporated into the software you purchased, BUT with R most new techniques and updates are generally available within months.
  • There are also a TON of subject-matter-specific packages available (see Appendix 1 in your workbook for flow packages).
  • When you install R you get the base package only, which allows you to execute many basic statistical tests and graphic commands, but packages are where you get the power.
  • See Section 1.4 on how to install R packages (we will do this later today).

slide-14
SLIDE 14

R Help Files

  • { }: The package the function is housed in
  • Description: What the function does
  • Usage: Default settings for the function options
  • Arguments: Details of what the options control
  • Value: Details of what information is created when the command is executed OR further details of the function options
  • References: Where you can find more information about the function
  • See Also: Hyperlinks for alternative functions that may be useful
  • Examples: Code you can copy/paste into your R Console and step through line-by-line to get a better idea of what the command is doing

slide-15
SLIDE 15

Mac Tips (Section 1.7)

  • For the most part works the same as Windows
  • Script file = Text Editor
  • Working directories = available in drop-down menu
  • Execute code by highlighting code and pressing Command R
  • Some differences in function names
  • choose.files() vs. file.choose()
  • windows() vs. quartz()

Text Editor

slide-16
SLIDE 16

https://www.rstudio.com/

RStudio (IDE: Integrated Development Environment)

Preferred among programmers; we will use it in this workshop

slide-17
SLIDE 17

Panels: Script | Environment/History | Console | Plots/Packages/Help Files

  • Create “Projects” as working directories
  • Customize environment aesthetics in Tools> Global Options

RStudio (IDE: Integrated Development Environment)

Preferred among programmers; we will use it in this workshop

slide-18
SLIDE 18

Unit 2: Data Preparation in R

Introduction to R

slide-19
SLIDE 19

Object-based programming

Object<-output

Saves information so you can use it later

e.g. scalar, vector, data table, matrix, ANOVA table, etc.

You create this! You name it!

  • When you are naming objects try to keep things simple yet informative (if everything is called data you might get confused).
  • CAN use “_” and “.” as well as numbers within your object names.
  • CANNOT use special symbols (%, &, *, #, !) or “,”.
  • CANNOT start an object name with a number.
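
The naming rules above can be tried out directly. This is a small sketch; the object names are made up for illustration, and `make.names()` is a base R helper that shows how R itself would repair an invalid name.

```r
test_score.2018 <- 78   # valid: letters, "_", "." and numbers
# 2018score <- 78       # invalid: cannot start with a number (syntax error)
# test%score <- 78      # invalid: special symbols are not allowed

# make.names() converts arbitrary text into syntactically valid object names
fixed <- make.names(c("2018score", "test score", "ok_name"))
fixed
```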
slide-20
SLIDE 20

Object-based programming

Object<-output

What you are asking R to do for you

e.g. create a new variable, calculate a mean, etc.

  • Output is most often a function()
  • Functions are also considered objects, BUT they are special objects that accept other arguments to complete the desired action

function(data, argument1, argument2)
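
As a small illustration of the `Object <- function(data, argument)` pattern, the sketch below stores the output of `mean()` and `round()` in named objects. The scores are hypothetical values chosen for the example.

```r
scores <- c(78, 90, 56, NA)               # hypothetical test scores, one missing

# Object <- function(data, argument)
avg.score <- mean(scores, na.rm = TRUE)   # na.rm is an argument of mean()
rounded   <- round(avg.score, digits = 1) # digits is an argument of round()

avg.score
rounded
```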

slide-21
SLIDE 21

Object Classes (Section 2.5)

Vector – the basic data object, a list of information

e.g.
data.v=c(1,2,3,4)
data.v2=c("A","B","C")

Scalar – a vector of length=1

e.g. A=2

Matrix – a 2-dimensional vector. All columns in a matrix must have the same mode (e.g. all numeric) and the same length

e.g. data.m=matrix(data.v, nrow=2, ncol=2, byrow=FALSE, dimnames=list(c("Row1","Row2"), c("Col1","Col2")))

Data Frame – is a general form of a matrix, but in a data frame different columns can have different modes.

e.g.
x=c("Adam","Beth","Chris","Danielle")   # Student names
y=c(78,90,56,49)                        # Test scores
z=c(TRUE,TRUE,TRUE,FALSE)               # Passed?
mydata=data.frame(x,y,z)                # make a data table
colnames(mydata)=c("student","testScore","Passed")   # assign column names
mydata                                  # view data table

List – An ordered collection of objects (components). A list allows you to gather a variety of (possibly unrelated) objects under one name

e.g.
# example of a list with 4 components – a string, a numeric vector, a matrix, and a scalar
W=list(name="RcourseExamples", mynumbers=data.v, mymatrix=data.m, myscaler=A)
W   # view list

Ask R:

is.vector() is.matrix() is.data.frame() is.list()

Convert:

as.vector() as.matrix() as.data.frame() as.list()
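
A short sketch of the "Ask R" and "Convert" functions in action, reusing the `data.v` vector from the examples above. The object names `data.m` and `data.df` are only illustrative.

```r
data.v <- c(1, 2, 3, 4)
data.m <- matrix(data.v, nrow = 2, ncol = 2)

is.vector(data.v)       # ask R: is this a vector?
is.matrix(data.m)       # ask R: is this a matrix?

data.df <- as.data.frame(data.m)  # convert the matrix to a data frame
is.data.frame(data.df)
is.list(data.df)        # a data frame is stored internally as a list of columns
```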

slide-22
SLIDE 22

Object Modes (Section 2.4)

  • All objects have a data mode, which is a mutually exclusive classification according to the object’s basic structure.
  • Numeric – a number value with decimal places.
  • If no decimal places are present, numeric values can default to type integer.
  • Complex – an imaginary numeric value
  • Square root of a negative number
  • Character – a variable with values other than numbers, or with a single “.” value present
  • Text string
  • If R identifies repeating character values, they can be classified as type factor, which includes an embedded order known as levels.
  • Logical – a Boolean value
  • True or False
  • Recursive objects (objects that have multiple pieces or perform actions) have modes such as list or function.

Ask R:

is.numeric() is.character() is.factor() is.logical()

Convert:

as.numeric() as.character() as.factor() as.logical()

Investigate data structure:

str()
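
A brief sketch of asking about and converting modes. The values are invented for illustration: numbers that were stored as text, and a repeating character variable converted to a factor.

```r
x <- c("1.5", "2", "3.25")      # numbers stored as text
mode(x)                         # "character"

x.num <- as.numeric(x)          # convert to numeric
is.numeric(x.num)

size <- as.factor(c("low", "high", "low"))
levels(size)                    # levels are sorted alphabetically by default

is.logical(x.num > 2)           # a comparison returns logical values
str(x.num)                      # investigate the structure
```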

slide-23
SLIDE 23

Object Modes (Section 2.4)

  • This decision tree ignores the complex mode (only used if dealing with

imaginary numbers)

slide-24
SLIDE 24

JasperMigration.csv

slide-25
SLIDE 25

Data Import/Export

(Section 2.2)

Import/Export based on file type:

read.csv("File name", …)
write.csv(objectName, "File name", …)

  • When you import data you need to specify attributes in your data (…)
  • Metadata that is not required in R, so we need to “skip” these 6 lines when we import: skip=6
  • Column names are present: header=T
  • Missing values represented by blank cell: na.strings=" "
  • When you export data you need to specify the data format (e.g. file type)
  • Also tell R if you want the row numbers exported as a column (often we do not): row.names=F
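
The import and export options above can be sketched end to end. This example does not use the workshop's JasperMigration.csv; it writes a small made-up file (six metadata lines, a header row, and some blank cells) to a temporary folder so it can run anywhere. The column names and values are hypothetical.

```r
# Build a small hypothetical CSV with 6 metadata lines before the header
csv_path <- file.path(tempdir(), "example_import.csv")
writeLines(c(
  "Metadata line 1", "Metadata line 2", "Metadata line 3",
  "Metadata line 4", "Metadata line 5", "Metadata line 6",
  "Year,Elk,Deer",
  "2016,12,",
  "2017,,8",
  "2018,15,9"
), csv_path)

# Import: skip the metadata, read column names, treat blank cells as missing
migration <- read.csv(csv_path, skip = 6, header = TRUE,
                      na.strings = c("", " "))
migration

# Export: do not write the row numbers as an extra column
out_path <- file.path(tempdir(), "example_export.csv")
write.csv(migration, out_path, row.names = FALSE)
```

Passing both `""` and `" "` to `na.strings` is a cautious choice here, since a blank cell may be stored either way depending on how the file was saved.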

slide-26
SLIDE 26

Missing Values (Section 2.3)

Missing Values:

is.na()
na.rm=T (parameter in other functions)

  • Easiest way is to account for missing values within data import: na.strings=" " or "NA" or 0, etc.
  • Similar to how we ask R about data modes, we can ask if there are missing values (based on what R perceives as a missing value – see first bullet point)
  • This is a Boolean answer which may or may not be helpful
  • Combine the is.na() function with the which() function to identify where the missing values are located.
  • In mathematical operations (mean(), median(), range(), etc.) you can use the na.rm=TRUE parameter to remove missing values in the calculation (but they remain in the dataset).
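
A quick sketch of the three steps above, using a hypothetical vector of counts with two missing values.

```r
counts <- c(12, NA, 15, 9, NA)   # hypothetical data with missing values

is.na(counts)                    # Boolean answer for every element
which(is.na(counts))             # positions of the missing values
sum(is.na(counts))               # how many values are missing

mean(counts)                     # NA: missing values propagate
mean(counts, na.rm = TRUE)       # missing values dropped for this calculation only
length(counts)                   # the NAs are still in the dataset
```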

slide-27
SLIDE 27

Dates (Section 2.6)

Dates:

install.packages("lubridate")
ymd() mdy() dmy()
year() month() day()
dmy_hms()
hour() minute() second()

  • Base package includes the Date class (as.Date()), which allows you to assign dates by specifying formats (clunky and sometimes a bit tricky)
  • Package lubridate (like “lubricating dates”) is more popular because it simplifies working with dates.
  • Built-in functions that will automatically format your data based on year, month, day and hour, minute and second (if applicable)
  • Accepts most common date formats
  • 4 digit years and 2 digit years
  • Month names (long, abbreviated) and numbers
  • Allows you to seamlessly analyze data in time zones (convert and reset)
  • Mathematical operations for date-time objects that calculate time spans between points, accounting for time zones, leap years, and daylight saving time.
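
For comparison, here is a sketch of the base R approach, where the format has to be spelled out by hand. The lubridate equivalents are shown only as comments so the example runs without installing the package; the two dates are the workshop dates from the title slide.

```r
# Base R: you must describe the format yourself
day1 <- as.Date("27/07/2018", format = "%d/%m/%Y")
day2 <- as.Date("2018-08-24")             # ISO format needs no format string

format(day1, "%Y")                        # extract the year as text
span <- as.numeric(day2 - day1)           # time span in days
span

# lubridate equivalents (after library(lubridate)):
#   day1 <- dmy("27/07/2018")
#   day2 <- ymd("2018-08-24")
#   year(day1)
```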

slide-28
SLIDE 28

WORK PERIOD 9:30 – 10:30

Follow the Workbook Examples Any questions?

BREAK 10:30 – 10:45

Get yourself a coffee… Next we’re learning how to manage data

slide-29
SLIDE 29

Unit 3: Data Management in R

Introduction to R

slide-30
SLIDE 30

Datasets Available in R (Section 3.2)

R Datasets:

library(datasets)
data(datasetName)

  • Over 100 datasets available for you to use
  • Including the famous (Fisher's or Anderson's) iris data set, which gives the sepal and petal measurements for 50 flowers from each of Iris setosa, versicolor, and virginica.

https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html
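
A minimal sketch of loading and inspecting the iris data set described above:

```r
library(datasets)
data(iris)             # load the built-in iris data set

str(iris)              # 5 variables: 4 measurements plus Species
head(iris, 3)          # first three rows
table(iris$Species)    # number of flowers per species
```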

slide-31
SLIDE 31

Data Management & Manipulation (Section 3.3)

  • Base package allows you to shape, subset, order, create new variables, merge, and append data.
  • Includes working with brackets [ , ] as well as functions subset(), order(), sort(), duplicated(), merge(), cbind(), rbind(), etc.
  • Packages tidyr and dplyr are MUCH more popular because combined they allow you to simply “tidy” your data with a few commands.
  • In web examples, pipelines %>% are often used
  • Pipelines pass the output of one function on as the data table of the next, so you can leave out the data argument within functions and link outputs to one another in a “chain”

Data Management:

install.packages(c("tidyr","dplyr"))
library(tidyr)
library(dplyr)

Without Pipelines:

a <- filter(data, variable == "value")
b <- summarise(a, Total = sum(variable))
c <- arrange(b, desc(Total))

With Pipelines:

data %>%
  filter(variable == "value") %>%
  summarise(Total = sum(variable)) %>%
  arrange(desc(Total))
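
The two styles can be compared on the built-in iris data. This is a sketch that assumes dplyr is installed; the filter threshold and the `Total` column are arbitrary choices for illustration, and a `group_by()` step is added so that `arrange()` has several rows to order.

```r
library(dplyr)

# Without pipelines: every step needs its own intermediate object
a <- filter(iris, Sepal.Length > 5)
b <- summarise(group_by(a, Species), Total = sum(Petal.Length))
c <- arrange(b, desc(Total))

# With pipelines: each output is passed on as the data of the next step
piped <- iris %>%
  filter(Sepal.Length > 5) %>%
  group_by(Species) %>%
  summarise(Total = sum(Petal.Length)) %>%
  arrange(desc(Total))

piped
```

Both versions give the same table; the pipeline simply avoids naming the intermediate results.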

slide-32
SLIDE 32

Tidyverse

  • tidyr and dplyr are part of the tidyverse, a collection of R packages for data science (managing Big Data)

Data Management:

install.packages("tidyverse")
library(tidyverse)

slide-33
SLIDE 33

Shaping Data

  • Utilize key-value pairs (simple way to record information)
  • A pair contains two parts: a key that explains what the information describes, and a value that contains the actual information.
  • Key = flower_att (e.g. sepal length, sepal width, petal length, petal width)
  • Value = measurement (recorded value in centimeters)
  • gather() which groups multiple columns, then converts them into key-value pairs. This function will transform the wide form of data to long form.
  • spread() which does the reverse of gather() by taking a key-value pair and converting it into separate columns.

Wide form (gather() converts this to long form):

ID  Species     Sepal Length  Sepal Width  Petal Length  Petal Width
1   setosa      5.2           4.2          7.8           4.1
2   setosa      6.1           5.2          8.9           3.9
3   versicolor  4.7           3.6          5.8           3.8
4   virginica   2.1           5.7          6.3           2.4
…   …           …             …            …             …

Long form (spread() converts this back to wide form):

ID  Species  Flower_att    Measurement
1   setosa   Sepal.Length  5.2
1   setosa   Sepal.Width   4.2
1   setosa   Petal.Length  7.8
1   setosa   Petal.Width   4.1
…   …        …             …

  • separate() which splits a column into multiple columns defined with the into (new column names) and sep (position where to split the column) arguments
  • unite() which does the reverse of separate() by uniting multiple columns into a single column with the col (new column name) and sep arguments.

Shape Data Table:

gather(data, key, value)
spread(data, key, value)

Shape Columns:

separate(data, col, into, sep)
unite(data, col, …, sep)
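
A small sketch of a wide-to-long-to-wide round trip, assuming tidyr is installed. The two-row `flowers` table is made up for illustration and uses only the sepal columns to keep the output short. Newer tidyr versions recommend `pivot_longer()` and `pivot_wider()` for the same job, but `gather()` and `spread()` still work.

```r
library(tidyr)

# Hypothetical wide table: one row per flower
flowers <- data.frame(ID = 1:2,
                      Species = c("setosa", "setosa"),
                      Sepal.Length = c(5.2, 6.1),
                      Sepal.Width = c(4.2, 5.2))

# Wide -> long: the two measurement columns become key-value pairs
long <- gather(flowers, key = "Flower_att", value = "Measurement",
               Sepal.Length, Sepal.Width)
long

# Long -> wide: the key-value pairs become separate columns again
wide <- spread(long, key = Flower_att, value = Measurement)
wide
```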

slide-34
SLIDE 34

Subset Data

  • select() which picks out elements based on their names.
  • Subsets by column name (“I only want specific columns”)
  • filter() which picks out elements based on their value.
  • Subsets by value in column (“I only want rows with a specific value”)
  • distinct() which removes duplicate rows.

Syntax    Meaning
==        Equal to
!=        Not equal to
<         Less than
<=        Less than or equal to
>         Greater than
>=        Greater than or equal to
%in%      Group membership
is.na     Is NA
!is.na    Is not NA
&         Boolean “AND”
|         Boolean “OR”
!         Boolean “NOT”
any       Boolean “do any match criteria?”
all       Boolean “do all match criteria?”

Order Data

  • arrange() which changes the order of rows based on indicated criteria
  • If no direction is specified, R will sort data in ascending (chronological, numerical, or alphabetical) order, whatever is appropriate for the data class. Use desc() for descending order.

Subset:

select(data, c(columnNames))
filter(data, criteria)

Order:

arrange(data, orderCriteria)
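
The subset and order functions can be sketched on the iris data, assuming dplyr is installed. The cut-off of 6 cm and the object names are arbitrary choices for illustration.

```r
library(dplyr)

# Subset by column name
petals <- select(iris, c(Species, Petal.Length, Petal.Width))

# Subset by value: combine criteria with & and %in%
big <- filter(petals, Petal.Length >= 6 & Species %in% c("virginica"))

# Order rows, largest petal first
ordered <- arrange(big, desc(Petal.Length))
head(ordered)

# Remove duplicate rows: one row per species remains
unique.species <- distinct(select(iris, Species))
unique.species
```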

slide-35
SLIDE 35

Create New Variables

  • mutate() which adds new variables that are functions of equations and existing variables.
  • Mathematically manipulate values by a factor or each other
  • Utilize ifelse() statements to customize variables
  • summarize() which reduces multiple values down to a single summary output.
  • Requires data to be “grouped”, then performs mathematical operations to summarize data by group

iris2<-group_by(iris2,Species)   # group iris2 dataset by Species
iris3<-summarize(iris2,
  Petal.Length.avg=mean(Petal.Length, na.rm=T),
  Petal.Width.avg=mean(Petal.Width, na.rm=T),
  Sepal.Length.avg=mean(Sepal.Length, na.rm=T),
  Sepal.Width.avg=mean(Sepal.Width, na.rm=T))
iris3   # view new dataset

  • There are a number of values you can pass into the summarize() function (see page 28)

New variables:

mutate(data, NewVariable=formula)
summarize(GroupedData, functions)
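
A sketch of `mutate()` with and without `ifelse()`, followed by a grouped summary, assuming dplyr is installed. The new variables `Petal.Area` and `Petal.Size`, and the 4 cm threshold, are invented for this example.

```r
library(dplyr)

iris2 <- mutate(iris,
                Petal.Area = Petal.Length * Petal.Width,           # combine two variables
                Petal.Size = ifelse(Petal.Length > 4, "long", "short"))  # conditional variable
head(iris2, 3)

# Group, then summarize each group down to single values
iris3 <- summarize(group_by(iris2, Species),
                   Petal.Area.avg = mean(Petal.Area, na.rm = TRUE),
                   n = n())
iris3
```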

slide-36
SLIDE 36

Merge Data

  • inner_join(x, y) which returns all rows from x where there are matching values in y, and all columns from x and y.
  • If there are multiple matches between x and y, all combinations of the matches are returned.
  • left_join(x, y) which returns all rows from x, and all columns from x and y.
  • If there are multiple matches between x and y, all combinations of the matches are returned.
  • right_join(x, y) which returns all rows from y, and all columns from x and y.
  • If there are multiple matches between x and y, all combinations of the matches are returned.
  • semi_join(x, y) which returns all rows from x where there are matching values in y, keeping just columns from x.
  • A semi join differs from an inner join because an inner join will return one row of x for each matching row of y, whereas a semi join will never duplicate rows of x.
  • full_join(x, y) which returns all rows and all columns from both x and y.
  • Where there are no matching values, returns NA for the one missing.
  • anti_join(x, y) which returns all rows from x where there are no matching values in y, keeping just columns from x.
  • All join options accept the by argument, which specifies what variable in x and y to merge by (if it is omitted, dplyr joins on all columns with matching names)

Merge:

joinOption(data1, data2, by)
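
The differences between the join options are easiest to see on two tiny tables. This sketch assumes dplyr is installed; the `samples` and `sites` tables are made up, sharing IDs 2 and 3 only.

```r
library(dplyr)

samples <- data.frame(ID = c(1, 2, 3),
                      Species = c("setosa", "setosa", "versicolor"))
sites   <- data.frame(ID = c(2, 3, 4),
                      Site = c("North", "South", "East"))

inner <- inner_join(samples, sites, by = "ID")  # only IDs found in both tables
left  <- left_join(samples, sites, by = "ID")   # all samples; Site is NA for ID 1
full  <- full_join(samples, sites, by = "ID")   # every ID from either table
anti  <- anti_join(samples, sites, by = "ID")   # samples with no matching site

inner
left
full
anti
```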

slide-37
SLIDE 37

Append Data

Append:

install.packages("plyr")
library(plyr)
rbind.fill(data1, data2)

  • Base package allows you to “stack” data tables with the rbind() function
  • However, if the column names do not match (even just a couple of the column names) it returns an error
  • Package plyr has a modified version of this function, rbind.fill(), which will “stack” the data tables and input NA in cells where the column names do not match.

data1:

ID  Species  Sepal Length  Sepal Width  Petal Length  Petal Width
1   setosa   5.2           4.2          7.8           4.1
2   setosa   6.1           5.2          8.9           3.9

data2:

ID  Species  Sepal Length  Sepal Width  Stem Length  Stem Width
5   setosa   8.7           2.2          11.8         1.1
6   setosa   7.2           4.2          12.9         2.9

rbind.fill(data1, data2):

ID  Species  Sepal Length  Sepal Width  Petal Length  Petal Width  Stem Length  Stem Width
1   setosa   5.2           4.2          7.8           4.1          NA           NA
2   setosa   6.1           5.2          8.9           3.9          NA           NA
5   setosa   8.7           2.2          NA            NA           11.8         1.1
6   setosa   7.2           4.2          NA            NA           12.9         2.9

slide-38
SLIDE 38

WORK PERIOD 11:15 – 12:15

Follow the Workbook Examples Any questions?

LUNCH 12:15 – 1:00

Take a break & get some brain food… you’ll need it ready for this afternoon!

slide-39
SLIDE 39

Unit 4: Control Structures in R

Advanced R

slide-40
SLIDE 40

Control Structures

  • Control structures are functions that analyze variables and make a decision (referred to as “choosing a direction”) based on set parameters.
  • Commonly referred to as “flow control”
  • Extremely useful if you want to run a piece of code multiple times, or if you want to run a piece of code if a certain condition is met.
  • Execute using the curly brackets { }
slide-41
SLIDE 41

If and else Statements

(Section 4.1)

  • if() statements are Boolean statements
  • Simply asks R “If this condition is true, do this”
  • If the condition returns false, nothing happens
  • else statements are Boolean statements
  • Simply asks R “If this condition is true, do this; if false, do that”
  • Second step on the if statement
  • You can integrate if() and else statements within loops, but if you want to create conditional statements within other functions (like the mutate() function we previously learned) it is better to use the vectorized form of this function, ifelse()
  • The ifelse() function does not require actions in brackets { }

If and else statements:

if(condition){ACTION} else {ACTION}
ifelse(condition, TRUE ACTION, FALSE ACTION)
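
A short sketch contrasting the two forms, using the test scores from the data frame example in Unit 2. The pass mark of 50 is an assumption made for illustration.

```r
# if / else: tests ONE condition and runs one block of code
score <- 78
if (score >= 50) {
  result <- "pass"
} else {
  result <- "fail"
}
result

# ifelse(): vectorized, tests every element at once (no curly brackets)
scores  <- c(78, 90, 56, 49)
results <- ifelse(scores >= 50, "pass", "fail")
results
```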

slide-42
SLIDE 42

For Loops (Section 4.2)

  • for() statements are loops that are used to repeat a specific block of code.
  • Utilize an iterative variable to repeatedly execute a set of commands until the end of a sequence

for(i in 1:10){
  Action #1
  Action #2
  Action #3
}

For loops:

for(i in sequence){ACTIONS}

  • Iterative variable (i) will start at 1 and increase every run until it reaches 10
  • There will be 10 occurrences of this loop
  • First bracket opens the loop
  • Actions to complete in the loop
  • Second bracket closes the loop
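
As a concrete version of the template above, this sketch fills a vector with the squares of 1 to 10. The object name `squares` is arbitrary.

```r
squares <- numeric(10)      # empty container to hold the results

for (i in 1:10) {           # i starts at 1 and increases until it reaches 10
  squares[i] <- i^2         # action: store the square of i
}

squares
```

Creating the container before the loop starts avoids growing the vector on every run.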

slide-43
SLIDE 43

While Loops (Section 4.3)

  • Although used to repeat a specific block of code, while() loops are different than for() statements.
  • Rather than an iterative variable, while() loops rely on a condition being TRUE to stay within the loop.
  • Once the condition is FALSE, the chain is broken and the while() loop is exited.
  • The key is that there needs to be a way to exit the loop (some way of failing the condition), else the loop will continue executing indefinitely.
  • If you run into this problem, simply press the Esc key to exit the loop.

count<-1
while(count<10){
  Action #1
  Action #2
  count<-count+1
}

while loops:

while(condition){ACTIONS}

  • Set up condition (example; there are lots of ways to do this)
  • First bracket opens the loop
  • Actions to complete in the loop
  • Condition to allow for eventual loop exit
  • Second bracket closes the loop
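
A runnable version of the template above, sketched with a running total as the action. The `total` object is added for illustration.

```r
count <- 1                  # set up the condition
total <- 0

while (count < 10) {        # stay in the loop while the condition is TRUE
  total <- total + count    # action: add the current count to the total
  count <- count + 1        # without this line the loop would never exit
}

total
count
```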

slide-44
SLIDE 44

R apply Family (Section 4.4)

  • An alternative to for() or while() loops, which can be tedious to construct and debug
  • All functions include FUN ☺.
  • FUN argument simply specifies the mathematical function you want to apply to the supplied data.
  • Functions follow the same format as listed in Section 3.3, create new variables.
  • Functions are split into the data format that they use and/or output (i.e. list, vector, table, matrix, etc.)
  • Functions are passed to all values in the data table, either based on MARGIN (1-rows, 2-columns, c(1,2)-rows and columns) or the type of function used (vector, matrix, etc.)

Apply functions:

apply(data, MARGIN, FUN)
lapply(data, FUN)
sapply(data, FUN)
vapply(data, FUN, format)
tapply(data, INDEX, FUN)
mapply(FUN, data1, data2, …)
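
A sketch of the most common members of the family in base R. The small matrix `m` is made up; the other calls use the built-in iris data.

```r
m <- matrix(1:6, nrow = 2)            # 2 rows x 3 columns, filled column by column

row.sums  <- apply(m, 1, sum)         # MARGIN = 1: apply sum() to each row
col.means <- apply(m, 2, mean)        # MARGIN = 2: apply mean() to each column

as.list.out <- lapply(iris[1:4], mean)   # one mean per column, returned as a list
as.vec.out  <- sapply(iris[1:4], mean)   # same, simplified to a named vector

# tapply(): apply a function to one variable, split by groups
by.species <- tapply(iris$Petal.Length, iris$Species, mean)

row.sums
col.means
as.vec.out
by.species
```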

slide-45
SLIDE 45

Unit 5: Build Functions in R

Advanced R

slide-46
SLIDE 46

Steps for building your own functions

(Sections 5.1 to 5.3)

Steps to create a function:

1. Name your function (allows you to source this function later)
2. Set the arguments you want to be passed into the function
3. Open function with bracket {
4. Set of commands you need to complete the purpose of your function
5. Return a value, statement, something….
6. Close function with bracket }

  • Remember objects that are created within the function are local to the environment of the function. They don’t exist outside of the function.
  • Need to “return” the value of the object from the function to use them.

function.name <- function(argument1, argument2, …){
  statements to complete the desired actions
  return(value)
}
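
The six steps can be sketched with a small made-up function. The name `percent.change`, its arguments, and the internal object `change.local` are all hypothetical choices for this example.

```r
# 1. Name the function; 2. set its arguments (digits has a default value)
percent.change <- function(old, new, digits = 1) {   # 3. open with {
  # 4. commands that do the work
  change.local <- (new - old) / old * 100
  # 5. return the value
  return(round(change.local, digits))
}                                                    # 6. close with }

result <- percent.change(50, 60)
result

exists("change.local")   # FALSE: objects created inside the function stay local
```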

slide-47
SLIDE 47

Debugging Functions (Section 5.4)

  • When you have an error in building your function there are a couple of tricks you can use.
  • Include print() functions after each line of your code to output a value for each step so you can see where your function is making an error.
  • You can also use the debugger function debug()
  • To use the debug() function, first call debug() with your function’s name, then execute your function.
  • In RStudio this will open a different window
  • To use it, hit return in the console window at the Browse> prompt and R will step through each line of the function.
  • If the line executes correctly it will continue on to the next line, but if there is an error it will abort and give you the error message.

slide-48
SLIDE 48

Tips for building your own functions (Section 5.6)

  • Keep your functions as short as possible.
  • Using smaller “chunks” of code makes it easier to update, as you likely only have to change one of the “chunks” and all the functions that reference the piece are automatically updated.
  • Document your function!!!!
  • Even more than regular code, functions should be highly documented so you know what is going on.
  • If you provide the function to anyone else they will also know what is going on.
  • Try to check for errors along the way rather than waiting until the entire function is completed.
  • Use print() functions liberally.
  • Always test your code with simple examples. Don’t start by inputting the entire dataset, because if there is an error it can be hard to find if you have to wade through a bunch of extra data.

slide-49
SLIDE 49

WORK PERIOD 1:45 – 2:45

Follow the Workbook Examples Any questions?

BREAK 2:45 – 3:00

Up next…. Fun with Graphics.

slide-50
SLIDE 50

Unit 6: Graphics in R

Advanced R

slide-51
SLIDE 51
slide-52
SLIDE 52
slide-53
SLIDE 53

Graphing in ggplot2

Reset graphics window (Section 6.1)
Initiate a plot
Specify where data for plot will come from
Define the global plot aesthetics (Section 6.2):

  • Plotting axes (x, y)
  • Grouping
  • Fill colour & grouping
  • Outline colour & grouping
  • Point size & grouping
  • Etc.

graphics.off()
ggplot(data=data4, aes(x=Variable1, y=Variable2, fill=colour, …)) +

slide-54
SLIDE 54

Graphing in ggplot2

Use “+” to “join” graphing pieces
Define the “geom” or type of graph you want to plot (Section 6.2):

  • geom_point()
  • geom_line()
  • geom_area()
  • geom_density()
  • geom_dotplot()
  • geom_histogram()
  • geom_boxplot()
  • geom_violin()
  • geom_bar()
  • Etc.

graphics.off()
ggplot(data=data4, aes(x=Variable1, y=Variable2, fill=colour, …)) +
  geom_point()

slide-55
SLIDE 55

Graphing in ggplot2

As an alternative to global parameters, you can define local plot aesthetics (Section 6.2):

  • Plotting axes (x, y)
  • Grouping
  • Fill colour
  • Outline colour
  • Point size
  • Etc.

graphics.off()
ggplot(data=data4) +
  geom_point(aes(x=Variable1, y=Variable2), colour="red", size=value…)
slide-56
SLIDE 56

Graphing in ggplot2

You can “layer” multiple graphs on top of one another using the “+”. The order in which you add the layers determines the final image (Section 6.2)

graphics.off()
ggplot(data=data4) +
  geom_point(aes(x=Variable1, y=Variable2), colour="red", size=value…) +
  geom_line(aes(x=Variable1, y=Variable2, group=1), colour="blue", size=value, linetype=1)

slide-57
SLIDE 57

Graphing in ggplot2

You can customize the x and y axes with the scale functions (Section 6.6). The type of scale you use depends on the type of data you are plotting:

  • scale_x_discrete()
  • scale_y_discrete()
  • scale_x_continuous()
  • scale_y_continuous()

You can scale the axes by transformations (square root, logarithmic, etc.)

graphics.off()
ggplot(data=data4) +
  geom_point(aes(x=Variable1, y=Variable2), colour="red", size=value…) +
  geom_line(aes(x=Variable1, y=Variable2, group=1), colour="blue", size=value, linetype=1) +
  scale_x_continuous(breaks=c(), labels=c("Group1",…), limits=range(minValue,maxValue)) +
  scale_y_continuous(breaks=c(), labels=c("Group1",…), limits=range(minValue,maxValue))

slide-58
SLIDE 58

Graphing in ggplot2

Add a title and/or subtitle to the plot (Section 6.7)

graphics.off()
ggplot(data=data4) +
  geom_point(aes(x=Variable1, y=Variable2), colour="red", size=value…) +
  geom_line(aes(x=Variable1, y=Variable2, group=1), colour="blue", size=value, linetype=1) +
  scale_x_continuous(breaks=c(), labels=c("Group1",…), limits=range(minValue,maxValue)) +
  scale_y_continuous(breaks=c(), labels=c("Group1",…), limits=range(minValue,maxValue)) +
  ggtitle(label="Plot Title", subtitle="Plot Year")

slide-59
SLIDE 59

Graphing in ggplot2

Add text to the plot (either data labels specified in global aes, or an individual text string). You can specify the font family and nudge labels to avoid overlaps with clustered points (Section 6.7)

graphics.off()
ggplot(data=data4) +
  geom_point(aes(x=Variable1, y=Variable2), colour="red", size=value…) +
  geom_line(aes(x=Variable1, y=Variable2, group=1), colour="blue", size=value, linetype=1) +
  scale_x_continuous(breaks=c(), labels=c("Group1",…), limits=range(minValue,maxValue)) +
  scale_y_continuous(breaks=c(), labels=c("Group1",…), limits=range(minValue,maxValue)) +
  ggtitle(label="Plot Title", subtitle="Plot Year") +
  geom_text(aes(x=value, y=value), label="Text", …)

slide-60
SLIDE 60

Graphing in ggplot2

Add shapes to highlight areas within your plot (Section 6.8)

graphics.off()
ggplot(data=data4) +
  geom_point(aes(x=Variable1, y=Variable2), colour="red", size=value…) +
  geom_line(aes(x=Variable1, y=Variable2, group=1), colour="blue", size=value, linetype=1) +
  scale_x_continuous(breaks=c(), labels=c("Group1",…), limits=range(minValue,maxValue)) +
  scale_y_continuous(breaks=c(), labels=c("Group1",…), limits=range(minValue,maxValue)) +
  ggtitle(label="Plot Title", subtitle="Plot Year") +
  geom_text(aes(x=value, y=value), label="Text", …) +
  geom_rect(xmin=value, xmax=value, …) +
  stat_ellipse()

slide-61
SLIDE 61

Graphing in ggplot2

Add vertical, horizontal or diagonal reference lines to your plot (Section 6.9)

graphics.off()
ggplot(data=data4) +
  geom_point(aes(x=Variable1, y=Variable2), colour="red", size=value…) +
  geom_line(aes(x=Variable1, y=Variable2, group=1), colour="blue", size=value, linetype=1) +
  scale_x_continuous(breaks=c(), labels=c("Group1",…), limits=range(minValue,maxValue)) +
  scale_y_continuous(breaks=c(), labels=c("Group1",…), limits=range(minValue,maxValue)) +
  ggtitle(label="Plot Title", subtitle="Plot Year") +
  geom_text(aes(x=value, y=value), label="Text", …) +
  geom_rect(xmin=value, xmax=value, …) +
  stat_ellipse() +
  geom_abline(slope=value, intercept=value, linetype=1, …)

slide-62
SLIDE 62

Graphing in ggplot2

Add a legend depending on what type of legend you need: colour, size of points, or shape of points (Section 6.10)

graphics.off()
ggplot(data=data4) +
  geom_point(aes(x=Variable1, y=Variable2), colour="red", size=value…) +
  geom_line(aes(x=Variable1, y=Variable2, group=1), colour="blue", size=value, linetype=1) +
  scale_x_continuous(breaks=c(), labels=c("Group1",…), limits=range(minValue,maxValue)) +
  scale_y_continuous(breaks=c(), labels=c("Group1",…), limits=range(minValue,maxValue)) +
  ggtitle(label="Plot Title", subtitle="Plot Year") +
  geom_text(aes(x=value, y=value), label="Text", …) +
  geom_rect(xmin=value, xmax=value, …) +
  stat_ellipse() +
  geom_abline(slope=value, intercept=value, linetype=1, …) +
  scale_colour_manual(values=col.spectrum, breaks=c(), ...) +
  scale_shape_manual(name="Legend Name", values=c(), ...)
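
The template on these slides uses placeholders (data4, Variable1, value, …), so here is a runnable sketch of the same layering idea on the built-in iris data. It assumes ggplot2 is installed; the variables, colours, breaks and titles are arbitrary choices for illustration, not the workbook's example.

```r
library(ggplot2)

graphics.off()                                   # reset the graphics window

p <- ggplot(data = iris) +
  # layer 1: points, with colour mapped to Species inside aes()
  geom_point(aes(x = Sepal.Length, y = Petal.Length, colour = Species), size = 2) +
  # layer 2: one line per species, drawn on top of the points
  geom_line(aes(x = Sepal.Length, y = Petal.Length, group = Species),
            colour = "grey50", linetype = 2) +
  # customize the x axis
  scale_x_continuous(breaks = 4:8, limits = c(4, 8)) +
  # title and subtitle
  ggtitle(label = "Iris measurements", subtitle = "Built-in iris data") +
  # diagonal reference line
  geom_abline(slope = 1, intercept = 0, linetype = 3) +
  # colour legend
  scale_colour_manual(name = "Species", values = c("red", "blue", "darkgreen"))

p                                                # draw the plot
```

Because colour is mapped inside `aes()` for the points, ggplot2 builds the legend automatically; `scale_colour_manual()` only changes its name and colours.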

slide-63
SLIDE 63

Summarizing and Smoothing Data (Section 6.3)

  • One very convenient feature of the ggplot2 package is its range of built-in functions to summarize data within the plot
  • This means that you do not always have to summarize your data prior to plotting.

Stat function, description, and how to use it in a geom function:

  • stat_bin(), stat_count(): Counts the number of observations in bins. Plots the count. Use: stat="bin"
  • stat_bin_2d(): Divides plane into rectangles, counts the number of cases in each rectangle, then (by default) maps the number of cases to the rectangle. Also known as a heat map. Use: stat="bin2d"
  • stat_smooth(): Creates a smooth line. Plots the smoothed line. Use: stat="smooth"
  • stat_sum(): Adds data values. Plots result. Use: stat="sum"
  • stat_identity(): No summary. Plots data as is. Use: stat="identity"

  • The following two lines produce the same plot:

ggplot(data6, aes(x=Obs,y=Species)) + geom_point(stat="sum")   # Summed data
ggplot(data6, aes(x=Obs,y=Species)) + stat_sum(geom="point")   # Summed data

slide-64
SLIDE 64

Customizing Plots (Sections 6.4-6.5)

  • Customize the Size (point size, line width), Shape (point), Line Type (line),

Transparency, and Colour within geom and aes() functions

slide-65
SLIDE 65

Facets (Section 6.11)

  • There are circumstances where you want to have a single plot with multiple panels to show corrected relationships between variables.
  • We can use the facet_grid() function to split up your data by one or more variables and plot the subsets of data together. The function works in rows and columns, where individual plots are placed in “cells”.

Panel options: Evenly scaled | Scaled based on data range
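
A sketch of the two panel options on the iris data, assuming ggplot2 is installed. Splitting by Species is an arbitrary choice; `scales = "free"` is the `facet_grid()` argument that lets each panel follow its own data range.

```r
library(ggplot2)

base.plot <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point()

# One panel per species, placed in columns; all panels share the same axes
even.panels <- base.plot + facet_grid(. ~ Species)

# Same layout, but each panel is scaled to its own data range
free.panels <- base.plot + facet_grid(. ~ Species, scales = "free")

even.panels
free.panels
```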

slide-66
SLIDE 66

Themes (Section 6.12)

  • Collections of attributes act like global controls to ensure all elements of your plot are consistent.

Allows you to control:

  • Font characteristics (family, size, colour, etc.)
  • Plot margins
  • Scale aesthetics (tick marks, labels, etc.)
  • Backgrounds
  • Grid lines
  • Axes titles (position, font, angle of text, etc.)
  • Legend aesthetics (title font, position, spacing between elements, etc.)
  • Others….
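
A minimal sketch of a custom theme touching several of the elements listed above, assuming ggplot2 is installed. The object name `my.theme` and every setting in it are illustrative choices, not the theme from Section 6.12 of the workbook.

```r
library(ggplot2)

my.theme <- theme(
  text             = element_text(family = "sans", size = 12, colour = "black"),  # fonts
  plot.margin      = margin(1, 1, 1, 1, unit = "cm"),                             # plot margins
  panel.background = element_rect(fill = "white"),                                # background
  panel.grid.major = element_line(colour = "grey85"),                             # grid lines
  axis.title.y     = element_text(angle = 90),                                    # axis title angle
  legend.position  = "bottom"                                                     # legend position
)

# The same theme object can be added to any plot to keep them consistent
themed.plot <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, colour = Species)) +
  geom_point() +
  my.theme

themed.plot
```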
slide-67
SLIDE 67

Create a customized graph!

Use the modification of the JasperMigration dataset (Section 6.2, data6) and the code in Section 6.12 to build your own theme and create the plots above.

slide-68
SLIDE 68

WORK PERIOD 3:30 – 5:00

Follow the Workbook Examples Any questions?

slide-69
SLIDE 69

Thank You for Attending the R Workshop

If you have any further R questions please feel free to contact me

Flow Cytometry Core Facility

LKG Consulting

Email: consulting.lkg@gmail.com Website: www.consultinglkg.com