
Introduction to R Graphics



  1. Introduction to R Graphics: Using R to create figures
     BaRC Hot Topics – October 2011
     George Bell, Ph.D.
     http://iona.wi.mit.edu/bio/education/R2011/

     Topics for today
     • Getting started with R
     • Drawing common types of plots (scatter, box, MA)
     • Comparing distributions (histograms, CDF plots)
     • Customizing plots (colors, points, lines, margins)
     • Combining plots on a page
     • Combining plots on top of each other
     • More specialized figures and details

     Why use R for graphics?
     • Creating custom publication-quality figures
     • Many figures take only a few commands
     • Almost complete control over every aspect of the figure
     • To automate figure-making (and make them more reproducible)
     • Real statisticians use it
     • It's free

     Why not use R for graphics?
     • Another application already works fine
     • It's hard to use at first
       – You have to know what commands to use
     • Getting the exact figure you want can take a series of commands
     • Final product is editable only in Illustrator

  2. Getting started
     • See previous session: Introduction to R: http://iona.wi.mit.edu/bio/education/R2011/
     • Hot Topics slides: http://iona.wi.mit.edu/bio/hot_topics/
     • R can be run on your computer or on tak.

     Start of an R session
     • On tak
     • On your own computer

     Getting help
     • Use the Help menu
       – Html help
     • Check out "Manuals"
       – http://www.r-project.org/
       – contributed documentation
     • Use R's help
         ?boxplot          [show info]
         ??boxplot         [search docs]
         example(boxplot)  [examples]
     • Search the web
       – "r-project boxplot"

     Reading files - intro
     • Take R to your preferred directory ()
     • Check where you are (e.g., get your working directory) and see what files are there
         > getwd()
         [1] "X:/bell/Hot_Topics/Intro_to_R"
         > dir()
         [1] "all_my_data.txt"

  3. Reading data files
     • Usually it's easiest to read data from a file
       – Organize in Excel with one-word column names
       – Save as tab-delimited text
     • Check that the file is there
         list.files()
     • Read file
         tumors = read.delim("tumors_wt_ko.txt", header=T)
     • Check that it's OK
         > tumors
           wt ko
         1  5  8
         2  6  9
         3  7 11

     Figure formats and sizes
     • By default, a figure window will pop up from most R sessions.
     • Instead, helpful figure names can be included in code
       – Pro: You won't need an extra step to save the figure
       – Con: You won't see what you're creating
     • To select name and size (in inches) of pdf file (which can be >1 page)
         pdf("tumor_boxplot.pdf", w=11, h=8.5)
         boxplot(tumors)   # can have >1 page
         dev.off()         # tell R that we're done
     • To create another format (with size in pixels)
         png("tumor_boxplot.png", w=1800, h=1200)
         boxplot(tumors)
         dev.off()
     • Save your commands (in a text file)!
     • Final PDF figures
       – can be converted with Acrobat
       – can be edited with Illustrator

     Introduction to scatterplots
     • Simplest use of the 'plot' command
     • Can draw any number of points
     • Example (comparison of expression values)
         genes = read.delim("Gene_exp_with_sd.txt")
         plot(genes$WT, genes$KO)

         Gene  WT  KO
         A      6   8
         B      5   5
         C      9  12
         D      4   5
         E      8   9
         F      6   8
     • Right-click to save figure
     • Note that A = F (both 6, 8), so these two points overlap.

     Boxplot conventions
         wt  ko
          5   8
          6   9
          7  11
     • Box: 25th percentile, median, 75th percentile (IQR = interquartile range)
     • Whiskers: extend <= 1.5 x IQR beyond the box
     • Any points beyond the whiskers are defined as "outliers".
     • Note that the above data has no "outliers". The red point in the figure was added by hand.
     • Other programs use different conventions!
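The file-reading and figure-saving steps on this page can be sketched end to end as follows. This is a minimal sketch, not the presenter's script: it assumes the three-row wt/ko table shown above, and it writes a stand-in copy of tumors_wt_ko.txt to a temporary directory so that the example runs anywhere. The variable names tumor_file and pdf_file are illustrative.

```r
# Stand-in for tumors_wt_ko.txt (assumption: same three rows as on the slide),
# written to a temporary directory so the example is self-contained
tumor_file <- file.path(tempdir(), "tumors_wt_ko.txt")
writeLines(c("wt\tko", "5\t8", "6\t9", "7\t11"), tumor_file)

# Read the tab-delimited file; the first row holds the column names
tumors <- read.delim(tumor_file, header = TRUE)
print(tumors)

# Send the boxplot to a PDF file (size in inches) instead of a figure window
pdf_file <- file.path(tempdir(), "tumor_boxplot.pdf")
pdf(pdf_file, width = 11, height = 8.5)
boxplot(tumors)
dev.off()  # close the device so the file is written completely
```

Swapping pdf() for png() with a width and height in pixels should give a bitmap instead, provided the R installation supports PNG output.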

  4. Comparing sets of numbers
     • Why are you making the figure?
     • What is it supposed to show?
     • How much detail is best?
     • Are the data points paired?
         plot(genes)
         stripchart(genes, vert=T)
         boxplot(genes)
     • Note the "jitter" (addition of noise) in the first 2 figures.

     Gene expression plots
     • Typical x-y scatterplot
         plot(genes.all)
         abline(0,1)
         # Add other lines
         # etc.
     • MA (ratio-intensity) plot
         M = genes.all[,2] - genes.all[,1]
         A = apply(genes.all, 1, mean)
         plot(A,M)
     • x-y scatterplot with contour
         library(MASS)
         kde2d()     # Get density
         image()     # Draw colors
         contour()   # Add contour
         points()    # Add points

     Comparing distributions
     • Why are you making the figure?
     • What is it supposed to show?
     • How much detail is best?

     Displaying distributions
     • Example dataset: log2 expression ratios
     • Methods:
       – Boxplot
       – Histogram
       – Density plot
       – Violin plot
       – CDF (cumulative distribution function) plot
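The three gene expression plots above can be sketched on simulated data. The values below are random numbers standing in for log2 expression levels (an assumption, not the genes.all data from the talk), and the output file name is illustrative. MASS is loaded for kde2d(); it ships with standard R installations.

```r
library(MASS)  # provides kde2d() for 2-D density estimation

# Simulated log2 expression values (illustrative only, not real data)
set.seed(1)
wt <- rnorm(500, mean = 8, sd = 2)
ko <- wt + rnorm(500, mean = 0, sd = 0.5)
genes.all <- cbind(WT = wt, KO = ko)

pdf(file.path(tempdir(), "expression_plots.pdf"), width = 11, height = 4)
par(mfrow = c(1, 3))  # three panels side by side

# 1. Typical x-y scatterplot with the line y = x
plot(genes.all, main = "x-y scatterplot")
abline(0, 1)

# 2. MA plot: M is the log ratio, A is the mean log intensity
M <- genes.all[, 2] - genes.all[, 1]
A <- rowMeans(genes.all)  # same result as apply(genes.all, 1, mean)
plot(A, M, main = "MA plot")
abline(h = 0)

# 3. x-y scatterplot with contour: density, colors, contour lines, then points
dens <- kde2d(genes.all[, 1], genes.all[, 2], n = 50)
image(dens, xlab = "WT", ylab = "KO", main = "Scatterplot with contour")
contour(dens, add = TRUE)
points(genes.all, pch = ".")

dev.off()
```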

  5. Comparing similar distributions
     • Example dataset:
       – MicroRNA is knocked down
       – Expression levels are assayed
       – Genes are divided into those without miRNA target site (black) vs. with target site (red)
     • Figures: Density plot, CDF plot

     Customizing plots
     • Just about anything about a plot can be modified, although it can be tricky to figure out how to do so.
       – Colors             ex: col="red"
       – Shapes of points   ex: pch=18
       – Shapes of lines    ex: lwd=3, lty=3
       – Axes (labels, scale, orientation, size)
       – Margins            see 'mai' in par()
       – Additional text    ex: text(2, 3, "This text")
       – See par() for a lot more options

     Point shapes by number
     • Ex: pch=21

     Customizing a plot
     • plot(x, y, type="p")
     • plot(x, y, type="p", pch=21, col="black", bg=rainbow(6), cex=x+1,
            ylim=c(0, max(c(y1,y2))), xlab="Time (d)", ylab="Tumor counts", las=1,
            cex.axis=1.5, cex.lab=1.5, main="Customized figure", cex.main=1.5)
     • Non-obvious options:
       – type="p"        # Draw points
       – pch=21          # Draw a 2-color circle
       – col="black"     # Outside color of points
       – bg=rainbow(6)   # Inside color of points
       – cex=x+1         # Size points using 'x'
       – las=1           # Print horizontal axis labels
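A density plot and a CDF plot for two similar distributions might look like the sketch below. The two groups are simulated: the sample sizes and the 0.3 shift in the target-site group are assumptions chosen only to make the curves distinguishable, and the variable names are illustrative. Colors follow the slide (black for genes without a target site, red for genes with one).

```r
# Simulated log2 expression ratios (assumption: not the miRNA dataset from the talk)
set.seed(2)
no.site   <- rnorm(1000, mean = 0,   sd = 0.5)  # genes without a miRNA target site
with.site <- rnorm(200,  mean = 0.3, sd = 0.5)  # genes with a target site

pdf(file.path(tempdir(), "compare_distributions.pdf"), width = 10, height = 5)
par(mfrow = c(1, 2))

# Density plot: draw the first group, then add the second on top
d.no   <- density(no.site)
d.with <- density(with.site)
plot(d.no, col = "black", lwd = 2, ylim = c(0, max(d.no$y, d.with$y)),
     main = "Density plot", xlab = "log2 expression ratio")
lines(d.with, col = "red", lwd = 2)
legend("topright", legend = c("No target site", "Target site"),
       col = c("black", "red"), lwd = 2)

# CDF plot: empirical cumulative distribution function of each group
plot(ecdf(no.site), col = "black", do.points = FALSE, verticals = TRUE,
     main = "CDF plot", xlab = "log2 expression ratio",
     ylab = "Cumulative fraction")
plot(ecdf(with.site), col = "red", do.points = FALSE, verticals = TRUE,
     add = TRUE)

dev.off()
```

A CDF plot tends to make a small shift between groups easier to see than overlaid densities do, because the curves separate horizontally over their whole range.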

  6. Combining plots on a page
     • Set up layout with command like
       – par(mfrow = c(num.rows, num.columns))
       – Ex: par(mfrow = c(1,2))

     Merging plots on same figure
     • Commands:
       – plot     # start figure
       – points   # add point(s)
       – lines    # add line(s)
       – legend
     • Note that order of commands determines order of layers

     More graphics details
     • Creating error bars
     • Drawing a best-fit (regression) line
     • Using transparent colors
     • Creating colored segments
     • Creating log-transformed axes
     • Labeling selected points

     Using error bars
         library(plotrix)
         plotCI(x, y, uiw=y.sd, liw=y.sd)                    # vertical error bars
         plotCI(x, y, uiw=x.sd, liw=x.sd, err="x", add=T)    # horizontal
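The layout, layering, and error-bar ideas on this page can be combined in one short sketch. All data values are made up for illustration. To keep the example runnable without installing plotrix, the error bars are drawn with base R's arrows() rather than plotCI(); that substitution is a choice made here, not the method shown on the slide.

```r
# Made-up data: two series measured at six time points, with standard deviations
x     <- 1:6
y1    <- c(2, 4, 5, 7, 9, 12)
y2    <- c(1, 2, 2, 3, 4, 5)
y1.sd <- c(0.5, 0.8, 0.6, 1.0, 1.2, 1.5)

pdf(file.path(tempdir(), "combined_plots.pdf"), width = 10, height = 5)
par(mfrow = c(1, 2))  # 1 row, 2 columns

# Panel 1: layers are drawn in command order (later commands sit on top)
plot(x, y1, type = "p", pch = 21, col = "black", bg = rainbow(6),
     ylim = c(0, max(c(y1, y2))), xlab = "Time (d)", ylab = "Tumor counts",
     las = 1, main = "Merged plots")
lines(x, y1)                           # add a line through the first series
points(x, y2, pch = 18, col = "blue")  # add the second series
lines(x, y2, col = "blue", lty = 3)
legend("topleft", legend = c("Series 1", "Series 2"), pch = c(21, 18),
       col = c("black", "blue"), pt.bg = c("red", NA), lty = c(1, 3))

# Panel 2: vertical error bars drawn as two-headed, flat-ended arrows
plot(x, y1, pch = 16, ylim = range(c(y1 - y1.sd, y1 + y1.sd)),
     xlab = "Time (d)", ylab = "Tumor counts", las = 1, main = "Error bars")
arrows(x, y1 - y1.sd, x, y1 + y1.sd, angle = 90, code = 3, length = 0.05)

dev.off()
```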

  7. Drawing a regression line
     • Use 'lm(response~terms)' for simple linear regression:
         # Calculate y-intercept
         lmfit = lm(y ~ x)
         # Set y-intercept to 0
         lmfit.0 = lm(y ~ x + 0)
     • Add line(s) with
         abline(lmfit)

     Transparent colors
     • Semitransparent colors can be indicated by an extended RGB code (#RRGGBBAA)
       – AA = opacity as two hex digits (0-9, A-F), from 00 (fully transparent) to FF (fully opaque)
       – Sample colors:
           Red    #FF000066
           Green  #00FF0066
           Blue   #0000FF66

     Colored bars
     • Colored bars can be used to label rows or columns of a matrix
       – Ex: cell types, GO terms
     • Limit each color code to 6-8 colors
     • Don't forget the legend!

     Handling log transformations
     • Data or axes can be transformed or scaled.
     • Which (if either) should be used?
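A regression line drawn over semitransparent points can be sketched as follows. The data are simulated (an assumption), and the output file name is illustrative. The sketch fits both models from the slide: one that estimates the y-intercept and one that forces it to 0.

```r
# Simulated data with a roughly linear relationship (illustrative only)
set.seed(3)
x <- runif(200, min = 1, max = 100)
y <- 2 * x + rnorm(200, sd = 20)

lmfit   <- lm(y ~ x)      # slope and y-intercept are both estimated
lmfit.0 <- lm(y ~ x + 0)  # y-intercept fixed at 0, only the slope is estimated

pdf(file.path(tempdir(), "regression_line.pdf"), width = 6, height = 6)

# Semitransparent blue (#RRGGBBAA) shows where points pile up
plot(x, y, pch = 16, col = "#0000FF66", main = "Best-fit lines")
abline(lmfit, col = "red", lwd = 2)      # line with estimated intercept
abline(lmfit.0, col = "black", lty = 2)  # line through the origin
legend("topleft", legend = c("lm(y ~ x)", "lm(y ~ x + 0)"),
       col = c("red", "black"), lwd = c(2, 1), lty = c(1, 2))

dev.off()

print(coef(lmfit))    # intercept and slope
print(coef(lmfit.0))  # slope only
```

When abline() is given a fit with a single coefficient, it treats that coefficient as the slope of a line through the origin, which is why the no-intercept model can be passed to it directly.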

  8. Labeling selected points
     1. Make figure
     2. Run "identify" command
        – identify(x, y, labels)
        – Ex: identify(genes, labels = rownames(genes))
     3. Click at or near points to label them
     4. Save image

                            WT cells  KO cells
         MUC5B::727897          31.7      41.7
         HAPLN4::404037         37.3      47.7
         SIGLEC16::400709       24.1      32.7

     More resources
     • R Graph Gallery:
       – http://addictedtor.free.fr/graphiques/
     • R scripts for Bioinformatics
       – http://iona.wi.mit.edu/bio/bioinfo/Rscripts/
     • List of R modules installed on tak
       – http://tak/trac/wiki/R
     • Our favorite book:
       – Introductory Statistics with R (Peter Dalgaard)
     • We're glad to share commands and/or scripts to get you started

     Upcoming Hot Topics
     • Introduction to Bioconductor - microarray and RNA-Seq analysis (Thursday)
     • Unix, Perl, and Perl modules (short course)
     • Quality control for high-throughput data
     • RNA-Seq analysis
     • Gene list enrichment analysis
     • Galaxy
     • Sequence alignment: pairwise and multiple
     • See http://iona.wi.mit.edu/bio/hot_topics/
     • Other ideas? Let us know.
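identify() needs an interactive figure window, so it cannot be used in a script that writes straight to a file. One non-interactive alternative, shown here as a sketch rather than as the method from the talk, is to pick the points by name and label them with text(). The data frame holds the three genes from the table above; the short column names WT and KO and the choice of which genes to label are assumptions made for illustration.

```r
# The three genes from the table above (column names shortened for convenience)
genes <- data.frame(
  WT = c(31.7, 37.3, 24.1),
  KO = c(41.7, 47.7, 32.7),
  row.names = c("MUC5B::727897", "HAPLN4::404037", "SIGLEC16::400709")
)

# Genes to label, chosen by name instead of by clicking (arbitrary choice)
selected <- c("MUC5B::727897", "SIGLEC16::400709")

pdf(file.path(tempdir(), "labeled_points.pdf"), width = 6, height = 6)

# Extra room on the x-axis so the labels are not cut off
plot(genes$WT, genes$KO, pch = 16, xlim = c(20, 50), ylim = c(30, 50),
     xlab = "WT cells", ylab = "KO cells")

# pos = 4 places each label to the right of its point
text(genes[selected, "WT"], genes[selected, "KO"], labels = selected,
     pos = 4, cex = 0.8)

dev.off()
```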
