
R: A Personalized Introduction - Debapriyo Majumdar, Data Mining



  1. R: A Personalized Introduction
     Debapriyo Majumdar
     Data Mining, Fall 2014
     Indian Statistical Institute Kolkata
     August 18, 2014

  2. About "R"
     - A suite of software tools for
         - Data manipulation
         - Calculations
         - Graphical display
     - Largely based on the programming language S
     - Packages
         - About 25 standard and recommended packages are supplied
         - Many more are available for download at: http://CRAN.R-project.org
     - Free (GPL). Also BSD, MIT

  3. Basics
     - Arithmetic
           > 2+2
           [1] 4
     - Assign variables
           > x <- 2
           > y <- 5
           > z <- 2 * x + 3 * y
           > z
           [1] 19
     - The created objects are now stored in the workspace. List them
           > ls()
           [1] "x" "y" "z"
     - Also, we can remove them
           > rm(x)
           > ls()
           [1] "y" "z"

  4. Vectors
     - Creating a vector
           > x <- c(2,5,9)
           > y <- c(3,1,-1)
           > x + y
           [1] 5 6 8
     - x * y performs an element-wise multiplication
           > x * y
           [1]  6  5 -9
     - x + 2 adds 2 to every element of x
           > x + 2
           [1]  4  7 11
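The `x + 2` case on this slide is an instance of R's recycling rule: the shorter operand is repeated until it matches the length of the longer one. A minimal sketch follows; the vectors `a` and `b` are illustrative and not from the slides.

```r
# Illustrative vectors (not from the slides)
a <- c(10, 20, 30, 40)
b <- c(1, 2)

# b is recycled to c(1, 2, 1, 2) before the addition
recycled_sum <- a + b
recycled_sum            # 11 22 31 42

# A single number is a length-1 vector, so it is recycled across all elements
a * 2                   # 20 40 60 80

# Element-wise product versus the inner (dot) product of the slide's vectors
x <- c(2, 5, 9)
y <- c(3, 1, -1)
x * y                   # 6 5 -9
dot_xy <- sum(x * y)
dot_xy                  # 2
```

If the longer length is not a multiple of the shorter one, R still recycles but issues a warning, which is usually a sign of a mistake.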

  5. Useful functions related to vectors
     - Sequence of integers from a to b
           > seq(2,9)
           [1] 2 3 4 5 6 7 8 9
     - The repeat function
           > rep(1,3)
           [1] 1 1 1
           > rep(1:3,3)
           [1] 1 2 3 1 2 3 1 2 3
     - Try the help or ? command
           > help(rep)
           > ?rep
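Both functions take further arguments that are worth knowing. The sketch below uses only base R; the variable names are illustrative.

```r
# Step size and fixed number of points
evens <- seq(2, 9, by = 2)
evens                            # 2 4 6 8
grid <- seq(0, 1, length.out = 5)
grid                             # 0.00 0.25 0.50 0.75 1.00

# The colon operator is shorthand for an integer sequence
2:9                              # same values as seq(2, 9)

# Repeat the whole vector versus repeat each element
whole <- rep(1:3, times = 2)
whole                            # 1 2 3 1 2 3
each_elem <- rep(1:3, each = 2)
each_elem                        # 1 1 2 2 3 3
```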

  6. Data and Statistics: Basics
     - A lot of things out of the box
           > x <- c(2,3,1,5,7,2,5,8,3,2,0,3,2,6,7,3,1,3,5,8,4)
           > summary(x)
              Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
              0.00    2.00    3.00    3.81    5.00    8.00
     - Specifying elements or subsets (index starts at 1, not 0)
           > x[1]
           [1] 2
           > x[3:6]
           [1] 1 5 7 2
     - Excluding elements by the minus sign
           > x[-(2:4)]
           [1] 2 7 2 5 8 3 2 0 3 2 6 7 3 1 3 5 8 4
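The numbers reported by `summary(x)` can also be obtained one at a time, and elements can be selected by a condition as well as by position. A short sketch using the same vector:

```r
x <- c(2,3,1,5,7,2,5,8,3,2,0,3,2,6,7,3,1,3,5,8,4)

length(x)                      # 21 observations
mean(x)                        # 80 / 21, about 3.81
median(x)                      # 3
quantile(x, c(0.25, 0.75))     # first and third quartiles

# Logical indexing: keep the elements that satisfy a condition
large <- x[x > 5]
large                          # 7 8 6 7 8

# which() returns the positions instead of the values
large_pos <- which(x > 5)
large_pos                      # 5 8 14 15 20
```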

  7. Matrices
     - Bind columns (cbind) or rows (rbind)
           > x <- c(3,5,2); y <- c(8,2,1)
           > z <- cbind(x,y)
           > z
                x y
           [1,] 3 8
           [2,] 5 2
           [3,] 2 1
     - Or specify the entries and number of rows
           > A <- matrix(c(3,5,2,8,2,1),nrow=3)
           > B <- matrix(c(3,5,2,8,2,1),nrow=2)
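`matrix()` fills its entries column by column unless told otherwise, which is why `A` and `B` on this slide hold the same numbers in different shapes. A small sketch; the matrix `C` and the `byrow = TRUE` variant are additions for illustration.

```r
v <- c(3, 5, 2, 8, 2, 1)

A <- matrix(v, nrow = 3)                 # 3 x 2, columns (3,5,2) and (8,2,1)
B <- matrix(v, nrow = 2)                 # 2 x 3, columns (3,5), (2,8), (2,1)
C <- matrix(v, nrow = 3, byrow = TRUE)   # 3 x 2, rows (3,5), (2,8), (2,1)

dim(A)      # 3 2
dim(B)      # 2 3

A[2, 1]     # row 2, column 1: 5
B[2, 3]     # row 2, column 3: 1
C[2, ]      # entire second row: 2 8
```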

  8. Matrix operations
     - Addition works as usual
           > A + 2*A
                [,1] [,2]
           [1,]    9   24
           [2,]   15    6
           [3,]    6    3
     - Multiplication: x * y is element-wise, not matrix multiplication
     - Matrix multiplication: %*%
           > A %*% B
                [,1] [,2] [,3]
           [1,]   49   70   14
           [2,]   25   26   12
           [3,]   11   12    5
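The difference between `*` and `%*%` is easiest to see from the dimensions of the results. The sketch below reuses `A` and `B` as defined on the Matrices slide; `t()` and `crossprod()` are base R functions that the slides do not mention.

```r
A <- matrix(c(3,5,2,8,2,1), nrow = 3)   # 3 x 2
B <- matrix(c(3,5,2,8,2,1), nrow = 2)   # 2 x 3

# Element-wise product: both operands must have the same shape
A * A            # 3 x 2, each entry squared

# Matrix products: inner dimensions must agree
AB <- A %*% B    # (3 x 2)(2 x 3) gives 3 x 3
BA <- B %*% A    # (2 x 3)(3 x 2) gives 2 x 2
BA
#      [,1] [,2]
# [1,]   23   30
# [2,]   57   57

# Transpose, and the common product t(A) %*% A
t(A)             # 2 x 3
AtA <- crossprod(A)
```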

  9. Inverse and Covariance of a matrix
     - Computes the inverse of a matrix if it exists:
           > solve(X)
     - Covariance matrix
           > var(X)
           > cov(X)
     - Covariance matrix (recall)
         - $X_1, \ldots, X_n$ are random variables, each with finite variance
         - $\Sigma$ is the covariance matrix, where
           $\Sigma_{ij} = \operatorname{cov}(X_i, X_j) = E[(X_i - \mu_i)(X_j - \mu_j)]$, with $\mu_i = E[X_i]$
     - Also called $\operatorname{var}(X)$, the variance of the random vector $X = (X_1, \ldots, X_n)$
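The slide leaves `X` undefined, so the sketch below makes up its own inputs: a small invertible matrix `M` and a simulated data matrix `X` are both assumptions for illustration. It checks that `solve()` really returns an inverse, and that `cov()` matches the sample version of the formula above (centered columns, divided by n - 1).

```r
# Hypothetical invertible matrix (not from the slides)
M <- matrix(c(2, 1, 1, 3), nrow = 2)
Minv <- solve(M)
M %*% Minv                      # identity matrix, up to rounding error

# Simulated data: 100 observations (rows) of 3 variables (columns)
set.seed(1)
X <- cbind(a = rnorm(100), b = rnorm(100), c = rnorm(100))

S <- cov(X)                     # 3 x 3 sample covariance matrix

# The same matrix from the definition: center each column, then
# average the products of deviations using the n - 1 denominator
Xc <- sweep(X, 2, colMeans(X))  # subtract the column means
S_manual <- t(Xc) %*% Xc / (nrow(X) - 1)

all.equal(S, S_manual)          # TRUE
all.equal(var(X), cov(X))       # TRUE: for a matrix, var() and cov() agree
```

`solve()` stops with an error when the matrix is singular, which is the "if it exists" caveat on the slide.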

  10. Writing a function
     - A new function can be defined
           > z <- function(x,y) 3*x + 4*y
           > z(2,3)
           [1] 18
     - A function with many lines
           > z <- function(x,y) {
               c <- 3*x + 4*y;
               5 * c
             }
     - The value of the last expression is the function's return value
     - Can write the function in a text file (for example prog.R) and source it
           > source("/Users/deb/…/R/xTest.R")
     - Can also define a new binary operator
           > "%LL%" <- function(x,y) { 3*x + 4*y }
           > 5 %LL% 3
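A slightly fuller sketch of the same ideas, with two additions that are not on the slide: a default argument value and an explicit `return()`. The intermediate variable is called `w` here only to avoid confusion with the base function `c()`.

```r
# Multi-line function with a default value for y (the default is an addition)
z <- function(x, y = 1) {
  w <- 3 * x + 4 * y
  return(5 * w)          # explicit return; a bare `5 * w` would behave the same
}

z(2, 3)                  # 5 * (6 + 12) = 90
z(2)                     # y defaults to 1: 5 * (6 + 4) = 50

# User-defined binary operators are written between percent signs
`%LL%` <- function(x, y) 3 * x + 4 * y
5 %LL% 3                 # 15 + 12 = 27
```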

  11. Data
     - Read an entire data frame
         - The first line of the file should have a name for each variable in the data frame
         - Each additional line of the file has as its first item a row label, followed by the values for each variable

                Age Income.K Owns.House
           01    25        8         No
           02    33        5         No
           03    30      130        Yes
           04    45       50        Yes
           05    65        5         No
           06    75        7        Yes

           > H <- read.table("filename")
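To try this without creating a file, `read.table()` also accepts the file contents as a string through its `text` argument. The sketch below feeds it the six rows shown on the slide. Because the first line has one field fewer than the others, `read.table()` treats it as a header and uses the first column as row labels.

```r
# The rows from the slide, held in a string instead of a file
txt <- "Age Income.K Owns.House
01 25 8 No
02 33 5 No
03 30 130 Yes
04 45 50 Yes
05 65 5 No
06 75 7 Yes"

H <- read.table(text = txt)

names(H)      # "Age" "Income.K" "Owns.House"
nrow(H)       # 6
str(H)        # column types as inferred by read.table
H$Age         # 25 33 30 45 65 75
```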

  12. Using data
     - plot tries to figure out what kind of plot will be suitable
           > plot(H[1:2])
     - We want to label points based on some attribute
         - Let us select a subset of the data
           > H[which(H$Owns.House=='Yes'),]
                Age Income.K Owns.House
           03    30      130        Yes
           04    45       50        Yes
           06    75        7        Yes
           07    28      200        Yes
           08    35       90        Yes
           10    55      102        Yes
           …      …        …          …
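The same row selection can be written in a few equivalent ways. The sketch below rebuilds only the six rows shown on the Data slide as a data frame, so its output is shorter than the listing above; `subset()` and the combined condition are additions for illustration.

```r
# The six rows from the Data slide, built directly as a data frame
H <- data.frame(
  Age        = c(25, 33, 30, 45, 65, 75),
  Income.K   = c(8, 5, 130, 50, 5, 7),
  Owns.House = c("No", "No", "Yes", "Yes", "No", "Yes"),
  row.names  = c("01", "02", "03", "04", "05", "06")
)

owners1 <- H[which(H$Owns.House == "Yes"), ]   # as on the slide
owners2 <- H[H$Owns.House == "Yes", ]          # logical index, no which()
owners3 <- subset(H, Owns.House == "Yes")      # subset() evaluates names inside H

rownames(owners1)                              # "03" "04" "06"

# Combine conditions with & and pick columns by name
rich_owners <- H[H$Owns.House == "Yes" & H$Income.K > 40, c("Age", "Income.K")]
rich_owners                                    # rows 03 and 04
```

The `which()` form differs from plain logical indexing only when the condition contains missing values: `which()` drops them, while a logical index with `NA` produces rows of `NA`.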

  13. Using data
     - Plot one subset with blue, another with red
           > HYes <- H[which(H$Owns.House=='Yes'),]
           > HNo <- H[which(H$Owns.House=='No'),]
           > plot(HYes[1:2], col='blue')
           > points(HNo[1:2], col='red')
     - [Figure: scatter plot of Income.K (0 to 200) against Age (30 to 80), with a new observation marked in black]
     - Hands on in class
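A self-contained version of this plot is sketched below. It makes several assumptions: only the six rows from the Data slide are used, the axis ranges are taken from the full data so that the red points cannot fall outside the plotting region, and the "new observation" is an invented point, since the slides do not give its values.

```r
# The six rows from the Data slide
H <- data.frame(
  Age        = c(25, 33, 30, 45, 65, 75),
  Income.K   = c(8, 5, 130, 50, 5, 7),
  Owns.House = c("No", "No", "Yes", "Yes", "No", "Yes")
)

HYes <- H[H$Owns.House == "Yes", ]
HNo  <- H[H$Owns.House == "No", ]

# plot() sets up the axes; xlim/ylim come from all rows, not just HYes
plot(HYes$Age, HYes$Income.K, col = "blue",
     xlim = range(H$Age), ylim = range(H$Income.K),
     xlab = "Age", ylab = "Income.K")

# points() adds to the existing plot instead of starting a new one
points(HNo$Age, HNo$Income.K, col = "red")

# Hypothetical new observation (values invented for illustration)
new_obs <- data.frame(Age = 40, Income.K = 60)
points(new_obs$Age, new_obs$Income.K, col = "black", pch = 19)

legend("topright",
       legend = c("Owns house", "Does not own", "New observation"),
       col = c("blue", "red", "black"), pch = c(1, 1, 19))
```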

  14. References
     - The R manual: http://cran.r-project.org/doc/manuals/r-release/R-intro.html
     - A self-learn tutorial: https://www.nceas.ucsb.edu/files/scicomp/Dloads/RProgramming/BestFirstRTutorial.pdf
