Session 1: R basics and Rmd

March 4th, 2020
DU Bioinformatique intégrative, Module 3: "R and statistics"
Session 1: R basics and Rmd
Teachers: Claire Vandiedonck, Antoine Bridier-Nahmias
Helpers: Jacques van Helden, Anne Badel
The script "DUBii_R_Session1.R", which contains all the code presented in this slide deck, is available on GitHub.
04/03/2020 Université de Paris - DU Bii - R Session 1 - Vandiedonck C. 1 / 79

Module outline and instructors
Coordinators: Claire Vandiedonck and Jacques van Helden
Other instructors: Guillaume Achaz, Anne Badel, Magali Berland, Antoine Bridier-Nahmias, Olivier Sand, Natacha Cerisier
Website: https://du-bii.github.io/module-3-Stat-R/

Day        Time            Description (instructors)
4 March    9:30 - 12:30    R basics and Rmd (Claire Vandiedonck, Antoine Bridier-Nahmias)
5 March    13:30 - 16:30   Descriptive statistics, hypothesis tests, figures and packages (Claire Vandiedonck, Guillaume Achaz)
10 March   14:30 - 17:30   Statistics for high-throughput data (Jacques van Helden, Claire Vandiedonck)
12 March   9:00 - 12:00    Unsupervised classification (Anne Badel, Jacques van Helden)
30 March   10:00 - 13:00   Exploratory analyses (PCA/MDS) and enrichment analyses (Magali Berland, Jacques van Helden)
30 March   14:30 - 17:30   Supervised classification and learning (Jacques van Helden, Olivier Sand)

Session outline
1. Start-R: connecting to the IFB RStudio server
2. Checking and consolidating the prerequisites
3. Dataframes, factors, lists
4. Programming: conditional execution, loops, functions
5. R Markdown

Poll: www.wooclap.com EGIDTQ

1. Start-R: first steps with R and RStudio

Connecting to the IFB RStudio server: https://rstudio.cluster.france-bioinformatique.fr/

Tutorial start-R.html
For the next 10 minutes: start-R activity with the RStudio server of the IFB cluster, following the instructions in the start-R.html file.
At the end of this activity, you must have uploaded to a dedicated folder:
- the "anthropo.Rdata" file generated during the prerequisites activity
- the script of the slides of this R session 1

2. Prerequisites acquired?

Let's check with a quiz!
Quiz on Moodle:
- If you have an ENT account: https://moodlesupd.script.univ-paris-diderot.fr/course/view.php?id=10629
- If you do not yet have an ENT account: https://moodlesupd.script.univ-paris-diderot.fr/course/view.php?id=13420
  password: dubii2020

Summary on vectors
Format: one dimension
Datatype: homogeneous, only one type among character, numeric, logical, factor… -> coercion if heterogeneous
  - check with class() or mode()
  - check the type with is.numeric(), is.character(), …
  - convert with as.numeric(), as.character(), …
Creation: c(), :, seq(), rep(), sample(), rnorm(), …
Adding new items: c()
Size: length()
Slicing: my_vector[i]
Filling: my_vector[i] <- "toto"
Naming: names()
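The rules in the vector summary can be checked directly at the console. This is a minimal sketch: the object names (mixed, weights, ids) are illustrative and do not come from the slides.

```r
# Mixing types in c() triggers coercion to the most general type
mixed <- c(1, "a", TRUE)
class(mixed)                 # "character"

# Creation, naming, size, slicing and filling
weights <- c(60, 72, 57)
names(weights) <- c("Fabien", "Pierre", "Sandrine")
length(weights)              # 3
weights[2]                   # the element named "Pierre"
weights[3] <- 58             # fill by index

# Type checks and explicit conversion
is.numeric(weights)          # TRUE
ids <- as.character(weights) # character version of the values, names dropped
is.character(ids)            # TRUE
```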

Summary on matrices
Format: two dimensions
Datatype: class() to check that it is a matrix
  homogeneous: only one type among character, numeric, logical, factor -> coercion if heterogeneous -> check with mode()
Creation: matrix(), cbind(), rbind()
Adding new items: cbind(), rbind()
Size: length() -> number of items
Dim: dim(), str()
Slicing: my_matrix[i,j]
Filling: my_matrix[i,j] <- "toto"
Naming: colnames(), rownames()
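The same checks apply to matrices. The sketch below uses a small hypothetical matrix m (not from the slides) to show size, slicing, naming, and the coercion that happens when a character column is bound to a numeric matrix.

```r
# Build a 2 x 3 matrix; values are filled column by column
m <- matrix(1:6, nrow = 2, ncol = 3)
rownames(m) <- c("r1", "r2")
colnames(m) <- c("a", "b", "c")

dim(m)        # 2 3
length(m)     # 6 items in total
m[2, 3]       # 6: row 2, column 3

# Adding a character column with cbind() coerces the whole matrix
m2 <- cbind(m, d = c("x", "y"))
mode(m2)      # "character"
```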

3. Dataframes

Dataframe
Dataframe = two-dimensional object that can be heterogeneous.
Create a dataframe with the function data.frame():
data.frame(..., row.names = NULL, check.rows = FALSE, check.names = TRUE, fix.empty.names = TRUE, stringsAsFactors = default.stringsAsFactors())
Note: since R 4.0.0, the default is stringsAsFactors = FALSE.
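The effect of the stringsAsFactors argument is easiest to see by setting it explicitly, which gives the same result whatever the R version. A minimal sketch with hypothetical objects (names_vec, df_chr, df_fct):

```r
names_vec <- c("Fabien", "Pierre", "Sandrine")

# Character columns are kept as character
df_chr <- data.frame(name = names_vec, weight = c(60, 72, 57),
                     stringsAsFactors = FALSE)
class(df_chr$name)   # "character"

# Character columns are converted to factors
df_fct <- data.frame(name = names_vec, weight = c(60, 72, 57),
                     stringsAsFactors = TRUE)
class(df_fct$name)   # "factor"
levels(df_fct$name)  # the distinct values, sorted
```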

Dataframe created with existing vectors
Create a dataframe with the function data.frame()
> myDataf <- data.frame(weight, size, bmi)
> myDataf # it looks pretty much like the matrix myData2
         weight size      bmi
Fabien       60 1.75 19.59184
Pierre       72 1.80 22.22222
Sandrine     57 1.65 20.93664
Claire       90 1.90 24.93075
Bruno        95 1.74 31.37799
Delphine     72 1.91 19.73630
> class(myDataf) # but this is indeed a dataframe and not a matrix
[1] "data.frame"
> str(myDataf) # this one is a homogeneous dataframe with numeric vectors
'data.frame': 6 obs. of 3 variables:
 $ weight: num 60 72 57 90 95 72
 $ size : num 1.75 1.8 1.65 1.9 1.74 1.91
 $ bmi : num 19.6 22.2 20.9 24.9 31.4 ...
> dim(myDataf)
[1] 6 3
Important: if the vectors are character strings, use stringsAsFactors = FALSE to avoid their conversion into factors.
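The vectors weight, size and bmi come from the prerequisites activity and are not defined in these slides. To rerun the example on its own, they can be rebuilt from the printed values. This is a sketch under two assumptions: the helper vector people is hypothetical, and bmi is computed as weight / size^2, which matches the values displayed above.

```r
# Values read off the printed dataframe
people <- c("Fabien", "Pierre", "Sandrine", "Claire", "Bruno", "Delphine")
weight <- c(60, 72, 57, 90, 95, 72)
size   <- c(1.75, 1.80, 1.65, 1.90, 1.74, 1.91)

# Assumption: body mass index = weight (kg) / size (m) squared
bmi <- weight / size^2

# Row names are given explicitly through the row.names argument
myDataf <- data.frame(weight, size, bmi, row.names = people)
dim(myDataf)      # 6 3
class(myDataf)    # "data.frame"
```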

A dataframe can be heterogeneous
Create a new vector with characters and include it in the dataframe
> gender <- c("Man","Man","Woman","Woman","Man","Woman")
> gender
[1] "Man" "Man" "Woman" "Woman" "Man" "Woman"
> myDataf$sex <- gender # or use cbind
# IMPORTANT: note that I directly specify the name by using "$"
# AND this method does not transform the vector into a factor!
> myDataf
         weight size      bmi   sex
Fabien       60 1.75 19.59184   Man
Pierre       72 1.80 22.22222   Man
Sandrine     57 1.65 20.93664 Woman
Claire       90 1.90 24.93075 Woman
Bruno        95 1.74 31.37799   Man
Delphine     72 1.91 19.73630 Woman
> str(myDataf) # this data.frame is heterogeneous with numeric and character values
'data.frame': 6 obs. of 4 variables:
 $ weight: num 60 72 57 90 95 72
 $ size : num 1.75 1.8 1.65 1.9 1.74 1.91
 $ bmi : num 19.6 22.2 20.9 24.9 31.4 ...
 $ sex : chr "Man" "Man" "Woman" "Woman" ...
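A column added with "$" stays a character vector. If a factor is wanted, it can be converted explicitly afterwards. A minimal sketch on a small stand-in dataframe; the names df and sex_f are illustrative, not from the slides.

```r
gender <- c("Man", "Man", "Woman", "Woman", "Man", "Woman")
df <- data.frame(weight = c(60, 72, 57, 90, 95, 72))

# Adding the column with "$" keeps it as character
df$sex <- gender
class(df$sex)          # "character"

# Explicit conversion into a factor, stored in a new column
df$sex_f <- factor(df$sex)
levels(df$sex_f)       # "Man" "Woman"
table(df$sex_f)        # counts per level
```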

Creating an empty dataframe
Creating an empty dataframe?
> d <- data.frame()
> d
data frame with 0 columns and 0 rows
> dim(d)
[1] 0 0
BUT USELESS: impossible to fill!

Better way: converting a matrix into a dataframe with the function as.data.frame()
> d <- as.data.frame(matrix(NA,2,3))
> d
  V1 V2 V3
1 NA NA NA
2 NA NA NA
# by default, col names are V1, V2, etc…
# whereas with the function data.frame() instead of as.data.frame(),
# col names are called X1, X2, etc…
> dim(d)
[1] 2 3
> str(d)
'data.frame': 2 obs. of 3 variables:
 $ V1: logi NA NA
 $ V2: logi NA NA
 $ V3: logi NA NA

> class(myData2)
[1] "matrix"
> class(as.data.frame(myData2))
[1] "data.frame"

You may also use as.data.frame() on a matrix generated by binding rows or columns
> d2 <- as.data.frame(cbind(1:2, 10:11))
> str(d2)
'data.frame': 2 obs. of 2 variables:
 $ V1: int 1 2
 $ V2: int 10 11
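Once a dataframe has been preallocated with NA values, its cells and columns can be filled in place. The sketch below shows one way to do it; the column names id, score and label are hypothetical.

```r
# Preallocate a 2 x 3 dataframe of NA values
d <- as.data.frame(matrix(NA, nrow = 2, ncol = 3))
names(d) <- c("id", "score", "label")   # hypothetical column names

# Fill a whole column, a single cell, then another column
d$id <- 1:2
d[1, "score"] <- 12.5      # the logical NA column becomes numeric
d$label <- c("a", "b")

str(d)                     # one integer, one numeric, one character column
```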

Row/column names of dataframes
- Either use the same functions as for matrices: rownames() and colnames()
- Or, better, use the ones dedicated to dataframes: row.names() and names()
> row.names(d)
[1] "1" "2"
> names(d)
[1] "V1" "V2" "V3"
Important: each row name must be unique!
Note: a data.frame is a special case of a list, whose variables all have the same number of rows, with unique row names.
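Both points, the list nature of a dataframe and the uniqueness of row names, can be verified at the console. A minimal sketch; the names s1, s2, x, y, z and the variable dup_error are illustrative.

```r
d <- as.data.frame(matrix(NA, 2, 3))
row.names(d) <- c("s1", "s2")
names(d) <- c("x", "y", "z")

# A dataframe is a list of columns
is.list(d)    # TRUE
length(d)     # 3: the number of columns, not of cells

# Duplicated row names are rejected with an error, captured here as text
dup_error <- tryCatch(
  suppressWarnings({ row.names(d) <- c("s1", "s1"); NULL }),
  error = function(e) conditionMessage(e)
)
row.names(d)  # unchanged: "s1" "s2"
```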

Extracting vectors from dataframes
Getting the vector corresponding to a column of a dataframe:
- either by specifying its index
> myDataf[,2]
[1] 1.75 1.80 1.65 1.90 1.74 1.91
- or by giving its name within " " inside the square brackets
> myDataf[,"size"]
[1] 1.75 1.80 1.65 1.90 1.74 1.91
- or by giving its name after the character "$"
> myDataf$size
[1] 1.75 1.80 1.65 1.90 1.74 1.91
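The three forms of extraction return the same vector, which can be confirmed with identical(). The sketch below uses a small stand-in dataframe df and also shows two related forms that are not on the slide: double brackets, and drop = FALSE to keep a one-column dataframe instead of a vector.

```r
df <- data.frame(weight = c(60, 72, 57), size = c(1.75, 1.80, 1.65))

v1 <- df[, 2]          # by index
v2 <- df[, "size"]     # by name inside square brackets
v3 <- df$size          # by name after "$"
v4 <- df[["size"]]     # double brackets, list-style extraction

identical(v1, v2) && identical(v2, v3) && identical(v3, v4)  # TRUE

# Keep the dataframe structure when selecting a single column
sub <- df[, "size", drop = FALSE]
class(sub)             # "data.frame"
```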
