slide-1
SLIDE 1

A suite of R packages for the analysis of DNA copy number microarray experiments

Application in cancerology

Philippe Hupé (1, 2)

(1) UMR144 Institut Curie, CNRS
(2) U900 Institut Curie, INSERM, Mines Paris Tech

The R User Conference 2009 Rennes

Hupé et al. (Institut Curie, Paris, France) Analysis of DNA copy number experiments INSERM workshop, 2009 1 / 19

slide-2
SLIDE 2

Outline

1. Biological / clinical context
2. R packages description
3. End-user interfaces / automatic workflow


slide-4
SLIDE 4

Biological / clinical context

DNA copy number alteration in tumour

[Figure: tumour cell and normal cell]

Chaos in cancer cells

gain, loss or amplification of chromosomes or pieces of chromosomes.

Molecular profiling of tumours

  • Identification of DNA copy number alterations in each patient
  • Is the pattern of alterations related to patient outcome (e.g. relapse, metastasis)?


slide-6
SLIDE 6

Biological / clinical context

High-throughput quantification of DNA copy number

Microarray technology

  • DNA copy number for 5 × 10^3 up to 2 × 10^6 genomic loci
  • Probes spotted on a glass array (i.e. the microarray)

[Figures: a microarray; "Colour Study: Squares with Concentric Circles", Wassily Kandinsky, 1913]

[Figures: DNA copy number profile of the tumour; karyotype of the tumour. Annotations: N-MYC amplification, 1q gain, 1p loss, 17q gain, unbalanced translocation 1p - 17q]


Huge amount of data (∼ 2 × 10^6 variables for each patient)

Need for biostatistical algorithms and automatic bioinformatic pipelines


slide-9
SLIDE 9

Biological / clinical context

Biostatistical workflow

[Workflow diagram, steps numbered ➊ to ➒. Labels, in extraction order:]

  • Biological/clinical question
  • Experimental design
  • High-throughput experiments (DNA copy number, mRNA expression)
  • Extraction of the biological information
  • Clinical biostatistics
  • Classification
  • Systems biology
  • Biological/clinical validation and interpretation
  • Normalisation
  • Quality control
  • Image analysis

R packages available from www.bioconductor.org

  • MANOR: spatial normalisation
  • GLAD: extraction of the biological information
  • ITALICS: normalisation + extraction of the biological information


slide-11
SLIDE 11

R packages description

MANOR: an algorithm to detect spatial bias

Neuvial et al., BMC Bioinformatics, 2006

1. Abnormal Log-Ratio in the corner
2. Spatial trend estimation by 2D-LOESS
3. Spatial segmentation
4. Biased areas are removed
5. These spots are outliers in the genomic profile
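The idea behind the steps above can be illustrated with a small sketch. It is not the MANOR implementation: a moving median is swapped in for the 2D-LOESS trend, and a fixed threshold replaces the spatial segmentation step. The function names, the window radius and the threshold are all assumptions chosen for illustration.

```python
from statistics import median


def spatial_trend(grid, radius=1):
    """Estimate a smooth spatial trend with a moving median over a square window.

    Stand-in for the 2D-LOESS fit named on the slide (assumption).
    """
    n_rows, n_cols = len(grid), len(grid[0])
    trend = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            # Collect the neighbours inside the window, clipped at the array edges.
            window = [
                grid[i][j]
                for i in range(max(0, r - radius), min(n_rows, r + radius + 1))
                for j in range(max(0, c - radius), min(n_cols, c + radius + 1))
            ]
            row.append(median(window))
        trend.append(row)
    return trend


def flag_biased_spots(grid, radius=1, threshold=0.5):
    """Return the (row, col) spots whose local trend departs from the array-wide median.

    A fixed threshold replaces MANOR's spatial segmentation (assumption).
    """
    trend = spatial_trend(grid, radius)
    centre = median(v for row in grid for v in row)
    return {
        (r, c)
        for r, row in enumerate(trend)
        for c, value in enumerate(row)
        if abs(value - centre) > threshold
    }


# Toy 6 x 6 array of log-ratios: flat at 0, except a 3 x 3 corner shifted by +1.
array = [[0.0] * 6 for _ in range(6)]
for r in range(3):
    for c in range(3):
        array[r][c] = 1.0

biased = flag_biased_spots(array)
print(sorted(biased))
```

On this toy array the flagged spots all lie inside the shifted corner, which mirrors the "abnormal Log-Ratio in the corner" case; flagged spots would then be excluded before the genomic profile is analysed.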


slide-16
SLIDE 16

R packages description

GLAD: Gain and Loss Analysis of DNA

Hupé et al., Bioinformatics, 2004

Profile segmentation

The GLAD algorithm aims at identifying chromosomal regions with identical DNA copy number.

1. Log-Ratio profile
2. Smoothing line estimation
3. Breakpoint detection
4. Status assignment
5. Outlier detection

It works with BAC arrays, cDNA arrays and oligonucleotide arrays (Affymetrix, Agilent, Nimblegen, Illumina).
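To make breakpoint detection and status assignment concrete, here is a minimal sketch on a toy log-ratio profile. It does not reproduce the GLAD algorithm: a plain recursive binary segmentation is swapped in for the smoothing step, and outlier detection is left out. The function names, `min_gap` and `threshold` are hypothetical values picked for the example.

```python
from statistics import mean


def segment(values, start=0, min_gap=0.2, min_size=2):
    """Recursively split a log-ratio profile; return sorted breakpoint positions.

    The split point maximises the between-segment sum of squares, and a split is
    kept only if the two segment means differ by at least min_gap.
    """
    n = len(values)
    best_i, best_score, best_gap = None, 0.0, 0.0
    for i in range(min_size, n - min_size + 1):
        gap = abs(mean(values[:i]) - mean(values[i:]))
        score = gap * gap * i * (n - i) / n
        if score > best_score:
            best_i, best_score, best_gap = i, score, gap
    if best_i is None or best_gap < min_gap:
        return []
    left = segment(values[:best_i], start, min_gap, min_size)
    right = segment(values[best_i:], start + best_i, min_gap, min_size)
    return left + [start + best_i] + right


def assign_status(values, breakpoints, threshold=0.25):
    """Label each region between breakpoints as gain, loss or normal from its mean."""
    bounds = [0] + breakpoints + [len(values)]
    regions = []
    for lo, hi in zip(bounds, bounds[1:]):
        level = mean(values[lo:hi])
        if level > threshold:
            status = "gain"
        elif level < -threshold:
            status = "loss"
        else:
            status = "normal"
        regions.append((lo, hi, status))
    return regions


# Toy profile: four normal loci, four gained loci, four lost loci.
profile = [0.0, 0.1, -0.1, 0.0, 0.6, 0.5, 0.7, 0.6, -0.5, -0.6, -0.4, -0.5]
breakpoints = segment(profile)
regions = assign_status(profile, breakpoints)
print(breakpoints)  # [4, 8]
print(regions)      # [(0, 4, 'normal'), (4, 8, 'gain'), (8, 12, 'loss')]
```

Each region is reported as a half-open index range, so consecutive regions share their boundary at the breakpoint.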


slide-22
SLIDE 22

R packages description

ITALICS: ITerative and Alternative normaLIzation of Copy number Snp array

Rigaill et al., Bioinformatics, 2008

Devoted to the analysis of Affymetrix Genome-Wide SNP chips

  • the specificities of the Affymetrix technology are taken into account in the algorithm
  • the signal-to-noise ratio is better
  • the breakpoint location is more accurate

Spatial artifact: 1600 aberrant values

After ITALICS: 90 aberrant values, 13 SNPs discarded


slide-24
SLIDE 24

End-user interfaces / automatic workflow

Biologist / Clinician end-users

  • need to visualise their data
  • need a biological interpretation of their data
  • are not necessarily familiar with the R programming language
  • have no biostatistician/bioinformatician in their lab
  • need easy-to-use interfaces

Diffusion of statistical methods within the scientific community

If we want our statistical methods to be used, we need to package them properly.


slide-26
SLIDE 26

End-user interfaces / automatic workflow

VAMP: a software to visualise and analyse data

La Rosa et al., Bioinformatics, 2006

http://bioinfo.curie.fr/vamp


slide-27
SLIDE 27

End-user interfaces / automatic workflow

Our tools for DNA copy number experiments

  • R packages (MANOR, GLAD, ITALICS) for biostatistical analysis
  • VAMP Java software for visualisation (and analysis)

Need for an integrated environment

CAPweb is a web interface which allows the use of all the previous tools.


slide-29
SLIDE 29

End-user interfaces / automatic workflow

CAPweb: an end-user web platform

Liva et al., Nucleic Acids Research, 2006

http://bioinfo.curie.fr/capweb

  • user registration
  • project management
  • input file: Genepix, spot, Imagen, Feature Extraction, CEL
  • normalisation: MANOR, ITALICS
  • breakpoint detection: GLAD
  • summary report
  • visualise the data with VAMP
  • array technology: BAC, cDNA, Agilent, Affymetrix (100K, 500K) (Illumina, Nimblegen soon)
  • integration of clinical data
  • integration of mRNA data


slide-37
SLIDE 37

End-user interfaces / automatic workflow

Client / Server Architecture

[Architecture diagram; components as labelled:]

  • DNA copy number microarrays
  • Databases (mySQL)
  • Web server
  • Client(s): Firefox, ...
  • VAMP applet
  • R packages: MANOR, GLAD, ITALICS
  • Data management, XML files
  • Interfaces
  • Project management
  • Web resources: UCSC, GeneCards, Ensembl, NCBI

Our R packages are called through CGI from any web browser.
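As a rough illustration of this client/server pattern, a server-side handler can turn the submitted form fields into an `Rscript` command line. This is only a sketch under stated assumptions, not how CAPweb is implemented: the script name `run_analysis.R`, the form field names and the helper `build_r_command` are all hypothetical.

```python
# Packages the handler is willing to run; anything else is rejected.
ALLOWED_PACKAGES = {"MANOR", "GLAD", "ITALICS"}


def build_r_command(form, script="run_analysis.R"):
    """Build the argument list for an Rscript call from submitted form fields.

    `form` is a plain dict with the hypothetical keys "package", "input" and
    "output". Returning a list (not a shell string) avoids shell injection when
    the command is later passed to subprocess.run.
    """
    package = form.get("package", "")
    if package not in ALLOWED_PACKAGES:
        raise ValueError(f"unsupported package: {package!r}")
    return ["Rscript", script, package, form["input"], form["output"]]


command = build_r_command(
    {"package": "GLAD", "input": "profile.txt", "output": "segments.txt"}
)
print(command)
```

The server would then execute the list with `subprocess.run(command, check=True)`, and the R script would read its arguments with `commandArgs(trailingOnly = TRUE)`.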


slide-38
SLIDE 38

End-user interfaces / automatic workflow

Recent evolutions and perspectives

Recent changes

  • Possibility to use the HaarSeg algorithm (Ben-Yaacov and Eldar, Bioinformatics, 2008) in GLAD → 2 million genomic profiles can be analysed within 1 minute
  • Use of C/C++ in order to reduce computing time
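The speed of wavelet-based segmentation comes from the fact that each coefficient only needs two window sums. The sketch below shows the core idea at a single scale with a fixed threshold. It is a simplification rather than the published HaarSeg procedure, which combines several scales and chooses its thresholds statistically; the names `haar_detail` and `haar_breakpoints` and the parameter values are assumptions.

```python
def haar_detail(signal, half_width):
    """Difference between the means of the right and left windows at each position.

    This is proportional to an undecimated Haar wavelet coefficient at one scale.
    Positions too close to the ends keep a coefficient of 0.
    """
    coeffs = [0.0] * len(signal)
    for k in range(half_width, len(signal) - half_width + 1):
        left = signal[k - half_width:k]
        right = signal[k:k + half_width]
        coeffs[k] = (sum(right) - sum(left)) / half_width
    return coeffs


def haar_breakpoints(signal, half_width=2, threshold=0.3):
    """Return positions where the coefficient magnitude is a local maximum above threshold."""
    magnitude = [abs(c) for c in haar_detail(signal, half_width)]
    return [
        k
        for k in range(1, len(signal) - 1)
        if magnitude[k] > threshold
        and magnitude[k] >= magnitude[k - 1]
        and magnitude[k] > magnitude[k + 1]
    ]


# Toy profile: six normal loci, six gained loci, six normal loci.
profile = [0.0] * 6 + [1.0] * 6 + [0.0] * 6
print(haar_breakpoints(profile))  # [6, 12]
```

A production version would replace the repeated `sum` calls with running sums so that the whole profile is processed in linear time.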

On-going work

  • Improvement of ITALICS in order to analyse Affymetrix Genome-Wide SNP 6.0 arrays
  • Extension to Next-Generation Sequencing technologies (terabytes of data!)
  • The aroma.affymetrix R package (Bengtsson et al.) offers many functionalities for Affymetrix data analysis


slide-41
SLIDE 41

Acknowledgements

Institut Curie / Bioinformatics U900

slide-42
SLIDE 42

Thanks
