Complete Genome analysis: Detection of copy number alterations and loss of heterozygosity Control-FREEC tutorial
Valentina BOEVA – Institut Curie, INSERM, Mines ParisTech
Workshop outline
Motivation for copy number detection in cancer
advanced-stage disease, rapid tumor progression, and poor prognosis.
From Kawa K et al. JCO 1999
Overall survival curve for MYCN-amplified neuroblastoma patients relative to treatment after induction chemotherapy. A, patients who underwent autologous bone marrow transplantation (ABMT)/peripheral-blood stem-cell transplantation (PBSCT); B, patients who did not undergo ABMT/PBSCT.
From Schneiderman, J. et al. 2008
Kaplan-Meier survival curves for 600 stage A, B, and Ds patients by MYCN status. Event-free survival.
Y-axis: probability of event-free survival (%)
Adapted from C. Link et al., 2011
In uniparental disomy (UPD), a person receives two copies of a chromosome, or part of a chromosome, from one parent and no copy from the other parent.
This acquired homozygosity could lead to development of cancer if the individual inherited a non-functional allele of a tumor suppressor gene.
From Wikipedia
From Carén H et al. PNAS 2010;107:4323-4328
Kaplan-Meier overall survival for patients with tumors with different genomic profiles.
Figure: read count in each window vs chromosome position
Figure: read count per 50-kb window along chr5 (x-axis: position on chr5).
Figure: normalized read count per 50-kb window along chr5 (x-axis: position on chr5).
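The read-count profiles start from counting aligned reads in non-overlapping windows along each chromosome. A minimal sketch, assuming the read start coordinates for one chromosome have already been extracted from the BAM file; the function name `count_reads_per_window` is hypothetical and this is an illustration of the idea, not Control-FREEC's code.

```python
def count_reads_per_window(read_starts, chrom_length, window_size=50_000):
    """Count reads per non-overlapping window on one chromosome.

    read_starts: 0-based start coordinates of aligned reads (assumed input).
    Window i covers [i * window_size, (i + 1) * window_size); the last
    window may be shorter than window_size.
    """
    n_windows = (chrom_length + window_size - 1) // window_size  # ceiling division
    counts = [0] * n_windows
    for pos in read_starts:
        if 0 <= pos < chrom_length:  # ignore coordinates outside the chromosome
            counts[pos // window_size] += 1
    return counts


# Four reads on a 150-kb chromosome: two in window 0, one each in windows 1 and 2
print(count_reads_per_window([10, 49_999, 50_000, 120_000], 150_000))
```

Empty windows are kept as zeros so that the output has one entry per window, which keeps window indices aligned with genomic positions.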
Figure: distribution of read count per 100-kb window. The main component corresponds to normal copy number; the other components correspond to losses and gains.
Panels: control (COLO-829BL, mate pairs); NCI-H2171 (paired ends); COLO-829 (mate pairs).
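The idea that the main component of the read-count distribution marks the normal copy number can be illustrated with a toy estimator: take the most populated histogram bin as the normal level and express every window as a ratio to it. This is a sketch under stated assumptions (a simple histogram with a hypothetical `bin_width`, hypothetical function names), not how Control-FREEC itself normalizes, which relies on a control sample or GC-content.

```python
import statistics
from collections import Counter


def main_component_level(read_counts, bin_width=10):
    """Approximate the read count of the main (most frequent) component.

    Toy approach: histogram the positive read counts with a fixed,
    hypothetical bin width, pick the most populated bin, and return the
    median read count of the windows inside it.
    """
    positive = [rc for rc in read_counts if rc > 0]
    if not positive:
        raise ValueError("need at least one window with a positive read count")
    bins = Counter(rc // bin_width for rc in positive)
    top_bin, _ = max(bins.items(), key=lambda kv: kv[1])
    in_bin = [rc for rc in positive if rc // bin_width == top_bin]
    return statistics.median(in_bin)


def ratio_to_main_component(read_counts, bin_width=10):
    """Ratio of each window to the main component (about 1.0 = normal copy number)."""
    level = main_component_level(read_counts, bin_width)
    return [rc / level for rc in read_counts]


counts = [100] * 8 + [50] * 2 + [150] * 3  # mostly normal, a few losses and gains
print(main_component_level(counts))
print(ratio_to_main_component([100, 50, 150, 100]))
```

In a diploid region, a ratio near 0.5 then suggests a one-copy loss and a ratio near 1.5 a one-copy gain.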
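Figure: read count per 50-kb window vs GC-content (three panels).
GC-content normalization: $NRC_i = RC_i / f(g_i)$, where $g_i$ is the GC-content in window $i$, $RC_i$ is the read count in window $i$, $NRC_i$ is the resulting normalized read count, and $f$ is the polynomial fitted to read count as a function of GC-content.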
Figure: normalized copy number vs genomic position (3-kb windows) on chr5; windows are colored as normal copy number, loss, or gain.
Figure: normalized ratio per window along the genome (two panels).
Figure: normalized copy number vs genomic position (3-kb windows), four panels.
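Once the normalized profile has been segmented, each segment can be labeled as normal copy number, loss, or gain. A minimal sketch, assuming segment boundaries are already given (the segmentation step itself is not shown), that a normalized ratio of 1.0 corresponds to the genome ploidy, and that copy number is the ploidy times the segment median rounded to the nearest integer; `call_copy_number` is a hypothetical name.

```python
import statistics


def call_copy_number(nrc, segments, ploidy=2):
    """Label segments of a normalized read-count profile.

    nrc: normalized read count per window (1.0 assumed to equal `ploidy` copies).
    segments: list of (start, end) window indices, end exclusive (assumed input).
    Returns (start, end, copy_number, status) tuples.
    """
    calls = []
    for start, end in segments:
        level = statistics.median(nrc[start:end])  # robust to single-window outliers
        copy_number = round(ploidy * level)
        if copy_number == ploidy:
            status = "normal"
        elif copy_number < ploidy:
            status = "loss"
        else:
            status = "gain"
        calls.append((start, end, copy_number, status))
    return calls


profile = [1.0, 0.98, 1.02, 0.52, 0.49, 1.51, 1.49, 1.5]
for call in call_copy_number(profile, [(0, 3), (3, 5), (5, 8)]):
    print(call)
```

Using the segment median rather than the mean keeps a single noisy window from shifting the call for the whole segment.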
From Kost-Alimova et al., BMC Cancer 2007
http://compbio.cs.brown.edu/projects/theta/
from high-throughput DNA sequencing data. Genome Biology. 14:R80.
Fit with 3 modes: the fit indicates that the genotype is AB.
Fit with 4 modes: the fit indicates that the genotype is AA/BB with 40% contamination by normal ("AB") cells.
Allelic status | Copy number | BAF*
A/B | 1 | 0, 1
AB | 2 | 0, 0.5, 1
AA/BB | 2 | 0, 1
AAB/ABB | 3 | 0, 0.33, 0.67, 1
AAA/BBB | 3 | 0, 1
AABB | 4 | 0, 0.5, 1
AAAB/ABBB | 4 | 0, 0.25, 0.75, 1
AAAA/BBBB | 4 | 0, 1
… | … | …
Allelic status | Copy number | BAF** (p = fraction of contaminating normal "AB" cells)
A/B | 1 | 0, p/(1+p), 1/(1+p), 1
AB | 2 | 0, 0.5, 1
AA/BB | 2 | 0, p/2, 1-p/2, 1
AAB/ABB | 3 | 0, 1/(3-p), 1-1/(3-p), 1
AAA/BBB | 3 | 0, p/(3-p), 1-p/(3-p), 1
AABB | 4 | 0, 0.5, 1
AAAB/ABBB | 4 | 0, 1/(4-2p), 1-1/(4-2p), 1
AAAA/BBBB | 4 | 0, p/(4-2p), 1-p/(4-2p), 1
… | … | …
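The BAF values in the two tables can be derived from one expression. Assume the tumor cells carry n_A copies of allele A and n_B copies of allele B at a germline-heterozygous SNP, and a fraction p of the sample consists of normal AB cells. Then BAF = (n_B(1-p) + p) / ((n_A+n_B)(1-p) + 2p), while germline-homozygous SNPs always give 0 or 1. Inverting the expression for an unbalanced genotype gives an estimate of p from an observed BAF mode, as in the AA/BB example with 40% contamination. The function names below are hypothetical; this is a sketch of the arithmetic, not Control-FREEC's code.

```python
def expected_baf(n_a, n_b, p):
    """Expected B-allele frequency at a germline-heterozygous SNP.

    n_a, n_b: copies of alleles A and B in tumor cells.
    p: fraction of contaminating normal cells, assumed to be AB at this SNP.
    """
    n = n_a + n_b
    return (n_b * (1 - p) + p) / (n * (1 - p) + 2 * p)


def expected_modes(n_a, n_b, p):
    """All BAF modes for genotype n_a/n_b and its mirror (e.g. AAB/ABB).

    0 and 1 come from germline-homozygous SNPs; rounding merges modes that
    are equal up to floating-point error (e.g. the 0.5 mode of AB).
    """
    modes = {0.0, 1.0,
             round(expected_baf(n_a, n_b, p), 10),
             round(expected_baf(n_b, n_a, p), 10)}
    return sorted(modes)


def contamination_from_mode(m, n_a, n_b):
    """Solve expected_baf(n_a, n_b, p) == m for p."""
    n = n_a + n_b
    denom = n_b - 1 + m * (2 - n)
    if abs(denom) < 1e-12:
        raise ValueError("balanced genotype: BAF does not depend on contamination")
    return (n_b - m * n) / denom


print(expected_modes(2, 0, 0.4))             # AA/BB with 40% normal cells
print(contamination_from_mode(0.2, 2, 0))    # recovers p from the 0.2 mode
```

With p = 0 the expression reduces to n_B/(n_A+n_B), which gives the uncontaminated values of the first table (for example 1/3 for AAB).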
Allelic ratio about ½:
A/B: log-likelihood
AA/BB: log-likelihood -3270
AB: log-likelihood 120
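Comparing genotypes by log-likelihood can be sketched as follows: model the observed BAF values in a region as an equal-weight Gaussian mixture centered on the genotype's expected modes, sum the log densities, and keep the genotype with the highest score. Assumptions: the noise level `sigma=0.05`, the equal mixture weights, and the function names are hypothetical; this illustrates the principle and is not necessarily Control-FREEC's exact likelihood model.

```python
import math


def expected_baf(n_a, n_b, p):
    """Expected BAF at a germline-heterozygous SNP (normal contamination p, normal = AB)."""
    n = n_a + n_b
    return (n_b * (1 - p) + p) / (n * (1 - p) + 2 * p)


def log_likelihood(bafs, n_a, n_b, p, sigma=0.05):
    """Log-likelihood of observed BAFs under genotype n_a/n_b.

    Equal-weight Gaussian mixture over the expected modes (0, 1 and the
    heterozygous modes); log-sum-exp avoids underflow far from all modes.
    """
    modes = sorted({0.0, 1.0,
                    round(expected_baf(n_a, n_b, p), 10),
                    round(expected_baf(n_b, n_a, p), 10)})
    log_norm = -math.log(len(modes) * sigma * math.sqrt(2 * math.pi))
    total = 0.0
    for x in bafs:
        terms = [-0.5 * ((x - mu) / sigma) ** 2 for mu in modes]
        top = max(terms)
        total += log_norm + top + math.log(sum(math.exp(t - top) for t in terms))
    return total


def best_genotype(bafs, genotypes, p, sigma=0.05):
    """Return the (n_a, n_b) genotype with the highest log-likelihood, plus all scores."""
    scores = {g: log_likelihood(bafs, g[0], g[1], p, sigma) for g in genotypes}
    return max(scores, key=scores.get), scores


region = [0.0, 0.5, 0.49, 0.51, 1.0, 0.5]  # allelic ratio about 1/2
best, scores = best_genotype(region, [(1, 0), (1, 1), (2, 0)], p=0.0)
print(best, {g: round(s, 1) for g, s in scores.items()})
```

Note that without contamination A/B and AA/BB predict the same modes (0 and 1), so BAF alone cannot separate them; the copy number profile is needed to distinguish those cases.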
43
44
Distribution of read count (RC) in tumor cells vs normal cells. Each point represents the number of sequence reads aligned to a 50-kb window (also called RC) in the control cell line vs the tumor cell line. A, B: COLO-829 cell line; C, D: HCC1143 cell line. Red and yellow dots represent fits for tumor genome ploidy: red, near-triploid genome (A and C); yellow, near-tetraploid genome (B and D). Red dots fit the RC density better than yellow dots, suggesting that COLO-829 and HCC1143 are near-triploid.
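Choosing between a near-triploid and a near-tetraploid model amounts to asking which ploidy makes the tumor/control ratios line up best with integer copy numbers. The sketch below is a toy heuristic, not Control-FREEC's actual ploidy-selection criterion: it assumes the ratios are normalized so that 1.0 corresponds to the candidate ploidy, and scores each candidate by the mean distance between ploidy × ratio and the nearest integer. The function names and the candidate set (2, 3, 4) are hypothetical.

```python
def integer_fit_score(ratios, ploidy):
    """Mean distance between ploidy * ratio and the nearest integer copy number.

    Assumes each ratio is tumor RC / control RC, normalized so that 1.0
    corresponds to `ploidy` copies. Lower is better.
    """
    return sum(abs(ploidy * r - round(ploidy * r)) for r in ratios) / len(ratios)


def best_ploidy(ratios, candidates=(2, 3, 4)):
    """Return the candidate ploidy with the lowest score, plus all scores."""
    scores = {p: integer_fit_score(ratios, p) for p in candidates}
    return min(scores, key=scores.get), scores


# Ratios simulated from a triploid genome with segments of 1 to 5 copies
ratios = [cn / 3 for cn in [3, 3, 2, 4, 3, 2, 5, 1]]
best, scores = best_ploidy(ratios)
print(best, {p: round(s, 3) for p, s in scores.items()})
```

A known weakness of this kind of score is that a multiple of the true ploidy (6 instead of 3) fits equally well, which is why the candidate set should be restricted to plausible values.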
tumor.genome.bam
In the Galaxy version, use only "Any".
Mappability for 35-100-bp k-mers is very similar, so read length does not really matter.
Better to know in advance.
Better to run "without contamination" at first.
"Yes" only for high-depth-of-coverage data.
Normal Read count Tumor Read count
~Mappability threshold
Run the first time without contamination adjustment.
Set to "Yes" to get allelic profiles and LOH.
Run the first time without this option.
Set to >=2 if you are not interested in one-exon outliers
Run with contamination adjustment.
Will use BAF profiles for copy number prediction in ambiguous cases.
In this case, the contamination seems to be about 25%.
Leave the field blank if you don't have the expertise to evaluate contamination by eye.
Increase the threshold if you want fewer breakpoints.
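Outside Galaxy, Control-FREEC reads these settings from a configuration file with sections such as [general], [sample] and [control]. The sketch below writes a minimal file with Python's `configparser`. Assumptions: the key names (chrLenFile, outputDir, ploidy, window, breakPointThreshold, contaminationAdjustment, contamination, mateFile, inputFormat) are given as I recall them from the Control-FREEC manual and should be checked against the manual for your version; other required settings (for example library orientation, mappability or BAF options) are omitted, and the helper name and default values are hypothetical.

```python
import configparser
import io


def make_freec_config(tumor_bam, control_bam, chr_len_file, output_dir,
                      ploidy="2,3,4", window=50000, contamination=None,
                      break_point_threshold=0.8):
    """Build a minimal Control-FREEC-style config as a string (hypothetical helper).

    Key names should be verified against the Control-FREEC manual.
    """
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep camelCase keys; configparser lowercases them by default
    cfg["general"] = {
        "chrLenFile": chr_len_file,
        "outputDir": output_dir,
        "ploidy": ploidy,
        "window": str(window),
        # Higher values give fewer breakpoints
        "breakPointThreshold": str(break_point_threshold),
    }
    if contamination is not None:
        # Only set when contamination has been evaluated; first runs go without it
        cfg["general"]["contaminationAdjustment"] = "TRUE"
        cfg["general"]["contamination"] = str(contamination)
    cfg["sample"] = {"mateFile": tumor_bam, "inputFormat": "BAM"}
    cfg["control"] = {"mateFile": control_bam, "inputFormat": "BAM"}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


print(make_freec_config("tumor.genome.bam", "normal.genome.bam", "hg19.len",
                        "freec_out", contamination=0.25))
```

The first run would omit `contamination`, following the advice above; the value can be added once the BAF profiles suggest a contamination level such as the 25% seen here.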