Detection of copy number alterations and loss of heterozygosity

Complete Genome analysis: Detection of copy number alterations and loss of heterozygosity
Control-FREEC tutorial
Valentina BOEVA, Institut Curie, INSERM, Mines ParisTech

Workshop outline
• Motivation for copy number detection in cancer samples
• Control-FREEC tool presentation
  - Methodology & functionalities
• Control-FREEC tutorial on Galaxy
  - Hands-on workshop

Cancer genomes are often significantly rearranged
[Figure: a 24-color karyotype of a neuroblastoma cell line]

In cancer genomes, it is important to detect CNAs and LOH
CNAs (copy number alterations):
• Large-scale genomic deletions
• Large-scale genomic duplications
• Amplicons (duplications >10 times)
LOH: regions with loss of heterozygosity

Amplification of an important gene can favor cancer development
• MYCN amplification, which occurs in approximately 22% of primary neuroblastomas, is one of the most powerful prognostic factors identified to date. It is significantly associated with advanced-stage disease, rapid tumor progression, and poor prognosis.
[Figure: part of chr2 containing MYCN and DDX1, amplified to more than 100 copies]

[Figure, from Kawa K et al. JCO 1999: Overall survival curve for MYCN-amplified neuroblastoma patients relative to treatment after induction chemotherapy. A, patients who underwent autologous bone marrow transplantation (ABMT)/peripheral-blood stem-cell transplantation (PBSCT); B, patients who did not undergo ABMT/PBSCT.]
[Figure, from Schneiderman, J. et al. 2008: Kaplan-Meier survival curves for 600 stage A, B, and Ds patients by MYCN status. Event-free survival. Y-axis: probability of event-free survival (%).]

Deletion in an important gene can favor cancer development
• A patient was treated for breast and ovarian cancer
• She developed therapy-related acute myeloid leukemia (t-AML)
• Whole-genome sequencing revealed a novel, heterozygous 3-kilobase deletion removing exons 7-9 of TP53 in the patient’s normal skin DNA, which was homozygous in the leukemia DNA as a result of acquired uniparental disomy.
Adapted from C. Link et al., 2011

Copy-neutral loss of heterozygosity (LOH), or acquired uniparental disomy (UPD), often happens in cancer
In UPD, a person receives two copies of a chromosome, or part of a chromosome, from one parent and no copies from the other parent. This acquired homozygosity could lead to the development of cancer if the individual inherited a non-functional allele of a tumor suppressor gene.
From Wikipedia

Identification of regions of gain and loss helps to predict the aggressiveness of cancer
[Figure: copy number profile (chr 11) of a metastatic neuroblastoma sample]

[Figure: Kaplan-Meier overall survival for patients with tumors with different genomic profiles. From Carén H et al. PNAS 2010;107:4323-4328]

Detection of SNVs, indels, structural variants, copy number changes and LOH has become possible with Next Generation Sequencing (NGS)
• Next Generation Sequencing = fast, accurate reading of DNA
  - Whole genome sequencing
    · Sequencing of the whole cancer genome, including intergenic regions and introns
    · Complete information about the genome
  - Exome sequencing
    · Sequencing of exons of ~20000 well-characterized genes
    · Complete information about SNVs, indels and copy number changes of the coding part of the genome
  - Targeted sequencing
    · Complete information about SNVs, indels and copy numbers of a small panel of genes (10-500) actionable in cancer

Today we will speak only about the detection of CNAs and LOH, and only in whole genome sequencing (WGS) and whole exome sequencing (WES) data

Read count (RC) is calculated in sliding windows
[Figure: read count in each window plotted against chromosome position, with regions labeled Gain, Normal, and Loss]
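
The windowed counting described on this slide can be sketched as follows. This is a minimal illustration, not Control-FREEC's implementation: the function name `window_read_counts`, the use of 0-based read start positions, and the `step` parameter (equal to the window size unless given) are assumptions made for the example.

```python
from bisect import bisect_left


def window_read_counts(read_starts, chrom_length, window, step=None):
    """Count reads whose start falls in each window [start, start + window).

    Windows begin every `step` bases; with step == window they do not overlap.
    Positions are assumed to be 0-based (an assumption of this sketch).
    """
    step = step or window
    starts = sorted(read_starts)
    counts = []
    for w_start in range(0, chrom_length, step):
        w_end = min(w_start + window, chrom_length)
        # Number of sorted positions in [w_start, w_end)
        counts.append(bisect_left(starts, w_end) - bisect_left(starts, w_start))
    return counts


# Toy example: 7 reads on a 40-base "chromosome", 10-base windows
reads = [5, 12, 14, 31, 33, 38, 39]
print(window_read_counts(reads, 40, 10))
print(window_read_counts(reads, 40, 10, step=5))
```

A step smaller than the window gives overlapping (sliding) windows, at the cost of correlated neighbouring counts.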

We need to normalize read count per window to get meaningful profiles
[Figure: read count per 50kb-window along chr 5, shown for the sample and for the control; a possible loss in the sample is marked with a question mark]

If a control is available, the problem is easily solved
[Figure: normalized read count per 50kb-window along chr 5 (y-axis from 0.0 to 3.0), with the loss region labeled]
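
One simple way to normalize against a control is to take the sample/control ratio per window and rescale it so that its median is 1. The sketch below assumes this median rescaling and assumes that most windows sit at the main copy number; both are illustration choices, and the function name is hypothetical rather than part of Control-FREEC.

```python
from statistics import median


def normalize_by_control(sample_counts, control_counts):
    """Per-window sample/control ratio, rescaled so the median ratio is 1.

    Windows with no control reads are returned as None (no estimate).
    """
    ratios = [s / c if c > 0 else None
              for s, c in zip(sample_counts, control_counts)]
    valid = [r for r in ratios if r is not None]
    center = median(valid)  # assumed to be non-zero in this sketch
    return [r / center if r is not None else None for r in ratios]


sample = [100, 200, 50, 0, 150]
control = [100, 200, 100, 0, 150]
print(normalize_by_control(sample, control))
```

In this toy input the third window has half the expected reads, which shows up as a normalized value of 0.5, the signature of a one-copy loss in a diploid, uncontaminated sample.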

If there is no control dataset, normalization can be done using the GC-content
[Figure: control read count and GC-content plotted against position on chr5]

RC can be modeled as a polynomial on GC-content
A scatter plot shows the dependency RC ~ GC-content
[Figure: read count per 100kb-window plotted against GC-content]

[Figure: read count per 50kb-window plotted against GC-content for three samples: control COLO-829BL (mate pairs), COLO-829 (mate pairs), and NCI-H2171 (paired ends). The legend distinguishes the main component from the components corresponding to losses and gains.]
Here RC was modeled as a polynomial of order three on GC-content

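The resulting profiles are segmented to detect gains and losses
Transformation: $\mathrm{NRC}_i = \mathrm{ploidy} \cdot \dfrac{\mathrm{RC}_i}{f(g_i)}$, where
• $g_i$ = GC-content in window $i$
• $\mathrm{RC}_i$ = read count in window $i$
• $f$ = polynomial modeling RC as a function of GC-content
• $\mathrm{NRC}_i$ = resulting normalized read count
[Figure: normalized copy number plotted against genomic position (3-kb window) on chr5; colors mark normal copy number, loss, and gain]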
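
The GC-content normalization can be sketched in plain Python as below. This is a simplified illustration under stated assumptions: the polynomial is fitted by ordinary least squares to all windows at once (the slides fit the main component only), and the names `fit_polynomial`, `evaluate`, and `normalize_by_gc` are hypothetical.

```python
def fit_polynomial(xs, ys, degree=3):
    """Least-squares fit; returns coefficients c with y ~ c[0] + c[1]*x + ..."""
    n = degree + 1
    # Normal equations A c = b: A[j][k] = sum x^(j+k), b[j] = sum y * x^j
    a = [[sum(x ** (j + k) for x in xs) for k in range(n)] for j in range(n)]
    b = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back-substitution
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(a[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (b[row] - tail) / a[row][row]
    return coeffs


def evaluate(coeffs, x):
    """Value of the fitted polynomial f at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def normalize_by_gc(read_counts, gc_contents, ploidy=2, degree=3):
    """NRC_i = ploidy * RC_i / f(g_i), with f fitted to (g_i, RC_i)."""
    coeffs = fit_polynomial(gc_contents, read_counts, degree)
    return [ploidy * rc / evaluate(coeffs, g)
            for rc, g in zip(read_counts, gc_contents)]


# Toy windows whose read count depends only on GC-content (no gains or losses)
gc = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60]
rc = [100 + 2000 * g - 1500 * g * g for g in gc]
print([round(v, 3) for v in normalize_by_gc(rc, gc)])
```

On real data, windows inside gains and losses would pull an all-window fit away from the main component, so a practical implementation needs a more robust fitting step than this sketch provides.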

In summary
• Control-FREEC detects Copy Number Alterations (CNAs) in whole genome sequencing data
• Control-FREEC uses a sliding window approach
• It also allows visualizing CNAs and LOH at the genome scale

Visualization of copy number profiles calculated by the FREEC software
[Figure: genome-wide copy number profiles; colors mark normal copy number, loss, and gain]

There are 3 problems of genomic profiling
1. Reference point for copy number variation (diploid, triploid, tetraploid genomes)
[Figure: two panels, "One copy gain in a diploid genome" and "One copy gain in a tetraploid genome", each showing the normalized ratio (0.0 to 2.0) against the window along the genome (0 to 400)]
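
The reference-point problem comes down to simple arithmetic: the normalized ratio of a region is its copy number divided by the ploidy of the genome, so the same ratio maps to different copy numbers under different ploidy assumptions. A small sketch, with hypothetical function names and assuming a pure tumor sample:

```python
def expected_ratio(copy_number, ploidy):
    """Normalized ratio expected for a region with the given copy number."""
    return copy_number / ploidy


def copy_number_from_ratio(ratio, ploidy):
    """Copy number implied by a ratio once a ploidy is assumed."""
    return round(ratio * ploidy)


# One-copy gain: 3 copies in a diploid genome, 5 copies in a tetraploid one
print(expected_ratio(3, 2))   # 1.5
print(expected_ratio(5, 4))   # 1.25

# The same observed ratio is read differently under each ploidy assumption
print(copy_number_from_ratio(1.5, 2))  # 3
print(copy_number_from_ratio(1.5, 4))  # 6
```

A one-copy gain is therefore a smaller step in the ratio profile of a tetraploid genome than of a diploid one, which is why the ploidy has to be known or estimated before ratios are turned into copy numbers.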

There are 3 problems of genomic profiling
2. Contamination of tumor samples by normal stromal cells

We can evaluate contamination of a tumor sample by normal cells
[Figure: two panels showing normalized copy number against genomic position (3-kb window)]
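
The effect of contamination on the profile can be illustrated with a simple linear mixture model. This is a common textbook model used here as an assumption, not necessarily the exact formula in Control-FREEC: a fraction of the cells is normal (two copies everywhere) and the rest is tumor. Function and parameter names are hypothetical.

```python
def observed_ratio(tumor_copies, normal_fraction, tumor_ploidy=2, normal_copies=2):
    """Ratio seen in a mixed sample for a region with `tumor_copies` in the tumor."""
    tumor_fraction = 1 - normal_fraction
    signal = normal_fraction * normal_copies + tumor_fraction * tumor_copies
    baseline = normal_fraction * normal_copies + tumor_fraction * tumor_ploidy
    return signal / baseline


def tumor_copies_from_ratio(ratio, normal_fraction, tumor_ploidy=2, normal_copies=2):
    """Invert the mixture model to recover the tumor copy number."""
    if normal_fraction >= 1:
        raise ValueError("a sample made only of normal cells carries no tumor signal")
    tumor_fraction = 1 - normal_fraction
    baseline = normal_fraction * normal_copies + tumor_fraction * tumor_ploidy
    return (ratio * baseline - normal_fraction * normal_copies) / tumor_fraction


# A one-copy loss looks like ratio 0.5 in a pure tumor,
# but only about 0.7 with 40% normal cells
print(observed_ratio(1, 0.0))
print(observed_ratio(1, 0.4))
print(tumor_copies_from_ratio(0.7, 0.4))
```

Contamination compresses gains and losses towards the baseline, so knowing the normal fraction lets the profile be stretched back to integer copy numbers.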

There are 3 problems of genomic profiling
3. Intra-tumoral heterogeneity
[Figure from Kost-Alimova et al., BMC Cancer 2007]

One solution to intra-tumoral heterogeneity: Tumor Heterogeneity Analysis (THetA)
http://compbio.cs.brown.edu/projects/theta/
L. Oesper, A. Mahmoody, and B.J. Raphael. (2013) THetA: Inferring intra-tumor heterogeneity from high-throughput DNA sequencing data. Genome Biology. 14:R80.

Now we want to detect genotype status, including loss of heterozygosity (LOH)

We characterize the allelic content via the B allele frequency (BAF)
• B allele = alternative variant in dbSNP
Reference genome (A allele): ac G atgacgtca A atgctagcgag G cacacaa T ac
dbSNP (B allele):            ac C atgacgtca T atgctagcgag C cacacaa A ac
B allele frequency (BAF) from the observed nucleotide frequencies at the four SNP positions: 0.44, 0.5, 0.57, 0.45
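
Computing a BAF from read counts at a SNP position is a one-line ratio. In the sketch below the function name is hypothetical, and the read counts are made up for illustration, chosen only so that they reproduce the four BAF values shown on the slide.

```python
def b_allele_frequency(a_reads, b_reads):
    """Fraction of reads supporting the B (dbSNP alternative) allele.

    Returns None when the position has no coverage.
    """
    total = a_reads + b_reads
    if total == 0:
        return None
    return b_reads / total


# (A-allele reads, B-allele reads) at four SNP positions; hypothetical counts
counts = [(14, 11), (10, 10), (43, 57), (11, 9)]
print([round(b_allele_frequency(a, b), 2) for a, b in counts])
```

Values scattered around 0.5, as here, are what heterozygous (AB) positions look like in a region with two copies.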

There is a correspondence between copy number and possible BAF
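
This correspondence can be enumerated directly: a region with n copies can carry 0 to n copies of the B allele, giving the BAF values 0/n, 1/n, ..., n/n. A minimal sketch for a pure sample, with a hypothetical function name:

```python
from fractions import Fraction


def genotype_bafs(copy_number):
    """Map each possible genotype at a given copy number to its expected BAF."""
    if copy_number < 1:
        return {}  # no copies, so no allele frequency is defined
    return {
        "A" * (copy_number - k) + "B" * k: Fraction(k, copy_number)
        for k in range(copy_number + 1)
    }


for n in (1, 2, 3, 4):
    print(n, {g: str(f) for g, f in genotype_bafs(n).items()})
```

With 2 copies the only heterozygous value is 1/2, while 3 copies give 1/3 and 2/3, so the position of the heterozygous BAF bands carries information about copy number.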

We infer the genotype status of a region from B allele frequency profiles
[Figure: BAF profiles of two regions, one with genotype AA or BB and one with genotype AB]

To infer the genotype status of a region from B allele frequency profiles we use a Gaussian mixture model (GMM) fit
• We try different fits and choose the fit with the best likelihood
Fit with 3 modes:
• AA
• AB
• BB
The fit indicates that the genotype = AB
Fit with 4 modes:
• AA
• BB
• AA*0.6+AB*0.4
• BB*0.6+AB*0.4
The fit indicates that the genotype = AA/BB with 40% contamination by normal (“AB”) cells
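
The comparison of candidate fits can be sketched as follows. This is a deliberately simplified stand-in for a real GMM fit: the mode positions are fixed from the candidate genotype and contamination rather than estimated, and all modes share equal weights and one standard deviation (`sigma`). All names and the toy BAF values are assumptions for illustration.

```python
import math


def contaminated_baf(b_copies, total_copies, normal_fraction):
    """Expected BAF at a germline-heterozygous SNP when tumor cells carry
    `b_copies` B alleles out of `total_copies` and normal cells are AB."""
    tumor_fraction = 1.0 - normal_fraction
    b_signal = tumor_fraction * b_copies + normal_fraction * 1
    all_signal = tumor_fraction * total_copies + normal_fraction * 2
    return b_signal / all_signal


def log_likelihood(bafs, means, sigma=0.05):
    """Log-likelihood of BAF values under an equal-weight Gaussian mixture."""
    norm = sigma * math.sqrt(2 * math.pi)
    total = 0.0
    for x in bafs:
        density = sum(math.exp(-0.5 * ((x - m) / sigma) ** 2) / norm
                      for m in means) / len(means)
        total += math.log(max(density, 1e-300))  # guard against log(0)
    return total


# Modes at 0 and 1 come from SNPs that are homozygous in the germline
candidates = {
    "AB": [0.0, 0.5, 1.0],
    "AA/BB with 40% normal cells": [
        0.0,
        contaminated_baf(0, 2, 0.4),  # AA*0.6 + AB*0.4
        contaminated_baf(2, 2, 0.4),  # BB*0.6 + AB*0.4
        1.0,
    ],
}

observed = [0.0, 0.19, 0.22, 0.2, 0.79, 0.81, 0.8, 1.0]  # toy BAF values
best = max(candidates, key=lambda name: log_likelihood(observed, candidates[name]))
print(best)
```

The intermediate modes of the 4-mode candidate sit at 0.2 and 0.8, matching AA*0.6+AB*0.4 and BB*0.6+AB*0.4 when AA, AB, and BB stand for BAF values 0, 0.5, and 1.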

Visualization of BAF

Extending Control-FREEC to exome sequencing data
• Exome data:
  - Capture bias
  - GC-content and mappability correction is not enough
• Mandatory use of a control sample to normalize read counts
[Figure label: uneven coverage of exons]

Exome sequencing data may be much noisier than whole genome sequencing data
Additional bias (capture) => additional noise
