Extracting relevant information from UHTS data: analysis pipelines


SLIDE 1

Extracting relevant information from UHTS data: analysis pipelines (smallRNA)

Patricia Otten, 3rd July 2012

JOBIM

Rennes - France

SLIDE 2

3rd July 2012, JOBIM - Rennes

Fasteris SA: Illumina sequencing

  • founded in 2003 by L. FARINELLI and M. OSTERAS
  • 2012: about 20 collaborators
  • capillary and UHTS sequencing + bioinformatics
  • private and academic labs
  • no business plan, no external investors, no sales force

SLIDE 3

Illumina sequencing

Key technology based on the concept of DNA colonies, invented in 1996 at GlaxoWellcome's Geneva Biomedical Research Institute.

Mayer P., Farinelli L. and Kawashima E., 1997, Patent application WO 98/44151

SLIDE 4

Illumina sequencing: step 1

Library preparation (smallRNA protocol)

  • 3 µg total RNA
  • selection of small RNAs (20-30 nt), acrylamide gel purification
  • single-stranded ligation of the 3' adapter
  • single-stranded ligation of the 5' adapter
  • reverse transcription, PCR, index addition, gel purification

[Figure: library structure; labels: P7, P5, index, library]

SLIDE 5

Illumina sequencing: step 2

Flowcell preparation

Templates are hybridized to a surface (flowcell) and amplified in situ (bridge amplification) to form DNA colonies.

  • each colony produces one read
  • all colonies are sequenced in parallel
  • ~150 million pass-filter reads per lane

SLIDE 6

Illumina sequencing: step 3

Sequencing

Incorporation of reversible-terminator nucleotides labeled with fluorescent dyes

  • base-by-base sequencing (50, 100 cycles, SR (single read) or PE (paired end))
  • laser excitation and image capture; release of dye
  • intensity extraction and base calling by RTA software

1x100 run: 1 week; 1.5 TB intensities; 200 GB sequences

SLIDE 7

Adapter trimming

Trimming (smallRNAs)
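
Because small RNA inserts are shorter than the read, the 3' adapter is read through and has to be removed. A minimal sketch of that idea, not the pipeline actually used in the talk: the function names, the `min_overlap` parameter and the adapter sequence in the usage lines are assumptions, and only exact matches are handled (no sequencing errors).

```python
def trim_3p_adapter(read, adapter, min_overlap=6):
    """Return the insert preceding the 3' adapter, or None if no adapter is found.

    Hypothetical helper: exact matching only, no mismatches allowed.
    """
    # Case 1: the complete adapter is present somewhere in the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends inside the adapter (partial adapter at the 3' end).
    first = max(0, len(read) - len(adapter) + 1)
    last = len(read) - min_overlap
    for start in range(first, last + 1):
        if adapter.startswith(read[start:]):
            return read[:start]
    return None


def select_inserts(reads, adapter, min_len=20, max_len=30):
    """Trim every read and keep inserts within the size range (assumed 20-30 nt)."""
    inserts = []
    for read in reads:
        insert = trim_3p_adapter(read, adapter)
        if insert is not None and min_len <= len(insert) <= max_len:
            inserts.append(insert)
    return inserts


# Usage with an example adapter sequence (an assumption, not taken from the slides).
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"
reads = [
    "TGAGGTAGTAGGTTGTATAGTT" + ADAPTER[:14],  # 22-nt insert + partial adapter
    "ACGTACGTACGT" + ADAPTER,                 # 12-nt insert, too short
    "ACGT" * 9,                               # no adapter found
]
print(select_inserts(reads, ADAPTER))
```

Reads without a recognizable adapter are dropped here; whether to keep them is a design choice that depends on the read length and the expected insert sizes.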

SLIDE 8

Introduction to smallRNAs

Functions and properties of small RNA classes:

  • chemical modifications of other RNAs, mainly rRNAs, tRNAs and snRNAs
  • RNA splicing, guides for telomere elongation
  • translation
  • from dsRNA
  • downregulation
  • transposon silencing
  • poorly conserved
  • downregulation of genes
  • highly conserved

Sequencing by siRNA: a novel generic tool for virus discovery

Kreuze et al. (2009) Complete viral genome sequence and discovery of novel viruses by deep sequencing of small RNAs: a generic method for diagnosis, discovery and sequencing of viruses. Virology 388: 1-7

Expression analysis; virus assembly

SLIDE 9

Introduction to smallRNAs

SLIDE 10

Pipelines and automation

→ automation + checks
→ handle unexpected issues, keep time for the client
→ a pipeline is a set of predetermined tasks that have to be executed to complete a specific analysis

produce meaningful data; time; resources

www.photo-dictionary.com
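
The "automation + checks" idea can be illustrated with a small runner that executes predetermined tasks in order and stops as soon as a check fails. This is only a sketch: `run_pipeline` and the (name, task, check) layout are hypothetical, not the structure of the Makefile/Bash/R pipelines discussed in the talk.

```python
def run_pipeline(steps, data):
    """Run (name, task, check) steps in order; stop at the first failed check.

    Hypothetical structure: each task transforms the data and each check
    returns True when the task's output looks sane.
    """
    completed = []
    for name, task, check in steps:
        data = task(data)
        if not check(data):
            # An unexpected issue is reported with the name of the failing step.
            raise RuntimeError(f"check failed after step '{name}'")
        completed.append(name)
    return data, completed


# Usage: two toy steps standing in for real analysis tasks.
steps = [
    ("strip", lambda lines: [s.strip() for s in lines], lambda out: len(out) > 0),
    ("drop_empty", lambda lines: [s for s in lines if s], lambda out: all(out)),
]
result, completed = run_pipeline(steps, [" a ", "", "b"])
print(result, completed)
```

Failing early with the step name keeps the time spent on diagnosing unexpected issues short.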

SLIDE 11

Pipelines and automation

E.g.: comparison of libraries in terms of miRNA coverage

Modules: insert selection, mapping (ref. genome), miRNA coverage, normalization, library comparison, visualization

Tools: Makefiles, Bash scripts, R scripts

Each module may involve one or several processes, e.g. mapping (ref. genome):

  • 1. reference
  • 2. indexing
  • 3. mapping
  • 4. format conversion
  • 5. reporting
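
As an illustration of how one module breaks down into several processes, the five mapping steps could be spelled out as shell commands. This is a sketch under assumptions: BWA (`index`, `aln`, `samse`) and samtools (`view`, `flagstat`) are chosen here for illustration, and the function name, file names and the `test -s` reference check are hypothetical.

```python
import shlex


def mapping_module_commands(reference, reads, prefix):
    """Return (step, [shell commands]) pairs for a single-read mapping module.

    The commands are only built, not executed; tool choice is an assumption.
    """
    ref, fq = shlex.quote(reference), shlex.quote(reads)
    sai = shlex.quote(prefix + ".sai")
    sam = shlex.quote(prefix + ".sam")
    bam = shlex.quote(prefix + ".bam")
    report = shlex.quote(prefix + ".flagstat.txt")
    return [
        # 1. reference: make sure the reference FASTA exists and is not empty
        ("reference", [f"test -s {ref}"]),
        # 2. indexing: build the BWA index
        ("indexing", [f"bwa index {ref}"]),
        # 3. mapping: align, then convert alignments to SAM
        ("mapping", [f"bwa aln {ref} {fq} > {sai}",
                     f"bwa samse {ref} {sai} {fq} > {sam}"]),
        # 4. format conversion: SAM to BAM
        ("format conversion", [f"samtools view -bS {sam} > {bam}"]),
        # 5. reporting: mapping statistics
        ("reporting", [f"samtools flagstat {bam} > {report}"]),
    ]


# Usage with placeholder file names.
module = mapping_module_commands("ref.fa", "lib1.fastq", "lib1")
for step, commands in module:
    for command in commands:
        print(f"[{step}] {command}")
```

Each command has a single output file, which is what makes such a module easy to express as Makefile targets.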

SLIDE 12

Expression pipelines

Steps: reads, inserts, mapping (ref. genome), sequence profile, annotated features, peak detection, mapping (sequence db), post-processing, coverage

Tools and resources: BWA, Bash, R, SeqMonk, BEDTools, Bash/R, Perl, PMRD, miRBase, iGenome

SLIDE 15

[Figure: size profile of LIB-1 (BDGP5.25); x-axis: insert size (23-29 nt), y-axis: RPM]
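
A size profile of this kind counts inserts per length and scales the counts to reads per million (RPM). A minimal sketch, assuming the RPM denominator is the total number of inserts passed in; the function name is hypothetical.

```python
from collections import Counter


def size_profile_rpm(inserts):
    """Return {insert length: reads per million} for a list of insert sequences."""
    total = len(inserts)
    if total == 0:
        return {}
    counts = Counter(len(seq) for seq in inserts)
    # Scale each count to a library of one million inserts.
    return {length: counts[length] * 1e6 / total for length in sorted(counts)}


# Usage with toy data: three 23-nt inserts and one 25-nt insert.
profile = size_profile_rpm(["A" * 23, "C" * 23, "G" * 23, "T" * 25])
print(profile)
```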

SLIDE 19

$$\mathrm{RPKM} = \frac{\mathrm{count}}{\mathrm{insertNb\,[M]} \times \mathrm{probeLength\,[K]}}$$

with insertNb in millions [M] and probeLength in kilobases [K].
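
The RPKM normalization divides a count by the number of inserts in millions and by the probe length in kilobases. A small worked sketch, assuming both are passed in raw units (inserts and nucleotides); the function and parameter names are hypothetical.

```python
def rpkm(count, insert_nb, probe_length):
    """RPKM = count / (insert_nb in millions * probe_length in kilobases)."""
    if insert_nb <= 0 or probe_length <= 0:
        raise ValueError("insert_nb and probe_length must be positive")
    return count / ((insert_nb / 1e6) * (probe_length / 1e3))


# Usage: 500 inserts on a 22-nt feature in a library of 2 million inserts.
value = rpkm(500, 2_000_000, 22)
print(round(value, 1))
```

For features as short as miRNAs the length term inflates the values considerably, so RPKM values are best compared between features of similar length.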

SLIDE 20

Comparison scores between pairs of libraries (score scale: 0.8, 0.9, 0.95, 0.99).

The counts n1 and n2 are assumed to follow binomial distributions with the same event probability, p = (n1/N1 + n2/N2)/2; score ~ p(observing a count < n1 or > n2).
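
The building blocks of such a score can be sketched as follows. Assumptions are labeled loudly: the pooled probability is read as (n1/N1 + n2/N2)/2 with N1 and N2 taken as the library totals, and the two tail probabilities are returned separately because the slide does not spell out how they are combined into the final score. All function names are hypothetical.

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability P(X = k), computed in log space for large n."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def tail_probabilities(n1, N1, n2, N2):
    """Return (p, P(X1 < n1), P(X2 > n2)) under a shared event probability p.

    Assumption: X1 ~ Binomial(N1, p) and X2 ~ Binomial(N2, p).
    """
    p = (n1 / N1 + n2 / N2) / 2
    lower = sum(binom_pmf(k, N1, p) for k in range(n1))
    upper = max(0.0, 1.0 - sum(binom_pmf(k, N2, p) for k in range(n2 + 1)))
    return p, lower, upper


# Usage with toy numbers: 3 of 10 versus 7 of 10.
p, lower, upper = tail_probabilities(3, 10, 7, 10)
print(p, lower, upper)
```

The sums run over the observed counts, so this direct form is only practical when n1 and n2 are moderate; a statistics library would be the usual choice for real library sizes.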

SLIDE 22

Virus identification

siRNAs:

  • class of dsRNAs of 20-25 nt
  • involved in post-transcriptional gene silencing
  • endogenous or exogenous

→ synthetic dsRNA introduced into cells can induce silencing of specific genes of interest
→ viral infection: presence of viral dsRNA leading to siRNAs that participate in the cell's antiviral response

Sequencing by siRNA: a novel generic tool for virus discovery

Kreuze et al. (2009) Complete viral genome sequence and discovery of novel viruses by deep sequencing of small RNAs: a generic method for diagnosis, discovery and sequencing of viruses. Virology 388: 1-7

SLIDE 23

Virus assembly pipelines

Infected sample: reads, inserts, mapping (host ref. genome), mapping (host contigs), mapping (viral db), de novo assembly (unmapped inserts)

Control sample: reads, inserts, de novo assembly

Tools and resources: Perl/R, BWA, Velvet+Oases, BLAST/BWA/MUMmer, iGenome, RefSeq
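
The step that feeds the de novo assembly keeps only the inserts that did not map to the host. A minimal sketch of that filter, assuming the host mapping is available as SAM text; the function name is hypothetical, while the 0x4 "segment unmapped" flag bit comes from the SAM format.

```python
def unmapped_inserts(sam_lines):
    """Yield (name, sequence) for SAM records flagged as unmapped (flag bit 0x4)."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if flag & 0x4:
            yield name, seq


# Usage with two toy records: one mapped to the host, one unmapped.
sam = [
    "@SQ\tSN:chr2L\tLN:23011544",
    "ins1\t0\tchr2L\t100\t37\t22M\t*\t0\t0\tTGAGGTAGTAGGTTGTATAGTT\t" + "I" * 22,
    "ins2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTACGTACGTACGTA\t" + "I" * 21,
]
candidates = list(unmapped_inserts(sam))
print(candidates)
```

The surviving inserts would then be written to FASTA or FASTQ and passed to the assembler.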

SLIDE 28

Thank you for your attention