

SLIDE 1

UNIVERSITÀ DI PAVIA

An integrated annotation system to support whole-exome sequencing experiments design and data management

Ivan Limongelli, Angelo Nuzzo, Annalisa Vetro, Erika Della Mina, Roberto Ciccone, Orsetta Zuffardi, Riccardo Bellazzi

IRCCS C. Mondino, Pavia Dipartimento di Genetica Medica Dipartimento di Informatica e Sistemistica

SLIDE 2

Angelo Nuzzo IIT@SEMM, Milan, 2011 NETTAB, Pavia, 2011 Ivan Limongelli

Exome Sequencing – Analysis Workflow

  • Primary Analysis: nucleotide sequences (short reads)
  • Reads Mapping (against the reference genome sequence)
  • Mapping Analysis
  • Secondary Analysis: ?

SLIDE 3

Exome Sequencing – Secondary Analysis

  • Variants Detection: SNVs, short indels
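The distinction between SNVs and short indels can be read off the reference and alternate alleles of a call. The following is a minimal sketch, not the presenters' code; the function name `classify_variant` and the "other" category are assumptions made for illustration.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify a variant from its reference and alternate alleles."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"            # single-nucleotide variant
    if len(ref) < len(alt):
        return "insertion"      # alternate allele is longer
    if len(ref) > len(alt):
        return "deletion"       # alternate allele is shorter
    return "other"              # equal-length multi-base substitution


for ref, alt in [("A", "G"), ("A", "ATC"), ("ATC", "A")]:
    print(ref, alt, classify_variant(ref, alt))
```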

SLIDE 4

Secondary Analysis – Variants Annotation

chr1, 153170600, A>G, NM_015383, NBPF14, I>R, …

Does this variant affect the product (protein structure and function) coded in this region?

Public bio-databases
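An annotated record like the one on this slide can be held in a small structured type. This is a sketch only: the field names (`chrom`, `pos`, `transcript`, `gene`, and so on) are an assumed reading of the six visible columns, and the elided trailing columns are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedVariant:
    chrom: str
    pos: int
    ref: str
    alt: str
    transcript: str   # assumed to be a RefSeq transcript ID
    gene: str
    aa_ref: str       # reference amino acid
    aa_alt: str       # substituted amino acid


def parse_annotation(line: str) -> AnnotatedVariant:
    """Parse 'chrom, pos, ref>alt, transcript, gene, aa_ref>aa_alt'."""
    fields = [f.strip() for f in line.split(",")]
    chrom, pos, nt_change, transcript, gene, aa_change = fields[:6]
    ref, alt = (s.strip() for s in nt_change.split(">"))
    aa_ref, aa_alt = (s.strip() for s in aa_change.split(">"))
    return AnnotatedVariant(chrom, int(pos), ref, alt,
                            transcript, gene, aa_ref, aa_alt)


record = parse_annotation("chr1, 153170600, A>G, NM_015383, NBPF14, I>R")
print(record.gene, record.pos, record.aa_ref, "->", record.aa_alt)
```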

SLIDE 5

Secondary Analysis – Variants Prediction

  • Some open tools addressing this aim:
  • PolyPhen-2, MutationTaster, SIFT, ANNOVAR, Sequence Variant Analyzer
  • Suitable for loss-of-function mutations
  • A score is assigned to each mutation, corresponding to its probability of damaging the protein
  • Principally based on the type of amino acid substitution, conservation across species, and polymorphism databases
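Because each tool scores on its own scale, combining them needs a per-tool rule. The sketch below counts how many predictors call a variant damaging. It relies on the fact that SIFT and PolyPhen-2 scores run in opposite directions (low SIFT scores and high PolyPhen-2 scores indicate damage); the cut-off values and the function name `damaging_votes` are illustrative assumptions, not settings taken from the slides.

```python
# Illustrative cut-offs (assumptions); check each tool's documentation before use.
SIFT_CUTOFF = 0.05       # SIFT: scores at or below this are commonly read as deleterious
POLYPHEN2_CUTOFF = 0.5   # PolyPhen-2: hypothetical cut-off; higher means more damaging


def damaging_votes(sift_score: float, polyphen2_score: float) -> int:
    """Count how many of the two predictors call the variant damaging."""
    votes = 0
    if sift_score <= SIFT_CUTOFF:
        votes += 1
    if polyphen2_score > POLYPHEN2_CUTOFF:
        votes += 1
    return votes


print(damaging_votes(0.01, 0.98))  # both tools agree: 2
print(damaging_votes(0.40, 0.10))  # neither tool: 0
```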

SLIDE 6

Secondary Analysis – Data management

  • Some considerations
  • Illumina GAIIx platform: max 8 whole-exome samples sequenced in each experiment
  • 13-18K variants per sample (SNVs/indels) are detected, but about 95% of them are common variants
  • Needs
  • I would like to easily perform cross-sample and cross-experiment analyses
  • I would NOT like to annotate and predict changes again for previously identified (already annotated) variants
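With about 95% of the 13-18K variants being common, roughly 650 to 900 variants per sample remain of interest, and many variants recur across samples. One way to avoid repeating work is to cache annotations by variant key. The sketch below assumes a key of (chromosome, position, ref, alt); `AnnotationCache` and `fake_annotator` are hypothetical names standing in for a real annotation call.

```python
from typing import Callable, Dict, Tuple

VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


class AnnotationCache:
    """Annotate each distinct variant once, however many samples carry it."""

    def __init__(self, annotate: Callable[[VariantKey], dict]):
        self._annotate = annotate
        self._store: Dict[VariantKey, dict] = {}
        self.calls = 0  # number of real annotation calls made

    def get(self, key: VariantKey) -> dict:
        if key not in self._store:
            self.calls += 1
            self._store[key] = self._annotate(key)
        return self._store[key]


def fake_annotator(key: VariantKey) -> dict:
    # Placeholder for a slow lookup against public databases.
    return {"variant": key, "gene": "UNKNOWN"}


cache = AnnotationCache(fake_annotator)
sample_a = [("chr1", 153170600, "A", "G"), ("chr2", 1000, "C", "T")]
sample_b = [("chr1", 153170600, "A", "G"), ("chr3", 2000, "G", "A")]
for variant in sample_a + sample_b:
    cache.get(variant)
print(cache.calls)  # 3 distinct variants, so 3 annotation calls
```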

SLIDE 7

Secondary Analysis – Data management

[Architecture diagram]

  • Input: output of the device manufacturer's software; Pipeline 1, Pipeline 2, …, Pipeline N
  • Middleware (data uploads)
  • DB
  • Applications Module
  • Synchronising: omics DB, functional prediction tools
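The central DB could be organised so that each distinct variant is stored once and linked to every sample that carries it. The schema below is a hypothetical sketch using SQLite; the table and column names are assumptions and do not come from the presentation.

```python
import sqlite3

SCHEMA = """
CREATE TABLE experiment (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE sample (
    id INTEGER PRIMARY KEY,
    experiment_id INTEGER NOT NULL REFERENCES experiment(id),
    name TEXT NOT NULL,
    is_case INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE variant (
    id INTEGER PRIMARY KEY,
    chrom TEXT NOT NULL, pos INTEGER NOT NULL,
    ref TEXT NOT NULL, alt TEXT NOT NULL,
    annotation TEXT,
    UNIQUE (chrom, pos, ref, alt)
);
CREATE TABLE sample_variant (
    sample_id INTEGER NOT NULL REFERENCES sample(id),
    variant_id INTEGER NOT NULL REFERENCES variant(id),
    quality REAL,
    PRIMARY KEY (sample_id, variant_id)
);
"""


def add_variant(conn, sample_id, chrom, pos, ref, alt, quality=None):
    """Link a variant to a sample, reusing the variant row if it exists."""
    key = (chrom, pos, ref, alt)
    conn.execute(
        "INSERT OR IGNORE INTO variant (chrom, pos, ref, alt) VALUES (?, ?, ?, ?)",
        key,
    )
    (variant_id,) = conn.execute(
        "SELECT id FROM variant WHERE chrom=? AND pos=? AND ref=? AND alt=?", key
    ).fetchone()
    conn.execute(
        "INSERT OR IGNORE INTO sample_variant VALUES (?, ?, ?)",
        (sample_id, variant_id, quality),
    )
    return variant_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
exp_id = conn.execute(
    "INSERT INTO experiment (name) VALUES (?)", ("run-1",)
).lastrowid
s1 = conn.execute(
    "INSERT INTO sample (experiment_id, name, is_case) VALUES (?, ?, 1)",
    (exp_id, "patient-1"),
).lastrowid
s2 = conn.execute(
    "INSERT INTO sample (experiment_id, name) VALUES (?, ?)",
    (exp_id, "control-1"),
).lastrowid
v1 = add_variant(conn, s1, "chr1", 153170600, "A", "G", quality=50.0)
v2 = add_variant(conn, s2, "chr1", 153170600, "A", "G", quality=42.0)
print(v1 == v2)  # the same variant row is shared by both samples
```

Keeping variants in their own table is what lets an annotation be computed once and reused across samples and experiments.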

SLIDE 8

Secondary Analysis – Data management

Step 1: create experiment

SLIDE 9

Secondary Analysis – Data management

Step 2: choose experiment to add a sample

SLIDE 10

Secondary Analysis – Data management

Step 3: create sample

Step 4: upload mutation data for that sample

SLIDE 11

Secondary Analysis – Data management

Step 5: select cases/controls

SLIDE 12

Secondary Analysis – Data management

Step 6: set up filtering parameters
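Case/control selection followed by filtering can be expressed as set operations over each sample's variants. The sketch below keeps variants shared by all cases and seen in at most a given number of controls; the function name `candidate_variants` and the parameter `max_control_carriers` are hypothetical, and the actual filtering parameters of the system are not listed on the slides.

```python
from typing import Dict, Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def candidate_variants(
    cases: Dict[str, Set[VariantKey]],
    controls: Dict[str, Set[VariantKey]],
    max_control_carriers: int = 0,
) -> Set[VariantKey]:
    """Variants present in every case and in at most N controls."""
    if not cases:
        return set()
    shared = set.intersection(*(set(v) for v in cases.values()))
    result = set()
    for variant in shared:
        carriers = sum(variant in vs for vs in controls.values())
        if carriers <= max_control_carriers:
            result.add(variant)
    return result


a = ("chr1", 153170600, "A", "G")
b = ("chr2", 1000, "C", "T")
c = ("chr3", 2000, "G", "A")
cases = {"case-1": {a, b}, "case-2": {a, b, c}}
controls = {"ctrl-1": {b}}
print(candidate_variants(cases, controls))  # only variant a survives
```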

SLIDE 13

Secondary Analysis – Data management

SLIDE 14

Secondary Analysis – Data management

SLIDE 15

Secondary Analysis – Data management

  • The current prototype implementation allows users to:
  • Store experiment and sample data
  • Store identified variants (SNVs/indels) and their reliability parameters (VCF 4.0 currently supported)
  • Annotate variants
  • Predict their probability of damaging the protein and store the results (PolyPhen-2, MutationTaster, SIFT)
  • Model case-control studies
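Storing variants with their reliability parameters starts from reading VCF data lines. The following is a minimal sketch that reads the eight fixed VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and emits one record per alternate allele. The function name `parse_vcf_line` and the example lines are made up for illustration, and sample genotype columns are ignored.

```python
from typing import List, Optional


def parse_vcf_line(line: str) -> List[dict]:
    """Parse one tab-separated VCF data line into per-allele records."""
    if line.startswith("#"):
        return []  # header and meta-information lines carry no variants
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _vid, ref, alts, qual, flt, _info = fields[:8]
    quality: Optional[float] = None if qual == "." else float(qual)
    return [
        {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt,
         "qual": quality, "filter": flt}
        for alt in alts.split(",")  # multi-allelic sites list ALT as A,B
    ]


single = parse_vcf_line("chr1\t153170600\t.\tA\tG\t50\tPASS\tDP=30")
multi = parse_vcf_line("chr2\t1000\t.\tC\tT,G\t.\tPASS\tDP=12")
print(single[0]["pos"], single[0]["qual"])
print([r["alt"] for r in multi])
```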

SLIDE 16

The end. Thank you for your attention!