HIV tropism assessment - PowerPoint PPT Presentation



slide-1
SLIDE 1

HIV tropism assessment using next generation sequencing

Mattia CF Prosperi

National Institute for Infectious Diseases “Lazzaro Spallanzani” (INMI)

  • Dept. Virology

Via Portuense, 292 – 00149 – Rome, Italy. e‐mail: ahnven@yahoo.it

slide-2
SLIDE 2

Summary

  • Next‐generation (aka ultra‐deep) sequencing (NGS)

  • Technologies, features
  • Low level tools to analyse NGS data
  • Sequence alignment
  • Error‐correction
  • High level tool for clinical purposes
  • Ultra‐deep prediction of HIV‐1 coreceptor usage

  • Statistical learning model
  • Web server
slide-3
SLIDE 3

Next generation sequencing

  • Technologies

– 454, Illumina, ABI SOLiD, Polonator, Helicos

  • Fields of application

– De‐novo sequencing
– Re‐sequencing
– Metagenomics

slide-4
SLIDE 4

Next‐generation sequencing data

  • 454 GS FLX, Roche

– A sequence read is ~ 400 bases long (with Titanium upgrade)
– 400‐600 million bases per 10‐hour run
– Higher error rate than Sanger sequencing

  • Approximately 0.1% and 0.05% for homopolymeric and non‐homopolymeric regions respectively (estimated on an HIV plasmid clone)

– Possible presence of contaminants

  • Other technologies: Illumina, ABI SOLiD, Helicos…

– shorter reads, higher base throughput
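As a rough back-of-the-envelope check of the figures on this slide, the sketch below converts the quoted throughput into reads per run and the quoted error rates into expected errors per read. The function and constant names are illustrative only, and the calculation assumes every read has the nominal length of 400 bases.

```python
# Figures quoted on the slide for 454 GS FLX (Titanium).
READ_LENGTH = 400                      # bases per read (nominal)
BASES_PER_RUN = (400e6, 600e6)         # lower and upper throughput per 10-hour run
ERROR_RATES = {
    "homopolymeric": 0.001,            # 0.1%
    "non_homopolymeric": 0.0005,       # 0.05%
}


def reads_per_run(bases, read_length=READ_LENGTH):
    """Approximate number of reads obtained from a given base throughput."""
    return bases / read_length


def expected_errors_per_read(rate, read_length=READ_LENGTH):
    """Expected number of erroneous bases in one read at a per-base error rate."""
    return rate * read_length


if __name__ == "__main__":
    low, high = (reads_per_run(b) for b in BASES_PER_RUN)
    print(f"reads per run: {low:,.0f} to {high:,.0f}")
    for region, rate in ERROR_RATES.items():
        print(f"{region}: {expected_errors_per_read(rate):.2f} expected errors per read")
```

Under these assumptions a run yields on the order of a million reads, which is why even sub-percent error rates need explicit handling downstream.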

slide-5
SLIDE 5
Web‐server

  • Easy user interface

– Parallelization of read alignment and error correction

  • Computational burden reduced from hours to minutes

– Online tools for NGS‐aided diagnostics:

  • HIV‐1 tropism prediction

– Graph generator for variability analysis

slide-6
SLIDE 6

CASPUR associated universities

slide-7
SLIDE 7

Sequence alignment

  • Optimized local pairwise alignment against a given consensus sequence

  • Smith‐Waterman‐Gotoh in forward and reverse

– gap open/extension parameter optimisation via grid search in [1, 30] and [0.3, 3] with step size of 5 and 0.5 respectively
– Two possible optimisation functions, where m is the number of matches, g is the number of gaps, N is the alignment length:
» m/N (similarity maximisation)
» m - m*g/N (gap minimisation and similarity maximisation, accounting for alignment length)
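The grid search and the two optimisation functions on this slide can be sketched as below. This is a minimal illustration, not the authors' implementation: the aligner is passed in as a hypothetical callable `align(read, consensus, gap_open, gap_extend)` that is assumed to return the triple (m, g, N), and the grids are assumed to start at the lower bound of each interval.

```python
def frange(start, stop, step):
    """Inclusive arithmetic grid from start up to stop (tolerant of float error)."""
    values, k = [], 0
    while start + k * step <= stop + 1e-9:
        values.append(round(start + k * step, 10))
        k += 1
    return values


# Grids as described on the slide: [1, 30] step 5 and [0.3, 3] step 0.5.
OPEN_GRID = tuple(frange(1, 30, 5))
EXTEND_GRID = tuple(frange(0.3, 3, 0.5))


def similarity(m, g, n):
    """m/N: similarity maximisation."""
    return m / n if n else 0.0


def gap_penalised_similarity(m, g, n):
    """m - m*g/N: similarity maximisation with gap minimisation."""
    return m - m * g / n if n else 0.0


def grid_search(align, read, consensus, objective,
                open_grid=OPEN_GRID, extend_grid=EXTEND_GRID):
    """Return (best_score, gap_open, gap_extend) over the parameter grid.

    `align` is a placeholder for a Smith-Waterman-Gotoh aligner and must
    return (matches, gaps, alignment_length) for the given penalties.
    """
    best = None
    for gap_open in open_grid:
        for gap_extend in extend_grid:
            m, g, n = align(read, consensus, gap_open, gap_extend)
            score = objective(m, g, n)
            if best is None or score > best[0]:   # ties keep the first setting
                best = (score, gap_open, gap_extend)
    return best
```

Running the search once on the forward read and once on its reverse complement, then keeping the better score, would cover the "forward and reverse" step.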

slide-8
SLIDE 8

Contaminant detection

  • A random alignment score distribution is derived by

– aligning n (at least n=400) random sequences, whose lengths are normally distributed with the lane's actual mean read length and standard deviation
– applying the given optimisation procedure to each random sequence

  • A z test against Gumbel’s extreme value distribution (like the BLAST e‐value) is performed for each real read alignment score, corrected for multiple testing with Benjamini‐Hochberg

  • Sequences with an adj.p>0.01 are discarded
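The contaminant filter described on this slide might look roughly like the sketch below. Several details are assumptions rather than facts from the slides: the Gumbel parameters are fitted by the method of moments, the p-value is the upper tail of the fitted distribution, and all function names are made up for illustration.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329


def fit_gumbel(random_scores):
    """Method-of-moments fit of a Gumbel (maximum) distribution."""
    beta = statistics.stdev(random_scores) * math.sqrt(6) / math.pi
    mu = statistics.mean(random_scores) - EULER_GAMMA * beta
    return mu, beta


def gumbel_pvalue(score, mu, beta):
    """P(random score >= score) under the fitted Gumbel distribution."""
    z = (score - mu) / beta
    if z < -700:                      # avoid overflow in exp for extreme low scores
        return 1.0
    return -math.expm1(-math.exp(-z))  # 1 - exp(-exp(-z)), computed stably


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):      # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def keep_reads(real_scores, random_scores, alpha=0.01):
    """True for reads to keep; reads with adjusted p > alpha are discarded."""
    mu, beta = fit_gumbel(random_scores)
    pvalues = [gumbel_pvalue(s, mu, beta) for s in real_scores]
    return [p <= alpha for p in benjamini_hochberg(pvalues)]
```

Here `random_scores` stands for the optimised alignment scores of the n random sequences and `real_scores` for those of the actual reads.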
slide-9
SLIDE 9
Error detection/correction

  • For each position of the consensus (and relative indels) we execute a statistical test for over‐representation of changes within the reads – chi‐square statistic

  • After Bonferroni correction for multiple testing, we exclude positions with adj.p>0.01
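The per-position test on this slide could be sketched as follows. The slides do not say what the observed changes are compared against, so this example assumes a fixed background error rate (0.1% here, borrowed from the 454 figures earlier in the deck) and a one-degree-of-freedom chi-square goodness-of-fit test; both choices and all names are illustrative.

```python
import math


def chi2_pvalue_1df(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def overrepresentation_pvalue(changed, depth, error_rate):
    """Test whether `changed` reads out of `depth` exceed the expected error count."""
    expected_changed = depth * error_rate
    expected_same = depth - expected_changed
    if changed <= expected_changed:
        return 1.0                    # no over-representation at all
    stat = ((changed - expected_changed) ** 2 / expected_changed
            + ((depth - changed) - expected_same) ** 2 / expected_same)
    return chi2_pvalue_1df(stat)


def significant_positions(counts, error_rate=0.001, alpha=0.01):
    """Bonferroni-adjusted decision per position.

    `counts` is a list of (changed_reads, depth) pairs, one per position.
    True marks a position whose changes are kept as real variation;
    False marks a position excluded as likely sequencing error.
    """
    n_tests = len(counts)
    decisions = []
    for changed, depth in counts:
        p_adj = min(1.0, overrepresentation_pvalue(changed, depth, error_rate) * n_tests)
        decisions.append(p_adj <= alpha)
    return decisions
```

For example, with a depth of 1000 reads, 2 changed reads would not survive the correction, while 50 changed reads would.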

slide-10
SLIDE 10

Web Service Interface

slide-11
SLIDE 11
slide-12
SLIDE 12
slide-13
SLIDE 13
slide-14
SLIDE 14
slide-15
SLIDE 15

Variations plot

slide-16
SLIDE 16

Shannon entropy plot
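The quantity behind a plot like this is the Shannon entropy of each alignment column. A minimal sketch, assuming the reads are already aligned to equal length and that gaps are counted as an ordinary symbol (the slides do not specify either point):

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols observed in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_profile(aligned_reads):
    """Entropy per position for a list of equal-length aligned reads."""
    return [column_entropy(column) for column in zip(*aligned_reads)]
```

A fully conserved position has entropy 0, and a position split evenly between two bases has entropy 1 bit.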

slide-17
SLIDE 17

HIV Diagnostics application

  • Idea from Martin Daumer’s group (Institute of Immunology, Kaiserslautern) and MPI

  • HIV‐1 coreceptor usage prediction

– Uses statistical learning applied to NGS data

  • Existing methods are: geno2pheno, PSSM
  • We developed a new method based on logistic regression (Prosperi et al. AIDS Research and Human Retroviruses 2009; 25(3).)

– Alternative to the TROFILE method

  • Pro: less expensive, quicker results, NGS also gives a description of the quasispecies

  • Contra: results not always concordant with TROFILE
slide-18
SLIDE 18

Statistical Learning Model

  • Logistic Regression

– accuracy 92.76%
– AUC (0.93)
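To make the idea of a logistic regression classifier on sequence data concrete, here is a small self-contained sketch. It is not the published model: the one-hot encoding of aligned residues, the gradient-descent fit, the L2 penalty, and the `x4_fraction` helper for summarising a set of reads are all assumptions made for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY-"   # 20 residues plus a gap symbol


def one_hot(seq):
    """Encode an aligned amino-acid sequence as a flat 0/1 feature vector."""
    return [1.0 if residue == aa else 0.0 for residue in seq for aa in AMINO_ACIDS]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(sequences, labels, lr=0.5, epochs=500, l2=0.01):
    """Batch gradient descent on the L2-penalised logistic loss."""
    X = [one_hot(s) for s in sequences]
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, labels):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                if xj:
                    grad_w[j] += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(seq, w, b):
    """Predicted probability of the positive class (label 1) for one sequence."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, one_hot(seq))))


def x4_fraction(reads, w, b, threshold=0.5):
    """Share of reads predicted positive; one way to summarise deep-sequencing data."""
    return sum(predict_proba(r, w, b) >= threshold for r in reads) / len(reads)
```

In this sketch label 1 would stand for CXCR4 usage and label 0 for CCR5 usage, with each read's aligned envelope region as input.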

slide-19
SLIDE 19
slide-20
SLIDE 20
slide-21
SLIDE 21

CXCR4 usage prediction

slide-22
SLIDE 22

CXCR4 usage prediction

slide-23
SLIDE 23

People at CASPUR and INMI

  • MR Capobianchi, G Ippolito
  • A Desideri, G Chillemi
  • I Abbate, G Rozera
  • A Barbato, A Bruselles