HIV tropism assessment using next generation sequencing

Mar 19, 2023

SLIDE 1

HIV tropism assessment using next generation sequencing

Mattia CF Prosperi

National Institute for Infectious Diseases “Lazzaro Spallanzani” (INMI)

Dept. Virology

Via Portuense, 292 – 00149 – Rome, Italy. e‐mail: ahnven@yahoo.it

SLIDE 2

Summary

Next‐generation (aka ultra‐deep) sequencing (NGS)
– Technologies, features
Low level tools to analyse NGS data
– Sequence alignment
– Error‐correction
High level tool for clinical purposes
– Ultra‐deep prediction of HIV‐1 coreceptor usage
– Statistical learning model
– Web server

SLIDE 3

Next generation sequencing

Technologies

– 454, Illumina, ABI SOLiD, Polonator, Helicos

Fields of application

– De‐novo sequencing
– Re‐sequencing
– Metagenomics

SLIDE 4

Next‐generation sequencing data

454 GS FLX, Roche

– A sequence read is ~400 bases long (with Titanium upgrade)
– 400‐600 million bases per 10‐hour run
– Higher error rate than Sanger sequencing
  Approximately 0.1% and 0.05% for homopolymeric and non‐homopolymeric regions, respectively (estimated on an HIV plasmid clone)
– Possible presence of contaminants

Other technologies: Illumina, ABI SOLiD, Helicos…

– shorter reads, higher base throughput

SLIDE 5

Web‐server

Easy user interface
– Parallelization of read alignment and error correction
  Computational burden reduced from hours to minutes
– Online tools for NGS‐aided diagnostics:
  HIV‐1 tropism prediction
– Graph generator for variability analysis

SLIDE 6

CASPUR associated universities

SLIDE 7

Sequence alignment

Optimized local pairwise alignment against a given consensus sequence

Smith‐Waterman‐Gotoh in forward and reverse
– gap open/extension parameter optimisation via grid search in [1, 30] and [0.3, 3] with step size of 5 and 0.5 respectively
– Two possible optimisation functions, where m is the number of matches, g is the number of gaps, N is the alignment length:
  » m/N (similarity maximisation)
  » m - m*g/N (gap minimisation and similarity maximisation, accounting for alignment length)
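
A minimal Python sketch of the grid search and the two optimisation functions described above. The aligner itself is not shown: `align` is a hypothetical callable standing in for a Smith-Waterman-Gotoh implementation, assumed to return the triple (m, g, N) for a read, a consensus and a pair of gap penalties. The grid end points are assumed to be inclusive, and ties keep the first parameter pair found; neither detail is stated on the slide.

```python
from itertools import product


def grid(start, stop, step):
    """Values start, start+step, ... that do not exceed stop."""
    values = []
    k = 0
    while start + k * step <= stop + 1e-9:
        values.append(round(start + k * step, 10))
        k += 1
    return values


def similarity(m, g, n):
    """m/N: similarity maximisation."""
    return m / n


def gap_penalised_similarity(m, g, n):
    """m - m*g/N: similarity maximisation with a gap penalty."""
    return m - m * g / n


def grid_search(align, read, consensus, objective):
    """Return (best_score, gap_open, gap_extension) over the grid.

    `align` is a hypothetical aligner: align(read, consensus,
    gap_open, gap_extension) -> (matches, gaps, alignment_length).
    """
    best = None
    for gap_open, gap_ext in product(grid(1, 30, 5), grid(0.3, 3, 0.5)):
        m, g, n = align(read, consensus, gap_open, gap_ext)
        score = objective(m, g, n)
        if best is None or score > best[0]:
            best = (score, gap_open, gap_ext)
    return best
```

With these bounds and step sizes the grid has 6 x 6 = 36 parameter pairs per read, which is one reason parallelising the alignment step pays off.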

SLIDE 8

Contaminant detection

A random alignment score distribution is derived by
– aligning n (at least n=400) random sequences, whose lengths are normally distributed with the mean and standard deviation of the actual lane's read lengths
– applying the given optimisation procedure to each random sequence

A z test with Gumbel’s extreme value distribution (like the BLAST e‐value) is performed for each real read alignment score, corrected for multiple testing with Benjamini‐Hochberg

Sequences with an adj.p>0.01 are discarded
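
The filtering step can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's actual code: the random alignment scores are taken as already computed, the Gumbel parameters are fitted by the method of moments (the slide does not say how the fit is done), and the p-value is the upper tail, i.e. the chance that a random sequence scores at least as high as the read. All function names are hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329


def fit_gumbel(random_scores):
    """Method-of-moments fit (an assumption); needs >= 2 distinct scores."""
    beta = statistics.stdev(random_scores) * math.sqrt(6) / math.pi
    mu = statistics.mean(random_scores) - EULER_GAMMA * beta
    return mu, beta


def gumbel_pvalue(score, mu, beta):
    """Upper-tail probability P(random score >= score)."""
    z = (score - mu) / beta
    if z < -700:  # avoid overflow in exp for scores far below random
        return 1.0
    return -math.expm1(-math.exp(-z))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def keep_reads(read_scores, random_scores, alpha=0.01):
    """True for reads to keep; reads with adj.p > alpha are discarded."""
    mu, beta = fit_gumbel(random_scores)
    pvals = [gumbel_pvalue(s, mu, beta) for s in read_scores]
    return [p <= alpha for p in benjamini_hochberg(pvals)]
```

A read whose score is indistinguishable from the random distribution gets a large adjusted p-value and is dropped as a likely contaminant.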

SLIDE 9

Error detection/correction

For each position of the consensus (and relative indels) we execute a statistical test for over‐representation of changes within the reads – chi‐square statistic

After Bonferroni correction for multiple testing, we exclude positions with adj.p>0.01
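
A minimal Python sketch of a per-position test of this kind. The slide does not give the expected counts, so several details here are assumptions: the null hypothesis is a fixed background error rate (0.1%, borrowed from the 454 error figure quoted earlier in the deck), the statistic is a two-cell chi-square with one degree of freedom, and positions with no more changes than expected get p = 1 so that only over-representation counts. Function names are hypothetical.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


def overrepresentation_pvalue(variant_count, coverage, error_rate):
    """Chi-square test of observed changes against the error rate."""
    if coverage == 0:
        return 1.0
    expected_var = coverage * error_rate
    expected_ref = coverage - expected_var
    if variant_count <= expected_var:
        return 1.0  # not over-represented (one-sided by assumption)
    observed_ref = coverage - variant_count
    stat = ((variant_count - expected_var) ** 2 / expected_var
            + (observed_ref - expected_ref) ** 2 / expected_ref)
    return chi2_sf_1df(stat)


def retained_positions(counts, error_rate=0.001, alpha=0.01):
    """Indices of positions kept after Bonferroni correction.

    counts: one (variant_count, coverage) pair per consensus position.
    """
    n_tests = len(counts)
    kept = []
    for pos, (variants, coverage) in enumerate(counts):
        p = overrepresentation_pvalue(variants, coverage, error_rate)
        if min(1.0, p * n_tests) <= alpha:
            kept.append(pos)
    return kept
```

Positions that fail the threshold are the ones whose changes cannot be distinguished from sequencing error at this significance level.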

SLIDE 10

Web Service Interface

SLIDE 11

SLIDE 12

SLIDE 13

SLIDE 14

SLIDE 15

Variations plot

SLIDE 16

Shannon entropy plot
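
The quantity behind such a plot can be sketched in Python as a per-position Shannon entropy over aligned reads. The slide shows only the plot, so the details are assumptions: reads are already aligned to the same length, the logarithm is base 2 (entropy in bits), and the gap character is counted as one more symbol.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def entropy_profile(aligned_reads):
    """Entropy at each position of equally long aligned reads."""
    return [column_entropy(column) for column in zip(*aligned_reads)]


profile = entropy_profile(["ACGT", "ACGA", "ACTA", "ACTT"])
```

A fully conserved position has entropy 0, and a position split evenly between two symbols has entropy 1 bit.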

SLIDE 17

HIV Diagnostics application

Idea from Martin Daumer’s group (Institute of Immunology, Kaiserslautern) and MPI

HIV‐1 coreceptor usage prediction
– Uses statistical learning applied to NGS data
  Existing methods are: geno2pheno, PSSM
  We developed a new method based on logistic regression (Prosperi et al. AIDS Research and Human Retroviruses 2009; 25(3).)
– Alternative to the TROFILE method
  Pro: less expensive, quicker results, NGS also gives a description of the quasispecies
  Contra: results not always concordant with TROFILE

SLIDE 18

Statistical Learning Model

Logistic Regression

– accuracy 92.76%
– AUC 0.93
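
For readers unfamiliar with the model class, here is a minimal Python sketch of logistic regression fitted by plain gradient descent, with a rank-based AUC. It does not reproduce the published model or the figures above: the two numeric features and the toy data are hypothetical, the label convention (1 = CXCR4-using, 0 = CCR5-only) is an assumption, and the learning rate and epoch count are arbitrary.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Predicted probability of class 1 for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def auc(scores, labels):
    """Probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two made-up features per sequence.
X = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [3.0, 0.0], [4.0, 1.0], [5.0, 0.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
scores = [predict_proba(w, b, x) for x in X]
print("training AUC:", auc(scores, y))
```

Accuracy and AUC measured on training data, as in this toy run, are optimistic; figures such as those on the slide would normally come from held-out or cross-validated data.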

SLIDE 19

SLIDE 20

SLIDE 21

CXCR4 usage prediction

SLIDE 22

CXCR4 usage prediction

SLIDE 23

People at CASPUR and INMI

MR Capobianchi, G Ippolito
A Desideri, G Chillemi
I Abbate, G Rozera
A Barbato, A Bruselles