HIV-1 coreceptor usage prediction without multiple alignments

HIV-1 coreceptor usage prediction without multiple alignments Sébastien Boisvert, M.Sc. student, Université Laval www.graal.ift.ulaval.ca Directors: Jacques Corbeil and Mario Marchand 1

HIV • HIV (human immunodeficiency virus) is the causative agent of the deadly disease known as AIDS (acquired immunodeficiency syndrome) • HIV integrates its genome into the host genome • genome size: 10 kb • molecule type: RNA • 9 genes • HIV-1 (spread worldwide) and HIV-2 2

HIV infection • HIV uses a CD4 receptor and a chemokine receptor to infect cells • the chemokine receptors are CCR5 and CXCR4 • CXCR4-using viruses are associated with faster depletion of CD4+ T cells • HIV usually infects with CCR5 and switches to CXCR4 with disease progression • The V3 loop inside the gp120 protein of the retroviral envelope is a strong determinant of coreceptor usage 3

Fighting HIV • Many drugs are available, each having a specific molecular target (integrase, envelope, reverse transcriptase, coreceptor, etc.) • Coreceptor inhibitors (CCR5- or CXCR4-specific) • If one knows whether a virus uses CCR5 and/or CXCR4, a coreceptor inhibitor can be selected accordingly 4

Determination of the coreceptor usage • Phenotypic assays and genotypic assays • Phenotypic assays rely on recombinant DNA • Genotypic assays rely on DNA sequencing (only the env gene of HIV is relevant here) and machine learning • We investigated how the machine learning component can be enhanced. 5

A mathematical view of the problem • X: the set of V3 loop protein sequences • Y = {−1, +1} is a binary output space (e.g., CXCR4: yes or no) • training set S = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, with (x_i, y_i) ∈ X × Y ∀ i • Each example (x_i, y_i) is drawn independently from the same unknown but fixed distribution P_{X,Y} • Learn from the patterns in the training set 6

Machine learning • An algorithm A learns a classification function h : X → Y • only the observations in the training set S can be utilized • h is a classifier • h must be accurate on examples that are not in the training set 7
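
The last point can be made concrete with a minimal Python sketch: a classifier h is scored on examples that were held out from training. The helper `accuracy`, the classifier `h_toy`, and the two-letter strings are hypothetical illustrations and are not taken from the presentation.

```python
def accuracy(h, examples):
    """Fraction of (x, y) pairs for which h(x) equals the label y."""
    correct = sum(1 for x, y in examples if h(x) == y)
    return correct / len(examples)


def h_toy(x):
    # Hypothetical classifier: predicts +1 when the string contains a K.
    return 1 if "K" in x else -1


# Hypothetical held-out examples (not real V3 sequences).
held_out = [("KR", 1), ("ST", -1), ("KS", 1), ("TT", 1)]
print(accuracy(h_toy, held_out))  # 3 of 4 correct -> 0.75
```

The score is only meaningful if none of the held-out examples were used to build h.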

A kernel is a measure of similarity • mapping function φ : X → R^n • a kernel is a dot product in a feature space: k(x, x′) = φ(x) · φ(x′) • the kernel measures similarity: k : X × X → R (biologically, we look for common motifs) 8
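
As an illustration of a kernel written as a dot product of feature vectors, here is a sketch of a simple k-mer counting kernel (often called a spectrum kernel). It is not the kernel proposed in this presentation; the function names and the toy strings are assumptions chosen for the example. Here φ(x) is the vector of k-mer counts, stored sparsely as a `Counter`.

```python
from collections import Counter


def kmer_counts(seq, k):
    """phi(seq): sparse vector counting every substring of length k."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k=2):
    """k(x, y) = phi(x) . phi(y), summed over the k-mers present in x."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    return sum(count * cy[kmer] for kmer, count in cx.items())


print(spectrum_kernel("CTRP", "CTRP"))  # shares CT, TR, RP -> 3
print(spectrum_kernel("CTRP", "CTRG"))  # shares CT, TR -> 2
```

The two strings may have different lengths, since only shared motifs contribute to the sum.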

Linear classifiers • We are interested in classifiers that can be written as w · φ ( x ) because the predicted class is simply the sign of the dot product • The support vector machine is a linear classifier 9

Support vector machines • binary classifier h : X → {−1, +1} • primal representation: (w, b), where w is the normal vector and b is the bias • separation surface: {φ(x) : w · φ(x) + b = 0} • h(x) = sgn(w · φ(x) + b) 10

Duality • dual representation: (α, b), where α is the vector of Lagrange multipliers and b is the bias • the vector w can be computed from α: w = Σ_{i=1}^{m} α_i y_i φ(x_i) • h(x) = sgn(w · φ(x) + b) = sgn(Σ_{i=1}^{m} α_i y_i k(x, x_i) + b) • φ never has to be computed explicitly • only k(x, x′) appears in the dual representation 11
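
The equivalence of the primal and dual forms can be checked numerically. The sketch below assumes a feature map simple enough to write down (the identity on 2-dimensional vectors), so that both forms can be evaluated. The values of α, y, b and the training points are hypothetical, and sgn(0) is mapped to +1 as a convention of this example.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sign(z):
    # Convention for this sketch: a score of exactly 0 is labelled +1.
    return 1 if z >= 0 else -1


def weight_vector(alphas, ys, phis):
    """w = sum_i alpha_i * y_i * phi(x_i)."""
    dim = len(phis[0])
    return [sum(a * y * phi[j] for a, y, phi in zip(alphas, ys, phis))
            for j in range(dim)]


def primal_predict(w, b, phi_x):
    """h(x) = sgn(w . phi(x) + b)."""
    return sign(dot(w, phi_x) + b)


def dual_predict(alphas, ys, train_xs, b, x, kernel):
    """h(x) = sgn(sum_i alpha_i * y_i * k(x, x_i) + b); w is never formed."""
    return sign(sum(a * y * kernel(x, xi)
                    for a, y, xi in zip(alphas, ys, train_xs)) + b)


# Hypothetical values; phi is the identity, so the kernel is the plain dot product.
train_xs = [[1, 2], [-1, 1]]
ys = [1, -1]
alphas = [2, 1]
b = -1

w = weight_vector(alphas, ys, train_xs)
for x in ([1, 0], [-1, 0]):
    assert primal_predict(w, b, x) == dual_predict(alphas, ys, train_xs, b, x, dot)
```

With a string kernel, only `dual_predict` is usable in practice, because the feature vectors are too large to build explicitly.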

The charge rule The simplest method for coreceptor usage prediction (Fouchier et al. 1992). 1. Build a multiple alignment with all sequences 2. Check the (basic) charge of positions 11 and 25 only Drawbacks • Some sequences need to be discarded to have a good alignment • Using only 2 positions discards most of the information in the data 12
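
Step 2 of the charge rule can be sketched in a few lines of Python. The sketch assumes that the input is already aligned so that string indices 10 and 24 correspond to V3 positions 11 and 25, and that "basic" means lysine (K) or arginine (R). Both assumptions, the function name, and the example strings are illustrative and not taken from the presentation.

```python
BASIC_RESIDUES = {"K", "R"}  # assumption: lysine and arginine count as basic


def charge_rule_predicts_cxcr4(aligned_v3):
    """True if position 11 or position 25 (1-based) holds a basic residue."""
    if len(aligned_v3) < 25:
        raise ValueError("sequence must cover aligned positions 11 and 25")
    return (aligned_v3[10] in BASIC_RESIDUES
            or aligned_v3[24] in BASIC_RESIDUES)


# Illustrative 35-residue V3-like strings: S at 11 and E at 25, then R at 11.
print(charge_rule_predicts_cxcr4("CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"))  # False
print(charge_rule_predicts_cxcr4("CTRPNNNTRKRIHIGPGRAFYTTGEIIGDIRQAHC"))  # True
```

Every other residue is ignored, which is why the rule depends entirely on a good alignment.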

Other methods • SVM (support vector machines) with linear kernel • Random forests • Neural networks Issues Multiple alignments are needed in all cases because those methods need the same number of attributes for each example. (Many sequences have to be discarded to yield a good multiple alignment, so the maximum amount of information is not used.) 13

Our solution • SVM with string kernels instead of linear kernels • We describe a new string kernel: the distant segments kernel Pros 1. no multiple alignment needed at all. 2. string kernels are natural similarity measures. 3. V3 sequences don’t need to be aligned. 4. can be applied to a great number of biologically similar questions 14
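
To show how a string kernel lets a dual-form learner work on sequences of different lengths without alignment, here is a small self-contained sketch. It deliberately swaps in two simpler techniques: a kernel perceptron replaces the SVM solver, and a 2-mer counting kernel replaces the distant segments kernel. All names and the toy strings are hypothetical, and the strings are not real V3 sequences.

```python
from collections import Counter


def spectrum_kernel(x, y, k=2):
    """Dot product of k-mer count vectors (a stand-in string kernel)."""
    cx = Counter(x[i:i + k] for i in range(len(x) - k + 1))
    cy = Counter(y[i:i + k] for i in range(len(y) - k + 1))
    return sum(c * cy[m] for m, c in cx.items())


def score(x, xs, ys, alphas, kernel):
    """Dual-form score: sum_i alpha_i * y_i * k(x, x_i)."""
    return sum(a * yi * kernel(x, xi) for a, yi, xi in zip(alphas, ys, xs))


def train_kernel_perceptron(xs, ys, kernel, epochs=10):
    """Kernel perceptron (not an SVM): raise alpha_i after each mistake on x_i."""
    alphas = [0] * len(xs)
    for _ in range(epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(xs, ys)):
            if y * score(x, xs, ys, alphas, kernel) <= 0:
                alphas[i] += 1
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return alphas


def predict(x, xs, ys, alphas, kernel):
    return 1 if score(x, xs, ys, alphas, kernel) >= 0 else -1


# Toy unaligned strings of unequal length with hypothetical labels.
xs = ["AKRKA", "RKRK", "ASTSA", "STST"]
ys = [1, 1, -1, -1]
alphas = train_kernel_perceptron(xs, ys, spectrum_kernel)
print(predict("KRKR", xs, ys, alphas, spectrum_kernel))  # 1
print(predict("TSTS", xs, ys, alphas, spectrum_kernel))  # -1
```

The learner only ever calls `kernel(x, xi)`, so any string kernel with the same signature could be substituted.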

Summary 1. We define a new kernel for HIV-1 coreceptor usage prediction 2. We compare it to existing kernels (data not shown) and we show that multiple alignments are not necessary 15

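The distant segments kernel Let the following set be the occurrences of subsequences of exactly δ symbols beginning with sequence α and ending with α′:

$$S^{\delta}_{\alpha,\alpha'}(s) \stackrel{\text{def}}{=} \{ (\mu, \alpha, \nu, \alpha', \mu') : s = \mu\alpha\nu\alpha'\mu' \wedge 1 \le |\alpha| \wedge 1 \le |\alpha'| \wedge 0 \le |\nu| \wedge \delta = |s| - |\mu| - |\mu'| \}$$

Then, let the mapping function be the size of such sets for many (δ, α, α′):

$$\phi^{\delta_m,\theta_m}_{DS}(s) \stackrel{\text{def}}{=} \left( \left| S^{\delta}_{\alpha,\alpha'}(s) \right| \right)_{\{ (\delta,\alpha,\alpha') \,:\, 1 \le |\alpha| \le \theta_m \wedge 1 \le |\alpha'| \le \theta_m \wedge |\alpha| + |\alpha'| \le \delta \le \delta_m \}}$$

The kernel is the inner product of sequences in feature space.

$$k^{\delta_m,\theta_m}_{DS}(s,t) \stackrel{\text{def}}{=} \left\langle \phi^{\delta_m,\theta_m}_{DS}(s), \phi^{\delta_m,\theta_m}_{DS}(t) \right\rangle$$

16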
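
The definition can be turned into a brute-force Python sketch that enumerates every window of δ symbols and every admissible pair of segments (α, α′). This is a direct reading of the set definition, not the authors' implementation, and it makes no attempt at efficiency. The names `ds_features`, `ds_kernel`, `delta_max` (for δ_m) and `theta_max` (for θ_m) are assumptions of this example.

```python
from collections import Counter


def ds_features(s, delta_max, theta_max):
    """Sparse feature map: (delta, alpha, alpha') -> number of occurrences in s."""
    feats = Counter()
    n = len(s)
    for start in range(n):  # start = |mu|
        # A window needs at least 2 symbols: one for alpha, one for alpha'.
        for delta in range(2, min(delta_max, n - start) + 1):
            for a in range(1, min(theta_max, delta - 1) + 1):       # |alpha|
                for a2 in range(1, min(theta_max, delta - a) + 1):  # |alpha'|
                    alpha = s[start:start + a]
                    alpha2 = s[start + delta - a2:start + delta]
                    feats[(delta, alpha, alpha2)] += 1
    return feats


def ds_kernel(s, t, delta_max, theta_max):
    """Inner product of the two sparse feature vectors."""
    fs = ds_features(s, delta_max, theta_max)
    ft = ds_features(t, delta_max, theta_max)
    return sum(count * ft[key] for key, count in fs.items())


print(ds_features("ABA", 3, 1))
print(ds_kernel("ABA", "AB", 3, 1))  # only (2, 'A', 'B') is shared -> 1
```

For "ABA" with δ_m = 3 and θ_m = 1 the nonzero features are (2, A, B), (2, B, A) and (3, A, A), each occurring once. The two strings passed to `ds_kernel` do not need to have the same length.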

Comparison for CXCR4 • charge rule (Pillai et al. 2003): 87.45% • SVM with linear kernel (Pillai et al. 2003): 90.86% • SVM with structural descriptors (Sander et al. 2007): 91.56% • SVM with distant segments kernel: 94.80% • Our method is the only one without multiple alignments! • we validated our classifier on a test set, whereas other methods rely on cross-validation (which is biased) 17

Perspectives • Sequencing technologies are improving (Roche/454, Illumina/Solexa, ABI SOLiD) • Machine learning is an emerging science (multiple kernel learning, theoretical risk bounds) • The next generation of bioinformatics programs for the prediction of HIV-1 coreceptor usage promises improvements for treatment selection in clinical settings. • Submitted to the journal Retrovirology 18

Acknowledgements • Mario Marchand, François Laviolette, Jacques Corbeil • Canadian Institutes of Health Research • Natural Sciences and Engineering Research Council of Canada • Canada Research Chair in Medical Genomics • Los Alamos National Laboratory HIV Databases 19

Links • Web server: genome.ulaval.ca/hiv-dskernel • Our machine learning research group: www.graal.ift.ulaval.ca • Jacques Corbeil’s group: genome.ulaval.ca/corbeillab • Machine learning course: cours.ift.ulaval.ca/65764 • Kernel methods: www.kernel-methods.net • Support vector machines: www.support-vector.net 20
