  1. HIV-1 coreceptor usage prediction without multiple alignments
     Sébastien Boisvert, M.Sc. student, Université Laval
     www.graal.ift.ulaval.ca
     Directors: Jacques Corbeil and Mario Marchand

  2. HIV
     • HIV (human immunodeficiency virus) is the causative agent of the deadly disease known as AIDS (acquired immunodeficiency syndrome)
     • HIV integrates its genome into the host genome
     • genome size: 10 kb
     • molecule type: RNA
     • 9 genes
     • HIV-1 (spread world-wide) and HIV-2

  3. HIV infection
     • HIV uses a CD4 receptor and a chemokine receptor to infect cells
     • the chemokine receptors are CCR5 and CXCR4
     • CXCR4-using viruses are associated with faster depletion of CD4+ T cells
     • HIV usually infects with CCR5 and switches to CXCR4 with disease progression
     • The V3 loop inside the gp120 protein of the retroviral envelope is a strong determinant of the coreceptor usage

  4. Fighting HIV
     • Many drugs are available, each having a specific molecular target (integrase, envelope, reverse transcriptase, coreceptor, etc.)
     • Coreceptor inhibitors (CCR5- or CXCR4-specific)
     • If one knows whether a virus uses CCR5, CXCR4, or both, a coreceptor inhibitor can be selected accordingly

  5. Determination of the coreceptor usage
     • Phenotypic assays and genotypic assays
     • Phenotypic assays rely on recombinant DNA
     • Genotypic assays rely on DNA sequencing (only the env gene of HIV is relevant here) and machine learning
     • We investigated how the machine learning component can be enhanced.

  6. A mathematical view of the problem
     • $\mathcal{X}$: V3 loop protein sequences
     • $\mathcal{Y} = \{-1, +1\}$ is a binary output space (e.g., CXCR4: yes or no)
     • training set $S = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, with $(x_i, y_i) \in \mathcal{X} \times \mathcal{Y}$ for all $i$
     • Each example $(x_i, y_i)$ is drawn independently from the same unknown but fixed distribution $P_{\mathcal{X},\mathcal{Y}}$
     • Learn from the patterns in the training set

  7. Machine learning
     • An algorithm $A$ learns a classification function $h : \mathcal{X} \to \mathcal{Y}$
     • only the observations in the training set $S$ can be utilized
     • $h$ is a classifier
     • $h$ must be accurate on examples that are not in the training set

  8. A kernel is a measure of similarity
     • mapping function $\phi : \mathcal{X} \to \mathbb{R}^n$
     • a kernel is a dot product in a feature space: $k(x, x') = \phi(x) \cdot \phi(x')$
     • the kernel measures similarity: $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ (biologically, we look for common motifs)
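
To make the "dot product in a feature space" idea concrete, here is a minimal Python sketch. It uses a simple k-mer counting feature map (often called a spectrum kernel) purely as an illustration; it is not the kernel proposed in this talk, and the function names and the default `k=2` are assumptions made for this example.

```python
from collections import Counter


def spectrum_features(sequence, k=2):
    """Illustrative feature map phi: count every substring of length k."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def spectrum_kernel(x, x_prime, k=2):
    """k(x, x') = phi(x) . phi(x'), summed over the k-mers present in x."""
    phi_x = spectrum_features(x, k)
    phi_x_prime = spectrum_features(x_prime, k)
    # A Counter returns 0 for k-mers absent from x', so they contribute nothing.
    return sum(count * phi_x_prime[kmer] for kmer, count in phi_x.items())
```

Because the features are motif counts, the two sequences may have different lengths, and the kernel value grows with the number of motifs they share.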

  9. Linear classifiers
     • We are interested in classifiers whose score can be written as $w \cdot \phi(x)$: the predicted class is simply the sign of this dot product
     • The support vector machine is a linear classifier

  10. Support vector machines
      • binary classifier $h : \mathcal{X} \to \{-1, +1\}$
      • primal representation: $(w, b)$, where $w$ is the normal vector and $b$ is the bias
      • separation surface: $\{\phi(x) : w \cdot \phi(x) + b = 0\}$
      • $h(x) = \operatorname{sgn}(w \cdot \phi(x) + b)$
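
The decision rule on this slide can be sketched in a few lines of Python. The function names are hypothetical, the feature vectors are assumed to be plain lists of numbers, and mapping a score of exactly zero to +1 is a convention chosen here, not something the slide states.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def predict_primal(w, b, phi_x):
    """h(x) = sgn(w . phi(x) + b), with a score of 0 assigned to +1 (assumed)."""
    return 1 if dot(w, phi_x) + b >= 0 else -1
```

Points on one side of the separation surface get +1 and points on the other side get -1; only the sign of the score matters for the prediction.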

  11. Duality
      • dual representation: $(\alpha, b)$, where $\alpha$ is the vector of Lagrange multipliers and $b$ is the bias
      • the vector $w$ can be computed from $\alpha$: $w = \sum_{i=1}^{m} \alpha_i y_i \phi(x_i)$
      • $h(x) = \operatorname{sgn}(w \cdot \phi(x) + b) = \operatorname{sgn}\left(\sum_{i=1}^{m} \alpha_i y_i k(x, x_i) + b\right)$
      • $\phi$ never needs to be computed explicitly
      • only $k(x, x')$ appears in the dual representation
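
The dual form can be sketched as follows, under stated assumptions: the multipliers, labels, and training points are made-up numbers, the default kernel is a plain dot product (so $\phi$ is the identity and $w$ can be written out for comparison), and a score of zero is mapped to +1 by convention. Any function of two examples, such as a string kernel, could be passed in place of `linear_kernel`.

```python
def linear_kernel(u, v):
    """k(u, v) = u . v, i.e. phi is the identity map."""
    return sum(a * b for a, b in zip(u, v))


def predict_dual(alphas, labels, examples, b, x, kernel=linear_kernel):
    """h(x) = sgn(sum_i alpha_i * y_i * k(x, x_i) + b)."""
    score = sum(a * y * kernel(x, x_i)
                for a, y, x_i in zip(alphas, labels, examples)) + b
    return 1 if score >= 0 else -1


def primal_weights(alphas, labels, examples):
    """w = sum_i alpha_i * y_i * phi(x_i), only possible when phi is explicit."""
    dim = len(examples[0])
    return [sum(a * y * x_i[d] for a, y, x_i in zip(alphas, labels, examples))
            for d in range(dim)]
```

Note that `predict_dual` never touches a feature vector directly: every access to the data goes through `kernel`, which is why sequences of different lengths pose no problem once a suitable kernel is available.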

  12. The charge rule
      The simplest method for coreceptor usage prediction (Fouchier et al. 1992).
      1. Build a multiple alignment with all sequences
      2. Check the (basic) charge of positions 11 and 25 only
      Drawbacks
      • Some sequences need to be discarded to have a good alignment
      • Using only 2 positions uses only a small part of the information in the data
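
A minimal sketch of the charge rule, assuming the input is a V3 sequence that has already been aligned so that positions 11 and 25 (1-based) are meaningful. The decision convention used here (predict CXCR4 usage when lysine or arginine appears at either position) is the commonly stated form of the rule and is an assumption, since the slide does not spell it out; the function name is hypothetical.

```python
BASIC_RESIDUES = {"K", "R"}  # lysine and arginine carry a positive charge


def charge_rule(aligned_v3, positions=(11, 25)):
    """Return +1 (CXCR4-using) if a basic residue sits at any of the given
    1-based alignment positions, otherwise -1."""
    for position in positions:
        index = position - 1  # convert to 0-based indexing
        if index < len(aligned_v3) and aligned_v3[index] in BASIC_RESIDUES:
            return 1
    return -1
```

The sketch makes both drawbacks visible: it only works on aligned input, and every residue outside the two inspected columns is ignored.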

  13. Other methods
      • SVM (support vector machines) with linear kernel
      • Random forests
      • Neural networks
      Issues
      Multiple alignments are needed in all cases because those methods need the same number of attributes for each example. Many sequences have to be discarded to yield a good multiple alignment, so the maximum amount of information is not used.

  14. Our solution
      • SVM with string kernels instead of linear kernels
      • We describe a new string kernel: the distant segments kernel
      Pros
      1. no multiple alignment needed at all.
      2. string kernels are natural similarity measures.
      3. V3 sequences don't need to be aligned.
      4. can be applied to a great number of biologically similar questions

  15. Summary
      1. We define a new kernel for HIV-1 coreceptor usage prediction
      2. We compare it to existing kernels (data not shown) and we show that multiple alignments are not necessary

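  16. The distant segments kernel
      Let the following set be the occurrences in $s$ of contiguous subsequences of exactly $\delta$ symbols beginning with the segment $\alpha$ and ending with the segment $\alpha'$:
      $$S^{\delta}_{\alpha,\alpha'}(s) \overset{\text{def}}{=} \{(\mu, \alpha, \nu, \alpha', \mu') : s = \mu\alpha\nu\alpha'\mu' \,\wedge\, 1 \le |\alpha| \,\wedge\, 1 \le |\alpha'| \,\wedge\, 0 \le |\nu| \,\wedge\, \delta = |s| - |\mu| - |\mu'|\}$$
      Then, let the mapping function be the size of such sets for many $(\delta, \alpha, \alpha')$:
      $$\phi^{\delta_m,\theta_m}_{DS}(s) \overset{\text{def}}{=} \left( \left| S^{\delta}_{\alpha,\alpha'}(s) \right| \right)_{\{(\delta,\alpha,\alpha') \,:\, 1 \le |\alpha| \le \theta_m \,\wedge\, 1 \le |\alpha'| \le \theta_m \,\wedge\, |\alpha| + |\alpha'| \le \delta \le \delta_m\}}$$
      The kernel is the inner product of sequences in feature space.
      $$k^{\delta_m,\theta_m}_{DS}(s, t) \overset{\text{def}}{=} \left\langle \phi^{\delta_m,\theta_m}_{DS}(s), \phi^{\delta_m,\theta_m}_{DS}(t) \right\rangle$$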
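
The definitions above can be read almost literally as code. The following Python sketch is a brute-force reading of them, written for clarity rather than speed, and it should not be taken as the authors' implementation. The function names are hypothetical; `delta_max` and `theta_max` stand for $\delta_m$ and $\theta_m$.

```python
from collections import Counter


def distant_segments_features(s, delta_max, theta_max):
    """Count, for each (delta, alpha, alpha'), the windows of s of length delta
    that start with alpha and end with alpha' (segments of length <= theta_max)."""
    features = Counter()
    for start in range(len(s)):
        # delta >= 2 because |alpha| + |alpha'| >= 2; the window must fit in s.
        for delta in range(2, min(delta_max, len(s) - start) + 1):
            window = s[start:start + delta]
            for len_a in range(1, min(theta_max, delta - 1) + 1):
                # alpha and alpha' may not overlap: |alpha| + |alpha'| <= delta.
                for len_b in range(1, min(theta_max, delta - len_a) + 1):
                    key = (delta, window[:len_a], window[delta - len_b:])
                    features[key] += 1
    return features


def distant_segments_kernel(s, t, delta_max, theta_max):
    """Inner product of the two feature vectors, over features present in s."""
    phi_s = distant_segments_features(s, delta_max, theta_max)
    phi_t = distant_segments_features(t, delta_max, theta_max)
    return sum(count * phi_t[key] for key, count in phi_s.items())
```

Since each feature records a pair of segments together with the distance they span, two sequences of different lengths can be compared directly, with no alignment step.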

  17. Comparison for CXCR4
      • charge rule (Pillai et al. 2003): 87.45%
      • SVM with linear kernel (Pillai et al. 2003): 90.86%
      • SVM with structural descriptors (Sander et al. 2007): 91.56%
      • SVM with distant segments kernel: 94.80%
      • Our method is the only one without multiple alignments!
      • we used a test set to validate our classifier, whereas the other methods rely on cross-validation (which is biased)

  18. Perspectives
      • Sequencing technologies are improving (Roche/454, Illumina/Solexa, ABI SOLiD)
      • Machine learning is an emerging science (multiple kernel learning, theoretical risk bounds)
      • The next generation of bioinformatics programs for the prediction of HIV-1 coreceptor usage promises improvements for treatment selection in clinical settings.
      • Submitted to the journal Retrovirology

  19. Acknowledgements
      • Mario Marchand, François Laviolette, Jacques Corbeil
      • Canadian Institutes of Health Research
      • Natural Sciences and Engineering Research Council of Canada
      • Canada Research Chair in Medical Genomics
      • Los Alamos National Laboratory HIV Databases

  20. Links
      • Web server: genome.ulaval.ca/hiv-dskernel
      • Our machine learning research group: www.graal.ift.ulaval.ca
      • Jacques Corbeil's group: genome.ulaval.ca/corbeillab
      • Machine learning course: cours.ift.ulaval.ca/65764
      • Kernel methods: www.kernel-methods.net
      • Support vector machines: www.support-vector.net
