Protein Fold Recognition with Recurrent Kernel Networks


  1. Protein Fold Recognition with Recurrent Kernel Networks. Dexiong Chen (1), Laurent Jacob (2), Julien Mairal (1). (1) Inria Grenoble, (2) CNRS/LBBE Lyon. MLCB 2019, Vancouver.

  2. Sequence modeling as a supervised learning problem

  3. Sequence modeling as a supervised learning problem. Biological sequences $x_1, \dots, x_n \in \mathcal{X}$ and their associated labels $y_1, \dots, y_n$. Goal: learning a predictive and interpretable function $f : \mathcal{X} \to \mathbb{R}$ by solving
     $$\min_{f \in \mathcal{F}} \; \underbrace{\frac{1}{n} \sum_{i=1}^n L(y_i, f(x_i))}_{\text{empirical risk, data fit}} + \underbrace{\mu \, \Omega(f)}_{\text{regularization}}$$
     How do we define the functional space $\mathcal{F}$?
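As a toy illustration (not from the paper), with a linear model $f(x) = \langle w, \psi(x)\rangle$ on precomputed features, a squared loss, and $\Omega(f) = \|w\|^2$, the objective can be written as below; the names `Phi` and `mu` are ours.

```python
# Minimal sketch of the regularized empirical-risk objective, assuming a
# linear model on precomputed features and a squared loss. Illustrative only.
import numpy as np

def objective(w, Phi, y, mu):
    """(1/n) sum_i L(y_i, f(x_i)) + mu * Omega(f), with f(x_i) = Phi[i] @ w."""
    residuals = Phi @ w - y
    empirical_risk = np.mean(residuals ** 2)  # data-fit term
    regularization = mu * (w @ w)             # penalizes complex models f
    return empirical_risk + regularization

def fit_ridge(Phi, y, mu):
    """Closed-form minimizer of the objective above (ridge regression)."""
    n, p = Phi.shape
    return np.linalg.solve(Phi.T @ Phi / n + mu * np.eye(p), Phi.T @ y / n)
```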

  4. Convolutional kernel networks. Using a string kernel to define $\mathcal{F}$ [Chen et al., 2019]:
     $$K_{\mathrm{CKN}}(x, x') = \sum_{i=1}^{|x|} \sum_{j=1}^{|x'|} K_0(\underbrace{x[i:i+k]}_{\text{one } k\text{-mer}}, \; x'[j:j+k])$$
     Kernel methods map data to a high- or infinite-dimensional Hilbert space $\mathcal{F}$ (RKHS). Predictive models $f$ in $\mathcal{F}$ are linear forms: $f(x) = \langle f, \varphi(x) \rangle_{\mathcal{F}}$. Example, one-hot encoding of a 5-mer:
     $$x[i:i+5] := \mathrm{TTGAG} \;\mapsto\; \begin{array}{c|ccccc} & \mathrm{T} & \mathrm{T} & \mathrm{G} & \mathrm{A} & \mathrm{G} \\ \hline \mathrm{A} & 0 & 0 & 0 & 1 & 0 \\ \mathrm{T} & 1 & 1 & 0 & 0 & 0 \\ \mathrm{C} & 0 & 0 & 0 & 0 & 0 \\ \mathrm{G} & 0 & 0 & 1 & 0 & 1 \end{array}$$
     [Leslie et al., 2002, 2004]

  5. Convolutional kernel networks. Using a string kernel to define $\mathcal{F}$ [Chen et al., 2019]:
     $$K_{\mathrm{CKN}}(x, x') = \sum_{i=1}^{|x|} \sum_{j=1}^{|x'|} K_0(x[i:i+k], \; x'[j:j+k])$$
     Kernel methods map data to a high- or infinite-dimensional Hilbert space $\mathcal{F}$ (RKHS). Predictive models $f$ in $\mathcal{F}$ are linear forms: $f(x) = \langle f, \varphi(x) \rangle_{\mathcal{F}}$. Here $K_0$ is a Gaussian kernel over one-hot representations of k-mers (in $\mathbb{R}^{k \times d}$), a continuous relaxation of the mismatch kernel, and $\varphi(x) := \sum_{i=1}^{|x|} \varphi_0(x[i:i+k])$ with $\varphi_0 : z \mapsto e^{-\alpha/2 \, \|z - \cdot\|^2}$ the kernel mapping associated with $K_0$. [Leslie et al., 2002, 2004]
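A naive NumPy sketch of $K_{\mathrm{CKN}}$ for intuition, assuming a DNA alphabet and illustrative parameter names (`ALPHABET`, `alpha`, `k`); the released code computes this far more efficiently, and boundary handling may differ.

```python
# Naive sketch of K_CKN: a Gaussian kernel K_0 compared over all pairs of
# contiguous k-mers (one-hot encoded). Illustrative only, not the paper's code.
import numpy as np

ALPHABET = "ACGT"                        # assumed alphabet for this sketch
IDX = {c: i for i, c in enumerate(ALPHABET)}

def one_hot(kmer):
    """One-hot encoding of a k-mer as a (k, d) matrix."""
    z = np.zeros((len(kmer), len(ALPHABET)))
    z[np.arange(len(kmer)), [IDX[c] for c in kmer]] = 1.0
    return z

def k0(z1, z2, alpha=1.0):
    """Gaussian kernel over one-hot k-mer representations in R^{k x d}."""
    return np.exp(-alpha / 2.0 * np.sum((z1 - z2) ** 2))

def k_ckn(x, xp, k=5, alpha=1.0):
    """Sum of K_0 over all pairs of contiguous k-mers of x and x'."""
    return sum(
        k0(one_hot(x[i:i + k]), one_hot(xp[j:j + k]), alpha)
        for i in range(len(x) - k + 1)
        for j in range(len(xp) - k + 1)
    )

print(k_ckn("TTGAGCA", "TTGAGCG"))       # similar sequences -> larger value
```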

  6. Mixing kernel methods with CNNs.
     Kernel method: rich infinite-dimensional models may be learned; regularization is natural, since $|f(x) - f(x')| \leq \|f\|_{\mathcal{F}} \, \|\varphi(x) - \varphi(x')\|_{\mathcal{F}}$; representation and classifier learning are decoupled. Limitation: scalability.
     Mixing kernels with CNNs using approximation: scalable, task-adaptive, and data-efficient representations; no tricks (dropout, batch normalization), parameter-free initialization. Two ways of learning: Nystrom approximation and end-to-end learning with back-propagation.

  7. Convolutional kernel networks (Nystrom approximation). Finite-dimensional projection of the kernel map: given a set of anchor points $Z := (z_1, \dots, z_q)$, we project $\varphi_0(x)$ for any k-mer $x$ orthogonally onto $E_0 = \mathrm{span}(\varphi_0(z_1), \dots, \varphi_0(z_q))$, such that $K_0(x, x') \approx \langle \psi_0(x), \psi_0(x') \rangle_{\mathbb{R}^q}$. An approximate feature map of a sequence $x$ is
     $$\psi(x) = \sum_{i=1}^{|x|} \psi_0(x[i:i+k]) \in \mathbb{R}^q$$
     Then solve the linear classification problem
     $$\min_{w \in \mathbb{R}^q} \sum_{i=1}^n L(w^\top \psi(x_i), y_i) + \mu \|w\|^2.$$
     [Figure: $\varphi_0(x)$ and $\varphi_0(x')$ in the Hilbert space $\mathcal{F}$ are projected onto the subspace $E_0 = \mathrm{span}(\varphi_0(z_1), \dots, \varphi_0(z_q))$, giving $\psi_0(x)$ and $\psi_0(x')$.]
     [Williams and Seeger, 2001, Zhang et al., 2008]
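A small sketch of the Nystrom projection under the usual construction $\psi_0(x) = K_{ZZ}^{-1/2} \, [K_0(z_1, x), \dots, K_0(z_q, x)]^\top$; anchor points are plain vectors here, and all names are ours.

```python
# Sketch of the Nystrom approximation for a Gaussian kernel K_0: project onto
# the span of q anchor points Z so that <psi0(x), psi0(x')> ~ K_0(x, x').
import numpy as np

def gaussian_gram(A, B, alpha=1.0):
    """Gram matrix K_0(a_i, b_j) for rows of A and B."""
    d2 = np.maximum(
        np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T, 0.0)
    return np.exp(-alpha / 2.0 * d2)

def nystrom_map(Z, alpha=1.0, eps=1e-6):
    """Build psi0 from anchors Z (q, dim): psi0(x) = K_ZZ^{-1/2} K_0(Z, x)."""
    Kzz = gaussian_gram(Z, Z, alpha)
    w, V = np.linalg.eigh(Kzz)                       # Kzz is symmetric PSD
    Kzz_inv_sqrt = V @ np.diag(1.0 / np.sqrt(np.maximum(w, eps))) @ V.T
    return lambda X: gaussian_gram(X, Z, alpha) @ Kzz_inv_sqrt  # (n, q)

# usage: q = 16 anchors for k-mers embedded in R^{k*d} (random stand-ins here)
Z = np.random.randn(16, 20)
psi0 = nystrom_map(Z)
X = np.random.randn(5, 20)
print(psi0(X).shape)                                 # (5, 16)
```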

  8. Convolutional kernel networks (end-to-end kernel learning). Nystrom approximation and end-to-end training: same finite-dimensional projection onto $E_0 = \mathrm{span}(\varphi_0(z_1), \dots, \varphi_0(z_q))$, with approximate feature map $\psi(x) = \sum_{i=1}^{|x|} \psi_0(x[i:i+k]) \in \mathbb{R}^q$, but the anchor points are now learned jointly with the classifier:
     $$\min_{w \in \mathbb{R}^q, \, Z} \sum_{i=1}^n L(w^\top \psi(x_i), y_i) + \mu \|w\|^2.$$
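A hedged PyTorch sketch of the end-to-end variant: the anchor points $Z$ become trainable parameters, so the Nystrom projection is differentiated through during back-propagation. Shapes, names, and initialization are ours, not the released code.

```python
# Sketch of an end-to-end Nystrom layer: anchors Z are nn.Parameters, so the
# projection K_ZZ^{-1/2} K_0(Z, .) is learned by back-propagation together
# with the linear classifier on top. Illustrative only.
import torch
import torch.nn as nn

class NystromLayer(nn.Module):
    def __init__(self, q, k, d, alpha=1.0, eps=1e-6):
        super().__init__()
        self.Z = nn.Parameter(torch.randn(q, k * d))  # anchor k-mers, flattened
        self.alpha, self.eps = alpha, eps

    def forward(self, kmers):                         # kmers: (n_kmers, k*d)
        k_zx = torch.exp(-self.alpha / 2 * torch.cdist(self.Z, kmers) ** 2)
        k_zz = torch.exp(-self.alpha / 2 * torch.cdist(self.Z, self.Z) ** 2)
        w, V = torch.linalg.eigh(k_zz)                # differentiable K_ZZ^{-1/2}
        inv_sqrt = V @ torch.diag(w.clamp_min(self.eps).rsqrt()) @ V.T
        psi0 = inv_sqrt @ k_zx                        # (q, n_kmers)
        return psi0.sum(dim=1)                        # pool over positions: psi(x)
```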

  9. Convolutional kernel networks (end-to-end kernel learning). With the Nystrom approximation and end-to-end training, solve
     $$\min_{w \in \mathbb{R}^q, \, Z} \sum_{i=1}^n L(w^\top \psi(x_i), y_i) + \mu \|w\|^2.$$
     CKN kernels only take contiguous k-mers into account. Limitation: they are unable to capture gapped motifs (e.g. useful to model genetic insertions).

  10. From k-mers to gapped k-mers. Gap-allowed k-mers: for a sequence $x = x_1 \dots x_n \in \mathcal{X}$ of length $n$ and a sequence of ordered indices $\mathbf{i} \in \mathcal{I}(k, n)$, we define a k-substring as $x[\mathbf{i}] = x_{i_1} x_{i_2} \dots x_{i_k}$. The number of gaps in the substring, $\mathrm{gaps}(\mathbf{i})$, is the number of positions skipped between consecutive indices. Example (see the sketch below): $x = \mathrm{BAARACADACRB}$, $\mathbf{i} = (4, 5, 8, 9, 11)$, $x[\mathbf{i}] = \mathrm{RADAR}$, $\mathrm{gaps}(\mathbf{i}) = 3$.
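A small sketch of gapped k-mer enumeration and the gap count, reproducing the RADAR example; `gapped_kmers` and `gaps` are our names.

```python
# Enumerate gapped k-mers x[i] for ordered index tuples i in I(k, n), and
# count gaps(i): the positions skipped inside the span of the tuple.
from itertools import combinations

def gaps(i):
    """Number of skipped positions between first and last index (1-based)."""
    return (i[-1] - i[0] + 1) - len(i)

def gapped_kmers(x, k):
    """Yield (indices, substring, gap count) for every i in I(k, |x|)."""
    for i in combinations(range(1, len(x) + 1), k):
        yield i, "".join(x[p - 1] for p in i), gaps(i)

x = "BAARACADACRB"
i = (4, 5, 8, 9, 11)
print("".join(x[p - 1] for p in i), gaps(i))   # RADAR 3
```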

  11. Recurrent kernel networks. Comparing all the k-mers between a pair of sequences:
     $$K_{\mathrm{CKN}}(x, x') = \sum_{i=1}^{|x|} \sum_{j=1}^{|x'|} K_0\left(x[i:i+k], \; x'[j:j+k]\right)$$
     [Lodhi et al., 2002, Lei et al., 2017]

  12. Recurrent kernel networks. Comparing all the gapped k-mers between a pair of sequences:
     $$K_{\mathrm{RKN}}(x, x') = \sum_{\mathbf{i} \in \mathcal{I}(k, |x|)} \sum_{\mathbf{j} \in \mathcal{I}(k, |x'|)} \lambda^{\mathrm{gaps}(\mathbf{i})} \lambda^{\mathrm{gaps}(\mathbf{j})} K_0\left(x[\mathbf{i}], x'[\mathbf{j}]\right)$$
     A larger set of partial patterns (i.e. gapped k-mers) is taken into account, and $\lambda^{\mathrm{gaps}(\mathbf{i})}$ penalizes the gaps. The kernel mapping is $\varphi(x) = \sum_{\mathbf{i} \in \mathcal{I}(k, |x|)} \lambda^{\mathrm{gaps}(\mathbf{i})} \varphi_0(x[\mathbf{i}])$: a continuous relaxation of the substring kernel. [Lodhi et al., 2002, Lei et al., 2017]
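For intuition only, a brute-force version of $K_{\mathrm{RKN}}$, reusing `one_hot` and `k0` from the CKN sketch and `gapped_kmers` from the previous sketch; its cost is exponential, which motivates the recursion on the next slide.

```python
# Brute-force K_RKN: sum over ALL pairs of gapped k-mers, each weighted by
# lam**gaps(.). Only usable on tiny inputs; the dynamic program on the next
# slide computes the same quantity efficiently. Reuses one_hot/k0 from the
# CKN sketch and gapped_kmers from the gapped k-mer sketch above.
def k_rkn_naive(x, xp, k=3, lam=0.5, alpha=1.0):
    total = 0.0
    for _, sub_i, g_i in gapped_kmers(x, k):
        for _, sub_j, g_j in gapped_kmers(xp, k):
            total += lam**g_i * lam**g_j * k0(one_hot(sub_i), one_hot(sub_j), alpha)
    return total

print(k_rkn_naive("ACGT", "AGT", k=2))   # sequences must use ALPHABET letters
```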

  13. Approximation and recursive computation of RKN. The approximate feature map of $K_{\mathrm{RKN}}$ via the Nystrom approximation is
     $$\psi(x) = \sum_{\mathbf{i} \in \mathcal{I}(k, |x|)} \lambda^{\mathrm{gaps}(\mathbf{i})} \psi_0(x[\mathbf{i}]).$$
     Exhaustive enumeration of all substrings can be exponentially costly, but the sum can be computed fast using dynamic programming [Lodhi et al., 2002, Lei et al., 2017]. This leads to a particular recurrent neural network with a kernel interpretation; a sketch of the recursion follows.
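A sketch of one standard dynamic program for this kind of gap-weighted comparison (in the spirit of Lodhi et al., 2002), not the exact recursion of the RKN paper: since the Gaussian $K_0$ factorizes over positions of one-hot encodings, the double sum can be accumulated in $O(k\,|x|\,|x'|)$ time. Names and the per-letter kernel `kappa` are ours; on tiny inputs it matches the brute-force `k_rkn_naive` above.

```python
# Dynamic program for the gap-weighted sum: B[i, j] accumulates weighted
# gapped l-mers ending exactly at x[i] and x'[j]; D[i, j] holds geometric
# prefix sums of B with gap penalty lam, so each extension costs O(1).
import numpy as np

def kappa(a, b, alpha=1.0):
    """Per-letter Gaussian kernel: ||onehot(a) - onehot(b)||^2 is 0 or 2."""
    return float(np.exp(-alpha * (a != b)))

def k_rkn_dp(x, xp, k=3, lam=0.5, alpha=1.0):
    n, m = len(x), len(xp)
    D = np.ones((n + 1, m + 1))          # D_0 = 1: empty prefix, no gap yet
    for l in range(1, k + 1):
        B = np.zeros((n + 1, m + 1))
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                # extend gap-weighted (l-1)-prefixes by a matched letter pair
                B[i, j] = kappa(x[i - 1], xp[j - 1], alpha) * D[i - 1, j - 1]
        if l == k:
            return B.sum()               # sum over all ending positions
        Dn = np.zeros((n + 1, m + 1))
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                # D[i, j] = sum_{i'<=i, j'<=j} lam^(i-i') lam^(j-j') B[i', j']
                Dn[i, j] = (B[i, j] + lam * Dn[i - 1, j] + lam * Dn[i, j - 1]
                            - lam * lam * Dn[i - 1, j - 1])
        D = Dn

print(k_rkn_dp("ACGT", "AGT", k=2))      # agrees with k_rkn_naive above
```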

  14. Results. Protein fold classification on SCOP 2.06 [Hou et al., 2017] (multi-class classification, using more informative sequence features including PSSM, secondary structure, and solvent accessibility):

| Method            | #Params | Top-1 | Top-5 | Family (top1/top5) | Superfamily (top1/top5) | Fold (top1/top5) |
|-------------------|---------|-------|-------|--------------------|-------------------------|------------------|
| PSI-BLAST         | -       | 84.53 | 86.48 | 82.20/84.50        | 86.90/88.40             | 18.90/35.10      |
| DeepSF            | 920k    | 73.00 | 90.25 | 75.87/91.77        | 72.23/90.08             | 51.35/67.57      |
| CKN (128 filters) | 211k    | 76.30 | 92.17 | 83.30/94.22        | 74.03/91.83             | 43.78/67.03      |
| CKN (512 filters) | 843k    | 84.11 | 94.29 | 90.24/95.77        | 82.33/94.20             | 45.41/69.19      |
| RKN (128 filters) | 211k    | 77.82 | 92.89 | 76.91/93.13        | 78.56/92.98             | 60.54/83.78      |
| RKN (512 filters) | 843k    | 85.29 | 94.95 | 84.31/94.80        | 85.99/95.22             | 71.35/84.86      |

     Note: more experiments with statistical tests are reported in the paper. [Hou et al., 2017, Chen et al., 2019]

  15. Availability. Our PyTorch code is freely available at https://gitlab.inria.fr/dchen/CKN-seq and https://github.com/claying/RKN

  16. References
     D. Chen, L. Jacob, and J. Mairal. Biological sequence modeling with convolutional kernel networks. Bioinformatics, 35(18):3294-3302, 2019.
     S. Hochreiter, M. Heusel, and K. Obermayer. Fast model-based protein homology detection without alignment. Bioinformatics, 23(14):1728-1736, 2007.
     J. Hou, B. Adhikari, and J. Cheng. DeepSF: deep convolutional neural network for mapping protein sequences to folds. Bioinformatics, 34(8):1295-1303, 2017. doi: 10.1093/bioinformatics/btx780.
     T. Lei, W. Jin, R. Barzilay, and T. Jaakkola. Deriving neural architectures from sequence and graph kernels. In International Conference on Machine Learning (ICML), 2017.
     C. Leslie, E. Eskin, J. Weston, and W. S. Noble. Mismatch string kernels for SVM protein classification. In Advances in Neural Information Processing Systems 15. MIT Press, 2003. URL http://www.cs.columbia.edu/~cleslie/papers/mismatch-short.pdf.
     C. S. Leslie, E. Eskin, and W. S. Noble. The spectrum kernel: a string kernel for SVM protein classification. In Pacific Symposium on Biocomputing, volume 7, pages 566-575, Hawaii, USA, 2002.
     C. S. Leslie, E. Eskin, A. Cohen, J. Weston, and W. S. Noble. Mismatch string kernels for discriminative protein classification. Bioinformatics, 20(4):467-476, 2004.
