
Machine Learning for Sequence Learning
Learning in an All-Subsequence Space
Severin Gsponer, Georgiana Ifrim, Barry Smyth
January 20, 2016

Outline
• Background
• Linear Classifiers for Sequences
• SEQL Approach
• Contribution
• Future Work
Insight Centre for Data Analytics January 20, 2016

Background for Sequence Learning
Definition of a sequence
A sequence consists of symbols from a given finite alphabet Σ in a given order: $s_0, s_1, \ldots, s_n$
Examples
• Genetic sequence: AGCTGTTCGT, |Σ| = 4, Σ = {A, C, G, T}
• Protein sequence: KVKTGCKATLR, |Σ| = 20
• Text: The house is blue, |Σ| = 4 (number of distinct words in the corpus)

Sequence Classification
Class   Data points
+1      C70124045 F0*EE*AD C00E9D64A000C 6689 CCF1C70
+1      7413BAEF01000 6689 51488B7000 F0*EE*AD 00081CA
-1      08F9C81A80 C18B484 000895110B8040000C20C00CCC
-1      CCCFF8CC84C8B5C8B C18B484 C8B505C8340240481
Find subsequences that can be used to identify the class.
??      CC8CC84C8BC8B458B4CC0F82B505FB4C83B4B0481

Related Work
Bag of Words
• Loss of structural order (e.g., Mary is faster than John)
• Often not accurate enough
Kernel SVM
• Lift into an implicit high-dimensional feature space through the kernel trick
• Restrict features to scale (e.g., max 5-gram)
• Not easily interpretable (black box)
SEQL (Our Approach)
• Works in an explicit high-dimensional feature space
• Unrestricted features (i.e., subsequences of all lengths)
• Interpretable classifier (white box)

All-Subsequence Feature Space
Sample sequence: ...F09EE1AD...
Uni-gram (all): 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F (16 possible)
Bi-gram: F0, 09, 9E, EE, 1A, ... ($16^2 = 256$ possible)
Tri-gram: F09, 09E, EE1, E1A, 1AD, ... ($16^3 = 4096$ possible)
...
8-gram: F09EE1AD, ... ($16^8 = 4294967296$ possible)
Representation of a sequence in the explicit vector space of all subsequences:
Features: 0, 1, 2, 3, 4, ..., F, 00, 01, 02, 03, ..., FF, 000, 001, ...
$x_i = (1, 1, 0, 0, 0, \ldots, 1, 1, 0, 0, 1, \ldots, 1, 0, 0, \ldots)$
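The feature space above can be illustrated with a minimal Python sketch. It assumes that "subsequence" means a contiguous n-gram, as in the examples on this slide; the function names `all_subsequences`, `feature_space_size`, and `indicator_vector` are hypothetical and not taken from the SEQL implementation.

```python
def all_subsequences(seq):
    """Return the set of all contiguous substrings (n-grams) of seq."""
    n = len(seq)
    return {seq[i:i + k] for k in range(1, n + 1) for i in range(n - k + 1)}


def feature_space_size(alphabet_size, k):
    """Number of possible k-grams over an alphabet of the given size."""
    return alphabet_size ** k


def indicator_vector(seq, features):
    """Binary vector: 1 if the feature occurs in seq, else 0."""
    return [1 if f in seq else 0 for f in features]


features = ["0", "1", "2", "F", "F0", "EE", "FF", "F09"]
vector = indicator_vector("F09EE1AD", features)
print(len(all_subsequences("F09EE1AD")), vector)
```

The explicit vector is only practical for a handful of features; the counts returned by `feature_space_size` show why the full space cannot be materialized.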

Linear Sequence Classifier
Given: Training set of labeled examples $\{(x_i, y_i)\}$ for $i = 1, \ldots, N$, where $y_i \in \{-1, 1\}$ and $x_i \in \mathbb{R}^d$ with $d$ = number of features
Goal: Find $\beta = (\beta_1, \beta_2, \ldots, \beta_d)$, $\beta_j \in \mathbb{R}$, by optimizing:
$$\beta^* = \arg\min_{\beta \in \mathbb{R}^d} L(\beta) = \arg\min_{\beta \in \mathbb{R}^d} \sum_{i=1}^{N} \xi(y_i, x_i, \beta) + C R(\beta)$$
Classical gradient descent is computationally infeasible for a large feature space:
$$\beta^{(t)} = \beta^{(t-1)} - \eta_t \nabla L(\beta^{(t-1)})$$
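As a rough illustration of the objective and of one classical gradient descent step, here is a small pure-Python sketch. The slide leaves the loss ξ and the regularizer R unspecified, so the logistic loss and the squared L2 regularizer used here are assumptions, as are all function names.

```python
import math


def dot(beta, x):
    return sum(b * v for b, v in zip(beta, x))


def objective(X, Y, beta, C=0.1):
    """L(beta) = sum_i xi(y_i, x_i, beta) + C * R(beta).

    Assumptions: xi is the logistic loss, R(beta) = 0.5 * ||beta||^2.
    """
    data_term = sum(math.log1p(math.exp(-y * dot(beta, x))) for x, y in zip(X, Y))
    return data_term + C * 0.5 * sum(b * b for b in beta)


def gradient(X, Y, beta, C=0.1):
    """Full gradient of L; it touches every one of the d coordinates."""
    grad = [C * b for b in beta]
    for x, y in zip(X, Y):
        coeff = -y / (1.0 + math.exp(y * dot(beta, x)))
        for j, v in enumerate(x):
            grad[j] += coeff * v
    return grad


def gradient_descent_step(X, Y, beta, eta, C=0.1):
    """beta(t) = beta(t-1) - eta_t * grad L(beta(t-1))."""
    return [b - eta * g for b, g in zip(beta, gradient(X, Y, beta, C))]


X = [[1, 0], [0, 1]]
Y = [1, -1]
beta0 = [0.0, 0.0]
beta1 = gradient_descent_step(X, Y, beta0, eta=0.5)
print(objective(X, Y, beta0), objective(X, Y, beta1))
```

Each step costs time proportional to d, which is the reason the full update is impractical when d is the number of all possible subsequences.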

SEQL
Algorithm 1 SEQL workflow
  Set $\beta^{(0)} = 0$
  while termination condition not met do
    Calculate objective function $L(\beta^{(t)})$
    Find feature $j_t$ with maximum gradient value
    Find step length $\eta_t$ by line search
    Update $\beta^{(t)} = \beta^{(t-1)} - \eta_t \frac{\partial L}{\partial \beta_{j_t}}(\beta^{(t-1)})$ (only coordinate $j_t$ changes)
    Add corresponding feature to feature set
  end while
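The loop in Algorithm 1 can be sketched in Python as a greedy coordinate descent. This is a simplified illustration, not the SEQL implementation: it enumerates every substring explicitly instead of searching the feature space with a gradient bound, it assumes an unregularized logistic loss, and it uses a halving line search. All names and the toy sequences are hypothetical.

```python
import math


def substrings(seq):
    n = len(seq)
    return {seq[i:i + k] for k in range(1, n + 1) for i in range(n - k + 1)}


def score(seq, beta):
    """Linear score: sum of weights of the selected features present in seq."""
    return sum(w for f, w in beta.items() if f in seq)


def loss(seqs, labels, beta):
    return sum(math.log1p(math.exp(-y * score(s, beta))) for s, y in zip(seqs, labels))


def coordinate_gradients(seqs, labels, beta):
    """Partial derivative of the loss for every substring feature (brute force)."""
    grads = {}
    for s, y in zip(seqs, labels):
        coeff = -y / (1.0 + math.exp(y * score(s, beta)))
        for f in substrings(s):
            grads[f] = grads.get(f, 0.0) + coeff
    return grads


def seql_sketch(seqs, labels, iterations=5):
    beta = {}  # sparse weight vector: only selected features are stored
    for _ in range(iterations):
        grads = coordinate_gradients(seqs, labels, beta)
        best = max(grads, key=lambda f: abs(grads[f]))  # max-gradient feature
        g = grads[best]
        current, eta, improved = loss(seqs, labels, beta), 1.0, False
        while eta > 1e-6 and not improved:  # halving line search
            trial = dict(beta)
            trial[best] = trial.get(best, 0.0) - eta * g
            if loss(seqs, labels, trial) < current:
                beta, improved = trial, True
            eta /= 2
        if not improved:
            break
    return beta


seqs = ["C70F0EEAD6689", "74136689F0EEAD", "08F9C18B484000", "CCCC18B484C8B"]
labels = [1, 1, -1, -1]
model = seql_sketch(seqs, labels)
print(model)
```

The resulting dictionary holds only the few subsequences that were selected, which is what makes the classifier readable.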

Contribution
1. Study influence of problem characteristics on classification performance (simulation)
2. Extend SEQL approach to regression (gradient bound for squared error loss)
3. Real-world applications

Contribution 1: Simulation
Dimensions
• Alphabet size |Σ|
• Sequence length L
• Data set size N
• Motif length m
• Sparsity of the feature space
• Noise in the motifs

Contribution 1: Analysis
Accuracy
• Classification performance (ACC, AUC, F1, ...)
Speed
• Number of iterations
• Quality of gradient bound (pruning ratio)
• Run time
Interpretability
• Number of produced features

Contribution 1: Simulation Framework
Systematic experiments on generated sequences:
Generation of N sequences of length L: $l_1, l_2, \ldots, l_L$ where $l_i \sim U(\text{Alphabet})$
Insert motifs of length m in positive sequences.
Ratio of positive to negative sequences is 1:10

Contribution 1: Data Generation
1. Random generation of a motif
2. Determine motif insertion position randomly for each sequence
3. Random generation of sequence and insertion of motif at position

Contribution 1: Data Generation
Algorithm 2 Positive sequences generation
  Generate motif by drawing m symbols from $U(\text{Alphabet})$
  for i < N · 0.1 do
    pos $\sim U(\{0, \ldots, L - m\})$
    for l = 0, ..., L − m do
      if l = pos then
        add motif to sequence
      else
        add symbol $s \sim U(\text{Alphabet})$ to sequence
      end if
    end for
    add sequence to data set
  end for
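A runnable sketch of this data generation in Python follows. It assumes that every sequence should have length exactly L, that negative sequences are purely random (so a motif can still occur in them by chance), and that the positive fraction is 0.1 as in Algorithm 2. The function name and its parameters are hypothetical.

```python
import random


def generate_dataset(n, length, motif_len, alphabet="ACGT", pos_fraction=0.1, seed=0):
    """Return (motif, data), where data is a list of (sequence, label) pairs."""
    rng = random.Random(seed)
    # Draw the motif uniformly from the alphabet.
    motif = "".join(rng.choice(alphabet) for _ in range(motif_len))
    n_pos = int(n * pos_fraction)
    data = []
    for i in range(n):
        # Start from a uniformly random sequence of the target length.
        symbols = [rng.choice(alphabet) for _ in range(length)]
        if i < n_pos:
            # Overwrite a random window with the motif; length stays unchanged.
            pos = rng.randint(0, length - motif_len)
            symbols[pos:pos + motif_len] = motif
            label = 1
        else:
            label = -1
        data.append(("".join(symbols), label))
    return motif, data


motif, data = generate_dataset(n=50, length=30, motif_len=6, alphabet="0123456789ABCDEF")
print(motif, data[0])
```

Overwriting a window of a random sequence yields the same distribution as building the sequence symbol by symbol and inserting the motif at the chosen position.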

Contribution 2: Extension to Regression
Value   Data points
+0.2    C70124045C00E9D64A000CCCF1C70
+1.4    7413BAEF0100051488B700000081CA
-3.2    08F9C81A80000895110B8040000C20
-0.1    CCF8CC84C8B5C8BC8B505C834024
Implementation of squared error loss and new gradient bound:
$$\sum_{i=1}^{N} \xi(y_i, x_i, \beta) = \sum_{i=1}^{N} (y_i - \beta^\top x_i)^2$$
With L1 regularization, this is known as the LASSO.
Questions
Influence of loss function and quality of the bound
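The squared error loss and its per-coordinate gradient can be written out in a few lines of Python. This sketch assumes explicit feature vectors and an L1 regularizer weighted by C; it does not include the gradient bound mentioned on the slide, and the function names are hypothetical.

```python
def dot(beta, x):
    return sum(b * v for b, v in zip(beta, x))


def squared_error(X, y, beta):
    """Sum over i of (y_i - beta^T x_i)^2."""
    return sum((yi - dot(beta, xi)) ** 2 for xi, yi in zip(X, y))


def lasso_objective(X, y, beta, C):
    """Squared error loss plus L1 regularization (LASSO)."""
    return squared_error(X, y, beta) + C * sum(abs(b) for b in beta)


def squared_error_partial(X, y, beta, j):
    """Partial derivative of the squared error with respect to beta_j."""
    return -2.0 * sum((yi - dot(beta, xi)) * xi[j] for xi, yi in zip(X, y))


X = [[1, 0], [1, 1]]
y = [0.2, 1.4]
print(squared_error(X, y, [0.0, 0.0]))
print(squared_error_partial(X, y, [0.0, 0.0], 0))
print(lasso_objective(X, y, [0.2, 1.2], C=0.5))
```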

Contribution 3: Real-World Application
Classification Task
Microsoft Malware Challenge (BIG 2015)
Kaggle competition in early 2015
Goal: Classification of malware into 9 families
Data: ∼500 GB of hexadecimal sequences
Regression Task
We are still looking for problem domains for sequence regression.

Future Work
Regression applications
Test on real-world applications.
Rescaling of features
TF-IDF style rescaling of features instead of a binary indicator [1], and analysis of its influence on the gradient bound quality.

References
[1] L. Miratrix and R. Ackerman. Conducting sparse feature selection on arbitrarily long phrases in text corpora with a focus on interpretability. pages 1-41, 2015.
