Sequential Supervised Learning
Many Application Problems Require Sequential Learning

- Part-of-speech Tagging
- Information Extraction from the Web
- Text-to-Speech Mapping
Part-of-Speech Tagging

Given an English sentence, can we assign a part of speech to each word?

"Do you want fries with that?"
<verb pron verb noun prep pron>
Information Extraction from the Web

<dl><dt><b>Srinivasan Seshan</b> (Carnegie Mellon University) <dt><a href=…><i>Making Virtual Worlds Real</i></a><dt>Tuesday, June 4, 2002<dd>2:00 PM, 322 Sieg<dd>Research Seminar

Each token is labeled with one of the fields {name, affiliation, title, date, time, location, event-type} or with * (irrelevant).
Text-to-Speech Mapping

"photograph" => /f-Ot@graf-/
Sequential Supervised Learning (SSL)

Given: A set of training examples of the form (X_i, Y_i), where X_i = <x_{i,1}, …, x_{i,T_i}> and Y_i = <y_{i,1}, …, y_{i,T_i}> are sequences of length T_i.

Find: A function f for predicting new sequences: Y = f(X).
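To make the notation concrete, here is a minimal Python sketch of how such a training set of (X_i, Y_i) pairs might be stored; the representation is illustrative, not something prescribed by the slides.

```python
# A toy SSL training set: each example pairs an input sequence X_i with a
# label sequence Y_i of the same length T_i (hypothetical representation).
training_set = [
    # Part-of-speech tagging example from the slides
    (["Do", "you", "want", "fries", "with", "that"],
     ["verb", "pron", "verb", "noun", "prep", "pron"]),
    # Text-to-speech example: letters -> phonemes ("-" marks a silent letter)
    (list("photograph"),
     ["f", "-", "O", "t", "@", "g", "r", "a", "f", "-"]),
]

for X, Y in training_set:
    assert len(X) == len(Y)   # each pair of sequences has the same length T_i
```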
Examples of Sequential Supervised Learning

Domain                   | Input X_i           | Output Y_i
Part-of-speech Tagging   | sequence of words   | sequence of parts of speech
Information Extraction   | sequence of tokens  | sequence of field labels {name, …}
Text-to-speech Mapping   | sequence of letters | sequence of phonemes
Two Kinds of Relationships

- "Vertical" relationship between the x_t's and y_t's
  - Example: "Friday" is usually a "date"
- "Horizontal" relationships among the y_t's
  - Example: "name" is usually followed by "affiliation"
- SSL can (and should) exploit both kinds of information

[Figure: chain of label nodes y1, y2, y3 linked to one another, each connected to its input x1, x2, x3]
Existing Methods

- Hacks
  - Sliding windows
  - Recurrent sliding windows
- Hidden Markov models
  - joint distribution: P(X, Y)
- Conditional Random Fields
  - conditional distribution: P(Y|X)
- Discriminant methods: HM-SVMs, MMMs, voted perceptrons
  - discriminant function: f(Y; X)
Sliding Windows

Sentence: "Do you want fries with that"

Window (x_{t-1}, x_t, x_{t+1}) → label y_t
___ Do you       → verb
Do you want      → pron
you want fries   → verb
want fries with  → noun
fries with that  → prep
with that ___    → pron
Properties of Sliding Windows

- Converts SSL to ordinary supervised learning
- Only captures the relationship between (part of) X and y_t. Does not explicitly model relations among the y_t's
- Assumes each window is independent
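For illustration, here is a minimal sketch of the sliding-window conversion (window half-width 1, padded with "___" as on the slide); the function name and representation are assumptions of mine.

```python
def make_windows(X, Y, half_width=1, pad="___"):
    """Convert one (X, Y) sequence pair into ordinary supervised examples:
    each example is the window of inputs around position t, labeled y_t."""
    padded = [pad] * half_width + list(X) + [pad] * half_width
    examples = []
    for t in range(len(X)):
        window = padded[t:t + 2 * half_width + 1]
        examples.append((window, Y[t]))
    return examples

X = ["Do", "you", "want", "fries", "with", "that"]
Y = ["verb", "pron", "verb", "noun", "prep", "pron"]
for window, label in make_windows(X, Y):
    print(window, "->", label)
# e.g. ['___', 'Do', 'you'] -> verb
```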
Recurrent Sliding Windows

Sentence: "Do you want fries with that"

Window (x_{t-1}, x_t, x_{t+1}) | previous label y_{t-1} → label y_t
___ Do you      | ___   → verb
Do you want     | verb  → pron
you want fries  | pron  → verb
want fries with | verb  → noun
fries with that | noun  → prep
with that ___   | prep  → pron
Recurrent Sliding Windows

Key Idea: include y_t as an input feature when computing y_{t+1}.

During training:
- Use the correct value of y_t
- Or train iteratively (especially recurrent neural networks)

During evaluation:
- Use the predicted value of y_t
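A minimal sketch of left-to-right recurrent sliding-window prediction at evaluation time: the previously predicted label is fed back in as an extra feature. The `classify` callable is a stand-in for any trained per-position classifier, and the lookup table in the demo is a toy stand-in rather than a learned model.

```python
def predict_recurrent(X, classify, half_width=1, pad="___"):
    """Left-to-right recurrent sliding-window prediction.
    `classify(features)` is any trained per-position classifier; the
    previously *predicted* label is appended to the window features."""
    padded = [pad] * half_width + list(X) + [pad] * half_width
    prev_label = pad
    predictions = []
    for t in range(len(X)):
        window = padded[t:t + 2 * half_width + 1]
        features = window + [prev_label]      # include predicted y_{t-1} as a feature
        prev_label = classify(features)
        predictions.append(prev_label)
    return predictions

# Toy classifier: tag each word by a hand-written lookup (stand-in for a model)
lookup = {"Do": "verb", "you": "pron", "want": "verb",
          "fries": "noun", "with": "prep", "that": "pron"}
print(predict_recurrent(["Do", "you", "want", "fries", "with", "that"],
                        lambda feats: lookup[feats[1]]))
# -> ['verb', 'pron', 'verb', 'noun', 'prep', 'pron']
```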
Properties of Recurrent Sliding Windows

Captures relationships among the y's, but only in one direction!

Results on text-to-speech:

Method           | Direction  | Words | Letters
sliding window   | none       | 12.5% | 69.6%
recurrent s. w.  | left-right | 17.0% | 67.9%
recurrent s. w.  | right-left | 24.4% | 74.2%
Hidden Markov Models

Generalization of Naïve Bayes to SSL.

[Figure: chain y1 → y2 → y3 → y4 → y5, with each y_t emitting its x_t]

- P(y_1)
- P(y_t | y_{t-1}), assumed the same for all t
- P(x_t | y_t) = P(x_{t,1} | y_t) · P(x_{t,2} | y_t) ··· P(x_{t,n} | y_t), assumed the same for all t
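To make the factorization concrete, here is a minimal sketch that evaluates log P(X, Y) under these three sets of parameters; for simplicity each x_t is treated as a single symbol rather than a feature vector, and the toy probabilities are made up for illustration.

```python
import math

# Toy HMM parameters (illustrative values only, not from the slides)
p_y1 = {"verb": 0.3, "pron": 0.7}
p_trans = {("verb", "pron"): 0.6, ("verb", "verb"): 0.4,
           ("pron", "verb"): 0.8, ("pron", "pron"): 0.2}
p_emit = {("verb", "Do"): 0.5, ("verb", "want"): 0.2,
          ("pron", "you"): 0.5, ("pron", "Do"): 0.01}

def log_joint(X, Y):
    """log P(X, Y) = log P(y_1) + log P(x_1|y_1)
                     + sum_{t>1} [log P(y_t|y_{t-1}) + log P(x_t|y_t)]"""
    lp = math.log(p_y1[Y[0]]) + math.log(p_emit[(Y[0], X[0])])
    for t in range(1, len(X)):
        lp += math.log(p_trans[(Y[t - 1], Y[t])]) + math.log(p_emit[(Y[t], X[t])])
    return lp

print(log_joint(["Do", "you", "want"], ["verb", "pron", "verb"]))
```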
Making Predictions with HMMs

Two possible goals:
- argmax_Y P(Y|X): find the most likely sequence of labels Y given the input sequence X
- argmax_{y_t} P(y_t | X) for all t: find the most likely label y_t at each time t given the entire input sequence X
Finding the Most Likely Label Sequence: The Trellis

[Trellis figure: the sentence "Do you want fries sir?" with candidate labels {verb, pronoun, noun, adjective} for each word, plus a start node s and a finish node f]

Every label sequence corresponds to a path through the trellis graph. The probability of a label sequence is proportional to
P(y_1) · P(x_1|y_1) · P(y_2|y_1) · P(x_2|y_2) ··· P(y_T|y_{T-1}) · P(x_T|y_T)
Converting to a Shortest Path Problem

max_{y_1,…,y_T} P(y_1) · P(x_1|y_1) · P(y_2|y_1) · P(x_2|y_2) ··· P(y_T|y_{T-1}) · P(x_T|y_T)
= min_{y_1,…,y_T} –log [P(y_1) · P(x_1|y_1)] + –log [P(y_2|y_1) · P(x_2|y_2)] + ··· + –log [P(y_T|y_{T-1}) · P(x_T|y_T)]

This is a shortest path through the trellis graph, with edge cost –log [P(y_t|y_{t-1}) · P(x_t|y_t)].
Finding the Most Likely Label Sequence: The Viterbi Algorithm

[Trellis figure as before: "Do you want fries sir?" with candidate labels per word, start node s and finish node f]

Step t of the Viterbi algorithm computes the possible successors of state y_{t-1} and computes the total path length for each edge.

Each node y_t = k stores the cost µ(k) of the shortest path that reaches it from s, and the predecessor class y_{t-1} = k' that achieves this cost:

k' = argmin_{y_{t-1}} –log [P(y_t | y_{t-1}) · P(x_t | y_t)] + µ(y_{t-1})
µ(k) = min_{y_{t-1}} –log [P(y_t | y_{t-1}) · P(x_t | y_t)] + µ(y_{t-1})

The algorithm alternates between computing the successors of each node and computing and storing the shortest incoming edge at each node, column by column, until it computes the best edge into f. It then traces back along the best incoming edges to recover the predicted Y sequence: "verb pronoun verb noun noun".
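A minimal sketch of the Viterbi recursion in this shortest-path form; the dict-of-dicts parameter layout and the toy numbers are my own choices, not from the slides.

```python
import math

def viterbi(X, labels, p_y1, p_trans, p_emit):
    """Most likely label sequence argmax_Y P(X, Y), computed as a shortest path
    with edge cost -log[P(y_t | y_{t-1}) * P(x_t | y_t)]."""
    floor = 1e-12  # probability floor to avoid log(0) for unseen events
    # mu[k] = cost of the cheapest path from the start that ends in label k
    mu = {k: -math.log(p_y1[k] * p_emit[k].get(X[0], floor)) for k in labels}
    backpointers = []
    for t in range(1, len(X)):
        new_mu, pointers = {}, {}
        for k in labels:
            costs = {kp: mu[kp] - math.log(p_trans[kp].get(k, floor)
                                           * p_emit[k].get(X[t], floor))
                     for kp in labels}
            best_prev = min(costs, key=costs.get)
            pointers[k], new_mu[k] = best_prev, costs[best_prev]
        mu = new_mu
        backpointers.append(pointers)
    # Trace back along the stored best predecessors to recover Y
    y = [min(mu, key=mu.get)]
    for pointers in reversed(backpointers):
        y.append(pointers[y[-1]])
    return list(reversed(y))

# Toy usage (illustrative parameters)
labels = ["verb", "pron", "noun"]
p_y1 = {"verb": 0.4, "pron": 0.3, "noun": 0.3}
p_trans = {"verb": {"pron": 0.5, "noun": 0.4, "verb": 0.1},
           "pron": {"verb": 0.6, "noun": 0.3, "pron": 0.1},
           "noun": {"noun": 0.4, "verb": 0.3, "pron": 0.3}}
p_emit = {"verb": {"Do": 0.4, "want": 0.4},
          "pron": {"you": 0.6, "that": 0.3},
          "noun": {"fries": 0.5}}
print(viterbi(["Do", "you", "want", "fries"], labels, p_y1, p_trans, p_emit))
# -> ['verb', 'pron', 'verb', 'noun'] with these toy numbers
```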
Finding the Most Likely Label at Time t: P(y_t | X)

[Trellis figure as before]

P(y_3 = 2 | X) = (probability of reaching y_3 = 2 from the start) × (probability of getting from y_3 = 2 to the finish)
Finding the Most Likely Class at Each Time t

Goal: compute P(y_t | x_1, …, x_T)

P(y_t | x_1, …, x_T)
  ∝ ∑_{y_{1:t-1}} ∑_{y_{t+1:T}} P(y_1) · P(x_1|y_1) · P(y_2|y_1) · P(x_2|y_2) ··· P(y_T|y_{T-1}) · P(x_T|y_T)

  ∝ [ ∑_{y_{1:t-1}} P(y_1) · P(x_1|y_1) · P(y_2|y_1) · P(x_2|y_2) ··· P(y_t|y_{t-1}) · P(x_t|y_t) ]
    · [ ∑_{y_{t+1:T}} P(y_{t+1}|y_t) · P(x_{t+1}|y_{t+1}) ··· P(y_T|y_{T-1}) · P(x_T|y_T) ]

Pushing each sum inside as far as it will go gives two recursive computations, one running forward and one running backward along the sequence:

  ∝ ∑_{y_{t-1}} [ ··· ∑_{y_2} [ ∑_{y_1} P(y_1) · P(x_1|y_1) · P(y_2|y_1) ] · P(x_2|y_2) · P(y_3|y_2) ] ··· P(y_t|y_{t-1}) ] · P(x_t|y_t)
    · ∑_{y_{t+1}} [ P(y_{t+1}|y_t) · P(x_{t+1}|y_{t+1}) ··· ∑_{y_{T-1}} [ P(y_{T-1}|y_{T-2}) · P(x_{T-1}|y_{T-1}) · ∑_{y_T} [ P(y_T|y_{T-1}) · P(x_T|y_T) ] ] ··· ]
Forward-Backward Algorithm

α_t(y_t) = ∑_{y_{t-1}} P(y_t | y_{t-1}) · P(x_t | y_t) · α_{t-1}(y_{t-1})
- This is the sum over the arcs coming into y_t = k
- It is computed "forward" along the sequence and stored in the trellis

β_t(y_t) = ∑_{y_{t+1}} P(y_{t+1} | y_t) · P(x_{t+1} | y_{t+1}) · β_{t+1}(y_{t+1})
- It is computed "backward" along the sequence and stored in the trellis

P(y_t | X) = α_t(y_t) β_t(y_t) / [∑_k α_t(k) β_t(k)]
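A minimal sketch of the forward-backward recursions and the resulting per-position posteriors P(y_t | X); it assumes the same dict-of-dicts HMM parameter layout as the Viterbi sketch above.

```python
def forward_backward(X, labels, p_y1, p_trans, p_emit, floor=1e-12):
    """Compute alpha_t(k), beta_t(k), and P(y_t = k | X) for a simple HMM."""
    T = len(X)
    alpha = [{} for _ in range(T)]
    beta = [{} for _ in range(T)]
    # Forward pass: alpha_t(k) = sum_{k'} P(k | k') P(x_t | k) alpha_{t-1}(k')
    for k in labels:
        alpha[0][k] = p_y1[k] * p_emit[k].get(X[0], floor)
    for t in range(1, T):
        for k in labels:
            alpha[t][k] = p_emit[k].get(X[t], floor) * sum(
                p_trans[kp].get(k, floor) * alpha[t - 1][kp] for kp in labels)
    # Backward pass: beta_t(k) = sum_{k''} P(k'' | k) P(x_{t+1} | k'') beta_{t+1}(k'')
    for k in labels:
        beta[T - 1][k] = 1.0
    for t in range(T - 2, -1, -1):
        for k in labels:
            beta[t][k] = sum(p_trans[k].get(kn, floor)
                             * p_emit[kn].get(X[t + 1], floor) * beta[t + 1][kn]
                             for kn in labels)
    # Posterior: P(y_t = k | X) = alpha_t(k) beta_t(k) / sum_k' alpha_t(k') beta_t(k')
    posteriors = []
    for t in range(T):
        z = sum(alpha[t][k] * beta[t][k] for k in labels)
        posteriors.append({k: alpha[t][k] * beta[t][k] / z for k in labels})
    return alpha, beta, posteriors
```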
Training Hidden Markov Models

If the inputs and outputs are fully observed, this is extremely easy:

P(y_1 = k) = [# examples with y_1 = k] / m
P(y_t = k | y_{t-1} = k') = [# k' → k transitions] / [# times y_{t-1} = k']
P(x_j = v | y = k) = [# times y = k and x_j = v] / [# times y_t = k]

We should apply Laplace corrections to these estimates.
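A minimal sketch of this counting-based estimation with Laplace (add-alpha) corrections; as before, each x_t is treated as a single symbol, which simplifies the per-feature emission model on the earlier slide. It returns parameters in the same dict-of-dicts shape used by the Viterbi and forward-backward sketches.

```python
from collections import Counter

def train_hmm(training_set, labels, vocab, alpha=1.0):
    """Estimate HMM parameters by counting fully observed sequences,
    with Laplace (add-alpha) corrections."""
    start, trans, emit, label_count = Counter(), Counter(), Counter(), Counter()
    for X, Y in training_set:
        start[Y[0]] += 1
        for t, (x, y) in enumerate(zip(X, Y)):
            label_count[y] += 1
            emit[(y, x)] += 1
            if t > 0:
                trans[(Y[t - 1], y)] += 1
    m, K, V = len(training_set), len(labels), len(vocab)
    p_y1 = {k: (start[k] + alpha) / (m + alpha * K) for k in labels}
    p_trans = {kp: {k: (trans[(kp, k)] + alpha) /
                       (sum(trans[(kp, k2)] for k2 in labels) + alpha * K)
                    for k in labels}
               for kp in labels}
    p_emit = {k: {v: (emit[(k, v)] + alpha) / (label_count[k] + alpha * V)
                  for v in vocab}
              for k in labels}
    return p_y1, p_trans, p_emit
```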
Conditional Random Fields

The y_t's form a Markov Random Field conditioned on X: P(Y|X)

Lafferty, McCallum, & Pereira (2001)

[Figure: undirected chain y1 - y2 - y3, with each y_t connected to X = (x1, x2, x3)]
Markov Random Fields

Graph G = (V, E)
- Each vertex v ∈ V represents a random variable y_v.
- Each edge represents a direct probabilistic dependency.

P(Y) = 1/Z exp [∑_c Ψ_c(c(Y))]
- c indexes the cliques in the graph
- Ψ_c is a potential function
- c(Y) selects the random variables participating in clique c
A Simple MRF

[Figure: chain y1 - y2 - y3]

Cliques:
- singletons: {y1}, {y2}, {y3}
- pairs (edges): {y1, y2}, {y2, y3}

P(<y1, y2, y3>) = 1/Z exp[Ψ_1(y1) + Ψ_2(y2) + Ψ_3(y3) + Ψ_12(y1, y2) + Ψ_23(y2, y3)]
CRF Potential Functions are Conditioned on X

Ψ_t(y_t, X): how compatible is y_t with X?
Ψ_{t,t-1}(y_t, y_{t-1}, X): how compatible is a transition from y_{t-1} to y_t with X?

[Figure: undirected chain y1 - y2 - y3, with each y_t connected to X = (x1, x2, x3)]
CRF Potentials are Log-Linear Models

Ψ_t(y_t, X) = ∑_b β_b g_b(y_t, X)
Ψ_{t,t+1}(y_t, y_{t+1}, X) = ∑_a λ_a f_a(y_t, y_{t+1}, X)

where g_b and f_a are user-defined boolean functions ("features")
- Example: g_23 = [x_t = "o" and y_t = /@/]

We will lump them together as Ψ_t(y_t, y_{t+1}, X) = ∑_a λ_a f_a(y_t, y_{t+1}, X)
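A minimal sketch of what boolean feature functions and the lumped log-linear potential might look like in code. The specific features, the weights, and the extra position argument t are illustrative assumptions of mine, not the slides' exact definitions.

```python
def f_letter_is_o_label_schwa(y_t, y_next, X, t):
    # Analogue of the slide's example g_23: current letter is "o" and label is /@/
    return X[t] == "o" and y_t == "@"

def f_schwa_then_g(y_t, y_next, X, t):
    # A transition feature: /@/ followed by /g/
    return y_t == "@" and y_next == "g"

FEATURES = [f_letter_is_o_label_schwa, f_schwa_then_g]
LAMBDAS = [1.2, 0.7]   # weights lambda_a (illustrative values, not learned here)

def psi(y_t, y_next, X, t):
    """Lumped log-linear potential: Psi_t(y_t, y_{t+1}, X) = sum_a lambda_a f_a(...)."""
    return sum(lam * f(y_t, y_next, X, t) for lam, f in zip(LAMBDAS, FEATURES))

X = list("photograph")
print(psi("@", "g", X, 4))   # position of the second "o" -> 1.2 + 0.7 = 1.9
```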
Making Predictions with CRFs

The Viterbi and Forward-Backward algorithms can be applied exactly as for HMMs.
Training CRFs

Let θ = {β_1, β_2, …, λ_1, λ_2, …} be all of our parameters.
Let F_θ be our CRF, so F_θ(Y, X) = P(Y|X).

Define the loss function L(Y, F_θ(Y, X)) to be the negative log likelihood:
L(Y, F_θ(Y, X)) = –log F_θ(Y, X)

Goal: find θ to minimize the loss (maximize the likelihood).
Algorithm: gradient descent.
Gradient Computation

g_q = ∂/∂λ_q log P(Y|X)
    = ∂/∂λ_q log [ ∏_t exp Ψ_t(y_t, y_{t-1}, X) / Z ]
    = ∂/∂λ_q [ ∑_t Ψ_t(y_t, y_{t-1}, X) – log Z ]
    = ∑_t ∂/∂λ_q ∑_a λ_a f_a(y_t, y_{t-1}, X) – ∂/∂λ_q log Z
    = ∑_t f_q(y_t, y_{t-1}, X) – ∂/∂λ_q log Z
Gradient of Z

∂/∂λ_q log Z = (1/Z) ∂Z/∂λ_q
  = (1/Z) ∂/∂λ_q ∑_{Y'} ∏_t exp Ψ_t(y'_t, y'_{t-1}, X)
  = (1/Z) ∂/∂λ_q ∑_{Y'} exp ∑_t Ψ_t(y'_t, y'_{t-1}, X)
  = (1/Z) ∑_{Y'} exp [ ∑_t Ψ_t(y'_t, y'_{t-1}, X) ] · ∑_t ∂/∂λ_q Ψ_t(y'_t, y'_{t-1}, X)
  = ∑_{Y'} ( exp [ ∑_t Ψ_t(y'_t, y'_{t-1}, X) ] / Z ) · ∑_t ∂/∂λ_q ∑_a λ_a f_a(y'_t, y'_{t-1}, X)
  = ∑_{Y'} P(Y'|X) [ ∑_t f_q(y'_t, y'_{t-1}, X) ]
Gradient Computation

g_q = ∑_t f_q(y_t, y_{t-1}, X) – ∑_{Y'} P(Y'|X) [ ∑_t f_q(y'_t, y'_{t-1}, X) ]

This is the number of times feature q is true minus the expected number of times feature q is true. The expectation can be computed via the forward-backward algorithm.

First, apply forward-backward to compute P(y_{t-1}, y_t | X):
P(y_{t-1}, y_t | X) = α_{t-1}(y_{t-1}) · exp Ψ_t(y_t, y_{t-1}, X) · β_t(y_t) / Z

Then compute the gradient with respect to each λ_q:
g_q = ∑_t f_q(y_t, y_{t-1}, X) – ∑_t ∑_{y_t} ∑_{y_{t-1}} P(y_{t-1}, y_t | X) f_q(y_t, y_{t-1}, X)
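A minimal sketch of this gradient for one training sequence. It assumes the pairwise marginals P(y_{t-1}, y_t | X) have already been computed by forward-backward and are simply passed in as a table, and it assumes feature functions with the signature f_q(y_prev, y_cur, X, t).

```python
def crf_gradient(X, Y, labels, features, pairwise_marginals):
    """g_q = (observed count of feature q along the true Y)
            - (expected count of feature q under P(Y'|X)).

    `features[q]` is a boolean function f_q(y_prev, y_cur, X, t).
    `pairwise_marginals[t][(k_prev, k)]` holds P(y_{t-1}=k_prev, y_t=k | X)
    for t = 1..T-1, as produced by the forward-backward algorithm."""
    grad = [0.0] * len(features)
    for q, f in enumerate(features):
        for t in range(1, len(X)):
            # Observed feature count along the true label sequence
            grad[q] += f(Y[t - 1], Y[t], X, t)
            # Expected feature count under the model's pairwise marginals
            for k_prev in labels:
                for k in labels:
                    grad[q] -= (pairwise_marginals[t][(k_prev, k)]
                                * f(k_prev, k, X, t))
    return grad
```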
Discriminative Methods

Learn a discriminant function to which the Viterbi algorithm can be applied
- "just get the right answer"

Methods:
- Averaged perceptron (Collins)
- Hidden Markov SVMs (Altun, et al.)
- Max Margin Markov Nets (Taskar, et al.)
Collins' Perceptron Method

If we ignore the global normalizer in the CRF, the score for a label sequence Y given an input sequence X is

score(Y) = ∑_t ∑_a λ_a f_a(y_{t-1}, y_t, X)

Collins' approach is to adjust the weights λ_a so that the correct label sequence gets the highest score according to the Viterbi algorithm.
Sequence Perceptron Algorithm

- Initialize weights λ_a = 0
- For ℓ = 1, …, L do
  - For each training example (X_i, Y_i):
    - Apply the Viterbi algorithm to find the path Ŷ with the highest score
    - For all a, update λ_a := λ_a + ∑_t [ f_a(y_t, y_{t-1}, X) – f_a(ŷ_t, ŷ_{t-1}, X) ]

This compares the "Viterbi path" to the "correct path". Note that no update is made if the Viterbi path is correct.
Averaged Perceptron

Let λ_a^{ℓ,i} be the value of λ_a after processing training example i in iteration ℓ.

Define λ_a* = the average value of λ_a = 1/(LN) ∑_{ℓ,i} λ_a^{ℓ,i}

Use these averaged weights in the final classifier.
Collins: Part-of-Speech Tagging with the Averaged Sequence Perceptron

- Without averaging: 3.68% error (20 iterations)
- With averaging: 2.93% error (10 iterations)
Hidden Markov SVM

Define a kernel between two input values x and x': k(x, x').

Define a kernel between (X, Y) and (X', Y') as follows:

K((X,Y), (X',Y')) = ∑_{s,t} I[y_{s-1} = y'_{t-1} & y_s = y'_t] + I[y_s = y'_t] k(x_s, x'_t)

This is the number of (y_{t-1}, y_t) transitions that the two sequences share, plus the number of matching labels (weighted by the similarity between the corresponding x values).
Dual Form of the Linear Classifier

Score(Y|X) = ∑_j ∑_a α_j(Y_a) K((X_j, Y_a), (X, Y))

where a indexes "support vector" label sequences Y_a.

The learning algorithm finds:
- the set of Y_a label sequences
- the weight values α_j(Y_a)
Dual Perceptron Algorithm

- Initialize α_j = 0
- For ℓ from 1 to L do
  - For i from 1 to N do
    - Ŷ = argmax_Y Score(Y | X_i)
    - If Ŷ ≠ Y_i then
      - α_i(Y_i) := α_i(Y_i) + 1
      - α_i(Ŷ) := α_i(Ŷ) – 1
Hidden Markov SVM Algorithm

- For all i, initialize
  - S_i = {Y_i}, the set of "support vector sequences" for i
  - α_i(Y) = 0 for all Y in S_i
- For ℓ from 1 to L do
  - For i from 1 to N do
    - Ŷ = argmax_{Y ≠ Y_i} Score(Y | X_i)
    - If Score(Y_i | X_i) < Score(Ŷ | X_i):
      - Add Ŷ to S_i
      - Solve a quadratic program to optimize the α_i(Y) for all Y in S_i, maximizing the margin between Y_i and all of the other Y's in S_i
      - If α_i(Y) = 0, delete Y from S_i
Altun et al. comparison
Maximum Margin Markov Networks

Define an SVM-like optimization problem to maximize the per-time-step margin.

Define:
ΔF(X_i, Y_i, Ŷ) = F(X_i, Y_i) – F(X_i, Ŷ)
ΔY(Y_i, Ŷ) = ∑_t I[ŷ_t ≠ y_{i,t}]

MMM SVM formulation:
min ||w||² + C ∑_i ξ_i
subject to
w · ΔF(X_i, Y_i, Ŷ) ≥ ΔY(Y_i, Ŷ) – ξ_i   for all Ŷ, for all i
Dual Form

maximize
∑_i ∑_Ŷ α_i(Ŷ) ΔY(Y_i, Ŷ) – ½ ∑_i ∑_Ŷ ∑_j ∑_Ŷ' α_i(Ŷ) α_j(Ŷ') [ΔF(X_i, Y_i, Ŷ) · ΔF(X_j, Y_j, Ŷ')]

subject to
∑_Ŷ α_i(Ŷ) = C   for all i
α_i(Ŷ) ≥ 0   for all i, for all Ŷ

Note that there are exponentially many Ŷ label sequences.
Converting to a Polynomial-Sized Formulation

Note the constraints:
∑_Ŷ α_i(Ŷ) = C   for all i
α_i(Ŷ) ≥ 0   for all i, for all Ŷ

These imply that for each i, the α_i(Ŷ) values are proportional to a probability distribution:
Q(Ŷ | X_i) = α_i(Ŷ) / C

Because the MRF is a simple chain, this distribution can be factored into local distributions:
Q(Ŷ | X_i) = ∏_t Q(ŷ_{t-1}, ŷ_t | X_i)

Let µ_i(ŷ_{t-1}, ŷ_t) be the unnormalized version of Q.
Reformulated Dual Form

maximize
∑_i ∑_t ∑_{ŷ_t} µ_i(ŷ_t) I[ŷ_t ≠ y_{i,t}]
  – ½ ∑_{i,j} ∑_t ∑_{ŷ_{t-1}, ŷ_t} ∑_s ∑_{ŷ'_{s-1}, ŷ'_s} µ_i(ŷ_{t-1}, ŷ_t) µ_j(ŷ'_{s-1}, ŷ'_s) [ΔF(ŷ_{t-1}, ŷ_t, X_i) · ΔF(ŷ'_{s-1}, ŷ'_s, X_j)]

subject to
∑_{ŷ_{t-1}} µ_i(ŷ_{t-1}, ŷ_t) = µ_i(ŷ_t)
∑_{ŷ_t} µ_i(ŷ_t) = C
µ_i(ŷ_{t-1}, ŷ_t) ≥ 0
Variables in the Dual Form

- µ_i(k, k') for each training example i and each pair of possible class labels k, k': O(NK²)
- µ_i(k) for each training example i and each possible class label k: O(NK)

Polynomial!
Taskar et al. comparison: Handwriting Recognition

[Results figure legend]
- log-reg: logistic regression sliding window
- CRF: conditional random field
- mSVM: multiclass SVM sliding window
- M^3N: max margin Markov net