SLIDE 1
A semi-automatic structure learning method for language modeling
Vitor Pera
Faculdade de Engenharia da Universidade do Porto (FEUP)
September 11, 2019
SLIDE 2
Outline
- Linguistic Classes Prediction Model (LCPM)
- LCPM’s Structure Learning Method
SLIDE 3
Linguistic Classes Prediction Model (LCPM)
- Multiclass-dependent N-gram (M > N > 1):
  P(ω_t | ω_{1:t−1}) = Σ_{c_t ∈ C(ω_t)} P(ω_t | c_t, ω_{1:t−1}) P(c_t | ω_{1:t−1})
                     ≈ Σ_{c_t ∈ C(ω_t)} P(ω_t | c_t, ω_{t−N+1:t−1}) P(c_t | c_{t−M+1:t−1})
- LCPM (FLM formalism): with each class mapped to a vector of K linguistic factors (c ↔ f^{1:K}), the class predictor becomes
  P(c_t | c_{t−M+1:t−1}) → P(f^{1:K}_t | f^{1:K}_{t−M+1:t−1})
- LCPM structure learning (goal):
  - accurate and simple structure
  - two-step method
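To make the decomposition concrete, here is a minimal runnable Python sketch; the class map C(·) and all probability tables are toy assumptions for illustration, not the paper's data:

# Toy instance of the multiclass-dependent N-gram decomposition above.

# C(w): the set of linguistic classes word w can belong to
classes_of = {"casa": {"NOUN", "VERB"}}

# P(w_t | c_t, w_{t-N+1:t-1}) with N = 2 (one word of history)
p_word_given_class = {
    ("casa", "NOUN", ("a",)): 0.020,
    ("casa", "VERB", ("a",)): 0.001,
}

# P(c_t | c_{t-M+1:t-1}) with M = 3 (two classes of history)
p_class = {
    ("NOUN", ("DET", "ADJ")): 0.50,
    ("VERB", ("DET", "ADJ")): 0.05,
}

def p_next_word(word, word_hist, class_hist):
    """P(w_t | history) ~= sum over c_t in C(w_t) of
    P(w_t | c_t, w_{t-N+1:t-1}) * P(c_t | c_{t-M+1:t-1})."""
    return sum(p_word_given_class.get((word, c, word_hist), 0.0)
               * p_class.get((c, class_hist), 0.0)
               for c in classes_of.get(word, ()))

print(p_next_word("casa", ("a",), ("DET", "ADJ")))  # 0.02*0.5 + 0.001*0.05 = 0.01005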
SLIDE 4
LCPM’s Structure Learning Method - Step 1: Intro
- Given:
  - the need for an LCPM to compute P(f^{1:K}_t | f^{1:K}_{t−M+1:t−1}) (factors not yet known)
  - common knowledge of Linguistics
  - full knowledge of the specific language interface
- Solve (non-automatically):
  - Which linguistic features should be used?
  - Which linguistic features exhibit some special statistical independence property?
SLIDE 5
LCPM’s Structure Learning Method - Step 1: Procedure
- 1. Choose the linguistic features (→ f^{1:K}):
  - informative for modeling P(ω_t | f^{1:K}_t, ω_{t−N+1:t−1})
  - adequate to the data resources (annotation and robustness)
- 2. Make the (credible) assumption:
  f^n_t is statistically independent of any other factors, given its own history, iff 1 ≤ n ≤ J
  (accordingly, split f^{1:K} → f^{1:J} ++ f^{J+1:K}, 1 ≤ J < K)
- Resulting LCPM factorization:
  P(f^{1:K}_t | f^{1:K}_{t−M+1:t−1}) ≈ [∏_{i=1}^{J} P(f^i_t | f^i_{t−M+1:t−1})] · P(f^{J+1:K}_t | f^{1:J}_t, f^{1:K}_{t−M+1:t−1})
  (the second factor is the subject of Step 2)
SLIDE 6
LCPM’s Structure Learning Method - Step 1: Example
Given some application and a corpus annotated with multiple tags:
- 1. Admit the following tags are judged the most appropriate:
  - Part-of-speech (POS)
  - Semantic tag (ST)
  - Gender inflection (GI)
- 2. Assume that, of these three LFs, only ST can be predicted based solely on its own history:
  - ST → f^1
  - (POS, GI) → f^{2:3}
This yields the LCPM approximation:
  P(f^{1:3}_t | f^{1:3}_{t−M+1:t−1}) ≈ P(f^1_t | f^1_{t−M+1:t−1}) · P(f^{2:3}_t | f^1_t, f^{1:3}_{t−M+1:t−1})
SLIDE 7
LCPM’s Structure Learning Method - Step 2: Intro
- Goal: learn the structure of the statistical model that computes P(f^{J+1:K}_t | f^{1:J}_t, f^{1:K}_{t−M+1:t−1}); more precisely ...
- Determine automatically Z ⊂ f^{1:K}_{t−M+1:t−1} such that
  - |Z| is fixed and |Z| ≪ |f^{1:K}_{t−M+1:t−1}| (robustness constraint)
  - P(f^{J+1:K}_t | f^{1:J}_t, Z) approximates the original conditional probabilities according to Information Theory based criteria
Notation simplification (hereafter): X = f^{1:J}_t; Y = f^{J+1:K}_t; Z ⊂ W = f^{1:K}_{t−M+1:t−1}; → P(Y | X, Z)
SLIDE 8
LCPM’s SL Method - Step 2: Rules to determine Z
- Information Theory measures:
  - conditional entropy, H(Y|X)
  - conditional mutual information (CMI), I(Y; Z|X)
  - cross-context conditional mutual information (CCCMI), I_{X_l}(Y; Z|X_m)
- Possible/experimented rules (→ P(Y|X, Z) with Z ⊂ W; see the sketch after this slide):
  - To discard Z*: if I(Y; Z*|X) < η H(Y|X), then Z* is non-relevant
  - To determine Z*: Z* = argmax_{Z⊂W, |Z|=ζ} I(Y; Z|X)
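A minimal Python sketch of these two measures and the discard rule, computed from an enumerable joint table; the joint distribution and the threshold η are illustrative assumptions (the argmax rule would evaluate cond_mi for every Z ⊂ W with |Z| = ζ and keep the best):

from collections import defaultdict
from math import log2

def cond_entropy(joint):
    """H(B|A) from a table {(a, b): P(a, b)}; a may itself be a tuple."""
    pa = defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
    return -sum(p * log2(p / pa[a]) for (a, b), p in joint.items() if p > 0)

def cond_mi(joint_xyz):
    """I(Y;Z|X) = H(Y|X) - H(Y|X,Z), from a table {(x, y, z): P(x, y, z)}."""
    xy, xz_y = defaultdict(float), defaultdict(float)
    for (x, y, z), p in joint_xyz.items():
        xy[(x, y)] += p
        xz_y[((x, z), y)] += p
    return cond_entropy(xy) - cond_entropy(xz_y)

# Toy joint P(X, Y, Z*) for one candidate Z* (illustrative assumption)
joint = {("F", "A", "C"): 0.30, ("F", "B", "D"): 0.20,
         ("S", "A", "D"): 0.25, ("S", "B", "C"): 0.25}

# Discard rule: Z* is non-relevant if I(Y;Z*|X) < eta * H(Y|X)
eta = 0.05
xy = defaultdict(float)
for (x, y, z), p in joint.items():
    xy[(x, y)] += p
print(cond_mi(joint) < eta * cond_entropy(xy))  # False: this Z* is relevant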
SLIDE 9
LCPM’s SL Method - Step 2: Rules to determine Z (cont.)
- Rule to determine Z* using the “Utility” measure N_λ:
  Z* = argmax_{Z⊂W, |Z|=ζ} N_λ(Y; Z|X), 0 ≤ λ ≤ 1
  where N_λ(Y; Z|X) represents
  Σ_{X_m} P(X_m) [ I(Y; Z|X_m) − λ Σ_{X_l ≠ X_m} P(X_l) I_{X_l}(Y; Z|X_m) ]
  and I_{X_l}(Y; Z|X_m) represents
  Σ_Y Σ_Z P(Y, Z|X_l) log [ P(Y, Z|X_m) / (P(Y|X_m) P(Z|X_m)) ]
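A minimal Python sketch of the utility measure, assuming small enumerable joint tables; the toy joint below is an illustrative assumption (in it each context has disjoint (y, z) support, so the cross-context terms vanish and N_0 = N_1):

from collections import defaultdict
from math import log2

def cccmi(joint_xyz, xl, xm):
    """Cross-context CMI I_{Xl}(Y;Z|Xm): expectation under context xl of the
    log-ratio defined by context xm; cccmi(j, x, x) is the ordinary per-context
    CMI I(Y;Z|X=x). Input: {(x, y, z): P(x, y, z)}."""
    def conditionals(x0):
        px = sum(p for (x, y, z), p in joint_xyz.items() if x == x0)
        pyz, py, pz = defaultdict(float), defaultdict(float), defaultdict(float)
        for (x, y, z), p in joint_xyz.items():
            if x == x0:
                pyz[(y, z)] += p / px
                py[y] += p / px
                pz[z] += p / px
        return pyz, py, pz
    pyz_l, _, _ = conditionals(xl)
    pyz_m, py_m, pz_m = conditionals(xm)
    # Terms with zero probability under the reference context xm are skipped.
    return sum(p * log2(pyz_m[(y, z)] / (py_m[y] * pz_m[z]))
               for (y, z), p in pyz_l.items() if pyz_m[(y, z)] > 0)

def utility(joint_xyz, lam):
    """N_lambda(Y;Z|X) = sum_m P(Xm) [ I(Y;Z|Xm)
       - lam * sum_{l != m} P(Xl) * I_{Xl}(Y;Z|Xm) ]."""
    px = defaultdict(float)
    for (x, y, z), p in joint_xyz.items():
        px[x] += p
    return sum(px[m] * (cccmi(joint_xyz, m, m)
                        - lam * sum(px[l] * cccmi(joint_xyz, l, m)
                                    for l in px if l != m))
               for m in px)

joint = {("F", "A", "C"): 0.30, ("F", "B", "D"): 0.20,
         ("S", "A", "D"): 0.25, ("S", "B", "C"): 0.25}
print(utility(joint, 0.0), utility(joint, 1.0))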
SLIDE 10
LCPM’s SL Method - Step 2: Example
Problem: choose Z1 or Z2 to model P(Y|X, Z); X ∈ {F, S}, Y ∈ {A, B, U}, Z1 ∈ {C, D, V}, Z2 ∈ {E, F, W}
Data: P(X = F) = P(X = S)
“Utility” & solutions:
- N_0(Y; Z1|X) < N_0(Y; Z2|X) (near equality) ∴ λ = 0 ⇒ choose Z2
- N_1(Y; Z1|X) > N_1(Y; Z2|X) ∴ λ = 1 ⇒ choose Z1
SLIDE 11
LCPM’s SL Method - Step 2: Algorithm to define Z
Input: f^{1:K}_{t−M+1:t}, J, K, M, ζ, λ, γ, η, Data
Output: set of factors Z

for each z ∈ f^{1:K}_{t−M+1:t−1} do                      // factor relevance
    if I(f^{J+1:K}_t; z | f^{1:J}_t) < γ H(f^{J+1:K}_t | f^{1:J}_t) then
        remove z from f^{1:K}_{t−M+1:t−1}
    end
end
Sort f^{1:K}_{t−M+1:t−1} by descending order of N_λ(f^{J+1:K}_t; z | f^{1:J}_t)
Z ← ∅
repeat                                                   // factor redundancy
    z ← next non-processed element of f^{1:K}_{t−M+1:t−1}
    if I(f^{J+1:K}_t; z | f^{1:J}_t) > η I(z; r | f^{1:J}_t), ∀r ∈ Z then
        add z to Z
    end
until |Z| = ζ or all elements of f^{1:K}_{t−M+1:t−1} are processed
Output Z
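A compact runnable Python sketch of this selection loop; the estimator callbacks (rel, red, util) and the toy scores are assumptions standing in for the data-driven estimates of the information measures:

def select_factors(candidates, rel, red, h_y_x, util, zeta, gamma, eta):
    """Greedy sketch of the Step-2 algorithm above (estimators assumed):
    rel(z)    ~ I(f^{J+1:K}_t ; z | f^{1:J}_t)  relevance of candidate z
    red(z, r) ~ I(z ; r | f^{1:J}_t)            redundancy between z and kept r
    h_y_x     ~ H(f^{J+1:K}_t | f^{1:J}_t)
    util(z)   ~ N_lambda(f^{J+1:K}_t ; z | f^{1:J}_t)."""
    # Relevance filter: drop factors telling too little about the target factors
    kept = [z for z in candidates if rel(z) >= gamma * h_y_x]
    # Rank surviving candidates by decreasing utility
    kept.sort(key=util, reverse=True)
    # Redundancy filter: keep z only if no already-selected r subsumes it
    Z = []
    for z in kept:
        if all(rel(z) > eta * red(z, r) for r in Z):
            Z.append(z)
        if len(Z) == zeta:
            break
    return Z

# Toy usage with made-up estimator values (illustrative assumption):
scores = {"g_{t-1}": 0.9, "g_{t-2}": 0.6, "m_{t-1}": 0.5, "n_{t-2}": 0.1}
print(select_factors(list(scores), rel=scores.get,
                     red=lambda z, r: 0.3, h_y_x=1.0,
                     util=scores.get, zeta=2, gamma=0.2, eta=1.0))
# -> ['g_{t-1}', 'g_{t-2}']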
SLIDE 12
Preliminary Results
- Text corpus (vocabulary size ≈ 200K) whose annotations include:
  - m: part-of-speech (13 tags: ADJ, ADV, ...)
  - g: gender inflection (3 tags: M, F, N)
  - n: number inflection (3 tags: S, P, U)
- Task: select Z ⊂ W = {n_t, m_{t−1}, g_{t−1}, n_{t−1}, m_{t−2}, g_{t−2}, n_{t−2}, ...} maximizing the utility N_λ(g_t; Z|m_t) (→ P(g_t|m_t, Z))
- Results:
  Case             | λ | Z sorted by decreasing N_λ
  g = N and n = U  | 0 | {g_{t−1}, g_{t−2}, m_{t−1}, ...}
                   | 1 | {g_{t−1}, m_{t−1}, g_{t−2}, ...}
  Whole data       | 0 | {n_t, g_{t−1}, m_{t−1}, ...}
                   | 1 | {g_{t−1}, n_{t−2}, g_{t−2}, ...}
SLIDE 13
Conclusions
- Method for learning LCPM structure
- Guidelines: seek an accurate and simple structure (FLM approach: keep just the relevant and non-redundant factors and dependencies)
- Process:
  - Step 1: manually set the initial structure (linguistic knowledge)
  - Step 2: automatically “prune” the structure (data-driven algorithm based on Information Theory concepts)
- Preliminary results seem promising; larger experiments are needed to get conclusive results
SLIDE 14
References
- 1. J. Bilmes, “Natural Statistical Models for Automatic Speech Recognition”, PhD Thesis, International Computer Science Institute, Berkeley, CA, 1999.
- 2. K. Kirchhoff, J. Bilmes, and K. Duh, “Factored Language Model Tutorial”, Technical Report, Dept. of Electrical Engineering, University of Washington, 2008.
- 3. H. Schmid, “Improvements in Part-of-Speech Tagging with an Application to German”, Proc. ACL SIGDAT Workshop, Dublin, Ireland, 1995.
- 4. D. Santos and P. Rocha, “Evaluating CETEMPúblico, a free resource for Portuguese”, Proc. ACL, 2001.