1. A semi-automatic structure learning method for language modeling
Vitor Pera
September 11, 2019
Faculdade de Engenharia da Universidade do Porto (FEUP)

2. Outline
• Linguistic Classes Prediction Model (LCPM)
• LCPM's Structure Learning Method
• Preliminary Results
• Conclusions
• References

3. Linguistic Classes Prediction Model (LCPM)
• Multiclass-dependent N-gram ($M > N > 1$):
$$P(\omega_t \mid \omega_{1:t-1}) = \sum_{c_t \in C(\omega_t)} P(\omega_t \mid c_t, \omega_{1:t-1})\, P(c_t \mid \omega_{1:t-1})$$
$$\approx \sum_{c_t \in C(\omega_t)} P(\omega_t \mid c_t, \omega_{t-N+1:t-1})\, P(c_t \mid c_{t-M+1:t-1})$$
• LCPM (FLM formalism): with $c \leftrightarrow f^{1:K}$, the class term becomes
$$P(c_t \mid c_{t-M+1:t-1}) \;\longrightarrow\; P(f_t^{1:K} \mid f_{t-M+1:t-1}^{1:K})$$
• LCPM structure learning (goal): accurate and simple; a two-step method
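The class-sum above is straightforward to sketch in code. The following is a minimal illustration, not the author's implementation; the probability tables and the two-class "bank" example are hypothetical stand-ins for models that would be estimated from a corpus.

```python
def word_probability(word, word_history, class_history,
                     classes_of, p_word, p_class):
    """P(w_t | history) = sum over c_t in C(w_t) of
       P(w_t | c_t, w_{t-N+1:t-1}) * P(c_t | c_{t-M+1:t-1})."""
    total = 0.0
    for c in classes_of[word]:
        total += (p_word.get((word, c, word_history), 0.0)
                  * p_class.get((c, class_history), 0.0))
    return total

# Hypothetical tables: "bank" belongs to two classes (place vs. finance).
classes_of = {"bank": ["N_place", "N_fin"]}
p_word = {("bank", "N_place", ("the",)): 0.2,
          ("bank", "N_fin", ("the",)): 0.4}
p_class = {("N_place", ("DET",)): 0.3, ("N_fin", ("DET",)): 0.5}

p = word_probability("bank", ("the",), ("DET",), classes_of, p_word, p_class)
# 0.2*0.3 + 0.4*0.5 = 0.26
```

Summing over all classes a word may carry, rather than committing to one, is what distinguishes this from a hard class-based N-gram.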

6. LCPM's Structure Learning Method - Step 1: Intro
• Given:
  • The need for an LCPM to compute $P(f_t^{1:K} \mid f_{t-M+1:t-1}^{1:K})$ (factors not yet known)
  • Common knowledge of linguistics
  • Full knowledge of the specific language interface
• Solve (non-automatically):
  • Which linguistic features to use?
  • Which linguistic features exhibit some special statistical-independence property?

8. LCPM's Structure Learning Method - Step 1: Procedure
1. Choose the linguistic features ($\rightarrow f^{1:K}$):
  • Informative for modeling $P(\omega_t \mid f_t^{1:K}, \omega_{t-N+1:t-1})$
  • Adequate to the data resources (annotation and robustness)
2. Make the (credible) assumption: $f_t^n$ is statistically independent of any other factors, given its own history, iff $1 \le n \le J$ (accordingly, split $f^{1:K} \rightarrow f^{1:J} \,{+\!+}\, f^{J+1:K}$, $1 \le J < K$)
LCPM factorization:
$$\Bigl[\prod_{i=1}^{J} P(f_t^i \mid f_{t-M+1:t-1}^i)\Bigr]\,\underbrace{P(f_t^{J+1:K} \mid f_t^{1:J},\, f_{t-M+1:t-1}^{1:K})}_{\text{Step 2}}$$
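The factorization can be sketched directly: the first $J$ factors are scored by independent per-factor models, and the remaining factors by the joint model whose structure Step 2 will learn. All table names and values below are hypothetical, chosen only to make the arithmetic visible.

```python
from math import prod

def lcpm_probability(f_t, f_hist, J, p_factor, p_joint):
    """prod_{i=1..J} P(f_t^i | f_hist^i) * P(f_t^{J+1:K} | f_t^{1:J}, f_hist)."""
    independent = prod(p_factor[i].get((f_t[i], f_hist[i]), 0.0)
                       for i in range(J))
    dependent = p_joint.get((f_t[J:], f_t[:J], f_hist), 0.0)
    return independent * dependent

# Toy numbers with K = 3 factors and J = 1:
f_t = ("ST1", "POS1", "GI1")
f_hist = (("ST0",), ("POS0",), ("GI0",))
p_factor = [{("ST1", ("ST0",)): 0.5}]          # P(f^1_t | f^1 history)
p_joint = {(("POS1", "GI1"), ("ST1",), f_hist): 0.4}

p = lcpm_probability(f_t, f_hist, 1, p_factor, p_joint)  # 0.5 * 0.4 = 0.2
```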

11. LCPM's Structure Learning Method - Step 1: Example
Given some application and a corpus annotated with multiple tags:
1. Admit that the following tags are judged the most appropriate:
  • Part-of-speech (POS)
  • Semantic tag (ST)
  • Gender inflection (GI)
2. Assume that, of these three linguistic features, only ST can be predicted based solely on its own history:
  • ST $\rightarrow f^1$
  • (POS, GI) $\rightarrow f^{2:3}$
This yields the LCPM approximation:
$$P(f_t^{1:3} \mid f_{t-M+1:t-1}^{1:3}) \approx P(f_t^1 \mid f_{t-M+1:t-1}^1)\, P(f_t^{2:3} \mid f_t^1,\, f_{t-M+1:t-1}^{1:3})$$

14. LCPM's Structure Learning Method - Step 2: Intro
• The goal is to learn the structure of a statistical model to compute $P(f_t^{J+1:K} \mid f_t^{1:J}, f_{t-M+1:t-1}^{1:K})$; more precisely:
• Determine automatically $Z \subset f_{t-M+1:t-1}^{1:K}$ such that
  • $|Z|$ is fixed and $|Z| \ll |f_{t-M+1:t-1}^{1:K}|$ (robustness constraint), and
  • $P(f_t^{J+1:K} \mid f_t^{1:J}, Z)$ approximates the original conditional probabilities according to information-theory-based criteria
• Notation simplification (hereafter): $Y = f_t^{J+1:K}$; $X = f_t^{1:J}$; $Z \subset W = f_{t-M+1:t-1}^{1:K}$; $\rightarrow P(Y \mid X, Z)$

17. LCPM's SL Method - Step 2: Rules to determine Z
• Information-theory measures:
  • Conditional entropy, $H(Y \mid X)$
  • Conditional mutual information (CMI), $I(Y; Z \mid X)$
  • Cross-context conditional mutual information (CCCMI), $I_{X_l}(Y; Z \mid X_m)$
• Possible/experimented rules ($\rightarrow P(Y \mid X, Z)$ with $Z \subset W$):
  • To discard $Z^*$: if $I(Y; Z^* \mid X) < \eta\, H(Y \mid X)$, then $Z^*$ is non-relevant
  • To determine $Z^*$:
$$Z^* = \operatorname*{argmax}_{Z \subset W,\; |Z| = \zeta} \; I(Y; Z \mid X)$$
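The discard rule can be checked with plug-in estimates of $H(Y \mid X)$ and $I(Y; Z \mid X)$. A minimal sketch in Python (base-2 logs; the toy distributions and the threshold $\eta = 0.1$ are invented for illustration):

```python
from collections import defaultdict
from math import log2

def cond_entropy(joint_xy):
    """H(Y|X) = -sum_{x,y} p(x,y) log2 p(y|x)."""
    px = defaultdict(float)
    for (x, _y), p in joint_xy.items():
        px[x] += p
    return -sum(p * log2(p / px[x])
                for (x, _y), p in joint_xy.items() if p > 0)

def cond_mutual_info(joint_xyz):
    """I(Y;Z|X) = sum p(x,y,z) log2 [p(y,z|x) / (p(y|x) p(z|x))]."""
    px, pxy, pxz = defaultdict(float), defaultdict(float), defaultdict(float)
    for (x, y, z), p in joint_xyz.items():
        px[x] += p
        pxy[(x, y)] += p
        pxz[(x, z)] += p
    return sum(p * log2(p * px[x] / (pxy[(x, y)] * pxz[(x, z)]))
               for (x, y, z), p in joint_xyz.items() if p > 0)

# Toy check: a candidate Z* that copies Y is kept; an independent Z* is
# discarded by the rule I(Y;Z*|X) < eta * H(Y|X).
eta = 0.1
h = cond_entropy({(0, 0): 0.5, (0, 1): 0.5})             # H(Y|X) = 1 bit
dep = {(0, 0, 0): 0.5, (0, 1, 1): 0.5}                   # Z* = Y
ind = {(0, y, z): 0.25 for y in (0, 1) for z in (0, 1)}  # Z* indep. of Y
keep_dep = cond_mutual_info(dep) >= eta * h              # True
keep_ind = cond_mutual_info(ind) >= eta * h              # False
```

The second rule (the argmax over subsets of fixed size $\zeta$) would iterate `cond_mutual_info` over `itertools.combinations` of the candidate variables in $W$ and keep the best-scoring subset.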

20. LCPM's SL Method - Step 2: Rules to determine Z (cont.)
• Rule to determine $Z^*$ using the "Utility" measure $N_\lambda$:
$$Z^* = \operatorname*{argmax}_{Z \subset W,\; |Z| = \zeta} \; N_\lambda(Y; Z \mid X), \quad 0 < \lambda \le 1$$
where
$$N_\lambda(Y; Z \mid X) = \sum_{X_m} P(X_m) \Bigl[\, I(Y; Z \mid X_m) - \lambda \sum_{X_l \ne X_m} P(X_l)\, I_{X_l}(Y; Z \mid X_m) \Bigr]$$
and
$$I_{X_l}(Y; Z \mid X_m) = \sum_{Y} \sum_{Z} P(Y, Z \mid X_l) \log \frac{P(Y, Z \mid X_m)}{P(Y \mid X_m)\, P(Z \mid X_m)}$$
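Both $N_\lambda$ and the CCCMI can be computed from an empirical joint distribution $P(X, Y, Z)$. A minimal plug-in sketch (base-2 logs; the toy distribution below is invented for illustration, not data from the talk):

```python
from collections import defaultdict
from math import log2

def utility(joint, lam):
    """N_lambda(Y;Z|X): per-context CMI minus lam times the
    cross-context CMI (CCCMI) penalty, averaged over contexts."""
    px = defaultdict(float)
    cond = defaultdict(lambda: defaultdict(float))  # cond[x][(y,z)] = P(y,z|x)
    for (x, y, z), p in joint.items():
        px[x] += p
    for (x, y, z), p in joint.items():
        cond[x][(y, z)] += p / px[x]
    py = defaultdict(lambda: defaultdict(float))
    pz = defaultdict(lambda: defaultdict(float))
    for x in cond:
        for (y, z), p in cond[x].items():
            py[x][y] += p
            pz[x][z] += p

    def cccmi(xl, xm):
        # I_{X_l}(Y;Z|X_m): expectation under context X_l of the
        # log-ratio evaluated under context X_m.
        total = 0.0
        for (y, z), p in cond[xl].items():
            num = cond[xm].get((y, z), 0.0)
            if p > 0 and num > 0:
                total += p * log2(num / (py[xm][y] * pz[xm][z]))
        return total

    return sum(px[xm] * (cccmi(xm, xm)                    # I(Y;Z|X_m)
                         - lam * sum(px[xl] * cccmi(xl, xm)
                                     for xl in px if xl != xm))
               for xm in px)

# Toy joint where Y = Z in both (equiprobable) contexts of X:
joint = {(x, y, y): 0.25 for x in (0, 1) for y in (0, 1)}
n0 = utility(joint, 0.0)   # 1.0: the plain context-averaged CMI
n1 = utility(joint, 1.0)   # 0.5: the cross-context penalty subtracted
```

Sweeping `lam` between 0 and 1 trades raw informativeness against cross-context stability, which is the trade-off the $Z_1$/$Z_2$ example below illustrates.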

23. LCPM's SL Method - Step 2: Example
Choose $Z_1$ or $Z_2$ to model $P(Y \mid X, Z)$.
• Problem: $X \in \{F, S\}$, $Y \in \{A, B, U\}$, $Z_1 \in \{C, D, V\}$, $Z_2 \in \{E, F, W\}$
• Data: $P(X = F) = P(X = S)$
• "Utility" and solutions:
  • $N_0(Y; Z_1 \mid X) < N_0(Y; Z_2 \mid X)$ (near equality) $\therefore$ $\lambda = 0 \Rightarrow$ choose $Z_2$
  • $N_1(Y; Z_1 \mid X) > N_1(Y; Z_2 \mid X)$ $\therefore$ $\lambda = 1 \Rightarrow$ choose $Z_1$
