Low-Rank Tensors for Scoring Dependency Structures (Tao Lei, Yu Xin, Yuan Zhang, Regina Barzilay, Tommi Jaakkola, CSAIL MIT) - PowerPoint PPT Presentation


  1. Low-Rank Tensors for Scoring Dependency Structures
     Tao Lei, Yu Xin, Yuan Zhang, Regina Barzilay, Tommi Jaakkola
     CSAIL, MIT

  2. Dependency Parsing
     Example sentence: ROOT I ate cake with a fork today (tags: PRON VB NN IN DT NN NN)
     • Dependency parsing as a maximization problem:
       y* = argmax_{y ∈ Y(x)} S(x, y; θ)
       where x is the sentence, Y(x) the set of candidate trees, and θ the parameters.
     • Key aspects of a parsing system:
       1. An accurate scoring function S(x, y; θ)  (our goal)
       2. An efficient decoding procedure for the argmax
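
The argmax over trees is the decoding step. A brute-force Python sketch of that interface follows, assuming an arc-factored score; score_arc is a hypothetical scoring function, real parsers use dynamic programming such as Eisner's algorithm, and this sketch does not even enforce acyclicity:

    from itertools import product

    def decode(sentence, score_arc):
        """Return the head assignment maximizing the sum of arc scores.

        sentence: list of tokens, index 0 being the artificial ROOT.
        score_arc(h, m): score of attaching modifier m to head h.
        Brute force for illustration only; exponential in sentence length.
        """
        n = len(sentence)
        candidates = product(range(n), repeat=n - 1)   # one head per non-ROOT token
        best_heads, best_score = None, float("-inf")
        for heads in candidates:
            if any(h == m + 1 for m, h in enumerate(heads)):   # no self-attachment
                continue
            total = sum(score_arc(h, m + 1) for m, h in enumerate(heads))
            if total > best_score:
                best_heads, best_score = heads, total
        return best_heads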

  3. Finding an Expressive Feature Set
     Traditional view: requires a rich, expressive set of manually crafted feature templates.
     Example sentence: ROOT I ate cake with a fork today (tags: PRON VB NN IN DT NN NN)
     High-dimensional sparse vector φ(x, y) ∈ ℝ^L:  … 1 0 1 1 0 0 0 0 …
     Feature template: head POS ⊕ modifier POS ⊕ attachment length
     Feature example: "VB ⊕ NN ⊕ 2"

  4. Finding an Expressive Feature Set
     Traditional view: requires a rich, expressive set of manually crafted feature templates.
     Example sentence: ROOT I ate cake with a fork today (tags: PRON VB NN IN DT NN NN)
     High-dimensional sparse vector φ(x, y) ∈ ℝ^L:  … 1 0 1 1 0 0 0 0 …
     Feature template: head word ⊕ modifier word
     Feature example: "ate ⊕ cake"

  5. Finding an Expressive Feature Set
     Traditional view: requires a rich, expressive set of manually crafted feature templates.
     Example sentence: ROOT I ate cake with a fork today (tags: PRON VB NN IN DT NN NN)
     High-dimensional sparse vector φ(x, y) ∈ ℝ^L:  … 1   0   2   1   2 0   0   0 …
     Parameter vector θ ∈ ℝ^L:                      … 0.1 0.3 2.2 1.1 0 0.1 0.9 0 …
     Vector-based score: S_θ(x, y) = ⟨θ, φ(x, y)⟩
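
A minimal Python sketch of this vector-based score, with the sparse feature vector kept as a dict of feature counts; the feature names and weights below are made up for illustration:

    def score_traditional(features, theta):
        """S_theta(x, y) = <theta, phi(x, y)> over sparse features."""
        return sum(count * theta.get(name, 0.0) for name, count in features.items())

    theta = {"HP_MP_LEN=VB+NN+2": 2.2, "HW_MW=ate+cake": 1.1}   # learned weights
    features = {"HP_MP_LEN=VB+NN+2": 1, "HW_MW=ate+cake": 1}    # extracted from one arc
    print(score_traditional(features, theta))                   # ≈ 3.3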

  6. Traditional Scoring Revisited
     • In traditional vector-based scoring, features and templates are manually selected
       concatenations of atomic features.
     Example sentence: ROOT I ate cake with a fork today (tags: PRON VB NN IN DT NN NN)
     Atomic features of the arc ate → cake (attachment length = 2):
                   Head      Modifier
       Word:       ate       cake
       POS:        VB        NN
       POS+Word:   VB+ate    NN+cake
       Left POS:   PRON      VB
       Right POS:  NN        IN
     Manually selected templates (with / without attachment length):
       HW_MW_LEN:  ate ⊕ cake ⊕ 2

  7. Traditional Scoring Revisited
     • In traditional vector-based scoring, features and templates are manually selected
       concatenations of atomic features.
     Example sentence: ROOT I ate cake with a fork today (tags: PRON VB NN IN DT NN NN)
     Atomic features of the arc ate → cake (attachment length = 2):
                   Head      Modifier
       Word:       ate       cake
       POS:        VB        NN
       POS+Word:   VB+ate    NN+cake
       Left POS:   PRON      VB
       Right POS:  NN        IN
     Manually selected templates (with / without attachment length):
       HW_MW_LEN:  ate ⊕ cake ⊕ 2
       HW_MW:      ate ⊕ cake

  8. Traditional Scoring Revisited
     • In traditional vector-based scoring, features and templates are manually selected
       concatenations of atomic features.
     Example sentence: ROOT I ate cake with a fork today (tags: PRON VB NN IN DT NN NN)
     Atomic features of the arc ate → cake (attachment length = 2):
                   Head      Modifier
       Word:       ate       cake
       POS:        VB        NN
       POS+Word:   VB+ate    NN+cake
       Left POS:   PRON      VB
       Right POS:  NN        IN
     Manually selected templates (with / without attachment length):
       HW_MW_LEN:  ate ⊕ cake ⊕ 2
       HW_MW:      ate ⊕ cake
       HP_MP_LEN:  VB ⊕ NN ⊕ 2
       HP_MP:      VB ⊕ NN
       …

  9. Traditional Scoring Revisited
     • Problem: it is very difficult to pick the best subset of concatenations.
       - Too few templates: performance is lost.
       - Too many templates: too many parameters to estimate, and the features are correlated.
       - Searching for the best set? The number of choices is exponential.
     • Our approach: use a low-rank tensor (i.e. a multi-way array) to
       - capture the whole range of feature combinations, and
       - keep the parameter estimation problem under control.

  10. Low-Rank Tensor Scoring: Formulation
      • Formulate ALL possible concatenations as a rank-1 tensor built from three
        atomic feature vectors:
        φ_h     (atomic head feature vector)
        φ_m     (atomic modifier feature vector)
        φ_{h,m} (atomic arc feature vector)
      Atomic features of the arc ate → cake (attachment length = 2):
                    Head      Modifier
        Word:       ate       cake
        POS:        VB        NN
        POS+Word:   VB+ate    NN+cake
        Left POS:   PRON      VB
        Right POS:  NN        IN

  11. Low-Rank Tensor Scoring: Formulation
      • Formulate ALL possible concatenations as a rank-1 tensor:
        φ_h ⊗ φ_m ⊗ φ_{h,m} ∈ ℝ^(n×n×d)
        φ_h (atomic head), φ_m (atomic modifier), φ_{h,m} (atomic arc feature vector)
      • Tensor product: [φ_h ⊗ φ_m ⊗ φ_{h,m}]_{ijk} = [φ_h]_i · [φ_m]_j · [φ_{h,m}]_k
        Each entry indicates the occurrence of one feature concatenation.

  12. Low-Rank Tensor Scoring: Formulation
      • Formulate ALL possible concatenations as a rank-1 tensor:
        φ_h ⊗ φ_m ⊗ φ_{h,m} ∈ ℝ^(n×n×d)
        φ_h (atomic head), φ_m (atomic modifier), φ_{h,m} (atomic arc feature vector)
      • Formulate the parameters as a tensor as well:
        θ ∈ ℝ^L:        S_θ(h → m)      = ⟨θ, φ_{h→m}⟩                   (vector-based)
        A ∈ ℝ^(n×n×d):  S_tensor(h → m) = ⟨A, φ_h ⊗ φ_m ⊗ φ_{h,m}⟩       (tensor-based)
      • The tensor score involves features not present in θ, and A can be huge:
        on English, n × n × d ≈ 10^11.
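
The tensor-based inner product can be made concrete with a small numpy sketch on toy dimensions; the full n × n × d tensor (≈ 10^11 entries on English) would never be materialized like this in practice:

    import numpy as np

    n, d = 6, 4                        # toy atomic-feature dimensions
    rng = np.random.default_rng(0)

    A = rng.normal(size=(n, n, d))     # parameter tensor, dense only for illustration
    phi_h = rng.integers(0, 2, size=n).astype(float)    # atomic head features
    phi_m = rng.integers(0, 2, size=n).astype(float)    # atomic modifier features
    phi_hm = rng.integers(0, 2, size=d).astype(float)   # atomic arc features

    # <A, phi_h ⊗ phi_m ⊗ phi_hm> = sum_{i,j,k} A[i,j,k] * phi_h[i] * phi_m[j] * phi_hm[k]
    score = np.einsum("ijk,i,j,k->", A, phi_h, phi_m, phi_hm)
    print(score)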

  13. Low-Rank Tensor Scoring: Formulation
      • Formulate ALL possible concatenations as a rank-1 tensor:
        φ_h ⊗ φ_m ⊗ φ_{h,m} ∈ ℝ^(n×n×d)
        φ_h (atomic head), φ_m (atomic modifier), φ_{h,m} (atomic arc feature vector)
      • Formulate the parameters as a low-rank tensor:
        θ ∈ ℝ^L:        S_θ(h → m)      = ⟨θ, φ_{h→m}⟩                   (vector-based)
        A ∈ ℝ^(n×n×d):  S_tensor(h → m) = ⟨A, φ_h ⊗ φ_m ⊗ φ_{h,m}⟩       (tensor-based)
        With U, V ∈ ℝ^(r×n) and W ∈ ℝ^(r×d):
        A = Σ_{i=1}^r U(i) ⊗ V(i) ⊗ W(i)   (a low-rank tensor: a sum of r rank-1 tensors)

  14. Low-Rank Tensor Scoring: Formulation
      S_tensor(h → m) = ⟨A, φ_h ⊗ φ_m ⊗ φ_{h,m}⟩,   A = Σ_{i=1}^r U(i) ⊗ V(i) ⊗ W(i)
      ⟹  S_tensor(h → m) = Σ_{i=1}^r [Uφ_h]_i [Vφ_m]_i [Wφ_{h,m}]_i
      • Dense low-dimensional representations: Uφ_h, Vφ_m, Wφ_{h,m} ∈ ℝ^r
        (dense vector = dense matrix × sparse vector)

  15. Low-Rank Tensor Scoring: Formulation
      S_tensor(h → m) = ⟨A, φ_h ⊗ φ_m ⊗ φ_{h,m}⟩,   A = Σ_{i=1}^r U(i) ⊗ V(i) ⊗ W(i)
      ⟹  S_tensor(h → m) = Σ_{i=1}^r [Uφ_h]_i [Vφ_m]_i [Wφ_{h,m}]_i
      • Dense low-dimensional representations: Uφ_h, Vφ_m, Wφ_{h,m} ∈ ℝ^r
      • Element-wise products: [Uφ_h]_i [Vφ_m]_i [Wφ_{h,m}]_i
      • Sum over these products: Σ_{i=1}^r [Uφ_h]_i [Vφ_m]_i [Wφ_{h,m}]_i
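
A numpy sketch of this low-rank computation on the same toy dimensions as above; it also verifies that the factored score equals the score under the explicitly reconstructed tensor A:

    import numpy as np

    n, d, r = 6, 4, 3
    rng = np.random.default_rng(1)

    U = rng.normal(size=(r, n))        # head projection
    V = rng.normal(size=(r, n))        # modifier projection
    W = rng.normal(size=(r, d))        # arc-feature projection

    phi_h = rng.integers(0, 2, size=n).astype(float)
    phi_m = rng.integers(0, 2, size=n).astype(float)
    phi_hm = rng.integers(0, 2, size=d).astype(float)

    # Dense low-dimensional representations, then element-wise product and sum.
    score_lowrank = np.sum((U @ phi_h) * (V @ phi_m) * (W @ phi_hm))

    # Same score via the reconstructed tensor A = sum_i U(i) ⊗ V(i) ⊗ W(i).
    A = np.einsum("ri,rj,rk->ijk", U, V, W)
    score_tensor = np.einsum("ijk,i,j,k->", A, phi_h, phi_m, phi_hm)
    assert np.allclose(score_lowrank, score_tensor)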

  16. Intuition and Explanations
      Example: collaborative filtering. Approximate the sparse user-rating matrix A
      via low-rank factors:
        U ∈ ℝ^(2×n): users' hidden preferences ("price", "quality")
        V ∈ ℝ^(2×m): items' hidden properties ("price", "quality")
      • Ratings are not completely independent.
      • Items share hidden properties ("price" and "quality").
      • Users have hidden preferences over those properties.

  17. Intuition and Explanations
      Example: collaborative filtering. Approximate the sparse user-rating matrix A
      as a sum of rank-1 matrices:
        A ≈ U(1) ⊗ V(1) + ⋯ + U(r) ⊗ V(r) = Uᵀ V
      Number of parameters: n × m reduced to (n + m) · r.
      Intuition: data and parameters can be approximately characterized by a small
      number of hidden factors.
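
A small numpy sketch of this low-rank approximation, here using a truncated SVD of a toy rating matrix; the particular factorization method is an assumption for illustration, the point is the reduced parameter count:

    import numpy as np

    n_users, n_items, r = 5, 4, 2
    rng = np.random.default_rng(2)

    # Toy "ratings" generated from hidden factors, so a rank-2 fit is reasonable.
    prefs = rng.normal(size=(r, n_users))      # users' hidden preferences
    props = rng.normal(size=(r, n_items))      # items' hidden properties
    A = prefs.T @ props + 0.05 * rng.normal(size=(n_users, n_items))

    # Rank-r approximation: A ≈ sum_i U(i) ⊗ V(i).
    Uf, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_r = Uf[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

    print("full parameters:", n_users * n_items)            # n × m
    print("low-rank parameters:", (n_users + n_items) * r)  # (n + m) · r
    print("approximation error:", np.linalg.norm(A - A_r))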

  18. Intuition and Explanations
      Our case: approximate the parameters (feature weights) via a low-rank parameter
      tensor A ≈ Σ_i U(i) ⊗ V(i) ⊗ W(i).
      [Illustration: the factor rows for "apple" and "banana" take similar values
      because the two words have similar syntactic behavior.]
      • Hidden properties are associated with each word.
      • Parameter values are shared via these hidden properties.

  19. Low-Rank Tensor Scoring: Summary
      • Naturally captures the full feature expansion (concatenations)
        without manually specifying a large set of feature templates.
      • Feature expansion is controlled by the low rank (small r),
        which gives better feature tuning and optimization.
      • New, auxiliary features are easy to add and exploit: simply append them
        as atomic features (e.g. the head's atomic features ate, VB, VB+ate, …
        extended with person:I, number:singular, Emb[1]: -0.0128, Emb[2]: 0.5392).
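
A minimal sketch of what appending auxiliary atomic features means in practice: the atomic head vector is just a concatenation of indicator features and any auxiliary real-valued features such as embedding dimensions. The feature names below are illustrative, not the paper's exact template set:

    import numpy as np

    def atomic_head_vector(indicators, vocab, embedding):
        """Concatenate one-hot indicator features with dense auxiliary features."""
        vec = np.zeros(len(vocab))
        for name in indicators:
            vec[vocab[name]] = 1.0
        return np.concatenate([vec, embedding])    # auxiliary dims appended at the end

    vocab = {"word=ate": 0, "pos=VB": 1, "pos+word=VB+ate": 2}
    phi_h = atomic_head_vector(["word=ate", "pos=VB", "pos+word=VB+ate"],
                               vocab,
                               embedding=np.array([-0.0128, 0.5392]))
    print(phi_h.shape)   # (5,): 3 indicator features + 2 embedding dimensions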

  20. Combined Scoring
      • Combine traditional and tensor scoring in S_γ(x, y):
        S_γ(x, y) = γ · S_θ(x, y) + (1 − γ) · S_tensor(x, y),   γ ∈ [0, 1]
        (a set of manually selected features plus the full feature expansion
        controlled by low rank)
      • A similar "sparse + low-rank" idea is used for matrix decomposition:
        Tao and Yuan, 2011; Zhou and Tao, 2011; Waters et al., 2011;
        Chandrasekaran et al., 2011.
      • Final maximization problem, given parameters θ, U, V, W:
        y* = argmax_{y ∈ Y(x)} S_γ(x, y; θ, U, V, W)
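
A minimal sketch of the combined arc score, reusing the two scoring pieces sketched earlier; gamma is the interpolation weight γ from the slide:

    import numpy as np

    def score_combined(gamma, features, theta, U, V, W, phi_h, phi_m, phi_hm):
        """gamma * S_theta + (1 - gamma) * S_tensor for one candidate arc."""
        s_theta = sum(c * theta.get(name, 0.0) for name, c in features.items())
        s_tensor = np.sum((U @ phi_h) * (V @ phi_m) * (W @ phi_hm))
        return gamma * s_theta + (1.0 - gamma) * s_tensor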

  21. Learning Problem
      • Given a training set D = {(x_j, y_j)}_{j=1}^N.
      • Search for parameter values that score the gold trees higher than all others:
        ∀y ∈ Y(x_j):  S(x_j, y_j) ≥ S(x_j, y) + ‖y_j − y‖ − ξ_j
        where ‖y_j − y‖ is a non-negative loss between trees and ξ_j ≥ 0 is a slack variable.
      • The training objective penalizes unsatisfied constraints through ξ_j:
        min_{θ, U, V, W, ξ ≥ 0}   C · Σ_j ξ_j  +  ( ‖U‖² + ‖V‖² + ‖W‖² + ‖θ‖² )
                                  training loss    regularization
      • Calculating the loss requires solving the expensive maximization problem;
        following common practice, we adopt an online learning framework.
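
The constraints above correspond to a structured hinge loss. A minimal Python sketch for one training pair follows; obtaining pred_heads requires the expensive (loss-augmented) maximization mentioned on the slide, and the paper optimizes with an online learner rather than the batch objective written above, so this is only an illustration of the slack term:

    def hamming(gold_heads, pred_heads):
        """Non-negative loss between trees: number of tokens whose head differs."""
        return sum(g != p for g, p in zip(gold_heads, pred_heads))

    def slack(score_tree, gold_heads, pred_heads):
        """Smallest xi_j satisfying S(x, gold) >= S(x, pred) + ||gold - pred|| - xi_j."""
        violation = (score_tree(pred_heads)
                     + hamming(gold_heads, pred_heads)
                     - score_tree(gold_heads))
        return max(0.0, violation)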
