Low-Rank Tensors for Scoring Dependency Structures
Tao Lei, Yu Xin, Yuan Zhang, Regina Barzilay, Tommi Jaakkola
CSAIL, MIT
Dependency Parsing

- Example: ROOT I ate cake with a fork today (PRON VB NN IN DT NN NN)
- Dependency parsing as a maximization problem:

  ŷ = argmax_{y ∈ T(x)} S(x, y; θ)

- Key aspects of a parsing system:
  1. An accurate scoring function S(x, y; θ)
  2. An efficient decoding procedure for the argmax
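To make the arc-factored decomposition behind this argmax concrete, here is a minimal Python sketch (my illustration, not code from the talk; `score_arc` is a hypothetical per-arc scorer):

```python
from typing import Callable, Sequence

def score_tree(heads: Sequence[int],
               score_arc: Callable[[int, int], float]) -> float:
    """Arc-factored score of a dependency tree.

    heads[m - 1] is the head of token m (0 denotes ROOT), so the total
    score is the sum of score_arc(h, m) over all head -> modifier arcs.
    A decoder (Eisner's algorithm or Chu-Liu/Edmonds, not shown) would
    maximize this quantity over all valid trees.
    """
    return sum(score_arc(h, m) for m, h in enumerate(heads, start=1))

# Toy usage: "I ate cake", with "ate" attached to ROOT and the rest to "ate".
print(score_tree([2, 0, 2], lambda h, m: 1.0 if h in (0, 2) else 0.0))
```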
Finding an Expressive Feature Set

- An accurate scoring function requires a rich, expressive set of manually-crafted feature templates.
- Traditional view: a high-dimensional sparse vector φ(x, y) ∈ ℝ^L, where each entry indicates the firing of one feature template on the tree, e.g.
  - head POS, modifier POS and arc length; feature example: "VB∧NN∧2"
  - head word and modifier word; feature example: "ate∧cake"
- Scoring uses a parameter vector θ ∈ ℝ^L (e.g. weights 0.1, 0.3, 2.2, 1.1, 0.1, 0.9, …):

  S_θ(x, y) = ⟨θ, φ(x, y)⟩
Traditional Scoring Revisited

- In traditional vector-based scoring, features and templates are manually-selected concatenations of atomic features (sentence: ROOT I ate cake with a fork today):

             Word   POS   POS+Word   Left POS   Right POS
  Head:      ate    VB    VB+ate     PRON       NN
  Modifier:  cake   NN    NN+cake    VB         IN

  Attach length? Yes / No

- Concatenating atomic head, modifier and length features yields arc features such as:

  HW_MW_LEN: ate∧cake∧2    HW_MW: ate∧cake
  HP_MP_LEN: VB∧NN∧2       HP_MP: VB∧NN    …
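A small sketch of this template-based extraction (illustrative only: the template names mirror the slide, but the hashing scheme is my assumption, not the parsers' actual implementation):

```python
def arc_features(head_word, head_pos, mod_word, mod_pos, length):
    """Build string-valued arc features by concatenating atomic features,
    mirroring templates such as HW_MW_LEN and HP_MP from the slide."""
    return [
        f"HW_MW_LEN={head_word}^{mod_word}^{length}",
        f"HW_MW={head_word}^{mod_word}",
        f"HP_MP_LEN={head_pos}^{mod_pos}^{length}",
        f"HP_MP={head_pos}^{mod_pos}",
    ]

def to_sparse_indices(features, dim=2**20):
    # Hash each feature string into an index of the high-dimensional sparse
    # vector phi; the weight vector theta lives in the same space.
    return [hash(f) % dim for f in features]

print(arc_features("ate", "VB", "cake", "NN", 2))
```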
Traditional Scoring Revisited

- Problem: it is very difficult to pick the best subset of concatenations.
  - Too few templates: lose performance.
  - Too many templates: too many parameters to estimate.
  - Searching for the best set? Features are correlated and the choices are exponential.
- Our approach: use a low-rank tensor (i.e., a multi-way array) to
  - capture a whole range of feature combinations, and
  - keep the parameter estimation problem under control.
Low-Rank Tensor Scoring: Formulation

- Formulate ALL possible concatenations as a rank-1 tensor:

  φ_h ⊗ φ_m ⊗ φ_{h,m} ∈ ℝ^{n×n×d}

  where φ_h ∈ ℝ^n is the atomic head feature vector, φ_m ∈ ℝ^n the atomic modifier feature vector, and φ_{h,m} ∈ ℝ^d the atomic arc feature vector.
- The tensor product is defined entrywise as (x ⊗ y ⊗ z)_{ijk} = x_i y_j z_k; each entry indicates the occurrence of one feature concatenation.
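A tiny numpy sketch of this rank-1 construction (my illustration, not the authors' code):

```python
import numpy as np

phi_h = np.array([1.0, 0.0, 1.0])   # toy atomic head features
phi_m = np.array([0.0, 1.0, 1.0])   # toy atomic modifier features
phi_hm = np.array([1.0, 0.0])       # toy atomic arc features

# (x ⊗ y ⊗ z)_{ijk} = x_i * y_j * z_k
T = np.einsum('i,j,k->ijk', phi_h, phi_m, phi_hm)
assert T.shape == (3, 3, 2)
# A nonzero entry T[i, j, k] marks the co-occurrence of head feature i,
# modifier feature j and arc feature k, i.e. one feature concatenation.
```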
Low-Rank Tensor Scoring: Formulation

- Formulate the parameters as a tensor as well:

  Vector-based: S_θ(x, y) = ⟨θ, φ(x, y)⟩,   with θ ∈ ℝ^L
  Tensor-based: S_tensor(h → m) = ⟨A, φ_h ⊗ φ_m ⊗ φ_{h,m}⟩,   with A ∈ ℝ^{n×n×d}

- A full parameter tensor can be huge. On English, n × n × d ≈ 10^11, and it involves feature combinations not present in the manual feature vector φ.
Low-Rank Tensor Scoring: Formulation

- Formulate the parameters as a low-rank tensor, i.e. a sum of r rank-1 tensors:

  A = Σ_{i=1}^{r} U(i) ⊗ V(i) ⊗ W(i)

  with U, V ∈ ℝ^{r×n} and W ∈ ℝ^{r×d}, where U(i) denotes the i-th row of U.
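To see why the low-rank form matters, compare parameter counts. A quick arithmetic sketch (the sizes below are my assumptions, chosen so n × n × d matches the ~10^11 figure quoted for English):

```python
# Full tensor vs. low-rank parameter counts (illustrative magnitudes only).
n, d, r = 10_000, 1_000, 50
full = n * n * d              # entries of A in R^{n x n x d}
low_rank = r * (n + n + d)    # rows of U, V in R^{r x n} and W in R^{r x d}
print(f"full: {full:.1e}  low-rank: {low_rank:.1e}")   # 1.0e+11 vs 1.0e+06
```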
Low-Rank Tensor Scoring: Formulation

- With A = Σ_{i=1}^{r} U(i) ⊗ V(i) ⊗ W(i), the tensor score becomes

  S_tensor(h → m) = ⟨A, φ_h ⊗ φ_m ⊗ φ_{h,m}⟩ = Σ_{i=1}^{r} [U φ_h]_i [V φ_m]_i [W φ_{h,m}]_i

- Computation proceeds in three steps:
  1. Dense low-dimensional representations U φ_h, V φ_m, W φ_{h,m} ∈ ℝ^r (dense matrix × sparse vector).
  2. Element-wise products of these r-dimensional vectors.
  3. Sum over the r products.
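A minimal numpy sketch of this computation (my illustration; in practice the atomic vectors would be sparse, e.g. `scipy.sparse`, rather than the dense toys below):

```python
import numpy as np

def tensor_score(U, V, W, phi_h, phi_m, phi_hm):
    """Low-rank arc score: sum_i [U phi_h]_i [V phi_m]_i [W phi_hm]_i.

    Projects each atomic feature vector to a dense r-dimensional vector,
    multiplies element-wise, and sums, never materializing the full
    n x n x d tensor.
    """
    p = U @ phi_h      # dense r-dim representation of the head
    q = V @ phi_m      # dense r-dim representation of the modifier
    s = W @ phi_hm     # dense r-dim representation of the arc
    return float(np.sum(p * q * s))

r, n, d = 50, 200, 40
rng = np.random.default_rng(0)
U, V, W = rng.normal(size=(r, n)), rng.normal(size=(r, n)), rng.normal(size=(r, d))
phi_h = np.zeros(n); phi_h[[3, 17]] = 1.0   # toy sparse indicator features
phi_m = np.zeros(n); phi_m[[5]] = 1.0
phi_hm = np.zeros(d); phi_hm[[2]] = 1.0
print(tensor_score(U, V, W, phi_h, phi_m, phi_hm))
```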
Intuition and Explanations

- Example: collaborative filtering approximates user ratings via a low-rank model.
- In the sparse user-rating matrix A, the ratings are not completely independent:
  - users have hidden preferences over properties (U ∈ ℝ^{2×n}: preferences), and
  - items share hidden properties such as "price" and "quality" (V ∈ ℝ^{2×m}: properties).
- Low-rank factorization:

  A ≈ Uᵀ V = Σ_i U(i) ⊗ V(i)

- Number of parameters: r × (n + m) instead of n × m.
- Intuition: data and parameters can be approximately characterized by a small number of hidden factors.
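A brief numpy sketch of the same intuition (illustrative only, using a truncated SVD as the low-rank factorization):

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic ratings generated from 2 hidden factors ("price", "quality").
prefs = rng.normal(size=(2, 30))    # U: user preferences, 2 x n_users
props = rng.normal(size=(2, 40))    # V: item properties,  2 x n_items
A = prefs.T @ props + 0.01 * rng.normal(size=(30, 40))

# A rank-2 approximation recovers A almost exactly using 2*(30+40)
# numbers instead of 30*40.
u, s, vt = np.linalg.svd(A, full_matrices=False)
A2 = u[:, :2] @ np.diag(s[:2]) @ vt[:2]
print(np.linalg.norm(A - A2) / np.linalg.norm(A))   # small residual
```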
Intuition and Explanations

- Our case: approximate the parameters (feature weights) via a low-rank tensor,

  A = Σ_i U(i) ⊗ V(i) ⊗ W(i)

- [Figure: slices of the parameter tensor A with missing entries being filled in by the low-rank structure.]
- Hidden properties are associated with each word, and parameter values are shared via these hidden properties.
- For example, the parameters for "apple" and "banana" receive similar values because the two words have similar syntactic behavior.
Low-Rank Tensor Scoring: Summary

- Naturally captures the full feature expansion (concatenations) without manually specifying a large set of feature templates.
- Easily adds and utilizes new, auxiliary features: simply append them as atomic features, e.g. for the head:
  ate, VB, VB+ate, PRON, NN, person:I, number:singular, Emb[1]: -0.0128, Emb[2]: 0.5392, …
- Controls the feature expansion via the low rank (small r), enabling better feature tuning and optimization.
Combined Scoring

- Combine traditional and tensor scoring in S_γ(x, y):

  S_γ(x, y) = γ · S_θ(x, y) + (1 − γ) · S_tensor(x, y),   γ ∈ [0, 1]

  where S_θ uses the set of manually selected features and S_tensor provides the full feature expansion controlled by the low rank.
- A similar "sparse + low-rank" idea has been used for matrix decomposition: Tao and Yuan, 2011; Zhou and Tao, 2011; Waters et al., 2011; Chandrasekaran et al., 2011.
- Final maximization problem given parameters θ, U, V, W:

  ŷ = argmax_{y ∈ T(x)} S_γ(x, y; θ, U, V, W)
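A hedged one-liner for the interpolation (the function name is mine):

```python
def combined_score(gamma: float, s_theta: float, s_tensor: float) -> float:
    """S_gamma = gamma * S_theta + (1 - gamma) * S_tensor, gamma in [0, 1].

    gamma = 1 recovers the purely template-based scorer, gamma = 0 the
    purely low-rank tensor scorer; the talk reports gamma = 0.3 works best.
    """
    assert 0.0 <= gamma <= 1.0
    return gamma * s_theta + (1.0 - gamma) * s_tensor
```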
Learning Problem

- Given a training set D = {(x_i, y_i)}_{i=1}^{N}, search for parameter values that score the gold trees higher than all others:

  ∀ỹ ∈ T(x_i):  S(x_i, y_i) ≥ S(x_i, ỹ) + ‖y_i − ỹ‖ − ξ_i

  where ‖y_i − ỹ‖ is the structured margin and ξ_i ≥ 0 is a non-negative loss; unsatisfied constraints are penalized.
- The training objective:

  min_{θ, U, V, W, ξ_i ≥ 0}  Σ_{i=1}^{|D|} ξ_i  +  ‖θ‖² + ‖U‖² + ‖V‖² + ‖W‖²

  (training loss + regularization)
- Calculating the loss requires solving the expensive maximization problem; following common practice, we adopt an online learning framework.
Online Learning

- Use the passive-aggressive algorithm (Crammer et al., 2006), tailored to our tensor setting.
- (i) Iterate over the training samples (x_1, y_1), …, (x_i, y_i), …, (x_N, y_N) successively, revising the parameter values after each i-th sample. Each update is efficient via a closed-form solution.
- (ii) The tensor score Σ_{i=1}^{r} [U φ_h]_i [V φ_m]_i [W φ_{h,m}]_i is neither linear nor convex in the parameters, so each update chooses one pair of parameter sets, (θ, U), (θ, V) or (θ, W), and solves the sub-problem

  min_{Δθ, ΔU}  ½‖Δθ‖² + ½‖ΔU‖² + C·ξ_i

  applying the increments θ^{(t+1)} = θ^{(t)} + Δθ and U^{(t+1)} = U^{(t)} + ΔU.
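For flavor, here is a minimal sketch of a standard PA-I step for the linear component alone. This mirrors the shape of the update but is explicitly not the paper's joint closed form over (θ, U):

```python
import numpy as np

def pa_update(theta, phi_gold, phi_pred, margin, C=1.0):
    """One passive-aggressive (PA-I) step for a linear scorer.

    phi_gold / phi_pred are the feature vectors of the gold and predicted
    trees; margin is the structured cost ||y_i - y_pred||.
    """
    # Structured hinge loss of the current prediction.
    loss = theta @ phi_pred - theta @ phi_gold + margin
    if loss <= 0:
        return theta                       # constraint satisfied: stay passive
    delta = phi_gold - phi_pred            # update direction
    tau = min(C, loss / (delta @ delta))   # aggressive step size, capped by C
    return theta + tau * delta
```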
Experiment Setup

Datasets
- 14 languages from the CoNLL 2006 & 2008 shared tasks.

Features
- Only 16 atomic word features for the tensor.
- Combined with the 1st-order (single-arc) and up-to-3rd-order (three-arc) features used in the MST/Turbo parsers.
- [Figure: part structures scored by the high-order features, over head h, modifier m, siblings s and t, and grandparent g: arc, sibling, grandparent, grand-sibling and tri-sibling.]

Implementation
- By default, the rank of the tensor is r = 50.
- Train 10 iterations for all 14 languages.
- The 3-way tensor captures only 1st-order, arc-based features.

Baselines and Evaluation Measure
- MST and Turbo parsers: representative graph-based parsers using a similar set of features.
- NT-1st and NT-3rd: variants of our model with the tensor component removed, i.e. reimplementations of the MST and Turbo parser features.
- Unlabeled Attachment Score (UAS), evaluated without punctuation.
Overall 1st-order Results

- Average UAS: Our Model 87.76%, NT-1st 87.05%, MST 86.50%, Turbo 86.83%.
- More than 0.7% average improvement; outperforms the baselines on 11 out of 14 languages.
Impact of Tensor Component

- [Figure: average UAS over training iterations 1-10 for three settings: no tensor (γ = 1), tensor only (γ = 0), and combined (γ = 0.3).]
- The tensor component achieves better generalization on test data.
- The combined scoring outperforms either single component.
Overall 3rd-order Results

- Average UAS: Our Model 89.08%, Turbo 88.73%, NT-3rd 88.66%.
- Our traditional scoring component alone is as good as the state-of-the-art system.
- The 1st-order tensor component remains useful in high-order parsing: the full model outperforms the state-of-the-art single system and achieves the best published results on 5 languages.
Leveraging Auxiliary Features

- Unsupervised word embeddings are publicly available.*
- Append the embeddings of the current, previous and next words to φ_h and φ_m. Note that φ_h ⊗ φ_m alone involves more than (50 × 3)² values for 50-dimensional embeddings!
- English, German and Swedish have word embeddings in this dataset.
- [Figure: absolute UAS improvement (roughly 0.1-0.6) from adding embeddings, for the 1st-order and 3rd-order models on Swedish, German and English.]

* https://github.com/wolet/sprml13-word-embeddings
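How the "append as atomic features" step might look in code (a sketch under assumed data structures; `emb` is a hypothetical word-to-vector lookup, and the indicator block is left as a toy placeholder):

```python
import numpy as np

def head_vector(words, i, emb, onehot_dim):
    """Atomic head feature vector phi_h: indicator features followed by the
    embeddings of the previous, current and next words (3 x 50 dimensions)."""
    sparse = np.zeros(onehot_dim)            # word/POS indicator block (toy)
    context = [words[j] if 0 <= j < len(words) else "<pad>"
               for j in (i - 1, i, i + 1)]
    dense = np.concatenate([emb.get(w, np.zeros(50)) for w in context])
    return np.concatenate([sparse, dense])   # real-valued atomic features
```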
Conclusion

- Modeling: we introduced a low-rank tensor factorization model for scoring dependency arcs.
- Learning: we proposed an online learning method that directly optimizes the low-rank factorization for parsing performance, achieving state-of-the-art results.
- Opportunities & challenges: we hope to apply this idea to other structures and NLP problems.

Source code available at: https://github.com/taolei87/RBGParser
Rank of the Tensor

- [Figure: UAS (80-95%) as a function of the tensor rank r (10-70) for Japanese, English, Chinese and Slovene.]