Interpretable and Compositional Relation Learning by Joint Training - PowerPoint PPT Presentation

Interpretable and Compositional Relation Learning by Joint Training with an Autoencoder Ryo Takahashi* 1 Ran Tian* 1 Kentaro Inui 1,2 (* equal contribution) 1 Tohoku University 2 RIKEN, Japan

Task: Knowledge Base Completion • Knowledge Bases (KBs) store a large amount of facts in the form of <head entity, relation, tail entity> triples: tail entity relation head entity Australia The Matrix country_of_film United States • The Knowledge Base Completion (KBC) task aims to predict missing parts of an incomplete triple: Finding Nemo country_of_film ? • Help discover missing facts in a KB July 18, 2018 2

マスタータイトルの書式設定 Vector Based Approach A common approach to KBC is to model triples with a low dimension vector space, where Entity : represented by a Relation : represented as low dimension vector (so transformation of the vector that similar entities are space, which can be: close to each other) Vector Translation • Linear map • Australia Non-linear map • US Up to design choice Finding Nemo The Matrix July 18, 2018 3

マスタータイトルの書式設定 2 Popular Types of Representations for Relation TransE [Bordes+’13] Bilinear [Nickel+’11] Relation as vector translation Relation as linear • • transformation (matrix) 𝒗 ℎ 𝒔 𝒘 𝑢 ⊤ 𝒗 ℎ 𝑵 𝑠 𝒘 𝑢 + ≈ 𝑒 𝑒 𝑒 ・・ 𝑒 2 𝑒 𝑒 Intuitively suitable for 1-to-1 • relation Flexibly modeling N-to-N • relation currency AUD Australia of_country country_of_film USD US Australia The Matrix US Finding Nemo same number of entities same distances within We follow July 18, 2018 4

Matrices are Difficult to Train • More parameters compared to entity vector entity relation vector matrix High dimension 𝑒 vs. 𝑒 2 Low dimension • Objective is highly non-convex ⊤ 𝒗 ℎ 𝑵 𝑠 𝒘 𝑢 ・・ 𝑒 2 𝑒 𝑒 July 18, 2018 5

In this work: ① Propose jointly training relation matrices with an autoencoder: • In order to reduce the high dimensionality ② Modified SGD with separated learning rates: • In order to handle the highly non-convex training objective ③ Use modified SGD to enhance joint training with autoencoder ④ Other techniques for training relation matrices Achieve SOTA on standard KBC datasets July 18, 2018 6

TRAINING TECHNIQUES July 18, 2018 7

① Joint Training with an Autoencoder マスタータイトルの書式設定 Base Model Proposed Represent relations as matrices in Train an autoencoder to a bilinear model , can be reconstruct relation matrix from extended with compositional low dimension coding training [Nickel+’11, Guu+’15, Tian+’16] reconstructed original ⊤ ′ 𝒗 ℎ 𝑵 𝑠 1 𝑵 𝑠 2 𝒘 𝑢 𝒘 𝑢 𝑵 𝑠 𝑵 𝑠 ・・・ 𝑒 2 𝑒 2 𝑒 2 𝑒 2 𝑒 2 𝑒 2 𝑒 𝑒 𝑑 𝑒 Train jointly Different from usual autoencoders in which the Finding original input is not updated 1. Reduce the high dimensionality of relation matrices 2. Help learn composition of relations July 18, 2018 8

① Joint Training with an Autoencoder マスタータイトルの書式設定 Base Model Proposed Represent relations as matrices in Train an autoencoder to a bilinear model , can be reconstruct relation matrix from extended with compositional low dimension coding training [Nickel+’11, Guu+’15, Tian+’16] reconstructed original ⊤ ′ 𝒗 ℎ 𝑵 𝑠 1 𝑵 𝑠 2 𝒘 𝑢 𝒘 𝑢 𝑵 𝑠 𝑵 𝑠 ・・・ 𝑒 2 𝑒 2 𝑒 2 𝑒 2 𝑒 2 𝑒 2 𝑒 𝑒 𝑑 𝑒 Train jointly Not easy to carry out Training objective is highly non-convex → Easily fall into local minimums July 18, 2018 9

② Modified SGD (Separated Learning Rates) マスタータイトルの書式設定 Our strategy Different learning rates for different parts of our model Previous Modified The common practice for setting Different parts in a neural network learning rates of SGD [Bottou, 2012] : may have different learning rates 𝜃 KB 𝛽 KB 𝜐 𝑠 ≔ 𝜃 1 + 𝜃 KB 𝜇 KB 𝜐 𝑠 𝛽 𝜐 ≔ 1 + 𝜃𝜇𝜐 𝜃 AE 𝛽 AE 𝜐 𝑠 ≔ 1 + 𝜃 AE 𝜇 AE 𝜐 𝑠 𝜃 : initial learning rate 𝜃 KB : 𝜃 for KB-learning objective 𝜃 AE : 𝜃 for autoencoder objective 𝜇 : coefficient of L2-regularizer 𝜇 KB : 𝜇 for KB-learning objective 𝜇 AE : 𝜇 for autoencoder objective 𝜐 : counter of trained examples 𝜐 𝑓 : counter of each entity 𝑓 𝜐 𝑠 : counter of each relation 𝑠 July 18, 2018 10

② Modified SGD (Separated Learning Rates) マスタータイトルの書式設定 Our strategy Different learning rates for different parts of our model Previous Modified The common practice for setting Different parts in a neural network learning rates of SGD [Bottou, 2012] : may have different learning rates 𝜃 KB 𝛽 KB 𝜐 𝑠 ≔ 𝜃 1 + 𝜃 KB 𝜇 KB 𝜐 𝑠 𝛽 𝜐 ≔ 1 + 𝜃𝜇𝜐 𝜃 AE 𝛽 AE 𝜐 𝑠 ≔ 1 + 𝜃 AE 𝜇 AE 𝜐 𝑠 𝜃 : initial learning rate 𝜃 KB : 𝜃 for KB-learning objective 𝜃 AE : 𝜃 for autoencoder objective 𝜇 : coefficient of L2-regularizer 𝜇 KB : 𝜇 for KB-learning objective 𝜇 AE : 𝜇 for autoencoder objective Learning rates for frequent entities 𝜐 𝑓 : counter of each entity 𝑓 and relations can decay more quickly 𝜐 𝑠 : counter of each relation 𝑠 July 18, 2018 11

② Modified SGD (Separated Learning Rates) マスタータイトルの書式設定 Our strategy Different learning rates for different parts of our model Modified Different parts in a neural network Rationale may have different learning rates NN usually can be decomposed 𝜃 KB into several parts, each one is 𝛽 KB 𝜐 𝑠 ≔ 1 + 𝜃 KB 𝜇 KB 𝜐 𝑠 convex when other parts are fixed 𝜃 AE ↓ 𝛽 AE 𝜐 𝑠 ≔ 1 + 𝜃 AE 𝜇 AE 𝜐 𝑠 NN ≈ joint co-training of many 𝜃 KB : 𝜃 for KB-learning objective simple convex models 𝜃 AE : 𝜃 for autoencoder objective ↓ 𝜇 KB : 𝜇 for KB-learning objective Natural to assume different 𝜇 AE : 𝜇 for autoencoder objective learning rate for each part 𝜐 𝑓 : counter of each entity 𝑓 𝜐 𝑠 : counter of each relation 𝑠 July 18, 2018 12

③ Learning Rates for Joint Training Autoencoder KB objective trying 𝜃 KB 𝛽 KB 𝜐 𝑠 ≔ to predict entities 1 + 𝜃 KB 𝜇 KB 𝜐 𝑠 Autoencoder (AE) 𝜃 AE objective trying to fit to 𝛽 AE 𝜐 𝑠 ≔ 1 + 𝜃 AE 𝜇 AE 𝜐 𝑠 low dimension coding Beginning of training 𝛽(𝜐 𝑠 ) AE is initialized randomly • Does not make much sense As the training proceeds • 𝜃 KB to fit matrices to AE 𝛽 KB and 𝛽 AE should • balance 𝜃 AE 1/(𝜇 AE 𝜐 𝑠 ) 1/(𝜇 KB 𝜐 𝑠 ) 𝜐 𝑠 0 July 18, 2018 13

④ Other Training Techniques +2.6 Normalization in Hits@10 normalize relation on FB15k-237 matrices to 𝑵 𝑠 = 𝑒 𝑵 𝑠 = 𝑒 during training +1.2 in Hits@10 Regularization push 𝑵 𝑠 toward an 1 Minimize 𝑵 𝑠 ⊤ 𝑵 𝑠 − ⊤ 𝑵 𝑠 𝐽 𝑒 tr 𝑵 𝑠 orthogonal matrix +0.4 in Hits@10 Initialization 𝑵 𝑠 𝑵 𝑠 initialize 𝑵 𝑠 as (𝐽 + 𝐻)/2 instead of pure Gaussian July 18, 2018 14

EXPERIMENTS July 18, 2018 15

Datasets for Knowledge Base Completion Dataset #Entity #Relation #Train #Valid #Test WN18RR 40,943 11 86,835 3,034 3,134 [Dettmers+’18] FB15k-237 [ Toutanova&Chen’15] 14,541 237 272,115 17,535 20,466 • WN18RR : subset of WordNet [Miller ’95] • FB15k-237 : subset of Freebase [Bollacker+’08] • The previous WN18 and FB15k have an information leakage issue (refer our paper for test results) • Evaluate models by how high the model ranks the gold test triples. July 18, 2018 16

Base Model vs. Joint Training with Autoencoder Model WN18RR FB15k-237 MR MRR H10 MR MRR H10 BASE 2447 .310 54.1 203 .328 51.5 JOINT with AE 2268 .343 54.8 197 .331 51.6 Models: Metrics: BASE : The bilinear model MR (Mean Rank): • • [Nickel+’11] lower is better Proposed JOINT Training : MRR (Mean Reciprocal Rank): • • Jointly train relation matrices higher is better with an autoencoder H10 (Hits at 10): • higher is better Joint training with an autoencoder improves upon the base bilinear model July 18, 2018 17

Compared to Previous Research Model WN18RR FB15k-237 • Normalization MR MRR H10 MR MRR H10 • Regularization Ours • Initialization BASE 2447 .310 54.1 203 .328 51.5 JOINT with AE 2268 .343 54.8 197 .331 51.6 Re-experiments TransE [Bordes+’13] 4311 .202 45.6 278 .236 41.6 RESCAL [Nickel+’11] 9689 .105 20.3 457 .178 31.9 HolE [Nickel+’16] 8096 .376 40.0 1172 .169 30.9 Published results ComplEx [Trouillon+’16] 5261 .440 51.0 339 .247 42.8 ConvE [Dettmers+’18] 5277 .460 48.0 246 .316 49.1 • Base model is competitive enough • Our models achieved state-of-the-art results July 18, 2018 18

Interpretable and Compositional Relation Learning by Joint Training - PowerPoint PPT Presentation

Interpretable and Compositional Relation Learning by Joint Training with an Autoencoder Ryo Takahashi* 1 Ran Tian* 1 Kentaro Inui 1,2 (* equal contribution) 1 Tohoku University 2 RIKEN, Japan Task: Knowledge Base Completion Knowledge Bases

Interpretable sets in o-minimal structures Will Johnson March 27, 2015 Will Johnson

Bruno Gavranovi c SYCO2 Compositional Deep Learning December 18, 2018 1 / 36 Compositional

Not Just a Black Box: Interpretable Deep Learning for Genomics Avan> Shrikumar, Peyton

IMLI: An Incremental Framework for MaxSAT-Based Learning of Interpretable Classification Rules

Incremental Approach to Interpretable Classification Rule Learning Bishwamittra Ghosh and Kuldeep

From ML Successes to Applications ICIP18 Tutorial on Interpretable Deep Learning 2 Black Box

Unusual compositional dependence of the Unusual compositional dependence of the exciton reduced

Compositional Analysis of Compositional Analysis of Soluble Salts in Bresle Bresle Extraction

A Compositional Logic A Compositional Logic for Control Flow for Control Flow Gang Tan, Boston

Relation between things vs. a relation between people Lenin: Where the bourgeois economists

Part I: Soil Mechanics Volume-Volume relation Mass-Mass relation Mass-Volume relation

Relation Schema Given domains D 1 , D 2 , . D n a relation r is a subset of D 1 x D 2 x

Inducing Interpretable Word Senses for WSD and Enrichment of Lexical Resources Overview Jan 11,

Deep Visual Models with Interpretable Features and Modularized Structures Quanshi Zhang John

Two-level Authoring of Computer- Interpretable Guidelines David Buenestado, Juan M. Pikatza, Unai

Learning Compositional Semantics for Introduction Open Domain Semantic Parsing Meaning

October 6, 2016 Began in 1941 as Central Utah Vocational School Became an official

Matrix Groups GAP examples Matrix groups in GAP Schreier-Sims Max Neunhffer Problems Group

in CS ADAPTING AN OPEN-SOURCE WEB- BASED ASSESSMENT SYSTEM FOR THE AUTOMATED ASSESSMENT OF

Distributed algorithms for edge dominating sets Jukka Suomela Helsinki Institute for Information

TRANSVERSE FLOW DURING IMPREGNATION OF FABRICS WITH THERMOPLASTIC MATRICES R. Gennaro, M.

THE NATURE OF RANDOM SYSTEM MATRICES IN STRUCTURAL DYNAMICS S. Adhikari Department of

Improvement to Hessenberg Reduction Shankar, Yang, Hao MA 5629 Numerical

matrix Police Services Analysis Phase 1 Draft Report Kensington Police Protection and

Interpretable and Compositional Relation Learning by Joint Training - PowerPoint PPT Presentation

Interpretable and Compositional Relation Learning by Joint Training with an Autoencoder Ryo Takahashi* 1 Ran Tian* 1 Kentaro Inui 1,2 (* equal contribution) 1 Tohoku University 2 RIKEN, Japan Task: Knowledge Base Completion Knowledge Bases

Interpretable sets in o-minimal structures Will Johnson March 27, 2015 Will Johnson

Bruno Gavranovi c SYCO2 Compositional Deep Learning December 18, 2018 1 / 36 Compositional

Not Just a Black Box: Interpretable Deep Learning for Genomics Avan&gt; Shrikumar, Peyton

IMLI: An Incremental Framework for MaxSAT-Based Learning of Interpretable Classification Rules

Incremental Approach to Interpretable Classification Rule Learning Bishwamittra Ghosh and Kuldeep

From ML Successes to Applications ICIP18 Tutorial on Interpretable Deep Learning 2 Black Box

Unusual compositional dependence of the Unusual compositional dependence of the exciton reduced

Compositional Analysis of Compositional Analysis of Soluble Salts in Bresle Bresle Extraction

A Compositional Logic A Compositional Logic for Control Flow for Control Flow Gang Tan, Boston

Relation between things vs. a relation between people Lenin: Where the bourgeois economists

Part I: Soil Mechanics Volume-Volume relation Mass-Mass relation Mass-Volume relation

Relation Schema Given domains D 1 , D 2 , . D n a relation r is a subset of D 1 x D 2 x

Inducing Interpretable Word Senses for WSD and Enrichment of Lexical Resources Overview Jan 11,

Deep Visual Models with Interpretable Features and Modularized Structures Quanshi Zhang John

Two-level Authoring of Computer- Interpretable Guidelines David Buenestado, Juan M. Pikatza, Unai

Learning Compositional Semantics for Introduction Open Domain Semantic Parsing Meaning

October 6, 2016 Began in 1941 as Central Utah Vocational School Became an official

Matrix Groups GAP examples Matrix groups in GAP Schreier-Sims Max Neunhffer Problems Group

in CS ADAPTING AN OPEN-SOURCE WEB- BASED ASSESSMENT SYSTEM FOR THE AUTOMATED ASSESSMENT OF

Distributed algorithms for edge dominating sets Jukka Suomela Helsinki Institute for Information

TRANSVERSE FLOW DURING IMPREGNATION OF FABRICS WITH THERMOPLASTIC MATRICES R. Gennaro, M.

THE NATURE OF RANDOM SYSTEM MATRICES IN STRUCTURAL DYNAMICS S. Adhikari Department of

Improvement to Hessenberg Reduction Shankar, Yang, Hao MA 5629 Numerical

matrix Police Services Analysis Phase 1 Draft Report Kensington Police Protection and

Not Just a Black Box: Interpretable Deep Learning for Genomics Avan> Shrikumar, Peyton