
Grounded Semantic Parsing of Claims and Questions

Pascual Martínez-Gómez, Artificial Intelligence Research Center, AIST

Tokyo, Japan

January 22, 2018

1 / 16


Objective

Convert a claim/question into a SPARQL query.

Angelina Jolie’s net worth is above 1.5 million USD.

↓

ASK WHERE {
  dbr:Angelina_Jolie dbp:netWorth ?x .
  FILTER(?x > 1500000) .
}

2 / 16
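[Editor's note] The query above can be run against the public DBpedia endpoint. A minimal sketch with SPARQLWrapper follows; the endpoint URL and the dbp:netWorth spelling are assumptions (the slide writes dbp:NetWorth), not something the deck specifies.

```python
# Sketch: run the slide's ASK query against public DBpedia.
# Assumptions: SPARQLWrapper installed, endpoint reachable, property spelled
# dbp:netWorth (DBpedia's endpoint predefines the dbr:/dbp: prefixes).
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    ASK WHERE {
      dbr:Angelina_Jolie dbp:netWorth ?x .
      FILTER(?x > 1500000) .
    }
""")
sparql.setReturnFormat(JSON)
print(sparql.query().convert()["boolean"])  # True if the claim is supported
```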


Requirements:

1 Able to process claims and questions.

  • Possibly extend to paragraphs.

2 Must be independent of the language.

  • Most pressing: English, Japanese (FIJ) and French (?).

3 Easily extensible to different KBs and data stores.

4 Interpretable: journalists may interact with or inspect the process.

3 / 16


Approach

Modular pipeline (interpretable).

1 Identify mentions (e.g. Angelina Jolie, net worth).

2 Map mentions to KB nodes and relations.

3 Induce a grammar that describes the space of SPARQL queries.

4 Generate SPARQL queries in order of plausibility (with scores).

5 Execute (and evaluate).

4 / 16
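[Editor's note] To make the five stages concrete, here is a stubbed end-to-end sketch. Every function body and name is an illustrative placeholder, not the author's implementation; stages 3–4 are collapsed into a single hard-coded query.

```python
# Stubbed pipeline sketch; all bodies below are placeholders.

def identify_mentions(text):
    # Stage 1: in the real system mentions come from linguistic annotation.
    return [m for m in ("Angelina Jolie", "net worth") if m.lower() in text.lower()]

def link_mentions(mentions):
    # Stage 2: map mentions to KB nodes/relations (here, a toy lookup table).
    table = {"Angelina Jolie": "dbr:Angelina_Jolie", "net worth": "dbp:netWorth"}
    return {m: table[m] for m in mentions}

def generate_queries(links):
    # Stages 3-4: the real system induces a wRTG and enumerates scored trees.
    ent, pred = links["Angelina Jolie"], links["net worth"]
    yield 1.0, f"ASK WHERE {{ {ent} {pred} ?x . FILTER(?x > 1500000) . }}"

claim = "Angelina Jolie's net worth is above 1.5 million USD."
for score, query in generate_queries(link_mentions(identify_mentions(claim))):
    print(score, query)  # Stage 5 would execute the query against the KB
```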


Identify mentions

Depending on the depth of linguistic annotation:

1 Character sequence (e.g. A, n, g, e, l, i, n, a, ␣, J, o, ...).

2 Part-of-speech tags (e.g. Angelina:NNP, net:ADJ, ...).

3 Syntax (e.g. “Angelina Jolie”:NP, ...).

4 Semantics (e.g. ∃x : Person, angelina(x) ∧ jolie(x)).

5 / 16
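[Editor's note] Annotation depths 1–3 on the running example are easy to inspect with spaCy; the toolkit choice is an assumption (the slides do not name one), and it requires the en_core_web_sm model. Depth 4 needs a dedicated semantic parser.

```python
# Sketch of annotation depths 1-3 with spaCy (toolkit choice is an assumption).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Angelina Jolie's net worth is above 1.5 million USD.")

print(list(doc.text)[:8])                    # 1. character sequence
print([(t.text, t.tag_) for t in doc])       # 2. POS tags, e.g. ('Angelina', 'NNP')
print([np.text for np in doc.noun_chunks])   # 3. syntax: NP chunks
# 4. Logical-form semantics requires a separate semantic parser.
```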


Identify mentions

  • Semantics.

6 / 16


Identify mentions

  • Semantics (e.g. ∃x : Person, angelina(x) ∧ jolie(x)...).
  • Semantics for “net worth is above 1.5M USD” is not accurate!
  • ∃z.above(z) ∧ 1.5(z) ∧ million(z) ∧ usd(z).
  • Errors tend to accumulate. Explore the use of fewer annotations.

7 / 16


Identify mentions

  • Syntax

(ROOT
  (S
    (NP (NP (NP (NNP Angelina) (NNP Jolie)) (POS 's)) (JJ net) (NN worth))
    (VP (VBZ is)
      (PP (IN above)
        (NP (QP (CD 1.5) (CD million)) (NNP USD))))
    (. .)))

  • Mentions: Angelina Jolie’s, net worth, Angelina Jolie’s net worth, above, 1.5 million USD.

  • It overgenerates mentions.
  • But it is simple and may have good coverage.

8 / 16
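[Editor's note] The overgeneration of candidate mentions is easy to reproduce: collect every NP subtree of the parse above. A sketch with NLTK (the toolkit choice is an assumption):

```python
# Collect every NP span of the parse above as a candidate mention.
from nltk import Tree

tree = Tree.fromstring("""
(ROOT (S
  (NP (NP (NP (NNP Angelina) (NNP Jolie)) (POS 's)) (JJ net) (NN worth))
  (VP (VBZ is) (PP (IN above) (NP (QP (CD 1.5) (CD million)) (NNP USD))))
  (. .)))
""")

for np in tree.subtrees(lambda t: t.label() == "NP"):
    print(" ".join(np.leaves()))
# Angelina Jolie 's net worth | Angelina Jolie 's | Angelina Jolie | 1.5 million USD
```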


Map mentions to KB nodes and relations.

Problems with traditional IR approaches:

  • Symbolic nature: tf-idf sensitive to lexical variations.
  • KB textual information is quite short (i.e. labels, aliases, names).
  • Scoring functions are ad hoc.

Proposed solution:

  • Learn a regressor fθ : L × M → R.
  • L are labels of entities, relations or types.
  • M are mentions (as identified earlier).
  • It generalizes easily to other KBs (“simply” retrain!).
  • Make fθ robust against spelling variations.
  • Flexibility and adaptability.

9 / 16


Map mentions to KB nodes and relations.

Approach: metric learning.

1 Encode a label l ∈ L into a vector l⃗ ∈ R^D with Enc_θ′ : L → R^D.

2 Encode a mention m ∈ M into a vector m⃗ ∈ R^D with Enc_θ″ : M → R^D.

  • Encoding parameters θ′ and θ″ might be equal.

3 Use a vector similarity function between l⃗ and m⃗.

  • At the moment, I use only cosine(l⃗, m⃗) = l⃗⊺m⃗ / (‖l⃗‖₂ · ‖m⃗‖₂).

4 Linking results are the top-k labels: arg max_l cosine(Enc_θ′(l), Enc_θ″(m)).

5 Estimation: arg max_{θ′,θ″} cosine(Enc_θ′(l₁), Enc_θ″(l₁)) − cosine(Enc_θ′(l₁), Enc_θ″(l₂)).

  • This is an autoencoder with noise contrastive estimation.
  • Uses positive and negative examples.

10 / 16
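[Editor's note] Step 5 translates almost directly into code. A minimal PyTorch sketch, assuming two encoder modules enc_label (Enc_θ′) and enc_mention (Enc_θ″) that map batches of character-id tensors to R^D; the names are hypothetical, and one possible encoder is sketched two slides below.

```python
# Contrastive cosine objective from step 5 (a sketch, not the author's code).
import torch.nn.functional as F

def contrastive_cosine_loss(enc_label, enc_mention, l1, l2):
    """l1: positive label batch, l2: sampled noise labels (char-id tensors)."""
    anchor = enc_label(l1)        # Enc_theta'(l1)
    positive = enc_mention(l1)    # Enc_theta''(l1): autoencode the label itself
    negative = enc_mention(l2)    # Enc_theta''(l2): noise-contrastive sample
    pos = F.cosine_similarity(anchor, positive, dim=1)
    neg = F.cosine_similarity(anchor, negative, dim=1)
    # Maximizing (pos - neg) is the same as minimizing (neg - pos).
    return (neg - pos).mean()
```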


Map mentions to KB nodes and relations.

Examples (see gsemparse github repo):

Example 1: no misspellings:

1 Mention: angelina jolie.

2 Labels: Angelina jolie, Angelina Jolie Trapdoor Spider, Angelina Jolie Voight, Angelina Jolie cancer treatment, Angelina Jolie Pitt, Angelina Joli, Angelina Jolie Filmography, Anjelina Jolie.

Example 2: misspellings (2 char substitutions, 1 char deletion):

1 Mention: angeline yoli

2 Labels: Angeline Jolie, Uncle Willie, Parmelia (lichen), Uriele Vitolo, Ding Lieyun, Earl of Loudon, Angeline Myra Keen, Angel Negro, Angeline Malik, Angeline of Marsciano.

11 / 16


Map mentions to KB nodes and relations.

Show the CNN over characters (on a whiteboard).

12 / 16
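[Editor's note] Since the character-level CNN is only shown on a whiteboard, here is one plausible reading as a PyTorch module; all dimensions and the max-pooling choice are assumptions, not the author's exact architecture.

```python
# One possible character-CNN encoder (architecture details are assumptions).
import torch
import torch.nn as nn

class CharCNNEncoder(nn.Module):
    def __init__(self, n_chars=128, char_dim=32, out_dim=256, kernel=3):
        super().__init__()
        self.embed = nn.Embedding(n_chars, char_dim)        # char id -> vector
        self.conv = nn.Conv1d(char_dim, out_dim, kernel, padding=1)

    def forward(self, char_ids):                  # (batch, seq_len) of char ids
        x = self.embed(char_ids).transpose(1, 2)  # (batch, char_dim, seq_len)
        x = torch.relu(self.conv(x))              # (batch, out_dim, seq_len)
        return x.max(dim=2).values                # max-pool over positions

enc = CharCNNEncoder()
ids = torch.tensor([[ord(c) % 128 for c in "angelina jolie"]])
print(enc(ids).shape)  # torch.Size([1, 256])
```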


Map mentions to KB nodes and relations.

Resources for KB: http://wiki.dbpedia.org/downloads-2016-10

1 Infobox relations.

2 Infobox property definitions.

3 Ontology (types/classes and relations).

4 Labels (NIF).

5 Contexts (NIF).

Resources for questions: Hobbit data - Scalable QA challenge. http://hobbitdata.informatik.uni-leipzig.de/SQAOC

1 Many questions with annotations on mentions and SPARQL queries.

2 Hopefully easy to evaluate.

13 / 16


Induce a grammar that describes the space of SPARQL queries.

1 A Regular Tree Grammar (RTG) describes a tree language.

2 A big fragment of SPARQL can be represented with an RTG.

3 An RTG is a compact representation of the language.

Example of RTG:

IDO  -> (ID count IDN)
IDO  -> (ID max IDN)
IDO  -> IDN
IDN  -> (ID PRED ENT)
IDN  -> (ID !PRED ENT)
PRED -> pred1 | pred2
ENT  -> ent1 | ent2

  • There are only 7 productions.
  • It generates 24 SPARQL queries.

14 / 16
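[Editor's note] The claim that these 7 productions generate 24 queries can be checked by brute-force enumeration; a toy sketch (editor's code, not from the deck):

```python
# Enumerate the tree language of the toy RTG above.
from itertools import product

PREDS, ENTS = ["pred1", "pred2"], ["ent1", "ent2"]

# IDN -> (ID PRED ENT) | (ID !PRED ENT)
idn = [f"(ID {neg}{p} {e})" for neg, p, e in product(["", "!"], PREDS, ENTS)]
# IDO -> (ID count IDN) | (ID max IDN) | IDN
ido = [f"(ID {op} {t})" for op, t in product(["count", "max"], idn)] + idn

print(len(ido))  # 24
```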


Induce a grammar that describes the space of SPARQL queries.

  • Each production may have a score (wRTG).
  • Then, we can generate SPARQL queries in order of plausibility.

Example of wRTG:

IDO  -> (ID count IDN)  # 0.1
IDO  -> (ID max IDN)    # 0.2
IDO  -> IDN             # 0.7
IDN  -> (ID PRED ENT)   # 0.8
IDN  -> (ID !PRED ENT)  # 0.2
PRED -> pred1 # 0.9 | pred2 # 0.1
ENT  -> ent1 # 0.2 | ent2 # 0.8

  • The highest scoring tree would be: (ID pred1 ent2).
  • It is important to estimate good parameters for these productions.
  • Luckily, these methods are well studied and we only need to implement them.

15 / 16
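[Editor's note] Extending the enumerator with the production weights reproduces the slide's best tree, scoring each derivation by the product of its production weights (an editor's sketch):

```python
# Score each derivation of the toy wRTG by multiplying production weights.
from itertools import product

PREDS = [("pred1", 0.9), ("pred2", 0.1)]
ENTS = [("ent1", 0.2), ("ent2", 0.8)]
IDN_RULES = [("", 0.8), ("!", 0.2)]              # (ID PRED ENT) | (ID !PRED ENT)
IDO_RULES = [("count", 0.1), ("max", 0.2), (None, 0.7)]

trees = []
for (neg, w1), (p, w2), (e, w3) in product(IDN_RULES, PREDS, ENTS):
    idn, w_idn = f"(ID {neg}{p} {e})", w1 * w2 * w3
    for op, w0 in IDO_RULES:
        trees.append((idn if op is None else f"(ID {op} {idn})", w0 * w_idn))

print(max(trees, key=lambda t: t[1]))  # best ~ ('(ID pred1 ent2)', 0.4032)
```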


Possible NLP objectives in this workshop

  • Make a working end-to-end pipeline “claim → SPARQL query(ies)”.
    • Pascual.
  • Share the code with the WebClaimExplain team.
  • Implement a routine to estimate weights of wRTG productions.
    • Pascual, Bevan (?).
  • Constrain the RTG using ontological information.
    • Pascual, Michaël (?) and Bevan (?).
  • Other suggestions? (I am open to working on other necessary issues.)

16 / 16
