SLIDE 1

Exploring Multi-level Distributional Semantics for Cross-lingual Entity Discovery and Linking

Boliang Zhang, Xiaoman Pan, Lifu Huang, Ying Lin, Heng Ji (jih@rpi.edu)

SLIDE 2

Noisy Training Data Acquisition 1: Chinese Room


SLIDE 3

Noisy Training Data Acquisition 1: Chinese Room


SLIDE 4

Noisy Training Data Acquisition 2: Wikipedia Mining

§ Generate “silver-standard” training data automatically
§ Apply self-training to make training data more complete and consistent


SLIDE 5

Exploit Non-traditional Universal Linguistic Resources

• Grammar books from Lori Levin’s bookshelf and CIA Names from DARPA PM’s bookshelf
• Unicode Common Locale Data Repository, Wiktionary, PanLex, Multilingual WordNet, GeoNames, JRC Names, phrase pairs mined from Wikipedia
• Phrase Books from Language Survival Kits and Elicitation Corpus
• Ignored by the NLP community


SLIDE 6

Linguistic Structure from the WALS Database and Syntactic Structures of the World's Languages (SSWL)

• Universal Morphology Analyzer based on Wikipedia markups
• Kıta Fransası, güneyde [[Akdeniz]]den kuzeyde [[Manş Denizi]] ve [[Kuzey Denizi]]ne, doğuda [[Ren Nehri]]nden batıda [[Atlas Okyanusu]]na kadar yayılan topraklarda yer alır. (Continental France lies on lands extending from the [[Mediterranean Sea]] in the south to the [[English Channel]] and [[North Sea]] in the north, and from the [[Rhine River]] in the east to the [[Atlantic Ocean]] in the west.)
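Because such case suffixes attach directly outside the closing brackets of anchor links, (entity, suffix) pairs can be harvested from raw markup with a simple pattern match. A minimal sketch (the regex and function name are ours, not from the slides):

```python
import re

# Matches a wiki anchor link plus any word characters glued to it,
# e.g. "[[Akdeniz]]den" -> entity "Akdeniz", suffix "den".
ANCHOR_SUFFIX = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\](\w*)")

def mine_suffixes(text):
    """Collect (entity, attached suffix) pairs from Wikipedia markup."""
    return [(m.group(1), m.group(2))
            for m in ANCHOR_SUFFIX.finditer(text) if m.group(2)]

sentence = ("Kıta Fransası, güneyde [[Akdeniz]]den kuzeyde [[Manş Denizi]] ve "
            "[[Kuzey Denizi]]ne, doğuda [[Ren Nehri]]nden batıda "
            "[[Atlas Okyanusu]]na kadar yayılan topraklarda yer alır.")
print(mine_suffixes(sentence))
# [('Akdeniz', 'den'), ('Kuzey Denizi', 'ne'),
#  ('Ren Nehri', 'nden'), ('Atlas Okyanusu', 'na')]
```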


SLIDE 7
Character-Aware Word Embeddings

• Motivation: mentions of the same concept across languages may share a set of similar characters, e.g., Semsettin Gunaltay (English) = Şemsettin Günaltay (Turkish) = Semsetin Ganoltey (Somali)
• Compose word embeddings from shared character embeddings using Convolutional Neural Networks
• Further optimized by a language model based on Recurrent Neural Networks
  § maximize the prediction of the current word based on previous words
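A minimal sketch of the character-CNN composition, in PyTorch; the filter widths and sizes are illustrative assumptions, not the authors' configuration. The resulting word vectors would then feed the RNN language model described above:

```python
import torch
import torch.nn as nn

class CharCNNWordEmbedder(nn.Module):
    """Compose a word embedding from its characters (shared across languages)."""
    def __init__(self, n_chars, char_dim=32, n_filters=128, widths=(2, 3, 4)):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            nn.Conv1d(char_dim, n_filters, w, padding=w - 1) for w in widths)

    def forward(self, char_ids):          # (batch, max_word_len)
        x = self.char_emb(char_ids)       # (batch, len, char_dim)
        x = x.transpose(1, 2)             # (batch, char_dim, len)
        # Max-over-time pooling per filter width, then concatenate.
        pooled = [conv(x).max(dim=2).values for conv in self.convs]
        return torch.cat(pooled, dim=1)   # (batch, n_filters * len(widths))
```

Because the character inventory is shared, transliteration variants such as "Gunaltay" / "Günaltay" / "Ganoltey" receive nearby vectors even with no bilingual supervision.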


SLIDE 8


Feed Non-traditional Linguistic Resources into DNN

[Architecture figure: character embeddings (CNN) and word embeddings, concatenated with linguistic feature embeddings, feed stacked bidirectional (left/right) LSTMs; hidden layers feed a CRF network that outputs B/I/O tags]

Linguistic Features
• English and Low-resource Language Patterns
• Low-resource Language to English Lexicons
• Gazetteers
• Low-resource Language Grammar Rules
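A condensed sketch of such a tagger, assuming PyTorch and the CRF layer from the pytorch-crf package; the dimensions and the single feature-embedding table are illustrative simplifications, and the character-CNN branch from the previous slide is omitted for brevity:

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf

class FeatureAwareTagger(nn.Module):
    def __init__(self, n_words, n_feats, n_tags,
                 word_dim=100, feat_dim=20, hidden=200):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, word_dim)
        # One embedded feature id per token: patterns, lexicon matches,
        # gazetteer hits, grammar-rule firings, etc.
        self.feat_emb = nn.Embedding(n_feats, feat_dim)
        self.lstm = nn.LSTM(word_dim + feat_dim, hidden,
                            batch_first=True, bidirectional=True)
        self.hidden2tag = nn.Linear(2 * hidden, n_tags)
        self.crf = CRF(n_tags, batch_first=True)

    def _emissions(self, words, feats):
        x = torch.cat([self.word_emb(words), self.feat_emb(feats)], dim=-1)
        h, _ = self.lstm(x)
        return self.hidden2tag(h)

    def loss(self, words, feats, tags, mask):
        return -self.crf(self._emissions(words, feats), tags, mask=mask)

    def decode(self, words, feats, mask):
        # Returns B/I/O tag sequences per sentence.
        return self.crf.decode(self._emissions(words, feats), mask=mask)
```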

SLIDE 9

Common Semantic Space Construction


SLIDE 10

Construct a Common Semantic Space for Thousands of Languages

§ Motivations
  § There are 3,000+ languages with electronic records
  § NLP training data is only available for several dominant languages
§ Goals
  § Build a common semantic space across thousands of languages for resource sharing and richer continuous semantic representations for words, concepts and entities
§ Limitations of Previous Attempts (e.g., Upadhyay et al., 2016; Cho et al., 2017)
  § Mostly English-anchored, cannot capture all linguistic phenomena
  § Heavily relied on bilingual dictionaries and parallel data, which are not always available
  § Limited to dozens of languages


SLIDE 11
Multi-Level Multi-lingual Alignment

• When bilingual word dictionaries are not available, back off to shared linguistic structures
  § e.g., apposition, conjunction, plural suffixes (English -s/-es, Turkish -lar/-ler, Somali -o)
  § Generalized from language-universal resources such as the WALS database and Syntactic Structures of the World's Languages
  § These classify languages according to a large number of typological properties (phonological, lexical, grammatical)
  § 2,676 languages, 58,000+ (language, feature, feature value) tuples, e.g., (English, canonical word order, SVO)
• Project monolingual word embeddings into a common semantic space, and align the representations of both words and linguistic structures in the common space
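One standard realization of such a projection is an orthogonal (Procrustes) mapping fit on whatever anchors are available: dictionary entries when they exist, shared linguistic structures as the back-off. A sketch under that assumption, not necessarily the authors' exact method:

```python
import numpy as np

def procrustes_map(X, Y):
    """Orthogonal W minimizing ||XW - Y||_F for anchor matrices X, Y (n x d).

    Rows are embeddings of anchor pairs: bilingual dictionary entries when
    available, otherwise representations of shared structures (e.g., the
    plural-suffix contexts of -lar/-ler vs. -s/-es).
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Hypothetical usage: map a Turkish embedding matrix E_tr into the common
# space via W = procrustes_map(X_tr, Y_common); projected = E_tr @ W
```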


SLIDE 12
Model Training

• Language model prediction loss:
• Multilingual alignment loss:
• Overall loss:
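The three loss formulas on this slide were images and did not survive extraction. A hedged reconstruction consistent with the bullets and with the earlier slide (an RNN LM maximizing prediction of the current word from previous words), in our notation rather than necessarily the authors' exact objective:

```latex
% Language model prediction loss: maximize prediction of the current
% word given previous words (negative log-likelihood form).
\mathcal{L}_{\mathrm{lm}} = -\sum_{t} \log P(w_t \mid w_1, \ldots, w_{t-1})

% Multilingual alignment loss over aligned word/structure pairs (x, y),
% with f(\cdot) the projection into the common semantic space.
\mathcal{L}_{\mathrm{align}} = \sum_{(x, y)} \lVert f(x) - f(y) \rVert_2^2

% Overall loss; \lambda is a hypothetical interpolation weight.
\mathcal{L} = \mathcal{L}_{\mathrm{lm}} + \lambda\, \mathcal{L}_{\mathrm{align}}
```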


SLIDE 13

Linguistic Features Matter: More Robust to Noise

[Figure: Uzbek name tagging results (Zhang et al., 2017)]


SLIDE 14

Impact of Character-Aware Word Embeddings


Name Tagging F-Score (%)

Models   Chinese   English   Spanish
Before   64.1      67.4      64.6
After    68.0      70.9      68.9

SLIDE 15
Impact of Common Semantic Space

• Chechen Name Tagging

Models                                   P (%)   R (%)   F (%)
Randomly initialized                     46.3    45.31   45.8
Pre-trained                              54.8    41.3    47.1
+ Common semantic space word embedding   62.1    50.1    55.4

SLIDE 16

Something Old: Hierarchical Brown Clustering


Language         w/o BC (%)   with BC (%)
Albanian         72.4         74.6
Chechen          53.1         55.4
Chinese          66.3         68.0
English          69.5         70.9
Kannada          51.9         56.0
Kikuyu           84.2         88.7
Nepali           41.6         43.9
Northern Sotho   90.2         90.8
Polish           49.6         53.2
Somali           76.9         78.5
Spanish          67.1         68.9
Swahili          64.3         67.8
Yoruba           46.1         49.5
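Brown cluster (BC) features typically enter a tagger as bit-string prefixes at several granularities. A minimal sketch, assuming the paths-file format of Liang's brown-cluster tool (one `bitstring<TAB>word<TAB>count` line per word); the prefix lengths are an illustrative choice:

```python
def load_brown_paths(path):
    """Read a Liang-style paths file: 'bitstring<TAB>word<TAB>count' per line."""
    word2bits = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            bits, word, _count = line.rstrip("\n").split("\t")
            word2bits[word] = bits
    return word2bits

def brown_features(word, word2bits, prefixes=(4, 6, 10, 20)):
    """Bit-string prefix features at several granularities for a tagger."""
    bits = word2bits.get(word)
    if bits is None:
        return []
    return [f"brown_{p}={bits[:p]}" for p in prefixes]
```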

SLIDE 17

Joint Learning of Word and Entity Embeddings from Wikipedia

• Considering all Wikipedia anchor links as entity annotations, a training corpus can be created by replacing anchor links with unique entity IDs, e.g.:

  [[en/Apple|apple]] is a fruit
  [[en/Apple_Inc.|apple]] is a company

  → apple is a fruit
    apple is a company
    en/Apple is a fruit
    en/Apple_Inc. is a company

• Multi-lingual
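A minimal sketch of deriving the two corpora from one anchor-annotated line, surface-form text for word embeddings and entity-ID text for entity embeddings; the regex and function names are ours:

```python
import re

# [[en/Apple|apple]] or [[en/Apple]]: group 1 = entity ID, group 2 = surface form.
ANCHOR = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")

def to_word_corpus(line):
    """'[[en/Apple|apple]] is a fruit' -> 'apple is a fruit'"""
    return ANCHOR.sub(lambda m: m.group(2) or m.group(1), line)

def to_entity_corpus(line):
    """'[[en/Apple|apple]] is a fruit' -> 'en/Apple is a fruit'"""
    return ANCHOR.sub(lambda m: m.group(1), line)

line = "[[en/Apple|apple]] is a fruit"
print(to_word_corpus(line))    # apple is a fruit
print(to_entity_corpus(line))  # en/Apple is a fruit
```

Concatenating both corpora lets word and entity vectors be learned jointly in a single space.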
SLIDE 18

Joint Learning of Word and Entity Embeddings from Wikipedia

[Figure: joint representation learning across a Knowledge Space and a Text Space. Entity representation learning models each entity's knowledge-base neighborhood, P(N(e_j) | e_j): e.g., Independence Day (US) connects to United States, Fireworks, Memorial Day and Public holidays in the United States via relations such as "observed by" and "category", plus inlinks and outlinks; Independence Day (film) connects to Will Smith ("starring") and Philadelphia ("born", "country"). Text representation learning models word contexts, P(C(w_i) | w_i), from anchor-text sentences such as "... bands played it during public events, such as [[Independence Day (US)|July 4th]] celebrations ..." and "... In the 1996 action film [[Independence Day (film)|Independence Day]], the United States military uses alien technology captured ...". Mention representation learning maps each mention to a sense s* via g(mention, entity), e.g., g(July 4th, e_1), with P(C(m_l) | s*_j) and P(e_j | C(m_l), s*_j) tying the two spaces together.]

SLIDE 19

Learning Entity Embeddings from DBpedia

• Construct a weighted undirected graph G = (E, D) from DBpedia, where E is the set of all entities in DBpedia and d_ij ∈ D indicates that two entities e_i and e_j share some DBpedia properties. The weight w_ij of d_ij is computed as:

  w_ij = |p_i ∩ p_j| / max(|p_i|, |p_j|)

• where p_i, p_j are the sets of DBpedia properties of e_i and e_j respectively.
• Apply the graph embedding framework proposed by Tang et al. (2015) to generate knowledge representations for all entities.
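A direct sketch of the weight computation; the property sets below are toy stand-ins for DBpedia properties:

```python
def edge_weight(props_i, props_j):
    """w_ij = |p_i ∩ p_j| / max(|p_i|, |p_j|) for two entities' property sets."""
    if not props_i or not props_j:
        return 0.0
    return len(props_i & props_j) / max(len(props_i), len(props_j))

# Toy example (hypothetical property sets):
p_film = {"starring", "director", "country", "releaseDate"}
p_holiday = {"country", "observedBy", "date"}
print(edge_weight(p_film, p_holiday))  # 0.25
```

The resulting weighted graph is then embedded with the LINE framework of Tang et al. (2015).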

SLIDE 20

Impact of Joint Embeddings on Entity Linking


                                                    CEAFm P   CEAFm R   CEAFm F1
Baseline                                            0.762     0.843     0.801
+ Joint word and entity embeddings from Wikipedia   0.791     0.875     0.831
+ Entity embeddings from DBpedia                    0.812     0.897     0.852

• Unsupervised entity linking based on salience, similarity and coherence
• Tested on EDL16 perfect English NAM mentions
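As a rough illustration of how salience, similarity and coherence might be combined in an unsupervised linker; the weights, field names and scoring functions below are our assumptions, not the authors' formulation:

```python
import numpy as np

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def link_score(mention_vec, entity, context_entities, w=(0.3, 0.4, 0.3)):
    """Combine three signals for one candidate entity (all illustrative)."""
    salience = entity["popularity"]                      # e.g., normalized inlink count
    similarity = cos(mention_vec, entity["embedding"])   # joint word/entity space
    coherence = (np.mean([cos(entity["embedding"], e["embedding"])
                          for e in context_entities])
                 if context_entities else 0.0)           # agreement with nearby links
    return w[0] * salience + w[1] * similarity + w[2] * coherence
```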
SLIDE 21

Resources and Demos


SLIDE 22

Systems, Data and Resources Publicly Available

§ Re-trainable Systems:
  § http://blender02.cs.rpi.edu:3300/elisa_ie/api
  § Source code base available to government users upon request
  § Tri-lingual EDL is being integrated into CoreNLP, with a release hoped for in 2017
§ Data and Resources:
  § http://nlp.cs.rpi.edu/wikiann/
§ Demos:
  § http://blender02.cs.rpi.edu:3300/elisa_ie
  § http://blender02.cs.rpi.edu:3300/elisa_ie/heatmap


SLIDE 23

Demo 1: Cross-lingual Entity Discovery and Linking for 282 Languages

§ http://blender02.cs.rpi.edu:3300/elisa_ie
§ http://blender02.cs.rpi.edu:3300/elisa_ie/heatmap


SLIDE 24

Cross-lingual Entity Discovery and Linking for 282 Languages (Cont'd)


SLIDE 25

Cross-lingual Entity Discovery and Linking for 282 Languages (Cont'd)


SLIDE 26


IE Application Example: Disaster Relief

SLIDE 27


Cross-lingual Entity Discovery and Linking for 282 Languages (Cont'd)

§ http://blender02.cs.rpi.edu:3300/elisa_ie/heatmap