CS11-747 Neural Networks for NLP
Learning From/For Knowledge Bases
Graham Neubig
Site: https://phontron.com/class/nn4nlp2019/
Knowledge Bases
Structured databases of knowledge, usually containing:
- Entities (nodes in a graph)
- Relations (edges between the nodes)
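A toy sketch of what such a structured KB looks like and why structure helps: entities are nodes, relations are labeled edges, so simple queries can be answered by lookup. The entity and relation names are invented for illustration:

```python
# A toy knowledge base as a set of (head, relation, tail) triples.
# Entity/relation names are made up for illustration.
triples = {
    ("Pittsburgh", "located-in", "Pennsylvania"),
    ("CMU", "located-in", "Pittsburgh"),
    ("CMU", "type", "university"),
}

def query(relation, tail):
    """Return all entities standing in `relation` to `tail`."""
    return [h for (h, r, t) in triples if r == relation and t == tail]

print(query("located-in", "Pittsburgh"))  # ['CMU']
```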
Questions:
- How can we learn to create or expand knowledge bases with neural networks?
- How can we learn from knowledge bases to improve neural representations?
- How can we use knowledge bases to answer questions? (see also semantic parsing class)
Types of Knowledge Bases
- WordNet (Miller 1995): a hand-curated database of words, covering parts of speech, semantic relations (synonyms, hypernyms, meronyms, etc.) (Image Credit: NLTK)
- Cyc (Lenat 1995): a manually curated database attempting to encode all common sense knowledge, 30 years in the making
- Structured data automatically extracted from Wikipedia (e.g. DBpedia)
- Knowledge bases mixing sources such as WordNet and Wikipedia, but augmented with multi-lingual information (e.g. BabelNet, YAGO)
- Collaboratively created knowledge bases at very large scale (e.g. Freebase, WikiData)
- Even so, knowledge bases are by nature incomplete: e.g. a large fraction of people in Freebase lack basic attributes such as “date of birth” (West et al. 2014)
Can we use neural networks to predict or extract missing information for knowledge bases?
Relation Extraction w/ Neural Tensor Networks (Socher et al. 2013)
- A first attempt: a multi-layer perceptron over two entity embeddings that predicts whether a relation exists
- The Neural Tensor Network adds bi-linear feature extractors, equivalent to projections in space (sketched below)
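A minimal NumPy sketch of the NTN scoring function, assuming per-relation parameters W (bilinear tensor slices), V, b, and output vector u; all sizes and random values here are illustrative, not from the paper's experiments:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 8, 4  # embedding size, number of tensor slices

# Per-relation parameters, randomly initialized purely for illustration
W = rng.normal(size=(k, d, d))   # bilinear "tensor" slices
V = rng.normal(size=(k, 2 * d))  # standard feed-forward weights
b = rng.normal(size=k)
u = rng.normal(size=k)           # output weights

def ntn_score(e1, e2):
    """NTN score for (e1, relation, e2); each slice W[i] is a bilinear
    feature extractor over the two entity embeddings."""
    bilinear = np.array([e1 @ W[i] @ e2 for i in range(k)])
    standard = V @ np.concatenate([e1, e2])
    return u @ np.tanh(bilinear + standard + b)

e1, e2 = rng.normal(size=d), rng.normal(size=d)
print(ntn_score(e1, e2))  # a scalar: higher = relation more plausible
```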
TransE (Bordes et al. 2013)
- Relates entity embeddings through their relation as a translation: head + relation ≈ tail
- An additive modification only, intentionally simpler than NTN
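A small sketch of the TransE score and its margin-based ranking loss against a corrupted tail; the vectors below are random placeholders for real learned embeddings:

```python
import numpy as np

def transe_score(h, r, t):
    """Score a triple: h + r should land near t (negative L2 distance)."""
    return -np.linalg.norm(h + r - t)

def margin_loss(h, r, t, t_corrupt, margin=1.0):
    """Margin-based ranking loss with a corrupted (negative) tail entity."""
    return max(0.0, margin - transe_score(h, r, t) + transe_score(h, r, t_corrupt))

rng = np.random.default_rng(0)
h, r, t, t_corrupt = (rng.normal(size=16) for _ in range(4))
print(transe_score(h, r, t), margin_loss(h, r, t, t_corrupt))
```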
- Later extensions first project entities into a subspace relevant to a particular relation, then verify the relation in that space
- Another idea: represent the many relations in a KB by a limited number of shared “concepts”
- Each relation is then a mixture over these concepts, with sparse mixture vector α
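A toy illustration of the sparse-mixture idea, assuming a dictionary B of shared concept vectors; the dimensions, active indices, and weights are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_concepts, dim = 10, 6
B = rng.normal(size=(n_concepts, dim))  # shared concept vectors

alpha = np.zeros(n_concepts)            # sparse mixture weights
alpha[[2, 7]] = [0.6, 0.4]              # only a few concepts active

relation_embedding = alpha @ B          # relation = sparse mix of concepts
print(relation_embedding)
```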
Relation Extraction from Text
- Distant supervision (Mintz et al. 2009): given an entity-relation-entity triple in the KB, find all text that matches this and use it to train a system
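A minimal sketch of distant-supervision data creation in the spirit of Mintz et al. (2009); the KB triple and sentences are made up, and the second sentence shows how string matching admits noisy examples:

```python
# Any sentence mentioning both entities of a KB triple is treated as a
# (noisy) positive training example for that relation.
kb = {("Pittsburgh", "located-in", "Pennsylvania")}
sentences = [
    "Pittsburgh is a city in western Pennsylvania .",
    "Pittsburgh hosted the summit while Pennsylvania debated the budget .",
]

training_data = []
for head, rel, tail in kb:
    for sent in sentences:
        if head in sent and tail in sent:
            training_data.append((sent, head, tail, rel))  # may be noisy!

for example in training_data:
    print(example)  # the second match does not actually express the relation
```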
- Syntactic features help: e.g. the minimal constituent containing both words, or features aggregated over multiple instances of links in the dependency tree
- Distantly supervised data is noisy, so rather than ignore this we want to model it
- e.g. explicitly model the amount of noise expected in the data
- Or use a curriculum: first train on data expected to be clean, then phase in the full data (see the sketch below)
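A minimal curriculum sketch, assuming we have some heuristic flag for which examples are probably clean; `make_curriculum` and the flag are hypothetical names, not from any particular paper:

```python
def make_curriculum(examples, is_probably_clean, warmup_epochs, total_epochs):
    """Yield the pool of training data for each epoch: the heuristically
    clean subset during warmup, then the full (noisier) dataset."""
    clean = [ex for ex in examples if is_probably_clean(ex)]
    for epoch in range(total_epochs):
        pool = clean if epoch < warmup_epochs else examples
        yield epoch, pool

examples = [("sent1", True), ("sent2", False)]  # (text, probably-clean flag)
for epoch, pool in make_curriculum(examples, lambda ex: ex[1], 2, 4):
    print(epoch, len(pool))  # 1 example for epochs 0-1, then all examples
```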
Open Information Extraction
- Extract relation tuples directly from surface text, with no pre-defined schema
- e.g. “United has a hub in Chicago, which is the headquarters of United Continental Holdings” → {Chicago; is the headquarters of; United Continental Holdings}
- The extracted relations are textual, not abstract KB symbols
ReVerb (Fader et al. 2011)
- Adds syntactic constraints: relation phrases must be verb-centered patterns, arguments must be noun phrases, etc.
- Plus a lexical constraint: find common, and therefore potentially reliable, extractions
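A rough sketch of a ReVerb-style syntactic check over POS tags; the regular expression approximates the paper's V | VP | VW*P relation pattern, and the hand-supplied tags stand in for a real POS tagger:

```python
import re

RELATION_PATTERN = re.compile(r"^V(W*P)?$")  # approx. V | VP | VW*P

def tag_class(pos):
    """Collapse Penn Treebank-style tags into ReVerb's coarse classes."""
    if pos.startswith("VB"):
        return "V"  # verb
    if pos in ("IN", "TO", "RP"):
        return "P"  # preposition / particle / infinitive marker
    if pos.startswith(("NN", "JJ", "RB", "PRP", "DT")):
        return "W"  # noun / adj / adv / pronoun / determiner
    return "X"

def is_valid_relation(pos_tags):
    return bool(RELATION_PATTERN.match("".join(tag_class(p) for p in pos_tags)))

print(is_valid_relation(["VBZ", "DT", "NN", "IN"]))  # "is the headquarters of" -> True
print(is_valid_relation(["NN"]))                     # "headquarters" alone -> False
```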
Supervised Open IE
- Create training data from question-answer driven semantic role labeling (He et al. 2015)
- Train a supervised neural BIO tagger on it (Stanovsky et al. 2018)
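A small sketch of the BIO encoding such a tagger is trained to predict; the spans here are hand-specified for illustration rather than derived from QA-SRL annotations:

```python
def bio_encode(tokens, spans):
    """spans: list of (start, end, label) token spans, end exclusive.
    Produces one B-/I-/O tag per token, as a sequence tagger would predict."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags

tokens = "Chicago is the headquarters of United Continental Holdings".split()
spans = [(0, 1, "A0"), (1, 5, "P"), (5, 8, "A1")]  # arg0, predicate, arg1
print(list(zip(tokens, bio_encode(tokens, spans))))
```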
- The textual context in which an entity appears is indicative of its KB traits
- Entity descriptions are also indicative
Universal Schema (Riedel et al. 2013)
- What if we have both a knowledge base containing entity/relation/entity tuples, and text from OpenIE extractions?
- Embed relations from multiple schemas in the same space
- Matrix factorization: learn embeddings for entity pairs and for individual relations, whether KB relations or surface patterns (sketched below)
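A compact sketch of universal schema as logistic matrix factorization with negative sampling; the data is synthetic and the training loop is simplified relative to Riedel et al. (2013):

```python
import numpy as np

rng = np.random.default_rng(0)
n_pairs, n_rels, dim = 50, 20, 8
# Observed (entity-pair, relation) cells; columns mix KB relations and
# OpenIE surface patterns. Synthetic data, purely for illustration.
observed = list({(int(rng.integers(n_pairs)), int(rng.integers(n_rels)))
                 for _ in range(200)})

P = rng.normal(scale=0.1, size=(n_pairs, dim))  # entity-pair embeddings
R = rng.normal(scale=0.1, size=(n_rels, dim))   # relation embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr = 0.1
for _ in range(100):
    for i, j in observed:
        k = int(rng.integers(n_rels))  # sampled negative (may rarely be true)
        for jj, y in ((j, 1.0), (k, 0.0)):
            g = sigmoid(P[i] @ R[jj]) - y        # logistic-loss gradient
            gP, gR = g * R[jj], g * P[i]
            P[i] -= lr * gP
            R[jj] -= lr * gR

# After training, P[i] @ R[j] scores unobserved cells: does relation j
# plausibly hold for entity pair i, even across schemas?
```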
- Because surface patterns (e.g. “publish the paper”) share the space with KB relations, querying with a relation the entity pair was never observed with can mean we still get the correct entity
- Logical rules can additionally be injected, where the rule weight is α
Learning from Knowledge Bases: Retrofitting (Faruqui et al. 2015)
- Use relations from a KB (e.g. WordNet) to improve pre-trained word embeddings
- A post-hoc transformation of embeddings: each word should be close to its KB neighbors, and close to its original embedding
- Extensions handle multiple word senses (select the most probable)
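A minimal sketch of the retrofitting iteration; with the paper's default weights (alpha_i = 1, beta_ij = 1/degree) each update averages a word's KB neighbors with its original vector, and the tiny synonym graph here is illustrative:

```python
import numpy as np

def retrofit(orig, kb_neighbors, iters=10):
    """orig: {word: vector}; kb_neighbors: {word: [words]} from e.g. WordNet.
    Iteratively pull each vector toward its KB neighbors while staying
    anchored to the original distributional embedding."""
    new = {w: v.copy() for w, v in orig.items()}
    for _ in range(iters):
        for w, nbrs in kb_neighbors.items():
            nbrs = [n for n in nbrs if n in new]
            if not nbrs:
                continue
            # paper-default weights reduce to: average of neighbor mean
            # and the original vector
            new[w] = (np.mean([new[n] for n in nbrs], axis=0) + orig[w]) / 2
    return new

orig = {"happy": np.array([1.0, 0.0]),
        "glad": np.array([0.0, 1.0]),
        "joyful": np.array([0.2, 0.8])}
synonyms = {"happy": ["glad", "joyful"], "glad": ["happy"], "joyful": ["happy"]}
print(retrofit(orig, synonyms))  # synonyms end up with more similar vectors
```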