PRESENTATION ON:
“A SHORTEST PATH DEPENDENCY KERNEL FOR RELATION EXTRACTION”
DEPENDENCY PARSING KERNEL METHODS PAPER
**Taken from CS388 by Raymond J. Mooney, University of Texas at Austin
- Hypothesis
- Dependency Parsing - CFG vs CCG
- Kernel
- Evaluation
From: Wikipedia
For sentence: John liked the dog in the pen.
[Figure: phrase-structure parse tree (with and without lexical heads: liked-VBD, dog-NN, pen-NN) and typed dependency parse for the sentence: nsubj(liked, John), dobj(liked, dog), det(dog, the), prep_in(dog, pen), det(pen, the).]
Parse Trees*
Typed Dependency Parse Trees*
Parse Trees using CFG – with heads and without.
[Figure: the same CFG parse trees for the example sentence, with and without lexical heads.]
Can convert a phrase structure parse to a dependency tree by making the head of each non-head child of a node depend on the head of the head child.*
Example PCFG rules (no heads) with weights
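The head-percolation conversion described above can be sketched in a few lines. The tree encoding (label, head_index, children) and the hand-marked heads are illustrative assumptions, not the paper's format:

```python
# Sketch: convert a phrase-structure parse to a dependency tree via head
# percolation. Tree encoding (label, head_index, children) is an assumption;
# a leaf is a bare word string.

def head_word(tree):
    """Return the lexical head word of a node; a leaf is its own head."""
    if isinstance(tree, str):
        return tree
    _label, head_index, children = tree
    return head_word(children[head_index])

def to_dependencies(tree, deps=None):
    """Make the head of each non-head child depend on the head of the head child."""
    if deps is None:
        deps = []
    if isinstance(tree, str):
        return deps
    _label, head_index, children = tree
    governor = head_word(children[head_index])
    for i, child in enumerate(children):
        if i != head_index:
            deps.append((head_word(child), governor))  # (dependent, head)
        to_dependencies(child, deps)
    return deps

# "John liked the dog": S -> NP VP, VP -> V NP, NP -> DT N
sentence = ("S", 1, [("NP", 0, ["John"]),
                     ("VP", 0, [("V", 0, ["liked"]),
                                ("NP", 1, ["the", ("N", 0, ["dog"])])])])
print(to_dependencies(sentence))
# [('John', 'liked'), ('dog', 'liked'), ('the', 'dog')]
```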
Combinatory Categorial Grammars
From Steedman and Wikipedia; Bunescu and Mooney, 2005
Unclear how to go from a phrasal parse to a dependency parse
Gagan: interesting. Rishab: a better CCG parser? A modern-day parser?
Polynomial kernel: 2D to 3D
RBF kernel
ACL-04 Tutorial
A kernel K(x, y) is a similarity measure defined by an implicit mapping φ from the original space to a vector space (feature space) such that: K(x, y) = φ(x)·φ(y)
Simpler structure (linear representation of the data). Possibly infinite dimension (hypothesis space for learning)… but still computationally efficient when computing K(x, y)
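A minimal sketch of this identity, using the standard textbook instance behind the "2D to 3D" picture: the quadratic kernel K(x, y) = (x·y)² on 2D inputs equals a plain dot product under the explicit map φ(x) = (x₁², √2·x₁x₂, x₂²) into 3D:

```python
import math

def phi(x):
    """Explicit feature map R^2 -> R^3 for the quadratic kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(x, y):
    """K(x, y) = (x . y)^2, computed in 2D without ever building phi."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

x, y = (1.0, 2.0), (3.0, 0.5)
explicit = sum(a * b for a, b in zip(phi(x), phi(y)))
print(explicit, poly_kernel(x, y))  # both 16.0
```

The point of the slide: learning only ever needs K(x, y), so φ can live in a much higher-dimensional (even infinite) space at no extra cost.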
Tree Kernels
K(T1, T2) = #common subtrees between T1 and T2

K(T1, T2) = Σ_{n1 ∈ T1} Σ_{n2 ∈ T2} K_co-rooted(n1, n2)

K_co-rooted(n1, n2) = 0, if n1 or n2 is a leaf, or n1 ≠ n2
K_co-rooted(n1, n2) = Π_{i ∈ children} (1 + K_co-rooted(ch(n1, i), ch(n2, i))), otherwise

From ACL tutorial 2004 and Wikipedia. #common subtrees = 7 in the example. Similar ideas used in Culotta and Sorensen, 2004. Shantanu: not intuitive
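The recursion above can be sketched directly. The (label, [children]) tree encoding is an assumption, and nodes with different child counts are treated as non-matching (a simplification of "n1 ≠ n2"):

```python
# Sketch of the co-rooted subtree kernel. A tree node is (label, [children]);
# a leaf is a bare string.

def k_corooted(n1, n2):
    """0 if either node is a leaf or the nodes differ; otherwise the product
    of (1 + k_corooted) over position-aligned children."""
    if isinstance(n1, str) or isinstance(n2, str) or n1[0] != n2[0]:
        return 0
    if len(n1[1]) != len(n2[1]):  # different productions: treat as n1 != n2
        return 0
    prod = 1
    for c1, c2 in zip(n1[1], n2[1]):
        prod *= 1 + k_corooted(c1, c2)
    return prod

def nodes(t):
    """Yield every node of the tree, including leaves."""
    yield t
    if not isinstance(t, str):
        for c in t[1]:
            yield from nodes(c)

def tree_kernel(t1, t2):
    """K(T1, T2): sum the co-rooted kernel over all node pairs."""
    return sum(k_corooted(a, b) for a in nodes(t1) for b in nodes(t2))

t = ("S", [("NP", ["John"]), ("VP", [("V", ["liked"]), ("NP", ["dogs"])])])
print(tree_kernel(t, t))  # 19
```

Note the dynamic-programming flavor: the double sum looks quadratic in tree size, but each K_co-rooted call only multiplies over matching children, which is what keeps tree kernels tractable in practice.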
Interesting:
Example: Multiclass SVM
Given classes 0, 1, 2, …, L
One-vs-all: learn f_i, i = 1 … L, which are functions on the input space, and assign the label that gives the maximum f_i. O(L) classifiers.
One-vs-one: learn f_ij, i, j = 1 … L, one function for each pair of classes. Assign the label with the most “votes”. O(L²) classifiers.
How does hierarchy help? (think S1 vs S2)
Anshul: multiclass SVM blows up for many classes. No finer relations.
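The classifier counts and the one-vs-one voting step can be sketched as follows; the pairwise "classifiers" here are stubs that always favor the true class x % L, purely for illustration:

```python
from itertools import combinations
from collections import Counter

L = 5
print(L)                                      # one-vs-all: L classifiers
print(len(list(combinations(range(L), 2))))   # one-vs-one: L*(L-1)/2 = 10

# Stub pairwise decision functions: classifier (i, j) returns i or j.
pairwise = {(i, j): (lambda x, i=i, j=j: i if x % L == i else j)
            for i, j in combinations(range(L), 2)}

def one_vs_one_predict(x):
    """Run every pairwise classifier and return the label with most votes."""
    votes = Counter(clf(x) for clf in pairwise.values())
    return votes.most_common(1)[0][0]

print(one_vs_one_predict(7))  # 7 % 5 = 2
```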
HYPOTHESIS: If e1 and e2 are entities in a sentence related by R, then hypothesize that the contribution of the sentence's dependency graph to establishing R(e1, e2) is almost exclusively concentrated in the shortest path between e1 and e2 in the undirected dependency graph.
Arindam: oversimplified. Nupur: didn't verify the hypothesis. Barun: useful, but no statistical backing. Swarnadeep: more examples/backing for the hypothesis. Dhruvin: when does it fail? Happy: intuition
All figures and tables from Bunescu and Mooney, 2005
Akshay: limited ontology. Surag: no temporality.
Syntactic knowledge helps with IE.
Different levels of syntactic knowledge have been used.
The paper states the hypothesis that most of the information useful for Relation Extraction is concentrated in the shortest path between the entities in the undirected dependency graph.
Amount of syntactic knowledge
Assumptions:
- All relations are intra-sentence.
- Sentences are independent of each other.
- Relations are known; entities are known.
How do we use syntactic knowledge?
- Ray and Craven, 2001: PoS tagging and chunking
- Zelenko et al., 2003: kernel methods over shallow parse trees
- Culotta and Sorensen, 2004: dependency trees
Anshul: mines implicit relations??? No strong reasons. Himanshu: dependency parsing is hard. Arindam: likes deep syntactic knowledge. Nupur: likes the idea. Barun: is classification even useful? Gagan: dislikes the sentence assumption
Original training data: articles, with entities and relations
Processed training data: {(x_i, y_i)}, i = 0 … N, where x_i is the shortest path with types. – Will go through the SVM in a bit.
y_i: 5+1 top-level relations; 24 fine relation types. Top-level y_i ∈ {ROLE, PART, LOCATED, NEAR, SOCIAL}
Handles negation – attach a (-) suffix to words modified by negative determiners.
x_i → φ(x_i)?
Nupur, Gagan: nots! Anshul: more nots? Yashoteja: in general, add more features! Prachi: what happened to e3? Happy: verb semantics? Use Markov logic? Anshul: intra-sentence context; no novel relations
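A sketch of how a typed shortest path expands into explicit features: each word on the path is generalized to a set of word classes, and the path's feature set is the cross-product of those sets along the path. The class sets below (word, POS, entity type) are illustrative; the paper also includes generalized POS:

```python
from itertools import product

# Shortest path "protesters -> seized <- stations", each position a set
# of word classes; arrow positions carry only the direction.
path = [{"protesters", "NNS", "PERSON"}, {"->"},
        {"seized", "VBD"}, {"<-"},
        {"stations", "NNS", "FACILITY"}]

# One feature per choice of one class at every position.
features = [" ".join(combo) for combo in product(*(sorted(s) for s in path))]
print(len(features))  # 3 * 1 * 2 * 1 * 3 = 18
```

This makes the sparsity concern concrete: the explicit feature space grows multiplicatively with path length, which is why the kernel below never builds it.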
x and y are vectors
Comments:
Yeah, but it doesn't matter, because it is an entirely different set of features… and hence lower SVM weights?
SVM:
Himanshu, Barun: other similarity metrics? Semantics? Rishab: theoretically SVMs are awesome; efficient; synonyms? Nupur, Akshay, …: antonyms, synonyms, etc.? 0 if m ≠ n? Ankit: use redundancy? …: word2vec too? Prachi: words + classes against sparsity. Happy: RBF kernel? Shantanu: it isn't using the feature space fully. Weight it? Anshul: handle multiple shortest paths. Surag: lexical and unlexicalized features. Additional layer of inference?
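The "0 if m ≠ n" comment refers to the paper's shortest-path kernel itself: per Bunescu and Mooney (2005), two paths of different lengths get K = 0, and otherwise K is the product over positions of the number of shared word classes. A minimal sketch, with paths represented as sequences of class sets:

```python
# Shortest-path kernel: product over positions of common word classes,
# or 0 when the two paths have different lengths.

def sp_kernel(x, y):
    if len(x) != len(y):
        return 0
    k = 1
    for xi, yi in zip(x, y):
        k *= len(xi & yi)  # classes shared at this position
    return k

x = [{"protesters", "NNS", "Noun", "PERSON"}, {"->"},
     {"seized", "VBD", "Verb"}, {"<-"},
     {"stations", "NNS", "Noun", "FACILITY"}]
y = [{"troops", "NNS", "Noun", "PERSON"}, {"->"},
     {"raided", "VBD", "Verb"}, {"<-"},
     {"churches", "NNS", "Noun", "FACILITY"}]
print(sp_kernel(x, y))  # 3 * 1 * 2 * 1 * 3 = 18
```

Note the kernel never enumerates the cross-product feature space; it computes the dot product of those feature vectors implicitly, position by position.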
ACE Corpus: 422 documents + 97 test documents; 6k training relation instances + 1.5k test
Dependency Parsers:
SVMs:
Kernel:
Comparison:
RECALL: assumes independent sentences and only intra-sentential relations
Akshay, Ankit: limited dataset. Rishab: something between CCG and CFG?; 5 classes; likes the 2-step approach. Gagan: CCGs are interesting. Why 10? Ankit: likes the hierarchy. Yashoteja: a lot of dot products are 0. Mess around with the optimizer?
CFG dependency parsing does better than CCG – the paper attributes this to Collins' parser finding local dependencies better.
The intuition behind the shortest path is similar to Richards and Mooney, 1992: “In most relational domains, important concepts will be represented by a small number of fixed paths among the constants defining a positive instance – for example, the grandparent relation by a single path consisting of two parent relations.”
Arindam: CCG vs CFG. Akshay: other papers? Surag, Dinesh, Shantanu: no error analysis when recall is low. Anshul: confusion matrix?
A summary of a lot of people’s comments