 
A Systematic Exploration of the Feature Space for Relation Extraction
Jing Jiang & ChengXiang Zhai
Department of Computer Science
University of Illinois, Urbana-Champaign
NAACL-HLT, Apr 23, 2007

What Is Relation Extraction?
… hundreds of Palestinians converged on the square …
• Palestinians: person (entity type)
• square: bounded-area (entity type)
• Relation between the two entities: ?

What Is Relation Extraction?
… hundreds of Palestinians converged on the square …
• Palestinians: person (entity type)
• square: bounded-area (entity type)
• Relation between the two entities: located (relation type)

Existing Methods
• Rule-based [Califf & Mooney 98]
• Generative-model-based [Miller et al. 00]
• Discriminative-model-based
  – Feature-based [Zhou et al. 05]
  – Kernel-based [Bunescu & Mooney 05b] [Zhang et al. 06]

Feature-Based Methods
… hundreds of Palestinians (arg1) converged on the square (arg2) …   relation: located
• Entity info
  – arg 1 is a Person entity & arg 2 is a Bounded-Area entity
• POS tagging
  – there is a preposition between arg 1 and arg 2
• Syntactic parsing
  – arg 2 is inside a prepositional phrase following arg 1
• Dependency parsing
  – arg 2 is dependent on a preposition, which in turn is dependent on a verb
• Other features?

Kernel-Based Methods
• Define a kernel function to measure the similarity between two relation instances
• Convolution kernels
  – Defined on sequence or tree representations of relation instances
  – Correspond to a feature space whose features are sub-structures such as sub-sequences and sub-trees
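
The link between a convolution kernel and a feature space of sub-structures can be made concrete with a small sketch. The code below is not one of the kernels cited on the slide; it is a deliberately simple stand-in that counts shared contiguous sub-sequences (n-grams) of two token sequences, which equals a dot product in the explicit n-gram feature space. The function names, the `max_n` parameter, and the second sentence are assumptions made for illustration.

```python
from collections import Counter


def ngram_counts(tokens, max_n=2):
    """Explicit feature map: counts of all contiguous n-grams up to max_n."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def ngram_kernel(a, b, max_n=2):
    """Kernel value = dot product of the two n-gram count vectors."""
    ca, cb = ngram_counts(a, max_n), ngram_counts(b, max_n)
    return sum(ca[g] * cb[g] for g in ca if g in cb)


# The second sentence is hypothetical, chosen only to share a few n-grams.
x = "Palestinians converged on the square".split()
y = "protesters gathered on the square".split()

# Shared unigrams: on, the, square. Shared bigrams: (on, the), (the, square).
print(ngram_kernel(x, y))  # 5
```

Changing which sub-structures `ngram_counts` enumerates changes the kernel, which is the sense in which the choice of features matters for kernel methods as well.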

Convolution Tree Kernel (sub-tree features)
[Figure: syntactic parse tree of "hundreds of Palestinians converged on the square", with internal nodes S, VP, NP, PP, PP, NPB, NPB, NPB over the POS tags NNS IN NNP VBD IN DT NN. Successive slides highlight different sub-trees that serve as features.]

Convolution Tree Kernel (sub-tree features), continued
[Figure: the parse tree of "hundreds of Palestinians converged on the square" again, with further sub-trees highlighted as features.]

Convolution Tree Kernel (sub-tree features), continued
[Figure: the parse tree of "hundreds of Palestinians converged on the square" with another sub-tree highlighted.]

Convolution Tree Kernel (sub-tree features)
[Figure: the same parse tree with a highlighted sub-structure that is NOT included by the original kernel definition.]
• Useful? Yes
• Choices of features are also critical in kernel methods!

Is it possible to define the complete set of potentially useful features?

Outline of Our Work
• Defined a graphic representation of relation instances
• Presented a general definition of features
• Proposed a bottom-up search strategy to explore the feature space
• Evaluated different types of features

A Graphic Representation of Relation Instances
[Figure: the sentence as a sequence of nodes, one node per word.]
Word          hundreds  of  Palestinians  converged  on  the  square
POS tag       NNS       IN  NNP           VBD        IN  DT   NN
Entity type                 Person                            Bounded-Area
• Each node can have multiple labels
  – Word, POS tag, entity type, etc.
• Each node has an argument tag set to 0, 1, 2, or 3

A Graphic Representation of Relation Instances
Argument tag  0         0   1             0          0   0    2
POS tag       NNS       IN  NNP           VBD        IN  DT   NN
Word          hundreds  of  Palestinians  converged  on  the  square
Entity type                 Person                            Bounded-Area
• Each node can have multiple labels
  – Word, POS tag, entity type, etc.
• Each node has an argument tag set to 0, 1, 2, or 3

Graphic Representation Based on Syntactic Parse Trees
[Figure: syntactic parse tree of "hundreds of Palestinians converged on the square". Internal nodes carry argument tags as well: S is tagged 3, VP 2, NP 1, the two PP nodes 1 and 2, and the three NPB nodes 0, 1 and 2. The leaf nodes keep the argument tags 0 0 1 0 0 0 2, the POS tags, the words, and the entity types Person and Bounded-Area.]
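
As a rough illustration of the representation described above, the following sketch builds the sequence version of the graph: one node per token, each carrying a set of labels and an argument tag. The class and function names are hypothetical. The meaning of the tag values (0 = covers neither argument, 1 = arg 1, 2 = arg 2, 3 = both) is an assumption; the slides only list the four values, although the example is consistent with it, since the S node spans both arguments and is tagged 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    labels: frozenset  # (kind, value) pairs, e.g. ("word", "square")
    arg_tag: int       # 0, 1, 2, or 3


def arg_tag(covers_arg1: bool, covers_arg2: bool) -> int:
    """Assumed encoding: 0 = neither, 1 = arg 1, 2 = arg 2, 3 = both."""
    return (1 if covers_arg1 else 0) + (2 if covers_arg2 else 0)


def sequence_graph(words, pos_tags, entities):
    """Build nodes and left-to-right edges for a token sequence.

    `entities` maps a token index to (entity type, argument number).
    """
    nodes = []
    for i, (word, pos) in enumerate(zip(words, pos_tags)):
        labels = {("word", word), ("pos", pos)}
        entity_type, arg_no = entities.get(i, (None, 0))
        if entity_type is not None:
            labels.add(("entity", entity_type))
        nodes.append(Node(frozenset(labels), arg_tag(arg_no == 1, arg_no == 2)))
    edges = [(i, i + 1) for i in range(len(nodes) - 1)]
    return nodes, edges


WORDS = "hundreds of Palestinians converged on the square".split()
POS = "NNS IN NNP VBD IN DT NN".split()
ENTITIES = {2: ("Person", 1), 6: ("Bounded-Area", 2)}

nodes, edges = sequence_graph(WORDS, POS, ENTITIES)
print([n.arg_tag for n in nodes])  # [0, 0, 1, 0, 0, 0, 2]
```

For the parse-tree versions, the same `Node` type could be reused for internal nodes, with `arg_tag` computed from whether the node's span covers each argument.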

Graphic Representation Based on Dependency Parse Trees
[Figure: dependency parse tree of "hundreds of Palestinians converged on the square". Each node keeps its word, POS tag (NNS IN NNP VBD IN DT NN), entity type (Person, Bounded-Area) and argument tag (0 0 1 0 0 0 2).]

A General Definition of Features
[Figure: the sequence representation of "hundreds of Palestinians converged on the square" with POS tags, entity types and argument tags.]
• Sub-graphs

A General Definition of Features
[Figure: the sequence representation of "hundreds of Palestinians converged on the square" with a single node highlighted as a sub-graph, shown first with its labels and then with only a subset of them kept.]
• Sub-graph
• Subset of the original label set

A General Definition of Features
[Figure: the sequence representation with a single node and one of its labels highlighted, marked "Unigram Feature".]
• Sub-graph
• Subset of the original label set

A General Definition of Features
[Figure: the sequence representation with two adjacent nodes highlighted.]

A General Definition of Features
[Figure: the sequence representation with two adjacent nodes and a subset of their labels highlighted, marked "Bigram Feature".]

More Examples
[Figure: syntactic parse tree representation of "hundreds of Palestinians converged on the square", with argument tags on all nodes (S: 3, VP: 2, NP: 1; leaves: 0 0 1 0 0 0 2).]

More Examples
[Figure: the syntactic parse tree representation with a sub-graph highlighted, marked "Production Feature".]

More Examples
[Figure: the syntactic parse tree representation with a different sub-graph highlighted.]

More Examples
[Figure: the syntactic parse tree representation of "hundreds of Palestinians converged on the square", with two further sub-graphs highlighted on successive slides.]

More Examples
[Figure: the full syntactic parse tree representation of "hundreds of Palestinians converged on the square".]

Coverage of the Feature Definition
• Entity attributes [Zhao & Grishman 05] [Zhou et al. 05]
  – Unigram features with entity attributes
[Figure: the sequence representation with an argument node and its entity type highlighted.]

Coverage of the Feature Definition
• Bag-of-word features [Zhao & Grishman 05] [Zhou et al. 05]
  – Unigram features with words
[Figure: the sequence representation with a single node and its word highlighted.]

Coverage of the Feature Definition
• Bigram features [Zhao & Grishman 05]
  – Bigram features with words
[Figure: the sequence representation with two adjacent nodes and their words highlighted.]
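
The unigram and bigram features named on these coverage slides can be sketched as follows. This is a simplified illustration, not the authors' implementation: each node is reduced to a dictionary of labels plus its argument tag, and a feature keeps exactly one kind of label per node (one particular "subset of the original label set"). The names `SENTENCE`, `unigram_features` and `bigram_features` are hypothetical.

```python
# Each node: (labels by kind, argument tag), following the slide example.
SENTENCE = [
    ({"word": "hundreds", "pos": "NNS"}, 0),
    ({"word": "of", "pos": "IN"}, 0),
    ({"word": "Palestinians", "pos": "NNP", "entity": "Person"}, 1),
    ({"word": "converged", "pos": "VBD"}, 0),
    ({"word": "on", "pos": "IN"}, 0),
    ({"word": "the", "pos": "DT"}, 0),
    ({"word": "square", "pos": "NN", "entity": "Bounded-Area"}, 2),
]


def unigram_features(nodes, kind):
    """Single-node sub-graphs, keeping only the label of the given kind."""
    return {(labels[kind], tag) for labels, tag in nodes if kind in labels}


def bigram_features(nodes, kind):
    """Two adjacent nodes, each keeping only the label of the given kind."""
    return {
        ((l1[kind], t1), (l2[kind], t2))
        for (l1, t1), (l2, t2) in zip(nodes, nodes[1:])
        if kind in l1 and kind in l2
    }


# Entity attributes: unigram features with entity types.
print(unigram_features(SENTENCE, "entity"))
# Bag-of-words: unigram features with words.
print(unigram_features(SENTENCE, "word"))
# Word bigrams: bigram features with words.
print(bigram_features(SENTENCE, "word"))
```

Keeping the argument tag inside every feature is what lets a classifier distinguish, for example, "square" as arg 2 from "square" as a context word.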

Coverage of the Feature Definition
• Grammar production features [Zhang et al. 06]
  – Production features
• Dependency relation and dependency path features [Bunescu & Mooney 05a] [Zhao & Grishman 05] [Zhou et al. 05]
  – Bigram and n-gram features with words
[Figure: the dependency representation of "hundreds of Palestinians converged on the square" with a path of nodes highlighted.]

Exploring the Feature Space
• We consider three feature subspaces:
  – Sequence, syntactic parse tree, dependency parse tree
• A bottom-up strategy
  – Start with unigram features, and gradually increase the size/complexity of the features
  – First search in each subspace, then merge features from different subspaces
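
The bottom-up strategy above can be illustrated with a minimal sketch. It covers the sequence subspace only and makes several simplifying assumptions that are not taken from the slides: a feature of size n is a window of n consecutive nodes, each node keeps exactly one label, and merging subspaces is a plain union that records which subspace each feature came from. All function names are hypothetical.

```python
def ngram_features(nodes, n):
    """Features of size n: n consecutive nodes, one label kept per node."""
    feats = set()
    for i in range(len(nodes) - n + 1):
        partial = [()]
        for labels, tag in nodes[i:i + n]:
            partial = [p + ((label, tag),) for p in partial
                       for label in sorted(labels)]
        feats.update(partial)
    return feats


def bottom_up(nodes, max_size):
    """Start with unigrams (size 1) and grow the feature size step by step."""
    return {n: ngram_features(nodes, n) for n in range(1, max_size + 1)}


def merge_subspaces(**subspaces):
    """Union of features from several subspaces, tagged with their origin."""
    return {(name, f) for name, feats in subspaces.items() for f in feats}


# Fragment of the slide example: (label set, argument tag) per node.
FRAGMENT = [
    ({"on", "IN"}, 0),
    ({"the", "DT"}, 0),
    ({"square", "NN", "Bounded-Area"}, 2),
]

by_size = bottom_up(FRAGMENT, max_size=3)
for size, feats in by_size.items():
    print(size, len(feats))
```

The number of candidate features grows quickly with size, which is one reason to search each subspace separately and stop at a small maximum size before merging.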