Partha Pratim Talukdar
Computer & Information Science Department University of Pennsylvania, Philadelphia partha@cis.upenn.edu
Joint work with Thorsten Brants (Google), Mark Liberman (Penn) and Fernando Pereira (Penn).
A Context Pattern Induction Method for Named Entity Extraction
We have identified a transcriptional repressor, Nrg1, in a genetic screen designed to reveal negative factors involved in the expression of STA1.
CHOP (Penn) Gene List
Unlabeled Data, Seed
Morgan-Stanley, Google, ...
Morgan Stanley, Google, Goldman-Sachs, Sun, ...
... analyst at <ENT> ...
... companies such as <ENT> , ...
... joint venture between <ENT> ( ...
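The seed list and context patterns above can be illustrated with a minimal sketch of context extraction around seed occurrences. The function name `extract_contexts`, the `window` parameter, and the sample sentence are assumptions made for illustration; the slides do not specify the window size or the tokenization.

```python
def extract_contexts(tokens, seeds, window=3):
    """Return (left, entity, right) triples for each seed occurrence.

    `window` (tokens kept on each side) is an illustrative assumption.
    """
    seed_tuples = {tuple(s.split()) for s in seeds}
    contexts = []
    for i in range(len(tokens)):
        for seed in seed_tuples:
            n = len(seed)
            if tuple(tokens[i:i + n]) == seed:
                left = tokens[max(0, i - window):i]
                right = tokens[i + n:i + n + window]
                contexts.append((left, " ".join(seed), right))
    return contexts


# Hypothetical sentence built from the pattern fragments on the slide.
text = "he is an analyst at Google , and companies such as Sun , grew".split()
for left, ent, right in extract_contexts(text, ["Google", "Sun"], window=2):
    # The matched seed is replaced by the <ENT> placeholder.
    print(" ".join(left), "<ENT>", " ".join(right))
```

Each printed line has the seed replaced by `<ENT>`, which is the form the later stages generalize from.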
Pipeline: Seed + Unlabeled Data, Extract Context, Find Triggers, Induce & Prune Automata, Automata as Extractor, Rank (shown twice in the diagram), Extended List
** One automaton induced for each trigger word.
Entity Tagger
an increased expression of ## adenosine deaminase ## in vad mice
expression of murine ## adenosine deaminase ## gene in rhesus monkey
contrast the expression of ## apolipoprotein e ## mrna was greater than
showed an increased expression of
vivo expression of murine <ENT> gene in rhesus monkey hematopoietic
plasmodium falciparum expression of the <ENT> gene in mouse l cells
in contrast the expression of …
Dominating word (frequency): expression (2), murine (1), falciparum (1)
n = 1
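The dominating-word counts and the setting n = 1 can be read as a top-n selection by frequency. The sketch below assumes one dominating word has already been chosen for each context; how that word is chosen is not shown in these slides, so it is taken as given input, and the name `select_triggers` is hypothetical.

```python
from collections import Counter


def select_triggers(dominating_words, n=1):
    """Pick the n most frequent dominating words as trigger words."""
    counts = Counter(dominating_words)
    return [word for word, _ in counts.most_common(n)]


# One dominating word per context, matching the counts on the slide:
# expression 2, murine 1, falciparum 1.
dominating = ["expression", "murine", "falciparum", "expression"]
triggers = select_triggers(dominating, n=1)
print(triggers)  # ['expression']
```

With n = 1 only "expression" survives as a trigger, which matches the automaton fragments that start with "expression of".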
[Figure: automaton fragment with states 41, 42, 43 and arcs labeled "a" and "the"]
expression of murine -<ENT>- …
expression of the -<ENT>- …
D D K K D D K
Positive Seed (ORG), Negative Seed (PER), Negative Seed (LOC)
ORG Pattern to be Ranked
Good Pattern 1, Good Pattern 2, ..., Good Pattern n
Entity_60, Entity_8, …
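One way to picture ranking a pattern against positive and negative seeds is a precision-style score. This is only an illustrative sketch, not the scoring function used in the work: the formula, the name `score_pattern`, the example patterns, and the negative seed entries are all assumptions.

```python
def score_pattern(extracted, positive_seeds, negative_seeds):
    """Fraction of seed hits that are positive (assumed score, for illustration)."""
    pos = len(extracted & positive_seeds)
    neg = len(extracted & negative_seeds)
    if pos + neg == 0:
        return 0.0  # the pattern matched no seed at all
    return pos / (pos + neg)


# Hypothetical seeds: ORG is the positive type, PER and LOC are negative.
positive = {"Google", "Sun"}
negative = {"Philadelphia", "Alice Smith"}

# Hypothetical patterns mapped to the entities they extract.
patterns = {
    "analyst at <ENT>": {"Google", "Sun", "Morgan Stanley"},
    "visited <ENT>": {"Philadelphia", "Google"},
}

ranked = sorted(
    patterns,
    key=lambda p: score_pattern(patterns[p], positive, negative),
    reverse=True,
)
print(ranked)
```

A pattern that extracts only ORG seeds ranks above one that also pulls in PER or LOC seeds.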
Test Data Sizes: Test-a 51362 tokens, Test-b 46435 tokens
PER, LOC, ORG, MISC PER, LOC, ORG
[Figure: automaton with states 41, 42, 43 and arc counts: (20), a (40), an (2), the (18), the (80), an (5), … (98), … (40), … (7)]