Information Extraction Using the Structured Language Model, Ciprian Chelba and Milind Mahajan - PowerPoint PPT Presentation



SLIDE 1

Information Extraction Using the Structured Language Model Ciprian Chelba, Milind Mahajan

Information Extraction from Text
Structured Language Model (SLM)
SLM for Information Extraction
Experiments and Error Analysis
Conclusions and Future Directions

Microsoft Research Speech.Net

SLIDE 2

Information Extraction from Text

Data-driven approach with minimal annotation effort: clearly identifiable semantic slots and frames

Information extraction viewed as the recovery of a two-level semantic parse S for a given word sequence W

Sentence independence assumption: the sentence W is sufficient for identifying the semantic parse S

[Figure: two-level parse of "Schedule meeting with Megan Hokins about internal lecture at two thirty p.m." FRAME LEVEL: Calendar Task; SLOT LEVEL: Person = "Megan Hokins", Subject = "internal lecture", Time = "two thirty p.m."]
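The two-level parse can be pictured as a frame object holding labeled slots over word spans. A minimal sketch, with labels taken from the slide's example and class names of our own choosing:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Slot:
    label: str        # slot-level tag, e.g. "Person", "Subject", "Time"
    words: List[str]  # word span covered by the slot

@dataclass
class Frame:
    label: str                                   # frame-level tag
    slots: List[Slot] = field(default_factory=list)

# The slide's example sentence, hand-annotated as a two-level parse:
parse = Frame("Calendar Task", [
    Slot("Person", ["Megan", "Hokins"]),
    Slot("Subject", ["internal", "lecture"]),
    Slot("Time", ["two", "thirty", "p.m."]),
])
```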


SLIDE 3

Syntactic Parsing Using the Structured Language Model

Generalize trigram modeling (local) by taking advantage of sentence structure (influence by more distant past)

Develop hidden syntactic structure T_i for a given word prefix W_i, with headword assignment

Assign a probability P(W_i, T_i)

[Figure: parse of "the contract ended with a loss of 7 cents"; constituents such as contract_NP, loss_NP, with_PP and ended_VP' carry their headwords]


SLIDE 4

[Figure: the same parse built incrementally, word by word, with headword percolation through contract_NP, loss_NP, with_PP and ended_VP']

Parser action sequence (excerpt): ...; null; predict cents; POStag cents; adjoin-right-NP; adjoin-left-PP; ...; adjoin-left-VP'; null; ...

SLIDES 5-7

[Animation frames: slides 5 through 7 repeat the Slide 4 parse at successive steps of the same parser action sequence]
SLIDE 8

[Figure: the completed parse, annotated with the three model components and their actions]

PREDICTOR: predict word
TAGGER: tag word
PARSER: adjoin_{left,right}, null
slide-9
SLIDE 9

Word and Structure Generation

P(T_{n+1}, W_{n+1}) = prod_{i=1}^{n+1} [ P(w_i | h_{-2}, h_{-1})               (predictor)
                                       * P(g_i | w_i, h_{-1}.tag, h_{-2}.tag)  (tagger)
                                       * P(T_i | w_i, g_i, T_{i-1}) ]          (parser)

The predictor generates the next word w_i with probability P(w_i = v | h_{-2}, h_{-1}).
The tagger attaches tag g_i to the most recently generated word w_i with probability P(g_i | w_i, h_{-1}.tag, h_{-2}.tag).
The parser builds the partial parse T_i from T_{i-1}, w_i and g_i in a series of moves ending with null, where a parser move a is made with probability P(a | h_{-2}, h_{-1}); a in {(adjoin-left, NTtag), (adjoin-right, NTtag), null}.
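This factorization says the joint probability of a sentence and its parse is a product of one probability per elementary action. A short sketch, with illustrative stand-in numbers rather than real model estimates:

```python
import math

def derivation_log_prob(actions):
    """Log of P(T, W): sum the log-probabilities of all predictor,
    tagger and parser actions taken while generating the sentence."""
    return sum(math.log(p) for _component, p in actions)

# One word's worth of actions: predict "cents", tag it, then parser
# moves ending with null.  Probabilities are made up for illustration.
actions = [
    ("predictor", 0.10),  # P(w_i | h_-2, h_-1)
    ("tagger",    0.60),  # P(g_i | w_i, h_-1.tag, h_-2.tag)
    ("parser",    0.30),  # P(adjoin-right-NP | h_-2, h_-1)
    ("parser",    0.70),  # P(null | h_-2, h_-1)
]
log_p = derivation_log_prob(actions)
```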


SLIDE 10

Model Parameter Reestimation

Need to re-estimate the model component probabilities such that we decrease the model perplexity:

P(w_i = v | h_{-2}, h_{-1}), P(g_i | w_i, h_{-1}.tag, h_{-2}.tag), P(a | h_{-2}, h_{-1})

N-best variant of the Expectation-Maximization (EM) algorithm:

We seed the re-estimation process with parameter estimates gathered from manually or automatically parsed sentences.
We retain the N "best" parses {T_1, ..., T_N} for the complete sentence W.
The hidden events in the EM algorithm are restricted to those occurring in the N "best" parses.


SLIDE 11

SLM for Information Extraction

☞ Training:

initialization: initialize the SLM as a syntactic parser from a treebank
syntactic parsing: train the SLM as a Match constrained parser and parse the training data; the boundaries of the semantic constituents are matched
augmentation: enrich the non/pre-terminal labels in the resulting treebank with semantic tags
syntactic+semantic parsing: train the SLM as an L-Match constrained parser; the boundaries and tags of the semantic constituents are matched

☞ Test:

syntactic+semantic parsing of the test sentences; retrieve the semantic parse by taking the semantic projection of the most likely parse:

S = SEM(argmax_{T_i} P(T_i, W))
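The test-time rule is an argmax over candidate parses followed by a projection. A toy rendering, where the candidate scores and the projection function are illustrative stand-ins:

```python
def semantic_parse(scored_parses, sem):
    """S = SEM(argmax_T P(T, W)): pick the most likely full
    syntactic+semantic parse, then keep only its semantic projection."""
    best_parse, _best_score = max(scored_parses, key=lambda ts: ts[1])
    return sem(best_parse)

# Toy candidates as (parse, joint score) pairs; SEM keeps the semantic
# projection (here just the "sem" field) and discards the syntax.
candidates = [({"sem": "Calendar Task", "syn": "tree-a"}, 0.04),
              ({"sem": "Email Task",    "syn": "tree-b"}, 0.01)]
best_sem = semantic_parse(candidates, lambda t: t["sem"])
```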


SLIDE 12

Constrained Parsing Using the SLM

A semantic parse S is equivalent to a set of constraints. Each constraint is a 3-tuple c = <l, r, Q>: l/r is the left/right boundary of the semantic constituent to be matched, and Q is the set of allowable non-terminal tags for the constituent.

☞ Match parsing (syntactic parsing stage):

  • 1. parses match the constraint boundaries c.l, c.r for every constraint c of a given sentence

☞ L-Match parsing (syntactic+semantic parsing stage):

  • 1. parses match the constraint boundaries and the set of labels: c.l, c.r, c.Q for every constraint c
  • 2. the semantic projection of the parse trees must have exactly two levels

✔ Both Match and L-Match parsing can be efficiently implemented in the left-to-right, bottom-up, binary parsing strategy of the SLM

✔ On test sentences the only constraint available is the identity of the semantic tag at the root node
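The L-Match condition can be stated as a simple check over a candidate parse's constituent spans. This is a sketch of the filter only, not the SLM's actual left-to-right implementation, which prunes violating parses incrementally:

```python
def l_match(constituents, constraints):
    """L-Match: every constraint <l, r, Q> must be realized by a
    constituent with exactly the boundaries (l, r) and a label in Q.
    Dropping the label test `lab in Q` gives the weaker Match condition.

    `constituents` is a set of (l, r, label) spans from a candidate parse.
    """
    return all(
        any(cl == l and cr == r and lab in Q for cl, cr, lab in constituents)
        for l, r, Q in constraints
    )

# Toy spans over "schedule meeting with Megan Hokins" (word indices):
spans = {(0, 4, "Calendar Task"), (3, 4, "Person")}
ok  = l_match(spans, [(3, 4, {"Person", "Attendee"})])  # label allowed
bad = l_match(spans, [(3, 4, {"Time"})])                # label disallowed
```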


SLIDE 13

Experiments

MiPad data (personal information management)

training set: 2,239 sentences (27,119 words) and 5,431 slots
test set: 1,101 sentences (8,652 words) and 1,698 slots
vocabulary: 1,035 words, closed over the test data

Training Iteration              Error Rate (%)
Stage 2            Stage 4   Training        Test
                             Slot   Frame   Slot   Frame
Baseline                     43.41   7.20   57.36  14.90
0, MiPad/NLPwin               9.78   1.65   37.87  21.62
1, UPenn Trbnk                8.44   2.10   36.93  16.08
1, UPenn Trbnk        1       7.82   1.70   36.98  16.80
1, UPenn Trbnk        2       7.69   1.50   36.98  16.80

The baseline is a manually developed semantic grammar that makes no use of syntactic information.

Initialize the syntactic SLM from the in-domain MiPad treebank (NLPwin) or the out-of-domain Wall Street Journal treebank (UPenn).

3 iterations of the N-best EM parameter re-estimation algorithm.
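Slot/frame error rates of the kind reported above can be computed by exact comparison against the reference annotation. A simplified scorer reflecting our reading of the metric, assuming a one-to-one alignment of reference and hypothesis items (a real scorer would also count insertions and deletions):

```python
def error_rate_percent(reference, hypothesis):
    """Percentage of reference items (slots or frames) whose
    hypothesis does not match exactly.  Assumes aligned lists."""
    assert len(reference) == len(hypothesis)
    wrong = sum(r != h for r, h in zip(reference, hypothesis))
    return 100.0 * wrong / len(reference)

# Hypothetical slot labels for four test slots; one is mislabeled.
ref = ["Person", "Subject", "Time", "Time"]
hyp = ["Person", "Subject", "Time", "Date"]
rate = error_rate_percent(ref, hyp)
```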


SLIDE 14

Would More Data Help?

The big difference in performance between training and test suggests overtraining; we studied the performance of the model with decreasing amounts of training data.

Training Corpus   Training Iteration      Error Rate (%)
Size                                   Training        Test
                                       Slot   Frame   Slot   Frame
Baseline                               43.41   7.20   57.36  14.90
all               1, UPenn Trbnk        8.44   2.10   36.93  16.08
1/2 of all        1, UPenn Trbnk          -      -    43.76  18.44
1/4 of all        1, UPenn Trbnk          -      -    49.47  22.98

✔ performance degradation with training-data size is severe
✔ more training data, and a model parameterization that makes more effective use of the training data, are likely to help


SLIDE 15

Error Analysis

We investigated the correlation between semantic frame/slot accuracy and the number of semantic slots in a sentence.

No. slots/sent   Error Rate (%)      No. Sent
                 Slot     Frame
1                43.97    18.01      755
2                39.23    16.27      209
3                26.44     5.17       58
4                26.50     4.00       50
5+               21.19     6.90       29

✔ Sentences containing more semantic slots are less ambiguous from an information extraction point of view


SLIDE 16

Conclusions

✔ Presented a data-driven approach to information extraction that outperforms a manually written semantic grammar

✔ Coupling of syntactic and semantic information improves information extraction accuracy, as shown previously by Miller et al., NAACL 2000

Future Work

✘ Use a statistical modeling technique that makes better use of limited amounts of training data and rich conditioning information, such as maximum entropy

✘ Aim at information extraction from speech: treat the word sequence as a hidden variable, thus finding the most likely semantic parse given a speech utterance
