learning from and about context in semantic parsing
play

Learning From and About Context in Semantic Parsing Luke - PowerPoint PPT Presentation

Learning From and About Context in Semantic Parsing Luke Zettlemoyer Computer Science & Engineering University of Washington joint work with Yoav Artzi, Michael Collins, Tom Kwiatkowski, Sharon Goldwater, Mark Steedman


  1. An Example Analysis Context: Current sentence: λ x.flight ( x ) ∧ from ( x,NYC ) List ones from Newark on Friday. ∧ to ( x,SEA ) λ x. ! f ( x ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) Step 1: Context-independent parse

  2. An Example Analysis Context: Current sentence: λ x.flight ( x ) ∧ from ( x,NYC ) List ones from Newark on Friday. ∧ to ( x,SEA ) λ x. ! f ( x ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) Step 1: Context-independent parse

  3. An Example Analysis Context: Current sentence: λ x.flight ( x ) ∧ from ( x,NYC ) List ones from Newark on Friday. ∧ to ( x,SEA ) λ x. ! f ( x ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) Step 1: Context-independent parse Step 2: Resolve reference

  4. An Example Analysis Context: Current sentence: λ x.flight ( x ) ∧ from ( x,NYC ) List ones from Newark on Friday. ∧ to ( x,SEA ) λ x. ! f ( x ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) λ x.flight ( x ) ∧ to ( x,SEA ) Step 1: Context-independent parse Step 2: Resolve reference

  5. An Example Analysis Context: Current sentence: λ x.flight ( x ) ∧ from ( x,NYC ) List ones from Newark on Friday. ∧ to ( x,SEA ) λ x. ! f ( x ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) λ x.flight ( x ) ∧ to ( x,SEA ) λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) Step 1: Context-independent parse Step 2: Resolve reference

  6. Derivations λ x.flight ( x ) ∧ from ( x,NYC ) List ones from Newark on Friday. ∧ to ( x,SEA ) λ x. ! f ( x ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) λ x.flight ( x ) ∧ to ( x,SEA ) λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) Three step process: • Step 1: Context-independent parsing • Step 2: Resolve all references • Step 3: Optionally, perform an elaboration

  7. Derivations λ x.flight ( x ) ∧ from ( x,NYC ) List ones from Newark on Friday. ∧ to ( x,SEA ) λ x. ! f ( x ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) λ x.flight ( x ) ∧ to ( x,SEA ) λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) Three step process: • Step 1: Context-independent parsing • Step 2: Resolve all references • Step 3: Optionally, perform an elaboration

  8. Step 1: Referential lexical items List ones from Newark on Friday. λ x. ! f ( x ) ∧ from ( x,NEW ) ∧ day ( x,FRI )

  9. Step 1: Referential lexical items List ones from Newark on Friday. λ x. ! f ( x ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) First extension: Add referential lexical items ones N λ x. ! f ( x) NP ! e it ...

  10. Step 1: Type-shifting operations Second extension: Add type-shifting operators for elliptical expressions the cheapest NP/N λ g.argmin ( g , λ y.cost ( y ))

  11. Step 1: Type-shifting operations Second extension: Add type-shifting operators for elliptical expressions the cheapest NP/N λ g.argmin ( g , λ y.cost ( y )) NP argmin ( λ x. ! f ( x ), λ y.cost ( y ))

  12. Step 1: Type-shifting operations Second extension: Add type-shifting operators for elliptical expressions the cheapest NP/N λ g.argmin ( g , λ y.cost ( y )) NP argmin ( λ x. ! f ( x ), λ y.cost ( y )) A/B : g => A : g( λ x. ! f ( x ) ) where g is a function with input type <e,t>

  13. Derivations λ x.flight ( x ) ∧ from ( x,NYC ) List ones from Newark on Friday. ∧ to ( x,SEA ) λ x. ! f ( x ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) λ x.flight ( x ) ∧ to ( x,SEA ) λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) Three step process: • Step 1: Context-independent parsing • Step 2: Resolve all references • Step 3: Optionally, perform an elaboration

  14. Step 2: Resolving References λ x.flight ( x ) ∧ from ( x,NYC ) List ones from Newark on Friday. ∧ to ( x,SEA ) λ x. ! f ( x ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) λ x.flight ( x ) ∧ to ( x,SEA ) λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) For each reference: 1. Select a (sub)expression from the context 2. Substitute into current analysis

  15. Step 2: Selecting from Context For each logical form in context, enumerate e and <e,t> type subexpressions: Context: λ x.flight ( x ) ∧ from ( x,NYC ) ∧ to ( x,SEA ) λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) argmax ( λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,BOS ), λ y.depart ( y ))

  16. Step 2: Selecting from Context For each logical form in context, enumerate e and <e,t> type subexpressions: Context: SEA λ x.flight ( x ) ∧ from ( x,NYC ) ∧ to ( x,SEA ) λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) argmax ( λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,BOS ), λ y.depart ( y ))

  17. Step 2: Selecting from Context For each logical form in context, enumerate e and <e,t> type subexpressions: Context: SEA λ x.flight ( x ) ∧ from ( x,NYC ) ∧ to ( x,SEA ) NYC λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) argmax ( λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,BOS ), λ y.depart ( y ))

  18. Step 2: Selecting from Context For each logical form in context, enumerate e and <e,t> type subexpressions: Context: SEA λ x.flight ( x ) ∧ from ( x,NYC ) ∧ to ( x,SEA ) NYC λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) λ x.flight ( x ) ∧ from ( x,NYC ) ∧ to ( x,SEA ) argmax ( λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,BOS ), λ y.depart ( y ))

  19. Step 2: Selecting from Context For each logical form in context, enumerate e and <e,t> type subexpressions: Context: SEA λ x.flight ( x ) ∧ from ( x,NYC ) ∧ to ( x,SEA ) NYC λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) λ x.flight ( x ) ∧ from ( x,NYC ) ∧ to ( x,SEA ) argmax ( λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,BOS ), λ x.from ( x,NYC ) λ y.depart ( y )) λ x.flight ( x ) λ x.flight ( x ) ∧ to ( x,SEA ) ∧ to ( x,SEA ) ∧ from ( x,NYC )

  20. Step 2: Selecting from Context For each logical form in context, enumerate e and <e,t> type subexpressions: Context: SEA λ x.flight ( x ) ∧ from ( x,NYC ) ∧ to ( x,SEA ) NYC λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) λ x.flight ( x ) ∧ from ( x,NYC ) ∧ to ( x,SEA ) argmax ( λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,BOS ), λ x.from ( x,NYC ) λ y.depart ( y )) λ x.flight ( x ) λ x.flight ( x ) ∧ to ( x,SEA ) ∧ to ( x,SEA ) ∧ from ( x,NYC ) λ x.to ( x,SEA ) λ x.from ( x,NYC ) λ x.flight ( x )

  21. Step 2: Resolving References λ x.flight ( x ) ∧ from ( x,NYC ) List ones from Newark on Friday. ∧ to ( x,SEA ) λ x. ! f ( x ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) λ x.flight ( x ) ∧ to ( x,SEA ) λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) For each reference: • Select an expression from the context • Substitute into current analysis

  22. Derivations λ x.flight ( x ) ∧ from ( x,NYC ) List ones from Newark on Friday. ∧ to ( x,SEA ) λ x. ! f ( x ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) λ x.flight ( x ) ∧ to ( x,SEA ) λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) Three step process: • Step 1: Context-independent parsing • Step 2: Resolve all references • Step 3: Optionally, perform an elaboration

  23. Step 3: Elaboration operations Show me the latest flight from New York to Seattle. argmax ( λ x.flight ( x ) ∧ from ( x,NYC) ∧ to ( x,SEA ) , λ y.time ( y )) on Friday

  24. Step 3: Elaboration operations Show me the latest flight from New York to Seattle. argmax ( λ x.flight ( x ) ∧ from ( x,NYC) ∧ to ( x,SEA ) , λ y.time ( y )) on Friday argmax ( λ x.flight ( x ) ∧ from ( x,NYC) ∧ to ( x,SEA ) ∧ day ( x,FRI ), λ y.time ( y ))

  25. Step 3: Elaboration operations on Friday argmax ( λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,NYC ), λ y.time ( y )) λ x. day ( x,FRI )

  26. Step 3: Elaboration operations on Friday argmax ( λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,NYC ), λ y.time ( y )) λ x. day ( x,FRI ) λ f.argmax ( λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,NYC ) ∧ f ( x ) , λ y.time ( y ))

  27. Step 3: Elaboration operations on Friday argmax ( λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,NYC ), λ y.time ( y )) λ x. day ( x,FRI ) λ f.argmax ( λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,NYC ) ∧ f ( x ) , λ y.time ( y ))

  28. Step 3: Elaboration operations on Friday argmax ( λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,NYC ), λ y.time ( y )) λ x. day ( x,FRI ) λ f.argmax ( λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,NYC ) ∧ f ( x ) , λ y.time ( y )) argmax ( λ x.flight ( x ) ∧ from ( x,NYC) ∧ to ( x,SEA ) ∧ day ( x,FRI ), λ y.time ( y ))

  29. Step 3: Elaboration operations on Friday argmax ( λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,NYC ), λ y.time ( y )) λ x. day ( x,FRI ) λ f.argmax ( λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,NYC ) ∧ f ( x ) , λ y.time ( y )) argmax ( λ x.flight ( x ) ∧ from ( x,NYC) ∧ to ( x,SEA ) ∧ day ( x,FRI ), λ y.time ( y )) Possible elaborations: • Potentially expand any embedded variable • Can do deletions on elaboration function

  30. Derivations λ x.flight ( x ) ∧ from ( x,NYC ) List ones from Newark on Friday. ∧ to ( x,SEA ) λ x. ! f ( x ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) λ x.flight ( x ) ∧ to ( x,SEA ) λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) Three step process: • Step 1: Context-independent parsing • Step 2: Resolve all references • Step 3: Optionally, perform an elaboration

  31. Scoring Derivations λ x.flight ( x ) ∧ from ( x,NYC ) List ones from Newark on Friday. ∧ to ( x,SEA ) d λ x. ! f ( x ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) λ x.flight ( x ) ∧ to ( x,SEA ) λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,NEW ) ∧ day ( x,FRI )

  32. Scoring Derivations λ x.flight ( x ) ∧ from ( x,NYC ) List ones from Newark on Friday. ∧ to ( x,SEA ) d λ x. ! f ( x ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) λ x.flight ( x ) ∧ to ( x,SEA ) λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) Weighted linear model: • Introduce features: f ( d ) • Compute scores for derivations: w · f ( d )

  33. Features for Derivations: f ( d ) λ x.flight ( x ) ∧ from ( x,NYC ) List ones from Newark on Friday. ∧ to ( x,SEA ) λ x. ! f ( x ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) λ x.flight ( x ) ∧ to ( x,SEA ) λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) Parsing features: set from Zettlemoyer and Collins (2007) Context features: • Distance indicators, for integers (0,1,2,...) • Copy indicators, for all predicates { flight , from , to , ...} • Deletion indicators, for all pairs of predicates {( from , flight ), ( from , from ), ( from , to ), ...}

  34. Features for Derivations: f ( d ) λ x.flight ( x ) ∧ from ( x,NYC ) List ones from Newark on Friday. ∧ to ( x,SEA ) λ x. ! f ( x ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) λ x.flight ( x ) ∧ to ( x,SEA ) λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) Parsing features: set from Zettlemoyer and Collins (2007) Context features: • Distance indicators, for integers (0,1,2,...) • Copy indicators, for all predicates { flight , from , to , ...} • Deletion indicators, for all pairs of predicates {( from , flight ), ( from , from ), ( from , to ), ...}

  35. Features for Derivations: f ( d ) λ x.flight ( x ) ∧ from ( x,NYC ) List ones from Newark on Friday. ∧ to ( x,SEA ) λ x. ! f ( x ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) λ x.flight ( x ) ∧ to ( x,SEA ) λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) Parsing features: set from Zettlemoyer and Collins (2007) Context features: • Distance indicators, for integers (0,1,2,...) • Copy indicators, for all predicates { flight , from , to , ...} • Deletion indicators, for all pairs of predicates {( from , flight ), ( from , from ), ( from , to ), ...}

  36. Features for Derivations: f ( d ) λ x.flight ( x ) ∧ from ( x,NYC ) List ones from Newark on Friday. ∧ to ( x,SEA ) λ x. ! f ( x ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) λ x.flight ( x ) ∧ to ( x,SEA ) λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) Parsing features: set from Zettlemoyer and Collins (2007) Context features: • Distance indicators, for integers (0,1,2,...) • Copy indicators, for all predicates { flight , from , to , ...} • Deletion indicators, for all pairs of predicates {( from , flight ), ( from , from ), ( from , to ), ...}

  37. Features for Derivations: f ( d ) λ x.flight ( x ) ∧ from ( x,NYC ) List ones from Newark on Friday. ∧ to ( x,SEA ) λ x. ! f ( x ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) λ x.flight ( x ) ∧ to ( x,SEA ) λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) Parsing features: set from Zettlemoyer and Collins (2007) Context features: • Distance indicators, for integers (0,1,2,...) • Copy indicators, for all predicates { flight , from , to , ...} • Deletion indicators, for all pairs of predicates {( from , flight ), ( from , from ), ( from , to ), ...}

  38. Features for Derivations: f ( d ) λ x.flight ( x ) ∧ from ( x,NYC ) List ones from Newark on Friday. ∧ to ( x,SEA ) λ x. ! f ( x ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) λ x.flight ( x ) ∧ to ( x,SEA ) λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) Parsing features: set from Zettlemoyer and Collins (2007) Context features: • Distance indicators, for integers (0,1,2,...) • Copy indicators, for all predicates { flight , from , to , ...} • Deletion indicators, for all pairs of predicates {( from , flight ), ( from , from ), ( from , to ), ...}

  39. Features for Derivations: f ( d ) λ x.flight ( x ) ∧ from ( x,NYC ) List ones from Newark on Friday. ∧ to ( x,SEA ) λ x. ! f ( x ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) λ x.flight ( x ) ∧ to ( x,SEA ) λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) Parsing features: set from Zettlemoyer and Collins (2007) Context features: • Distance indicators, for integers (0,1,2,...) • Copy indicators, for all predicates { flight , from , to , ...} • Deletion indicators, for all pairs of predicates {( from , flight ), ( from , from ), ( from , to ), ...}

  40. Inference and Learning Two computations: Best derivation: • Best derivation with final logical form z : • We use a beam search algorithm.

  41. Inference and Learning Two computations: Best derivation: • Best derivation with final logical form z : • We use a beam search algorithm. Learning: Hidden variable version of the structured perceptron algorithm • [Liang et al., 2006] [Zettlemoyer & Collins, 2007]

  42. Inputs: Training set { I i | i =1... n } of interactions. Each interaction I = {( w i,j ,z i,j ) | j = 1 ...n i } is a sequence of sentences and logical forms. Initial parameters w . Number of iterations T . Output: Parameters w .

  43. Inputs: Training set { I i | i =1... n } of interactions. Each interaction I = {( w i,j ,z i,j ) | j = 1 ...n i } is a sequence of sentences and logical forms. Initial parameters w . Number of iterations T . Computation: For t =1... T, i =1... n : (Iterate interactions) Set C = {} (Reset Context) For j =1... n i : (Iterate training examples) Output: Parameters w .

  44. Inputs: Training set { I i | i =1... n } of interactions. Each interaction I = {( w i,j ,z i,j ) | j = 1 ...n i } is a sequence of sentences and logical forms. Initial parameters w . Number of iterations T . Computation: For t =1... T, i =1... n : (Iterate interactions) Set C = {} (Reset Context) For j =1... n i : (Iterate training examples) Step 3: Update context: Append z i,j to C Output: Parameters w .

  45. Inputs: Training set { I i | i =1... n } of interactions. Each interaction I = {( w i,j ,z i,j ) | j = 1 ...n i } is a sequence of sentences and logical forms. Initial parameters w . Number of iterations T . Computation: For t =1... T, i =1... n : (Iterate interactions) Set C = {} (Reset Context) For j =1... n i : (Iterate training examples) Step 1: Check Correctness • Find best analysis: • If correct: L ( d *) == z i,j , go to the Step 3. Step 3: Update context: Append z i,j to C Output: Parameters w .

  46. Inputs: Training set { I i | i =1... n } of interactions. Each interaction I = {( w i,j ,z i,j ) | j = 1 ...n i } is a sequence of sentences and logical forms. Initial parameters w . Number of iterations T . Computation: For t =1... T, i =1... n : (Iterate interactions) Set C = {} (Reset Context) For j =1... n i : (Iterate training examples) Step 1: Check Correctness • Find best analysis: • If correct: L ( d *) == z i,j , go to the Step 3. Step 2: Update Parameters • Find best correct analysis: • Update parameters: w = w + f ( d ′ ) � f ( d *) Step 3: Update context: Append z i,j to C Output: Parameters w .

  47. Inputs: Training set { I i | i =1... n } of interactions. Each interaction I = {( w i,j ,z i,j ) | j = 1 ...n i } is a sequence of sentences and logical forms. Initial parameters w . Number of iterations T . Computation: For t =1... T, i =1... n : (Iterate interactions) Set C = {} (Reset Context) For j =1... n i : (Iterate training examples) Step 1: Check Correctness • Find best analysis: • If correct: L ( d *) == z i,j , go to the Step 3. Step 2: Update Parameters • Find best correct analysis: • Update parameters: w = w + f ( d ′ ) � f ( d *) Step 3: Update context: Append z i,j to C Output: Parameters w .

  48. Inputs: Training set { I i | i =1... n } of interactions. Each interaction I = {( w i,j ,z i,j ) | j = 1 ...n i } is a sequence of sentences and logical forms. Initial parameters w . Number of iterations T . Computation: For t =1... T, i =1... n : (Iterate interactions) Set C = {} (Reset Context) For j =1... n i : (Iterate training examples) Step 1: Check Correctness • Find best analysis: • If correct: L ( d *) == z i,j , go to the Step 3. Step 2: Update Parameters • Find best correct analysis: • Update parameters: w = w + f ( d ′ ) � f ( d *) Step 3: Update context: Append z i,j to C Output: Parameters w .

  49. Evaluation • Domain: ATIS travel database queries • 399 training interactions (3813 sentences) • 127 test interactions (826 sentences) • Comparison: previous state-of-the-art [Miller et al. 1996] • requires full annotation of all syntactic, semantic, and context-resolution decisions • decision tree learning

  50. probability along the path corresponding to T. resulting in an augmented tree structure. For example, the instructions needed to create the frame shown in Figure 3 are: 1. Create an Air-Transportation frame. 3.2 Training the Parsing Model 2. Fill the Show slot with Arrival-Time. Transition probabilities are estimated directly by observing occurrence and transition frequencies in a training corpus of 3. Fill the Origin slot with (City "Boston") annotated parse trees. These estimates are then smoothed to 4. Fill the Destination slot with (City "Atlanta") overcome sparse data limitations. The semantic/syntactic parse labels, described above, provide a further advantage in These instructions are attached to the parse tree at the points terms of smoothing: for cases of undertrained probability indicated by the circled numbers (see estimates, the model backs off to independent syntactic and Directly modeling P(Mo I W,/-/) is difficult because the gap 0 = argmax EP(MD IH, M s) P(Ms,T) P(WI Ms,T). M semantic probabilities as follows: that the model must span is large. A common approach in Figure 2). The probability P(Ms,T ) is then simply the Mo M s ,T non-statistical natural language systems is to bridge this gap by introducing intermediate representations such as parse We now introduce a third independence assumption: prior probability of producing the augmented tree structure. structure and pre-discourse sentence meaning. Introducing Ps(semlsyn n I semlsynn_ 1 ,semlsyn up) = 3. The probability of words W does not depend on meaning these intermediate levels into the statistical framework gives: Ms, given that parse Tis known. 0 =argmax EP(MD IW, H, Ms,T)P(Ms,TIW, H) M ~.( semlsyn n I semlsynn_ l ,seral syn up) This assumption is justified because the word tags in our MD M s,T parse representation specify both semantic and syntactic class 4.1 Statistical Interpretation Model information. Under this assumption: where T denotes a semantic parse tree, and Ms denotes pre- x P(semlsyn n I semlsynn_ 1 ,sem/syn up) discourse sentence meaning. This expression can be 0 = argmax EP(Mo IH, M s) P(Ms,T) P(WIT) M simplified by introducing two independence assumptions: MD M s ,T Meanings Ms are decomposed into two parts: the frame type + (1 - ,].(semlsyn n I semlsynn_ ! ,semlsyn up) 1. Neither the parse tree T, nor the pre-discourse meaning Finally, we assume that most of the probability mass for each Ms, depends on the discourse history H. FT, and the slot fillers S. The frame type is always attached discourse-dependent meaning is focused on a single parse X P(sem n I semup) P(syn n I synn_l,synup) The post-discourse meaning Mo does not depend on the tree and on a single pre-discourse meaning. Under this 2. to the topmost node in the augmented parse tree, while the (Viterbi) assumption, the summation operator can be words W or the parse structure T, once the pre-discourse Ms is determined. replaced by the maximization operator, yielding: meaning slot filling instructions are attached to nodes lower down in Under these assumptions, where Z is estimated as in (Placeway et al. 1993). Backing Mo = arg max( max ( P( M o l H, M s ) P( M s,T) P(W I T) ) ] the tree. Except for the topmost node, all parse nodes are M D ~.Ms,T 0 = argmax EP(MD IH'Ms) P(Ms'TIW) " M off to independent semantic and syntactic probabilities Mo M s ,T required to have some slot filling operation. For nodes that This expression corresponds to the computation actually potentially provides more precise estimates than the usual 3. The constrained space of candidate pre-discourse performed by our system which is shown in Figure 1. process: semantic labels identify the basic units of Next, the probability P(Ms,TIW) can be rewritten using do not directly trigger any slot fill operation, the special meanings Ms (received from the semantic interpretation meaning, while syntactic structures help identify Bayes rule as: strategy of backing off directly form bigram to unigram model), combined with the full space of possible post- relationships between those units. Processing proceeds in three stages: operation null is attached. The probability P(Ms, T) is then: P( M s ,T) P(W I M S ,T) discourse meanings Mo, is searched for the single P(M s,T I W) = models. 1. Word string W arrives at the parsing model. The full P(W) candidate that maximizes space of possible parses T is searched for n-best 3.1 Statistical Parsing Model P( M o I H, M s) P( M s,T) P(W I T), conditioned on the candidates according to the measure P(T)P(WIT). P( Ms,T) = P( FT, S,T)= P( FT) P(T I FT) P(S I FT, T). leading to: The parsing model is a probabilistic recursive transition current history H. The discourse history is then updated These parses, together with their probability scores, are P(MD IH'Ms) P(Ms'T) P(WI Ms,T) network similar to those described in (Miller et ai. 1994) and M 0 = argmax E and the post-discourse meaning is returned. passed to the semantic interpretation model. 3.3 Searching the Parsing Model (Seneff 1992). The probability of a parse tree T given a word MD Ms,r P(W) Obviously, the prior probabilities P(FT) can be obtained 2. The constrained space of candidate parses T (received We now proceed to a detailed discussion of each of these string Wis rewritten using Bayes role as: from the parsing model), combined with the full space Now, since P(W) is constant for any given word string, the three stages, beginning with parsing. directly from the training data. To compute P(T I FT), each of possible pre-discourse meanings Ms, is searched for 34o that maximizes problem of finding meaning P(T) P(W I T) In order to explore the space of possible parses efficiently, P(TIW) = n-best candidates according to the measure P(W) P(M S,T) P(WI M S,T) of the state transitions from the previous parsing model are P(M s,T) P(W I T). These pre-discourse meanings, E P(M D IH, M s) the parsing model is searched using a decoder based on an 3. Parsing P(W) together with their associated probability scores, are Since P(W) is constant for any given word string, candidate ,T M s simply rescored conditioned on the frame type. The new passed to the discourse model. Our parse representation is essentially syntactic in form, parses can be ranked by considering only the product adaptation of the Earley parsing algorithm (Earley 1970). P(T) Mo that maximizes is equivalent to finding patterned on a simplified head-centered theory of phrase P(W I 7"). The probability P(T) is modeled by state transition state transition probabilities are: This adaptation, related to that of (Stolcke 1995), involves structure. In content, however, the parse trees are as much probabilities in the recursive transition network, and P(W I T) E P(M D I H, Ms) P(Ms ,T) P(WI M S,T). semantic as syntactic. Specifically, each parse node indicates is modeled by word transition probabilities. M s ,T reformulating the Earley algorithm to work with probabilistic both a semantic and a syntactic class (excepting a few types P(state n I staten_ t, stateup, FT) . * State transition probabilities have the form Thus, that serve purely syntactic functions). Figure 2 shows a recursive transition networks rather than with deterministic P(state n I staten_l, stateup) . For example, sample parse of a typical ATIS sentence. The semantic/syntactic character of this representation offers P(location/pp I arrival/vp-head, arrival/vp) is the production rules. For details of the decoder, see (Miller To compute P(S I FT, T) , we make the independence Miller et al. [1996] several advantages: probability of a location/pp following an arrival/vp- 1996). assumption that slot filling operations depend only on the head within an arrival/vp constituent. ___ Parsing ~ lnterpretati°n I f[ Model 1. Annotation: Well-founded syntactic principles provide Model j \ Model y \ a framework for designing an organized and consistent frame type, the slot operations already performed, and on the • Word transition probabilities have the form / / / annotation schema. P(word n I wordn_ l,tag) . For example, local parse structure around the operation. This local P("class" I "first", class-of-service/npr) is the probability 2. Decoding: Semantic and syntactic constraints are P(T)P(WIT) P(Ms,T)P(WIT) P(MolMs,H)P(Ms,T)P(WIT) of the word sequence "first class" given the tag simultaneously available during the decoding process; neighborhood consists of the parse node itself, its two left 4. Semantic Interpretation Figure 1: Overview of statistical processing. the decoder searches for parses that are both class-of-service/npr. siblings, its two right siblings, and its four immediate syntactically and semantically coherent. Step 1: Semantic parsing Each parse tree T corresponds directly with a path through Step 2: Select frame and fill Both pre-discourse and post-discourse meanings in our 3. Semantic Interpretation: Semantic/syntactic parse trees the recursive transition network. The probability ancestors. Further, the syntactic and semantic components of are immediately useful to the semantic interpretation P(T) P(W I 1") is simply the product of each transition 56 current system are represented using a simple frame slot values these nodes are considered independently. Under these representation. Figure 3 shows a sample semantic frame assumptions, the probability of a slot fill operation is: corresponding to the parse in Figure 2. /wh-question P(slot n I FT, Sn_l,semn_ 2 ..... sem n ..... semn+2, Air-Transportation Synn-2 ..... synn ..... Synn+2, /vp /vp // Show: (Arrival-Time) semupl ..... semup4, Synupl ..... synup4 ) // // Origin: (City "Boston") / / / / 1 / / ~v~P a~re and the probability P(S I FT, T) is simply the product of all Destination: (City "Atlanta") / I / such slot fill operations in the augmented tree. ation p Q Figure 3: A sample semantic frame. arrival location city 4.2 Training the Semantic Interpretation /vp-head /prep /npr /wh-head /aux /det /np-head /comp /vp-head /prep /apt J J I I I I I I I I I Recall that the semantic interpreter is required to compute Model arrive in Atlanta When do the flights that leave from Boston The conditional word probability P(Ms,T) P(WIT ). Figure 2: A sample parse tree. Transition probabilities are estimated from a training corpus P(WIT) has already been computed during the parsing Step 3: Optionally copy slot values from previous frames of augmented trees. Unlike probabilities in the parsing 57 phase and need not be recomputed. The current problem, model, there obviously is not sufficient training data to then, is to compute the prior probability of meaning Ms and estimate slot fill probabilities directly. Instead, these parse T occurring together. Our strategy is to embed the probabilities are estimated by statistical decision trees similar instructions for constructing Ms directly into parse T o 58

  51. Evaluation • Domain: ATIS travel database queries • 399 training interactions (3813 sentences) • 127 test interactions (826 sentences) • Comparison: previous state-of-the-art [Miller et al. 1996] • Metric: accuracy recovering fully correct meanings

  52. Evaluation • Domain: ATIS travel database queries • 399 training interactions (3813 sentences) • 127 test interactions (826 sentences) • Comparison: previous state-of-the-art [Miller et al. 1996] • Metric: accuracy recovering fully correct meanings • Results: • improved accuracy: 78.4% => 83.7% • less engineering effort: only annotated final meanings

  53. Varying the Length of a Context Window M ATIS Development Set: Context Length Accuracy M= 0 45.4 M= 1 79.8 M= 2 81.0 M= 3 82.1 M= 4 81.6 M= 10 81.4

  54. Example Learned Feature Weights Negative weights: • Distance features: (1,2,3,...) Positive weights: • Copy features: flight , from , to • Deletion features: ( from, from ), ( nonstop, connect ), ( during-day, time )

  55. Summary λ x.flight ( x ) ∧ from ( x,NYC ) ∧ to ( x,SEA ) Show me the cheapest? λ x.flight ( x ) ∧ to ( x,SEA ) ∧ from ( x,NEW ) ∧ day ( x,FRI ) F argmax ( λ x.flight ( x ) ∧ from ( x,NEW) ∧ to ( x,SEA ) ∧ day ( x,FRI ), λ y.cost ( y )) Key challenges: • Structured input and output, hidden structure not annotated Solution: • Analysis: two-stage approach • Learn: how to incorporate meaning from the context

  56. Bootstrapping'Seman=c'Parsers' from'Conversa=ons [Artzi*&*Ze9lemoyer,*EMNLP*2011]

  57. Task'Oriented'Dialog SYSTEM how'can'i'help'you'? USER i'''d'like'to'go'from'richmond'to'atlan=c'city'on'june'seventh SYSTEM leaving'richmond'on'thursday,'june'seventh'arriving'in'atlan=c'city'leaving'at' what'=me'? USER eight'a.'m. SYSTEM leaving'in'the'morning'what'airline'would'you'prefer'? USER u.'s.'air' SYSTEM i'currently'have'you'leaving'richmond'on'thursday,'june'seventh'in'the' morning'arriving'in'atlan=c'city'flying'on'u.'s.'air'.'is'this'correct'? USER yes' SYSTEM o'.'k'.'then'.'where'would'you'like'to'go'next'? USER richmond [conversa=o sa=on'con=nues]

  58. Dialog'and'Seman=cs • Many!dialog!systems!use!seman)c!parsers. • Supervised!learning!of!such!parsers!is! expensive. • Can!computers!learn!through!raw! conversa)ons?

  59. Learning'from'Conversa=ons • Human9machine!dialogs!aim!to!achieve!a! goal . • A!well9defined!goal!allows!the!system!to! recover !from!failures.

  60. System'Recovery SYSTEM:' ' how'can'i'help'you? USER:' ' i'would'like'to'fly'from'atlanta'georgia'to'london'england' on'september'twenty'fourth'in'the'early'evening'i'would' like'to'return'on'october'first'depar=ng'from'london'in' the'late'morning'

  61. System'Recovery SYSTEM:' ' how'can'i'help'you? USER:' ' i'would'like'to'fly'from'atlanta'georgia'to'london'england' on'september'twenty'fourth'in'the'early'evening'i'would' like'to'return'on'october'first'depar=ng'from'london'in' the'late'morning' System fails

  62. System'Recovery SYSTEM:' ' how'can'i'help'you? USER:' ' i'would'like'to'fly'from'atlanta'georgia'to'london'england' on'september'twenty'fourth'in'the'early'evening'i'would' like'to'return'on'october'first'depar=ng'from'london'in' the'late'morning' SYSTEM:' ' leaving'what'city? System fails

  63. System'Recovery SYSTEM:' ' how'can'i'help'you? USER:' ' i'would'like'to'fly'from'atlanta'georgia'to'london'england' on'september'twenty'fourth'in'the'early'evening'i'would' like'to'return'on'october'first'depar=ng'from'london'in' the'late'morning' SYSTEM:' ' leaving'what'city? USER:' ' atlanta'georgia'

  64. System'Recovery SYSTEM:' ' how'can'i'help'you? USER:' ' i'would'like'to'fly'from'atlanta'georgia'to'london'england' on'september'twenty'fourth'in'the'early'evening'i'would' like'to'return'on'october'first'depar=ng'from'london'in' the'late'morning' SYSTEM:' ' leaving'what'city? USER:' ' atlanta'georgia' SYSTEM:' ' leaving'atlanta.'going'to'which'city?'

  65. System'Recovery SYSTEM:' ' how'can'i'help'you? USER:' ' i'would'like'to'fly'from'atlanta'georgia'to'london'england' on'september'twenty'fourth'in'the'early'evening'i'would' like'to'return'on'october'first'depar=ng'from'london'in' the'late'morning' SYSTEM:' ' leaving'what'city? USER:' ' atlanta'georgia' SYSTEM:' ' leaving'atlanta.'going'to'which'city?' USER:' ' london'

  66. System'Recovery SYSTEM:' ' how'can'i'help'you? USER:' ' i'would'like'to'fly'from'atlanta'georgia'to'london'england' on'september'twenty'fourth'in'the'early'evening'i'would' like'to'return'on'october'first'depar=ng'from'london'in' the'late'morning' SYSTEM:' ' leaving'what'city? USER:' ' atlanta'georgia' SYSTEM:' ' leaving'atlanta.'going'to'which'city?' USER:' ' london' SYSTEM:' ' arriving'in'london'england.'what'date'would'you'like'to' depart'atlanta?

  67. System'Recovery SYSTEM:' ' how'can'i'help'you? USER:' ' i'would'like'to'fly'from'atlanta'georgia'to'london'england' on'september'twenty'fourth'in'the'early'evening'i'would' like'to'return'on'october'first'depar=ng'from'london'in' the'late'morning' SYSTEM:' ' leaving'what'city? USER:' ' atlanta'georgia' SYSTEM:' ' leaving'atlanta.'going'to'which'city?' USER:' ' london' SYSTEM:' ' arriving'in'london'england.'what'date'would'you'like'to' depart'atlanta? USER:' ' september'twenty'fourth'in'the'early'evening' [conversa=on'con=nues]

  68. Automa=cally'Annotated'Logs SYSTEM:' ' how'can'i'help'you?'(OPEN_TASK) USER:' ' i'would'like'to'fly'from'atlanta'georgia'to'london'england' on'september'twenty'fourth'in'the'early'evening'i'would' like'to'return'on'october'first'depar=ng'from'london'in' the'late'morning' SYSTEM:' ' leaving'what'city?'(ASK:'λ x.from ( fl ,' x )) USER:' ' atlanta'georgia' SYSTEM:' ' leaving'atlanta.'(CONFIRM:' from ( fl ,' ATL ))'going'to'which' city?'(ASK:'λ x . to ( fl ,' x )) USER:' ' london' [conversa=on'con=nues]

  69. Automa=cally'Annotated'Logs SYSTEM:' ' how'can'i'help'you?'(OPEN_TASK) USER:' ' i'would'like'to'fly'from'atlanta'georgia'to'london'england' on'september'twenty'fourth'in'the'early'evening'i'would' like'to'return'on'october'first'depar=ng'from'london'in' the'late'morning' SYSTEM:' ' leaving'what'city?'(ASK:'λ x.from ( fl ,' x )) USER:' ' atlanta'georgia' SYSTEM:' ' leaving'atlanta.'(CONFIRM:' from ( fl ,' ATL ))'going'to'which' city?'(ASK:'λ x . to ( fl ,' x )) USER:' ' london' [conversa=on'con=nues] System'u]erance

  70. Automa=cally'Annotated'Logs SYSTEM:' ' how'can'i'help'you?'(OPEN_TASK) USER:' ' i'would'like'to'fly'from'atlanta'georgia'to'london'england' on'september'twenty'fourth'in'the'early'evening'i'would' like'to'return'on'october'first'depar=ng'from'london'in' the'late'morning' SYSTEM:' ' leaving'what'city?'(ASK:'λ x.from ( fl ,' x )) USER:' ' atlanta'georgia' SYSTEM:' ' leaving'atlanta.'(CONFIRM:' from ( fl ,' ATL ))'going'to'which' city?'(ASK:'λ x . to ( fl ,' x )) USER:' ' london' [conversa=on'con=nues] System'u]erance Natural'Language

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend