AMR-to-Text Generation via GF


slide-1
SLIDE 1

AMR-to-Text Generation via GF

Normunds Grūzītis

University of Latvia, Institute of Mathematics and Computer Science
National information agency LETA

GF Summer School 2017, Rīga, Latvia

This work has received funding in part from the Latvian State research programs SOPHIS and NexIT, the EU Horizon 2020 project SUMMA (grant No. 688139), and the European Regional Development Fund (grant No. 1.1.1.1/16/A/219).

slide-2
SLIDE 2
Agenda

  • Frame semantics
    – FrameNet
    – PropBank
  • AMR
  • Text-to-AMR parsing, AMR-to-text generation
    – SemEval 2016
    – SemEval 2017

slide-3
SLIDE 3

Semantic Role Labelling (SRL)

  • FrameNet: TurboParser + SEMAFOR: http://demo.ark.cs.cmu.edu/parse
  • PropBank: LTH parser: http://barbar.cs.lth.se:8081/

slide-4
SLIDE 4

FrameNet (https://framenet.icsi.berkeley.edu)

slide-5
SLIDE 5

FrameNet (https://framenet.icsi.berkeley.edu)

slide-6
SLIDE 6

FrameNet (FN)

  • A lexico-semantic resource based on the theory of frame semantics (Fillmore et al. 2003)
    – A semantic frame represents a cognitive, prototypical situation (scenario) characterized by frame elements (FE): the semantic valence
    – Frames are “evoked” in sentences by target words: lexical units (LU)
    – FEs are mapped based on the syntactic valence of the LU
      • The syntactic valence patterns are derived from FN-annotated corpora (for an increasing number of languages, incl. Latvian)
    – FEs are split into core and non-core ones
      • Core FEs uniquely characterize the frame and syntactically tend to correspond to verb arguments
      • Non-core FEs are not specific to the frame and typically are adjuncts
slide-7
SLIDE 7

Berkeley FrameNet as Interlingua

Introduced in BFN, reused in SweFN

  • LUs: want.v..6412 (BFN), känna_för.vb..1 (SweFN)
  • Some valence patterns found in BFN, e.g. “[I]Experiencer don't WANT [to deceive anyone]Event” (an embedded frame)
  • Some valence patterns found in SweFN, e.g. “[Jag]Experiencer KÄNNER FÖR [en tur på landet]Focal_participant” (“I feel like a trip to the countryside”)

slide-8
SLIDE 8

FrameNet and GF

  • Existing FNs are not entirely formal and computational
    – A limited but computational FN-based GF grammar and lexicon
  • Grammatical Framework:
    – Separates an abstract syntax from the concrete syntaxes
    – Provides a general-purpose resource grammar library (RGL)
  • The language-independent layer of FrameNet (frames and FEs): the abstract syntax
    – The language-specific layers (surface realization of frames and FEs; LUs): concrete syntaxes
  • RGL can be used for unifying the syntactic types used in different FNs and for the concrete implementation of frames
    – FrameNet allows for abstracting over RGL

slide-9
SLIDE 9

Use case (1)

  • Provide a semantic API on top of RGL to facilitate the development of GF application grammars
    – In combination with the syntactic API of RGL
    – Hiding the comparatively complex construction of verb phrases

With the syntactic API of RGL:

mkCl person (mkVP (mkVP live_V) (mkAdv in_Prep place))
-- mkCl : NP -> VP -> Cl
-- mkVP : V -> VP
-- mkVP : VP -> Adv -> VP
-- mkAdv : Prep -> NP -> Adv

With the semantic API:

Residence
  person                   -- NP (Resident)
  (mkAdv in_Prep place)    -- Adv (Location)
  live_V_Residence         -- V (LU)
-- Residence : NP -> Adv -> V -> Cl
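
The contrast between the two APIs can be sketched in Python. This is only an illustration, not the GF grammar itself: the helpers `mk_adv` and `residence` are hypothetical names that build RGL expressions as strings, and only the GF identifiers (mkCl, mkVP, mkAdv, in_Prep, live_V) come from the slide.

```python
def mk_adv(prep: str, np: str) -> str:
    """Build the RGL expression for a prepositional adverbial."""
    return f"(mkAdv {prep} {np})"


def residence(resident: str, location: str, lu: str) -> str:
    """Hypothetical stand-in for the Residence frame function.

    The caller only names the frame elements and the lexical unit;
    the nested mkVP construction stays hidden inside this helper.
    """
    return f"mkCl {resident} (mkVP (mkVP {lu}) {location})"


expr = residence("person", mk_adv("in_Prep", "place"), "live_V")
print(expr)
```

The caller thinks in terms of Resident, Location and LU, while the verb-phrase plumbing is written once per valence pattern.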
slide-10
SLIDE 10

Use case (2)

  • FN-annotated knowledge bases → multilingual verbalization

Latvian: Imants Ziedonis ir dzimis 1933. gada 3. maijā Slokas pagastā.
English: Imants Ziedonis was born in Sloka parish on 3 May 1933.

slide-11
SLIDE 11
FrameNet-based grammar: abstract

  • Frame valence patterns are represented by functions
    – Taking one or more core FEs (A-Z) and one LU as arguments
    – Returning an object of type Clause whose linearization type is {np: NP; vp: VP}
  • FEs are declared as semantic categories subcategorized by the syntactic RGL types
    – NP, VP, Adv (includes prepositional objects), S (embedded sentences), QS

cat Event_VP ;
cat Focal_participant_NP ;
cat Experiencer_NP ;
cat Focal_participant_Adv ;

fun Desiring_V : Experiencer_NP -> Focal_participant_Adv -> V -> Clause ;
fun Desiring_V2 : Experiencer_NP -> Focal_participant_NP -> V2 -> Clause ;
fun Desiring_V2_Pass : Experiencer_NP -> Focal_participant_NP -> V2 -> Clause ;
fun Desiring_VV : Event_VP -> Experiencer_NP -> VV -> Clause ;
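
The naming scheme of these declarations can be sketched in Python. The function `frame_fun` is a hypothetical helper, and reading "(A-Z)" as "FEs in alphabetical order" is an assumption; it is at least consistent with the four signatures shown on the slide.

```python
from typing import List, Tuple


def frame_fun(frame: str, lu_type: str, fes: List[Tuple[str, str]],
              passive: bool = False) -> str:
    """Render a GF 'fun' judgement for one valence pattern.

    fes holds (frame element, RGL type) pairs; each pair becomes a
    semantic category named FE_Type, sorted alphabetically (assumption).
    """
    args = [f"{fe}_{rgl_type}" for fe, rgl_type in sorted(fes)]
    name = f"{frame}_{lu_type}" + ("_Pass" if passive else "")
    return f"fun {name} : " + " -> ".join(args + [lu_type, "Clause"]) + " ;"


print(frame_fun("Desiring", "VV", [("Experiencer", "NP"), ("Event", "VP")]))
print(frame_fun("Desiring", "V2",
                [("Experiencer", "NP"), ("Focal_participant", "NP")],
                passive=True))
```

A generator of this kind is one way such declarations could be derived from valence patterns found in annotated corpora; the slides do not show the actual extraction code.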

slide-12
SLIDE 12
FrameNet-based grammar: concrete

  • The mapping from the semantic FrameNet types to the syntactic RGL types is shared for all languages
    – Linearization types are of type Maybe to allow for optional (empty) FEs
  • To implement the frame functions, RGL constructors are applied to the arguments depending on their types and syntactic roles, and the voice

lincat Focal_participant_NP = Maybe NP ;
lincat Focal_participant_Adv = Maybe Adv ;

lin Desiring_V2 experiencer focal_participant v2 = {
  np = fromMaybe NP experiencer ;
  vp = mkVP v2 (fromMaybe NP focal_participant)
} ;

lin Desiring_V2_Pass experiencer focal_participant v2 = {
  np = fromMaybe NP focal_participant ;
  vp = mkVP (passiveVP v2) (mkAdv by8agent_Prep (fromMaybe NP experiencer))
} ;

slide-13
SLIDE 13

http://grammaticalframework.org/framenet/

slide-14
SLIDE 14

Semantic Role Labelling (SRL)

  • FrameNet: TurboParser + SEMAFOR: http://demo.ark.cs.cmu.edu/parse
  • PropBank: LTH parser: http://barbar.cs.lth.se:8081/

slide-15
SLIDE 15

PropBank (http://propbank.github.io)

slide-16
SLIDE 16

AMR (Abstract Meaning Representation)

  • From SRL to whole-sentence meaning representation
    – Incl. PropBank SRL, NER and NEL, treatment of modality, negation, etc.
  • Simple and compact data structure
    – PENMAN notation: directed labeled graph encoded in a tree-like form
    – Easy to read and write (for a human), and traverse (for a program)
    – Langkilde and Knight (1998) → Banarescu et al. (2013)
  • Aimed at large-scale human annotation and semantic parsing
    – Practical, replicable amount of abstraction
    – An actual sembank of 40K+ sentences
  • Captures many aspects of meaning
    – Aims to abstract away from (English) syntax

slide-17
SLIDE 17
AMR (Abstract Meaning Representation)

  • Nodes are variables labelled by concepts
    – Entities, events, states, properties
    – s / soldier: s is an instance of soldier
  • Edges are semantic relations
  • AMR abstracts in numerous ways by assigning the same conceptual structure to different surface realizations

[Figure: AMR graph with nodes f / fear-01, d / die-01 and s / soldier, and edge labels polarity and ARG1 (Pust et al., 2015)]

slide-18
SLIDE 18

AMR (Abstract Meaning Representation)

  • AMR is still biased towards English or other source languages
    – Xue N., Bojar O., Hajič J., Palmer M., Uresova Z., Zhang X. Not an Interlingua, but close: Comparison of English AMRs to Chinese and Czech. LREC 2014
  • Meanwhile, AMR is agnostic about how to derive meanings from strings, and vice versa

Schneider N., Flanigan J., O’Gorman T. AMR Tutorial at NAACL 2015 https://github.com/nschneid/amr-tutorial/

slide-19
SLIDE 19

Text-to-AMR: human annotation

https://amr.isi.edu/editor.html

slide-20
SLIDE 20

AMR-to-text: human evaluation

slide-21
SLIDE 21

Sample AMR (1)

# ::snt A fourth member, Jean-Marc Rouillan, remains behind bars.

(r / remain-01
   :ARG1 (p / person :wiki -
            :name (n / name :op1 "Jean-Marc" :op2 "Rouillan")
            :mod (p2 / person
                    :ARG0-of (h / have-org-role-91
                                :ARG2 (m / member))
                    :ord (o / ordinal-entity :value 4)))
   :ARG3 (b / behind
            :op1 (b2 / bar)))

slide-23
SLIDE 23

Sample AMR (1)

# ::snt A fourth member, Jean-Marc Rouillan, remains behind bars.

JAMR: Remaining members person 4 jean-marc rouillan – behind bar.
GF: Jean-Marc Rouillan, that is the 4th member, is remained behind a bar.

slide-25
SLIDE 25

Sample AMR (2)

# ::snt They should have been expelled from school at a minimum.

(r / recommend-01
   :ARG1 (e / expel-01
            :ARG1 (t / they)
            :ARG2 (s / school)
            :degree (a / at-a-minimum)))

JAMR: Should they at-a-minimum expel school.
GF: It is recommended that they are expelled to a school at a minimum.

expel-01: ARG0=PAG (prototypical agent), ARG1=PPT (prototypical patient), ARG2=DIR (direction) → DIR_Prep → to_Prep

ToDo: based on statistics from PropBank and FrameNet corpora, “reconstruct” Prep-s, depending on frame/verb valency, ARG role, or NP head
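
One way the ToDo could look is sketched below in Python. Everything here is hypothetical: the counts in `PREP_COUNTS` are invented toy numbers, not corpus statistics, and `choose_prep` is an illustrative name. The sketch prefers the most frequent preposition observed for a (frame, role) pair and falls back to the per-function default from the slide (DIR → to_Prep).

```python
from collections import Counter
from typing import Dict, Tuple

# Hypothetical toy counts; real values would come from PropBank/FrameNet corpora.
PREP_COUNTS: Dict[Tuple[str, str], Counter] = {
    ("expel-01", "ARG2"): Counter({"from": 12, "to": 1}),
}

# Default preposition per PropBank function tag, as on the slide.
DEFAULT_BY_FUNCTION = {"DIR": "to"}


def choose_prep(frame: str, role: str, function_tag: str) -> str:
    """Pick an RGL preposition constant for one argument of a frame."""
    counts = PREP_COUNTS.get((frame, role))
    if counts:
        # Most frequent preposition seen with this frame and role.
        return counts.most_common(1)[0][0] + "_Prep"
    # No statistics available: fall back to the function-tag default.
    return DEFAULT_BY_FUNCTION[function_tag] + "_Prep"


print(choose_prep("expel-01", "ARG2", "DIR"))
print(choose_prep("arrive-01", "ARG4", "DIR"))
```

With such a lookup the sample above would be realized with from_Prep instead of to_Prep, provided the corpus counts point that way.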

slide-26
SLIDE 26

Sample AMR (3)

# ::snt Texas criminal courts and prosecutors do not coddle to anyone.

(c / coddle-01 :polarity -
   :ARG0 (a / and
            :op1 (c2 / court
                    :ARG0-of (c4 / criminal-03)
                    :location (s / state :wiki "Texas"
                                 :name (n / name :op1 "Texas")))
            :op2 (p / person
                    :ARG0-of (p2 / prosecute-01)
                    :location s))
   :ARG1 (a2 / anyone))

JAMR: No texas texas criminal court and prosecutors coddle anyone.
GF: A criminal court in Texas and a person that prosecutes do not coddle anyone.

person that prosecutes → prosecutor
organization that governs → government
slide-27
SLIDE 27

Sample AMR (4)

# ::snt How Long are We Going to Tolerate Japan?

(t / tolerate-01
   :ARG0 (w / we)
   :ARG1 (c / country :wiki "Japan"
            :name (n / name :op1 "Japan"))
   :duration (a / amr-unknown))

JAMR: We have tolerated the japan amr-unknown.
GF: How long do we tolerate Japan?

# Sentence mood: drop the marker and append the final punctuation mark
if ':mode expressive' in amr:
    amr = amr.replace(':mode expressive', ' ') + ' !'
if ':mode imperative' in amr:
    amr = amr.replace(':mode imperative', ' ') + ' !'
if ':mode interrogative' in amr:
    amr = amr.replace(':mode interrogative', ' ') + ' ?'
# amr-unknown: prepend a wh-word and append a question mark
if 'cause-01:ARG0(amr-unknown)' in amr:
    amr = 'why ' + amr.replace('cause-01:ARG0(amr-unknown)', ' ') + ' ?'
if ':location(amr-unknown)' in amr:
    amr = 'where ' + amr.replace(':location(amr-unknown)', ' ') + ' ?'
if ':ARG1(amr-unknown)' in amr:
    amr = 'who ' + amr.replace(':ARG1(amr-unknown)', ' ') + ' ?'
if ':mod(amr-unknown)' in amr:
    amr = 'what ' + amr.replace(':mod(amr-unknown)', ' ') + ' ?'
if ':duration(amr-unknown)' in amr:
    amr = 'how ' + amr.replace(':duration(amr-unknown)', ' ') + ' ?'
# Catch-all for any remaining amr-unknown
if 'amr-unknown' in amr:
    amr = 'what ' + amr.replace('amr-unknown', ' ') + ' ?'

slide-28
SLIDE 28

Sample AMR (5)

# ::snt Xinhua News Agency, Tokyo, September 1st, by reporter Yiguo Yu

(b / byline-91
   :ARG0 (p2 / publication
            :name (n / name :op1 "Xinhua" :op2 "News" :op3 "Agency"))
   :ARG1 (p / person
            :name (n2 / name :op1 "Yiguo" :op2 "Yu")
            :ARG0-of (r / report-01))
   :location (c2 / city
                :name (n3 / name :op1 "Tokyo"))
   :time (d / date-entity :month 9 :day 1))

JAMR: Xinhua news agency has reported yiguo yu byline in a tokyo 1 9.
GF: Xinhua News Agency by Yiguo Yu on 1 September in Tokyo.

slide-29
SLIDE 29

Sample AMR (6)

# ::snt Alliot-Marie arrived on Sunday.

(a / arrive-01
   :ARG1 (p / person
            :name (n / name :op1 "Alliot-Marie"))
   :time (d / date-entity
            :weekday (s / sunday)))

JAMR: Sunday 's arrival of alliot-marie michèle_alliot-marie.
GF: unknown qualified constant L.arrive_V2

The same AMR with :ARG0 instead of :ARG1:

(a / arrive-01
   :ARG0 (p / person
            :name (n / name :op1 "Alliot-Marie"))
   :time (d / date-entity
            :weekday (s / sunday)))

JAMR: Alliot-marie michèle_alliot-marie arrived sunday.
GF: Alliot-Marie arrives on Sunday.

slide-30
SLIDE 30

SemEval 2017: Task 9

  • Subtask 1: Parsing Biomedical Data
  • Subtask 2: AMR-to-English Generation

Approaches:

  • “grammar-based”
  • SMT/NMT
  • end-to-end

[Figure labels: JAMR (5-grams); AMR → (U)D → lin; UL / IMCS / LETA: Transducer → lin; SMT: AMR → Eng; SemEval; ACL]

slide-31
SLIDE 31

RIG-GOT-RIO → Trio from Riga with regards to GOT & RIO ;)

“We made the following resources available to participants: [..] The JAMR (Flanigan et al., 2016) generation system, as a strong generation baseline. [..]” (May & Priyadarshi, 2017)

slide-32
SLIDE 32

Under the hood

# mkVP : VV -> VP -> VP
# (frame1 (:ARG1 (var frame2))) => (frame1 (mkVP frame2))
/VV_FRAME/ < (/:ARG1/=vp < (/VAR/=var < /FRAME/=v))    # Tregex
[move v >1 vp] [relabel vp /^.+$/mkVP/] [delete var]   # Tsurgeon
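
The effect of this rule can be imitated in plain Python. This is a sketch of the rewrite only, not of Tregex/Tsurgeon: the `Node` class, the `VV_FRAMES` set and the function names are assumptions made for illustration, and the matching is far less general than a Tregex pattern.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def show(node: Node) -> str:
    """Render a tree in LISP-like bracketing."""
    if not node.children:
        return node.label
    return "(" + " ".join([node.label] + [show(c) for c in node.children]) + ")"


# Assumption: frames whose verb takes a VP complement (VV in RGL).
VV_FRAMES = {"want-01"}


def apply_vv_rule(node: Node) -> Node:
    """(frame1 (:ARG1 (var frame2))) => (frame1 (mkVP frame2)), bottom-up."""
    for child in node.children:
        apply_vv_rule(child)
    if node.label in VV_FRAMES:
        for role in node.children:
            if role.label == ":ARG1" and len(role.children) == 1:
                var = role.children[0]
                if len(var.children) == 1:
                    # Move the embedded frame up and drop the variable node.
                    role.children = [var.children[0]]
                    # Relabel the role node as the RGL constructor.
                    role.label = "mkVP"
    return node


tree = Node("want-01", [Node(":ARG1", [Node("d", [Node("deceive-01")])])])
print(show(apply_vv_rule(tree)))
```

Frames outside `VV_FRAMES` are left untouched, which mirrors how a partially converted tree can still contain AMR role labels after all rules have run.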

slide-33
SLIDE 33
Inspired by Butler (2016)

  • Alastair Butler. Deterministic natural language generation from meaning representations for machine translation. NAACL Workshop on Semantics-Driven Machine Translation, 2016

slide-34
SLIDE 34

Multilingual AMR-to-Text: experiment

TestTrees: t01_girls_see_a_boy
TestTreesEng: a girl sees a boy .
TestTreesLav: meitene redz zēnu .
TestTreesRus: девочка видит мальчика .

TestTrees: t04_two_pretty_girls_see_a_boy
TestTreesEng: 2 pretty girls see a boy .
TestTreesLav: 2 jaukas meitenes redz zēnu .
TestTreesRus: 2 хорошенькие девочки видят мальчика .

TestTrees: t21_girls_who_see_the_game_like_the_boys_who_play
TestTreesEng: a girl that sees a game likes a boy that plays .
TestTreesLav: meitenei , kas redz spēli , patīk zēns , kas spēlē .
TestTreesRus: девочка , которая видит игру нравдит мальчика , которого играет .

TestTrees: t27_they_are_thugs_and_deserve_a_bullet
TestTreesEng: they are a thug and it deserves a bullet .
TestTreesLav: viņi ir slepkava , un pelna lodi .

slide-35
SLIDE 35

Under the hood

The overall AMR-to-text process:

1. The input AMR is rewritten from the PENMAN notation to a LISP-like bracketing tree syntax.
2. In case of a multi-sentence AMR, the graph is split into two or more graphs to be processed separately.
3. For each AMR, a sequence of tree pattern-matching transformation rules is applied (Tregex + Tsurgeon), yielding a fully or partially converted GF abstract syntax tree (AST).
4. In case of a partially converted AST, the pending subtrees are pruned.**
5. The resulting ASTs are passed to the GF interpreter for RGL-based linearization.
6. Since RGL supports many more languages (30+), this approach can be extended to multilingual AMR-to-text generation, given a large translation lexicon (15+).
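
Steps 1 and 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the system's code: the tokenizer covers only simple PENMAN input, the function names are hypothetical, and the bracketing layout (variable over concept over roles) is inferred from the Tregex rule shown earlier.

```python
import re

# Parentheses, slash, quoted strings, or any other run of non-space characters.
TOKEN = re.compile(r'\(|\)|/|"[^"]*"|[^\s()/]+')


def parse_penman(text):
    """Parse PENMAN into nested (variable, concept, [(role, target), ...])."""
    tokens = TOKEN.findall(text)
    pos = 0

    def node():
        nonlocal pos
        var, concept = tokens[pos + 1], tokens[pos + 3]  # "(" var "/" concept
        pos += 4
        edges = []
        while tokens[pos] != ")":
            role = tokens[pos]
            pos += 1
            if tokens[pos] == "(":
                target = node()
            else:  # constant or re-entrant variable
                target = tokens[pos]
                pos += 1
            edges.append((role, target))
        pos += 1  # skip ")"
        return (var, concept, edges)

    return node()


def split_sentences(amr):
    """Step 2: split a multi-sentence AMR into its :sntN sub-graphs."""
    _, concept, edges = amr
    if concept == "multi-sentence":
        return [target for role, target in edges if role.startswith(":snt")]
    return [amr]


def to_brackets(n):
    """Step 1: emit the assumed bracketing (var (concept (:role ...)))."""
    if isinstance(n, str):
        return n
    var, concept, edges = n
    if not edges:
        return f"({var} {concept})"
    inner = " ".join(f"({role} {to_brackets(t)})" for role, t in edges)
    return f"({var} ({concept} {inner}))"


amr = parse_penman('(m / multi-sentence :snt1 (f / fear-01 :polarity - '
                   ':ARG1 (d / die-01)) :snt2 (a / arrive-01))')
trees = [to_brackets(part) for part in split_sentences(amr)]
print(trees)
```

Each resulting bracketed tree would then be handed to the rule sequence of step 3.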

slide-36
SLIDE 36

Under the hood

** Our SemEval submission:

Because the coverage of our hand-crafted AMR-to-AST transformation rules is currently far from complete, we used JAMR Generator (Flanigan et al., 2016) as a “fall-back” option for AMRs that are not fully covered by the current rule set (~200). However, we applied heuristic post-processing rules to the JAMR output, which might have influenced the human judgements:

  • Adding a full stop, question mark, or exclamation mark at the end of the sentence, or a wh-word at the beginning, based on the AMR constructs.
  • Removing the remaining (unresolved) AMR constructs and concepts.
  • Converting large numbers into words, adding some prepositions, etc.
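
The last two kinds of post-processing could look roughly like the Python sketch below. The regular expressions, the five-digit threshold and the function names are all assumptions chosen for illustration; the actual rules used in the submission are not given on the slides.

```python
import re

SCALES = [(10**9, "billion"), (10**6, "million"), (10**3, "thousand")]


def number_to_words(token: str) -> str:
    """Rewrite round large numbers, e.g. 3000000 -> 3 million."""
    n = int(token)
    for value, name in SCALES:
        if n >= value and n % value == 0:
            return f"{n // value} {name}"
    return token


def postprocess(text: str) -> str:
    # Drop leftover role labels such as :ARG1 or :mod.
    text = re.sub(r"(?<!\S):[A-Za-z0-9-]+", " ", text)
    # Strip PropBank sense suffixes: fear-01 -> fear.
    text = re.sub(r"\b([a-z]+(?:-[a-z]+)*)-\d\d\b", r"\1", text)
    # Convert numbers of five or more digits into words where they are round.
    text = re.sub(r"\b\d{5,}\b", lambda m: number_to_words(m.group()), text)
    text = " ".join(text.split())
    # Add a full stop unless the sentence already ends in punctuation.
    if text and text[-1] not in ".?!":
        text += "."
    return text


print(postprocess("the soldier fear-01 :ARG1 3000000 people"))
```

Rules of this kind only clean up the surface string; they do not repair word order or agreement in the fall-back output.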
slide-37
SLIDE 37
slide-38
SLIDE 38

The Role of CNL and AMR in Scalable Abstractive Summarization for Multilingual Media Monitoring

slide-39
SLIDE 39

Large-scale media monitoring

BBC monitoring journalists translate from 30 languages into English and follow 400 social media accounts every day. A monitoring journalist typically monitors 4 TV channels and several online sources simultaneously. This is about the maximum that any person can cope with mentally and physically. The required human effort thus scales linearly with the number of monitored sources.

Monitoring journalists constantly need to be on the lookout for more sources and follow important stories, but as it is, they are tied down with mundane, routine monitoring tasks.

Monitoring 250 video channels results in a daily buffer of 2.5 TB, a weekly buffer of 19 TB, and an annual buffer of 1 PB.

slide-40
SLIDE 40

SUMMA: Scalable Understanding of Multilingual MediA

  • Identify people, places, events of interest
  • Discover trends, emerging events, crucial new stories

H2020 grant No. 688139

slide-41
SLIDE 41
Storyline highlights

  • Event-based multi-document summarization
  • Storyline highlights across a set of related stories

slide-42
SLIDE 42
Abstractive text summarization

  • Extractive summarization selects representative sentences from the input documents
  • Abstractive summarization builds a semantic representation from which a summary is generated
  • What semantic representation?
    – PropBank / FrameNet
    – AMR

Sentence A: I saw Joe’s dog, which was running in the garden.
Sentence B: The dog was chasing a cat.
Summary: Joe’s dog was chasing a cat in the garden.

Liu F., Flanigan J., Thomson S., Sadeh N., Smith N.A. Toward Abstractive Summarization Using Semantic Representations. NAACL 2015

slide-43
SLIDE 43
SemEval 2016 Task 8 on AMR parsing

  1. Riga (U Latvia, IMCS / LETA): 0.6196
  2. CAMR (U Brandeis / Boulder Learning Inc. / Rensselaer Polytechnic Institute): 0.6195
  3. ICL-HD (Ruprecht-Karls-Universität Heidelberg): 0.6005
  4. UCL+Sheffield (University College London / U Sheffield): 0.5983
  5. M2L (Kyoto University): 0.5952
  6. CMU (Carnegie Mellon University / U Washington): 0.5636
  7. CU-NLP (OK Robot Go Ltd. / U Colorado): 0.5566
  8. UofR (U Rochester): 0.4985
  9. MeaningFactory (U Groningen): 0.4702*
  10. CLIP@UMD (U Maryland): 0.4370
  11. DynamicPower (National Institute for Japanese Language and Linguistics): 0.3706*

* Rule/grammar-based; did not use AMR training data

slide-44
SLIDE 44
Conclusion

  • Unrestricted large-scale NLU is difficult for grammars
    – SemEval 2016: few grammar-based systems
    – SemEval 2017: no grammar-based systems (Boxer gave up…)
  • For NLG, grammar-based systems are very competitive!
  • Scaling up AMR-to-AST:
    – Add more Tregex/Tsurgeon rules
    – A more flexible and systematic graph/tree-transducer (like UD2GF)
    – Learning transformation rules (C6.0; training data?)
    – Seq-to-seq deep learning?

slide-45
SLIDE 45
Publications

  • Normunds Grūzītis, Pēteris Paikens, Guntis Bārzdiņš. FrameNet Resource Grammar Library for GF. CNL 2012
  • Dana Dannélls, Normunds Grūzītis. Extracting a bilingual semantic grammar from FrameNet-annotated corpora. LREC 2014
  • Dana Dannélls, Normunds Grūzītis. Controlled natural language generation from a multilingual FrameNet-based grammar. CNL 2014
  • Normunds Grūzītis, Dana Dannélls, Benjamin Lyngfelt, Aarne Ranta. Formalising the Swedish Constructicon in Grammatical Framework. GEAF 2015
  • Normunds Grūzītis, Guntis Bārzdiņš. The role of CNL and AMR in scalable abstractive summarization for multilingual media monitoring. CNL 2016
  • Normunds Grūzītis, Dana Dannélls. A Multilingual FrameNet-based Grammar and Lexicon for Controlled Natural Language. Language Resources and Evaluation, 51(1), 2017
  • Normunds Grūzītis, Didzis Goško, Guntis Bārzdiņš. RIGOTRIO at SemEval-2017 Task 9: Combining Machine Learning and Grammar Engineering for AMR Parsing and Generation. SemEval 2017