NLG, Wrap up

slide-1
SLIDE 1

NLG, Wrap up

Scott Farrar
CLMA, University of Washington
farrar@u.washington.edu

March 10, 2010

slide-2
SLIDE 2

Today’s lecture

1. Statistical NLG
2. Surface realizer
     • Linearization
3. SimpleNLG
     • Lexicon
4. Design ideas

slide-3
SLIDE 3

NLG research

Methods

  • canned text: predefined utterances are returned based on the value of an input string
  • template-based: hard-coded templates are filled in with the missing constituents
  • statistical: a corpus is used to construct a language model
  • hybrid approach: use templates, but select the best candidate based on a corpus

slide-8
SLIDE 8

Goals of Statistical NLG

In a fully statistical NLG system, a text is produced that is as close to a given language model as possible, with a couple of goals in mind:

  • fidelity: the output is as faithful to human-written text as possible
  • fluency: the generated language is easy to process and sounds natural

slide-11
SLIDE 11

Which is the text produced from Bigram?

Text 1 (Shakespeare):

, armed in their stings , Make boot upon the summer ’ s velvet buds , Which pillage they with merry march bring home to the tent - royal of their emperor ; Who , busied in his majesty , surveys The singing masons building roofs of gold , The civil citizens kneading up the honey , The poor mechanic porters crowding in Their heavy burdens at his narrow gate , The sad - ey ’d justice , with his surly hum , Delivering o ’ er to executors pale The lazy yawning drone .

Text 2 (Bigram):

retreat . By my hand into the clouds , as cold as any military man in divers functions , Setting endeavour in continual motion ; To - morrow is Saint Crispian . He trots the air , a subject for a sovereign ’ s ambition ! He was thinking of civil wars when he speaks , The King hath heard that men of mould . Abate thy rage . Use lenity , sweet chuck . NYM . They know your worthiness . My liege , as you shall read that my Nell is dead i ’ faith , my cousin Suffolk

slide-13
SLIDE 13

Which is the text produced from Trigram?

Text 1 (Shakespeare):

, armed in their stings , Make boot upon the summer ’ s velvet buds , Which pillage they with merry march bring home to the tent - royal of their emperor ; Who , busied in his majesty , surveys The singing masons building roofs of gold , The civil citizens kneading up the honey , The poor mechanic porters crowding in Their heavy burdens at his narrow gate , The sad - ey ’d justice , with his surly hum , Delivering o ’ er to executors pale The lazy yawning drone .

Text 2 (Trigram):

HENRY . We are in God ’s peace ! I have an excellent armour ; but in loving me you should love the lovely bully . What mean have defeated the law ; Who when they were as cold as any ’ s ambition ! He was thinking of civil wars when he was a merry message . KING HENRY . Thou doest thy office fairly . Turn head and stop pursuit ; for we hear Your greeting is from him , you men of mould . Abate thy rage , abate they manly rage ; Abate thy rage ,
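
The generated samples above come from n-gram models. As a rough illustration of how a bigram model can produce such text, here is a minimal sketch. The class name, the toy training tokens (borrowed from the sample above), and the fixed seed are all assumptions for illustration; this is not the setup used to produce the slides.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of a bigram text generator; names are illustrative only.
class BigramGenerator {
    // Maps each token to every token observed immediately after it (with repeats).
    private final Map<String, List<String>> followers = new HashMap<>();

    // Record every adjacent token pair in the training text.
    void train(String[] tokens) {
        for (int i = 0; i + 1 < tokens.length; i++) {
            followers.computeIfAbsent(tokens[i], k -> new ArrayList<>()).add(tokens[i + 1]);
        }
    }

    // Starting from 'start', repeatedly sample a follower of the last token.
    // Sampling from the raw follower list is proportional to bigram counts.
    List<String> generate(String start, int maxTokens, Random rng) {
        List<String> out = new ArrayList<>();
        out.add(start);
        String current = start;
        while (out.size() < maxTokens) {
            List<String> next = followers.get(current);
            if (next == null) {
                break; // dead end: this token was never seen with a successor
            }
            current = next.get(rng.nextInt(next.size()));
            out.add(current);
        }
        return out;
    }

    // True if 'second' was observed directly after 'first' during training.
    boolean seen(String first, String second) {
        List<String> next = followers.get(first);
        return next != null && next.contains(second);
    }

    public static void main(String[] args) {
        BigramGenerator model = new BigramGenerator();
        model.train("Abate thy rage . Use lenity , sweet chuck . They know your worthiness .".split(" "));
        System.out.println(String.join(" ", model.generate("Abate", 12, new Random(42))));
    }
}
```

Every adjacent pair in the output was seen in training, which is why bigram text is locally plausible yet drifts globally. A trigram model would condition on the two previous tokens instead of one.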

slide-16
SLIDE 16

Fluency goals

Fluency

Fluency is achieved by matching macroscopic properties, i.e., properties of a text that concern form rather than content:

  • sentence length
  • vocabulary diversity
  • use of certain syntactic structures (relatives, lists)
  • surface stylistics (commas, punctuation, capitalization)

All things being equal, text A and text B could be produced with mostly different macroscopic properties, yet they would both represent the same information.

slide-22
SLIDE 22

Comparing macroscopic properties

Example

Man, that was a sweet deal you made. What was that guy thinking?

Example

Dude, you really scored with that deal. He was a real sucker.

Are these equal?
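
To make "macroscopic properties" concrete, the sketch below measures two of them, mean sentence length and vocabulary diversity (type/token ratio), on the two example utterances. The class name, the simple regex tokenization, and the choice of these two measures are assumptions for illustration, not a method given in the lecture.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: two simple macroscopic measures over a short text.
class MacroProperties {
    // Lower-cased word tokens; any run of characters other than a-z or ' separates words.
    static List<String> words(String text) {
        return Arrays.stream(text.toLowerCase().split("[^a-z']+"))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.toList());
    }

    // Sentences are counted as non-empty chunks between '.', '?' and '!'.
    static long sentenceCount(String text) {
        return Arrays.stream(text.split("[.?!]+"))
                .filter(s -> !s.trim().isEmpty())
                .count();
    }

    // Average number of word tokens per sentence.
    static double meanSentenceLength(String text) {
        return (double) words(text).size() / sentenceCount(text);
    }

    // Vocabulary diversity: distinct word types divided by total word tokens.
    static double typeTokenRatio(String text) {
        List<String> tokens = words(text);
        Set<String> types = new HashSet<>(tokens);
        return (double) types.size() / tokens.size();
    }

    public static void main(String[] args) {
        String a = "Man, that was a sweet deal you made. What was that guy thinking?";
        String b = "Dude, you really scored with that deal. He was a real sucker.";
        for (String text : new String[] {a, b}) {
            System.out.println(meanSentenceLength(text) + " words/sentence, type/token ratio "
                    + typeTokenRatio(text));
        }
    }
}
```

Measures like these capture only form. Whether the two utterances carry the same information is a separate question that the numbers cannot answer.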

slide-25
SLIDE 25

Pure statistical NLG

But a purely statistical NLG engine would be of minimal use. It would produce isolated utterances that sound fine on their own but may be odd in use. Why?

  • The domain and subject matter are highly specific.
  • Context is completely lost.
  • A turn doesn’t match the previous one (in dialogue).

slide-29
SLIDE 29

Hybrid NLG

Hybrid techniques, on the other hand, provide the best of both worlds:

  • template-based NLG to ensure relevance (fidelity)
  • corpus-based NLG to produce natural-sounding utterances (fluency)

For example, content planning can still be accomplished using symbolic techniques. But the following choices are conditioned on the domain/genre:

  • choose lexical items (heart vs. ticker)
  • choose the syntax of referring expressions: the large black dog, that big dog, the black one
  • match dialogue act with tense
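
The hybrid idea, symbolic generation of candidates followed by corpus-based selection, can be sketched as below. The candidates are the referring expressions from the slide; the class name, the toy corpus, and the scoring rule (average bigram count) are assumptions chosen to keep the example small, not a method prescribed in the lecture.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: pick the candidate that best matches in-domain bigram counts.
class HybridSelector {
    private final Map<String, Integer> bigramCounts = new HashMap<>();

    // Count adjacent word pairs in a (toy) in-domain corpus.
    void train(List<String> sentences) {
        for (String s : sentences) {
            String[] w = s.toLowerCase().split("\\s+");
            for (int i = 0; i + 1 < w.length; i++) {
                bigramCounts.merge(w[i] + " " + w[i + 1], 1, Integer::sum);
            }
        }
    }

    // Score a candidate by the average corpus count of its bigrams.
    double score(String candidate) {
        String[] w = candidate.toLowerCase().split("\\s+");
        if (w.length < 2) {
            return 0.0;
        }
        int total = 0;
        for (int i = 0; i + 1 < w.length; i++) {
            total += bigramCounts.getOrDefault(w[i] + " " + w[i + 1], 0);
        }
        return (double) total / (w.length - 1);
    }

    // Return the highest-scoring candidate; earlier candidates win ties.
    String best(List<String> candidates) {
        String bestSoFar = candidates.get(0);
        for (String c : candidates) {
            if (score(c) > score(bestSoFar)) {
                bestSoFar = c;
            }
        }
        return bestSoFar;
    }

    public static void main(String[] args) {
        HybridSelector selector = new HybridSelector();
        selector.train(Arrays.asList("that big dog barked", "I saw that big dog", "the black cat slept"));
        System.out.println(selector.best(
                Arrays.asList("the large black dog", "that big dog", "the black one")));
    }
}
```

The symbolic side guarantees that every candidate refers to the right entity (fidelity); the corpus only decides which wording sounds most like the target domain (fluency).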

slide-35
SLIDE 35

Templates in Hybrid NLG

Given a text, extract potential templates:

  I’d like to leave Houston at 5pm.
  Can you recommend a good wine?
  I wanna order a sandwich.

Now transform utterances into templates and fill with domain-specific items:

  I’d like to leave <CITY> at <TIME>.
  Can you recommend a good <PRONOUN>?
  I wanna order <FOOD>.
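
The two steps above, abstracting an utterance into a template and then filling its slots, can be sketched with plain string replacement. The class and method names, and the fill values "Seattle" and "9am", are hypothetical; a real system would identify domain items with a tagger or a domain lexicon rather than a literal match.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of template extraction and slot filling.
class TemplateFiller {
    // Turn an utterance into a template by replacing a known domain item with a slot.
    static String abstractOut(String utterance, String item, String slot) {
        return utterance.replace(item, "<" + slot + ">");
    }

    // Replace each <SLOT> with its value; slots without a value are left untouched.
    static String fill(String template, Map<String, String> slots) {
        String result = template;
        for (Map.Entry<String, String> e : slots.entrySet()) {
            result = result.replace("<" + e.getKey() + ">", e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String template = abstractOut(
                abstractOut("I'd like to leave Houston at 5pm.", "Houston", "CITY"), "5pm", "TIME");
        Map<String, String> slots = new HashMap<>();
        slots.put("CITY", "Seattle"); // illustrative values, not from the lecture
        slots.put("TIME", "9am");
        System.out.println(template);
        System.out.println(fill(template, slots));
    }
}
```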

slide-43
SLIDE 43

Components in statistical NLG

slide-45
SLIDE 45

Surface realizer: Purpose

To generate natural language strings from a fully specified input (deterministic); the inverse of certain kinds of parsing processes. The surface realizer:

  • determines the surface form of the text;
  • adds inflectional endings to words;
  • orders constituents;
  • adds miscellaneous markup (e.g., lists, paragraphs, punctuation)

slide-46
SLIDE 46

Surface realizer: Inputs/Outputs

Input: phrase specifications (or, for an entire text, a text specification)
Output: linearized sentences, texts

slide-47
SLIDE 47

Incremental NLG

A surface realizer adds more and more grammatical detail:

1. lexical items
2. morphosyntactic info
3. surface form with inflection
4. punctuation, capitalization (intonation if spoken)

Example

1. request itinerary
2. 2.SG POSS request INDEF.itinerary
3. you can request an itinerary
4. You can request an itinerary.
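
The four stages can be traced in code for the single example above. This is only a toy sketch: the class and method names are hypothetical, POSS is read here as a possibility modal ("can"), which is an assumption about the gloss, and the fallback values exist only so the methods are total.

```java
// Hypothetical sketch of staged realization for the example "request itinerary".
class IncrementalRealizer {
    // Stages 2 -> 3: combine lexical items with their features into an inflected string.
    // Only the feature values needed for the slide's example are handled properly.
    static String surfaceForm(String subjectFeature, String modality, String verb, String noun) {
        String subject = subjectFeature.equals("2.SG") ? "you" : "it"; // toy pronoun choice
        String modal = modality.equals("POSS") ? "can" : "must";       // POSS assumed = possibility
        return subject + " " + modal + " " + verb + " " + indefinite(noun);
    }

    // INDEF: choose "a" or "an" by the first letter (a simplification of English).
    static String indefinite(String noun) {
        boolean vowel = "aeiou".indexOf(Character.toLowerCase(noun.charAt(0))) >= 0;
        return (vowel ? "an " : "a ") + noun;
    }

    // Stages 3 -> 4: capitalize the first letter and add sentence-final punctuation.
    static String orthography(String surface) {
        return Character.toUpperCase(surface.charAt(0)) + surface.substring(1) + ".";
    }

    public static void main(String[] args) {
        String stage3 = surfaceForm("2.SG", "POSS", "request", "itinerary");
        System.out.println(stage3);
        System.out.println(orthography(stage3));
    }
}
```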

slide-52
SLIDE 52

Surface realizer

Main functions:

  • linguistic realization: uses rules of grammar (about morphology and syntax) to convert abstract representations of sentences into actual text.
  • structure realization: converts abstract structures such as paragraphs and sentences into mark-up (punctuated text, HTML, etc.)

slide-53
SLIDE 53

Linearization

The microplanner identifies and specifies the order of constituents, but does not put the constituents in the final order.

It’s left up to the surface realization component to carry out the instructions encoded in the phrase specification:

  • English: adjectivals before nouns, e.g., giant tortoise
  • Spanish: adjectivals after nouns, e.g., tortuga gigante
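
A minimal sketch of this division of labour: the phrase specification says only which word is the head and which is the modifier, and a language-specific rule in the realizer decides the order. The class, enum, and method names are hypothetical, and the rule covers only the adjective-noun case from the slide.

```java
// Hypothetical sketch: language-specific ordering of a noun and its adjectival modifier.
class Linearizer {
    enum Language { ENGLISH, SPANISH }

    // Put head noun and adjective in the order the language requires.
    static String nounPhrase(String noun, String adjective, Language lang) {
        switch (lang) {
            case ENGLISH:
                return adjective + " " + noun; // adjectival before noun
            case SPANISH:
                return noun + " " + adjective; // adjectival after noun
            default:
                throw new IllegalArgumentException("unsupported language: " + lang);
        }
    }

    public static void main(String[] args) {
        System.out.println(nounPhrase("tortoise", "giant", Language.ENGLISH));
        System.out.println(nounPhrase("tortuga", "gigante", Language.SPANISH));
    }
}
```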

slide-55
SLIDE 55

Library contents

Key components of SimpleNLG

  • simplenlg.features: various morphosyntactic and discourse features
  • simplenlg.framework: key NLG elements (documents, phrases, words)
  • simplenlg.lexicon: the lexicon class
  • simplenlg.realiser.english: the actual realiser

slide-60
SLIDE 60

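Process

Microplanning

    import simplenlg.framework.*;
    import simplenlg.realiser.english.Realiser;

    // phraseFactory, documentFactory, lexicon, w and mydoc are assumed to exist already.
    // Build a noun phrase from the word w and add it to a new sentence.
    PhraseElement myNP = phraseFactory.createNounPhrase(w);
    DocumentElement sentence2 = documentFactory.createSentence();
    sentence2.addComponent(myNP);

Realization

    // Realise the document and print the resulting text.
    Realiser realiser = new Realiser();
    realiser.setLexicon(lexicon);
    NLGElement realised = realiser.realise(mydoc);
    System.out.println(realised.getRealisation());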

slide-61
SLIDE 61

Features and values available in SimpleNLG

Feature            Values
Tense              Fut, Past, Pres
Person             First, Second, Third
Gender             Feminine, Masculine, Neuter
NumberAgr          Both, Plural, Singular
Pattern            Regular, Irregular, Regular Double, ...
Interrogative      How, Where, Why, etc.
ClauseStatus       Matrix, Subordinate
DiscourseFunction  Cue Phrase, Post modifier, Complement, etc.

slide-62
SLIDE 62

SimpleNLG lexicons

Available lexicons

  • DefaultLexicon
  • NIHLexicon

2010 Release: 432822 records, 551454 baseForms, 758153 forms

Item    Records  BaseForms  Forms
noun    350050   430625     614737
adj     61999    93135      95089
verb    11001    14274      57412
adv     9416     13044      13108
prep    155      170        170
pron    87       88         88
conj    65       69         69
det     38       38         38
modal   7        7          25
aux     3        3          30

slide-63
SLIDE 63

Lexical entries

WordElement: base=sell, category=VERB,
  {realisation=null, category=VERB,
   features={isDitransitive=true, presentParticiple=selling, present3s=sells, intransitive=true, transitive=true, pastParticiple=sold, past=sold}}

WordElement: base=Franklin, category=NOUN,
  {realisation=null, category=NOUN,
   features={proper=true, nonCount=false}}

WordElement: base=big, category=ADJECTIVE,
  {realisation=null, category=ADJECTIVE,
   features={isClassifyingAdj=false, comparative=bigger, predicative=true, superlative=biggest, isColourAdjective=false, isQualitativeAdjective=true}}
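
Entries like these let a realizer look up irregular forms and fall back on a regular rule otherwise. The sketch below imitates that behaviour with a plain Java class; it is a stand-in written for illustration, not SimpleNLG's WordElement, and the "add -ed" fallback is a deliberate simplification.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a lexical entry: a base form, a category, and a feature map.
class LexicalEntry {
    final String base;
    final String category;
    final Map<String, String> features = new HashMap<>();

    LexicalEntry(String base, String category) {
        this.base = base;
        this.category = category;
    }

    // Add a feature and return this entry so calls can be chained.
    LexicalEntry with(String feature, String value) {
        features.put(feature, value);
        return this;
    }

    // Past-tense form: the stored irregular form if present, otherwise base + "ed".
    String past() {
        return features.getOrDefault("past", base + "ed");
    }

    public static void main(String[] args) {
        LexicalEntry sell = new LexicalEntry("sell", "VERB").with("past", "sold");
        LexicalEntry walk = new LexicalEntry("walk", "VERB");
        System.out.println(sell.past() + " " + walk.past());
    }
}
```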

slide-65
SLIDE 65

Possible class hierarchies

You’ll have 3 separate class hierarchies (with as much structure as you wish):

1. Messages, e.g., BirthMessage, DeathMessage
2. KB entities / things in the domain, e.g., Person, Location, etc.
3. SimpleNLG entities (phrases, whole docs, etc.)

Methods

Create methods in the various message classes to output instances of NLGElement.
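
One possible shape for this design is sketched below. To keep the example runnable without the library, a small `Element` interface stands in for SimpleNLG's NLGElement; that interface, the `toElement` method name, the field names, and the sample values are all assumptions for illustration. In a real project the message methods would build SimpleNLG phrase objects instead of strings.

```java
// Hypothetical sketch of the three hierarchies; Element stands in for NLGElement.
class MessageDemo {
    public static void main(String[] args) {
        Message m = new BirthMessage(new Person("Ada"), new Location("London"));
        System.out.println(m.toElement().realise());
    }
}

// Stand-in for the SimpleNLG hierarchy (phrases, whole documents, etc.).
interface Element {
    String realise();
}

// KB entities: things in the domain.
class Person {
    final String name;
    Person(String name) { this.name = name; }
}

class Location {
    final String name;
    Location(String name) { this.name = name; }
}

// Messages: each message class knows how to express itself as an Element.
abstract class Message {
    abstract Element toElement();
}

class BirthMessage extends Message {
    private final Person person;
    private final Location place;

    BirthMessage(Person person, Location place) {
        this.person = person;
        this.place = place;
    }

    @Override
    Element toElement() {
        return () -> person.name + " was born in " + place.name + ".";
    }
}

class DeathMessage extends Message {
    private final Person person;
    private final Location place;

    DeathMessage(Person person, Location place) {
        this.person = person;
        this.place = place;
    }

    @Override
    Element toElement() {
        return () -> person.name + " died in " + place.name + ".";
    }
}
```

Keeping the three hierarchies separate means domain entities never need to know about linguistic structure; only the message classes bridge the two.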