

SLIDE 1

TU Graz - Signal Processing and Speech Communication Laboratory

Probabilistic Models of Language Processing and Acquisition

Fuchs Anna & Läßer Andreas

Signal Processing and Speech Communication Laboratory

Advanced Signal Processing 2

Fuchs Anna & Läßer Andreas, Advanced Signal Processing 2, page 1/40

SLIDE 2

Outline

◮ Introduction
◮ Syntactic Parsing: Formal Grammar, Context-free Grammar, Parsing as Search – Two Strategies, Ambiguity, Dynamic Programming Parsing Method – CKY Algorithm
◮ Statistical Parsing: Probabilistic Context-Free Grammar (PCFG), Where do the probabilities come from? – Tree Banks, Probabilistic CKY, PCFG – Solve Ambiguity, Problems with PCFG
◮ Conclusion

SLIDE 3

Introduction

General Comments

◮ Language can be represented by a probabilistic model
◮ Language processing involves generating or interpreting this model
◮ Language acquisition involves learning probabilistic models
◮ Main focus on parsing and learning grammar
◮ Chomskyan linguistics – language is internally represented as a grammar
◮ Grammar – a system of rules that specifies all and only the allowable sentences

SLIDE 4

Probability in Language

◮ The cognitive science of language can be described WITH and WITHOUT probability
◮ Structural linguists wanted to find regularities in language corpora and focused on finding abstract rules
◮ Development of sophisticated probabilistic models – specified in terms of symbolic rules and representations
◮ Grammatical rules are associated with probabilities: what is linguistically likely, not just what is linguistically possible

SLIDE 5

Syntactic Parsing

Formal Grammar

◮ Grammar is a powerful tool for describing and analyzing languages
◮ A grammar is a structured set of production rules by which valid sentences in a language are constructed
◮ Most commonly used for syntactic description, but also useful for semantics, phonology, ...
◮ Defines syntactically legal sentences:
  • Sandra ate an apple. (syntactically legal)
  • Sandra ate apple. (not syntactically legal)
  • Sandra ate a building. (syntactically legal)
◮ Sentences may be grammatically OK but still not acceptable

SLIDE 6

Definition I

N – a set of non-terminal symbols (or variables)
Σ – a set of terminal symbols (disjoint from N); the actual words of a language
R – a set of rules or productions, each of the form A → β, where A is a non-terminal and β is any string of terminals and non-terminals
S – a designated start symbol
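This definition maps directly onto data. A minimal sketch in Python, using the example words from the previous slide (the particular rules are illustrative, not from the slides):

```python
# A toy CFG (N, Sigma, R, S): R maps each non-terminal to its expansions.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "Noun"], ["Noun"]],
    "VP": [["Verb", "NP"], ["Verb"]],
    "Det": [["an"], ["a"]],
    "Noun": [["Sandra"], ["apple"], ["building"]],
    "Verb": [["ate"]],
}
START = "S"
NONTERMINALS = set(RULES)                       # N
TERMINALS = {sym for expansions in RULES.values()
             for exp in expansions for sym in exp
             if sym not in RULES}               # Sigma, disjoint from N
```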

SLIDE 7

Definition II

◮ Production – A can be replaced by β
◮ Strings containing nothing that can be expanded further consist only of terminals
◮ Such a string is called a sentence
◮ In the context of programming languages: a sentence is a syntactically correct and complete program
◮ Derivation – a sequence of applications of the rules of a grammar that produces a finished string of terminals

◮ Also called a parse

SLIDE 8

Chomsky Hierarchy

◮ Type 0: unrestricted grammars, no constraints
◮ Type 1: context-sensitive grammars
◮ Type 2: context-free grammars (CFGs)
◮ Type 3: regular grammars

Context-Free Grammar – CFG

◮ Declarative – a CFG does not specify how parse trees are constructed
◮ The non-terminal on the left-hand side of a rule stands all by itself
◮ Context-free – each node is expanded independently
◮ E.g. A → B C means that A is replaced by B followed by C, regardless of the context in which A is found

SLIDE 9

Parsing as Search – Two Strategies

◮ The best possible way to analyze a sentence
◮ The process of taking a string and a grammar and returning one (or many?) parse tree(s) for that string
◮ Assigning correct trees to input strings
◮ Correct means a tree that covers all and only the elements of the input and has an S at the top
◮ It does not mean that the system can select the correct tree from among the possible trees
◮ Parsing – a search which involves making choices

SLIDE 10

Derivation as Trees

◮ Syntactic parsing – searching through the space of possible parse trees to find the correct parse tree for a given sentence
◮ E.g. Book that flight.

[S [VP [Verb Book] [NP [Det that] [Nominal [Noun flight]]]]]

SLIDE 11

Example

Rules:
Sentence → <Subject> <Verb-Phrase> <Object>
Subject → This | Computers | I
Verb-Phrase → <Adverb> <Verb> | <Verb>
Adverb → never
Verb → is | run | am | tell
Object → the <Noun> | a <Noun> | <Noun>
Noun → university | world | cheese | lies

SLIDE 12

Example cont.

◮ Derive simple sentences, with and without sense:

This is a university.
Computers run the world.
I never tell lies.
I am the cheese.
Computers run cheese.

◮ They do not make semantic sense, but are syntactically correct
◮ Formal grammars are a tool for SYNTAX, not SEMANTICS
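Such derivations can be reproduced mechanically. A small sketch that draws random derivations from the example grammar (uniform rule choice is an assumption here, not something the slide specifies):

```python
import random

# Toy grammar from the slide; keys are non-terminals, values are expansions.
GRAMMAR = {
    "Sentence": [["Subject", "Verb-Phrase", "Object"]],
    "Subject": [["This"], ["Computers"], ["I"]],
    "Verb-Phrase": [["Adverb", "Verb"], ["Verb"]],
    "Adverb": [["never"]],
    "Verb": [["is"], ["run"], ["am"], ["tell"]],
    "Object": [["the", "Noun"], ["a", "Noun"], ["Noun"]],
    "Noun": [["university"], ["world"], ["cheese"], ["lies"]],
}

def derive(symbol="Sentence"):
    """Expand a symbol by repeatedly applying randomly chosen rules (a derivation)."""
    if symbol not in GRAMMAR:            # terminal: nothing left to expand
        return [symbol]
    expansion = random.choice(GRAMMAR[symbol])
    return [word for sym in expansion for word in derive(sym)]

print(" ".join(derive()) + ".")  # e.g. "Computers run cheese."
```

Every output is syntactically correct by construction, which is exactly why some of them ("Computers run cheese.") make no semantic sense.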

SLIDE 13

Two Strategies

◮ Find all trees whose root is the start symbol S and which cover the input words
◮ Two constraints (two search strategies):
  1. Grammar – goal-directed search (Top-Down)
  2. Data – data-directed search (Bottom-Up)

SLIDE 14

Top-Down Parsing

◮ To find trees rooted with an S, start with the rules that give us an S
◮ Work the way down from there to the words

SLIDE 15

Top-Down search space – successive expansions starting from S (figure):
Ply 1: S
Ply 2: [S NP VP], [S Aux NP VP], [S VP]
Ply 3: [S [NP Det Nom] VP], [S [NP PropN] VP], [S Aux [NP Det Nom] VP], [S Aux [NP PropN] VP], [S [VP V NP]], [S [VP V]]

SLIDE 16

Bottom-Up Parsing

◮ For trees that cover the input words, start with trees that link up with the words in the right way
◮ Work the way up from there

SLIDE 17

Let’s do an example!

SLIDE 18

Grammar:
S → NP VP
S → Aux NP VP
S → VP
NP → Pronoun
NP → Proper-Noun
NP → Det Nominal
Nominal → Noun
Nominal → Nominal Noun
Nominal → Nominal PP
VP → Verb
VP → Verb NP
VP → Verb NP PP
VP → Verb PP
VP → VP PP
PP → Preposition NP

Lexicon:
Det → that | this | the | a
Noun → book | flight | meal | money
Verb → book | include | prefer
Pronoun → I | she | me
Proper-Noun → Houston | NWA
Aux → does
Preposition → from | to | on | near | through

SLIDE 19

Top-Down vs Bottom-Up Search

Top-Down
◮ Never considers derivations that do not end up at root S
◮ Wastes a lot of time on trees that are inconsistent with the input

Bottom-Up
◮ Generates many subtrees that will never lead to an S
◮ Only considers trees that cover some part of the input

◮ Combine TD and BU: Top-Down expectations with Bottom-Up data to get more efficient searches
◮ Use one kind as the control and the other as a filter
◮ For both: How to explore the search space? Pursue all parses in parallel? Which rule to apply next? Which node to expand next?

SLIDE 20

Ambiguity I

◮ A grammar is ambiguous if at least one string has multiple parse trees
◮ E.g. 1: ...old men and women...
◮ E.g. 2: I shot an elephant in my pajamas.
◮ Choose the correct parse from the multitude of possible parses through syntactic disambiguation
◮ Such algorithms require statistical, semantic, and pragmatic knowledge

SLIDE 21

Ambiguity II

◮ A grammar is ambiguous if there exists at least one string which has multiple parse trees
◮ Structural ambiguity – more than one structural analysis for a (partial) sentence
◮ E.g. 1: ...[old [men and women]]...
◮ E.g. 2: I shot an elephant in my pajamas.
◮ Choose the correct parse from the multitude of possible parses through syntactic disambiguation
◮ Such algorithms require statistical, semantic, and pragmatic knowledge

SLIDE 22

Ambiguity III

◮ A grammar is ambiguous if there exists at least one string which has multiple parse trees
◮ Structural ambiguity – more than one structural analysis for a (partial) sentence
◮ E.g. 1: ...[old men] and [women]...
◮ E.g. 2: I shot an elephant in my pajamas.
◮ Choose the correct parse from the multitude of possible parses through syntactic disambiguation
◮ Such algorithms require statistical, semantic, and pragmatic knowledge

SLIDE 23

Two parse trees for "I shot an elephant in my pajamas.":
[S [NP [Pronoun I]] [VP [Verb shot] [NP [Det an] [Nominal [Nominal [Noun elephant]] [PP in my pajamas]]]]]
[S [NP [Pronoun I]] [VP [VP [Verb shot] [NP [Det an] [Nominal [Noun elephant]]]] [PP in my pajamas]]]

SLIDE 24

Dynamic Programming Parsing Method

◮ Problem: an exponential number of parse trees for a given sentence
◮ Idea: solve a task by breaking it up into smaller sub-tasks
◮ Parsing can be made tractable by dynamic programming

General idea:
◮ Break the problem into sub-problems
◮ Create a table which will contain the solution to each sub-problem
◮ Solve each sub-problem and populate the table
◮ Read off the complete solution from the table by combining the solutions to the sub-problems

SLIDE 25

CKY Parsing

◮ Classic bottom-up dynamic programming algorithm (Cocke–Kasami–Younger)
◮ Requires an input grammar in Chomsky Normal Form (CNF)
◮ A CNF grammar is a context-free grammar in which:
  • the left side of every rule is a single non-terminal
  • the right side of every rule is either a single terminal or two non-terminals
◮ I.e. rules are restricted to the form A → B C or A → w
◮ For any CFG there is a corresponding CNF grammar which accepts exactly the same set of strings as the original CFG

◮ Example on the blackboard...

SLIDE 26

CKY Algorithm I

◮ Each non-terminal node above the part-of-speech level in a parse tree has exactly two daughters
◮ This leads to a simple two-dimensional matrix that encodes the structure of an entire tree → a sentence of length n leads to the upper-triangular portion of an (n + 1) × (n + 1) matrix
◮ Example: "Book the dinner flight"
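A minimal CKY recognizer over a CNF grammar can be sketched as follows. The tiny grammar is an illustrative hand-converted CNF fragment (unit rules already collapsed into the lexical entries), not the full grammar from the slides:

```python
from collections import defaultdict

# CNF fragment (assumed for illustration): A -> B C rules and A -> word rules.
BINARY = {                       # (B, C) -> set of parents A
    ("Verb", "NP"): {"VP", "S"},
    ("Det", "Nominal"): {"NP"},
    ("Nominal", "Noun"): {"Nominal"},
}
LEXICAL = {                      # word -> categories (incl. unit-rule parents)
    "book": {"Verb", "S", "VP", "Nominal", "Noun"},
    "the": {"Det"},
    "dinner": {"Noun", "Nominal"},
    "flight": {"Noun", "Nominal"},
}

def cky_recognize(words):
    """Fill the upper-triangular table bottom-up; accept if S spans [0, n]."""
    n = len(words)
    table = defaultdict(set)     # table[(i, j)] = non-terminals over words[i:j]
    for j, w in enumerate(words, start=1):
        table[(j - 1, j)] = set(LEXICAL.get(w, ()))
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):            # every split point
                for B in table[(i, k)]:
                    for C in table[(k, j)]:
                        table[(i, j)] |= BINARY.get((B, C), set())
    return "S" in table[(0, n)]

print(cky_recognize("book the dinner flight".split()))  # True
```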

SLIDE 27

CKY Algorithm II

◮ What makes the recognizer a parser? (it is not enough simply to find an S in cell [0, n])
  1. Pair every non-terminal with a pointer to the table entries from which it was derived
  2. Permit multiple versions of the same non-terminal to be entered into the table
◮ The table then contains all possible parses of the given input
◮ First choose an S in [0, n], then recursively retrieve its component constituents from the table

SLIDE 28

Statistical Parsing

Probabilistic Context-free Grammars (PCFG)

◮ Probabilistic CFGs are the same as CFGs except that each rule is associated with a probability

S → NP VP [.80]
S → Aux NP VP [.15]
S → VP [.05]
NP → Det N [.20]
NP → Det Adj N [.35]
NP → N [.20]
NP → Adj N [.15]
NP → Pro [.10]

◮ The probabilities of the rules sharing a left-hand side sum to 1
◮ Can be used to achieve disambiguation among parse structures
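The sums-to-1 constraint is easy to check mechanically. A sketch over the rule set above:

```python
from collections import defaultdict

# PCFG rules from the slide: (LHS, RHS, probability).
RULES = [
    ("S", "NP VP", 0.80), ("S", "Aux NP VP", 0.15), ("S", "VP", 0.05),
    ("NP", "Det N", 0.20), ("NP", "Det Adj N", 0.35),
    ("NP", "N", 0.20), ("NP", "Adj N", 0.15), ("NP", "Pro", 0.10),
]

# The expansions of each left-hand side must form a probability distribution.
totals = defaultdict(float)
for lhs, _, p in RULES:
    totals[lhs] += p

for lhs, total in totals.items():
    assert abs(total - 1.0) < 1e-9, f"rules for {lhs} sum to {total}"
print("all rule sets sum to 1")
```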

SLIDE 29

N – a set of non-terminal symbols (or variables)
Σ – a set of terminal symbols (disjoint from N)
R – a set of rules or productions, each of the form A → β [p], where A is a non-terminal, β is a string of symbols from the infinite set of strings (Σ ∪ N)*, and p is a number between 0 and 1 expressing P(β | A)
S – a designated start symbol

SLIDE 30

Grammar:
S → NP VP [0.80]
S → Aux NP VP [0.15]
S → VP [0.05]
NP → Pronoun [0.35]
NP → Proper-Noun [0.30]
NP → Det Nominal [0.20]
NP → Nominal [0.15]
Nominal → Noun [0.75]
Nominal → Nominal Noun [0.20]
Nominal → Nominal PP [0.05]
VP → Verb [0.35]
VP → Verb NP [0.20]
VP → Verb NP PP [0.10]
VP → Verb PP [0.15]
VP → Verb NP NP [0.05]
VP → VP PP [0.15]
PP → Preposition NP [1.00]

Lexicon:
Det → that [0.10] | the [0.60] | a [0.30]
Noun → book [0.10] | flight [0.30] | meal [0.15] | dinner [0.10] | flights [0.40] | money [0.05]
Verb → book [0.30] | include [0.30] | prefer [0.40]
Pronoun → I [0.40] | she [0.05] | me [0.15] | you [0.40]
Proper-Noun → Houston [0.60] | NWA [0.40]
Aux → does [0.60] | can [0.40]
Preposition → from [0.30] | through [0.05] | near [0.15] | to [0.30] | on [0.20]

SLIDE 31

Where do the probabilities come from?

◮ 1. Use a corpus of already-parsed sentences: a "tree bank"
◮ 2. Create your own tree bank

Tree Bank – parsed corpus

◮ A text corpus in which each sentence has been parsed, i.e. annotated with syntactic structure
◮ Tree banks can be created completely manually – linguists annotate each sentence with its syntactic structure
◮ Or semi-automatically – a parser assigns some syntactic structure, which linguists then check and, if necessary, correct
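Given a tree bank, rule probabilities are relative-frequency (maximum-likelihood) estimates: P(A → β) = Count(A → β) / Count(A). A sketch with a toy, made-up corpus (the parses and category names are illustrative, not real treebank data):

```python
from collections import Counter

# A toy "tree bank": each parse is represented by the list of rules it uses,
# written as (LHS, RHS) pairs.
treebank = [
    [("S", "NP VP"), ("NP", "Pro"), ("VP", "V NP"), ("NP", "Det N")],
    [("S", "NP VP"), ("NP", "Det N"), ("VP", "V")],
    [("S", "VP"), ("VP", "V NP"), ("NP", "Det N")],
]

rule_counts = Counter(rule for parse in treebank for rule in parse)
lhs_counts = Counter(lhs for parse in treebank for lhs, _ in parse)

# Maximum-likelihood estimate: P(A -> beta) = Count(A -> beta) / Count(A)
probs = {(lhs, rhs): c / lhs_counts[lhs]
         for (lhs, rhs), c in rule_counts.items()}

print(probs[("S", "NP VP")])  # 2 of the 3 S expansions, i.e. 2/3
```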

SLIDE 32

Probabilistic CKY

◮ Adds a third dimension of length V to every cell of the (n + 1) × (n + 1) matrix, so that each non-terminal placed in a cell carries a value representing a probability
◮ Each cell [i, j, A] of this (n + 1) × (n + 1) × V matrix contains the probability of a constituent A that spans positions i through j of the input
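Probabilistic CKY can be sketched by storing, for each cell and non-terminal, the best probability found so far (the grammar fragment and its probabilities below are illustrative, not the slides' full grammar):

```python
from collections import defaultdict

# Probabilistic CNF rules: (A, B, C, P(A -> B C)); lexical: word -> [(A, p)].
BINARY = [("VP", "Verb", "NP", 0.20), ("S", "Verb", "NP", 0.01),
          ("NP", "Det", "Noun", 0.20)]
LEXICAL = {"book": [("Verb", 0.30)], "the": [("Det", 0.60)],
           "flight": [("Noun", 0.40)]}

def pcky(words):
    n = len(words)
    best = defaultdict(dict)  # best[(i, j)][A] = max probability over words[i:j]
    for j, w in enumerate(words, 1):
        for A, p in LEXICAL.get(w, ()):
            best[(j - 1, j)][A] = p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):                 # split point
                for A, B, C, p in BINARY:
                    if B in best[(i, k)] and C in best[(k, j)]:
                        cand = p * best[(i, k)][B] * best[(k, j)][C]
                        if cand > best[(i, j)].get(A, 0.0):
                            best[(i, j)][A] = cand    # keep the best probability
    return best[(0, n)]

print(pcky("book the flight".split()))
```

A full parser would also keep back-pointers alongside each probability, exactly as described for the non-probabilistic table.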

SLIDE 33

PCFG – Solve Ambiguity

◮ The probability of a parse tree T of a sentence S built with n rules, where rule i can be written LHS_i → RHS_i:

P(T, S) = ∏_{i=1}^{n} P(RHS_i | LHS_i)

◮ Pick the parse with the highest probability:

T̂(S) = argmax_T P(T | S) = argmax_T P(T, S) / P(S) = argmax_T P(T, S) = argmax_T P(T)

SLIDE 34

Book the Dinner flight.

Rules and probabilities:
S → VP [.05]
VP → Verb NP [.20]
VP → Verb NP NP [.10]
NP → Det Nominal [.20]
NP → Nominal [.15]
Nominal → Nominal Noun [.20]
Nominal → Noun [.75]
Verb → book [.30]
Det → the [.60]
Noun → dinner [.10]
Noun → flights [.40]

SLIDE 35

Two parse trees for "Book the dinner flight":
[S [VP [Verb Book] [NP [Det the] [Nominal [Nominal [Noun Dinner]] [Noun Flight]]]]]
[S [VP [Verb Book] [NP [Det the] [Nominal [Noun Dinner]]] [NP [Nominal [Noun Flight]]]]]

SLIDE 36

The same two parse trees, annotated with the probability of each rule used:
Left: S→VP .05, VP→Verb NP .20, Verb→Book .30, NP→Det Nominal .20, Det→the .60, Nominal→Nominal Noun .20, Nominal→Noun .75, Noun→Dinner .10, Noun→Flight .40
Right: S→VP .05, VP→Verb NP NP .10, Verb→Book .30, NP→Det Nominal .20, Det→the .60, Nominal→Noun .75, Noun→Dinner .10, NP→Nominal .15, Nominal→Noun .75, Noun→Flight .40

◮ P(T_left) = 0.05 · 0.2 · 0.2 · 0.2 · 0.75 · 0.3 · 0.6 · 0.1 · 0.4 = 2.2 · 10⁻⁶
◮ P(T_right) = 0.05 · 0.1 · 0.2 · 0.15 · 0.75 · 0.75 · 0.3 · 0.6 · 0.1 · 0.4 = 6.1 · 10⁻⁷
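The two products can be checked directly, using the rule probabilities read off the trees:

```python
from math import prod

# Rule probabilities of the two parse trees of "Book the dinner flight".
p_left = prod([0.05, 0.2, 0.2, 0.2, 0.75, 0.3, 0.6, 0.1, 0.4])
p_right = prod([0.05, 0.1, 0.2, 0.15, 0.75, 0.75, 0.3, 0.6, 0.1, 0.4])

print(f"P(T_left)  = {p_left:.1e}")   # 2.2e-06
print(f"P(T_right) = {p_right:.1e}")  # 6.1e-07
assert p_left > p_right               # the left parse wins the argmax
```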

SLIDE 37

Problems with PCFGs

◮ Poor independence assumptions: PCFGs assume that all rules are essentially independent, but, e.g., in English NP → Pro is more likely in subject position
◮ Solution: parent annotation – annotate each node with its parent in the parse tree, e.g. NP^S
◮ Lack of lexical conditioning: difficult to incorporate lexical information; pre-terminal rules can inherit important information from words, which helps to make choices higher up the parse, e.g. Workers dumped sacks into a bin.
◮ Solution: lexicalized grammar – each non-terminal is annotated with its lexical head, e.g. VP(dumped) → VBD(dumped) NP(sacks) PP(into)
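Parent annotation is a simple tree transformation. A sketch (encoding trees as nested `(label, children...)` tuples is my assumption, not the slides'):

```python
# Parent annotation: relabel each non-terminal with its parent,
# so an NP directly under S becomes "NP^S".
def annotate(tree, parent=None):
    if isinstance(tree, str):            # a word: terminals stay untouched
        return tree
    label, *children = tree
    new_label = f"{label}^{parent}" if parent else label
    return (new_label, *(annotate(child, label) for child in children))

tree = ("S", ("NP", ("Pro", "I")), ("VP", ("V", "sleep")))
print(annotate(tree))
# ('S', ('NP^S', ('Pro^NP', 'I')), ('VP^S', ('V^VP', 'sleep')))
```

The annotated categories split each rule's probability by context, which is exactly what weakens the too-strong independence assumption.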

SLIDE 38

Conclusion

◮ The basic parsing approach (without constraints) is not practical in real applications
◮ Whatever approach is taken, the lexicon is the real bottleneck
◮ Grammars are used to analyze sentences
◮ Two architectures for syntactic parsing: Top-Down and Bottom-Up
◮ Dynamic programming parsing algorithms (CKY) efficiently parse ambiguous sentences but do not resolve the ambiguity
◮ Probabilistic parsers compute the probability of each interpretation and choose the most probable one
◮ Probabilities can be derived from hand-parsed "tree banks"
◮ Better models for resolving all kinds of ambiguities – parent annotation, lexicalized grammars

SLIDE 39

References

  • N. Chater, C. D. Manning, "Probabilistic models of language processing and acquisition," Trends in Cognitive Sciences, vol. 10, pp. 335–344, 2006.
  • D. Jurafsky, J. H. Martin, "Speech and Language Processing," Pearson International Edition, 2009.
