For Thursday: No new reading. Homework: Chapter 23, exercise 15 (PowerPoint presentation)



SLIDE 1

For Thursday

  • No new reading
  • Homework:
    – Chapter 23, exercise 15

SLIDE 2

Homework Instructions

1. Pick a machine translation system.
2. Write (or find) 5 sentences of varying complexity in English.
3. Pick a language (A).
4. For each sentence from step 2, translate it into language A and back to English. Then run that result back through the same language and back to English.
5. Pick a second, very different, language (B).
6. Redo step 4 with language B.
7. Turn in each of the 5 versions of the sentences in English (25 “sentences” total) and what the two languages are, plus a discussion of the results.

SLIDE 3

Program 5

SLIDE 4

Syntactic Parsing

  • Given a string of words, determine if it is grammatical, i.e. if it can be derived from a particular grammar.
  • The derivation itself may also be of interest.
  • Normally we want to determine all possible parse trees and then use semantics and pragmatics to eliminate spurious parses and build a semantic representation.

SLIDE 5

Parsing Complexity

  • Problem: Many sentences have many parses.
  • An English sentence with n prepositional phrases at the end has at least 2^n parses.

    I saw the man on the hill with a telescope on Tuesday in Austin...

  • The actual number of parses is given by the Catalan numbers:

    1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796...
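The Catalan sequence above is easy to verify with the standard closed form (the formula is a well-known identity, not from the slides):

```python
from math import comb

def catalan(n):
    # nth Catalan number: C(2n, n) / (n + 1)
    return comb(2 * n, n) // (n + 1)

# Matches the slide's parse counts for 1, 2, 3, ... trailing PPs
print([catalan(n) for n in range(1, 11)])
# [1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796]
```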

SLIDE 6

Parsing Algorithms

  • Top Down: Search the space of possible derivations of S (e.g. depth-first) for one that matches the input sentence.

    I saw the man.

    S -> NP VP
    NP -> Det Adj* N
    Det -> the
    Det -> a
    Det -> an
    NP -> ProN
    ProN -> I
    VP -> V NP
    V -> hit
    V -> took
    V -> saw
    NP -> Det Adj* N
    Det -> the
    Adj* -> e
    N -> man

SLIDE 7

Parsing Algorithms (cont.)

  • Bottom Up: Search upward from words, finding larger and larger phrases until a sentence is found.

    I saw the man.
    ProN saw the man       ProN -> I
    NP saw the man         NP -> ProN
    NP N the man           N -> saw (dead end)
    NP V the man           V -> saw
    NP V Det man           Det -> the
    NP V Det Adj* man      Adj* -> e
    NP V Det Adj* N        N -> man
    NP V NP                NP -> Det Adj* N
    NP VP                  VP -> V NP
    S                      S -> NP VP

SLIDE 8

Bottom-up Parsing Algorithm

function BOTTOM-UP-PARSE(words, grammar) returns a parse tree
  forest ← words
  loop do
    if LENGTH(forest) = 1 and CATEGORY(forest[1]) = START(grammar) then
      return forest[1]
    else
      i ← choose from {1...LENGTH(forest)}
      rule ← choose from RULES(grammar)
      n ← LENGTH(RULE-RHS(rule))
      subsequence ← SUBSEQUENCE(forest, i, i+n-1)
      if MATCH(subsequence, RULE-RHS(rule)) then
        forest[i...i+n-1] ← [MAKE-NODE(RULE-LHS(rule), subsequence)]
      else fail
  end
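A runnable Python sketch of the same idea: exhaustive backtracking stands in for the pseudocode's nondeterministic `choose`, and the toy grammar (epsilon-free, since this naive searcher cannot handle the Adj* -> e rule) is an assumption adapted from the slides:

```python
def bottom_up_parse(words, grammar, start="S"):
    """Reduce substrings of the forest until only the start symbol remains.

    grammar: list of (lhs, rhs) pairs, rhs a tuple of category names.
    Returns a parse tree as nested tuples, or None if no parse exists.
    """
    def cat(node):  # a leaf word is its own category here
        return node if isinstance(node, str) else node[0]

    def search(forest):
        if len(forest) == 1 and cat(forest[0]) == start:
            return forest[0]
        for i in range(len(forest)):           # choose a position
            for lhs, rhs in grammar:           # choose a rule
                n = len(rhs)
                sub = forest[i:i + n]
                if tuple(cat(x) for x in sub) == rhs:
                    tree = search(forest[:i] + [(lhs, *sub)] + forest[i + n:])
                    if tree is not None:       # backtrack on failure
                        return tree
        return None

    return search(list(words))

# Epsilon-free toy grammar (an assumption, adapted from the slides)
GRAMMAR = [("S", ("NP", "VP")), ("NP", ("ProN",)), ("NP", ("Det", "N")),
           ("VP", ("V", "NP")), ("ProN", ("I",)), ("V", ("saw",)),
           ("Det", ("the",)), ("N", ("man",))]

tree = bottom_up_parse("I saw the man".split(), GRAMMAR)
print(tree[0])  # S
```

The backtracking makes the "dead end" on the earlier slide harmless: a failed reduction sequence simply returns None and another choice is tried.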

SLIDE 9

Chart Parsers

SLIDE 10

Augmented Grammars

  • Simple CFGs generally insufficient:

    “The dogs bites the girl.”

  • Could deal with this by adding rules.
    – What’s the problem with that approach?
  • Could also “augment” the rules: add constraints to the rules that say number and person must match.
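A tiny sketch of the augmentation idea for number agreement; the lexicon and feature names here are invented for illustration:

```python
# Hypothetical feature-augmented lexicon: each word carries a category
# plus a number feature ('sg' or 'pl'); S -> NP VP only succeeds when
# the subject's and verb's number features match.
LEXICON = {
    "dog":   ("N", "sg"), "dogs": ("N", "pl"),
    "bites": ("V", "sg"), "bite": ("V", "pl"),
}

def agrees(subject, verb):
    """Check subject-verb number agreement for a toy two-word clause."""
    (n_cat, n_num), (v_cat, v_num) = LEXICON[subject], LEXICON[verb]
    return n_cat == "N" and v_cat == "V" and n_num == v_num

print(agrees("dog", "bites"))   # True: both singular
print(agrees("dogs", "bites"))  # False: the mismatch from the slide
```

The point is that one augmented rule with a feature check replaces the blow-up of writing separate singular and plural copies of every rule.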

SLIDE 11

Verb Subcategorization

SLIDE 12

Semantics

  • Need a semantic representation
  • Need a way to translate a sentence into that representation.
  • Issues:
    – Knowledge representation still a somewhat open question
    – Composition: “He kicked the bucket.”
    – Effect of syntax on semantics

SLIDE 13

Dealing with Ambiguity

  • Types:
    – Lexical
    – Syntactic ambiguity
    – Modifier meanings
    – Figures of speech
      • Metonymy
      • Metaphor
SLIDE 14

Resolving Ambiguity

  • Use what you know about the world, the current situation, and language to determine the most likely parse, using techniques for uncertain reasoning.

SLIDE 15

Discourse

  • More text = more issues
  • Reference resolution
  • Ellipsis
  • Coherence/focus
SLIDE 16

Survey of Some Natural Language Processing Research

SLIDE 17

Speech Recognition

  • Two major approaches
    – Neural Networks
    – Hidden Markov Models
      • A statistical technique
      • Tries to determine the probability of a certain string of words producing a certain string of sounds
      • Choose the most probable string of words
  • Both approaches are “learning” approaches
SLIDE 18

Syntax

  • Both hand-constructed approaches and data-driven or learning approaches
  • Multiple levels of processing and goals of processing
  • Most active area of work in NLP (maybe the easiest because we understand syntax much better than we understand semantics and pragmatics)

SLIDE 19

POS Tagging

  • Statistical approaches: based on probabilities of sequences of tags and of words having particular tags
  • Symbolic learning approaches
    – One of these, transformation-based learning developed by Eric Brill, is perhaps the best-known tagger
  • Both kinds of approaches are data-driven
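As a sketch of the statistical side (a first-order HMM decoded with Viterbi, not Brill's transformation-based learner), with hand-set toy probabilities that are purely illustrative:

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Most probable tag sequence under a first-order HMM (toy sketch)."""
    # Each chart cell holds (probability of best path ending here, the path)
    chart = [{t: (start_p.get(t, 0) * emit_p[t].get(words[0], 0), [t])
              for t in tags}]
    for w in words[1:]:
        col = {}
        for t in tags:
            col[t] = max((chart[-1][s][0] * trans_p[s].get(t, 0)
                          * emit_p[t].get(w, 0), chart[-1][s][1] + [t])
                         for s in tags)
        chart.append(col)
    return max(chart[-1].values())[1]

# Invented mini-model: determiner / noun / verb only
TAGS = ["Det", "N", "V"]
START = {"Det": 0.6, "N": 0.2, "V": 0.2}
TRANS = {"Det": {"N": 0.9, "V": 0.05, "Det": 0.05},
         "N":   {"V": 0.7, "N": 0.2, "Det": 0.1},
         "V":   {"Det": 0.6, "N": 0.3, "V": 0.1}}
EMIT = {"Det": {"the": 1.0},
        "N":   {"dog": 0.6, "barks": 0.1},
        "V":   {"barks": 0.8, "dog": 0.1}}

print(viterbi("the dog barks".split(), TAGS, START, TRANS, EMIT))
# ['Det', 'N', 'V']
```

Note how the ambiguous words ("dog" and "barks" can each be noun or verb) are resolved by the tag-sequence probabilities, which is exactly the "sequences of tags" point above.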
SLIDE 20

Developing Parsers

  • Hand-crafted grammars
  • Usually some variation on CFG
  • Definite Clause Grammars (DCG)
    – A variation on CFGs that allows extensions like agreement checking
    – Built-in handling of these in most Prologs
  • Hand-crafted grammars follow the different types of grammars popular in linguistics
  • Since linguistics hasn’t produced a perfect grammar, we can’t code one

SLIDE 21

Efficient Parsing

  • Top down and bottom up both have issues
  • Also common is chart parsing
    – Basic idea: locate and store info about every substring that matches a grammar rule
  • One area of research is producing more efficient parsing
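One standard chart-parsing scheme is the CKY algorithm; here is a recognition-only sketch, where the chart stores every category matched over every substring exactly once (the CNF toy grammar is an assumption adapted from the earlier slides):

```python
from collections import defaultdict
from itertools import product

def cky_recognize(words, grammar, start="S"):
    """CKY chart recognition for a grammar in Chomsky Normal Form.

    grammar: dict mapping each LHS to a list of RHS tuples, where an RHS
    is either (terminal_word,) or (B, C). Returns True iff the words
    derive `start`.
    """
    n = len(words)
    chart = defaultdict(set)  # chart[i, j] = categories spanning words[i:j]
    for i, w in enumerate(words):          # fill in the lexical level
        for lhs, rhss in grammar.items():
            if (w,) in rhss:
                chart[i, i + 1].add(lhs)
    for width in range(2, n + 1):          # then ever-wider spans
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):      # split point
                for lhs, rhss in grammar.items():
                    for b, c in product(chart[i, k], chart[k, j]):
                        if (b, c) in rhss:
                            chart[i, j].add(lhs)
    return start in chart[0, n]

# CNF toy grammar (an assumption)
GRAMMAR = {"S": [("NP", "VP")], "NP": [("Det", "N"), ("I",)],
           "VP": [("V", "NP")], "Det": [("the",)],
           "N": [("man",)], "V": [("saw",)]}

print(cky_recognize("I saw the man".split(), GRAMMAR))  # True
```

Because each (span, category) fact is stored once and reused, the work is polynomial in the sentence length rather than exponential like naive backtracking.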

SLIDE 22

Data-Driven Parsing

  • PCFG: Probabilistic Context-Free Grammars
  • Constructed from data
  • Parse by determining all parses (or many parses) and selecting the most probable
  • Fairly successful, but requires a LOT of work to create the data
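A sketch of the "constructed from data" step: maximum-likelihood rule probabilities counted off a treebank. The two-line "treebank" here is invented for illustration:

```python
from collections import Counter

def estimate_pcfg(trees):
    """MLE rule probabilities: count(LHS -> RHS) / count(LHS).

    Each tree is a nested tuple (label, child, ...); leaves are words.
    """
    rule_counts, lhs_counts = Counter(), Counter()

    def walk(node):
        if isinstance(node, str):  # a leaf word, not a rule
            return
        lhs = node[0]
        rhs = tuple(c if isinstance(c, str) else c[0] for c in node[1:])
        rule_counts[lhs, rhs] += 1
        lhs_counts[lhs] += 1
        for child in node[1:]:
            walk(child)

    for tree in trees:
        walk(tree)
    return {(lhs, rhs): c / lhs_counts[lhs]
            for (lhs, rhs), c in rule_counts.items()}

# A one-tree "treebank" (invented) for "I saw him"
BANK = [("S", ("NP", "I"), ("VP", ("V", "saw"), ("NP", "him")))]
probs = estimate_pcfg(BANK)
print(probs[("NP", ("I",))])  # 0.5: NP rewrites as "I" in 1 of its 2 uses
```

The "LOT of work" on the slide is exactly the cost of producing such hand-annotated trees at scale.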

SLIDE 23

Applying Learning to Parsing

  • Basic problem is the lack of negative examples
  • Also, mapping the complete string to a parse seems not the right approach
  • Look at the operations of the parser and learn rules for the operations, not for the complete parse at once

SLIDE 24

Syntax Demos

  • http://www2.lingsoft.fi/cgi-bin/engcg
  • http://nlp.stanford.edu:8080/parser/index.jsp
  • http://teemapoint.fi/nlpdemo/servlet/ParserServlet
  • http://www.link.cs.cmu.edu/link/submit-sentence-4.html

SLIDE 25

Language Identification

  • http://rali.iro.umontreal.ca/
SLIDE 26

Semantics

  • Most work is probably hand-constructed systems
  • Some are more interested in developing the semantics than the mappings
  • Basic question: what constitutes a semantic representation?
  • Answer may depend on the application
SLIDE 27

Possible Semantic Representations

  • Logical representation
  • Database query
  • Case grammar
SLIDE 28

Distinguishing Word Senses

  • Use context to determine which sense of a word is meant
  • Probabilistic approaches
  • Rules
  • Issues
    – Obtaining sense-tagged corpora
    – What senses do we want to distinguish?
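One simple rule-like sketch of "use context": Lesk-style overlap counting, where the sense whose signature words overlap most with the sentence wins. The sense signatures are invented for illustration:

```python
# Invented mini "dictionary" for the ambiguous word "bank"
SIGNATURES = {
    "bank/finance": {"money", "deposit", "loan", "account"},
    "bank/river":   {"river", "water", "shore", "fishing"},
}

def disambiguate(context_words, signatures=SIGNATURES):
    """Pick the sense whose signature overlaps the context the most."""
    context = set(context_words)
    return max(signatures, key=lambda s: len(signatures[s] & context))

print(disambiguate("she opened a deposit account at the bank".split()))
# bank/finance
```

Probabilistic approaches replace the hand-built signatures with statistics from sense-tagged corpora, which is why obtaining those corpora is listed as an issue.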

SLIDE 29

Semantic Demos

  • http://www.cs.utexas.edu/users/ml/geo.html
  • http://www.ling.gu.se/~lager/Mutbl/demo.html

SLIDE 30

Information Retrieval

  • Take a query and a set of documents.
  • Select the subset of documents (or parts of documents) that match the query
  • Statistical approaches
    – Look at things like word frequency
  • More knowledge-based approaches are interesting, but maybe not helpful
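A minimal sketch of the word-frequency idea (tf-idf scoring; the documents and the particular weighting scheme are illustrative assumptions):

```python
from collections import Counter
from math import log

def tf_idf_rank(query, docs):
    """Return document indices ranked by summed tf-idf of the query terms."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    df = Counter()  # document frequency of each term
    for toks in tokenized:
        df.update(set(toks))

    def score(toks):
        tf = Counter(toks)  # term frequency within one document
        return sum(tf[w] * log(n / df[w])
                   for w in query.lower().split() if df[w])

    return sorted(range(n), key=lambda i: score(tokenized[i]), reverse=True)

DOCS = ["the cat sat on the mat",
        "parsing natural language text",
        "the parser output"]
print(tf_idf_rank("parsing language", DOCS))  # doc 1 ranks first
```

The idf factor downweights words that appear in every document, so common function words like "the" contribute nothing to the ranking.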

SLIDE 31

Information Extraction

  • From a set of documents, extract “interesting” pieces of data
  • Hand-built systems
  • Learning pieces of the system
  • Learning the entire task (for certain versions of the task)
  • Wrapper Induction
SLIDE 32

IE Demos

  • http://services.gate.ac.uk/annie/