SLIDE 1

Noisy Channel Models

CMSC 723 / LING 723 / INST 725
Marine Carpuat

marine@cs.umd.edu

SLIDE 2

Today

  • HW1, Q&A
  • Weighted FSAs
  • Noisy Channel Models
  • Project 1
SLIDE 3

HW1: your goals for the class based on word frequency

i (58), to (57), and (53), in (45), the (33), "," (27), of (26), learn (20), nlp (19), a (18), my (18), this (18), research (17), linguistics (15), language (14), some (14), processing (12), be (11), computational (11), natural (11), want (11), have (10), how (10), is (10), like (10), on (9)

SLIDE 4

HW1: word frequency distribution

SLIDE 5

HW1: your goals for the class based on word frequency (no stopwords)

learn (20), nlp (19), research (17), linguistics (15), language (14), processing (12), want (11), natural (11), computational (11), like (10), understanding (9), machine (9), techniques (6), projects (6), class (6), apply (6), models (5), interested (5), goal (5), work (4), systems (4), study (4), human (4), data (4), computer (4), applications (4), security (2), hci (2), visualization (1), social (1), search (1), probabilistic (1), news (1), media (1), linguistic (1), interactive (1), interaction (1), human-in-the-loop (1), human-computer (1)

SLIDE 6

HW1: probability review

Suppose that 1/100,000 of the population has the ability to read other people's minds. You have a test that, if someone can read minds, reads positive with 95% probability; and, if someone cannot read minds, reads negative with 99.5% probability. I take the test and it reads positive. What is the probability that I can read minds? (Express your answer as a real number in [0,1].)
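
For reference, the Bayes' rule computation this question calls for, as a minimal Python sketch (variable names are mine; the numbers come from the problem statement):

    # Bayes' rule: P(reader | positive) = P(positive | reader) * P(reader) / P(positive)
    p_reader = 1 / 100_000                    # prior: P(reader)
    p_pos_given_reader = 0.95                 # P(positive | reader)
    p_pos_given_nonreader = 1 - 0.995         # false positive rate
    # Total probability of a positive test, summing over both cases
    p_pos = (p_pos_given_reader * p_reader
             + p_pos_given_nonreader * (1 - p_reader))
    posterior = p_pos_given_reader * p_reader / p_pos
    print(posterior)  # ~0.0019: a positive result is still almost surely a false alarm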

SLIDE 7

Today

  • HW1, Q&A
  • Weighted FSAs
  • Noisy Channel Models
  • Project 1
SLIDE 8

Bigram Language Model

Training corpus:

  <s> I am Sam </s>
  <s> Sam I am </s>
  <s> I do not like green eggs and ham </s>

Bigram probability estimates:

  P( I | <s> )    = 2/3 = 0.67
  P( Sam | <s> )  = 1/3 = 0.33
  P( am | I )     = 2/3 = 0.67
  P( do | I )     = 1/3 = 0.33
  P( </s> | Sam ) = 1/2 = 0.50
  P( Sam | am )   = 1/2 = 0.50
  ...
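
A minimal sketch of how these maximum-likelihood estimates are computed (whitespace tokenization assumed; names are mine):

    from collections import Counter

    # Toy corpus from the slide, with sentence boundary markers
    corpus = [
        "<s> I am Sam </s>",
        "<s> Sam I am </s>",
        "<s> I do not like green eggs and ham </s>",
    ]

    unigrams = Counter()
    bigrams = Counter()
    for sent in corpus:
        tokens = sent.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    def bigram_prob(prev, word):
        # MLE: count(prev, word) / count(prev)
        return bigrams[(prev, word)] / unigrams[prev]

    print(bigram_prob("<s>", "I"))  # 2/3 = 0.67
    print(bigram_prob("I", "am"))   # 2/3 = 0.67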

SLIDE 9

FSA as a language model

he saw me
he ran home
she talked

How does this FSA language model differ from a bigram model?
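
The slide's automaton diagram is not reproduced here; as a rough stand-in, the sketch below encodes a word-level acceptor over the three example sentences (the state layout is my assumption, not the original figure). Unlike a bigram model, which assigns a (possibly tiny) probability to many strings over its vocabulary, an unweighted FSA simply accepts or rejects:

    # Transition table: state -> {word -> next state} (assumed layout)
    TRANSITIONS = {
        0: {"he": 1, "she": 2},
        1: {"saw": 3, "ran": 4},
        2: {"talked": 5},
        3: {"me": 5},
        4: {"home": 5},
    }
    ACCEPTING = {5}

    def accepts(sentence):
        state = 0
        for word in sentence.split():
            state = TRANSITIONS.get(state, {}).get(word)
            if state is None:
                return False
        return state in ACCEPTING

    print(accepts("he saw me"))    # True
    print(accepts("he saw home"))  # False: rejected outright, not low-probability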

SLIDE 10

Weighted FSAs

  • Assigns a probability to each string that it accepts
  • Usually probabilities add up to one
    – But not necessary
  • Strings that are not accepted are said to have probability zero
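
A minimal weighted version of the acceptor above, with made-up transition probabilities (chosen here so the accepted strings' probabilities sum to one); a string's probability is the product of the weights along its path:

    # (state, word) -> (next state, probability); all weights invented
    WEIGHTED = {
        (0, "he"): (1, 0.6),
        (0, "she"): (2, 0.4),
        (1, "saw"): (3, 0.5),
        (1, "ran"): (4, 0.5),
        (2, "talked"): (5, 1.0),
        (3, "me"): (5, 1.0),
        (4, "home"): (5, 1.0),
    }
    ACCEPTING = {5}

    def string_prob(sentence):
        state, prob = 0, 1.0
        for word in sentence.split():
            if (state, word) not in WEIGHTED:
                return 0.0  # unaccepted strings have probability zero
            state, p = WEIGHTED[(state, word)]
            prob *= p
        return prob if state in ACCEPTING else 0.0

    print(string_prob("he saw me"))   # 0.6 * 0.5 * 1.0 = 0.3
    print(string_prob("she talked"))  # 0.4
    # 0.3 + 0.3 + 0.4 = 1.0 over the three accepted sentences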

SLIDE 11

Weighted FSA as a language model

SLIDE 12

Weighted Finite-State Automata

  • We can view n-gram language models as weighted finite-state automata
  • We can also define weighted finite-state transducers
    – These generate pairs of strings and assign a weight to each pair
    – The weight can often be interpreted as a conditional probability P(output-string | input-string)
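
A minimal sketch of the transducer idea: a toy single-state "typo channel" whose arcs read an input symbol, emit an output symbol, and carry a weight playing the role of P(output-symbol | input-symbol). All numbers are invented:

    # (input symbol, output symbol) -> arc weight (made up)
    ARCS = {
        ("a", "a"): 0.9,  # copy "a" faithfully
        ("a", "e"): 0.1,  # mistype "a" as "e"
        ("b", "b"): 1.0,
    }

    def pair_weight(inp, out):
        # Weight of transducing inp into out, assuming a same-length,
        # symbol-by-symbol alignment (a simplification of general FSTs)
        if len(inp) != len(out):
            return 0.0
        weight = 1.0
        for i, o in zip(inp, out):
            weight *= ARCS.get((i, o), 0.0)
        return weight

    print(pair_weight("ab", "eb"))  # 0.1 * 1.0 = 0.1, i.e. P("eb" | "ab")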

SLIDE 13

Today

  • HW1, Q&A
  • Weighted FSAs
  • Noisy Channel Models
  • Project 1
SLIDE 14

Noisy Channel Models

  • Divide-and-conquer strategy common in NLP modeling
  • Goal: recover X from Y (decoding)

X* = argmax_X P(X|Y) = argmax_X P(Y|X) P(X)

where P(X) is the source model and P(Y|X) is the channel model.
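
A minimal sketch of the decoding step over a tiny invented spelling example: the source model scores how plausible each candidate X is on its own, and the channel model scores how the observed Y could arise from it (all probabilities made up):

    source = {"the": 0.05, "thea": 1e-6}   # P(X): source (language) model
    channel = {                            # P(Y|X): channel (error) model
        ("teh", "the"): 0.01,
        ("teh", "thea"): 0.02,
    }

    def decode(y, candidates):
        # X* = argmax over candidates of P(Y|X) * P(X)
        return max(candidates,
                   key=lambda x: channel.get((y, x), 0.0) * source.get(x, 0.0))

    print(decode("teh", ["the", "thea"]))
    # "the": the stronger prior outweighs the slightly better channel score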

SLIDE 15

Noisy Channel Models: Spelling correction

SLIDE 16

Noisy Channel Models: Tokenization
SLIDE 17

Noisy Channel Models: Speech Recognition

SLIDE 18

Weighted FSTs and the Noisy Channel Model

SLIDE 19

Exercise:

  • Define noisy channel models for
    – Machine translation from French to English
    – Question Answering
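
(For the machine translation case, one classic formulation, offered here only as a reference answer: let E be an English sentence and F the observed French sentence; the source model P(E) is an English language model, the channel model P(F|E) is a translation model, and decoding finds E* = argmax_E P(F|E) P(E).)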

SLIDE 20

Today

  • HW1, Q&A
  • Weighted FSAs
    – and how they relate to n-gram models
  • Noisy Channel Models
    – Source model, Channel model, Decoding
  • Project 1
SLIDE 21

Recall: Complete Morphological Parser

SLIDE 22

Recall: Practical NLP Applications

  • In practice, it is almost never necessary to write FSTs by hand…
  • Typically, one writes rules:
    – Chomsky and Halle notation: a → b / c __ d = rewrite a as b when it occurs between c and d
    – E-Insertion rule: ε → e / {x, s, z} ^ __ s #
  • A rule → FST compiler handles the rest…
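
As a sketch of what the compiled rule does (a plain regex rewrite standing in for a real FST; "^" marks the morpheme boundary and "#" the word boundary, following the slide's notation; the function name is mine):

    import re

    # E-insertion: insert "e" when a stem ending in x, s, or z is
    # followed by the suffix "s" at the word boundary
    def e_insertion(word):
        return re.sub(r"([xsz])\^s#", r"\1^es#", word)

    print(e_insertion("fox^s#"))  # "fox^es#", surfacing as "foxes"
    print(e_insertion("cat^s#"))  # "cat^s#" unchanged, surfacing as "cats"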

SLIDE 23

P1: practical details

www.cs.umd.edu/class/fall2015/cmsc723/p1.html

  • Teams of 2 or 3
  • Due before class on Tue Sep 29
  • Submit code/outputs using handin (see details in Piazza post)

SLIDE 24

Today

  • HW1, Q&A
  • Weighted FSAs
  • Noisy Channel Models
  • Project 1
SLIDE 25

What’s next…

  • Supervised classification, neural networks, and neural language modeling
  • Project 1 lab