Natural Language Processing with Python CS372: Spring, 2015



SLIDE 1

Natural Language Processing with Python

CS372: Spring, 2015
Lecture 12: Categorizing and Tagging Words

Jong C. Park
Department of Computer Science
Korea Advanced Institute of Science and Technology

SLIDE 2

CATEGORIZING AND TAGGING WORDS

Using a Tagger
Tagged Corpora
Mapping Words to Properties Using Python Dictionaries
Automatic Tagging
N-Gram Tagging
Transformation-based Tagging
How to Determine the Category of a Word

2015-04-09 CS372: NLP with Python 2

SLIDE 3

Introduction

 ■ Questions
   • What are lexical categories, and how are they used in natural language processing?
   • What is a good Python data structure for storing words and their categories?
   • How can we automatically tag each word of a text with its word class?

SLIDE 4

Mapping Words to Properties Using Python Dictionaries

 ■ Indexing Lists Versus Dictionaries
 ■ Dictionaries in Python
 ■ Defining Dictionaries
 ■ Default Dictionaries
 ■ Incrementally Updating a Dictionary
 ■ Complex Keys and Values
 ■ Inverting a Dictionary

dictionary data type

SLIDE 5

Indexing Lists Versus Dictionaries

 ■ List
   • A text is treated in Python as a list of words.
   • We can look up a particular item by giving its index.
   • text1[100]
 ■ Figure 5-2. List lookup.

SLIDE 6

Indexing Lists Versus Dictionaries

 ■ With frequency distributions, we specify a word and get back a number.
   • fdist['monstrous']
 ■ Figure 5-3. Dictionary lookup.

Other names for dictionary are map, hashmap, hash, and associative array.
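
A minimal sketch contrasting the two kinds of lookup. The toy sentence is an assumption for illustration, and collections.Counter stands in for a frequency distribution so the sketch needs no corpus:

```python
from collections import Counter

# A toy text, treated as a list of word tokens (hypothetical data).
text = "the monstrous whale and the monstrous sea".split()

# List lookup: give an integer index, get back the word at that position.
third_word = text[2]

# Dictionary-style lookup: give a word, get back a number (its frequency).
fdist = Counter(text)
monstrous_count = fdist["monstrous"]

print(third_word, monstrous_count)
```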

SLIDE 7

Indexing Lists Versus Dictionaries

 ■ In Figure 5-3, we mapped from names to numbers, rather than from numbers to items as with a list.
 ■ Table 5-4. Linguistic objects as mappings from keys to values.

The mapping is from a “word” to some structured object.

SLIDE 8

Dictionaries in Python

 ■ Python provides a dictionary data type that can be used for mapping between arbitrary types.

pos is defined as an empty dictionary.
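
A minimal sketch of this step. The word/tag entries are illustrative assumptions:

```python
# Start with an empty dictionary, then add word -> part-of-speech entries.
pos = {}
pos["colorless"] = "ADJ"
pos["ideas"] = "N"
pos["sleep"] = "V"
pos["furiously"] = "ADV"

# Retrieve a value by giving its key.
ideas_tag = pos["ideas"]
print(ideas_tag)
```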

SLIDE 9

Dictionaries in Python

 ■ We can employ the keys to retrieve values.
 ■ Question:
   • For lists and strings, we can use len() to work out which integers will be legal indexes. How do we work out the legal keys for a dictionary?

If the dictionary is not big, we can simply inspect its contents by evaluating the variable pos.

SLIDE 10

Dictionaries in Python

 ■ To just find the keys, we can either convert the dictionary to a list or use the dictionary in a context where a list is expected, such as the parameter of sorted() or in a for loop.
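
A short sketch of the three ways of getting at the keys, assuming a small illustrative pos dictionary:

```python
pos = {"colorless": "ADJ", "ideas": "N", "sleep": "V", "furiously": "ADV"}

# 1. Convert the dictionary to a list: this yields its keys.
keys_as_list = list(pos)

# 2. Pass the dictionary to sorted(): the keys come back in sorted order.
sorted_keys = sorted(pos)

# 3. Iterate over the dictionary in a for loop: each item is a key.
words_ending_in_s = []
for word in pos:
    if word.endswith("s"):
        words_ending_in_s.append(word)

print(sorted_keys)
```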

SLIDE 11

Dictionaries in Python

 ■ The dictionary methods keys(), values(), and items() allow us to access the keys, values, and key-value pairs as separate lists.
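
A minimal sketch with an assumed pos dictionary. In Python 3 these methods return view objects rather than lists, so the sketch wraps them in list() or sorted():

```python
pos = {"colorless": "ADJ", "ideas": "N", "sleep": "V", "furiously": "ADV"}

keys = list(pos.keys())        # the words
values = list(pos.values())    # the tags
pairs = sorted(pos.items())    # (word, tag) tuples, sorted by word

# items() is the usual way to loop over keys and values together.
for word, tag in pairs:
    print(f"{word}: {tag}")
```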

SLIDE 12

Dictionaries in Python

 ■ When we look something up in a dictionary, we get only one value for each key.
 ■ However, there is a way of storing multiple values in an entry.
 ■ We may use a list value, e.g., pos['sleep'] = ['N', 'V'].
   • Cf. the CMU Pronouncing Dictionary
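
A small sketch of list-valued entries. The entries other than 'sleep' are assumptions added for illustration:

```python
# Each key still maps to exactly one value, but that value is a list of tags.
pos = {"colorless": ["ADJ"], "ideas": ["N"], "furiously": ["ADV"]}
pos["sleep"] = ["N", "V"]

sleep_tags = pos["sleep"]
sleep_can_be_verb = "V" in pos["sleep"]
print(sleep_tags, sleep_can_be_verb)
```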

SLIDE 13

Defining Dictionaries

 ■ We can use the same key-value pair format to create a dictionary.
 ■ Dictionary keys must be immutable types, such as strings and tuples.
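
A minimal sketch of both points, with assumed example entries. The variable names are illustrative only:

```python
# Two equivalent ways of writing a dictionary down in one go.
pos = {"colorless": "ADJ", "ideas": "N", "sleep": "V", "furiously": "ADV"}
same_pos = dict(colorless="ADJ", ideas="N", sleep="V", furiously="ADV")

# Tuples are immutable, so they are valid keys.
bigram_counts = {("colorless", "ideas"): 1}

# Lists are mutable, so using one as a key raises TypeError.
try:
    bad = {["ideas", "sleep"]: "NV"}
    list_key_error = None
except TypeError as err:
    list_key_error = str(err)

print(list_key_error)
```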

SLIDE 14

Default Dictionaries

 ■ If we try to access a key that is not in a dictionary, we get an error.
 ■ Since Python 2.5, a special kind of dictionary called a defaultdict has been available.

With a defaultdict, when we access a non-existent entry, it is automatically added to the dictionary.
Parameters that create the default value: int, float, str, list, dict, tuple
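
A minimal sketch of the difference, using assumed example words. collections.defaultdict takes a zero-argument callable that builds the default value:

```python
from collections import defaultdict

# A plain dict raises KeyError for a missing key.
plain = {"ideas": "N"}
try:
    plain["sleep"]
    raised_key_error = False
except KeyError:
    raised_key_error = True

# defaultdict(int): missing keys start at 0 and are added on first access.
frequency = defaultdict(int)
frequency["colorless"] = 4
ideas_count = frequency["ideas"]

# defaultdict(list): missing keys start as an empty list.
pos = defaultdict(list)
pos["sleep"] = ["NOUN", "VERB"]
ideas_tags = pos["ideas"]

# Any zero-argument function works, e.g. a lambda returning a fixed tag.
tag_of = defaultdict(lambda: "NOUN")
tag_of["colorless"] = "ADJ"
blog_tag = tag_of["blog"]

print(ideas_count, ideas_tags, blog_tag)
```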

SLIDE 15

Default Dictionaries

 ■ We can use default dictionaries to deal with hapaxes and low frequency words.

We can replace low frequency words with a special “out of vocabulary” token.
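
A sketch of this replacement on a toy sentence. The token list, the vocabulary size of 3, and the token name "UNK" are assumptions for illustration:

```python
from collections import Counter, defaultdict

tokens = "the cat sat on the mat and the dog sat on the log".split()

# Keep only the 3 most frequent words as the known vocabulary.
vocab = [word for word, _ in Counter(tokens).most_common(3)]

# Known words map to themselves; everything else falls back to "UNK".
mapping = defaultdict(lambda: "UNK")
for word in vocab:
    mapping[word] = word

replaced = [mapping[word] for word in tokens]
print(replaced)
```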

SLIDE 16

Incrementally Updating a Dictionary

 ■ Example 5-3. Incrementally updating a dictionary, and sorting by value.

SLIDE 17

Incrementally Updating a Dictionary

SLIDE 18

Incrementally Updating a Dictionary

itemgetter(n) returns a function that can be called on some other sequence object to obtain the nth element.
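
A minimal sketch of incrementally counting tags and then sorting by value with itemgetter. The tagged tokens are assumed toy data, not corpus output:

```python
from collections import defaultdict
from operator import itemgetter

tagged = [("the", "DET"), ("dog", "NOUN"), ("chased", "VERB"),
          ("the", "DET"), ("cat", "NOUN"), ("a", "DET")]

# Incrementally update a count for each tag.
counts = defaultdict(int)
for word, tag in tagged:
    counts[tag] += 1

# itemgetter(1) picks element 1 (the count) out of each (tag, count) pair.
by_frequency = sorted(counts.items(), key=itemgetter(1), reverse=True)
tags_in_order = [tag for tag, count in by_frequency]

second_element = itemgetter(1)(("NOUN", 2))
print(by_frequency, second_element)
```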

SLIDE 19

Incrementally Updating a Dictionary

 ■ Useful programming idiom:
   • We initialize a defaultdict and then use a for loop to update its values.

SLIDE 20

Incrementally Updating a Dictionary

 ■ The following example uses the same pattern to create an anagram dictionary.
 ■ NLTK provides a convenient way of accumulating words through nltk.Index().
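
A sketch of the anagram dictionary on an assumed toy word list. It uses only collections.defaultdict, so it runs without NLTK:

```python
from collections import defaultdict

words = ["listen", "silent", "enlist", "latent", "talent", "dog"]

# Words that are anagrams of each other share the same sorted-letter key.
anagrams = defaultdict(list)
for word in words:
    key = "".join(sorted(word))
    anagrams[key].append(word)

print(anagrams["eilnst"])
```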

SLIDE 21

Complex Keys and Values

 ■ Default dictionaries can have complex keys and values.
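
One possible sketch: the key is a (previous tag, current word) tuple and the value is itself a default dictionary counting the tags seen in that context. The tagged tokens are assumed toy data:

```python
from collections import defaultdict

tagged = [("the", "DET"), ("right", "ADJ"), ("answer", "NOUN"),
          ("turn", "VERB"), ("right", "ADV"), ("now", "ADV"),
          ("the", "DET"), ("right", "ADJ"), ("way", "NOUN")]

# Complex key: (tag of previous word, current word).
# Complex value: a nested defaultdict mapping tag -> count.
pos = defaultdict(lambda: defaultdict(int))
for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
    pos[(t1, w2)][t2] += 1

# After a determiner, "right" was tagged ADJ; after a verb, ADV.
print(dict(pos[("DET", "right")]), dict(pos[("VERB", "right")]))
```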

SLIDE 22

Inverting a Dictionary

 ■ Dictionaries support efficient lookup.
   • However, finding a key given a value is slower and more cumbersome.
   • If we expect to do this kind of “reverse lookup” often, it helps to construct a dictionary that maps values to keys.

SLIDE 23

Inverting a Dictionary

 ■ Examples of reverse lookup
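
A minimal sketch of reverse lookup, assuming a small illustrative pos dictionary. It shows the slow scan, a one-to-one inversion, and an inversion that keeps every word for each tag:

```python
from collections import defaultdict

pos = {"colorless": "ADJ", "ideas": "N", "sleep": "V", "furiously": "ADV"}

# Slow reverse lookup: scan every (key, value) pair.
words_tagged_n = [word for word, tag in pos.items() if tag == "N"]

# One-to-one inversion: swap keys and values (only safe if values are unique).
pos_inverted = {tag: word for word, tag in pos.items()}

# When several words share a tag, collect them in a list per tag.
pos.update({"cats": "N", "scratch": "V", "peacefully": "ADV", "old": "ADJ"})
words_by_tag = defaultdict(list)
for word, tag in pos.items():
    words_by_tag[tag].append(word)

print(sorted(words_by_tag["ADV"]))
```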

SLIDE 24

Inverting a Dictionary

 ■ Table 5-5. Python’s dictionary methods.

SLIDE 25

Automatic Tagging

 ■ The Default Tagger
 ■ The Regular Expression Tagger
 ■ The Lookup Tagger
 ■ Evaluation

>>> from nltk.corpus import brown
>>> brown_tagged_sents = brown.tagged_sents(categories='news')
>>> brown_sents = brown.sents(categories='news')

SLIDE 26

The Default Tagger

 ■ The simplest possible tagger assigns the same tag to each token.
   • It establishes an important baseline.
   • In order to get the best result, we tag each word with the most likely tag.

Pros?
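
A dependency-free sketch of the idea, not NLTK's implementation (NLTK offers this behavior as nltk.DefaultTagger). The tagged sentences and the helper name default_tag are assumptions for illustration:

```python
from collections import Counter

# Toy tagged sentences standing in for a tagged corpus (hypothetical data).
tagged_sents = [
    [("the", "AT"), ("jury", "NN"), ("said", "VBD")],
    [("the", "AT"), ("election", "NN"), ("was", "BEDZ"), ("fair", "JJ")],
    [("city", "NN"), ("hall", "NN"), ("opened", "VBD")],
]

# Find the single most likely tag across all tokens.
tag_counts = Counter(tag for sent in tagged_sents for _, tag in sent)
most_likely_tag = tag_counts.most_common(1)[0][0]

def default_tag(tokens, tag):
    """Assign the same tag to every token."""
    return [(token, tag) for token in tokens]

tagged = default_tag("I do not like green eggs".split(), most_likely_tag)
print(tagged)
```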

SLIDE 27

The Regular Expression Tagger

 ■ Assign tags to tokens on the basis of matching patterns.
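
A dependency-free sketch of pattern-based tagging using the re module (NLTK offers this as nltk.RegexpTagger). The pattern list and the helper name regexp_tag are illustrative assumptions. Patterns are tried in order and the first match wins:

```python
import re

PATTERNS = [
    (r".*ing$", "VBG"),                # gerunds
    (r".*ed$", "VBD"),                 # simple past
    (r".*es$", "VBZ"),                 # 3rd person singular present
    (r".*ould$", "MD"),                # modals
    (r".*'s$", "NN$"),                 # possessive nouns
    (r".*s$", "NNS"),                  # plural nouns
    (r"^-?[0-9]+(\.[0-9]+)?$", "CD"),  # cardinal numbers
    (r".*", "NN"),                     # default: noun
]
COMPILED = [(re.compile(pattern), tag) for pattern, tag in PATTERNS]

def regexp_tag(tokens):
    """Tag each token with the tag of the first pattern that matches it."""
    tagged = []
    for token in tokens:
        for regex, tag in COMPILED:
            if regex.match(token):
                tagged.append((token, tag))
                break
    return tagged

result = regexp_tag(["investigating", "reported", "would", "voters", "3.5", "jury"])
print(result)
```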

SLIDE 28

The Lookup Tagger

 ■ Find the hundred most frequent words and store their most likely tag.
   • Use it as the model for a “lookup tagger”.
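
A dependency-free sketch of a lookup tagger, assuming toy tagged words and a model size of 2 instead of 100 (NLTK offers this as nltk.UnigramTagger with a model argument). The helper names and the backoff tag "NN" are illustrative assumptions:

```python
from collections import Counter, defaultdict

tagged_words = [
    ("the", "AT"), ("jury", "NN"), ("said", "VBD"), ("that", "CS"),
    ("the", "AT"), ("election", "NN"), ("was", "BEDZ"), ("fair", "JJ"),
    ("the", "AT"), ("jury", "NN"), ("saw", "VBD"), ("that", "DT"),
    ("that", "CS"), ("the", "AT"), ("vote", "NN"), ("was", "BEDZ"),
    ("close", "JJ"),
]

def build_lookup_model(tagged_words, size):
    """Map each of the `size` most frequent words to its most likely tag."""
    word_freq = Counter(word for word, _ in tagged_words)
    tag_freq = defaultdict(Counter)
    for word, tag in tagged_words:
        tag_freq[word][tag] += 1
    return {word: tag_freq[word].most_common(1)[0][0]
            for word, _ in word_freq.most_common(size)}

def lookup_tag(tokens, model, backoff_tag=None):
    """Tag known words from the model; unknown words get the backoff tag."""
    return [(token, model.get(token, backoff_tag)) for token in tokens]

model = build_lookup_model(tagged_words, size=2)
tagged = lookup_tag(["the", "jury", "said", "that"], model, backoff_tag="NN")
print(model, tagged)
```

Without a backoff tag, words outside the model are tagged None, which is why combining the lookup model with a default tag tends to help.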

SLIDE 29

The Lookup Tagger

 ■ Example 5-4. Lookup tagger performance with varying model size.

SLIDE 30

The Lookup Tagger

SLIDE 31

Evaluation

 ■ We evaluate the performance of a tagger relative to the tags a human expert would assign.
   • Since we usually don’t have access to an expert and impartial human judge, we make do instead with gold standard test data.
   • The tagger is regarded as being correct if the tag it guesses for a given word is the same as the gold standard tag.
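
A minimal sketch of this measure: accuracy is the fraction of tokens whose guessed tag equals the gold standard tag. The gold sentences and the function name accuracy are assumptions for illustration:

```python
def accuracy(predicted_sents, gold_sents):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = 0
    total = 0
    for predicted, gold in zip(predicted_sents, gold_sents):
        for (_, guess), (_, answer) in zip(predicted, gold):
            total += 1
            if guess == answer:
                correct += 1
    return correct / total if total else 0.0

# Toy gold standard data (hypothetical).
gold = [
    [("the", "AT"), ("jury", "NN"), ("said", "VBD")],
    [("city", "NN"), ("hall", "NN"), ("opened", "VBD")],
]

# Baseline prediction: tag every token as NN.
predicted = [[(word, "NN") for word, _ in sent] for sent in gold]

score = accuracy(predicted, gold)
print(score)
```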

SLIDE 32

Summary

 ■ Mapping Words to Properties Using Python Dictionaries
   • Indexing Lists Versus Dictionaries
   • Dictionaries in Python
   • Defining Dictionaries
   • Default Dictionaries
   • Incrementally Updating a Dictionary
   • Complex Keys and Values
   • Inverting a Dictionary

SLIDE 33

Summary

 ■ Automatic Tagging
   • The Default Tagger
   • The Regular Expression Tagger
   • The Lookup Tagger
   • Evaluation

SLIDE 34

Project: First Presentation

 ■ 30 April, 2015 (in class)
 ■ Prepare a 5-minute presentation for your term project (approximately 7 slides).
   • The project must have a clear I/O.
   • Explain the measure of the quality of the output.
   • Give a measure of how good your system is (together with a prediction of your system’s performance against this measure).