Statistical Natural Language Processing Dr. Besnik Fetahu Overview - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Statistical Natural Language Processing

  • Dr. Besnik Fetahu
slide-2
SLIDE 2

Overview

  • POS tagging
  • Morphology
  • Phrase structure and ambiguities
  • Semantics
slide-3
SLIDE 3

POS tagging

  • Group words of a language into classes which show similar syntactic behavior
  • These classes are known as grammatical categories or part-of-speech tags
  • The most important classes are nouns (NN), verbs (VB), and adjectives (JJ)

  • Nouns: refer to people, animals, concepts, things
  • Verbs: are usually used to express an action
  • Adjectives: usually describe properties of nouns
slide-4
SLIDE 4

POS tagging

Substitution test for adjectives: The {sad, intelligent, green, fat, …} one is in the corner.

Children eat sweet candy.

  • Nouns (NN): children refers to a group of people, whereas candy refers to a type of food.
  • Adjectives (JJ): sweet is an adjective, as it describes an attribute of candy.
  • Verbs (VB): eat is a verb, as it describes an action, that of children eating candy.

slide-5
SLIDE 5

POS tagging - Ambiguities

Children eat sweet candy.

  • Nouns (NN): children refers to a group of people, whereas candy refers to a type of food.
  • Adjectives (JJ): sweet is an adjective, as it describes an attribute of candy.
  • Verbs (VB): eat is a verb, as it describes an action, that of children eating candy.
  • Sweet can also be a noun (e.g. in British English), meaning the same as candy.
  • Candy can also be a verb, describing the act of preserving (e.g. fruit).
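The ambiguity can be made concrete with a toy lexicon that lists every tag a word may take. This is only a sketch: the lexicon, the helper names and the `UNK` fallback tag are assumptions made for illustration, not part of the lecture.

```python
# Toy lexicon (illustrative): each word maps to the set of tags it can take.
LEXICON = {
    "children": {"NN"},
    "eat": {"VB"},
    "sweet": {"JJ", "NN"},  # adjective, or noun in British English
    "candy": {"NN", "VB"},  # noun, or verb meaning "to preserve"
}


def possible_tags(sentence):
    """Return (token, sorted candidate tags) pairs for a sentence."""
    tokens = sentence.lower().rstrip(".").split()
    return [(tok, sorted(LEXICON.get(tok, {"UNK"}))) for tok in tokens]


def count_tag_sequences(sentence):
    """Count how many distinct tag sequences the lexicon allows."""
    total = 1
    for _, tags in possible_tags(sentence):
        total *= len(tags)
    return total


print(possible_tags("Children eat sweet candy."))
print(count_tag_sequences("Children eat sweet candy."))  # 4
```

Even this four-word sentence admits four tag sequences under the toy lexicon, which is why a tagger needs context, and not just a dictionary, to pick the intended one.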

slide-6
SLIDE 6

POS tagging – Open vs. Closed categories

  • Open or lexical categories:
  • NN, VB, JJ: new words are constantly added to these categories in any language, so they are the largest groups of words in the vocabulary
  • Closed or functional categories:
  • PRP, DT: they have few members. The words belonging to these categories have a clear grammatical use.

slide-7
SLIDE 7

POS tagging - Morphology

  • Word categories are systematically related by morphological processes, which include:
  • Inflection
  • Derivation
  • Compounding
  • Word morphology deals with word form change
  • Depending on the language, word forms may vary greatly. E.g. in Finnish a verb can have up to 10,000 forms

slide-8
SLIDE 8

POS tagging - Inflection

  • Inflections are the systematic modifications of a root by means of prefixes and suffixes to indicate grammatical distinctions like singular and plural.
  • Inflection does not change the class the word belongs to.
  • It may vary features such as tense, number, and plurality.
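A minimal sketch of noun inflection in English, assuming a handful of spelling rules and a tiny table of irregular plurals (both are illustrative simplifications, not a complete morphological analyser). The point is that the inflected form stays in the same word class.

```python
# Illustrative table of irregular plurals; real English has many more.
IRREGULAR_PLURALS = {"child": "children", "man": "men", "woman": "women"}


def pluralize(noun):
    """Form the plural of an English noun with a few simple spelling rules."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    if noun.endswith("y") and len(noun) > 1 and noun[-2] not in "aeiou":
        return noun[:-1] + "ies"
    return noun + "s"


def inflect_noun(noun, number):
    """Return (form, word class); inflection changes the form, not the class."""
    form = noun if number == "singular" else pluralize(noun)
    return form, "NN"


print(inflect_noun("candy", "plural"))  # ('candies', 'NN')
print(inflect_noun("child", "plural"))  # ('children', 'NN')
```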

slide-9
SLIDE 9

POS tagging - Derivation

  • Derivation is less systematic.
  • It usually results in the word changing its syntactic class:
  • E.g., for wide vs. widely, change from adjective to adverb
  • For some adjectives we cannot derive the adverbial form (e.g. old, difficult).
  • Changing adjectives to verbs through the suffix -en (e.g. weak, weaken)
  • Changing verbs to adjectives through the suffix -able (e.g. understand, understandable)
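The suffix rules above can be sketched as a small lookup table. The rule table, the `RB` tag for adverbs (borrowed from the Penn Treebank convention) and the exception list are assumptions for illustration. Real derivation is far less regular, which is exactly the point of the first bullet.

```python
# Hypothetical derivation rules: suffix -> (source class, target class).
DERIVATION_RULES = {
    "ly": ("JJ", "RB"),    # adjective -> adverb
    "en": ("JJ", "VB"),    # adjective -> verb
    "able": ("VB", "JJ"),  # verb -> adjective
}

# Adjectives for which the adverbial form cannot be derived (from the slide).
NO_ADVERB = {"old", "difficult"}


def derive(word, tag, suffix):
    """Apply a derivational suffix; return (new word, new class) or None."""
    source, target = DERIVATION_RULES[suffix]
    if tag != source:
        return None
    if suffix == "ly" and word in NO_ADVERB:
        return None
    return word + suffix, target


print(derive("wide", "JJ", "ly"))          # ('widely', 'RB')
print(derive("weak", "JJ", "en"))          # ('weaken', 'VB')
print(derive("understand", "VB", "able"))  # ('understandable', 'JJ')
print(derive("old", "JJ", "ly"))           # None
```

Unlike inflection, every successful rule here returns a different word class from the one it started with.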
slide-10
SLIDE 10

POS tagging - Compounding

  • Compounding refers to the process of merging two or more words together
  • Most frequent are noun-noun compounds
  • Other cases include adjectives and verbs:
  • Overtake (over + take)
  • Mad cow disease

German compounds (shown segmented): Donau dampf schiff fahrts gesellschaft (Danube steamship company); Rind fleisch etikettierungs überwachungs aufgaben übertragungs gesetz (law on delegating the supervision of beef labelling)
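One simple way to undo compounding is to split a word into known parts by recursive lookup. The sketch below assumes a tiny hand-made lexicon that already contains the segments of the first German example (including the linking -s in "fahrts"); a realistic splitter would need a large lexicon and a way to rank competing splits.

```python
# Illustrative lexicon: the segments of the first German example above.
PARTS = {"donau", "dampf", "schiff", "fahrts", "gesellschaft"}


def split_compound(word, lexicon):
    """Split word into lexicon entries (longest prefix first); None if impossible."""
    word = word.lower()
    if not word:
        return []
    for end in range(len(word), 0, -1):
        head = word[:end]
        if head in lexicon:
            rest = split_compound(word[end:], lexicon)
            if rest is not None:
                return [head] + rest
    return None


print(split_compound("Donaudampfschifffahrtsgesellschaft", PARTS))
```

The function backtracks when a prefix matches but the remainder cannot be split, and returns `None` when no segmentation exists.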

slide-11
SLIDE 11

Nouns

  • Nouns refer to entities in the real world, e.g. people, things, animals etc.
  • In English, nouns have only one inflection: plural vs. singular
  • In the plural, nouns usually take the suffix -s
  • English does not have a system for gender inflections
  • The genitive describes the possessor. E.g., the phrase “the woman’s house” indicates that the woman owns the house.

Grammatical features of nouns:

  • Number: singular, plural
  • Gender: feminine, masculine, neuter
  • Case: nominative, genitive, dative, accusative

slide-12
SLIDE 12

Pronouns

  • Pronouns act like variables in that they refer to a person or thing that is somehow salient in the discourse context.
  • Pronouns distinguish the number of their antecedent; they also mark person (1st = speaker, 2nd = hearer, 3rd = other discourse entities).
  • In English, pronouns change form depending on whether they function as the subject (nominative) or the object (accusative) of a sentence.

slide-13
SLIDE 13

Words that accompany nouns: Determiners and adjectives

  • Determiners describe the particular reference of a noun.
  • The article “the” indicates something or someone that is known or that we can uniquely determine.
  • The articles “a” and “an” indicate a person or thing that was not previously mentioned.
  • “This” and “that” are demonstrative determiners
  • Adjectives describe properties of nouns
  • Such uses are usually referred to as attributive or adnominal (e.g. “a red rose”)
  • Morphological modifications:
  • Comparative e.g. richer, smarter
  • Superlative e.g. richest, smartest
  • In some cases, comparatives and superlatives are formed periphrastically with auxiliary words (e.g. more or most)

slide-14
SLIDE 14

Words that accompany nouns: Determiners and adjectives

  • Quantifiers are words that express ideas like “all”, “many”, “some”
  • Interrogative pronouns and determiners
  • Used in questions and relative clauses
  • E.g. “which” (interrogative determiner), “whose” (interrogative pronoun)

slide-15
SLIDE 15

Verbs

  • Verbs are used to describe actions, states, activities
  • The base form of a verb is in present tense (walk)
  • The infinitive is formed from the base form with to (to walk)
  • The progressive uses the suffix -ing and indicates that an action is in progress
  • The suffix -ed marks the past tense, and also helps in forming the present/past perfect (e.g. has walked, had walked)

  • Modals express possibilities or obligations
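The verb forms listed above can be generated mechanically for simple regular verbs. This is a rough sketch under that assumption: the function name is made up for illustration, and it ignores irregular verbs (go, went) and consonant doubling (stop, stopping).

```python
def regular_verb_forms(base):
    """Build the forms of a simple regular English verb from its base form."""
    drops_e = base.endswith("e") and not base.endswith("ee")
    stem = base[:-1] if drops_e else base            # bake -> bak(ing)
    past = base + "d" if base.endswith("e") else base + "ed"
    return {
        "base": base,
        "infinitive": "to " + base,
        "progressive": stem + "ing",
        "past": past,
        "present_perfect": "has " + past,
        "past_perfect": "had " + past,
    }


for name, form in regular_verb_forms("walk").items():
    print(name, "->", form)
```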
slide-16
SLIDE 16

Adverbs

  • Adverbs act similarly to adjectives, except that they modify verbs rather than nouns
  • Adverbs specify place, time, manner or degree
  • E.g. “She often travels to Las Vegas” or
  • E.g. “She started her career off very impressively”
  • Some adverbs modify adjectives
  • E.g. “a shockingly frank exchange”
slide-17
SLIDE 17

Prepositions

  • Prepositions express spatial relationships
  • E.g. “in the glass”, “on the table”, “over their heads”
  • Particles are a subclass of prepositions that can enter into strong bonds with verbs in the formation of phrasal verbs.
  • E.g. “The plane took off at 8am.”
  • E.g. “He put me off”
  • In some cases, we need to know the meaning of the sentence to distinguish particles and prepositions
  • E.g. “She ran up a hill” (preposition) vs. “She ran up a bill” (particle)

slide-18
SLIDE 18

Conjunctions and Subordinating Conjunctions

  • Coordinating conjunctions coordinate two words or phrases of (usually) the same category:
  • husband and wife [nouns]
  • She bought or leased the car. [verbs]
  • the green triangle and the blue square [noun phrases]
  • She bought her car, but she also considered leasing it. [sentences]
  • Subordinating conjunctions link two sentences (or clauses), one of which is subordinate to the other, e.g.:

  • She said that he would be late. [proposition]
  • She complained because he was late. [reason]
  • I won’t wait if he is late. [condition]
  • She thanked him although he was late. [concession]
  • She left before he arrived. [temporal]
slide-19
SLIDE 19

Phrase Structure

  • Languages have constraints on word order
  • Words are organized into phrases, which gives rise to the idea of such groups behaving as constituents
  • Constituents can be detected by their being able to occur in various positions, and showing uniform syntactic possibilities for expansion.

  • I put the bagels in the freezer.
  • The bagels, I put in the freezer.
  • I put in the fridge the bagels (that John had given me).
slide-20
SLIDE 20

Phrase Structure

  • A whole sentence is given the category S.
  • A sentence normally rewrites as a subject noun phrase NP and a verb phrase VP.

slide-21
SLIDE 21

Noun Phrases

  • The noun is the head of the noun phrase
  • Noun phrases are usually the arguments of verbs
  • Noun phrases normally consist of an optional determiner, zero or more adjective phrases, a noun head, and then perhaps post-modifiers (e.g. prepositional phrases or clausal modifiers)

The homeless old man in the park that I tried to help yesterday
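The noun phrase template described above (optional determiner, any number of adjectives, a noun head, optional post-modifiers) can be written as a regular expression over a tag sequence. This is a sketch only: the chunk labels `PP` (prepositional phrase) and `REL` (relative clause) and the tagging of the example are assumptions made for illustration.

```python
import re

# Optional determiner, zero or more adjectives, a noun head, then post-modifiers.
NP_PATTERN = re.compile(r"(DT )?(JJ )*NN( (PP|REL))*")


def is_noun_phrase(tags):
    """Check whether a sequence of tags fits the noun phrase template."""
    return NP_PATTERN.fullmatch(" ".join(tags)) is not None


# The/DT homeless/JJ old/JJ man/NN [in the park]/PP [that I tried to help yesterday]/REL
print(is_noun_phrase(["DT", "JJ", "JJ", "NN", "PP", "REL"]))  # True
print(is_noun_phrase(["JJ", "DT", "NN"]))                     # False
```

A flat pattern like this ignores the internal structure of the post-modifiers, which themselves contain noun phrases; capturing that recursion requires a phrase structure grammar.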

slide-22
SLIDE 22

Verb Phrases

  • The verb is the head of a verb phrase
  • Verb phrases organize all elements of a sentence that depend syntactically on the verb
  • Getting to school on time was a struggle.
  • He was trying to keep his temper.
  • That woman quickly showed me the way to hide.
slide-23
SLIDE 23

Prepositional and Adjective Phrases

  • Prepositional phrases (PP) are headed by a preposition and contain a noun phrase complement
  • PPs can appear within all the other major phrase types (i.e., noun phrases and verb phrases) and usually express spatial and temporal locations and other attributes.
  • Complex adjective phrases are not so common:
  • She is very sure of herself; He seemed a man who was quite certain to succeed.

slide-24
SLIDE 24

Phrase structure grammars

  • Syntactic analysis of a sentence tells us how to determine the meaning of the sentence from the meaning of the words
  • In English, the basic word order is subject-verb-object
  • This order is modified only to express particular ‘mood’ categories.
  • E.g. in interrogatives (or questions), the subject and first auxiliary verb are inverted
  • In imperatives there is no subject

  • The children (subject) should (auxiliary verb) eat spinach (object).
  • Should (auxiliary verb) the children (subject) eat spinach (object)?
  • Eat spinach!

slide-25
SLIDE 25

Phrase structure grammars

[Figure panels: Rewrite rules; Derivations of the rewrite rules]
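As a minimal sketch of how rewrite rules drive a derivation, assume a small grammar that covers the sentence "The children ate the cake". The rule set and the function below are illustrative assumptions, not the rules from the slide. Each step replaces the leftmost nonterminal with the right-hand side of one of its rules.

```python
# Illustrative rewrite rules: nonterminal -> list of possible right-hand sides.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["AT", "NNS"], ["AT", "NN"]],
    "VP": [["VBD", "NP"], ["VBD"]],
    "AT": [["the"]],
    "NNS": [["children"]],
    "NN": [["cake"]],
    "VBD": [["ate"]],
}


def leftmost_derivation(choices, start="S"):
    """Rewrite the leftmost nonterminal once per entry in choices (rule indices)."""
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        i = next(k for k, sym in enumerate(form) if sym in RULES)
        form = form[:i] + RULES[form[i]][choice] + form[i + 1:]
        steps.append(" ".join(form))
    return steps


for step in leftmost_derivation([0, 0, 0, 0, 0, 0, 1, 0, 0]):
    print(step)
```

The printed sequence starts at `S`, passes through intermediate forms such as `the children VP`, and ends with the sentence `the children ate the cake`.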

slide-26
SLIDE 26

Phrase structure grammars

[Figure: local tree representation of the derivation rules]

Bracketing of the derivation rules:

[S [NP [AT The] [NNS children]] [VP [VBD ate] [NP [AT the] [NN cake]]]]
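The bracketed notation is just a linear encoding of the tree, so it can be read back into a nested structure. The sketch below assumes a simple (label, children) tuple representation; the function names are made up for illustration.

```python
import re


def parse_brackets(text):
    """Parse a bracketed string such as "[S [NP ...] [VP ...]]" into (label, children)."""
    tokens = re.findall(r"\[|\]|[^\s\[\]]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        assert tokens[pos] == "["
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                children.append(parse_node())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return parse_node()


def leaves(tree):
    """Collect the words of the tree from left to right."""
    _, children = tree
    words = []
    for child in children:
        words.extend(leaves(child) if isinstance(child, tuple) else [child])
    return words


tree = parse_brackets(
    "[S [NP [AT The] [NNS children]] [VP [VBD ate] [NP [AT the] [NN cake]]]]"
)
print(tree[0], [child[0] for child in tree[1]])  # S ['NP', 'VP']
print(leaves(tree))  # ['The', 'children', 'ate', 'the', 'cake']
```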

slide-27
SLIDE 27

Phrase structure grammars

The women who found the wallet were given a reward.

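  • Subject-verb agreement (women … were)
  • Recursive constituency (the relative clause who found the wallet is embedded in the subject noun phrase)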

slide-28
SLIDE 28

Phrase structure grammars

  • Should Peter buy a book?
  • Which book should Peter buy?

Long-distance dependencies (anything beyond a trigram)
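To see why such dependencies fall outside a trigram model, one can check whether two related words ever share a three-word window. This is a small sketch with made-up helper names. In the question form, the verb buy and its object book are too far apart.

```python
def ngrams(tokens, n=3):
    """Return all contiguous n-word windows of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def within_ngram(tokens, a, b, n=3):
    """True if words a and b occur together in at least one n-gram window."""
    return any(a in gram and b in gram for gram in ngrams(tokens, n))


print(within_ngram("should peter buy a book".split(), "buy", "book"))      # True
print(within_ngram("which book should peter buy".split(), "buy", "book"))  # False
```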

slide-29
SLIDE 29

Phrase structure ambiguity

  • In most cases, there are many different phrase structure trees that could all have given rise to a particular sequence of words.
  • This phenomenon is called phrase structure ambiguity or syntactic ambiguity.
  • One type of syntactic ambiguity that is particularly frequent is attachment ambiguity.

slide-30
SLIDE 30

Phrase structure ambiguity – attachment ambiguity

The children ate the cake with a spoon.

  • “High” attachment to the verb phrase makes a statement about the instrument that the children used while eating the cake
  • “Low” attachment to the noun phrase tells us which cake was eaten
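The two readings can be counted mechanically with a CKY-style chart over a small grammar. The grammar and lexicon below are illustrative assumptions in binary-rule form. The rules NP → NP PP and VP → VP PP are what allow the prepositional phrase to attach low or high.

```python
from collections import defaultdict

# Illustrative lexicon and binary rules (parent, left child, right child).
LEXICON = {
    "the": "AT", "a": "AT", "children": "NNS", "cake": "NN",
    "spoon": "NN", "ate": "VBD", "with": "IN",
}
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("NP", "AT", "NNS"),
    ("NP", "AT", "NN"),
    ("NP", "NP", "PP"),   # low attachment: PP modifies the noun phrase
    ("VP", "VBD", "NP"),
    ("VP", "VP", "PP"),   # high attachment: PP modifies the verb phrase
    ("PP", "IN", "NP"),
]


def count_parses(tokens, root="S"):
    """Count the distinct parse trees for tokens using a CKY chart."""
    n = len(tokens)
    chart = defaultdict(int)  # (start, end, label) -> number of subtrees
    for i, tok in enumerate(tokens):
        chart[(i, i + 1, LEXICON[tok])] += 1
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for parent, left, right in BINARY_RULES:
                    chart[(start, end, parent)] += (
                        chart[(start, mid, left)] * chart[(mid, end, right)]
                    )
    return chart[(0, n, root)]


print(count_parses("the children ate the cake with a spoon".split()))  # 2
print(count_parses("the children ate the cake".split()))               # 1
```

The chart only counts the analyses; choosing between them needs additional information, such as which attachment is more likely for the words involved.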

slide-31
SLIDE 31

Phrase structure ambiguity – attachment ambiguity

  • How can we resolve the attachment ambiguity?
  • Garden path phenomena: a garden path sentence leads you along a path that suddenly turns out not to work.
  • Garden pathing is the phenomenon of first being tricked into adopting a spurious parse and then having to backtrack to try to construct the right parse.

The horse raced past the barn fell.

The horse raced past the barn …

Fell cannot be added to this parse. So we have to backtrack to raced and construct a completely different parse, corresponding to the meaning The horse fell after it had been raced past the barn.

slide-32
SLIDE 32

Semantics and Pragmatics

  • Semantics is the study of the meaning of words, constructions, and utterances.

  • Semantics can be divided into:
  • Lexical semantics
  • Phrase/Sentence semantics
slide-33
SLIDE 33

Lexical Semantics

  • The basis of lexical semantics is the study of how words are related to each other. These relations include:
  • Lexical hierarchies (i.e. hypernymy and hyponymy)
  • Antonymy (words with opposite meanings)
  • Meronymy (part-whole relationships between words)
  • Synonymy (words with the same or nearly the same meaning)
  • Homonymy (words that are written the same but have different, unrelated meanings)
  • Lexical ambiguity can refer to both homonymy and polysemy (one word with several related meanings).
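A lexical hierarchy can be sketched as a mapping from each word to its hypernym (its more general term). The entries below are made up for illustration. Resources such as WordNet store these relations at scale and per word sense rather than per word.

```python
# Illustrative hierarchy: word -> its direct hypernym.
HYPERNYM = {
    "poodle": "dog",
    "dog": "animal",
    "cat": "animal",
    "animal": "organism",
}


def hypernym_chain(word):
    """Follow hypernym links upwards and return all more general terms."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def is_hyponym_of(word, ancestor):
    """True if ancestor is a (direct or indirect) hypernym of word."""
    return ancestor in hypernym_chain(word)


print(hypernym_chain("poodle"))        # ['dog', 'animal', 'organism']
print(is_hyponym_of("cat", "animal"))  # True
print(is_hyponym_of("animal", "cat"))  # False
```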

slide-34
SLIDE 34

Phrase/Sentence semantics

  • Knowing the meaning of individual words does not guarantee that we can resolve the meaning of a phrase or sentence:
  • white paper, white hair, white skin, white wine
  • The meaning of the whole is the sum of the meanings of the parts plus some additional semantics that cannot be predicted from the parts.
  • If the relationship between the meaning of the words and the meaning of the phrase is completely opaque, we call the phrase an idiom.
slide-35
SLIDE 35

Phrase/Sentence semantics

  • One central part in determining the correct semantics of a sentence is the resolution of anaphoric relations.
  • Another crucial part in determining the semantics of a phrase is pragmatics.
  • Pragmatics studies how knowledge about the world and language conventions interact with literal meaning:
  • E.g. knowing that hurricanes are disasters.

Examples of anaphoric relations:

  • Mary helped Peter get out of the cab. He thanked her.
  • Mary helped the other passenger out of the cab. The man had asked her to help him because of his foot injury.

slide-36
SLIDE 36

Next Lecture

  • Language Models
  • Maximum Likelihood Estimation
  • Language Model Smoothing techniques