Introduction to NLP Prof. Rajeev Sangal 7/5/2012 IIIT-H Advanced - - PDF document




SLIDE 1

IASNLP 2012

Introduction to NLP

  • Prof. Rajeev Sangal

7/5/2012

SLIDE 2

IIIT-H Advanced Summer School
on Natural Language Processing
23 May - 4 June 2011
IIIT Hyderabad

SLIDE 3

========================================================
INTRODUCTION
========================================================

NLP: General

  • Language: A unique ability of humans
  • Language for communication
    * Major function of language: communication.
    * How do ideas get transferred?
  • Applications
    + Information retrieval
      * User gives keywords, machine retrieves relevant documents.
      * Uses structure of the web for ranking
    + Information extraction
      * Machine goes through text, fills in a template
      * An "NLP programmer" has to set up the template
    + Question answering
      * Goes beyond IR or IE
      * Understanding required
    + Machine translation
      * Sentential analyzer in source language
      * Bilingual dictionary etc.
      * Sentential generator in target language
    + Dialogue systems
      * Expectation
      * Focus
      * Topic

SLIDE 4

NLP: Block Diagram

  • Form & meaning
  • Sentences -> Meaning
    * Done in layers. Output of each layer successively makes meaning more explicit (i.e., closer to meaning or gives a representation which machine can handle more easily)
  • How can meaning be represented?
  • Techniques for "extracting meaning"
    * Word analysis
    * Phrasal analysis
    * Sentential analysis, etc.
    * Statistical analysis: tagging, grammatical attachment, word sense
  • Modules
    * Morphological analyzer
    * Chunker / Part-of-speech tagger
    * Sentence parser
    * Semantic processor
    * Pragmatics processor
  • Difference between:
    * Processing algorithms (NLP and ML)
    * Rules/data for processing (Computational Linguistics)
    * Understanding language (Linguistics)
  • Without understanding the nature and structure of language, and proper structuring of the data, machine learning is not very effective.
SLIDE 5

How do we analyze language? Consider the sentence:

    Children are watching some programmes on television in the house

  • Analysis into Chunks
  • What are the "chunks"?
    + [[ Children ]] (( are watching )) [[ some programmes ]] [[ on television ]] [[ in the house ]]
  • Chunks
    * Noun chunks (NP, PP) in square brackets
    * Verb chunks (VG) in parentheses
  • Chunks represent objects
    * Noun chunks represent objects/concepts
    * Verb chunks represent actions
1      ((          NP
1.1    children
       ))
2      ((          VG
2.1    are
2.2    watching
       ))
3      ((          NP
3.1    some
3.2    programmes
       ))
4      ((          PP
4.1    on
4.2    ((
4.2.1  television
       ))
       ))
5      ((          PP
5.1    in
5.2    ((
5.2.1  the
5.2.2  house
       ))
       ))
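The hierarchical addresses in the listing above (1, 1.1, 4.2.1, and so on) can be generated mechanically from a nested chunk structure. The following is a minimal sketch, not the Shakti tools themselves; the tuple representation and the function name `number_nodes` are assumptions made for illustration.

```python
# A node is either a word (str) or a chunk written as (label, children).
# An empty label stands for the unlabelled inner chunks in the listing.
chunks = [
    ("NP", ["children"]),
    ("VG", ["are", "watching"]),
    ("NP", ["some", "programmes"]),
    ("PP", ["on", ("", ["television"])]),
    ("PP", ["in", ("", ["the", "house"])]),
]


def number_nodes(nodes, prefix=""):
    """Return one line per node, each with its hierarchical address."""
    lines = []
    for i, node in enumerate(nodes, start=1):
        address = f"{prefix}{i}"
        if isinstance(node, str):
            lines.append(f"{address} {node}")          # a word
        else:
            label, children = node
            lines.append(f"{address} (( {label}".rstrip())  # chunk opens
            lines.extend(number_nodes(children, address + "."))
            lines.append("))")                          # chunk closes
    return lines


for line in number_nodes(chunks):
    print(line)
```

Words and nested chunks share one counter inside their parent, which is why 'television' receives the address 4.2.1.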

SLIDE 6

Part-of-speech Tagging

    [[ Children_NNS ]] (( are_VBP watching_VBG )) [[ some_DT programmes_NNS ]] [[ on_IN television_NN ]] [[ in_IN the_DT house_NN ]]

A part-of-speech (POS) tag is attached to each word. 'NNS' stands for plural common noun.

  • Towards Shakti Standard Format

1      ((          NP
1.1    children    NNS
       ))
2      ((          VG
2.1    are         VBP
2.2    watching    VBG
       ))
3      ((          NP
3.1    some        DT
3.2    programmes  NNS
       ))
4      ((          PP
4.1    on          IN
4.2    ((          NP
4.2.1  television  NN
       ))
       ))
5      ((          PP
5.1    in          IN
5.2    ((          NP
5.2.1  the         DT
5.2.2  house       NN
       ))
       ))

SLIDE 7

Morphological Analysis

Children    <fs af=child,n,m,p,3,0,,>
are         <fs af=are,n,m,s,3,0,,>|<fs af=be,v,m,p,3,0,,>
watching    <fs af=watch,v,m,s,3,0,,/aspect='PROG'>
some        <fs af=some,D,m,s,3,0,,>|<fs af=some,det,m,s,3,0,,>|<fs af=some,P,m,p,3,0,,>
programmes  <fs af=programme,n,m,p,3,0,,>|<fs af=programme,v,m,s,3,0,,/tense='PRES'>
on          <fs af=on,p,m,s,3,0,,>|<fs af=on,n,m,s,3,0,,>|<fs af=on,adj,m,s,3,0,,>|<fs af=on,D,m,s,3,0,,>|<fs af=on,p,m,s,3,0,,>
television  <fs af=television,n,m,s,3,0,,>
in          <fs af=in,p,m,s,3,0,,>|<fs af=in,D,m,s,3,0,,>|<fs af=in,p,m,s,3,0,,>
the         <fs af=the,det,m,s,3,0,,>
house       <fs af=house,n,m,s,3,0,,>|<fs af=house,v,m,s,3,0,,>

Shakti Standard Format

1      ((          NP
1.1    children    NNS  <fs af='child,n,m,p,3,0,,'>
                                  |   | | | | |
                                  |   | | | | case
                                  |   | | | pers
                                  |   | | number
                                  |   | gender
                                  |   cat
                                  root
       ))
2      ((          VG
2.1    are         VBP  <fs af=are,n,m,s,3,0,,>|<fs af=be,v,m,p,3,0,,>
2.2    watching    VBG  <fs af=watch,v,m,s,3,0,,/aspect='PROG'>
       ))
3      ((          NP
3.1    some        DT   <fs af=some,D,m,s,3,0,,>|<fs af=some,det,m,s,3,0,,>|<fs af=some,P,m,p,3,0,,>
3.2    programmes  NNS  <fs af=programme,n,m,p,3,0,,>|<fs af=programme,v,m,s,3,0,,/tense='PRES'>
       ))
4      ((          PP
4.1    on          IN   <fs af=on,p,m,s,3,0,,>|<fs af=on,n,m,s,3,0,,>|<fs af=on,adj,m,s,3,0,,>|<fs af=on,D,m,s,3,0,,>|<fs af=on,p,m,s,3,0,,>
4.2    ((          NP
4.2.1  television  NN   <fs af=television,n,m,s,3,0,,>
       ))
       ))
5      ((          PP
5.1    in          IN   <fs af=in,p,m,s,3,0,,>|<fs af=in,D,m,s,3,0,,>|<fs af=in,p,m,s,3,0,,>
5.2    ((          NP
5.2.1  the         DT   <fs af=the,det,m,s,3,0,,>
5.2.2  house       NN   <fs af=house,n,m,s,3,0,,>|<fs af=house,v,m,s,3,0,,>
       ))
       ))
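To make the feature structures above easier to inspect, the comma-separated `af` value can be split into named fields. This is a rough sketch under stated assumptions: the field names follow the labels on the slide (root, cat, gender, number, pers, case), the two trailing fields that the slide leaves unlabelled are ignored, and `parse_analyses` is a hypothetical helper, not part of any Shakti tool.

```python
import re

# Field names taken from the labels on the slide; "person" stands for "pers".
AF_FIELDS = ["root", "cat", "gender", "number", "person", "case"]


def parse_analyses(text):
    """Parse 'fs' strings (alternatives separated by '|') into dicts."""
    analyses = []
    # Capture the af value up to a closing quote, '>' or the '/' that
    # introduces extra attributes such as /aspect='PROG'.
    for af in re.findall(r"<\s*fs\s+af='?([^'>/]*)", text):
        values = [value.strip() for value in af.split(",")]
        # zip stops after six names, so the unlabelled trailing fields are dropped.
        analyses.append(dict(zip(AF_FIELDS, values)))
    return analyses


for analysis in parse_analyses("<fs af=are,n,m,s,3,0,,>|<fs af=be,v,m,p,3,0,,>"):
    print(analysis["root"], analysis["cat"], analysis["number"])
```

Each word may carry several alternative analyses; choosing among them is left to later layers of processing.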

SLIDE 8

Layers of Processing

  • 1. T1 - Mark n/v tags

        The child has plucked a flower in the mango garden.
            -----     -------   ------ --           ------
              n          v        n    x              n

    + Ex. POS tagging using dictionary lookup

  • 2. T2 - Chunking

    After marking POS tags, mark the phrases/chunks and their types, such as NG (noun group) and VG (verb group):

        (The child) (has plucked) (a flower) (in the mango garden)
               n           vm          n      x              n
        ----------- ------------- ---------- ---------------------
            NG           VG           NG              NG

    + Ex. 'The' applies to 'child', making it definite.
    + Ex. 'a' applies to 'flower', making it indefinite.

  • 3. Putting the above two (T1 and T2) together:

          | sentence
          V
        |T1|
          | sentence with POS tags
          V
        |T2|
          | sentence with POS tags and chunks
          V

  • Normally, do T1 then do T2
    + Ex. Is 'mango' (1) an adjective or (2) a noun? If option (1) is taken, calling 'mango' an adjective, task T1 becomes harder and T2 becomes easier:

                   (1)          (2)
        mango = |  adjective    noun
        --------+----------------------
        T1      |  hard         easy
        T2      |  easy         hard

  • In real life, option (2) is chosen.
  • But sometimes, one does T2 and comes back to complete T1.
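The two layers can be sketched as a pair of small functions run one after the other. This is only an illustration: the `LEXICON` entries, the extra tags `det` and `aux`, and the chunking rule ("close a noun group at the last noun of a run, close a verb group at the verb") are assumptions chosen to reproduce the example, with 'mango' treated as a noun as in option (2).

```python
# Hypothetical mini-lexicon for T1 (dictionary lookup).
# 'x' marks the preposition, as on the slide; 'det' and 'aux' are assumed tags.
LEXICON = {
    "the": "det", "a": "det",
    "child": "n", "flower": "n", "mango": "n", "garden": "n",
    "has": "aux", "plucked": "v",
    "in": "x",
}


def t1_tag(sentence):
    """T1: attach a tag to each word by dictionary lookup."""
    words = sentence.rstrip(".").split()
    return [(word, LEXICON.get(word.lower(), "unk")) for word in words]


def t2_chunk(tagged):
    """T2: group tagged words into NG / VG chunks."""
    chunks, current = [], []
    for i, (word, tag) in enumerate(tagged):
        current.append(word)
        next_tag = tagged[i + 1][1] if i + 1 < len(tagged) else None
        if tag == "v":
            chunks.append(("VG", current))   # the verb closes a verb group
            current = []
        elif tag == "n" and next_tag != "n":
            chunks.append(("NG", current))   # last noun of a run closes a noun group
            current = []
    if current:
        chunks.append(("?", current))        # leftover words without a head
    return chunks


tagged = t1_tag("The child has plucked a flower in the mango garden.")
for label, words in t2_chunk(tagged):
    print(label, " ".join(words))
```

Because 'mango' is tagged as a noun, T1 stays a plain lookup, and it is T2 that has to keep the noun-noun run 'mango garden' inside one chunk, which matches the easy/hard trade-off in the table.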

SLIDE 9

========================================================
CONTEXT FREE GRAMMAR
========================================================

CONTEXT-FREE GRAMMAR FOR ENGLISH

  • WRITING A TOY ENGLISH GRAMMAR:

    Context free grammar, or (a restricted) phrase structure grammar
    + An example grammar is given below. The first rule says that a sentence (S) consists of a noun phrase (NP) and a verb phrase (VP):

        S  -> NP VP
        NP -> det adj* n
        NP -> n-proper
        VP -> v [NP] [NP] PP*
        PP -> prep NP

      where:
    + NP: noun phrase (Ex. the red block, a sharp arrow)
    + VP: verb phrase (Ex. lifted the red block, fired an arrow)
    + PP: preposition phrase (Ex. with hands, at the deer)
    + n: noun (Ex. boy, child, arrow, block)
    + n-proper: proper noun (Ex. Ram, Mohan)
    + v: verb (Ex. lift, fire, give)
    + det: determiner (Ex. a, the)
    + adj: adjective (Ex. big, red, sharp)
  • EXAMPLE PHRASE STRUCTURE TREE:

    The boy fired the arrow.

              S
              |
       .-------------.
       |             |
       NP            VP
       |             |
    .-----.      .-------.
    |     |      |       |
   det    n      v       NP
    |     |      |       |
   The   boy   fired  .-----.
                      |     |
                     det    n
                      |     |
                     the  arrow
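The toy grammar is small enough to be checked with a hand-written recursive-descent recognizer over already-tagged words. The sketch below is one possible reading of the five rules, not a general CFG parser: it is greedy (it takes optional NPs and PPs whenever they are present), and the function names and tuple representation are assumptions made for illustration.

```python
def parse_np(toks, i):
    """NP -> det adj* n   |   NP -> n-proper"""
    if i < len(toks) and toks[i][1] == "n-proper":
        return ("NP", [toks[i]]), i + 1
    if i < len(toks) and toks[i][1] == "det":
        j = i + 1
        while j < len(toks) and toks[j][1] == "adj":   # adj*
            j += 1
        if j < len(toks) and toks[j][1] == "n":
            return ("NP", toks[i:j + 1]), j + 1
    return None, i


def parse_pp(toks, i):
    """PP -> prep NP"""
    if i < len(toks) and toks[i][1] == "prep":
        np, j = parse_np(toks, i + 1)
        if np is not None:
            return ("PP", [toks[i], np]), j
    return None, i


def parse_vp(toks, i):
    """VP -> v [NP] [NP] PP*"""
    if not (i < len(toks) and toks[i][1] == "v"):
        return None, i
    children, j = [toks[i]], i + 1
    for _ in range(2):                 # up to two optional NPs
        np, k = parse_np(toks, j)
        if np is None:
            break
        children.append(np)
        j = k
    while True:                        # zero or more PPs
        pp, k = parse_pp(toks, j)
        if pp is None:
            break
        children.append(pp)
        j = k
    return ("VP", children), j


def parse_s(toks):
    """S -> NP VP; succeeds only if every token is consumed."""
    np, i = parse_np(toks, 0)
    if np is None:
        return None
    vp, j = parse_vp(toks, i)
    if vp is None or j != len(toks):
        return None
    return ("S", [np, vp])


sentence = [("The", "det"), ("boy", "n"), ("fired", "v"), ("the", "det"), ("arrow", "n")]
print(parse_s(sentence))
```

Leaves are kept as (word, tag) pairs and phrases as (label, children) pairs, so the printed result mirrors the tree drawn above.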

SLIDE 10

PHRASE STRUCTURE TREE

  • Leaf nodes of the tree, read in left-to-right order, give us the sentence
    + Ex: 'The child saw Mohan' from

              S
              |
       .-------------.
       |             |
       NP            VP
       |             |
    .-----.      .-------.
    |     |      |       |
   det    n      v       NP
    |     |      |       |
   the  child   saw   n-proper
                         |
                       Mohan

  • Groups related elements together
    + Ex: 'the child'

       NP
       |
    .-----.
    |     |
   det    n
    |     |
   the  child

  • Hierarchy of grouped elements
    + Ex.1: Groupings:
      - v (for verb 'saw')
      - NP (for 'Mohan')
      * Put them together in VP.
    + Ex.2: Groupings:
      - NP (for 'the child')
      - VP (for 'saw Mohan')
      * They are put together in S.
  • Terminology: Mother and daughter nodes
    + Ex. NP is the mother node, and 'det' and 'n' are daughter (or children) nodes.
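Both properties of a phrase structure tree described on this slide, reading the leaves left to right and the mother/daughter relation, can be shown on a small nested structure. This is a minimal sketch; the tuple encoding and the helper names `leaves` and `daughters` are assumptions made for illustration.

```python
# A node is (label, children); a leaf is a plain word string.
tree = ("S", [
    ("NP", [("det", ["the"]), ("n", ["child"])]),
    ("VP", [("v", ["saw"]), ("NP", [("n-proper", ["Mohan"])])]),
])


def leaves(node):
    """Collect the words at the leaf nodes in left-to-right order."""
    if isinstance(node, str):
        return [node]
    _label, children = node
    words = []
    for child in children:
        words.extend(leaves(child))
    return words


def daughters(node):
    """Labels (or words) of the nodes directly below a mother node."""
    _label, children = node
    return [child if isinstance(child, str) else child[0] for child in children]


print(" ".join(leaves(tree)))     # the sentence
print(daughters(tree))            # daughters of S
print(daughters(tree[1][0]))      # daughters of the subject NP
```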

SLIDE 11

PHRASE STRUCTURE TREE to MEANING

  • Relating a phrase structure tree to a modifier-modified tree
  • Example sentence 1: 'arrow of son of Dasarath'
    * Phrase structure tree 1:

            NP_1
              |
      .---------------.
      |       |       |
      n      of     NP_2
      |               |
    arrow     .---------------.
              |       |       |
              n      of     NP_3
              |               |
             son          n-proper
                              |
                          Dasarath

  • Notion of the head of a phrase
  • What is the head of NP_1?
    + Consider Ex. The arrow of son of Dasarath is sharp.
      * Who/what is sharp: arrow, son, Dasarath?
        . Ans: Arrow. Therefore, 'arrow' is the head of NP_1
    + Consider Ex. The son of Dasarath is sincere.
      * Who is sincere: son, Dasarath?
        . Ans: Son. Therefore, 'son' is the head of NP_2
    * Head of a phrase is determined by rules of the language:
      - In case of NPs with 'of' in English, the noun on the left is the head.
  • Modifier-modified tree 1 (for the example on the top):

      arrow
        |
        |of
        |
       son
        |
        |of
        |
     Dasarath
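The head rule for English 'of' phrases is simple enough to apply mechanically. The sketch below assumes a nested-tuple encoding of NP_1, NP_2 and NP_3 and two hypothetical helpers, `head` and `modifier_chain`; it only covers the 'noun of NP' pattern discussed here.

```python
# NP -> n 'of' NP  is written (noun, "of", inner_np);
# NP -> n-proper   is written (noun,).
np_1 = ("arrow", "of", ("son", "of", ("Dasarath",)))


def head(np):
    """For English NPs with 'of', the noun on the left is the head."""
    return np[0]


def modifier_chain(np):
    """Return (modified, label, modifier) edges from the top head downwards."""
    edges = []
    while len(np) == 3:
        noun, prep, inner = np
        edges.append((noun, prep, head(inner)))
        np = inner
    return edges


print(head(np_1))
for modified, label, modifier in modifier_chain(np_1):
    print(f"{modified} --{label}--> {modifier}")
```

The edges printed at the end correspond to the arrow / son / Dasarath chain in modifier-modified tree 1.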

SLIDE 12
  • Example sentence 2: 'Ram saw Mohan'
    * Phrase structure tree:

              S
              |
       .-------------.
       |             |
       NP            VP
       |             |
   n-proper      .-------.
       |         |       |
      Ram        v       NP
                 |       |
                saw   n-proper
                         |
                       Mohan

    * Modifier-modified tree:

      saw
       |
    .-----.
    |k1   |k2
    |     |
   Ram  Mohan

  • Rules can be prepared for converting from ps-tree to mm-tree (later)
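The slides defer the actual conversion rules, so the following is only a guess at what the simplest such rules might look like for this one example: the NP directly under S is labelled k1, the NP under VP is labelled k2, and the verb becomes the root. The function name `to_mm` and the encoding are hypothetical.

```python
# Phrase structure tree for 'Ram saw Mohan', with pre-terminals collapsed
# to (label, word) pairs for brevity.
ps_tree = ("S", [("NP", "Ram"), ("VP", [("v", "saw"), ("NP", "Mohan")])])


def to_mm(s_node):
    """Assumed rules: NP under S -> k1, NP under VP -> k2, verb -> root."""
    _s, (subject, vp) = s_node
    _vp, (verb, obj) = vp
    return {"head": verb[1], "k1": subject[1], "k2": obj[1]}


print(to_mm(ps_tree))
```

Real rules would have to cover many more configurations (PPs, optional objects, passives); this sketch handles only the S -> NP VP, VP -> v NP shape.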

SLIDE 13

CFG: Example PS Trees Indicating Meaning

  • Ex.1: The child plucked a flower in the garden.

                  S
                  |
       .---------------------.
       |                     |
       NP                    VP
       |                     |
    .-----.      .-----------------------.
    |     |      |         |             |
   det    n      v         NP            PP
    |     |      |         |             |
   the  child plucked   .-----.      .-------.
                        |     |      |       |
                       det    n     prep     NP
                        |     |      |       |
                        a   flower   in   .-----.
                                          |     |
                                         det    n
                                          |     |
                                         the  garden

    * Note that PP is a child of VP, indicating that PP denotes a participant in the action of plucking (denoted by VP)

  • Ex.2: The child plucked a flower in the pot
  • Where the above means that the flower is in the pot (the child is not in the pot, and plucking did not take place in the pot).
  • Thus 'in the pot' is related to flower directly, but not directly to plucking.

                S
                |
       .-----------------.
       |                 |
       NP                VP
       |                 |
    .-----.      .---------------.
    |     |      |               |
   det    n      v               NP
    |     |      |               |
   the  child plucked      .-------------.
                           |             |
                           NP            PP
                           |             |
                        .-----.      .-------.
                        |     |      |       |
                       det    n     prep     NP
                        |     |      |       |
                        a   flower   in   .-----.
                                          |     |
                                         det    n
                                          |     |
                                         the   pot

SLIDE 14

    * Note PP is a child of NP in the above.
      - Indicates that PP is directly related to (or modifies) NP, and not the VP.
  • Augment the toy grammar with the following rule:

        NP -> NP PP

NP and PP Phrase Structure

  • raama kii miThaaii kii duukaana
    Ram   's  sweet    's  shop
  • The above has two possible meanings. The head is the same: duukaana
    1 First meaning - ( (Ram's sweet)'s shop)
      * Ex. Shop of Ram's sweet
      * i.e. Shop which sells Ram's sweet
    2 Second meaning - ( Ram's (sweet's shop))
      * Ex. Sweet shop of Ram
      * i.e. Sweet shop which belongs to Ram
  • Structures Representing the Above Meaning

    1 First structure - ( (Ram's sweet)'s shop)
      * Phrase structure tree

                    NP
                    |
             .--------------.
             |              |
             PP             NP
             |              |
        .---------.      duukaana
        |         |
        NP       prep
        |         |
    .-------.    kii
    |       |
    PP      NP
    |       |
  .----. miThaaii
  |    |
  NP  prep
  |    |
raama kii

      * Dependency tree

      duukaana
         |
      miThaaii
         |
       raama

SLIDE 15

    2 Second structure - ( Ram's (sweet's shop))
      * Phrase structure tree

                NP
                |
      .-------------------.
      |                   |
      PP                  NP
      |                   |
   .-----.          .-----------.
   |     |          |           |
   NP   prep        PP          NP
   |     |          |           |
 raama  kii     .-------.    duukaana
                |       |
                NP     prep
                |       |
            miThaaii   kii

      * Dependency tree

       duukaana
           |
       .-------.
       |       |
     raama  miThaaii
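The two dependency trees differ only because the bracketing differs; the head rule stays the same. As a rough sketch, assuming that in these Hindi 'kii' phrases the right-hand part of every bracket supplies the head (which agrees with duukaana being the head of both readings), the edges can be derived as follows. The pair encoding and the helper names are hypothetical, and 'kii' is left implicit.

```python
# A bracketing is a word, or a pair (modifier_part, head_part).
first = (("raama", "miThaaii"), "duukaana")    # ((Ram's sweet)'s shop)
second = ("raama", ("miThaaii", "duukaana"))   # (Ram's (sweet's shop))


def head(bracket):
    """The head of a pair is the head of its right-hand part."""
    return bracket if isinstance(bracket, str) else head(bracket[1])


def dependencies(bracket):
    """Return (modifier, modified) edges implied by the bracketing."""
    if isinstance(bracket, str):
        return []
    left, right = bracket
    return dependencies(left) + dependencies(right) + [(head(left), head(right))]


print(head(first), dependencies(first))
print(head(second), dependencies(second))
```

In the first reading raama modifies miThaaii, while in the second both raama and miThaaii attach directly to duukaana.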

  • Above Phrase Structures with Heads Marked (by ~)

    1 Phrase structure - ( (Ram's sweet)'s shop)

                            NP~duukaana
                                 |
                      .---------------------.
                      |                     |
                 PP~miThaaii                NP
                      |                     |
               .--------------.          duukaana
               |              |
          NP~miThaaii        prep
               |              |
        .-------------.      kii
        |             |
    PP~raama          NP
        |             |
    .-------.     miThaaii
    |       |
    NP     prep
    |       |
  raama    kii

SLIDE 16

    2 Phrase structure - ( Ram's (sweet's shop))

              NP~duukaana
                   |
       .------------------------.
       |                        |
   PP~raama                NP~duukaana
       |                        |
   .-------.            .---------------.
   |       |            |               |
   NP     prep     PP~miThaaii          NP
   |       |            |               |
 raama    kii       .-------.        duukaana
                    |       |
                    NP     prep
                    |       |
                miThaaii   kii

CFG: Adverb

  • To handle adverbs which come at the end, augment the VP rule with an optional 'adv':

        VP -> v [NP] [NP] PP* [adv]

    + Ex. The child plucked a flower quickly.

                 S
                 |
       .--------------------.
       |                    |
       NP                   VP
       |                    |
    .-----.      .---------------------.
    |     |      |         |           |
   det    n      v         NP         adv
    |     |      |         |           |
   the  child plucked   .-----.     quickly
                        |     |
                       det    n
                        |     |
                        a   flower

  • To handle proper nouns and pronouns, add the rule

        NP -> proper-noun | pronoun

SLIDE 17

Relational Structure

  • Constituent Structure
  • Example sentence:

    The child has plucked a flower in the mango garden.
    --------- ----------- -------- -------------------
       NG1        VG        NG2      NG3 (with prep)

  • Different ways of combining the above units into larger constituents
  • How do we decide?
  • Constituent structure of the room

                                   room
                                     |
         .-------------------------------------------------------.
         |          |                   |             |          |
       walls      floor              ceiling       tables     chairs
         |                              |
  .--------------.          .---------------------.
  |    |    |    |          |           |         |
  W1   W2   W3   W4   ceiling-slab  tubelight    fan

  • Is it a good way to define constituents?
    . Good for inventory, not for teaching (or function)
  • Relational structure - Take basic units, connect them with labelled arcs
    * Nodes - Basic units
    * Edges - between related nodes
    * Labels on edge - Indicating type of relationship
  • Relational structure - Ex Room

                   /--------------> chair 2
    visibility     |
    Black board -------> table
                   |
                   \--------------> chair 1     adjacency

  • Relational structure for sentence above:

             VG
              |
       .-------------.
       |subj  |obj   |place
       |      |      |
      NG1    NG2    NG3

    Labels on edges represent grammatical relations.

SLIDE 18
  • With thematic and karaka roles

             VG
              |
       .-------------.
       |agnt  |theme |akshar
       |      |      |
      NG1    NG2    NG3

    Labels on edges represent semantic relations.

  • With karaka level

             VG
              |
       .-------------.
       |k1    |k2    |k7p
       |      |      |
      NG1    NG2    NG3
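A relational structure of this kind is just a set of nodes joined by labelled edges, so it can be stored as a list of triples. The sketch below uses the grammatical and karaka labels from the diagrams; the triple encoding and the helpers `dependents` and `skeleton` are assumptions made for illustration.

```python
# Each edge is (head, label, dependent); node names follow the slides.
grammatical = [("VG", "subj", "NG1"), ("VG", "obj", "NG2"), ("VG", "place", "NG3")]
karaka = [("VG", "k1", "NG1"), ("VG", "k2", "NG2"), ("VG", "k7p", "NG3")]


def dependents(edges, head):
    """Map each edge label leaving `head` to the node it points to."""
    return {label: dep for h, label, dep in edges if h == head}


def skeleton(edges):
    """Drop the labels, keeping only which nodes are connected."""
    return {(h, dep) for h, _label, dep in edges}


print(dependents(grammatical, "VG"))
print(dependents(karaka, "VG"))
print(skeleton(grammatical) == skeleton(karaka))
```

For this sentence the two analyses connect the same nodes and differ only in the labels on the edges.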