IASNLP 2012
Introduction to NLP
- Prof. Rajeev Sangal
7/5/2012
IIIT-H Advanced Summer School on Natural Language Processing
23 May - 4 June 2011, IIIT Hyderabad
========================================================
INTRODUCTION
========================================================
NLP : General
* How do ideas get transferred?
* User gives keywords, machine retrieves relevant documents.
* Machine goes through text, fills in a template
* An "NLP programmer" has to set up the template
* Goes beyond IR or IE
* Understanding required
* Sentential analyzer in source language
* Bilingual dictionary etc.
* Sentential generator in target language
* Expectation
* Focus
* Topic
NLP: Block Diagram
* Done in layers. Output of each layer successively makes meaning more explicit (i.e., closer to meaning or gives a representation which machine can handle more easily)
etc.
grammatical attachment, word sense
and proper structuring of the data,
How do we analyze language? Consider the sentence:
+ [[ Children ]] (( are watching )) [[ some programmes ]] [[ on television ]] [[ in the house ]]
* Noun chunks (NP, PP) in square brackets
* Verb chunks (VG) in parentheses
1       ((      NP
1.1     children
        ))
2       ((      VG
2.1     are
2.2     watching
        ))
3       ((      NP
3.1     some
3.2     programmes
        ))
4       ((      PP
4.1     on
4.2     ((
4.2.1   television
        ))
        ))
5       ((      PP
5.1     in
5.2     ((
5.2.1   the
5.2.2   house
        ))
        ))
Part-of-speech Tagging
[[ Children_NNS ]] (( are_VBP watching_VBG )) [[ some_DT programmes_NNS ]] [[ on_IN television_NN ]] [[ in_IN the_DT house_NN ]]
A part-of-speech (POS) tag is attached to each word. 'NNS' stands for plural common noun.
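The bracketed notation above is regular enough to read mechanically. The following is a minimal Python sketch, not part of the original slides, that splits such a line into chunks of (word, tag) pairs. The function name `parse_chunks` and the chunk labels "noun" and "verb" are assumptions made for illustration.

```python
import re

def parse_chunks(text):
    """Split '[[ w_TAG ... ]]' (noun chunks) and '(( w_TAG ... ))' (verb chunks)."""
    chunks = []
    # Each match fills exactly one of the two groups: [[ ... ]] or (( ... )).
    for noun, verb in re.findall(r"\[\[(.*?)\]\]|\(\((.*?)\)\)", text):
        kind = "noun" if noun else "verb"
        body = noun or verb
        # rsplit on the last underscore separates the word from its POS tag.
        words = [tuple(token.rsplit("_", 1)) for token in body.split()]
        chunks.append((kind, words))
    return chunks

sentence = ("[[ Children_NNS ]] (( are_VBP watching_VBG )) "
            "[[ some_DT programmes_NNS ]] [[ on_IN television_NN ]] "
            "[[ in_IN the_DT house_NN ]]")
chunks = parse_chunks(sentence)
for kind, words in chunks:
    print(kind, words)
```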
1       ((              NP
1.1     children        NNS
        ))
2       ((              VG
2.1     are             VBP
2.2     watching        VBG
        ))
3       ((              NP
3.1     some            DT
3.2     programmes      NNS
        ))
4       ((              PP
4.1     on              IN
4.1.1   ((              NP
4.1.2   television      NN
        ))
        ))
5       ((              PP
5.1     in              IN
5.2     ((              NP
5.2.1   the             DT
5.2.2   house           NN
        ))
        ))
Morphological Analysis
Children     < fs af=child,n,m,p,3,0,,>
are          < fs af=are,n,m,s,3,0,,>|< fs af=be,v,m,p,3,0,,>
watching     < fs af=watch,v,m,s,3,0,,/aspect='PROG'>
some         < fs af=some,D,m,s,3,0,,>|< fs af=some,det,m,s,3,0,,>|< fs af=some,P,m,p,3,0,,>
programmes   < fs af=programme,n,m,p,3,0,,>|< fs af=programme,v,m,s,3,0,,/tense='PRES'>
on           < fs af=on,p,m,s,3,0,,>|< fs af=on,n,m,s,3,0,,>|< fs af=on,adj,m,s,3,0,,>|< fs af=on,D,m,s,3,0,,>|< fs af=on,p,m,s,3,0,,>
television   < fs af=television,n,m,s,3,0,,>
in           < fs af=in,p,m,s,3,0,,>|< fs af=in,D,m,s,3,0,,>|< fs af=in,p,m,s,3,0,,>
the          < fs af=the,det,m,s,3,0,,>
house        < fs af=house,n,m,s,3,0,,>|< fs af=house,v,m,s,3,0,,>
Shakti Standard Format
1       ((              NP
1.1     children        NNS     < fs af='child,n,m,p,3,0,,'>
        ))
2       ((              VG
2.1     are             VBP     < fs af=are,n,m,s,3,0,,>|< fs af=be,v,m,p,3,0,,>
2.2     watching        VBG     < fs af=watch,v,m,s,3,0,,/aspect='PROG'>
        ))
3       ((              NP
3.1     some            DT      < fs af=some,D,m,s,3,0,,>|< fs af=some,det,m,s,3,0,,>|< fs af=some,P,m,p,3,0,,>
3.2     programmes      NNS     < fs af=programme,n,m,p,3,0,,>|< fs af=programme,v,m,s,3,0,,/tense='PRES'>
        ))
4       ((              PP
4.1     on              IN      < fs af=on,p,m,s,3,0,,>|< fs af=on,n,m,s,3,0,,>|< fs af=on,adj,m,s,3,0,,>|< fs af=on,D,m,s,3,0,,>|< fs af=on,p,m,s,3,0,,>
4.1.1   ((              NP
4.1.2   television      NN      < fs af=television,n,m,s,3,0,,>
        ))
        ))
5       ((              PP
5.1     in              IN      < fs af=in,p,m,s,3,0,,>|< fs af=in,D,m,s,3,0,,>|< fs af=in,p,m,s,3,0,,>
5.2     ((              NP
5.2.1   the             DT      < fs af=the,det,m,s,3,0,,>
5.2.2   house           NN      < fs af=house,n,m,s,3,0,,>|< fs af=house,v,m,s,3,0,,>
        ))
        ))

Fields of af, in order (shown for 'children'):
    af = 'child,  n,    m,       p,       3,     0,    ,'
          root    cat   gender   number   pers   case
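To make the feature structures above easier to inspect, here is a small Python sketch, not part of the original slides, that turns one `af=` value into named fields. It assumes the field order root, cat, gender, number, pers, case. The trailing fields are ignored, and the names `parse_af` and `parse_analyses` are hypothetical.

```python
import re

# Assumed field order for the comma-separated af value.
AF_FIELDS = ("root", "cat", "gender", "number", "pers", "case")

def parse_af(fs):
    """Turn one '< fs af=...>' string into a dict of named fields."""
    # Capture everything after af= (and an optional quote) up to a quote, '>', '/' or '|'.
    match = re.search(r"af='?([^'>/|]*)", fs)
    if match is None:
        raise ValueError("no af attribute found")
    values = [value.strip() for value in match.group(1).split(",")]
    # zip stops after the six named fields; the trailing empty fields are dropped.
    return dict(zip(AF_FIELDS, values))

def parse_analyses(text):
    """Split alternative analyses on '>|<' and parse each one."""
    return [parse_af(part) for part in text.split(">|<")]

child = parse_af("< fs af='child,n,m,p,3,0,,'>")
house = parse_analyses("< fs af=house,n,m,s,3,0,,>|< fs af=house,v,m,s,3,0,,>")
print(child)
print([analysis["cat"] for analysis in house])
```

A word with several analyses, such as 'house', yields one dict per alternative, which is the ambiguity that later layers have to resolve.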
Layers of Processing
The child has plucked a flower in the mango garden.
      n         v           n    x             n
+ Ex. POS tagging using dictionary lookup

After marking POS tags, mark the phrases/chunks and their types, such as NG (noun group) and VG (verb group):
(The child) (has plucked) (a flower) (in the mango garden)
      n           vm          n       x             n
     NG           VG          NG              NG
+ Ex. 'The' applies to 'child', making it definite.
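POS tagging by dictionary lookup can be sketched in a few lines of Python. This example is not part of the original slides. The toy lexicon and the tags 'det', 'adj' and 'unk' are assumptions, while 'n', 'v' and 'x' follow the tags used above. The lookup returns every candidate tag, so ambiguous words such as 'mango' are left for a later layer to resolve.

```python
# Hypothetical toy lexicon: each word maps to every tag it could take.
LEXICON = {
    "the": ["det"], "a": ["det"],
    "child": ["n"], "flower": ["n"], "garden": ["n"],
    "has": ["v"], "plucked": ["v"],
    "in": ["x"],
    "mango": ["adj", "n"],  # ambiguous: adjective or noun
}

def lookup_tags(sentence):
    """Return (word, candidate_tags) pairs; unknown words get ['unk']."""
    words = sentence.lower().rstrip(".").split()
    return [(word, LEXICON.get(word, ["unk"])) for word in words]

tagged = lookup_tags("The child has plucked a flower in the mango garden.")
ambiguous = [word for word, tags in tagged if len(tags) > 1]
print(tagged)
print("ambiguous:", ambiguous)
```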
+ Ex. Is 'mango' (1) adjective or (2) noun?
If option (1) is taken, calling 'mango' an adjective, task T1 becomes harder and T2 becomes easier:

               (1)          (2)
  Mango =   adjective      noun
  T1          hard         easy
  T2          easy         hard

But sometimes, one does T2 and comes back to complete T1.
========================================================
CONTEXT FREE GRAMMAR
========================================================
CONTEXT-FREE GRAMMAR FOR ENGLISH
Context free grammar or (a restricted) phrase structure grammar
+ An example grammar is given below. The first rule says that a sentence (S) consists of a noun phrase (NP) and a verb phrase (VP):
where:
+ NP: noun phrase (Ex. the red block, a sharp arrow)
+ VP: verb phrase (Ex. lifted the red block, fired an arrow)
+ PP: preposition phrase (Ex. with hands, at the deer)
The boy fired the arrow.

            S
            |
      .----------.
      |          |
      NP         VP
      |          |
    .---.    .-------.
    |   |    |       |
   det  n    v       NP
    |   |    |       |
   The boy fired  .----.
                  |    |
                 det   n
                  |    |
                 the arrow
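A tree like the one above can be produced by a very small top-down parser. The Python sketch below is not part of the original slides. Its three rules (S -> NP VP, NP -> det n, VP -> v NP) are read off the tree, and the lexicon and function names are assumptions. It handles only grammars without left recursion, so a rule such as NP -> NP PP would need a different strategy.

```python
# Assumed toy grammar and lexicon, read off the example tree.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["det", "n"]],
    "VP": [["v", "NP"]],
}
LEXICON = {"the": "det", "boy": "n", "arrow": "n", "fired": "v"}

def parse(symbol, words, start):
    """Yield (tree, next_position) for each way `symbol` can begin at `start`."""
    if symbol not in GRAMMAR:  # word category such as det, n, v
        if start < len(words) and LEXICON.get(words[start]) == symbol:
            yield (symbol, words[start]), start + 1
        return
    for rhs in GRAMMAR[symbol]:
        for children, end in match_sequence(rhs, words, start):
            yield (symbol, *children), end

def match_sequence(symbols, words, start):
    """Yield (list_of_trees, next_position) for a sequence of symbols."""
    if not symbols:
        yield [], start
        return
    for tree, mid in parse(symbols[0], words, start):
        for rest, end in match_sequence(symbols[1:], words, mid):
            yield [tree] + rest, end

def parse_sentence(sentence):
    """Return every complete S tree covering the whole sentence."""
    words = sentence.lower().rstrip(".").split()
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]

trees = parse_sentence("The boy fired the arrow.")
print(trees)
```

Each tree is a nested tuple whose first element is the mother node and whose remaining elements are its daughters.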
PHRASE STRUCTURE TREE
+ Ex: 'The child saw Mohan' from S

             S
             |
       .-----------.
       |           |
       NP          VP
       |           |
     .----.     .------.
     |    |     |      |
    det   n     v      NP
     |    |     |      |
    the child  saw  n-proper
                       |
                     Mohan

+ Ex: 'the child'

      NP
      |
    .----.
    |    |
   det   n
    |    |
   the child
+ Ex.1: Groupings
* Put them together in VP.
+ Ex.2: Groupings:
* They are put together in S.
+ Ex. NP is mother node, and 'det' and 'n' are daughter (or children) nodes.
PHRASE STRUCTURE TREE to MEANING
* Phrase structure tree 1:

          NP_1
           |
    .--------------.
    |      |       |
    n      of     NP_2
    |              |
  arrow       .----------.
              |    |     |
              n    of   NP_3
              |          |
             son      n-proper
                         |
                      Dasarath
+ Consider Ex. The arrow of son of Dasarath is sharp.
  . Ans: Arrow. Therefore, 'arrow' is the head of NP_1
+ Consider Ex. The son of Dasarath is sincere.
  . Ans: Son. Therefore, 'son' is the head of NP_2
* Head of a phrase is determined by rules of the language:
the left is the head.
 arrow
   |
   |of
   |
  son
   |
   |of
   |
Dasarath
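The step from a phrase structure tree to the modifier-modified chain can be sketched with a single head rule. This Python example is not part of the original slides. It assumes that the head of an NP is its leftmost noun daughter and that the word immediately before an embedded NP (here 'of') labels the edge. The tuple encoding and function names are hypothetical.

```python
# 'arrow of son of Dasarath' as nested (category, daughters...) tuples.
NP_1 = ("NP", ("n", "arrow"), ("of", "of"),
        ("NP", ("n", "son"), ("of", "of"),
         ("NP", ("n-proper", "Dasarath"))))

def head_word(tree):
    """Head of an NP: its leftmost noun daughter (assumed head rule)."""
    for child in tree[1:]:
        if child[0] in ("n", "n-proper"):
            return child[1]
    raise ValueError("NP without a noun daughter")

def modifier_edges(tree):
    """Return (head, label, modifier) triples for nested 'n of NP' phrases."""
    edges = []
    head = head_word(tree)
    children = tree[1:]
    for index, child in enumerate(children):
        if child[0] == "NP":
            # The word just before the embedded NP labels the edge.
            label = children[index - 1][1] if index > 0 else ""
            edges.append((head, label, head_word(child)))
            edges.extend(modifier_edges(child))
    return edges

edges = modifier_edges(NP_1)
print(edges)
```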
* Phrase structure tree:

            S
            |
      .----------.
      |          |
      NP         VP
      |          |
   n-proper   .------.
      |       |      |
     Ram      v      NP
              |      |
             saw  n-proper
                     |
                   Mohan

* Modifier-modified tree:

       saw
        |
     .-----.
     |k1   |k2
     |     |
    Ram  Mohan
(later)
CFG: Example PS Trees Indicating Meaning
                       S
                       |
          .-------------------------.
          |                         |
          NP                        VP
          |                         |
       .-----.         .------------------------.
       |     |         |            |           |
      det    n         v            NP          PP
       |     |         |            |           |
       |     |         |         .-----.     .------.
       |     |         |         |     |     |      |
      the  child    plucked     det    n    prep    NP
                                 |     |     |      |
                                 a   flower  in  .-----.
                                                 |     |
                                                det    n
                                                 |     |
                                                the  garden

* Note that PP is a child of VP, indicating that PP denotes a participant in the action of plucking (denoted by VP)
(child is not in the pot, and plucking did not take place in the pot).
but not directly to plucking.

                    S
                    |
          .--------------------.
          |                    |
          NP                   VP
          |                    |
       .-----.         .----------------.
       |     |         |                |
      det    n         v                NP
       |     |         |                |
       |     |         |          .-----------.
       |     |         |          |           |
      the  child    plucked       NP          PP
                                  |           |
                               .-----.     .------.
                               |     |     |      |
                              det    n    prep    NP
                               |     |     |      |
                               a   flower  in  .-----.
                                               |     |
                                              det    n
                                               |     |
                                              the   pot
* Note PP is a child of NP in the above.
(or modifies) NP, and not the VP.
NP -> NP PP

NP and PP Phrase Structure
Ram 's sweet 's shop
The head is the same: duukaana
1 First meaning - ( (Ram's sweet)'s shop)
2 Second meaning - ( Ram's (sweet's shop))
1 First structure - ( (Ram's sweet)'s shop)
* Phrase structure tree

                         NP
                         |
                  .--------------.
                  |              |
                  PP             NP
                  |              |
           .-------------.    duukaana
           |             |
           NP           prep
           |             |
     .-----------.      kii
     |           |
     PP          NP
     |           |
  .------.   miiThaaii
  |      |
  NP    prep
  |      |
raama   kii

* Dependency tree

  duukaana
      |
  miiThaaii
      |
    raama
2 Second structure - ( Ram's (sweet's shop))
* Phrase structure tree

                NP
                |
      .--------------------.
      |                    |
      PP                   NP
      |                    |
   .-----.           .-----------.
   |     |           |           |
   NP   prep         PP          NP
   |     |           |           |
 raama  kii      .-------.    duukaana
                 |       |
                 NP     prep
                 |       |
             miiThaaii  kii

* Dependency tree

      duukaana
         |
     .-------.
     |       |
   raama  miiThaaii
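The two structures correspond to the two ways of bracketing a three-noun chain. The Python sketch below, not part of the original slides, enumerates the binary bracketings and reads off modifier-head pairs. It assumes that the head of each bracket is the head of its right part, as in both trees above, and the function names are hypothetical.

```python
def bracketings(words):
    """All binary bracketings of a noun sequence, as nested tuples."""
    if len(words) == 1:
        return [words[0]]
    result = []
    for split in range(1, len(words)):
        for left in bracketings(words[:split]):
            for right in bracketings(words[split:]):
                result.append((left, right))
    return result

def head(tree):
    """Head of a bracket: the head of its right part (assumed head-final)."""
    return tree if isinstance(tree, str) else head(tree[1])

def dependencies(tree):
    """(modifier, head) pairs: the left part's head modifies the right part's head."""
    if isinstance(tree, str):
        return []
    left, right = tree
    return dependencies(left) + dependencies(right) + [(head(left), head(right))]

nouns = ["raama", "miiThaaii", "duukaana"]
readings = bracketings(nouns)
for tree in readings:
    print(tree, dependencies(tree))
```

Both readings have 'duukaana' as head; they differ only in whether 'raama' modifies 'miiThaaii' or 'duukaana'.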
1 Phrase structure - ( (Ram's sweet)'s shop)

                          NP~duukaana
                               |
                        .--------------.
                        |              |
                  PP~miiThaaii         NP
                        |              |
                 .-------------.    duukaana
                 |             |
           NP~miiThaaii       prep
                 |             |
           .-----------.      kii
           |           |
       PP~raama        NP
           |           |
        .------.   miiThaaii
        |      |
        NP    prep
        |      |
      raama   kii
2 Phrase structure - ( Ram's (sweet's shop))

               NP~duukaana
                    |
          .--------------------.
          |                    |
      PP~raama            NP~duukaana
          |                    |
       .-----.           .-----------.
       |     |           |           |
       NP   prep   PP~miiThaaii      NP
       |     |           |           |
     raama  kii      .-------.    duukaana
                     |       |
                     NP     prep
                     |       |
                 miiThaaii  kii

CFG: Adverb
with optional 'adv':
    VP -> v [NP] [NP] PP* [adv]
+ Ex. The child plucked a flower quickly.

                      S
                      |
          .-----------------------.
          |                       |
          NP                      VP
          |                       |
       .-----.         .---------------------.
       |     |         |          |          |
      det    n         v          NP        adv
       |     |         |          |          |
       |     |         |       .-----.       |
       |     |         |       |     |       |
      the  child    plucked   det    n    quickly
                               |     |
                               a   flower
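The VP rule with optional and repeated parts behaves like a regular expression over category names. As a minimal Python sketch, not part of the original slides, the check below accepts exactly the daughter sequences the rule allows. The name `is_vp` and the space-separated encoding are assumptions.

```python
import re

# VP -> v [NP] [NP] PP* [adv], written over space-separated category names.
VP_PATTERN = re.compile(r"v( NP)?( NP)?( PP)*( adv)?")

def is_vp(categories):
    """True if the category sequence is a valid expansion of VP."""
    return VP_PATTERN.fullmatch(" ".join(categories)) is not None

print(is_vp(["v", "NP", "adv"]))   # plucked / a flower / quickly
print(is_vp(["v", "adv", "NP"]))   # adverb before the object
```

The first call matches the example sentence above. The second sequence is rejected because the rule places 'adv' last.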
NP -> proper-noun | pronoun
Relational Structure
The child has plucked a flower in the mango garden.
NG1 VG NG2 NG3 (with prep)
                                room
                                  |
      .-------------------------------------------------------.
      |             |             |             |             |
    walls         floor        ceiling        tables        chairs
      |                           |
.-----------.           .-------------------.
|   |   |   |           |           |       |
W1  W2  W3  W4    ceiling-slab  tubelight  fan
. Good for inventory, not for teaching (or function)
              /--------------> chair 2      visibility
              |
Black board -------> table
              |
              \--------------> chair 1      adjacency
            VG
            |
      .------------.
      |subj  |obj  |place
      |      |     |
     NG1    NG2   NG3

Labels on edges represent grammatical relations.

            VG
            |
      .-------------.
      |agnt  |theme |akshar
      |      |      |
     NG1    NG2    NG3

Labels on edges represent semantic relations.

            VG
            |
      .------------.
      |k1    |k2   |k7p
      |      |     |
     NG1    NG2   NG3
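The relational structures above share the same nodes and edges and differ only in the edge labels. A minimal Python sketch, not part of the original slides, stores the structure once and swaps the label set. The dictionary layout and the scheme names 'grammatical' and 'k-labels' are assumptions, and only two of the three label sets are included.

```python
# Chunks of 'The child has plucked a flower in the mango garden.'
NODES = {
    "VG": "has plucked",
    "NG1": "The child",
    "NG2": "a flower",
    "NG3": "in the mango garden",
}

# Same edges (VG -> NG1, NG2, NG3), different label sets.
LABELINGS = {
    "grammatical": {"NG1": "subj", "NG2": "obj", "NG3": "place"},
    "k-labels": {"NG1": "k1", "NG2": "k2", "NG3": "k7p"},
}

def edges(scheme):
    """Return (head, label, dependent) triples under the chosen label set."""
    return [("VG", label, group) for group, label in LABELINGS[scheme].items()]

for scheme in LABELINGS:
    for head, label, dependent in edges(scheme):
        print(f"{scheme}: {NODES[head]} --{label}--> {NODES[dependent]}")
```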