Introduction to NLP, Prof. Rajeev Sangal, 7/5/2012


  1. IASNLP 2012 Introduction to NLP Prof. Rajeev Sangal 7/5/2012

  2. IIIT-H Advanced Summer School on Natural Language Processing 23 May - 4 June 2011 IIIT Hyderabad

  3. ========================================================
     INTRODUCTION
     ========================================================
     NLP: General
     o Language: a unique ability of humans
     o Language for communication
       - Major function of language: communication
         * How do ideas get transferred?
     o Applications
       - Information retrieval
         * User gives keywords, machine retrieves relevant documents
         * Uses the structure of the web for ranking
       - Information extraction
         * Machine goes through text and fills in a template
         * An "NLP programmer" has to set up the template
       - Question answering
         * Goes beyond IR or IE
         * Understanding required
       - Machine translation
         * Sentential analyzer in source language
         * Bilingual dictionary, etc.
         * Sentential generator in target language
       - Dialogue systems
         * Expectation
         * Focus
         * Topic
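
The information-retrieval bullet ("user gives keywords, machine retrieves relevant documents") can be illustrated with a minimal keyword-ranking sketch. It scores each document with TF-IDF, a standard weighting picked here only for illustration (the slide names no scoring method), and it ignores the web link structure the slide also mentions. The three documents and the function names `tokenize` and `rank` are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def rank(query, docs):
    """Return indices of documents matching the query keywords, best TF-IDF score first."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    scored = []
    for i, c in enumerate(counts):
        score = 0.0
        for kw in set(tokenize(query)):
            df = sum(1 for other in counts if kw in other)  # documents containing kw
            if df:
                score += c[kw] * math.log(n / df)  # term frequency x inverse document frequency
        if score > 0:
            scored.append((-score, i))  # negate so that sorting puts the best score first
    return [i for _, i in sorted(scored)]

# Hypothetical three-document collection.
docs = [
    "Children are watching some programmes on television.",
    "The child has plucked a flower in the mango garden.",
    "Television programmes for children, and more television.",
]
print(rank("television", docs))    # [2, 0]
print(rank("mango garden", docs))  # [1]
```

Document 2 outranks document 0 for "television" only because the word occurs there twice; a keyword that occurs in every document gets an inverse document frequency of log(1) = 0 and contributes nothing.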

  4. NLP: Block Diagram
     o Form & meaning
       - Sentences -> Meaning
         * Done in layers. The output of each layer successively makes meaning more explicit (i.e., closer to meaning, or a representation which a machine can handle more easily)
       - How can meaning be represented?
     o Techniques for "extracting meaning"
       - Word analysis
       - Phrasal analysis
       - Sentential analysis, etc.
       - Statistical analysis: tagging, grammatical attachment, word sense
     o Modules
       - Morphological analyzer
       - Chunker / Part-of-speech tagger
       - Sentence parser
       - Semantic processor
       - Pragmatics processor
     o Difference between:
       - Processing algorithms (NLP and ML)
       - Rules/data for processing (Computational Linguistics)
       - Understanding language (Linguistics)
     o Without understanding the nature and structure of language, and without properly structuring the data,
       - machine learning is not very effective.
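
The "done in layers" idea can be pictured as a chain of functions, each taking the analysis built so far and returning it with one more level of information added. This is a minimal sketch, assuming a hypothetical three-word-class lexicon and hypothetical layer names (`tokenize`, `morph_analyze`, `pos_tag`); a real morphological analyzer returns several candidate analyses per word, whereas this sketch keeps only one.

```python
from functools import reduce

# Hypothetical miniature lexicon: word -> (root, POS tag).
LEXICON = {
    "children": ("child", "NNS"),
    "are": ("be", "VBP"),
    "watching": ("watch", "VBG"),
    "television": ("television", "NN"),
}

def tokenize(analysis):
    """Layer 0: split the sentence into lowercase tokens."""
    words = analysis["sentence"].lower().rstrip(".").split()
    return {**analysis, "tokens": words}

def morph_analyze(analysis):
    """Layer 1: add the root of each token ('?' marks an unknown word)."""
    return {**analysis, "roots": [LEXICON.get(t, ("?", "?"))[0] for t in analysis["tokens"]]}

def pos_tag(analysis):
    """Layer 2: add a part-of-speech tag for each token."""
    return {**analysis, "tags": [LEXICON.get(t, ("?", "?"))[1] for t in analysis["tokens"]]}

# A chunker, parser, semantic processor, ... would be appended to this list.
LAYERS = [tokenize, morph_analyze, pos_tag]

def run_pipeline(sentence, layers=LAYERS):
    """Feed the output of each layer into the next one, starting from the raw sentence."""
    return reduce(lambda analysis, layer: layer(analysis), layers, {"sentence": sentence})

result = run_pipeline("Children are watching television.")
print(result["roots"])  # ['child', 'be', 'watch', 'television']
print(result["tags"])   # ['NNS', 'VBP', 'VBG', 'NN']
```

Because every layer keeps the keys it received, later modules can still see the earlier, less explicit representations.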

  5. How do we analyze language?
     Consider the sentence:
       - Children are watching some programmes on television in the house
     --------------------------------------------
     Analysis into Chunks
     o What are the "chunks"?
       + [[ Children ]] (( are watching )) [[ some programmes ]] [[ on television ]] [[ in the house ]]
     o Chunks
       * Noun chunks (NP, PP) in square brackets
       * Verb chunks (VG) in parentheses
     o Chunks represent objects
       - Noun chunks represent objects/concepts
       - Verb chunks represent actions
     --------------------------------------------
     1 (( NP 1.1 children ))
     2 (( VG 2.1 are 2.2 watching ))
     3 (( NP 3.1 some 3.2 programmes ))
     4 (( PP 4.1 on 4.2 (( 4.2.1 television )) ))
     5 (( PP 5.1 in 5.2 (( 5.2.1 the 5.2.2 house )) ))
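
The numbered chunk listing can be generated from a plain list of (chunk type, words) pairs. This is a minimal sketch, assuming a hypothetical function `to_numbered` and a tab-separated layout; unlike the slide, it does not reproduce the nested (( )) inside the PP chunks, so each word simply gets the address chunk.position.

```python
def to_numbered(chunks):
    """Render (label, words) chunks in a flat numbered (( ... )) layout."""
    lines = []
    for i, (label, words) in enumerate(chunks, start=1):
        lines.append(f"{i}\t((\t{label}")        # chunk opens with its number and type
        for j, word in enumerate(words, start=1):
            lines.append(f"{i}.{j}\t{word}")      # each word gets the address i.j
        lines.append("\t))")                     # chunk closes
    return "\n".join(lines)

chunks = [
    ("NP", ["children"]),
    ("VG", ["are", "watching"]),
    ("NP", ["some", "programmes"]),
    ("PP", ["on", "television"]),
    ("PP", ["in", "the", "house"]),
]
print(to_numbered(chunks))
```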

  6. Part-of-speech Tagging
     [[ Children_NNS ]] (( are_VBP watching_VBG )) [[ some_DT programmes_NNS ]] [[ on_IN television_NN ]] [[ in_IN the_DT house_NN ]]
     A part-of-speech (POS) tag is attached to each word. 'NNS' stands for plural common noun.
     --------------------------------------------
     Towards Shakti Standard Format
     1 (( NP 1.1 children NNS ))
     2 (( VG 2.1 are VBP 2.2 watching VBG ))
     3 (( NP 3.1 some DT 3.2 programmes NNS ))
     4 (( PP 4.1 on IN 4.2 (( NP 4.2.1 television NN )) ))
     5 (( PP 5.1 in IN 5.2 (( NP 5.2.1 the DT 5.2.2 house NN )) ))
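
The bracketed word_TAG notation is regular enough to read back into a data structure. This is a minimal sketch, assuming a hypothetical function `parse_tagged` that returns each chunk as ("noun" or "verb", list of (word, tag) pairs), following the slide's convention that [[ ]] marks noun chunks and (( )) marks verb chunks.

```python
import re

# '[[ ... ]]' encloses a noun chunk and '(( ... ))' a verb chunk, as on the slide.
CHUNK_RE = re.compile(r"\[\[(.*?)\]\]|\(\((.*?)\)\)")

def parse_tagged(text):
    """Turn bracketed word_TAG text into a list of (chunk kind, [(word, tag), ...])."""
    chunks = []
    for m in CHUNK_RE.finditer(text):
        if m.group(1) is not None:
            kind, body = "noun", m.group(1)
        else:
            kind, body = "verb", m.group(2)
        # Split each token at its last underscore: 'Children_NNS' -> ('Children', 'NNS').
        pairs = [tuple(tok.rsplit("_", 1)) for tok in body.split()]
        chunks.append((kind, pairs))
    return chunks

tagged = ("[[ Children_NNS ]] (( are_VBP watching_VBG )) [[ some_DT programmes_NNS ]] "
          "[[ on_IN television_NN ]] [[ in_IN the_DT house_NN ]]")
for kind, pairs in parse_tagged(tagged):
    print(kind, pairs)
```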

  7. Morphological Analysis
     Children    <fs af=child,n,m,p,3,0,,>
     are         <fs af=are,n,m,s,3,0,,>|<fs af=be,v,m,p,3,0,,>
     watching    <fs af=watch,v,m,s,3,0,,/aspect='PROG'>
     some        <fs af=some,D,m,s,3,0,,>|<fs af=some,det,m,s,3,0,,>|<fs af=some,P,m,p,3,0,,>
     programmes  <fs af=programme,n,m,p,3,0,,>|<fs af=programme,v,m,s,3,0,,/tense='PRES'>
     on          <fs af=on,p,m,s,3,0,,>|<fs af=on,n,m,s,3,0,,>|<fs af=on,adj,m,s,3,0,,>|<fs af=on,D,m,s,3,0,,>|<fs af=on,p,m,s,3,0,,>
     television  <fs af=television,n,m,s,3,0,,>
     in          <fs af=in,p,m,s,3,0,,>|<fs af=in,D,m,s,3,0,,>|<fs af=in,p,m,s,3,0,,>
     the         <fs af=the,det,m,s,3,0,,>
     house       <fs af=house,n,m,s,3,0,,>|<fs af=house,v,m,s,3,0,,>
     Shakti Standard Format
     1 (( NP 1.1 children NNS <fs af='child,n,m,p,3,0,,'> ))
          af fields: root = child, cat = n, gender = m, number = p, person = 3, case = 0
     2 (( VG 2.1 are VBP <fs af=are,n,m,s,3,0,,>|<fs af=be,v,m,p,3,0,,> 2.2 watching VBG <fs af=watch,v,m,s,3,0,,/aspect='PROG'> ))
     3 (( NP 3.1 some DT <fs af=some,D,m,s,3,0,,>|<fs af=some,det,m,s,3,0,,>|<fs af=some,P,m,p,3,0,,> 3.2 programmes NNS <fs af=programme,n,m,p,3,0,,>|<fs af=programme,v,m,s,3,0,,/tense='PRES'> ))
     4 (( PP 4.1 on IN <fs af=on,p,m,s,3,0,,>|<fs af=on,n,m,s,3,0,,>|<fs af=on,adj,m,s,3,0,,>|<fs af=on,D,m,s,3,0,,>|<fs af=on,p,m,s,3,0,,> 4.2 (( NP 4.2.1 television NN <fs af=television,n,m,s,3,0,,> )) ))
     5 (( PP 5.1 in IN <fs af=in,p,m,s,3,0,,>|<fs af=in,D,m,s,3,0,,>|<fs af=in,p,m,s,3,0,,> 5.2 (( NP 5.2.1 the DT <fs af=the,det,m,s,3,0,,> 5.2.2 house NN <fs af=house,n,m,s,3,0,,>|<fs af=house,v,m,s,3,0,,> )) ))
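
Each alternative analysis is an <fs ...> structure whose af value packs several features into one comma-separated string. This is a minimal sketch, assuming a hypothetical function `parse_fs`; the first six field names follow the af legend of this slide, and the last two fields, which are empty in every example here, are kept under the placeholder names field7 and field8.

```python
import re

# Field names for the comma-separated af value; field7/field8 are placeholders.
AF_FIELDS = ["root", "cat", "gender", "number", "person", "case", "field7", "field8"]

FS_RE = re.compile(r"<\s*fs\s+([^>]*)>")      # one <fs ...> analysis
ATTR_RE = re.compile(r"(\w+)='?([^'\s/]*)'?")  # name=value or name='value'

def parse_fs(text):
    """Return one dict per '|'-separated <fs ...> analysis, with the af fields named."""
    analyses = []
    for fs in FS_RE.finditer(text):
        attrs = dict(ATTR_RE.findall(fs.group(1)))
        fields = attrs.pop("af", "").split(",")
        analysis = dict(zip(AF_FIELDS, fields))
        analysis.update(attrs)  # extra features such as aspect or tense
        analyses.append(analysis)
    return analyses

for a in parse_fs("<fs af=are,n,m,s,3,0,,>|<fs af=be,v,m,p,3,0,,>"):
    print(a["root"], a["cat"], a["number"])
print(parse_fs("<fs af=watch,v,m,s,3,0,,/aspect='PROG'>")[0]["aspect"])
```

Keeping every alternative, rather than picking one, mirrors the slide: the morphological analyzer leaves ambiguities such as are (noun) versus be (verb) for later layers to resolve.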

  8. Layers of Processing
     1. T1 - Mark n/v tags
          The child has plucked a flower in the mango garden.
          (child = n, plucked = v, flower = n, in = x, garden = n)
        + Ex. POS tagging using dictionary lookup
     2. T2 - Chunking
        After marking POS tags, mark the phrases/chunks and their types, such as NG (noun group) and VG (verb group):
          (The child)            n      NG
          (has plucked)          vm     VG
          (a flower)             n      NG
          (in the mango garden)  x n    NG
        + Ex. 'The' applies to 'child', making it definite.
        + Ex. 'a' applies to 'flower', making it indefinite.
     3. Putting the above two (T1 and T2) together:
                 sentence
                    |
                    V
                  [T1]
                    |   sentence with POS tags
                    V
                  [T2]
                    |   sentence with POS tags and chunks
                    V
        o Normally, do T1 first, then T2.
          + Ex. Is 'mango' (1) an adjective or (2) a noun?
            If option (1) is taken (calling 'mango' an adjective), task T1 becomes harder and T2 becomes easier:
                        (1) mango = adjective    (2) mango = noun
                  T1    hard                     easy
                  T2    easy                     hard
            In practice, option (2) is chosen.
        o But sometimes one does T2 and then comes back to complete T1.
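
The T1-then-T2 order can be sketched in a few lines. This is a minimal sketch under stated assumptions: the lexicon, the extra tags det and aux, and the left-to-right chunking rules in the hypothetical functions `t1_tag` and `t2_chunk` are illustrative, chosen so that the slide's sentence comes out as (The child) (has plucked) (a flower) (in the mango garden). 'mango' is listed as a noun, i.e. option (2), so T1 stays a plain dictionary lookup and T2 has to group the noun sequence 'mango garden'.

```python
import re

# Hypothetical toy lexicon for T1; n, v and x (preposition) follow the slide,
# det and aux are added for illustration. 'mango' is a noun: option (2).
LEXICON = {
    "the": "det", "a": "det",
    "child": "n", "flower": "n", "mango": "n", "garden": "n",
    "has": "aux", "plucked": "v",
    "in": "x",
}

def t1_tag(sentence):
    """T1: attach a tag to each word by dictionary lookup ('?' if unknown)."""
    words = re.findall(r"[A-Za-z]+", sentence)
    return [(w, LEXICON.get(w.lower(), "?")) for w in words]

def t2_chunk(tagged):
    """T2: group tagged words into NG/VG chunks with simple left-to-right rules."""
    chunks = []  # each chunk: [kind, words, has_noun]
    for word, tag in tagged:
        kind = "VG" if tag in ("aux", "v") else "NG"
        cur = chunks[-1] if chunks else None
        new_chunk = (
            cur is None
            or cur[0] != kind
            or tag == "x"                 # a preposition opens a new noun group
            or (tag == "det" and cur[2])  # so does a determiner after a noun
        )
        if new_chunk:
            cur = [kind, [], False]
            chunks.append(cur)
        cur[1].append(word)
        cur[2] = cur[2] or tag == "n"
    return [(kind, words) for kind, words, _ in chunks]

tagged = t1_tag("The child has plucked a flower in the mango garden.")
for kind, words in t2_chunk(tagged):
    print(kind, "(" + " ".join(words) + ")")
```

Had 'mango' been tagged adj (option 1), the lexicon lookup in T1 would need context to decide, but the noun-group rule in T2 would no longer have to accept two nouns in a row.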

  9. ========================================================
     CONTEXT-FREE GRAMMAR
     ========================================================
     CONTEXT-FREE GRAMMAR FOR ENGLISH
     o WRITING A TOY ENGLISH GRAMMAR: a context-free grammar, or (a restricted) phrase structure grammar
       + An example grammar is given below. The first rule says that a sentence (S) consists of a noun phrase (NP) and a verb phrase (VP):
         - S  -> NP VP
         - NP -> det adj* n
         - NP -> n-proper
         - VP -> v [NP] [NP] PP*
         - PP -> prep NP
       ([X] means X is optional; X* means zero or more occurrences of X.)
       where:
       + NP: noun phrase (Ex. the red block, a sharp arrow)
       + VP: verb phrase (Ex. lifted the red block, fired an arrow)
       + PP: prepositional phrase (Ex. with hands, at the deer)
         - n: noun (Ex. boy, child, arrow, block)
         - n-proper: proper noun (Ex. Ram, Mohan)
         - v: verb (Ex. lift, fire, give)
         - det: determiner (Ex. a, the)
         - adj: adjective (Ex. big, red, sharp)
         - prep: preposition (Ex. with, at)
     o EXAMPLE PHRASE STRUCTURE TREE: The boy fired the arrow.

                      S
                      |
              .-------+-------.
              |               |
              NP              VP
              |               |
           .--+--.       .----+----.
           |     |       |         |
          det    n       v         NP
           |     |       |         |
          The   boy    fired    .--+--.
                                |     |
                               det    n
                                |     |
                               the  arrow
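
The toy grammar is simple enough for a recursive-descent parser with one method per non-terminal: each rule becomes a method, an optional [NP] becomes an if, and PP* becomes a while loop. This is a minimal sketch, assuming a hypothetical lexicon built from the slide's example words (verbs in past-tense form) and hypothetical names `Parser` and `ParseError`. Since an NP always starts with det or n-proper and a PP always starts with prep, one word of lookahead is enough to choose each rule.

```python
import re

class ParseError(Exception):
    """Raised when the words do not fit the toy grammar."""

# Hypothetical lexicon built from the slide's examples.
LEXICON = {
    "a": "det", "an": "det", "the": "det",
    "big": "adj", "red": "adj", "sharp": "adj",
    "boy": "n", "child": "n", "arrow": "n", "block": "n", "deer": "n",
    "ram": "n-proper", "mohan": "n-proper",
    "lifted": "v", "fired": "v", "gave": "v", "saw": "v",
    "with": "prep", "at": "prep",
}

class Parser:
    """Recursive-descent parser: one method per non-terminal of the toy grammar."""

    def __init__(self, sentence):
        self.words = re.findall(r"[A-Za-z]+", sentence)
        self.tags = []
        for w in self.words:
            if w.lower() not in LEXICON:
                raise ParseError(f"unknown word: {w}")
            self.tags.append(LEXICON[w.lower()])
        self.i = 0

    def peek(self):
        """Tag of the next unread word, or None at the end of the sentence."""
        return self.tags[self.i] if self.i < len(self.tags) else None

    def expect(self, tag):
        """Consume the next word if it has the given tag and return it as a leaf."""
        if self.peek() != tag:
            raise ParseError(f"expected {tag} at position {self.i}, found {self.peek()}")
        leaf = (tag, self.words[self.i])
        self.i += 1
        return leaf

    def s(self):
        """S -> NP VP, and the whole sentence must be used up."""
        tree = ("S", self.np(), self.vp())
        if self.i != len(self.words):
            raise ParseError(f"unexpected word: {self.words[self.i]}")
        return tree

    def np(self):
        """NP -> det adj* n | n-proper"""
        if self.peek() == "n-proper":
            return ("NP", self.expect("n-proper"))
        children = [self.expect("det")]
        while self.peek() == "adj":
            children.append(self.expect("adj"))
        children.append(self.expect("n"))
        return ("NP", *children)

    def vp(self):
        """VP -> v [NP] [NP] PP*"""
        children = [self.expect("v")]
        for _ in range(2):  # up to two optional NPs
            if self.peek() in ("det", "n-proper"):
                children.append(self.np())
        while self.peek() == "prep":  # any number of PPs
            children.append(self.pp())
        return ("VP", *children)

    def pp(self):
        """PP -> prep NP"""
        return ("PP", self.expect("prep"), self.np())

print(Parser("The boy fired the arrow.").s())
print(Parser("Ram fired a sharp arrow at the deer").s())
```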

  10. PHRASE STRUCTURE TREE
      o The leaf nodes of the tree, read in left-to-right order, give us the sentence
        + Ex: 'The child saw Mohan' from

                      S
                      |
              .-------+-------.
              |               |
              NP              VP
              |               |
           .--+--.       .----+----.
           |     |       |         |
          det    n       v         NP
           |     |       |         |
          the  child    saw    n-proper
                                   |
                                 Mohan

      o Groups - related elements together
        + Ex: 'the child'

             NP
             |
          .--+--.
          |     |
         det    n
          |     |
         the  child

      o Hierarchy of grouped elements
        + Ex.1: Groupings
          - v (for verb 'saw')
          - NP (for 'Mohan')
            * Put them together in VP.
        + Ex.2: Groupings:
          - NP (for 'the child')
          - VP (for 'saw Mohan')
            * They are put together in S.
      o Terminology: mother and daughter nodes
        + Ex. NP is the mother node, and 'det' and 'n' are its daughter (or children) nodes.
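
Reading the leaves left to right and listing a mother node's daughters are both short recursive functions. This is a minimal sketch, assuming a hypothetical representation in which every node is a tuple (label, child, ...) and a pre-terminal such as ("n", "child") holds a single word; the names `leaves` and `daughters` are illustrative.

```python
def leaves(tree):
    """Return the words at the leaves of a (label, child, ...) tree, left to right."""
    _, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [children[0]]  # pre-terminal such as ("n", "child")
    return [w for child in children for w in leaves(child)]

def daughters(tree):
    """Labels of the daughter nodes of a mother node."""
    return [child[0] for child in tree[1:] if isinstance(child, tuple)]

tree = ("S",
        ("NP", ("det", "the"), ("n", "child")),
        ("VP", ("v", "saw"), ("NP", ("n-proper", "Mohan"))))
print(" ".join(leaves(tree)))  # the child saw Mohan
print(daughters(tree[1]))      # ['det', 'n']
```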

  11. PHRASE STRUCTURE TREE to MEANING
      o Relating a phrase structure tree to a modifier-modified tree
      o Example sentence 1: 'arrow of son of Dasarath'
        * Phrase structure tree 1:

                  NP_1
                  |
          .-------+-------.
          |       |       |
          n       of      NP_2
          |               |
        arrow     .-------+-------.
                  |       |       |
                  n       of      NP_3
                  |               |
                 son          n-proper
                                  |
                              Dasarath

      o Notion of the head of a phrase
        - What is the head of NP_1?
          + Consider the example: The arrow of son of Dasarath is sharp.
            - Who/what is sharp: arrow, son, or Dasarath?
              . Ans: the arrow. Therefore, 'arrow' is the head of NP_1.
          + Consider the example: The son of Dasarath is sincere.
            - Who is sincere: son or Dasarath?
              . Ans: the son. Therefore, 'son' is the head of NP_2.
        * The head of a phrase is determined by the rules of the language:
          - For NPs with 'of' in English, the noun on the left is the head.
      o Modifier-modified tree 1 (for the example above):

          arrow
            |
            | of
            |
           son
            |
            | of
            |
         Dasarath
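
The head rule for English 'of' NPs turns the phrase structure tree into the modifier-modified tree: each embedded NP's head modifies the head of its mother NP. This is a minimal sketch, assuming a hypothetical tuple representation (label, child, ...) in which 'of' is a leaf labelled "of", and hypothetical function names `head` and `modifier_modified`; the relation is fixed to 'of' because that is the only construction on this slide.

```python
def head(np):
    """Head of an NP: its leftmost noun, following the slide's rule for English 'of' NPs."""
    for child in np[1:]:
        if child[0] in ("n", "n-proper"):
            return child[1]
    raise ValueError("NP has no noun")

def modifier_modified(np):
    """List (modified, 'of', modifier) edges: each embedded NP's head modifies its mother NP's head."""
    edges = []
    for child in np[1:]:
        if child[0] == "NP":
            edges.append((head(np), "of", head(child)))
            edges.extend(modifier_modified(child))  # continue down the chain
    return edges

# Phrase structure tree 1 for 'arrow of son of Dasarath'.
np1 = ("NP", ("n", "arrow"), ("of", "of"),
       ("NP", ("n", "son"), ("of", "of"),
        ("NP", ("n-proper", "Dasarath"))))
print(head(np1))  # arrow
for modified, rel, modifier in modifier_modified(np1):
    print(modified, "<-" + rel + "-", modifier)
```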
