Natural Language Processing and Machine Learning: Synergy or Discord? A Case Study with MT, IR and Sentiment
FIRE 2016
Pushpak Bhattacharyya, IIT Patna and IIT Bombay
pb@cse.iitb.ac.in
9th Dec, 2016

Need for NLP
• Huge amount of language data in electronic form
• Unstructured data (like free-flowing text) will grow to 40 zettabytes (1 zettabyte = 10^21 bytes) by 2020
• How to make sense of this huge data?
• Example-1: e-commerce companies need to know the sentiment of online users, sifting through 1 lakh e-opinions per week: needs NLP
• Example-2: the translation industry is set to grow to a $37 billion business by 2020

Nature of Machine Learning
• Automatically learning rules and concepts from data
Learning the concept of table: what is “tableness”?
Rule: a flat surface with 4 legs (approximate; to be refined gradually)

Why NLP and ML?
• Impossible for humans (single or a team) to make sense of and analyse humongous text data
• Many processing steps in NLP
• Impossible to give correct-consistent-complete rules covering each and every situation
• Example: the rule “adjectives precede nouns” holds in English (“blue sky”), but not in French (“ciel bleu”)

NLP: layered, multidimensional
[Figure: the NLP Trinity, with three axes. Problem: Morphology, POS tagging, Chunking, Parsing, Semantics, Discourse and Coreference. Language: Hindi, Marathi, English, French. Algorithm: HMM, MEMM, CRF. A side stack shows increasing complexity of processing, from Morph Analysis through Part of Speech Tagging, Parsing and Semantics to Discourse and Coreference.]

NLP = Ambiguity Processing
• Lexical Ambiguity
• Structural Ambiguity
• Semantic Ambiguity
• Pragmatic Ambiguity

Examples
1. (Ellipsis) Amsterdam airport: “Baby Changing Room”
2. (Attachment/grouping) Public demand changes (credit for the phrase: Jayant Haritsa):
(a) Public demand changes, but does anybody listen to them?
(b) Public demand changes, and we companies have to adapt to such changes.
(c) Public demand changes have pushed many companies out of business.
3. (Pragmatics-1) The use of shin bone is to locate furniture in a dark room
9 Dec 2016 FIRE16:NLP-ML 7

New words and terms (people are very creative!!)
1. ROFL: rolling on the floor laughing; LOL: laugh out loud
2. facebook: to use facebook; google: to search
3. communifake: faking to talk on mobile; Obamacare: medical care system introduced through the mediation of President Obama (portmanteau words)
4. After BREXIT (UK's exit from EU), in Mumbai Mirror and in a tweet: We got Brexit. What's next? Grexit. Departugal. Italeave. Fruckoff. Czechout. Oustria. Finish. Slovakout. Latervia. Byegium

Inter-layer interaction
Text-1: “I saw the boy with a telescope which he dropped accidentally”
Text-2: “I saw the boy with a telescope which I dropped accidentally”

Dependency parse of Text-1:
nsubj(saw-2, I-1)
root(ROOT-0, saw-2)
det(boy-4, the-3)
dobj(saw-2, boy-4)
det(telescope-7, a-6)
prep_with(saw-2, telescope-7)
dobj(dropped-10, telescope-7)
nsubj(dropped-10, he-9)
rcmod(telescope-7, dropped-10)
advmod(dropped-10, accidentally-11)

Dependency parse of Text-2:
nsubj(saw-2, I-1)
root(ROOT-0, saw-2)
det(boy-4, the-3)
dobj(saw-2, boy-4)
det(telescope-7, a-6)
prep_with(saw-2, telescope-7)
dobj(dropped-10, telescope-7)
nsubj(dropped-10, I-9)
rcmod(telescope-7, dropped-10)
advmod(dropped-10, accidentally-11)

[Figure: layer stack, bottom to top: Morphology, POS tagging, Chunking, Parsing, Semantics, Discourse and Coreference]

NLP: deal with multilinguality Language Typology

Rules: when and when not
• When the phenomenon is understood AND expressed, rules are the way to go
• “Do not learn when you know!!”
• When the phenomenon “seems arbitrary” at the current state of knowledge, DATA is the only handle!
– Why do we say “Many Thanks” and not “Several Thanks”?
– Impossible to give a rule
• Rely on machine learning to tease truth out of data; this expectation is not always met

Impact of probability: Language modeling
Probabilities computed in the context of corpora
1. P(“The sun rises in the east”)
2. P(“The sun rise in the east”)
• Less probable because of grammatical mistake.
3. P(“The svn rises in the east”)
• Less probable because of lexical mistake.
4. P(“The sun rises in the west”)
• Less probable because of semantic mistake.
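As a rough illustration of probabilities "computed in the context of corpora", the following is a minimal sketch of a bigram language model with add-one smoothing. The four-sentence corpus and the names `CORPUS`, `train_bigram` and `log_prob` are invented for this example and are not from the talk. A toy model of this kind only reflects counts in its corpus, so it ranks the "west" sentence lower simply because "east" is more frequent, not because it understands the semantics.

```python
from collections import Counter
import math

# Hypothetical toy corpus, for illustration only.
CORPUS = [
    "the sun rises in the east",
    "the sun sets in the west",
    "the moon rises in the east",
    "the sun rises every morning",
]

def train_bigram(corpus):
    """Collect context counts and bigram counts, with sentence boundary markers."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"</s>", "<unk>"}
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens[1:-1])
        contexts.update(tokens[:-1])            # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, vocab

def log_prob(sentence, model):
    """Log probability of a sentence under the add-one smoothed bigram model."""
    contexts, bigrams, vocab = model
    words = [w if w in vocab else "<unk>" for w in sentence.lower().split()]
    tokens = ["<s>"] + words + ["</s>"]
    v = len(vocab)
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + v))
        for a, b in zip(tokens, tokens[1:])
    )

model = train_bigram(CORPUS)
for s in ["The sun rises in the east", "The sun rise in the east",
          "The svn rises in the east", "The sun rises in the west"]:
    print(f"{log_prob(s, model):8.3f}  {s}")
```

With this corpus the first sentence receives the highest score and the three variants score lower, mirroring the ordering on the slide.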

Power of Data

Automatic image labeling (Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan, 2014)
Automatically captioned: “Two pizzas sitting on top of a stove top oven”

Automatic image labeling (contd.)

Main methodology
• Object A: extract parts and features
• Object B, which is in correspondence with A: extract parts and features
• LEARN mappings of these features and parts
• Use in NEW situations: called DECODING

Feature correspondence
“I am hungry now”

Linguistics-Computation Interaction
• Need to understand BOTH language phenomena and the data
• An annotation designer has to understand BOTH linguistics and statistics!
[Figure: the annotator sits between linguistics and language phenomena on one side and data and statistical phenomena on the other]

Case Study-1: Machine Translation
Good Linguistics + Good ML
Pushpak Bhattacharyya, Machine Translation, CRC Press, 2015.
Raj Dabre, Fabien Cromieres, Sadao Kurohashi and Pushpak Bhattacharyya, Leveraging Small Multilingual Corpora for SMT Using Many Pivot Languages, NAACL 2015, Denver, Colorado, USA, May 31 - June 5, 2015.

Kinds of MT Systems (point of entry from source to the target text) (Vauquois, 1968)

Simplified Vauquois

RBMT-EBMT-SMT spectrum: knowledge (rules) intensive to data (learning) intensive
[Figure: spectrum running from RBMT through EBMT to SMT]

Illustration of the difference between RBMT, EBMT and SMT
• Peter has a house
• Peter has a brother
• This hotel has a museum

The tricky case of ‘have’ translation (English to Marathi)
• Peter has a house: पीटरकडे एक घर आहे / piitar kade ek ghar aahe
• Peter has a brother: पीटरला एक भाऊ आहे / piitar laa ek bhaauu aahe
• This hotel has a museum: ह्या हॉटेलमध्ये एक संग्रहालय आहे / hyaa hotel madhye ek saMgrahaalay aahe

RBMT
If syntactic subject is animate AND syntactic object is owned by subject
Then “have” should translate to “kade … aahe”
If syntactic subject is animate AND syntactic object denotes kinship with subject
Then “have” should translate to “laa … aahe”
If syntactic subject is inanimate
Then “have” should translate to “madhye … aahe”
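The three rules above can be sketched as a small lookup function. This is only an illustration: the `ANIMATE` and `KINSHIP` word lists and the function name `have_pattern` are hypothetical, and a real rule-based system would draw animacy and kinship information from a parser and lexical resources rather than from hard-coded sets.

```python
# Hypothetical mini-lexicon, for illustration only.
ANIMATE = {"peter", "mary"}
KINSHIP = {"brother", "sister", "mother", "father"}

def have_pattern(subject: str, obj: str) -> str:
    """Pick the Marathi pattern for English 'have' using the three slide rules."""
    if subject.lower() not in ANIMATE:
        # Rule 3: inanimate subject
        return "madhye ... aahe"
    if obj.lower() in KINSHIP:
        # Rule 2: animate subject, object denotes kinship
        return "laa ... aahe"
    # Rule 1. Assumption: any non-kinship object of an animate subject is "owned".
    return "kade ... aahe"

for subj, obj in [("Peter", "house"), ("Peter", "brother"), ("hotel", "museum")]:
    print(f"{subj} has a {obj}: {have_pattern(subj, obj)}")
```

The sketch also shows why such rules are brittle: every new subject or object type needs a lexicon entry, and the ownership assumption in the last branch will be wrong for many sentences.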

EBMT
X have Y → X_kade Y aahe / X_laa Y aahe / X_madhye Y aahe

SMT (word-by-word gloss in parentheses)
• has a house → kade ek ghar aahe (<cm> one house has)
• has a car → kade ek gaadii aahe (<cm> one car has)
• has a brother → laa ek bhaau aahe (<cm> one brother has)
• has a sister → laa ek bahiin aahe (<cm> one sister has)
• hotel has → hotel madhye aahe (hotel <cm> has)
• hospital has → haspital madhye aahe (hospital <cm> has)

SMT: new sentence
“This hospital has 100 beds”
• n-grams (n = 1, 2, 3, 4, 5) like the following will be formed:
– “This”, “hospital”, … (unigrams)
– “This hospital”, “hospital has”, “has 100”, … (bigrams)
– “This hospital has”, “hospital has 100”, … (trigrams)
DECODING!!!
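The n-gram formation step can be sketched as follows. The `PHRASE_TABLE` dictionary is a hypothetical stand-in built from the six phrase pairs on the SMT slide, and the function names are invented for this example. Real decoding also scores, combines and reorders candidate translations; this sketch covers only n-gram extraction and phrase lookup.

```python
# Hypothetical phrase table holding the six phrase pairs from the SMT slide.
PHRASE_TABLE = {
    "has a house": "kade ek ghar aahe",
    "has a car": "kade ek gaadii aahe",
    "has a brother": "laa ek bhaau aahe",
    "has a sister": "laa ek bahiin aahe",
    "hotel has": "hotel madhye aahe",
    "hospital has": "haspital madhye aahe",
}

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def all_ngrams(sentence, max_n=5):
    """Map each n in 1..max_n to the list of n-grams of the sentence."""
    tokens = sentence.split()
    return {n: ngrams(tokens, n) for n in range(1, max_n + 1)}

def known_phrases(sentence):
    """Return (source phrase, target phrase) pairs found in the phrase table."""
    found = []
    for grams in all_ngrams(sentence.lower()).values():
        found.extend((g, PHRASE_TABLE[g]) for g in grams if g in PHRASE_TABLE)
    return found

print(all_ngrams("This hospital has 100 beds")[2])
print(known_phrases("This hospital has 100 beds"))
```

For the new sentence, the only n-gram present in this small table is “hospital has”, which is what lets the system choose the “madhye … aahe” pattern without an explicit animacy rule.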

Foundation of SMT
• Data-driven approach
• Goal is to find the English sentence e, given a foreign language sentence f, for which p(e | f) is maximum
• Translations are generated on the basis of a statistical model
• Parameters are estimated using bilingual parallel corpora
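The goal stated above can be written as an argmax. The second equality is the standard noisy-channel decomposition via Bayes' rule; the slide itself states only the first form, so the split into a translation model p(f | e) and a language model p(e) is added here as background.

```latex
\hat{e} \;=\; \arg\max_{e} \, p(e \mid f)
        \;=\; \arg\max_{e} \, \frac{p(f \mid e)\, p(e)}{p(f)}
        \;=\; \arg\max_{e} \, p(f \mid e)\, p(e)
```

The denominator p(f) drops out because it does not depend on e.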

The all-important word alignment
• The edifice on which the structure of SMT is built (Brown et al., 1990, 1993; Och and Ney, 1993)
• Word alignment → Phrase alignment (Koehn et al., 2003)
• Word alignment → Tree alignment (Chiang 2005, 200t; Koehn 2010)
• Alignment at the heart of factor-based SMT too (Koehn and Hoang 2007)

Word alignment as the crux of Statistical Machine Translation
English                        French
(1) three rabbits              (1) trois lapins
    a     b                        w     x
(2) rabbits of Grenoble        (2) lapins de Grenoble
    b       c  d                   x      y  z

Initial probabilities: each cell denotes t(a → w), t(a → x), etc.
      a     b     c     d
w    1/4   1/4   1/4   1/4
x    1/4   1/4   1/4   1/4
y    1/4   1/4   1/4   1/4
z    1/4   1/4   1/4   1/4

“Counts”
From sentence pair b c d ↔ x y z:
      a     b     c     d
w     0     0     0     0
x     0    1/3   1/3   1/3
y     0    1/3   1/3   1/3
z     0    1/3   1/3   1/3
From sentence pair a b ↔ w x:
      a     b     c     d
w    1/2   1/2    0     0
x    1/2   1/2    0     0
y     0     0     0     0
z     0     0     0     0

Revised probabilities table
      a     b     c     d
w    1/2   1/4    0     0
x    1/2   5/12  1/3   1/3
y     0    1/6   1/3   1/3
z     0    1/6   1/3   1/3

“Revised counts”
From sentence pair a b ↔ w x:
      a     b     c     d
w    1/2   3/8    0     0
x    1/2   5/8    0     0
y     0     0     0     0
z     0     0     0     0
From sentence pair b c d ↔ x y z:
      a     b     c     d
w     0     0     0     0
x     0    5/9   1/3   1/3
y     0    2/9   1/3   1/3
z     0    2/9   1/3   1/3
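The tables above can be reproduced with a short expectation-maximization loop. The sketch below follows the arithmetic as it appears on the slides: within each sentence pair, every English word spreads one unit of count over the French words of that pair in proportion to the current t values, and the summed counts are then renormalized per English word. The function names are invented for this example, and this is a reading of the slide tables rather than a full IBM Model 1 implementation. Exact fractions are used so the output matches the tables digit for digit.

```python
from collections import defaultdict
from fractions import Fraction

# Sentence pairs from the slides: English words a..d, French words w..z.
PAIRS = [
    (["a", "b"], ["w", "x"]),            # three rabbits / trois lapins
    (["b", "c", "d"], ["x", "y", "z"]),  # rabbits of Grenoble / lapins de Grenoble
]

def init_uniform(pairs):
    """Start with t(e -> f) = 1 / (number of French words) in every cell."""
    english = sorted({e for es, _ in pairs for e in es})
    french = sorted({f for _, fs in pairs for f in fs})
    return {(f, e): Fraction(1, len(french)) for f in french for e in english}

def expected_counts(t, pairs):
    """E-step: one count table per sentence pair, keyed by (french, english)."""
    tables = []
    for es, fs in pairs:
        counts = {}
        for e in es:
            norm = sum(t[(f, e)] for f in fs)   # normalize over this pair's French words
            for f in fs:
                counts[(f, e)] = t[(f, e)] / norm
        tables.append(counts)
    return tables

def revise(t, tables):
    """M-step: sum counts over sentence pairs, renormalize per English word."""
    total = defaultdict(Fraction)
    for counts in tables:
        for key, c in counts.items():
            total[key] += c
    column_sum = defaultdict(Fraction)
    for (f, e), c in total.items():
        column_sum[e] += c
    return {(f, e): total[(f, e)] / column_sum[e] for (f, e) in t}

t0 = init_uniform(PAIRS)
counts1 = expected_counts(t0, PAIRS)
t1 = revise(t0, counts1)
counts2 = expected_counts(t1, PAIRS)
print("t1(b -> x) =", t1[("x", "b")])
print("revised count(b -> w), pair 1 =", counts2[0][("w", "b")])
print("revised count(b -> x), pair 2 =", counts2[1][("x", "b")])
```

Repeating `expected_counts` and `revise` in a loop continues the iteration beyond the two rounds shown on the slides.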
