
A Look inside the Distributionally Similar Terms
Kow Kuroda, Jun’ichi Kazama and Kentaro Torisawa
National Institute of Information and Communications Technology (NICT), Japan
The 2nd International Workshop on NLP Challenges in the Information Explosion Era (NLPIX 2010)
Large-scale and sharable NLP infrastructures and beyond
August 28, 2010, Beijing International Convention Center
Tuesday, September 7, 2010

NLPIX2010, Aug 28, 2010, Beijing

“Distributional” Hypothesis
• Extensive use of distributional similarity, derived from the “distributional” hypothesis (Harris 1959), is one of the key ideas that made NLP successful.
  • Hindle (1990), Grefenstette (1993), Lee (1997), Lin (1998)
• The reason for its nearly unanimous acceptance is not so much positively motivated, however.
  • If the hypothesis were not accepted, most Web-derived data would be intractable.
• Yet ..

Three Questions We Address
• Can distributional similarity really be equated with semantic similarity?
  • No agreement seems to have been reached as to what counts as semantic similarity.
  • And there are several kinds of semantic similarity.
• Even if distributional similarity can be equated with semantic similarity, to what extent does the equation hold?
• Even if the two can be equated to a large extent, does the equation remain valid on a large scale?
• We address these questions in our study.

Outline
• Method
• Preparing data
• Classification task
• Results
• Summary

Method

General Framework
• Step 1. Select a set of “base” terms B = {b_1, b_2, ..., b_n}.
• Step 2. For each base term b_i in B, use a similarity measure M (such as Jensen-Shannon divergence) to construct a ranked list of n terms T[b_i] = [t_{i,1}, t_{i,2}, ..., t_{i,j}, ..., t_{i,n}],
  • where t_{i,j} denotes the j-th most similar term to b_i under M.
• Step 3. Generate P(k), the set of pairs obtained by pairing each of t_{i,1}, t_{i,2}, ..., t_{i,k} with b_i. Human raters classify the pairs in P(k) with reference to a guideline.
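The three steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `top_k_pairs`, the toy terms, and the toy similarity function are hypothetical and stand in for the real measure M and the real term set.

```python
from typing import Callable, List, Sequence, Tuple


def top_k_pairs(
    bases: Sequence[str],
    candidates: Sequence[str],
    similarity: Callable[[str, str], float],
    k: int,
) -> List[Tuple[str, str]]:
    """Build P(k): each base term paired with its k most similar terms."""
    pairs = []
    for b in bases:
        # Step 2: rank all other terms by similarity to b (higher = more similar).
        ranked = sorted(
            (t for t in candidates if t != b),
            key=lambda t: similarity(b, t),
            reverse=True,
        )
        # Step 3: keep the top k and pair each with the base term.
        pairs.extend((b, t) for t in ranked[:k])
    return pairs


# Toy data (hypothetical): each term gets a made-up position on a line,
# and similarity is the negated distance between positions.
position = {"piano": 1.0, "violin": 1.2, "cello": 1.5, "Brahms": 9.0}


def toy_similarity(a: str, b: str) -> float:
    return -abs(position[a] - position[b])


pairs = top_k_pairs(["piano", "Brahms"], list(position), toy_similarity, k=2)
```

The resulting `pairs` list is what human raters would then classify.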

Product of Steps 1 and 2

base term | b_i’s most similar term under M | b_i’s 2nd most similar term under M | ... | b_i’s k-th most similar term under M
b_1       | t_{1,1}                         | t_{1,2}                             | ... | t_{1,k}
b_2       | t_{2,1}                         | t_{2,2}                             | ... | t_{2,k}
⋮         | ⋮                               | ⋮                                   | ⋱   | ⋮
b_n       | t_{n,1}                         | t_{n,2}                             | ... | t_{n,k}

Each row represents T[b_i].

Parameters Considered
• How large should n be? In other words, how many base terms should be evaluated?
  • In our case, n = 150,000.
• How large should k be? In other words, how many similar terms per base term should be evaluated?
  • In our case, k = 2.
• Which similarity metric should be used?
  • We used the Jensen-Shannon divergence for M, computed over the distributional probabilities of <n, p, v> (Kazama et al. 2009).
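For reference, the Jensen-Shannon divergence between two discrete distributions can be computed as below. This is a minimal sketch, not the authors' implementation: the base-2 logarithm, the toy two-context distributions, and the idea of negating the divergence to obtain a "higher is more similar" score are assumptions made here for illustration.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: mean KL divergence of p and q from their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Toy distributions over two contexts (hypothetical values).
identical = js_divergence([0.5, 0.5], [0.5, 0.5])  # no divergence
disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])   # maximal divergence (1 bit)

# Assumed scoring convention: negate the divergence so larger means more similar.
score = -js_divergence([0.7, 0.3], [0.6, 0.4])

# Number of pairs to classify with the parameters on the slide.
n, k = 150_000, 2
total_pairs = n * k
```

With base-2 logarithms the divergence lies between 0 (identical distributions) and 1 (disjoint support), so a negated score lies between -1 and 0.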

Characteristics of Step 3
• We classified 300,000 pairs (n × k = 150,000 × 2) into the 18 finer-grained classes of semantic relation (to be explained).
• But we also applied candidate filtering (to be explained).
• Note
  • In Kazama’s clustering data, n corresponds to the count rank of dependency relation types. This should be an indicator of the token frequencies of base terms.

[Figure: Sample of Data Used in Step 3]

Preparing Data

10 Most Similar Terms of “ピアノ” (piano)

rank | Japanese (original)     | English translation                | Score
1    | エレクトーン            | Electone, electric organ           | –0.322
2    | バイオリン / ヴァイオリン | violin                             | –0.357
3    | バイオリン / ヴァイオリン | violin                             | –0.358
3    | チェロ                  | cello                              | –0.358
5    | トランペット            | trumpet                            | –0.377
6    | 三味線                  | shamisen, Japanese 3-string guitar | –0.383
7    | サックス                | saxophone                          | –0.390
8    | オルガン                | organ                              | –0.392
9    | クラリネット            | clarinet                           | –0.394
10   | 二胡                    | erhu                               | –0.396

10 Most Similar Terms of “チャイコフスキー” (Tchaikovsky)

rank | Japanese (original)  | English translation | Score
1    | ブラームス           | Brahms              | –0.152
2    | シューマン           | Schumann            | –0.163
3    | メンデルスゾーン     | Mendelssohn         | –0.166
4    | ショスタコーヴィッチ | Shostakovich        | –0.178
5    | シベリウス           | Sibelius            | –0.180
6    | ハイドン             | Haydn               | –0.181
6    | ヘンデル             | Händel              | –0.181
8    | ラヴェル             | Ravel               | –0.182
9    | シューベルト         | Schubert            | –0.197
10   | ベートーヴェン       | Beethoven           | –0.190

Terms Excluded from Candidates
• Strings judged to be meaningless because of segmentation errors.
  • An independent task was performed for this.
• Terms beginning with Arabic digits (i.e., “0”, “1”, ..., “9”).
• Terms ending with any of 88 derivational morphemes that lead to either a POS change or obscure semantics.
• Terms containing more than one occurrence of “・”.
  • “・” marks disjunction, conjunction, or a substitute for white space in Japanese.

88 Derivational Morphemes for Candidate Filtering
• Epithet-deriver
  -さん, -サン, -ちゃん, -チャン, -さま, -サマ, -様, -くん, -君, -どの, -殿
• Temporalizer or Locationalizer
  -ばあい, -場合, -ため, -為, -せい, -コト, -こと, -事, -トコロ, -ところ, -所, -処, -とき, -時, -ころ, -ごろ, -頃, -際, -なか, -中, -うえ, -上, -下, -前, -後, -ちかく, -近く, -ほう, -方
• Deriver of other POS-terms
  -的だ, -的に, -した, -った, -である, -では, -です, -ます
• Hedge-deriver
  -など, -等, -たち, -達, -ども, -ら, -以外, -ほか, -他, -くらい, -ぐらい, -まま, -ごと, -ついで, -づつ
• Modalizer
  -とおり, -あたり, -ぶり, -振り, -あまり, -余り, -ほど, -かわり, -代わり
• Nominalizer
  -たの, -いの, -うの, -くの, -すの, -つの, -ぬの, -ふの, -むの, -ゆの, -るの, -なの, -んか, -るか, -でか, -っか
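The mechanical exclusion rules for candidate terms could be applied with a small filter like the one below. This is a hedged sketch: the function name `is_candidate` is hypothetical, `SUFFIXES` holds only a handful of the 88 morphemes for illustration, and the segmentation-error criterion is left out because it relied on a separate judgment task.

```python
from typing import Tuple

# Illustrative subset of the 88 derivational morphemes (not the full list).
SUFFIXES: Tuple[str, ...] = ("さん", "など", "こと", "とおり", "的に", "なの")

ASCII_DIGITS = "0123456789"


def is_candidate(term: str, suffixes: Tuple[str, ...] = SUFFIXES) -> bool:
    """Return True if the term survives the three mechanical exclusion rules."""
    if not term:
        return False
    # Rule: exclude terms beginning with a digit "0" ... "9".
    if term[0] in ASCII_DIGITS:
        return False
    # Rule: exclude terms ending with one of the derivational morphemes.
    if term.endswith(suffixes):
        return False
    # Rule: exclude terms containing more than one "・".
    if term.count("・") > 1:
        return False
    return True


terms = ["ピアノ", "3月", "田中さん", "ジョン・レノン", "ア・イ・ウ"]
kept = [t for t in terms if is_candidate(t)]
```

Here a term with a single “・” is kept, since only terms with more than one occurrence are excluded.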

Classification Task: Its Design and Practice

Factoring out “semantic similarity”
• We employed 18 finer-grained classes built on four basic “components” of semantic similarity:
  1. synonymic relation
  2. hypernym-hyponym relation
  3. meronymic relation
  4. classmate relation
• They were designed on the basis of research such as Fellbaum, ed. (1998) and Murphy (2003).

18 Subtypes in the Hierarchy

Nodes of the hierarchy diagram:
• pair of forms
• x: pair with a meaningless form
• pair of meaningful terms
• u: pair of terms in no conceivable semantic relation
• r: pair of terms in a conceivable semantic relation
• e: erroneous pair
• f: quasi-erroneous pair
• m: misuse pair
• y: undecidable
• s*: synonymous pair in the broadest sense
• s: synonymous pair of different terms
• v*: notational variation of the same term
• v: allographic pair
• a: acronymic pair
• n: alias pair
• p: meronymic pair
• h: hypernym-hyponym pair
• k**: classmate in the broadest sense
• k*: classmate without obvious contrastiveness
• k: classmate without shared morpheme
• w: classmate with shared morpheme
• c*: contrastive pairs
• c: contrastive pair without antonymity
• d: antonymic pair
• t: pair of terms with inherent temporal order
• o: pair in other, unidentified relation


Characteristics of the Hierarchy
• s*, k**, p, h, and o are the major divisions and are expected to be mutually exclusive.
  • s* has four subtypes: s, m, v* and n.
  • k** has two subtypes: k* and c*.
  • k* has two subtypes, k and w, which differ in the presence of a shared morpheme.
  • c* has three subtypes: c, d and t.
• In the most tolerant condition, {s*, k**, p, h} corresponds to the overall class of semantically similar terms.
• Note that {m, e} or {m, e, f} are the only classes in which distributional and semantic similarities do not match up.
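As an illustration of how such a hierarchy can be handled programmatically, the k** branch can be written as a small tree and expanded to its finest-grained classes. This is a sketch under assumptions: the names `SUBTYPES` and `leaves` are hypothetical, only the k** branch is encoded, and k* is taken to subsume k and w, as the class definitions (classmate without and with a shared morpheme) indicate.

```python
from typing import Dict, List

# Subtype relations for the k** branch only (assumed encoding).
SUBTYPES: Dict[str, List[str]] = {
    "k**": ["k*", "c*"],
    "k*": ["k", "w"],
    "c*": ["c", "d", "t"],
}


def leaves(code: str) -> List[str]:
    """Expand a class code to the finest-grained classes it subsumes."""
    children = SUBTYPES.get(code)
    if children is None:
        # A code without listed subtypes is treated as a finest-grained class.
        return [code]
    result: List[str] = []
    for child in children:
        result.extend(leaves(child))
    return result


classmate_leaves = leaves("k**")
```

A rater's fine-grained label can then be mapped back to a major division by checking which expansion contains it.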
