Creation HS: Computational Linguistics for Low-Resource - PowerPoint PPT Presentation

Efficiency in Resource Creation
HS: Computational Linguistics for Low-Resource Languages
Mengfei Zhou
June 1, 2016
Institute for Computational Linguistics
University Heidelberg

Motivation
• A lack of annotated data
• Collection of data is neither easy nor cheap
• We may have a lot of English annotated data, but for a new language, how can we effectively create annotated data?

Big Picture
How can we create annotated data effectively?
- Approach 1: active learning using human annotation <- effective! (Ngai & Yarowsky 2000)
  task: base noun phrase chunking
- Approach 2: human rule writing <- not effective! (Ngai & Yarowsky 2000)
  task: base noun phrase chunking
- Approach 3: projection across aligned corpora <- effective! (Yarowsky et al 2001)
  task: 1. part-of-speech tagger
        2. morphological analyzer

Presentation Outline
Ngai & Yarowsky 2000:
• Base noun phrase chunking
• Active learning: the basics
• Apply active learning to base noun phrase chunking
• Learning by rules for base noun phrase chunking
• Comparison (human cost, performance): rule writing vs. annotation
----------
Yarowsky et al 2001:
• Projection across aligned corpora, applied to 2 tasks
  1. part-of-speech tagger (detailed)
  2. morphological analyzer (basic)

Base Noun Phrase Chunking
- Each of the larger boxes in the figure is an NP chunk
- A considerable amount of work has been done in this domain, and many different methods have been applied
- Ramshaw & Marcus' transformation rule-based system (f-measure 92.0) is regarded as the de facto standard for the domain
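The f-measure quoted above can be illustrated with a minimal sketch. It assumes the balanced F-measure (harmonic mean of precision and recall) computed over exact-match chunk spans; the function name `chunk_f_measure` and the toy spans are hypothetical and not taken from the slides.

```python
def chunk_f_measure(gold, predicted):
    """Precision, recall and balanced F-measure over (start, end) chunk spans.

    Assumption: a predicted chunk counts as correct only if both of its
    boundaries match a gold chunk exactly.
    """
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data: three gold NP chunks, one of which is predicted with a wrong end.
gold_chunks = {(0, 2), (3, 5), (6, 7)}
predicted_chunks = {(0, 2), (3, 4), (6, 7)}
precision, recall, f1 = chunk_f_measure(gold_chunks, predicted_chunks)
print(precision, recall, f1)
```

On this scale, a reported f-measure of 92.0 corresponds to an F value of 0.92 expressed as a percentage.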

Which approach can work better?
- Rule-writing approach (rationalist approach)
  -> directly encode linguistic knowledge
- Labeling data and using active learning (inductionist approach)
  -> label sentences & let the machine sort it out

Active Learning: the basics
Learner-guided selection to reduce annotation effort
(picture from Rehbein & Ruppenhofer's slide)

Active Learning: the basics
Which one is the most useful example for the classifier?
The more uncertain the example, the more useful it would be to have this example annotated!
(picture from Rehbein & Ruppenhofer 2016's slide)
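One way to make "most uncertain" concrete is sketched below. It assumes the classifier outputs a probability distribution over labels and uses entropy as the uncertainty score; the slides do not name a specific measure, so the scoring function, `most_uncertain`, and the toy pool are all illustrative assumptions.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of a predicted label distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def most_uncertain(pool):
    """Return the id of the example whose prediction has the highest entropy.

    pool: dict mapping example id -> list of class probabilities (assumed
    to come from some classifier; none is trained here).
    """
    return max(pool, key=lambda example_id: entropy(pool[example_id]))


# Hypothetical unlabeled pool with binary class probabilities.
pool = {
    "a": [0.95, 0.05],  # classifier is confident
    "b": [0.55, 0.45],  # classifier is close to guessing
    "c": [0.80, 0.20],
}
chosen = most_uncertain(pool)
print(chosen)
```

Example "b" sits closest to the decision boundary, so under this scoring it is the one sent to the annotator.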

Query-by-committee approach
• How can we find the most uncertain examples?
• The query-by-committee approach uses multiple models to evaluate the data, and candidates for annotation are drawn from the pool of examples on which the models disagree.
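The selection step described above can be sketched as follows. Disagreement is scored here with vote entropy over the committee's predicted labels, which is an assumption made for illustration and not necessarily the measure used in the presented work; `vote_entropy`, `select_for_annotation`, and the toy votes are hypothetical.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy of the label votes cast by the committee for one example."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_for_annotation(committee_votes, n):
    """Return the n example ids on which the committee disagrees most.

    committee_votes: dict mapping example id -> list with one predicted
    label per committee member.
    """
    ranked = sorted(
        committee_votes,
        key=lambda example_id: vote_entropy(committee_votes[example_id]),
        reverse=True,
    )
    return ranked[:n]


# Hypothetical votes from a committee of three models.
committee_votes = {
    "s1": ["NP", "NP", "NP"],  # full agreement
    "s2": ["NP", "O", "NP"],   # one dissenting model
    "s3": ["NP", "O", "B"],    # all three models differ
}
selected = select_for_annotation(committee_votes, 2)
print(selected)
```

Examples on which all models agree score zero and are left in the pool, so annotation effort goes to the contested ones.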

Apply active learning to base noun phrase chunking (Step 1)
- Corpus C: sections 15-18 of the Wall Street Journal Treebank
- Arbitrarily pick t sentences from C for hand annotation (seed set, t = 100)

Apply active learning to base noun phrase chunking (Step 2)
- Put these t sentences into the training set T
- Delete these t sentences from corpus C
(C: sections 15-18 of the Wall Street Journal Treebank; seed set t = 100)

Apply active learning to base noun phrase chunking (Step 3)
- Divide T into m subsets (m = 3): t1, t2, t3
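Steps 1 to 3 can be sketched as below. This is a minimal illustration under stated assumptions: the corpus is a list of placeholder sentence strings, hand annotation is not modeled, and the m subsets are taken to be disjoint, which the slides do not specify. The function name `seed_and_partition` is hypothetical.

```python
import random


def seed_and_partition(corpus, t=100, m=3, seed=0):
    """Pick a seed set of t sentences, move it from C into T, split T into m subsets."""
    rng = random.Random(seed)
    pool = list(corpus)
    rng.shuffle(pool)  # Step 1: arbitrary choice of t sentences
    training = pool[:t]  # Step 2: these t sentences go into T ...
    remaining = pool[t:]  # ... and are deleted from C
    # Step 3: divide T into m subsets (disjoint here, by assumption).
    subsets = [training[i::m] for i in range(m)]
    return remaining, training, subsets


# Placeholder corpus standing in for the treebank sentences.
corpus = [f"sentence-{i}" for i in range(1000)]
remaining, training, subsets = seed_and_partition(corpus, t=100, m=3)
print(len(remaining), len(training), [len(s) for s in subsets])
```

In a query-by-committee setting, each subset would then be used to train one committee member.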
