Weak Semi-Markov CRFs for NP Chunking in Informal Text Aldrian - PowerPoint PPT Presentation

Weak Semi-Markov CRFs for NP Chunking in Informal Text
Aldrian Obaja Muis and Wei Lu
Singapore University of Technology and Design
Sections: Contributions, Dataset, Models, Experiments, Conclusion

Paper Contributions
In this paper, we contribute:
1. A noun phrase-annotated SMS corpus [1]
2. The weak semi-Markov CRF

[1] Tao Chen and Min-Yen Kan (2013). "Creating a live, public short message service corpus: the NUS SMS corpus". In: Language Resources and Evaluation. Vol. 47. Springer Netherlands, pp. 299–335.

NP-annotated SMS Corpus
We used the Brat Rapid Annotation Tool (BRAT) [2] for annotation, recruiting undergraduate students to annotate the noun phrases.
Examples: [annotation screenshots not preserved]

[2] http://brat.nlplab.org/
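BRAT saves annotations in its standoff format: a `.ann` file beside the `.txt` file, in which each text-bound annotation is one line made of `T<id>`, a tab, `<Type> <start> <end>`, a tab, and the covered text, with character offsets into the `.txt` file. A linear CRF labels each word B, I or O, so the spans have to become per-token tags. The following is a minimal sketch only: the entity type name `NP`, whitespace tokenization, and the helper names are hypothetical, since the slides do not describe the preprocessing, and discontinuous spans are simply skipped.

```python
import re


def parse_brat_ann(ann_text, entity_type="NP"):
    """Return sorted (start, end) character spans of text-bound annotations
    of the given (hypothetical) entity type from a BRAT .ann file."""
    spans = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue  # skip relations, events, notes, etc.
        _ann_id, type_and_offsets, _surface = line.split("\t", 2)
        parts = type_and_offsets.split(" ")
        if parts[0] != entity_type or ";" in type_and_offsets:
            continue  # other types, or discontinuous spans (not handled here)
        spans.append((int(parts[1]), int(parts[2])))
    return sorted(spans)


def spans_to_bio(text, spans):
    """Whitespace-tokenize text and tag each token B, I or O.
    A token counts as inside a span only if it lies entirely within it."""
    tokens, tags = [], []
    for match in re.finditer(r"\S+", text):
        start, end = match.start(), match.end()
        tag = "O"
        for s, e in spans:
            if s <= start and end <= e:
                tag = "B" if start == s else "I"
                break
        tokens.append(match.group())
        tags.append(tag)
    return tokens, tags


# Illustrative annotation of the example phrase used in the model figures.
print(spans_to_bio("said Dr Teh", parse_brat_ann("T1\tNP 5 11\tDr Teh")))
# (['said', 'Dr', 'Teh'], ['O', 'B', 'I'])
```

A token that only partially overlaps a span is tagged O here; a real pipeline would need to decide how to handle such tokenization mismatches.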

Annotation Statistics
- 64 annotators
- 26,500 SMS messages
- 76,490 noun phrases
- 359,009 tokens
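Two per-message averages follow directly from these counts and give a sense of how short the messages are. This is plain arithmetic on the reported totals, not a figure given in the slides.

```python
# Counts from the annotation statistics slide.
messages = 26_500
noun_phrases = 76_490
tokens = 359_009

nps_per_message = noun_phrases / messages
tokens_per_message = tokens / messages
print(f"{nps_per_message:.2f} NPs per message")        # 2.89 NPs per message
print(f"{tokens_per_message:.2f} tokens per message")  # 13.55 tokens per message
```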

Models

Models Comparison
Notation: n = number of words in the sentence, |Y| = number of labels, L = maximum segment length. Example input: "said Dr Teh".
Fig. 1: Linear CRF (one B, I or O label per word): $O(n|Y|^2)$
Fig. 2: Semi-CRF (one label per segment of up to L words): $O(nL|Y|^2)$
Fig. 3: Weak semi-CRF (segment labels N and O): $O(n|Y|^2 + nL|Y|)$
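The complexity gap can be illustrated with two Viterbi decoders. This sketch rests on an assumption of ours, not stated on the slides: the weak model scores a segment as a type transition `trans(prev, y)` plus a segment score `seg(y, s, e)` that does not see the previous type. A full semi-CRF score may couple the previous label with the whole segment, so its inner loop runs over length and previous label together ($L|Y|^2$ per word). Splitting this into a "choose the next type" step ($|Y|^2$) and a "choose the length" step ($L|Y|$) gives the $O(n|Y|^2 + nL|Y|)$ bound. The random scores and the start label `<s>` are hypothetical.

```python
import random

NEG_INF = float("-inf")
START = "<s>"  # hypothetical start-of-sentence label


def semi_crf_viterbi(n, labels, max_len, score):
    """Best segmentation score; score(prev, y, s, e) may depend jointly on
    the previous label and the segment [s, e). Work: O(n * L * |Y|^2)."""
    # best[e][y]: best score for words [0, e) whose last segment has label y
    best = [dict.fromkeys(labels, NEG_INF) for _ in range(n + 1)]
    for e in range(1, n + 1):
        for y in labels:
            for length in range(1, min(max_len, e) + 1):
                s = e - length
                if s == 0:
                    best[e][y] = max(best[e][y], score(START, y, 0, e))
                    continue
                for p in labels:
                    best[e][y] = max(best[e][y], best[s][p] + score(p, y, s, e))
    return max(best[n].values())


def weak_semi_crf_viterbi(n, labels, max_len, trans, seg):
    """Same search with the type and length decisions split.
    Work: O(|Y|^2) + O(L * |Y|) per word."""
    begin = [dict.fromkeys(labels, NEG_INF) for _ in range(n + 1)]  # type y starts at s
    end = [dict.fromkeys(labels, NEG_INF) for _ in range(n + 1)]    # type y ends at e
    for s in range(n):
        # Type decision: compare previous types, ignoring segment length.
        for y in labels:
            if s == 0:
                begin[s][y] = trans(START, y)
            else:
                begin[s][y] = max(end[s][p] + trans(p, y) for p in labels)
        # Length decision: extend each type, ignoring the previous type.
        for y in labels:
            for length in range(1, min(max_len, n - s) + 1):
                e = s + length
                end[e][y] = max(end[e][y], begin[s][y] + seg(y, s, e))
    return max(end[n].values())


# Random scores for a 6-word sentence with segment labels N and O.
random.seed(0)
labels, n, max_len = ["N", "O"], 6, 3
T = {(p, y): random.uniform(-1, 1) for p in [START] + labels for y in labels}
S = {(y, s, e): random.uniform(-1, 1)
     for y in labels for s in range(n) for e in range(s + 1, n + 1)}


def trans(p, y):
    return T[(p, y)]


def seg(y, s, e):
    return S[(y, s, e)]


semi = semi_crf_viterbi(n, labels, max_len, lambda p, y, s, e: trans(p, y) + seg(y, s, e))
weak = weak_semi_crf_viterbi(n, labels, max_len, trans, seg)
print(abs(semi - weak) < 1e-9)  # True: same optimum when the score decomposes
```

When the semi-CRF score happens to decompose this way, both decoders reach the same best score. In this sketch the weak model gives up features that join the previous label with the segment length, in exchange for the cheaper inner loop.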

Empirical Verification

F1-Score
Fig.: F1-score (%) of Linear CRF, Semi-CRF and Weak Semi-CRF using basic features, +affixes, and all features. The bar labels read 74.69, 74.60, 74.58, 74.37, 74.39, 74.31, 72.68, 72.49 and 71.19; which value belongs to which model and feature set is not recoverable from the extracted text.
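For chunking, F1 is commonly computed over exact-match spans: a predicted NP counts only if both its boundaries match a gold NP. The slides do not name the evaluation script, so this is an assumed, minimal version of that metric, with hypothetical spans given as `(sentence_id, start, end)` tuples.

```python
def chunk_f1(gold_spans, pred_spans):
    """Exact-match chunk F1 in percent.

    Spans are (sentence_id, start, end) tuples; a predicted NP is correct
    only if the identical span appears in the gold set."""
    gold, pred = set(gold_spans), set(pred_spans)
    correct = len(gold & pred)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 100 * 2 * precision * recall / (precision + recall)


# Hypothetical spans: one exact match, one boundary error, one missed NP.
gold = [(0, 1, 3), (1, 0, 1), (1, 2, 4)]
pred = [(0, 1, 3), (1, 0, 2)]
print(f"{chunk_f1(gold, pred):.2f}")  # 40.00
```

Here precision is 1/2 and recall is 1/3, so a single boundary error costs both a false positive and a false negative.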

Training Speed
Fig.: Average time per training iteration (s, axis 0 to 2) against the number of training instances (5,000 to 20,000 SMS messages) for Linear CRF, Semi-CRF and Weak Semi-CRF.
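The complexity formulas from the model comparison give a rough per-sentence cost for one inference pass, ignoring constant factors. The inputs are hypothetical: n = 14 words (close to the corpus average of 359,009 tokens over 26,500 messages), L = 8, |Y| = 3 for the B/I/O labels of the linear CRF, and |Y| = 2 for N/O segment labels, which we assume both segment models share.

```python
def linear_crf_cost(n, num_labels):
    # O(n |Y|^2): one label transition per word
    return n * num_labels ** 2


def semi_crf_cost(n, max_len, num_labels):
    # O(n L |Y|^2): length and previous label chosen jointly
    return n * max_len * num_labels ** 2


def weak_semi_crf_cost(n, max_len, num_labels):
    # O(n |Y|^2 + n L |Y|): type and length chosen separately
    return n * num_labels ** 2 + n * max_len * num_labels


n, max_len = 14, 8  # hypothetical sentence length and maximum segment length
print(linear_crf_cost(n, 3))              # 126
print(semi_crf_cost(n, max_len, 2))       # 448
print(weak_semi_crf_cost(n, max_len, 2))  # 280
```

Under these assumptions the weak semi-CRF costs about 0.63 times the full semi-CRF per sentence, in line with the training-time improvement the conclusion reports; measured times also depend on feature extraction and implementation details.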

Conclusion
- We have created a new NP-annotated dataset of informal text.
- Selecting the segment length and the segment type as separate decisions reduces training time while maintaining similar accuracy.

Thank You
Code and data available at: http://statnlp.org/research/ie/
Aldrian Obaja Muis and Wei Lu, Singapore University of Technology and Design
