Practical Learning Algorithms for Structured Prediction Models. Kai-Wei Chang, University of Illinois at Urbana-Champaign (PowerPoint PPT Presentation)



slide-1
SLIDE 1

Practical Learning Algorithms for Structured Prediction Models

Kai-Wei Chang

University of Illinois at Urbana-Champaign

slide-2
SLIDE 2

2

Dream:

Intelligent systems that are able to read, to see, to talk, and to answer questions.

slide-3
SLIDE 3

3

Personal assistant system Translation system

slide-4
SLIDE 4

Carefully Slide

4

slide-5
SLIDE 5

5

小心 ("caution"): Carefully / Careful / Take Care / Caution. 地滑 ("slippery floor"): Slide / Landslip / Wet Floor / Smooth

slide-6
SLIDE 6

Christopher Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book

6

Q: [Chris] = [Mr. Robin] ?

Slide modified from Dan Roth

slide-7
SLIDE 7

Christopher Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book

7

Complex Decision Structure

slide-8
SLIDE 8

Christopher Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book

8

Co-reference Resolution

slide-9
SLIDE 9

Scalability Issues

 Large amount of data
 Complex decision structure

9


slide-10
SLIDE 10

 [Modeling] Expressive and general formulations
 [Algorithms] Principled and efficient
 [Applications] Support many applications

Goal: Practical Machine Learning

10

slide-11
SLIDE 11

My Research Contributions

(Diagram, axes: Data Size × Problem Complexity)

  • Linear classification [ICML 08, KDD 08, JMLR 08a, 10a, 10b, 10c]
  • Limited-memory linear classifier [KDD 10, 11, TKDD 12]
  • Latent representation for knowledge bases [EMNLP 13, 14]
  • Structured prediction models [ICML 14, ECML 13a, 13b, AAAI 15, CoNLL 11, 12]

11

slide-12
SLIDE 12

My Research Contributions

LIBLINEAR [ICML08, KDD 08, JMLR 08a, 10a, 10b,10c]

  • Implements our proposed learning algorithms
  • Supports binary and multiclass classification

Impact: > 60,000 downloads and > 2,600 citations across

AI (AAAI, IJCAI), data mining (KDD, ICDM), machine learning (ICML, NIPS), computer vision (ICCV, CVPR), information retrieval (WWW, SIGIR), NLP (ACL, EMNLP), multimedia (ACM-MM), HCI (UIST), and systems (CCS)

12


slide-13
SLIDE 13

My Research Contributions

13


(Selective) Block Minimization

[KDD 10, 11, TKDD 12]

Supports learning from large data and streaming data. KDD Best Paper Award (2010); Yahoo! KSC award (2011)

Structured prediction models [ICML 14, ECML 13a, 13b, AAAI 15, CoNLL 11,12]

slide-14
SLIDE 14

My Research Contributions

14


Latent Representation for KBs

[EMNLP 13b,14]

Tensor methods for completing missing entries in KBs. Applications: e.g., entity relation extraction, word relation extraction.

Structured prediction models [ICML 14, ECML 13a, 13b, AAAI 15, CoNLL 11,12]

slide-15
SLIDE 15

My Research Contributions

15


Structured Prediction Models

[ICML 14, ECML 13a, 13b, CoNLL 11, 12, AAAI 15]

  • Design tractable, principled, domain specific models
  • Speedup general structured models

Structured prediction models [ICML 14, ECML 13a, 13b, AAAI 15, CoNLL 11,12]

slide-16
SLIDE 16

Structured Prediction

Task: Part-of-speech tagging
Input: They operate ships and banks.
Output: Pronoun Verb Noun And Noun

Task: Dependency parsing
Input: They operate ships and banks.
Output: a dependency tree over "Root They operate ships and banks ."

Task: Segmentation
Input: They operate ships and banks.

16

Assign values to a set of interdependent output variables

slide-17
SLIDE 17

Structured Prediction Models

 Learn a scoring function:

Score(output y | input x, model w)

 Linear model: S(y | x, w) = Σ_j w_j φ_j(x, y)
 Features: e.g., Verb-Noun, Mary-Noun

Input x: Mary had a little lamb
Output y: Noun Verb Det Adj Noun
Features are based on both the input and the output

17
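The linear scoring model above can be sketched in a few lines of Python; the feature map below (word/tag and tag-bigram counts) is an illustrative assumption, not the talk's actual feature set.

```python
# Sketch of the linear model S(y | x, w) = sum_j w_j * phi_j(x, y).
from collections import Counter

def phi(x, y):
    """Sparse feature counts for a tagged sentence: word/tag and tag-bigram features."""
    feats = Counter()
    for word, tag in zip(x, y):
        feats[("emit", tag, word)] += 1
    for prev, cur in zip(y, y[1:]):
        feats[("trans", prev, cur)] += 1
    return feats

def score(x, y, w):
    """S(y | x, w): dot product of the weight vector with the feature counts."""
    return sum(w.get(f, 0.0) * v for f, v in phi(x, y).items())

x = ["Mary", "had", "a", "little", "lamb"]
y = ["Noun", "Verb", "Det", "Adj", "Noun"]
w = {("emit", "Noun", "Mary"): 2.0, ("trans", "Verb", "Det"): 1.0}
print(score(x, y, w))  # 3.0: one Mary/Noun emission plus one Verb->Det transition
```

Any feature that looks at both x and y fits this interface, which is what lets the same model cover tagging, parsing, and segmentation.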

slide-18
SLIDE 18

 Find the best-scoring output given the model:

argmax_y Score(output y | input x, model w)

 The output space is usually exponentially large
 Inference algorithms:
   Specific: e.g., Viterbi (linear chain)
   General: integer linear programming (ILP)
 Approximate inference algorithms:
   e.g., belief propagation, dual decomposition

Inference

18
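For the linear-chain case, the Viterbi algorithm mentioned above fits in a short sketch; the `emit` and `trans` score tables are assumed stand-ins for the learned model weights.

```python
# Viterbi sketch: finds argmax_y Score(y | x, w) when the score decomposes
# into per-position emission scores and adjacent-tag transition scores.
def viterbi(x, tags, emit, trans):
    # delta maps each tag to (best score, best tag sequence) ending in that tag.
    delta = {t: (emit.get((t, x[0]), 0.0), [t]) for t in tags}
    for word in x[1:]:
        new_delta = {}
        for t in tags:
            best_prev, (s, seq) = max(
                ((p, delta[p]) for p in tags),
                key=lambda kv: kv[1][0] + trans.get((kv[0], t), 0.0),
            )
            new_delta[t] = (
                s + trans.get((best_prev, t), 0.0) + emit.get((t, word), 0.0),
                seq + [t],
            )
        delta = new_delta
    return max(delta.values())[1]

tags = ["Noun", "Verb"]
emit = {("Noun", "They"): 0.5, ("Verb", "operate"): 1.0, ("Noun", "ships"): 1.0}
trans = {("Noun", "Verb"): 0.5}
print(viterbi(["They", "operate", "ships"], tags, emit, trans))  # ['Noun', 'Verb', 'Noun']
```

The dynamic program runs in time linear in sentence length (times the squared tag set size), avoiding the exponential enumeration of all tag sequences.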

slide-19
SLIDE 19

Learning Structured Models

 Online, e.g., structured perceptron [Collins 02]
 Batch, e.g., structured SVM
   Cutting plane: [Tsochantaridis+ 05, Joachims+ 09]
   Dual coordinate descent: [Shevade+ 11, Chang+ 13]
   Block-coordinate Frank-Wolfe: [Lacoste-Julien+ 13]
   Parallel dual coordinate descent: [ECML 13a]

19

Loop: solve inference problems, then update the model
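The "solve inference, then update the model" loop can be sketched as a structured perceptron; `predict` and `phi` are assumed interfaces for the inference routine and the feature map.

```python
# Collins-style structured perceptron: solve inference, update on mistakes.
from collections import Counter

def structured_perceptron(data, predict, phi, epochs=5):
    w = Counter()
    for _ in range(epochs):
        for x, y_gold in data:
            y_hat = predict(x, w)      # inference: argmax_y w . phi(x, y)
            if y_hat != y_gold:        # mistake-driven update
                w.update(phi(x, y_gold))
                w.subtract(phi(x, y_hat))
    return w

# Toy usage: two candidate outputs per input, indicator features.
def predict(x, w):
    return max(["A", "B"], key=lambda y: w[(x, y)])

w = structured_perceptron([("x1", "B")], predict, lambda x, y: Counter({(x, y): 1}))
print(predict("x1", w))  # B
```

The batch methods listed above share this structure; they differ in how the inference results are turned into (regularized) updates.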

slide-20
SLIDE 20

Outline

20

  • 1. Applications: Co-reference; ESL Grammar Correction; Word Relations
  • 2. Modeling: Supervised Clustering Model
  • 3. Algorithms: Learning with Amortized Inference

slide-21
SLIDE 21

Outline

21

  • 1. Applications:

Co-reference; ESL Grammar Correction; Word Relation;

Co-reference Word Relations ESL grammar Correction

slide-22
SLIDE 22

Christopher Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book

22

Co-reference Resolution

Co-reference Word Relations ESL grammar Correction
slide-23
SLIDE 23

Performance*

Proposed a novel, principled, linguistically motivated model

Co-reference Resolution

[EMNLP 13a, ICML14, In submission]

(Bar chart, scores 50 to 65, comparing Stanford, Chen+ (2012), Martschat+ (2013), Fernandes+ HOTCoref, Berkeley, and our systems (2012, 2013, 2015). The baselines include the winners of the CoNLL shared tasks 2011 and 2012.)

23

*Avg(MUC, B³, CEAF)

Co-reference Word Relations ESL grammar Correction

Latent forest structure

slide-24
SLIDE 24

Co-reference Resolution Demo

24

http://bit.ly/illinoisCoref

Co-reference Word Relations ESL grammar Correction
slide-25
SLIDE 25

ESL Grammar Error Correction

[CoNLL 13, 14]

"They believe that such situation must be avoided."
Candidate corrections for "situation": a situation / situations / a situations
First place in the CoNLL shared tasks, 2013 and 2014

25

Co-reference Word Relations ESL grammar Correction
slide-26
SLIDE 26

Identifying Relations between Words

[EMNLP 14]

 GRE antonym task (no context):
   Looking up a thesaurus [Encarta]: 56%
   Our tensor method [EMNLP 13b]: 77% (the best result so far)

 Why?

Considers multiple word relations simultaneously

e.g., inanimate ← Ant → alive ← Syn → living

Which word is the opposite of adulterate? (a) renounce (b) forbid (c) purify (d) criticize (e) correct

26

Co-reference Word Relations ESL grammar Correction
slide-27
SLIDE 27

Word Relation Demo

http://bit.ly/wordRelation

Antonym of adulterate? (a) renounce -0.014 (b) forbid 0.004 (c) purify 0.781 (d) criticize -0.004 (e) correct -0.010

27

Co-reference Word Relations ESL grammar Correction
slide-28
SLIDE 28

Outline

28

Co-reference

  • 2. Modeling: Supervised Clustering Model

ESL grammar Correction Word Relations

slide-29
SLIDE 29

Christopher Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book

29

Co-reference Resolution

slide-30
SLIDE 30

 Learn a pairwise similarity measure

(local predictor)

Example features:

   Same sub-string?
   Positions in the paragraph
   30+ other feature types

 Key questions:

   How to learn the similarity function
   How to do the clustering

30

Co-reference Resolution

Christopher Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book

slide-31
SLIDE 31

Decoupling Approach

A heuristic to learn the model [Soon+ 01, Bengtson+ 08, CoNLL11]

 Decouple learning and inference:

31

Learn a pairwise similarity function, then cluster based on this function

slide-32
SLIDE 32

Decoupling Approach-Learning

32

Mentions: Chris1, Chris2, his father3, him4, Mr. Robin5

Positive samples: (Chris1, Chris2), (Chris1, him4), (Chris2, him4), (his father3, Mr. Robin5)

Negative samples: (Chris1, his father3), (Chris2, his father3), (him4, his father3), (Chris1, Mr. Robin5), (Chris2, Mr. Robin5), (him4, Mr. Robin5)
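The positive/negative sample construction on this slide can be sketched directly; same-cluster mention pairs become positive training examples and cross-cluster pairs become negative ones. The cluster assignments follow the slide's example.

```python
# Build pairwise training samples from gold coreference clusters.
from itertools import combinations

def make_pairs(mentions, cluster_of):
    pos, neg = [], []
    for a, b in combinations(mentions, 2):
        (pos if cluster_of[a] == cluster_of[b] else neg).append((a, b))
    return pos, neg

mentions = ["Chris1", "Chris2", "his father3", "him4", "Mr. Robin5"]
cluster_of = {"Chris1": 0, "Chris2": 0, "him4": 0, "his father3": 1, "Mr. Robin5": 1}
pos, neg = make_pairs(mentions, cluster_of)
print(len(pos), len(neg))  # 4 6
```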

slide-33
SLIDE 33

Greedy Best-Left-Link Clustering

[Bill Clinton], recently elected as the [President of the USA

33

slide-34
SLIDE 34

Greedy Best-Left-Link Clustering

[Bill Clinton], recently elected as the [President of the USA], has been invited by the [Russian President]

34

slide-35
SLIDE 35

Greedy Best-Left-Link Clustering

[Bill Clinton], recently elected as the [President of the USA], has been invited by the [Russian President], [Vladimir Putin], to visit [Russia]. [President Clinton]

35

slide-36
SLIDE 36

Greedy Best-Left-Link Clustering

[Bill Clinton], recently elected as the [President of the USA], has been invited by the [Russian President], [Vladimir Putin], to visit [Russia]. [President Clinton] said that [he] looks forward to strengthening ties between [USA] and [Russia].

36

(Diagram: the best left-linking forest over the mentions [Bill Clinton], [President of the USA], [Russian President], [Vladimir Putin], [Russia], [President Clinton], [he], [USA], [Russia].)

Best Left-Linking Forest [Soon+ 01, Bengtson+ 08, CoNLL11]
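A minimal sketch of the greedy best-left-link procedure, assuming a pairwise scorer `sim` and a threshold below which a mention starts a new cluster (both are stand-ins for the learned similarity function).

```python
# Greedy best-left-link clustering: scan mentions left to right, link each one
# to its highest-scoring antecedent (or start a new cluster).
def best_left_link(mentions, sim, threshold=0.0):
    link = {}
    for j in range(1, len(mentions)):
        best = max(range(j), key=lambda i: sim(mentions[i], mentions[j]))
        if sim(mentions[best], mentions[j]) > threshold:
            link[j] = best            # attach mention j to its best antecedent
    # Follow the links to recover the clusters (a left-linking forest).
    clusters = {}
    for j in range(len(mentions)):
        root = j
        while root in link:
            root = link[root]
        clusters.setdefault(root, []).append(mentions[j])
    return list(clusters.values())

def head_match(a, b):
    """Toy similarity: do the head words (last tokens) match?"""
    return 1.0 if a.split()[-1] == b.split()[-1] else -1.0

mentions = ["Bill Clinton", "Russian President", "Vladimir Putin", "President Clinton"]
print(best_left_link(mentions, head_match))
```

Here "President Clinton" links back to "Bill Clinton", while the other mentions start their own clusters.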

slide-37
SLIDE 37

Challenges

37

Christopher Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book

 Decoupling may lose information

slide-38
SLIDE 38

Challenges

38

Christopher Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book

 Decoupling may lose information

slide-39
SLIDE 39

Challenges

39

 In addition, we need world knowledge

  • 1. Complexity: need an efficient algorithm
  • 2. Modeling: learn the metric while clustering
  • 3. Knowledge: augment with knowledge
slide-40
SLIDE 40

Structured Learning Approach

Learn the similarity function while clustering

40

Loop: cluster based on this function, then update the similarity function

slide-41
SLIDE 41

Attempt: All-Links Clustering

[Mccallum+ 03, CoNLL 11]

 Define a global scoring function:

Attempt: using all within-cluster pairs:

 Inference problem is too hard

41

slide-42
SLIDE 42

Latent Left-Linking Model (L3M)

[ICML 14, EMNLP 13]

Score(a clustering C) = Score(the best left-linking forest that is consistent with C) = the sum of the scores of the edges in that forest

42

slide-43
SLIDE 43

Linguistic Constraints

 Must-link constraints:

E.g., SameProperName, …

 Cannot-link constraints:

E.g., ModifierMismatch, …

43

[Bill Clinton], recently elected as the [President of the USA], has been invited by the [Russian President], [Vladimir Putin], to visit [Russia]. [President Clinton] said that [he] looks forward to strengthening ties between [USA] and [Russia].

slide-44
SLIDE 44

 Solved by a greedy algorithm or formulated as an

Integer Linear Program (ILP)

Inference in L3M [ICML 14, EMNLP 13]

44

arg max_y Σ_{i,j} S_{i,j} y_{i,j}
s.t. Ay ≤ b; y_{i,j} ∈ {0, 1}

where y_{i,j} = 1 ⇔ (i, j) is an edge in the forest, and Ay ≤ b encodes:

  • Modeling constraints
  • Linguistic constraints
slide-45
SLIDE 45

Learning L3M (simplified version)[ICML 14, EMNLP 13a]

45

Predicted forest vs. latent forest

[Bill Clinton], recently elected as the [President of the USA], has been invited by the [Russian President], [Vladimir Putin], to visit [Russia]. [President Clinton] said that [he] looks forward to strengthening ties between [USA] and [Russia].

slide-46
SLIDE 46

Learning L3M (simplified version)[ICML 14, EMNLP 13a]

46

Predicted forest vs. latent forest

Loop until a stopping condition is met:
  For each training pair (x_i, y_i):
    (ŷ, ĥ) = arg max_{y,h} w^T φ(x_i, y, h)        (predicted forest)
    h_i = arg max_h w^T φ(x_i, y_i, h)             (latent forest consistent with the gold clustering y_i)
    w ← w + η (φ(x_i, y_i, h_i) − φ(x_i, ŷ, ĥ)),   η: the learning rate
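One pass of this loop, specialized to left-links, might look like the sketch below; the indicator features, the toy weights, and the omission of a "start a new cluster" option are simplifying assumptions.

```python
# Perceptron-style L3M update: promote the best gold-consistent left link,
# demote the best overall left link, when they fall in different clusters.
from collections import Counter

def l3m_update(mentions, gold_cluster, feats, w, lr=1.0):
    def s(k, j):
        return sum(w[f] * v for f, v in feats(k, j).items())
    for j in range(1, len(mentions)):
        pred = max(range(j), key=lambda k: s(k, j))      # predicted forest edge
        gold = [k for k in range(j) if gold_cluster[k] == gold_cluster[j]]
        if gold and gold_cluster[pred] != gold_cluster[j]:
            latent = max(gold, key=lambda k: s(k, j))    # latent (gold-consistent) edge
            for f, v in feats(latent, j).items():
                w[f] += lr * v
            for f, v in feats(pred, j).items():
                w[f] -= lr * v
    return w

# Toy run: mention 2 should link to mention 0, but the weights prefer mention 1.
feats = lambda k, j: Counter({("link", k, j): 1})
w = Counter({("link", 1, 2): 1.0})
l3m_update(["Chris", "his father", "him"], [0, 1, 0], feats, w)
print(w[("link", 0, 2)], w[("link", 1, 2)])  # 1.0 0.0
```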

slide-47
SLIDE 47

Extension: Probabilistic L3M [ICML 14, EMNLP 13a]

 Define a log-linear model:

Pr[a clustering C] = Pr[the left-linking forests consistent with C] = Π Pr[edge],
where Pr[mention i links to j] ∝ exp(w ⋅ φ(i, j) / γ)    (γ: a temperature parameter)

 Regularized maximum likelihood estimation:

min_w LL(w) = β‖w‖² + Σ_d log Z_d(w) − Σ_d Σ_i log( Σ_{j<i} exp(w ⋅ φ(i, j)/γ) C_d(i, j) )

(Z_d: the partition function for document d; C_d(i, j): an indicator that linking i to j is consistent with the gold clustering)

47
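The link distribution in the probabilistic model is a temperature-scaled softmax over left links; the scores below are assumed, chosen only to show the effect of the temperature γ (`gamma`).

```python
# Temperature-scaled softmax over candidate antecedents:
# Pr[link to antecedent k] ∝ exp(score_k / gamma).
import math

def left_link_probs(scores, gamma=1.0):
    exps = [math.exp(s / gamma) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

probs = left_link_probs([2.0, 0.0])             # soft distribution over two antecedents
sharp = left_link_probs([2.0, 0.0], gamma=0.1)  # small gamma concentrates on the best link
```

As γ shrinks, the distribution approaches the hard argmax used in the non-probabilistic model.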
slide-48
SLIDE 48

Coreference: OntoNotes-5.0 (with gold mentions)

(Bar chart, scores from 72 to 78, comparing Decoupled, L3M, and Probabilistic L3M; higher is better.)

48

Performance*

*Avg(MUC, B³, CEAF)

slide-49
SLIDE 49

Latent Left-Linking Model (L3M)

[ICML 14, EMNLP 13]

49

Advantages:

  • Complexity: Very efficient
  • Modeling: Learn the metric while clustering
  • Knowledge: Easy to incorporate constraints (must-link or cannot-link)

L3M can also be applied to other supervised clustering problems, e.g., grouping the posts in a forum or error reports from users.

slide-50
SLIDE 50

Outline

50

Co-reference

  • 3. Algorithms: Learning with Amortized Inference

ESL grammar Correction Word Relations

slide-51
SLIDE 51

Learning Structured Models

 Online, e.g., Structured Perceptron [Collins 02]  Batch e.g., Structured SVM

 Cutting plane: [Tsochantaridis+ 05, Joachims+ 09]  Dual Coordinate Descent: [Shevade+ 11, Chang+ 13]  Block-Coordinate Frank-Wolfe: [Lacoste-Julien+ 13]  Parallel Dual Coordinate Descent: [ECML 13a]

51

Solve inferences Update the model

slide-52
SLIDE 52

Redundancy in Learning Phase

[AAAI 15]

Recognizing Entities and Relations Task

(Chart: counts from 0 to 100k over 0 to 50 training rounds, plotting the number of inference problems vs. the number of distinct solutions; the distinct solutions are far fewer.)

52

slide-53
SLIDE 53

Redundancy of Solutions[Kundu+13]

S1: He is reading a book.    POS: Pronoun VerbZ VerbG Det Noun
S2: She is watching a movie. POS: Pronoun VerbZ VerbG Det Noun

53

Although the inference problems are different, their solutions might be the same

slide-54
SLIDE 54

Fewer Inference Calls [AAAI 15]

Obtain the same model with fewer inference calls.

(Chart: performance from 0.75 to 0.9 vs. the number of inference calls from 0 to 25k, for the baseline and our method.)

54

Recognizing Entities and Relations Task

slide-55
SLIDE 55

 A general inference framework

… to represent inference problems

 A condition

… to check if two problems have the same solution

Learning with Amortized Inference

[AAAI 15]

55

If CONDITION(problem cache, new problem) then
  SOLUTION(new problem) = old solution    (no need to call the solver; the check takes ~0.04 ms)
Else
  Call the base solver (~2 ms) and update the cache
End
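The caching scheme above can be sketched as a wrapper around a base ILP solver; the problem encoding (a structure key plus objective coefficients) and the toy solver are assumptions for illustration, and the condition implements the exact test stated in the theorem on the next slides.

```python
# Amortized inference: reuse cached solutions when a cheap condition certifies
# that the new objective has the same optimum.
def exact_condition(c_p, c_q, y_p):
    """Theorem 1's test: (2*y_p[i] - 1) * (c_q[i] - c_p[i]) >= 0 for every i."""
    return all((2 * y - 1) * (cq - cp) >= 0 for y, cp, cq in zip(y_p, c_p, c_q))

def amortized_solve(problem, cache, solver, condition=exact_condition):
    key = problem["structure"]                 # same # variables & constraints
    for old_c, old_y in cache.get(key, []):
        if condition(old_c, problem["c"], old_y):
            return old_y                       # reuse a cached solution (cheap check)
    y = solver(problem)                        # fall back to the base solver
    cache.setdefault(key, []).append((problem["c"], y))
    return y

calls = []
def toy_solver(problem):
    calls.append(problem["c"])
    return [0, 1, 1, 0]                        # stand-in for a real ILP solver

cache = {}
amortized_solve({"structure": "pairs", "c": [2, 3, 2, 1]}, cache, toy_solver)
amortized_solve({"structure": "pairs", "c": [2, 4, 2, 0.5]}, cache, toy_solver)
print(len(calls))  # 1: the second problem reused the first solution
```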

slide-56
SLIDE 56

A General Inference Framework

Integer Linear Programming (ILP)

 Widely used in NLP & Vision tasks [Roth+ 04]

 E.g., Dependency Parsing, Sentence Compression

 Any MAP problem w.r.t. any probabilistic model,

can be formulated as an ILP [Roth+ 04, Sontag 10]

 Only used for verifying amortized conditions

arg max_y Σ_i c_i y_i
s.t. Ay ≤ b; y_i ∈ {0, 1}

56

slide-57
SLIDE 57

Amortized Inference Theorem [Kundu+ 13]

 Theorem 1: If the following conditions are satisfied:

  • 1. Same # of variables and same constraints (the same equivalence class)
  • 2. ∀i, (2 y*_{P,i} − 1)(c_{Q,i} − c_{P,i}) ≥ 0
       (the solution is not sensitive to these changes in the coefficients)

then the optimal solution of Q is y*_P.

  • y*_P: the solution to P
  • c: the coefficients of the ILPs

57

slide-58
SLIDE 58

Amortized Inference Theorem [Kundu+ 13]

Example: two ILPs in the same equivalence class:

P: max 2x1 + 3x2 + 2x3 + 1x4    s.t. x1 + x2 ≤ 1, x3 + x4 ≤ 1    solution y*_P = <0, 1, 1, 0>
Q: max 2x1 + 4x2 + 2x3 + 0.5x4  s.t. x1 + x2 ≤ 1, x3 + x4 ≤ 1

The condition of Theorem 1 holds: the coefficients only increase on the variables set to 1 and decrease on the variables set to 0. Hence y*_P is also optimal for Q, with value 6 (compare y' = <1, 0, 1, 0>, with value 4).

58
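The example can be checked by brute force; this sketch enumerates the feasible 0/1 assignments and confirms that <0, 1, 1, 0> maximizes both objectives.

```python
# Brute-force check that the optimality of y*_P transfers from P to Q.
from itertools import product

def feasible():
    """All 0/1 assignments satisfying x1 + x2 <= 1 and x3 + x4 <= 1."""
    for z in product([0, 1], repeat=4):
        if z[0] + z[1] <= 1 and z[2] + z[3] <= 1:
            yield z

def best(c):
    return max(feasible(), key=lambda z: sum(ci * zi for ci, zi in zip(c, z)))

print(best([2, 3, 2, 1]), best([2, 4, 2, 0.5]))  # (0, 1, 1, 0) (0, 1, 1, 0)
```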

slide-59
SLIDE 59

Amortized Inference Theorem [Kundu+ 13]

The condition ∀i, (2 y*_{P,i} − 1)(c_{Q,i} − c_{P,i}) ≥ 0 reads coordinate-wise as:

  • if y*_{P,i} = 1 then c_{Q,i} − c_{P,i} ≥ 0
  • if y*_{P,i} = 0 then c_{Q,i} − c_{P,i} ≤ 0

If both hold for every i, the optimal solution of Q is y*_P.

  • y*_P: the solution to P
  • c: the coefficients of the ILPs

59

slide-60
SLIDE 60
Approx. Amortized Inference [AAAI 15]

 Theorem 2: If the following conditions are satisfied:

  • 1. Same # of variables and same constraints
  • 2. ∀i, (2 y*_{P,i} − 1)(c_{Q,i} − c_{P,i}) ≥ −ε |c_{Q,i}|

then y*_P is a (1 / (1 + Mε))-approximate solution to Q.

  • y*_P: the solution to P
  • M: a constant
  • c: the coefficients of the ILPs

60
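Theorem 2's relaxed test differs from the exact one only by the ε slack; the numbers below are assumed, chosen so that the exact condition fails but the relaxed one passes.

```python
# Relaxed amortization test with slack eps (eps = 0 recovers the exact test).
def approx_condition(c_p, c_q, y_p, eps):
    """(2*y_p[i] - 1) * (c_q[i] - c_p[i]) >= -eps * |c_q[i]| for every i."""
    return all((2 * y - 1) * (cq - cp) >= -eps * abs(cq)
               for y, cp, cq in zip(y_p, c_p, c_q))

c_p, y_p = [2, 3, 2, 1], [0, 1, 1, 0]
c_q = [2, 2.9, 2, 1.05]                           # small perturbations of the coefficients
print(approx_condition(c_p, c_q, y_p, eps=0.0))   # False: the exact test fails
print(approx_condition(c_p, c_q, y_p, eps=0.05))  # True: within the eps slack
```

Larger ε allows more reuse at the cost of a looser (1 / (1 + Mε)) approximation guarantee.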

slide-61
SLIDE 61
  • Approx. Amortized Inference [AAAI 15]


Corollary 1:

Learning Structured SVM with approximate amortized inference gives a model with bounded empirical risk

61

slide-62
SLIDE 62
  • Approx. Amortized Inference [AAAI 15]


Corollary 2:

Dual coordinate descent for structured SVM can still return an exact model even if approx. amortized inference is used.

62

slide-63
SLIDE 63

# Solver Calls (Entity-Relation Extraction)

(Bar chart of the % of solver calls, lower is better: the exact baseline, our exact method, and our approximate method. Accuracy is preserved: Ent F1 87.7 / Rel F1 47.6 vs. Ent F1 87.3 / Rel F1 47.8.)

63

slide-64
SLIDE 64

Outline

64

  • 1. Applications: Co-reference; ESL Grammar Correction; Word Relations
  • 2. Modeling: Supervised Clustering Model
  • 3. Algorithms: Learning with Amortized Inference

slide-65
SLIDE 65

Other Related Work

65

  • 1. Applications: Dependency Parsing [Arxiv 15b]; Multi-label Classification [ECML 13]
  • 2. Modeling: Semi-Supervised Learning [ECML 13a]; Search-Based Model [Arxiv 15a]
  • 3. Algorithms: Parallel learning algorithms [ECML 13b]

slide-66
SLIDE 66

My Research Contributions

(Diagram, axes: Data Size × Problem Complexity)

  • Linear classification [ICML 08, KDD 08, JMLR 08a, 10a, 10b, 10c]
  • Limited-memory linear classifier [KDD 10, 11, TKDD 12]
  • Latent representation for knowledge bases [EMNLP 13, 14]
  • Structured prediction models [ICML 14, ECML 13a, 13b, AAAI 15, CoNLL 11, 12]

66

slide-67
SLIDE 67

Future Work: Practical Machine Learning

67

Co-reference

  • 1. Applications: More applications, easy access tools
  • 3. Algorithms: Handle large & complex data
  • 2. Modeling: Learning from heterogeneous information

ESL grammar Correction Others

slide-68
SLIDE 68

Learning From World Knowledge

 Go beyond supervised learning

Learning from indirect supervision signals

68

After the vessel suffered a catastrophic torpedo detonation, Kursk sank in the waters of Barents Sea with all hands lost.

slide-69
SLIDE 69

Learning From World Knowledge

 Massive textual data on the Internet

Wikipedia: 4.7M English articles (35M articles in total). Tweets: 500M per day, about 200 billion per year.

 Learn world knowledge to support target tasks

 Extract knowledge from free text  Handle large-scale data  Inference on knowledge bases

69

[EMNLP 13a, 14, ICML 14] [Liblinear, KDD 12] [EMNLP 14b, 14]

slide-70
SLIDE 70

Applications & Tools

 LIBLINEAR: a library for linear classification
 Streaming Data SVM: supports training on very large data
 Illinois-SL: a library for structured prediction; supports various algorithms; parallel ⇒ very fast

These tools provide a nice platform:

  • for developing novel methods
  • for collaboration
  • for education

More easy-access tools; more collaborations

70

slide-71
SLIDE 71

Conclusion

Goal: Practical Machine Learning

 [Modeling] Expressive and general formulations
 [Algorithms] Principled and efficient
 [Applications] Support many applications

Code and Demos: http://www.illinois.edu/~kchang10

71