Wrapup: IE, QA, and Dialog (Mausam)


  1. Wrapup: IE, QA, and Dialog Mausam

  2. Grading (revised weights) • Project: 50% → 40% • Final exam: 20% • Regular reviews: 15% → 20% • Midterm survey: 15% → 10% • Presentation: 10% • Extra credit: participation

  3. Plan (1st half of the course) • Classical papers/problems in IE: Bootstrapping, NELL, Open IE • Important techniques for IE: CRFs, tree kernels, distant supervision, joint inference, deep learning, reinforcement learning • IE++: coreference, paraphrases, inference • Plan (2nd half of the course) • QA • Conversational agents

  4. Plan (1st half++ of the course) • Classical papers/problems in IE: Bootstrapping, NELL, Open IE • Important techniques for IE: Semi-CRFs, tree kernels, distant supervision, joint inference, topic models, deep learning (CNNs), reinforcement learning • IE++: coreference, paraphrases • Inference: random walks, neural models • Plan (2nd half of the course) • QA: open QA, semantic parsing, LSTM, attention, more attention, Recursive NN, deep feature fusion network • Conversational agents: generative hierarchical nets, GANs, MemNets

  5. NLP (or any application course) • Techniques/Models: Bootstrapping; (coupled) Semi-SSL; PGMs: semi-CRF, MultiR, LDA; Tree Kernels; Multi-task learning; Multi-instance learning; Random walks over graphs; Reinforcement learning; CNN, LSTM, Bi-LSTM, Recursive NN; Attention, MemNets; GANs • Problems: NER; Entity/Rel/Event Extraction; Open Rel/Event Extraction; KB inference; Open QA; Machine comprehension; Task-oriented dialog w/ KB; General dialog

  6. How much data? • Large supervised dataset: supervised learning • Trick to construct a large supervised dataset w/o noise • Semi-CRF, Twit-NER/POS, QuizBowl, SQuAD QA, CNN QA, Movies, Ubuntu, OQA, random walks… (negative data can be artificial) • Small supervised dataset: semi-supervised learning • Bootstrapping, co-training, graph-based SSL • No supervised dataset: unsupervised learning/rules • TwitIE • ReVerb • Trick to construct a large supervised dataset with noise: distant supervision • MultiR, PCNNs

  7. Non-Deep-Learning Ideas: Semi-supervised • Bootstrapping • (in a loop) automatic generation of training data by matching known facts against text (see the sketch below) • Multi-view / multi-task co-training • Constraints between tasks; agreement between multiple classifiers for the same concept • Graph-based SSL • Agreement between nodes of the graph
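
A minimal sketch of the bootstrapping loop above, using an invented toy corpus and naive exact-string patterns (real systems such as DIPRE/Snowball-style bootstrappers add fuzzy matching, pattern/fact confidences, and drift control):

```python
seeds = {("Paris", "France"), ("Tokyo", "Japan")}          # seed (city, country) facts
corpus = [
    "Paris is the capital of France .",
    "Tokyo is the capital of Japan .",
    "Ottawa is the capital of Canada .",
]

def matching_sentences(facts, sentences):
    """Auto-generate 'training' sentences by matching known facts against text."""
    return [(s, f) for s in sentences for f in facts if f[0] in s and f[1] in s]

def induce_patterns(matches):
    """Turn each matching sentence into a pattern with argument slots."""
    return {s.replace(a1, "<ARG1>").replace(a2, "<ARG2>") for s, (a1, a2) in matches}

def apply_patterns(patterns, sentences):
    """Extract new facts wherever a pattern matches a sentence."""
    facts = set()
    for p in patterns:
        prefix, rest = p.split("<ARG1>", 1)
        middle, suffix = rest.split("<ARG2>", 1)
        for s in sentences:
            if s.startswith(prefix) and s.endswith(suffix) and middle in s:
                core = s[len(prefix):len(s) - len(suffix)]
                parts = core.split(middle)
                if len(parts) == 2:
                    facts.add((parts[0], parts[1]))
    return facts

for _ in range(2):                                          # "(in a loop)"
    patterns = induce_patterns(matching_sentences(seeds, corpus))
    seeds |= apply_patterns(patterns, corpus)               # semantic drift can creep in here

print(seeds)   # now also contains ("Ottawa", "Canada")
```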

  8. Non-Deep-Learning Ideas: Distant supervision • KB of facts: known. Extraction supervision: unknown • Bootstrap a training dataset by matching sentences with facts • Hypothesis 1: all such sentences are positive training for a fact: NOISY • Hypothesis 2: all such sentences form a bag; each bag has a single relation: BETTER • Hypothesis 3: each bag can have multiple labels: EVEN BETTER • Multi-instance learning • Noisy-OR in PGMs • maximize the max probability in the bag (see the sketch below)
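
A hedged sketch of the bag-level scoring behind Hypotheses 2/3: all sentences mentioning an entity pair form a bag, and training trusts only the best sentence (max) or combines them with a noisy-OR, rather than treating every sentence as positive. Per-sentence probabilities below are made up:

```python
import numpy as np

# Per-sentence probabilities that each sentence in one bag expresses relation r
# (hypothetical scores from whatever sentence-level model is used)
p_sentence = np.array([0.10, 0.85, 0.30])

# "At-least-one" / max assumption: the bag is as positive as its best sentence
bag_score_max = p_sentence.max()                       # 0.85

# Noisy-OR (PGM-style): P(bag expresses r) = 1 - prod(1 - p_i)
bag_score_noisy_or = 1.0 - np.prod(1.0 - p_sentence)   # ~0.91

print(bag_score_max, bag_score_noisy_or)
```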

  9. Non-Deep-Learning Ideas: No intermediate supervision • QA tasks: (question, answer) pairs known; inference chain: unknown • Distant supervision: KB fact known; which sentence to extract from: unknown • OQA (which proof is better is not known) • Random-walk inference (which path is better is not known) • MultiR (which sentence in the corpus expresses the fact is not known) • Approach • create a model for scoring each path/proof using weights on properties of each constituent • train using the known supervision (perceptron-style updates; see the sketch below) • Differences: OQA scores each edge separately, PRA scores whole paths; MultiR uses multi-instance learning
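
A toy perceptron-style sketch of that shared recipe: score each candidate path/proof as a weighted feature sum, and update the weights toward a candidate that reaches the known answer. The features, candidate derivations, and "reaches the answer" labels below are all invented for illustration:

```python
import numpy as np

def phi(path):
    """Toy feature map over properties of a path's constituent edges."""
    return np.array([len(path), path.count("born_in"), path.count("capital_of")], float)

w = np.zeros(3)
# Candidate derivations for one training question, and whether each reaches the known answer
candidates = [("born_in", "capital_of"), ("born_in",), ("capital_of", "capital_of")]
reaches_answer = [False, True, False]

for _ in range(5):
    scores = [w @ phi(c) for c in candidates]
    predicted = int(np.argmax(scores))
    gold = reaches_answer.index(True)          # a correct derivation (simplified to one)
    if predicted != gold:                      # perceptron-style update
        w += phi(candidates[gold]) - phi(candidates[predicted])

print(w)   # weights now prefer the derivation that reached the known answer
```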

  10. Non-Deep-Learning Ideas: Sparsity • Tree kernels: two features (paths) are similar if one shares many constituent elements with the other; similarity is down-weighted by a penalty for non-matching elements • Paraphrase dataset for QA • Open relations as supplements in KB inference

  11. Deep Learning Models • Convolutional NNs • Handle fixed length contexts • Recurrent NNs • Handle small variable length histories • LSTMs/GRUs • Handle larger variable length histories • Bi-LSTMs • Handle larger variable length histories and futures • Recursive NNs • Handle variable length partially ordered histories
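
A hedged PyTorch sketch contrasting two of these model families on a toy batch; all sizes and the random input are illustrative, not from the slides:

```python
import torch
import torch.nn as nn

batch, seq_len, emb = 2, 7, 16
x = torch.randn(batch, seq_len, emb)            # e.g. word embeddings for two sentences

# CNN: fixed-length contexts via a sliding window (kernel) over the sequence
cnn = nn.Conv1d(in_channels=emb, out_channels=32, kernel_size=3, padding=1)
cnn_out = cnn(x.transpose(1, 2))                # (batch, 32, seq_len)

# Bi-LSTM: variable-length histories *and* futures
bilstm = nn.LSTM(input_size=emb, hidden_size=32, batch_first=True, bidirectional=True)
lstm_out, _ = bilstm(x)                         # (batch, seq_len, 64)

print(cnn_out.shape, lstm_out.shape)
```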

  12. Deep Learning Models (contd) • Hierarchical recurrent NNs • RNN over RNNs • Attention models • attach non-uniform importance to histories based on evidence (the question) • Co-attention models • attach non-uniform importance to histories in two different NNs • MemNets • add an external memory with explicit read, write, and update operations • Generative Adversarial Nets • a better training procedure using an actor-critic architecture
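
A minimal numpy sketch of the attention idea above: weight histories (encoder states) non-uniformly by their relevance to an evidence vector such as the question encoding. Shapes and values are illustrative:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

T, d = 5, 8                         # 5 history states, hidden size 8
rng = np.random.default_rng(0)
H = rng.normal(size=(T, d))         # encoder states (the "histories")
q = rng.normal(size=(d,))           # evidence vector (e.g. the question encoding)

scores = H @ q                      # one relevance score per history state
alpha = softmax(scores)             # non-uniform importance weights, sum to 1
context = alpha @ H                 # attention-weighted summary used by the reader

print(alpha.round(3), context.shape)
```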

  13. Hierarchical Models • Semi-CRFs: joint segmentation and labeling • Sentence is a sequence of segments, which are sequences of words • Allows segment-level features to be added • HRED: LSTM over LSTM • A document is a sequence of sentences, each of which is a sequence of words • A conversation is a sequence of utterances, each of which is a sequence of words
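
A hedged PyTorch sketch of the "LSTM over LSTM" structure behind HRED: a word-level encoder turns each utterance into a vector, then a dialogue-level LSTM runs over the sequence of utterance vectors. Sizes and the toy conversation are invented:

```python
import torch
import torch.nn as nn

emb, hid = 16, 32
word_encoder = nn.LSTM(emb, hid, batch_first=True)      # runs over words in an utterance
dialog_encoder = nn.LSTM(hid, hid, batch_first=True)    # runs over utterance vectors

# A toy conversation: 3 utterances of 5 words each (already embedded)
utterances = torch.randn(3, 5, emb)

_, (h_n, _) = word_encoder(utterances)       # h_n: (1, 3, hid) = one vector per utterance
utt_vectors = h_n[-1].unsqueeze(0)           # (1, 3, hid): a length-3 sequence of utterance vectors
context, _ = dialog_encoder(utt_vectors)     # (1, 3, hid): dialogue state after each utterance

print(context.shape)
```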

  14. RL for Text • Two uses • Use 1: search the Web to find easy documents for IE • Use 2: policy-gradient algorithm for updating the weights of the generator in GANs

  15. Bootstrapping • [Akshay] Fuzzy matching between seed tuples and text • [Shantanu] Named entity tags in patterns • [Gagan, Barun] Confidence level for each pattern and fact • Semantic drift

  16. NELL • Never-ending/lifelong learning • Human supervision to guide the learning • [many] multi-view multi-task co-training • [many] coupling constraints for high precision. • [Dinesh] ontology to define the constraints

  17. Open IE • [many] ontology-free, scalability • [Surag] data-driven research through extensive error analysis • [Dinesh] reusing datasets from one task to another • [Partha] open relations as supplementary knowledge to reduce sparsity

  18. Tree Kernels • [Shantanu] the key information about the relation lies on the shortest path in the dependency parse between the two entities

  19. Semi-CRFs • [many] segment-level features in CRFs • [Dinesh] joint segmentation and labeling? • Order-L CRFs vs. Semi-CRFs

  20. MultiR • [Rishab] use of a KB to create a training set • [Surag] multi-instance learning in PGMs • [Akshay] relationship between sentence-level and aggregate extractions • [Gagan] Viterbi approximation (replace expectation with max)

  21. PCNNs • [Haroun] Max pooling to make layers independent of sentence size • [Akshay] Piecewise max pooling to capture arg1, rel, arg2 • [Akshay] Multi-instance learning in neural nets • Positional embeddings
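
A small numpy sketch of piecewise max pooling: instead of one max over the whole sentence, the convolution output is split at the two argument positions and each piece is max-pooled separately, so the pooled vector keeps a coarse arg1 / rel / arg2 structure. All shapes and positions below are illustrative:

```python
import numpy as np

seq_len, n_filters = 12, 4
conv_out = np.random.default_rng(0).normal(size=(n_filters, seq_len))  # after 1-D convolution
e1_pos, e2_pos = 3, 8                                                   # positions of the two arguments

pieces = [conv_out[:, :e1_pos + 1],            # ... arg1
          conv_out[:, e1_pos + 1:e2_pos + 1],  # arg1 ... arg2
          conv_out[:, e2_pos + 1:]]            # arg2 ...
pooled = np.concatenate([p.max(axis=1) for p in pieces])  # (3 * n_filters,)

print(pooled.shape)   # (12,) instead of (4,) from plain max pooling
```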

  22. TwitIE • [Haroun] tweets are challenging, but redundancy is good • [Dinesh] G² test for ranking entities for a given date • [Shantanu] event type discovery using topic models

  23. RL for IE • [many] active querying for gathering external evidence

  24. PRA for KB inference • [Haroun, Akshay] low variance sampling • [Arindam] learning non-functional relations • [Nupur] paths as features in a learning model
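
A hedged sketch of the "paths as features" bullet: each candidate fact is represented by random-walk features, one per relation path, and a linear model scores the fact. The entities, paths, and probabilities below are all made up for illustration:

```python
import numpy as np

# Hypothetical random-walk arrival probabilities for each (head, tail) pair and path
path_features = {
    ("Obama", "Hawaii"): {"bornIn": 0.7, "livesIn->locatedIn": 0.4},
    ("Obama", "Kenya"):  {"bornIn": 0.0, "fatherBornIn": 0.9},
}
paths = ["bornIn", "livesIn->locatedIn", "fatherBornIn"]
w = np.array([2.0, 1.0, -0.5])      # per-path weights a learner would fit (illustrative)

def score(pair):
    x = np.array([path_features[pair].get(p, 0.0) for p in paths])
    return float(w @ x)             # higher = more plausible inferred fact

print(score(("Obama", "Hawaii")), score(("Obama", "Kenya")))
```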

  25. Joint MF-TF • [Akshay, Shantanu] OOV handling • [Nupur] loss function in joint modeling

  26. Open QA • [Surag] structured perceptron in a pipeline model • [Akshay] paraphrase corpus for question rewriting • [Shantanu] mining paraphrase operators from corpus • [Arindam] decomposition of scoring over derivation steps

  27. LSTMs • [Haroun] attention > depth • [Akshay] cool way to construct the dataset • [Dinesh] two types of readers

  28. Co-attention • [many] iterative refinement of answer span selection

  29. HRED • [Akshay] pretraining dialog model with a QA dataset • [Arindam] passing intermediate context improves coherence? • [Barun] split of local dialog generator and global state tracker

  30. MSQU • [many] partially annotated data • [many] natural language -> SQL

  31. GANs • [many] teacher forcing • [Akshay] interesting heuristics • [Arindam] discriminator feedback can be backpropagated despite being non-differentiable

  32. MemNets • [Surag] typed OOVs • [Haroun] hops • [Shantanu, Gagan] subtask-styled evaluation

  33. Open/Next Issues • IE: mature? • Event extraction • Temporal extraction • Rapid retargetability • KB inference • Long way to go • Combining DL and path-based models

  34. Open/Next Issues • QA systems • Dataset-driven research: [MC] SQuAD – tremendous progress • Answering in the wild: not clear (large answer spaces?) • Deep learning for large-scale QA • Conversational agents • [Task-driven] how to get a DL model to issue a variety of queries • [General] how to get the system to say something interesting? • DL: what are the systems really capturing!?

  35. Conclusions • Learn key historical developments in IE • Learn (some) state of the art in IE, inference, QA and dialog • Learn how to critique strengths and weaknesses of a paper • Learn how to brainstorm next steps and future directions • Learn how to summarize an advanced area of research • Learn to do research at the cutting edge

  36. Exam • Bring a laptop • Internet-enabled • PDFLaTeX-enabled • Bring a mobile • for taking a picture • Extension cords • It is OK even if you have not deeply understood every paper

  37. Project Presentations • Motivation & Problem definition • 1 Slide of Contribution • Background • Technical Approach • Experiments • Analysis • Conclusions • Future Work
