

SLIDE 1

Convolution kernels for natural language (Collins and Duffy, 2001)

LING 572: Advanced Statistical Methods for NLP, February 20, 2020


Based on F. Xia, '18

SLIDE 2

Highlights

  • Introduce a tree kernel
  • Show how it is used for reranking


SLIDE 3

Reranking


SLIDE 4

Reranking

  • Training data: sentences, each paired with a set of candidate parses, with the correct parse marked
  • Goal: create a module that reranks the candidates
  • The reranker is used as a post-processor (a minimal sketch follows below).
  • In this paper: build a reranker for parsing
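A minimal sketch of the post-processing step, assuming a scoring function score(·) produced by training (the names here are illustrative, not from the paper):

```python
# Hypothetical post-processor: return the candidate the reranker scores highest.

def rerank(candidates, score):
    return max(candidates, key=score)

# e.g. best_parse = rerank(hundred_best_parses, score)
```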


SLIDE 5

Formulating the problem


SLIDE 6

Reranking: Training


Recall that in an SVM, both training and classification can be expressed in terms of inner products between examples, i.e., kernel values.

SLIDE 7

Perceptron training

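In dual form, perceptron training and scoring need only kernel evaluations, which is what makes a tree kernel usable here. Below is a hedged sketch of the plain (unvoted) mistake-driven variant; the paper's learner is the voted perceptron, and the data conventions are assumptions, not the paper's:

```python
# Sketch of kernelized perceptron training for reranking (plain, not voted).
# Assumed convention: each example is a (gold, candidates) pair, where gold
# is the best candidate in the list as judged against the treebank parse.

def train_reranker(examples, kernel, epochs=10):
    mistakes = []  # (gold_tree, wrongly_chosen_tree) pairs

    def score(tree):
        # Dual form of w . h(tree): kernel differences over past mistakes.
        return sum(kernel(good, tree) - kernel(bad, tree)
                   for good, bad in mistakes)

    for _ in range(epochs):
        for gold, candidates in examples:
            guess = max(candidates, key=score)
            if guess is not gold:        # mistake-driven update
                mistakes.append((gold, guess))
    return mistakes
```

At test time, the same score function (built from the returned mistakes) plugs into the rerank sketch from Slide 4.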

SLIDE 8

Tree kernel


SLIDE 9

A tree kernel


SLIDE 10

Intuition

  • Given two trees T1 and T2, the more subtrees T1 and T2 share, the more similar they are.
  • Method:
      • For each tree, enumerate all the subtrees
      • Count how many are in common
      • Do it in an efficient way


SLIDE 11

Definition of subtree

  • A subtree is a subgraph which has more than one node, with the restriction that entire (not partial) rule productions must be included.

  • “A subtree rooted at node n” means “a subtree whose root is n”.
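To make the later slides concrete, here is a toy tree encoding used by the sketches below (an illustrative assumption, not a data structure from the paper), with the "a sweet apple" NP from Slide 13 as a running example:

```python
# A toy parse-tree node: a label plus a list of child nodes.

class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

    def production(self):
        # The CFG production expanding this node,
        # e.g. ('NP', ('DT', 'Adj', 'N')).
        return (self.label, tuple(c.label for c in self.children))

    def is_preterminal(self):
        # A pre-terminal dominates exactly one word, e.g. DT -> 'a'.
        return len(self.children) == 1 and not self.children[0].children

# The NP over "a sweet apple":
apple_np = Node('NP', [Node('DT', [Node('a')]),
                       Node('Adj', [Node('sweet')]),
                       Node('N', [Node('apple')])])
```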


SLIDE 12

An example


SLIDE 13

C(n1, n2)

C(n1, n2) counts the number of common subtrees rooted at n1 and n2. For the two trees below, C(n1, n2) = ??


[Figure: two identical trees, each NP → DT Adj N over "a sweet apple"]

SLIDE 14

Calculating C(n1, n2)

If the productions at n1 and n2 are different, then C(n1, n2) = 0;
else if n1 and n2 are pre-terminals, then C(n1, n2) = 1;
else C(n1, n2) = ∏_{i=1}^{nc(n1)} (1 + C(ch(n1, i), ch(n2, i))),
where nc(n1) is the number of children of n1 and ch(n1, i) is the i-th child of n1.

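This recursion translates directly into Python (a sketch over the toy Node class from Slide 11; the extra leaf guard is an added assumption so bare words never count as subtree roots):

```python
# Sketch of the C(n1, n2) recursion from this slide.

def C(n1, n2):
    if n1.production() != n2.production():
        return 0
    if not n1.children:              # a bare word roots no subtree
        return 0
    if n1.is_preterminal():          # e.g. DT -> 'a'
        return 1
    count = 1
    for c1, c2 in zip(n1.children, n2.children):
        # Each child is either left unexpanded or expanded in any of
        # its C(c1, c2) shared ways, hence the (1 + C) product.
        count *= 1 + C(c1, c2)
    return count
```

On the two identical "a sweet apple" NPs from Slide 13, C at the NP roots gives (1+1)^3 = 8.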

SLIDE 15

Representing a tree as a feature vector


h_i(T1) = ∑_{n1 ∈ N1} I_i(n1), where N1 is the set of nodes in T1 and I_i(n1) is 1 if subtree i is rooted at n1 (0 otherwise).

SLIDE 16

A tree kernel

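From the definitions on Slides 13 and 15, the kernel is the inner product of the two feature vectors, which rearranges into a double sum of C over node pairs:

K(T1, T2) = ∑_i h_i(T1) · h_i(T2) = ∑_{n1 ∈ N1} ∑_{n2 ∈ N2} C(n1, n2)

As a sketch on the toy Node class, reusing C from the previous slide (`nodes` is an assumed helper, not from the paper):

```python
# The kernel as a double sum of C(n1, n2) over all node pairs.

def nodes(tree):
    # Yield every node of the tree, root included.
    yield tree
    for child in tree.children:
        yield from nodes(child)

def K(t1, t2):
    # Total number of common subtrees of t1 and t2.
    return sum(C(n1, n2) for n1 in nodes(t1) for n2 in nodes(t2))
```

With the "a sweet apple" NP from Slide 11's sketch, K(apple_np, apple_np) = 11: eight subtrees rooted at NP plus one rooted at each of the three pre-terminals.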

SLIDE 17

Properties of this kernel

  • The value of K(T1, T2) depends greatly on the size of the trees T1 and T2.
  • K(T, T) could be huge. The output would be dominated by the most similar tree => the model would behave like a nearest-neighbor rule.


SLIDE 18

Down-weighting the contribution of large subtrees when calculating C(n1, n2)

If the productions at n1 and n2 are different, then C(n1, n2) = 0;
else if n1 and n2 are pre-terminals, then C(n1, n2) = λ;
else C(n1, n2) = λ · ∏_{i=1}^{nc(n1)} (1 + C(ch(n1, i), ch(n2, i))),
where 0 < λ ≤ 1 is the down-weighting parameter.

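The same recursion as code, extending the sketch from Slide 14 (lam is the slides' down-weighting parameter λ; the default value here is purely illustrative):

```python
# Down-weighted variant: a common subtree containing k rule productions
# now contributes lam**k instead of 1, shrinking large subtrees' weight.

def C_decayed(n1, n2, lam=0.5):      # lam=0.5 is an illustrative choice
    if n1.production() != n2.production():
        return 0.0
    if not n1.children:              # a bare word roots no subtree
        return 0.0
    if n1.is_preterminal():
        return lam
    count = lam
    for c1, c2 in zip(n1.children, n2.children):
        count *= 1.0 + C_decayed(c1, c2, lam)
    return count
```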

SLIDE 19

Experimental results


SLIDE 20

Experiment setting

  • Data:
      • Training set: 800 sentences
      • Dev set: 200 sentences
      • Test set: 336 sentences
      • For each sentence, 100 candidate parse trees
  • Learner: voted perceptron
  • Evaluation measure: the average parse score over 10 runs
  • Baseline (with PCFG): 74% (labeled f-score)


SLIDE 21

Results


[Table/figure: results with different maximum subtree sizes]

SLIDE 22

Summary

  • Show how to use an SVM or a perceptron learner for the reranking task.
  • Define a tree kernel that can be calculated in polynomial time.
  • Note: the number of features is infinite.
  • The reranker improves parse score from 74% to 80%.
