PhD course in Machine Learning Kernel Engineering Alessandro - PowerPoint PPT Presentation

PhD course in Machine Learning: Kernel Engineering
Alessandro Moschitti, Department of Information and Communication Technology, University of Trento
Email: moschitti@dit.unitn.it

Kernel Engineering Approaches
Basic combinations; canonical mappings, e.g. object transformations; merging of kernels.

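Kernel Combinations: an Example
$K_p^3$: polynomial kernel (degree 3) of flat features; $K_{Tree}$: tree kernel.
Tree kernel combinations:
$$K_{Tree+P} = \gamma \times K_{Tree} + K_p^3, \qquad K_{Tree \times P} = K_{Tree} \times K_p^3$$
and their normalized versions:
$$K_{Tree+P} = \gamma \times \frac{K_{Tree}}{|K_{Tree}|} + \frac{K_p^3}{|K_p^3|}, \qquad K_{Tree \times P} = \frac{K_{Tree} \times K_p^3}{|K_{Tree}| \times |K_p^3|}$$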
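A minimal sketch of the sum and product combinations on precomputed Gram matrices, in plain Python. It assumes that |K| denotes cosine normalization, K(x,z)/sqrt(K(x,x)K(z,z)), and that $K_p^3$ has the form (x·z + 1)^3; the helper names and the toy tree-kernel Gram matrix are hypothetical, not taken from SVM-light-TK.

```python
import math


def normalize(gram):
    """Cosine-normalize a Gram matrix: K'[i][j] = K[i][j] / sqrt(K[i][i] * K[j][j])."""
    n = len(gram)
    return [[gram[i][j] / math.sqrt(gram[i][i] * gram[j][j]) for j in range(n)]
            for i in range(n)]


def poly3(x, z):
    """Degree-3 polynomial kernel on flat feature vectors (assumed form: (x.z + 1)^3)."""
    return (sum(a * b for a, b in zip(x, z)) + 1) ** 3


def gram_matrix(kernel, objects):
    """Evaluate a kernel on every pair of objects."""
    return [[kernel(a, b) for b in objects] for a in objects]


def combine_sum(k_tree, k_poly, gamma):
    """K_{Tree+P} = gamma * K_Tree/|K_Tree| + K_p^3/|K_p^3| (normalized sum)."""
    kt, kp = normalize(k_tree), normalize(k_poly)
    n = len(kt)
    return [[gamma * kt[i][j] + kp[i][j] for j in range(n)] for i in range(n)]


def combine_product(k_tree, k_poly):
    """K_{Tree x P} = (K_Tree * K_p^3) / (|K_Tree| * |K_p^3|) (normalized product)."""
    kt, kp = normalize(k_tree), normalize(k_poly)
    n = len(kt)
    return [[kt[i][j] * kp[i][j] for j in range(n)] for i in range(n)]


# Hypothetical data: a tree-kernel Gram matrix for two objects and their flat vectors.
k_tree = [[4.0, 2.0], [2.0, 9.0]]
k_poly = gram_matrix(poly3, [[1, 0], [1, 1]])  # [[8, 8], [8, 27]]
print(combine_sum(k_tree, k_poly, gamma=0.5))
print(combine_product(k_tree, k_poly))
```

Normalizing each component first presumably keeps the tree kernel, whose raw values grow with tree size, from dominating the flat-feature kernel; γ then sets the relative weight of the tree component.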

Object Transformation [Moschitti et al., CLJ 2008]
$$K(O_1, O_2) = \phi(O_1) \cdot \phi(O_2) = \phi_E(\phi_M(O_1)) \cdot \phi_E(\phi_M(O_2)) = \phi_E(S_1) \cdot \phi_E(S_2) = K_E(S_1, S_2)$$
Canonical mapping, $\phi_M(\cdot)$: object transformation, e.g. of a syntactic parse tree into a verb subcategorization frame tree.
Feature extraction, $\phi_E(\cdot)$: maps the canonical structure into all its fragments; different fragment spaces can be used, e.g. ST, SST and PT.
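The composition K(O_1, O_2) = K_E(φ_M(O_1), φ_M(O_2)) can be sketched as a higher-order function. Both ingredients below are hypothetical stand-ins: φ_M simply extracts the first VP subtree (a crude proxy for a subcategorization-frame mapping), and φ_E is a bag of grammar productions rather than the ST, SST or PT fragment spaces.

```python
from collections import Counter

# Trees are nested tuples: (label, child_1, ..., child_k); words are plain strings.


def productions(tree):
    """phi_E stand-in: bag of grammar productions (not the ST, SST or PT spaces)."""
    bag = Counter()

    def visit(node):
        if isinstance(node, str):
            return
        label, *children = node
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        bag[(label,) + rhs] += 1
        for child in children:
            visit(child)

    visit(tree)
    return bag


def k_e(s1, s2):
    """K_E: dot product in the production space."""
    f1, f2 = productions(s1), productions(s2)
    return sum(count * f2[p] for p, count in f1.items())


def first_vp(tree):
    """phi_M stand-in (hypothetical): the first VP subtree, or None if there is none."""
    if isinstance(tree, str):
        return None
    if tree[0] == "VP":
        return tree
    for child in tree[1:]:
        found = first_vp(child)
        if found is not None:
            return found
    return None


def phi_m(tree):
    # Fall back to the whole tree when no VP exists.
    return first_vp(tree) or tree


def transformed_kernel(kernel_e, mapping):
    """K(O1, O2) = K_E(phi_M(O1), phi_M(O2))."""
    return lambda o1, o2: kernel_e(mapping(o1), mapping(o2))


t1 = ("S", ("N", "Paul"), ("VP", ("V", "delivers"), ("NP", ("D", "a"), ("N", "talk"))))
t2 = ("S", ("NP", ("D", "the"), ("N", "man")),
      ("VP", ("V", "delivers"), ("NP", ("D", "a"), ("N", "speech"))))
k = transformed_kernel(k_e, phi_m)
print(k(t1, t2), k_e(t1, t2))  # 4 5
```

Here the mapping drops the subjects, so the two objects are compared only on their verb phrases; the shared NP → D N production from "the man" no longer counts, which lowers the score from 5 to 4.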

Predicate Argument Classification
In an event, target words describe the relation among different entities, and the participants are often seen as the predicate's arguments.
Example: Paul gives a talk in Rome, annotated as [Arg0 Paul] [predicate gives] [Arg1 a talk] [ArgM in Rome].

Predicate-Argument Feature Representation
Given a sentence and a predicate p:
1. Derive the sentence parse tree.
2. For each node pair <N_p, N_x>:
   a. extract a feature representation set F;
   b. if N_x exactly covers Arg-i, F is one of its positive examples;
   c. otherwise, F is a negative example.
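The example-generation steps can be sketched as follows. The sketch assumes a gold parse tree given as nested tuples, word-index spans with exclusive ends, and a hypothetical two-feature set F (phrase type and position relative to the predicate); the predicate node N_p is taken to be the node whose span equals the predicate's span, and it is not paired with itself.

```python
def spans(tree, start=0, out=None):
    """Collect (label, (start, end)) for every non-leaf node; words are leaves."""
    if out is None:
        out = []
    if isinstance(tree, str):
        return out, start + 1
    label, *children = tree
    end = start
    for child in children:
        _, end = spans(child, end, out)
    out.append((label, (start, end)))
    return out, end


def build_examples(tree, predicate_span, arguments):
    """Pair the predicate node N_p with every other node N_x (step 2).

    arguments maps a role (e.g. "Arg0") to its word span; a node whose span equals
    that span exactly is a positive example for the role, otherwise it is negative.
    """
    nodes, _ = spans(tree)
    examples = []
    for label, span in nodes:
        if span == predicate_span:
            continue  # skip N_p itself
        if span[1] <= predicate_span[0]:
            position = "left"
        elif span[0] >= predicate_span[1]:
            position = "right"
        else:
            position = "covers"
        features = {"phrase_type": label, "position": position}  # step 2a (toy F)
        role = next((r for r, s in arguments.items() if s == span), None)  # steps 2b/2c
        examples.append((features, role))
    return examples


# "Paul delivers a talk in formal style": word indices 0..6, predicate at 1.
tree = ("S", ("N", "Paul"),
        ("VP", ("V", "delivers"), ("NP", ("D", "a"), ("N", "talk")),
         ("PP", ("IN", "in"), ("NP", ("JJ", "formal"), ("N", "style")))))
arguments = {"Arg0": (0, 1), "Arg1": (2, 4), "ArgM": (4, 7)}
for features, role in build_examples(tree, (1, 2), arguments):
    print(features, role or "negative")
```

Every parse node except the predicate yields one example, so negatives far outnumber positives (8 to 3 even in this short sentence).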

Vector Representation for the Linear Kernel
Flat features: Phrase Type, Predicate Word, Head Word, Parse Tree Path, Position (here: Right), Voice (here: Active).
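With a linear kernel, each categorical feature value becomes one binary dimension, so the kernel simply counts the values two examples share. A minimal sketch; the feature values below, including the path notation, are hypothetical.

```python
def one_hot(features):
    """Map categorical features to the set of active binary dimensions."""
    return {f"{name}={value}" for name, value in features.items()}


def linear_kernel(f1, f2):
    """Dot product of two binary vectors = number of shared active dimensions."""
    return len(one_hot(f1) & one_hot(f2))


x1 = {"phrase_type": "NP", "predicate_word": "deliver", "head_word": "talk",
      "path": "NP↑VP↓V", "position": "right", "voice": "active"}
x2 = dict(x1, head_word="lecture")  # same example with a different head word
print(linear_kernel(x1, x2))  # 5
```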

Kernel Engineering: Tree Tailoring

PAT Kernel [Moschitti, ACL 2004]
Given the sentence [Arg0 Paul] [predicate delivers] [Arg1 a talk] [ArgM in formal style], one structure is derived from its parse tree for each predicate-argument pair: a) F_{v,arg.0}, for Arg. 0 (Paul); b) F_{v,arg.1}, for Arg. 1 (a talk); c) F_{v,arg.M}, for Arg. M (in formal style).
[Figure: the structures F_{v,arg.0}, F_{v,arg.1} and F_{v,arg.M}, each relating the predicate "delivers" to one argument in the parse tree of the sentence.]
These are semantic structures.
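One way to compute a minimal structure that includes the predicate and a single argument, in the spirit of F_{v,arg.0}, F_{v,arg.1} and F_{v,arg.M}: take the lowest node covering both, then keep only the branches that overlap the predicate or the argument. This is a sketch under that assumption, not necessarily the exact extraction procedure of the paper; trees are nested tuples and spans are word indices with exclusive ends.

```python
def annotate(tree, start=0):
    """Attach word spans: returns ((label, (start, end), children), end). Words are leaves."""
    if isinstance(tree, str):
        return (tree, (start, start + 1), []), start + 1
    label, *children = tree
    annotated, end = [], start
    for child in children:
        node, end = annotate(child, end)
        annotated.append(node)
    return (label, (start, end), annotated), end


def overlaps(a, b):
    return a[0] < b[1] and b[0] < a[1]


def covers(outer, inner):
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def lowest_covering(node, span):
    """Descend while some child still covers the whole span."""
    for child in node[2]:
        if covers(child[1], span):
            return lowest_covering(child, span)
    return node


def prune(node, keep):
    """Rebuild a nested-tuple tree, dropping branches that overlap no span in keep."""
    label, _, children = node
    if not children:
        return label  # a word
    kept = [prune(c, keep) for c in children if any(overlaps(c[1], k) for k in keep)]
    return (label, *kept)


def predicate_argument_structure(tree, predicate_span, argument_span):
    root, _ = annotate(tree)
    joint = (min(predicate_span[0], argument_span[0]),
             max(predicate_span[1], argument_span[1]))
    return prune(lowest_covering(root, joint), [predicate_span, argument_span])


tree = ("S", ("N", "Paul"),
        ("VP", ("V", "delivers"), ("NP", ("D", "a"), ("N", "talk")),
         ("PP", ("IN", "in"), ("NP", ("JJ", "formal"), ("N", "style")))))
print(predicate_argument_structure(tree, (1, 2), (0, 1)))
# ('S', ('N', 'Paul'), ('VP', ('V', 'delivers')))
```

For Arg. 0 the structure must climb to S, while for Arg. 1 and Arg. M it is rooted at the VP; the kernel then compares only these focused structures instead of whole parse trees.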

In other words, we consider…
[Figure: the parse tree (S (N Paul) (VP (V delivers) (NP (D a) (N talk)) (PP (IN in) (NP (JJ formal) (N style))))) with the Arg. 1 node, NP (a talk), marked.]

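Sub-Categorization Frame (SCF) Kernel [Moschitti, ACL 2004]
[Figure: the parse tree (S (N Paul) (VP (V delivers) (NP (D a) (N talk)) (PP (IN in) (NP (JJ formal) (N style))))) with the predicate (delivers) and its arguments Arg. 0 (Paul), Arg. 1 (a talk) and Arg. M (in formal style) marked.]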

Experiments on Gold Standard Trees
PropBank and Penn TreeBank: about 53,700 sentences; sections 2 to 21 for training, section 23 for testing, sections 1 and 22 for development; arguments from Arg0 to Arg5, ArgA and ArgM, for a total of 122,774 and 7,359 arguments.
FrameNet with Collins' automatic trees: 24,558 sentences from the 40 frames of Senseval 3; 18 roles (roles with the same name are mapped together); only verbal predicates; 70% for training and 30% for testing.

Argument Classification with Poly Kernel

PropBank Results

Argument Classification on PAT using Different Tree Fragment Extractors
[Figure: accuracy (y-axis, 0.75 to 0.88) vs. percentage of training data (x-axis, 0 to 100%) for the ST, SST, PT and Linear kernels.]

FrameNet Results: PropBank Arguments vs. Semantic Roles

Kernel Engineering: Node marking

Marking Boundary nodes

Node Marking Effect

Different Tailoring and Marking: MMST, CMST

Experiments
PropBank and Penn TreeBank: about 53,700 sentences; Charniak trees from CoNLL 2005.
Boundary detection: Section 2 for training, Section 24 for testing; PAF and MPAF structures.

Number of examples/nodes of Section 2

Predicate Argument Feature (PAF) vs. Marked PAF (MPAF) [Moschitti et al, ACL-ws-2005]

More general mappings: Semantic structures for re-ranking [Moschitti et al, CoNLL 2006]

Other Shallow Semantic Structures [Moschitti and Quarteroni, NAACL 2008]
[ARG1 Antigens] were [AM-TMP originally] [rel defined] [ARG2 as non-self molecules].
[ARG0 Researchers] [rel describe] [ARG1 antigens] [ARG2 as foreign molecules] [ARGM-LOC in the body].

Shallow Semantic Trees for SST kernel [Moschitti et al, ACL 2007]

Merging of Kernels [ECIR 2007]: question/answer classification; syntactic/semantic tree kernel; kernel combinations; experiments.

Merging of Kernels [Bloehdorn & Moschitti, ECIR 2007 & CIKM 2007]

Merging of Kernels
[Figure: two parse trees, (VP (V gives) (NP (D a) (N good) (N talk))) and (VP (V gives) (NP (D a) (N solid) (N talk))).]

Delta Evaluation is very simple
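A sketch of the subset-tree (SST) Δ function of Collins and Duffy, modified so that two preterminals with the same POS label match with the lexical similarity of their words instead of requiring identical words; this is one way to let "good" and "solid" in the two trees of the merging example contribute. The similarity table is hypothetical (a WordNet-based measure could fill it), and λ is the usual decay factor.

```python
# Hypothetical lexical similarity table; identical words always score 1.0.
SIMILARITY = {frozenset({"good", "solid"}): 0.8}


def lexical_similarity(w1, w2):
    if w1 == w2:
        return 1.0
    return SIMILARITY.get(frozenset({w1, w2}), 0.0)


def is_preterminal(node):
    return len(node) == 2 and isinstance(node[1], str)


def production(node):
    return (node[0],) + tuple(c if isinstance(c, str) else c[0] for c in node[1:])


def delta(n1, n2, lam, sim):
    """Weighted count of common subset-tree fragments rooted at n1 and n2."""
    if is_preterminal(n1) and is_preterminal(n2):
        # Same POS label: the word similarity replaces the exact-match test.
        return lam * sim(n1[1], n2[1]) if n1[0] == n2[0] else 0.0
    if is_preterminal(n1) or is_preterminal(n2) or production(n1) != production(n2):
        return 0.0
    result = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        result *= 1.0 + delta(c1, c2, lam, sim)
    return result


def internal_nodes(tree):
    if isinstance(tree, str):
        return []
    nodes = [tree]
    for child in tree[1:]:
        nodes.extend(internal_nodes(child))
    return nodes


def sstk(t1, t2, lam=0.4, sim=lexical_similarity):
    """Tree kernel: sum of delta over all pairs of internal nodes."""
    return sum(delta(a, b, lam, sim) for a in internal_nodes(t1) for b in internal_nodes(t2))


t1 = ("VP", ("V", "gives"), ("NP", ("D", "a"), ("N", "good"), ("N", "talk")))
t2 = ("VP", ("V", "gives"), ("NP", ("D", "a"), ("N", "solid"), ("N", "talk")))
print(sstk(t1, t2, lam=1.0))                                  # about 27.4
print(sstk(t1, t2, lam=1.0, sim=lambda a, b: float(a == b)))  # 17.0
```

With exact word matching the two trees score 17.0 (λ = 1); the similarity between "good" and "solid" raises the score to about 27.4, because every fragment that includes the N(good)/N(solid) node now also counts, with weight 0.8.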

Question Classification
Definition: What does HTML stand for?
Description: What's the final line in the Edgar Allan Poe poem "The Raven"?
Entity: What foods can cause allergic reaction in people?
Human: Who won the Nobel Peace Prize in 1992?
Location: Where is the Statue of Liberty?
Manner: How did Bob Marley die?
Numeric: When was Martin Luther King Jr. born?
Organization: What company makes Bentley cars?

Question Classifier based on Tree Kernels
Question dataset (http://l2r.cs.uiuc.edu/~cogcomp/Data/QA/QC/) [Li and Roth, 2005], distributed over 6 categories: Abbreviations, Descriptions, Entity, Human, Location, and Numeric.
Fixed split: 5,500 training and 500 test questions; cross-validation (10 folds).
Using the whole question parse trees (constituent parsing).
Example: "What is an offer of direct stock purchase plan?"

Kernels
BOW and POS are obtained with a simple flat tree, e.g. (BOX (* What) (* is) (* an) (* offer) …).
PT (parse tree).
PAS (predicate argument structure).

Question classification

Similarity based on WordNet

Question Classification with S/STK

Multiple Kernel Combinations [Moschitti, CIKM 2008; Moschitti & Quarteroni, NAACL 2008; Moschitti et al., ACL 2007]

TASK: Question/Answer Classification
The classifier detects whether a pair (question and answer) is correct or not, so a representation for the pair is needed. The classifier can be used to re-rank the output of a basic QA system.

Dataset 2: TREC data
138 TREC 2001 test questions labeled as "description"; 2,256 sentences extracted from the best-ranked paragraphs (using a basic QA system based on the Lucene search engine over the TREC dataset), 216 of which were labeled as correct by one annotator.
A question is linked to many answers: all the pairs derived from it must fall in the same set, i.e., they cannot be split between the training and test sets.
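Keeping all pairs of a question in the same set amounts to splitting by question rather than by pair. A minimal sketch for k-fold cross-validation, assuming each pair is a hypothetical (question_id, sentence, label) tuple and assigning whole questions to folds round-robin.

```python
from collections import defaultdict


def grouped_folds(pairs, k):
    """Split (question_id, sentence, label) pairs into k folds by question.

    Every pair derived from the same question lands in the same fold, so no
    question contributes pairs to both a training and a test split.
    """
    by_question = defaultdict(list)
    for pair in pairs:
        by_question[pair[0]].append(pair)
    folds = [[] for _ in range(k)]
    # Deterministic round-robin over questions; a real setup might shuffle them first.
    for i, qid in enumerate(sorted(by_question)):
        folds[i % k].extend(by_question[qid])
    return folds


def train_test_splits(folds):
    """Yield (train, test) for each fold used in turn as the test set."""
    for i, test in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        yield train, test


pairs = [("q1", "s1", 1), ("q1", "s2", 0), ("q2", "s3", 0), ("q3", "s4", 1),
         ("q3", "s5", 0), ("q3", "s6", 0), ("q4", "s7", 0), ("q5", "s8", 1)]
folds = grouped_folds(pairs, 5)
for train, test in train_test_splits(folds):
    print(len(train), len(test))
```

Fold sizes in pairs are uneven because questions have different numbers of candidate answers; that is the price of never letting a question's answers leak across the split.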

Bags of Words (BOW) and POS Tags (POS)
To save time, apply STK to these flat trees: (BOX (* What) (* is) (* an) (* offer) (* of) …) and (BOX (* WHNP) (* VBZ) (* DT) (* NN) (* IN) …).

Word and POS Sequences
What is an offer of…? (word sequence, WSK) → What_is_offer, What_is
WHNP VBZ DT NN IN… (POS sequence, POSSK) → WHNP_VBZ_NN, WHNP_NN_IN
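Features such as What_is_offer or WHNP_NN_IN are subsequences in which gaps are allowed. A brute-force sketch of a gap-weighted subsequence kernel in the style of Lodhi et al. (2002), where an occurrence spanning m positions is weighted λ^m; the weighting, the fixed length n and the parameter values are assumptions, not necessarily those of WSK/POSSK in these experiments. Enumeration is exponential in n, so it only suits short inputs.

```python
from collections import Counter
from itertools import combinations


def subsequence_features(tokens, n, lam):
    """Gap-weighted subsequences of length n.

    Each occurrence at positions i_1 < ... < i_n contributes lam ** (i_n - i_1 + 1),
    so subsequences with wider gaps receive smaller weights.
    """
    features = Counter()
    for idx in combinations(range(len(tokens)), n):
        features["_".join(tokens[i] for i in idx)] += lam ** (idx[-1] - idx[0] + 1)
    return features


def sequence_kernel(s1, s2, n=2, lam=0.5):
    """Dot product of the two gap-weighted feature vectors."""
    f1, f2 = subsequence_features(s1, n, lam), subsequence_features(s2, n, lam)
    return sum(w * f2[key] for key, w in f1.items())


words = "What is an offer of".split()
pos = "WHNP VBZ DT NN IN".split()
print(subsequence_features(words, 3, 0.5)["What_is_offer"])  # 0.0625
print(subsequence_features(pos, 3, 0.5)["WHNP_NN_IN"])       # 0.03125
print(sequence_kernel(words, "What is a plan".split()))      # 0.0625
```

The same function serves both kernels: fed with words it yields WSK-like features, fed with POS tags it yields POSSK-like features.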

Syntactic Parse Trees (PT)

Predicate Argument Classification
In an event, target words describe the relation among different entities, and the participants are often seen as the predicate's arguments.
Example: Paul gives a lecture in Rome, annotated as [Arg0 Paul] [predicate gives] [Arg1 a lecture] [ArgM in Rome].

Predicate Argument Structure for Partial Tree Kernel (PAS_PTK)
[ARG1 Antigens] were [AM-TMP originally] [rel defined] [ARG2 as non-self molecules].
[ARG0 Researchers] [rel describe] [ARG1 antigens] [ARG2 as foreign molecules] [ARGM-LOC in the body].

Kernels and Combinations
Exploiting the property k(x,z) = k_1(x,z) + k_2(x,z):
BOW, POS, WSK, POSSK, PT, PAS_PTK ⇒ BOW+POS, BOW+PT, PT+POS, …
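Summing kernels is equivalent to concatenating their feature spaces, which is why combinations such as BOW+POS remain valid kernels. A small check with BOW and POS kernels as plain dot products of count vectors; the question representations are hypothetical.

```python
from collections import Counter


def bag_kernel(items1, items2):
    """Dot product of two count vectors (bag of words or bag of POS tags)."""
    c1, c2 = Counter(items1), Counter(items2)
    return sum(n * c2[item] for item, n in c1.items())


def sum_kernel(*kernels):
    """k(x, z) = k_1(x, z) + k_2(x, z) + ...; a sum of valid kernels is a valid kernel."""
    return lambda x, z: sum(k(x, z) for k in kernels)


def bow(x, z):
    return bag_kernel(x["words"], z["words"])


def pos(x, z):
    return bag_kernel(x["pos"], z["pos"])


def concatenated(x, z):
    """Same value computed in the joint space: features are prefixed by their source."""
    fx = ["w:" + w for w in x["words"]] + ["p:" + p for p in x["pos"]]
    fz = ["w:" + w for w in z["words"]] + ["p:" + p for p in z["pos"]]
    return bag_kernel(fx, fz)


q1 = {"words": ["What", "is", "an", "offer"], "pos": ["WHNP", "VBZ", "DT", "NN"]}
q2 = {"words": ["What", "is", "a", "plan"], "pos": ["WHNP", "VBZ", "DT", "NN"]}
bow_pos = sum_kernel(bow, pos)
print(bow_pos(q1, q2), concatenated(q1, q2))  # 6 6
```

The prefixes keep a word and a tag with the same string from colliding, which is exactly what keeps the two feature spaces disjoint in the concatenation.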

Results on TREC Data (5-fold cross-validation)
[Figure: F1-measure (y-axis, 20 to 40) for each kernel type (x-axis).]
BOW ≈ 24 F1; POSSK+STK+PAS_PTK ≈ 39 F1 ⇒ a 62% relative improvement ((39 - 24)/24 = 0.625).

SVM-light-TK Software
Encodes the ST, SST and combination kernels in SVM-light [Joachims, 1999]. Available at http://dit.unitn.it/~moschitt/. Supports tree forests and vector sets. New extensions: the PT kernel will be released as soon as possible.
