The Slot Filling Challenge
Overview of the NYU 2011 System

Ang Sun

Director of Research, Principal Scientist, inome (asun@inome.com)

 The Slot Filling Challenge
 Overview of the NYU 2011 System
 Pattern Filler
 Distant Learning Filler

 Hand annotation performance

  • Precision: 70%
  • Recall: 54%
  • F‐measure: 61%

 Top systems rarely exceed 30% F‐measure

Query:

<query id="SF114">
  <name>Jim Parsons</name>
  <docid>eng‐WL‐11‐174592‐12943233</docid>
  <enttype>PER</enttype>
  <nodeid>E0300113</nodeid>
  <ignore>per:date_of_birth, per:age, per:city_of_birth</ignore>
</query>

DOC1000001:

After graduating from high school, Jim Parsons received an undergraduate degree from the University of Houston. He was prolific during this time, appearing in 17 plays in 3 years.

Response:

SF114 per:schools_attended University of Houston
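For concreteness, here is a minimal Python sketch of reading such a query and emitting a response line like the one above. The XML fields come from the example query; parse_query and the tab-separated output are illustrative assumptions, not the NYU code.

import xml.etree.ElementTree as ET

QUERY_XML = """<query id="SF114">
  <name>Jim Parsons</name>
  <docid>eng-WL-11-174592-12943233</docid>
  <enttype>PER</enttype>
  <nodeid>E0300113</nodeid>
  <ignore>per:date_of_birth, per:age, per:city_of_birth</ignore>
</query>"""

def parse_query(xml_text):
    """Turn one slot-filling query into a plain dict."""
    q = ET.fromstring(xml_text)
    return {
        "id": q.get("id"),
        "name": q.findtext("name"),
        "enttype": q.findtext("enttype"),
        # slots the system should not attempt to fill for this query
        "ignore": {s.strip() for s in q.findtext("ignore").split(",")},
    }

query = parse_query(QUERY_XML)
slot, filler = "per:schools_attended", "University of Houston"
if slot not in query["ignore"]:
    print(query["id"], slot, filler, sep="\t")  # SF114  per:schools_attended  University of Houston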

 Entry level is pretty high

Jim Parsons was born and raised in Houston … … He attended Klein Oak High School in …

  • High performance name extraction
  • High performance coreference resolution
  • … …

 Extraction at large scale

  • 2011: 1.8 million documents
  • 2012: 3.7 million documents

 Documents have not gone through a careful selection process

  • Evaluation in a real‐world scenario

 Slot types are of different granularities

  • per:employee_of
  • org:top_members/employees
  • … …

[Figure: bar chart of Recall, Precision, and F‐measure (%) for the full NYU 2011 system vs. the NYU 2011 system using only hand‐crafted rules]

 Hand‐crafted patterns (pattern → slots it fills)

  local patterns for person queries
    • title of org, org title, org's title, title → title, employee_of
    • title in GPE, GPE title → origin, location_of_residence
    • person, integer → age
  local patterns for org queries
    • title of org, org title, org's title → top_members/employees
    • GPE's org, GPE‐based org, org of GPE, org in GPE → location_of_headquarters
    • org's org → subsidiaries / parent
  implicit organization
    • title [where there is a unique org mentioned in the current + prior sentence] → employee_of [for person queries]; top_members/employees [for org queries]
  functional noun
    • F of X, X's F, where F is a functional noun → family relations; org parents and subsidiaries

 Hand‐crafted patterns: http://cs.nyu.edu/grishman/jet/jet.html (a toy pattern‐matching sketch follows)
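As a toy illustration of how such local patterns fire over name- and title-tagged text, here is a minimal Python sketch. The bracketed markup, the example sentence, and the regex are invented for this sketch and are not Jet's actual annotation format.

import re

# Illustrative pre-tagged sentence (markup invented for this sketch):
TAGGED = "[PER Samuel Palmisano] , [TITLE chairman] of [ORG IBM] , said ..."

# The "title of org" pattern from the table above:
# PERSON , TITLE of ORG  ->  per:title and per:employee_of
PATTERN = re.compile(
    r"\[PER (?P<per>[^\]]+)\] , \[TITLE (?P<title>[^\]]+)\] of \[ORG (?P<org>[^\]]+)\]"
)

m = PATTERN.search(TAGGED)
if m:
    print(m.group("per"), "per:title", m.group("title"), sep="\t")
    print(m.group("per"), "per:employee_of", m.group("org"), sep="\t")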


 Learned patterns (through bootstrapping)

Basic idea: start from a small set of seed patterns, use them to extract named‐entity (NE) pairs from the corpus, and use those pairs in turn to learn more semantic patterns.

 Learned patterns (through bootstrapping)

  • Seed patterns such as “X chairman of Y” and “X, chairman of Y” extract NE pairs: <Bill Gates, Microsoft>, <Steve Jobs, Apple>, …
  • Contexts connecting those pairs yield new patterns: “X CEO of Y”, “X, CEO of Y”, “X director at Y”, …
  • The new patterns extract further pairs (<Jeff Bezos, Amazon>, …), and the cycle repeats; a minimal sketch of the loop follows.
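A minimal Python sketch of the bootstrapping loop, assuming the corpus has already been reduced to (name, connecting context, name) triples; the pattern ranking and filtering a real system needs are omitted.

def bootstrap(corpus, seed_patterns, rounds=3):
    """Alternate between extracting NE pairs with the current patterns and
    harvesting new patterns from the contexts of the extracted pairs."""
    patterns, pairs = set(seed_patterns), set()
    for _ in range(rounds):
        # patterns -> pairs: e.g. ", chairman of" yields <Bill Gates, Microsoft>
        pairs |= {(a, b) for a, ctx, b in corpus if ctx in patterns}
        # pairs -> patterns: contexts connecting known pairs become new patterns
        # (unfiltered, this step is what produces the semantic drift noted below)
        patterns |= {ctx for a, ctx, b in corpus if (a, b) in pairs}
    return patterns, pairs

corpus = [
    ("Bill Gates", ", chairman of", "Microsoft"),
    ("Bill Gates", ", CEO of", "Microsoft"),
    ("Jeff Bezos", ", CEO of", "Amazon"),
]
print(bootstrap(corpus, {", chairman of"}))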

 Learned patterns (through bootstrapping)

  • Problem: semantic drift
    ▪ a pair of names may be connected by patterns belonging to multiple relations


 Learned patterns (through bootstrapping)

  • Problem: semantic drift
  • Solutions:

▪ Manually review top‐ranked patterns
▪ Guide bootstrapping with pattern clusters

Dependency parse (tree) of an example sentence:

<e1>President Clinton</e1> traveled to <e2>the Irish border</e2> for an evening ceremony.

Shortest path between e1 and e2 in the dependency tree: nsubj'_traveled_prep_to
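A minimal sketch of extracting that shortest-path pattern, with the dependency edges of the example sentence written in by hand (a real system would take them from a parser); it assumes the networkx package is available.

import networkx as nx

# Hand-written dependency edges for the example sentence; the prime (')
# on nsubj in the slide marks traversal against the edge direction,
# which this undirected sketch does not track.
edges = [
    ("traveled", "Clinton", "nsubj"),
    ("traveled", "border", "prep_to"),
    ("traveled", "ceremony", "prep_for"),
]
g = nx.Graph()
for head, dep, label in edges:
    g.add_edge(head, dep, label=label)

path = nx.shortest_path(g, "Clinton", "border")  # ['Clinton', 'traveled', 'border']
parts = []
for a, b in zip(path, path[1:]):
    parts += [g[a][b]["label"], b]
print("_".join(parts[:-1]))  # nsubj_traveled_prep_to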

 Distant Learning (the general algorithm)

  • Map relations in knowledge bases to KBP slots
  • Search corpora for sentences that contain name pairs

  • Generate positive and negative training examples
  • Train classifiers using generated examples
  • Fill slots using trained classifiers

 Distant Learning

  • Map 4.1M Freebase relation instances to 28 slots
  • Given a pair of names <i,j> occurring together in a sentence in the KBP corpus, treat it as a
    ▪ positive example if <i,j> is a Freebase relation instance
    ▪ negative example if <i,j> is not a Freebase instance but <i,j'> is an instance for some j' ≠ j
  • Train classifiers using MaxEnt
  • Fill slots using trained classifiers, in parallel with the other components of the NYU system (a labeling sketch follows)
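A minimal Python sketch of the labeling step, assuming sentences have already been reduced to co-occurring name pairs with a token distance and a feature dict; Freebase is mocked as a dict, and the 12-token cutoff anticipates the refined negative definition discussed under Problem 2 below.

def label_examples(sentence_pairs, freebase, max_tokens=12):
    """sentence_pairs: (i, j, token_distance, features) tuples found in text.
    freebase: dict mapping (i, j) name pairs to a KBP slot label."""
    examples = []
    for i, j, dist, feats in sentence_pairs:
        if (i, j) in freebase:
            examples.append((feats, freebase[(i, j)]))  # positive example
        elif any(name == i for name, _ in freebase) and dist <= max_tokens:
            # <i,j> is not in Freebase but <i,j'> is for some j' != j
            # (guaranteed here, since (i, j) itself missed the lookup above)
            examples.append((feats, "OTHER"))  # negative example
    return examples

fb = {("Bill Gates", "Microsoft"): "per:employee_of"}
sents = [("Bill Gates", "Microsoft", 3, {"pat": "appos_chairman_prep_of"}),
         ("Bill Gates", "Seattle", 5, {"pat": "prep_in"})]
print(label_examples(sents, fb))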

 Problems

  • Problem 1: Class labels are noisy

▪ Many false positives, because name pairs are often connected by non‐relational contexts

 Problems

  • Problem 1: Class labels are noisy

▪ Many false negatives, because current knowledge bases are incomplete


 Problems

  • Problem 2: Class distribution is extremely unbalanced

▪ Treat <i,j> as negative if it is NOT a Freebase relation instance: positive vs. negative = 1:37
▪ Treat <i,j> as negative only if it is not a Freebase instance but <i,j'> is an instance for some j' ≠ j, AND i and j are separated by no more than 12 tokens: positive vs. negative = 1:13
▪ Trained classifiers will have low recall, biased towards the negative class

 Problems

  • Problem 3: training ignores co‐reference info

▪ Training relies on full‐name matches between Freebase and text
▪ But partial names (Bill, Mr. Gates, …) occur often in text
▪ Use co‐reference during training? The co‐reference module itself might be inaccurate and add noise to training
▪ But can it help during testing?

 Solutions to Problems

  • Problem 1: Class labels are noisy

▪ Refine class labels to reduce noise

  • Problem 2: Class distribution is extremely unbalanced

▪ Undersample the majority classes

  • Problem 3: training ignores co‐reference info

▪ Incorporate coreference during testing

 The refinement algorithm

I. Represent a training instance by its dependency pattern: the shortest path connecting the two names in the dependency‐tree representation of the sentence.

II. Estimate the precision of the pattern. The precision of a pattern p for class c_i is the number of occurrences of p in class c_i divided by the number of occurrences of p in any of the classes c_j:

    prec(p, c_i) = count(p, c_i) / Σ_j count(p, c_j)

III. Assign the instance the class for which its dependency pattern is most precise.

 The refinement algorithm (cont)

  • Examples

Example sentence: "Jon Corzine, the former chairman and CEO of Goldman Sachs …" (original class: PERSON:Employee_of)
Example sentence: "William S. Paley, chairman of CBS …" (original class: ORG:Founded_by)
Both instances share the dependency pattern "appos chairman prep_of".
prec(appos chairman prep_of, PERSON:Employee_of) = 0.754
prec(appos chairman prep_of, ORG:Founded_by) = 0.012
Refinement therefore assigns both instances the class PERSON:Employee_of (a code sketch follows).
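A minimal Python sketch of the refinement, representing each training instance as a (dependency pattern, class) pair; the counts here are toy values, not the real training data.

from collections import Counter

def refine_labels(examples):
    """Reassign each (pattern, class) instance to the class its pattern is
    most precise for: prec(p, c_i) = count(p, c_i) / sum_j count(p, c_j)."""
    counts = Counter(examples)                # (pattern, class) -> occurrences
    totals = Counter(p for p, _ in examples)  # pattern -> occurrences, any class
    classes = {c for _, c in examples}
    def prec(p, c):
        return counts[(p, c)] / totals[p]
    return [(p, max(classes, key=lambda c: prec(p, c))) for p, _ in examples]

data = [("appos chairman prep_of", "PERSON:Employee_of")] * 3 \
     + [("appos chairman prep_of", "ORG:Founded_by")]
print(refine_labels(data))  # all four instances relabeled PERSON:Employee_of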

 Effort 1:

multiple n‐way instead of single n‐way classification

  • single n‐way: an n‐way classifier for all classes

▪ Biased towards majority classes

  • multiple n‐way: an n‐way classifier for each pair of name types

▪ A classifier for PERSON and PERSON
▪ Another one for PERSON and ORGANIZATION
▪ … …

  • On average (10 runs on 2011 evaluation data)

▪ single n‐way: 180 fills for 8 slots
▪ multiple n‐way: 240 fills for 15 slots
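A minimal sketch of training one MaxEnt (logistic regression) classifier per name-type pair, using scikit-learn as a stand-in for the actual MaxEnt toolkit; the example data and feature names are invented.

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train_multiple_nway(examples):
    """examples: (type_pair, feature_dict, label) triples.
    Returns one classifier per name-type pair, e.g. ('PER', 'ORG')."""
    by_pair = {}
    for pair, feats, label in examples:
        X, y = by_pair.setdefault(pair, ([], []))
        X.append(feats)
        y.append(label)
    return {pair: make_pipeline(DictVectorizer(),
                                LogisticRegression(max_iter=1000)).fit(X, y)
            for pair, (X, y) in by_pair.items()}

models = train_multiple_nway([
    (("PER", "ORG"), {"pat": "appos_chairman_prep_of"}, "per:employee_of"),
    (("PER", "ORG"), {"pat": "prep_near"}, "OTHER"),
])
print(models[("PER", "ORG")].predict([{"pat": "appos_chairman_prep_of"}]))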


 Effort 2:

  • Even with the multiple n‐way classification approach, OTHER (not a defined KBP slot) is still the majority class for each such n‐way classifier
  • Downsize OTHER by randomly selecting a subset of them (see the sketch below)
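A minimal Python sketch of the undersampling step; the ratio parameter corresponds to the negatives-to-positives undersampling ratio studied on the next slides, and the function itself is an illustrative assumption.

import random

def undersample_other(examples, ratio=4, seed=0):
    """Keep every positive example; keep a random subset of OTHER so that
    negatives outnumber positives by at most `ratio`."""
    pos = [e for e in examples if e[1] != "OTHER"]
    neg = [e for e in examples if e[1] == "OTHER"]
    random.seed(seed)
    return pos + random.sample(neg, min(len(neg), ratio * len(pos)))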

 No use of co‐reference during training
 Run Jet (the NYU IE toolkit) to get co‐referred names of the query
 Use these names when filling slots for the query
 Co‐reference is beneficial to our official system

  • P/R/F of the distant filler itself

▪ With co‐reference: 36.4/11.4/17.4
▪ Without co‐reference: 28.8/10.0/14.3

[Figure: F‐measure vs. undersampling ratio (the ratio between negatives and positives, 1 to 9) for four models. Legend: MNR := multiple n‐way classifier without refinement; MR := multiple n‐way classifier with refinement; SR := single n‐way classifier with refinement; SNR := single n‐way classifier without refinement.]

  • Multiple n‐way outperformed single n‐way
  • Models with refinement: higher performance, and much flatter curves (less sensitive to the undersampling ratio, more robust to noise)

[Figure: Precision and recall vs. undersampling ratio for the same four models.]

  • Models with refinement have better precision and recall
  • Multiple n‐way outperforms single n‐way mainly through improved recall

Thanks!


 Baseline: 2010 System (three basic components)

1) Document Retrieval
  • Use Lucene to retrieve a maximum of 300 documents
  • Query: the query name and some minor name variants

2) Answer Extraction
  • Begins with text analysis: POS tagging, chunking, name tagging, time expression tagging, and coreference
  • Coreference is used to fill alternate_names slots
  • Other slots are filled using patterns (hand‐coded and created semi‐automatically using bootstrapping)

3) Merging
  • Combines answers from different documents and passages, and from different answer extraction procedures

 Passage Retrieval (QA)

  • For each slot, a set of index terms is generated using distant supervision (using Freebase)
  • The terms are used to retrieve and rank passages for a specific slot
  • An answer is then selected based on name type and distance from the query name
  • Due to time limitations, this procedure was implemented for only a few slots and was used as a fall‐back strategy when the other answer extraction components did not find any slot fill (a scoring sketch follows)
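A minimal Python sketch of that fall-back strategy: rank passages by slot-specific index terms, then pick the closest mention of the required name type. The data structures, the example index terms, and the scoring are illustrative assumptions, not the actual implementation.

def rank_passages(passages, index_terms):
    """Rank token-list passages by how many slot-specific index terms
    (obtained via distant supervision over Freebase) they contain."""
    return sorted(passages,
                  key=lambda p: sum(t.lower() in index_terms for t in p),
                  reverse=True)

def select_answer(mentions, query_index, wanted_type):
    """mentions: (text, token_index, name_type) triples in the best passage.
    Choose the mention of the required type closest to the query name."""
    typed = [m for m in mentions if m[2] == wanted_type]
    return min(typed, key=lambda m: abs(m[1] - query_index))[0] if typed else None

# e.g. per:schools_attended, with invented index terms:
best = rank_passages(
    [["He", "received", "a", "degree", "from", "the", "University", "of", "Houston"]],
    {"degree", "university", "graduated"},
)[0]
print(select_answer([("University of Houston", 6, "ORG")], query_index=0, wanted_type="ORG"))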

 Result Analysis (NYU2 R/P/F = 25.5/35.0/29.5)