Chinese Event Extraction
School of Data Science, Fudan University
杨依莹, 2017.11.22

Outline
1. ACE program
2. Assignment 3: Chinese event extraction
3. CRF++: Yet Another CRF toolkit

ACE program
Automatic Content Extraction (ACE) program:
• The objective of the ACE program was to develop extraction technology to support automatic processing of source language data (in the form of natural text and of text derived from ASR and OCR).
• The program covers English, Arabic and Chinese texts.
• The ACE corpus is one of the standard benchmarks for testing new information extraction algorithms.

ACE program
Given a text in natural language, the ACE challenge is to detect:
1. Entities mentioned in the text, such as persons, organizations, locations, facilities and weapons.
2. Relations between entities, such as "person A is the manager of company B". Relation types include role, part, located, near and social.
3. Events mentioned in the text, such as interaction, movement, transfer, creation and destruction.

ACE program
[Figure: an example of text]

ACE program: entity
Entity Detection and Tracking (EDT)
• ACE tasks identified seven types of entities: Person, Organization, Location, Facility, Weapon, Vehicle and Geo-Political Entity (GPE). Each type was further divided into subtypes.
• For every mention, the annotator identified the maximal extent of the string that represents the entity and labeled the head of each mention. Nested mentions were also captured.

ACE program: relation
• Relation Detection and Characterization (RDC) involved the identification of relations between entities.
• For every relation, annotators identified two primary arguments (namely, the two ACE entities that are linked) as well as the relation's temporal attributes.

ACE program: relation
• Create new structured knowledge bases, useful for any app
• Augment current knowledge bases
  • Adding words to the WordNet thesaurus, facts to Freebase or DBpedia
  • DBpedia: an ontology derived from Wikipedia containing over 2 billion RDF triples.
  • Freebase: a dataset from Wikipedia infoboxes.
  • On 16 December 2015, Google officially announced the Knowledge Graph API, which is meant to be a replacement for the Freebase API.
• Support question answering
  • The granddaughter of which actor starred in the movie “E.T.”?
    (acted-in ?x “E.T.”)(is-a ?y actor)(granddaughter-of ?x ?y)
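The question-answering query above can be read as a conjunctive lookup over extracted triples. The sketch below is only an illustration: the tiny in-memory fact set and the `answer` helper are hypothetical stand-ins for a real knowledge base and query engine.

```python
# Toy knowledge base of (relation, subject, object) triples.
# These few facts are illustrative, not taken from the slides.
facts = {
    ("acted-in", "Drew Barrymore", "E.T."),
    ("is-a", "John Barrymore", "actor"),
    ("granddaughter-of", "Drew Barrymore", "John Barrymore"),
    ("is-a", "Steven Spielberg", "director"),
}

def answer(facts):
    """Bind ?x and ?y so that all three clauses of the query hold:
    (acted-in ?x "E.T.") (is-a ?y actor) (granddaughter-of ?x ?y)."""
    return sorted(
        y
        for (r1, x, movie) in facts if r1 == "acted-in" and movie == "E.T."
        for (r2, x2, y) in facts if r2 == "granddaughter-of" and x2 == x
        if ("is-a", y, "actor") in facts
    )

result = answer(facts)
```

Each clause of the query becomes one filter over the triples, so the quality of the answer depends directly on the relations that extraction has added to the knowledge base.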

ACE program: relation
Automatic Content Extraction (ACE) program: 7 types and 17 subtypes of relations from the “Relation Extraction Task”. Types and subtypes shown in the figure:
• PHYSICAL: Located, Near
• PART-WHOLE: Geographical, Subsidiary
• PERSON-SOCIAL: Family, Lasting Personal, Business
• ORG AFFILIATION: Employment, Founder, Ownership, Student-Alum, Sports-Affiliation, Investor, Membership
• ARTIFACT: User-Owner-Inventor-Manufacturer
• GENERAL AFFILIATION: Citizen-Resident-Ethnicity-Religion, Org-Location-Origin

ACE program: relation
• Physical-Located (PER-GPE): He was in Tennessee
• Part-Whole-Subsidiary (ORG-ORG): XYZ, the parent company of ABC
• Person-Social-Family (PER-PER): John’s wife Yoko
• Org-AFF-Founder (PER-ORG): Steve Jobs, co-founder of Apple…

ACE program: relation
• Using patterns to extract relations
• Lexico-syntactic patterns
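As a minimal sketch of pattern-based extraction, the regular expression below targets the single surface form "X, the parent company of Y". The pattern, the capitalised-word heuristic for entity names and the function name are all illustrative assumptions; a real system would use many patterns and proper entity detection.

```python
import re

# Hypothetical pattern for one Part-Whole-Subsidiary surface form.
# Entity names are approximated as runs of capitalised words.
NAME = r"[A-Z][\w&.]*(?: [A-Z][\w&.]*)*"
PARENT_OF = re.compile(
    rf"(?P<parent>{NAME}), the parent company of (?P<child>{NAME})"
)

def extract_subsidiary_relations(text):
    """Return (parent, subsidiary) pairs matched by the pattern."""
    return [(m.group("parent"), m.group("child"))
            for m in PARENT_OF.finditer(text)]

pairs = extract_subsidiary_relations(
    "XYZ, the parent company of ABC, reported earnings."
)
```

Hand-written patterns like this tend to be precise but cover only the phrasings someone thought to write down, which is one motivation for the learning-based approaches.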

ACE program: relation
• Supervised learning
1. Find all pairs of named entities
2. Decide whether the two entities are related
3. If yes, classify the relation

ACE program: relation
• Supervised learning
• The most important step: classification
• e.g. American Airlines, a unit of AMR, immediately matched the move, spokesman Tim Wagner said.
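The three supervised steps (enumerate entity pairs, filter related pairs, classify the relation) can be sketched as a small pipeline. In the sketch below the two classifiers are passed in as callables, and the `toy_*` functions are hard-coded stand-ins for trained models, not real classifiers.

```python
from itertools import combinations

def extract_relations(entities, is_related, classify):
    """entities: list of (text, type) mentions from one sentence.
    is_related and classify stand in for trained classifiers."""
    relations = []
    for e1, e2 in combinations(entities, 2):      # step 1: all entity pairs
        if is_related(e1, e2):                    # step 2: related or not?
            label = classify(e1, e2)              # step 3: which relation?
            relations.append((e1[0], label, e2[0]))
    return relations

# Toy stand-ins for the two classifiers (assumptions for illustration only).
def toy_is_related(e1, e2):
    return e1[1] == "ORG" and e2[1] == "ORG"

def toy_classify(e1, e2):
    return "Part-Whole-Subsidiary"

entities = [("American Airlines", "ORG"), ("AMR", "ORG"), ("Tim Wagner", "PER")]
relations = extract_relations(entities, toy_is_related, toy_classify)
```

Splitting the binary filter from the type classifier keeps the second model from being swamped by the many unrelated pairs that step 1 produces.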

ACE program: relation
• Semi-supervised learning
1. Start with a few high-precision seed patterns or seed tuples.
2. Find sentences that contain the entities in a seed pair.
3. Extract and generalize the context to learn new patterns.
• This may cause semantic drift.

ACE program: relation
• Semi-supervised learning
• To avoid semantic drift, we introduce a confidence value.
• Set conservative confidence thresholds for the acceptance of new patterns and tuples.
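One bootstrapping round with a confidence threshold might look like the sketch below. It assumes sentences are already reduced to (entity1, context, entity2) triples, and it scores a pattern by the fraction of its matches that are known seed tuples; both the data layout and this confidence measure are simplifying assumptions, and the sentences are toy data.

```python
def bootstrap_once(sentences, seeds, threshold=0.5):
    """One round: learn patterns from seed tuples, keep the confident
    ones, and use them to propose new tuples."""
    # Contexts observed between seed pairs become candidate patterns.
    candidates = {ctx for (a, ctx, b) in sentences if (a, b) in seeds}
    accepted = set()
    for pattern in candidates:
        matches = [(a, b) for (a, ctx, b) in sentences if ctx == pattern]
        # Assumed confidence: share of matches that are known seeds.
        confidence = sum(pair in seeds for pair in matches) / len(matches)
        if confidence >= threshold:
            accepted.add(pattern)
    found = {(a, b) for (a, ctx, b) in sentences if ctx in accepted}
    return accepted, found - seeds

sentences = [
    ("Steve Jobs", ", co-founder of", "Apple"),
    ("Bill Gates", ", co-founder of", "Microsoft"),
    ("Steve Jobs", "visited", "Apple"),
    ("Tim Cook", "visited", "Paris"),
    ("Ann", "visited", "Rome"),
]
seeds = {("Steve Jobs", "Apple")}

patterns, new_tuples = bootstrap_once(sentences, seeds, threshold=0.5)
loose_patterns, loose_tuples = bootstrap_once(sentences, seeds, threshold=0.3)
```

With the stricter threshold only the ", co-founder of" pattern survives. With the looser one, "visited" is accepted as well and unrelated pairs enter the tuple set, which is the semantic drift the slide warns about.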

ACE program: event
Event Detection and Characterization (EDC)

Outline
2. Assignment 3: Chinese event extraction

Description
• In this assignment, you will use sequence labeling models for Chinese event extraction.
• Event information is defined in two parts:
  • Trigger: the main word that most clearly expresses the occurrence of an event.
  • Argument: an entity, temporal expression or value that plays a certain role in the event.
• For example: “英特尔在中国成立了研究中心” (“Intel established a research center in China”)
  • “成立” (“established”) is the trigger, of type Business
  • “英特尔” (“Intel”), “中国” (“China”) and “研究中心” (“research center”) are the arguments, of types Agent, Place and Org respectively

Description
• The task is split into two subtasks:
  • Trigger labeling: identify the trigger word in the sentence and classify it into one of 8 types.
  • Argument labeling: identify all the arguments in the sentence and classify them into 35 types (all types can be found in the training file).
• You are required to use both HMM and CRF models for this task. You may use any toolkit to implement them.
• Note that the performance of the HMM can be very poor.
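One possible shape for the HMM part is a count-based model decoded with the Viterbi algorithm. The sketch below is an assumption-laden illustration, not the assignment's reference solution: the function names, the `<s>` start symbol, add-one smoothing and the single toy training sentence (with trigger and argument labels mixed in one sequence) are all choices made here for brevity.

```python
import math
from collections import Counter, defaultdict

def train_hmm(instances):
    """instances: list of sentences, each a list of (word, label) pairs."""
    trans, emit = defaultdict(Counter), defaultdict(Counter)
    tags, vocab = set(), set()
    for sentence in instances:
        prev = "<s>"                      # assumed start-of-sentence symbol
        for word, tag in sentence:
            trans[prev][tag] += 1         # transition counts
            emit[tag][word] += 1          # emission counts
            tags.add(tag)
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(tags), len(vocab) + 1  # +1 for unknown words

def log_prob(counter, key, n_outcomes):
    # Add-one smoothing keeps unseen words and transitions above zero.
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def viterbi(words, model):
    trans, emit, tags, vocab_size = model
    if not words:
        return []
    n = len(tags)
    # Each column maps tag -> (best log score of a path ending here, previous tag).
    columns = [{t: (log_prob(trans["<s>"], t, n)
                    + log_prob(emit[t], words[0], vocab_size), None)
                for t in tags}]
    for word in words[1:]:
        prev_col, col = columns[-1], {}
        for t in tags:
            score, prev = max((prev_col[p][0] + log_prob(trans[p], t, n), p)
                              for p in tags)
            col[t] = (score + log_prob(emit[t], word, vocab_size), prev)
        columns.append(col)
    # Backtrack from the best final tag.
    tag = max(tags, key=lambda t: columns[-1][t][0])
    path = [tag]
    for col in reversed(columns[1:]):
        tag = col[tag][1]
        path.append(tag)
    return path[::-1]

train = [[("英特尔", "A_Agent"), ("在", "O"), ("中国", "A_Place"),
          ("成立", "T_Business"), ("了", "O"), ("研究中心", "A_Org")]]
model = train_hmm(train)
decoded = viterbi([word for word, _ in train[0]], model)
```

Because the emission model only sees the word itself, words that never appeared in training get nearly uniform emission scores, which is one reason an HMM can perform poorly on this task.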

Formal Definition
Input: a sequence of segmented Chinese words.
Output: label each word with ‘T_type’ (trigger), ‘A_type’ (argument) or ‘O’ (neither trigger nor argument). Save your predicted label after the real label, separated by a tab.
[Figures: (1) input, (2) training instance, (3) testing result]
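The output format can be sketched as follows, assuming one "word, real label, predicted label" line per word with tab separators and a blank line between instances. The helper name, the word segmentation of the example sentence and the concrete label strings (such as `A_Agent`) are illustrative assumptions; the actual label inventory is in the training files.

```python
def format_results(sentences, predictions):
    """sentences: list of [(word, real_label), ...];
    predictions: list of [predicted_label, ...], aligned with sentences.
    Returns the text of a result file."""
    blocks = []
    for sentence, predicted in zip(sentences, predictions):
        lines = [f"{word}\t{real}\t{pred}"
                 for (word, real), pred in zip(sentence, predicted)]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"

# Hypothetical segmentation and labels for the example sentence.
example = [[("英特尔", "A_Agent"), ("在", "O"), ("中国", "A_Place"),
            ("成立", "T_Business"), ("了", "O"), ("研究中心", "A_Org")]]
predicted = [["A_Agent", "O", "A_Place", "T_Business", "O", "O"]]
output = format_results(example, predicted)
```

Here the last word is deliberately mislabeled as `O` to show how a wrong prediction appears next to the real label.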

Provided Files
• trigger_train.txt & trigger_test.txt:
  • These two files contain 1,918 and 669 instances for training and testing, respectively.
  • Each line contains one word and its label, separated by a tab.
  • Instances are separated by a blank line.
• argument_train.txt & argument_test.txt:
  • These two files contain 2,131 and 997 instances for training and testing, respectively.
• Your job is to predict the label sequence for the instances in the test files and write your predictions to the result files. The labels in the test files are only for evaluation.
• eval.py
  • This file helps you evaluate your model’s recall, accuracy, precision and F1-score.
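A minimal reader for this file format, plus a rough token-level scorer, could look like the sketch below. The function names are hypothetical, and the metric definition (precision, recall and F1 over non-`O` labels) is an assumption; the provided eval.py may count things differently and remains the reference.

```python
def read_instances(text):
    """Parse 'word<TAB>label' lines; blank lines separate instances."""
    instances, current = [], []
    for line in text.splitlines():
        if not line.strip():
            if current:
                instances.append(current)
                current = []
            continue
        word, label = line.split("\t")[:2]
        current.append((word, label))
    if current:
        instances.append(current)
    return instances

def precision_recall_f1(gold, pred):
    """Token-level scores over non-'O' labels (assumed definition)."""
    correct = sum(g == p and p != "O" for g, p in zip(gold, pred))
    predicted = sum(p != "O" for p in pred)
    actual = sum(g != "O" for g in gold)
    precision = correct / predicted if predicted else 0.0
    recall = correct / actual if actual else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

sample = "成立\tT_Business\n了\tO\n\n中国\tA_Place\n"
instances = read_instances(sample)
scores = precision_recall_f1(["A_Agent", "O", "A_Place", "O"],
                             ["A_Agent", "A_Place", "O", "O"])
```

Since most words are labeled `O`, plain accuracy can look high even for a weak model, which is why precision, recall and F1 over the non-`O` labels are worth checking as well.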

Submission
• Generate a zip file and name it “sid_homework-3.zip”.
• It should include a Python file named “extraction.py”, two output files named “trigger_result.txt” and “argument_result.txt”, and a written report named “chinese event extraction.pdf”.
• Program: the code should be written in Python.
• Report: the report must be written in English and be no longer than 4 pages.

Evaluation
• We will mark your homework on four criteria:
  • Final accuracy (20%)
  • Program (30%)
  • Report (40%)
  • HMM implementation (10%)

Due
• Submit your homework via the E-learning system.
• Deadline: midnight on December 8th, 2017
• If you have any questions about this homework, send an email to the TA or to our course mailbox.
• TA in charge: 杨依莹 (zoeyangyy@163.com)

Outline
3. CRF++: Yet Another CRF toolkit

CRF++: Yet Another CRF toolkit
• CRF++ ( http://taku910.github.io/crfpp/ ) is a simple, customizable, open source implementation of Conditional Random Fields (CRFs) for segmenting and labeling sequential data.
• CRF++ is designed for general-purpose use and can be applied to a variety of NLP tasks, such as Named Entity Recognition, Information Extraction and Text Chunking.
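A rough sketch of driving CRF++ from Python is shown below. The `crf_learn` and `crf_test` command-line tools and the `U..:%x[row,col]` / `B` template syntax come from CRF++ itself; the particular feature window, the file names `template.txt` and `trigger.model`, and the helper functions are assumptions made for illustration.

```python
import subprocess

# Feature template: unigram features over a window of words (column 0),
# two word-pair features, and 'B' for label-bigram features.
# The window size is an illustrative choice, not a tuned setting.
TEMPLATE = "\n".join([
    "U00:%x[-2,0]",
    "U01:%x[-1,0]",
    "U02:%x[0,0]",
    "U03:%x[1,0]",
    "U04:%x[2,0]",
    "U05:%x[-1,0]/%x[0,0]",
    "U06:%x[0,0]/%x[1,0]",
    "",
    "B",
]) + "\n"

def crf_commands(template="template.txt", train="trigger_train.txt",
                 test="trigger_test.txt", model="trigger.model"):
    """Build the training and tagging command lines for CRF++."""
    learn = ["crf_learn", template, train, model]
    tag = ["crf_test", "-m", model, test]
    return learn, tag

def run_crf(result_path="trigger_result.txt"):
    """Train and tag; requires the CRF++ binaries to be on PATH."""
    learn, tag = crf_commands()
    with open("template.txt", "w", encoding="utf-8") as f:
        f.write(TEMPLATE)
    subprocess.run(learn, check=True)
    with open(result_path, "w", encoding="utf-8") as out:
        # crf_test echoes each input line with the predicted label appended.
        subprocess.run(tag, check=True, stdout=out)

learn_cmd, tag_cmd = crf_commands()
```

Because `crf_test` appends its prediction as an extra column after the existing ones, running it on a test file that already holds the real labels yields word, real label and predicted label on each line.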
