
1 The Second KYOTO Workshop, Gifu, Japan, 26 January 2011

Monitoring and analysing multilingual media reports to support users in their daily work

Ralf Steinberger

& the JRC's OPTIMA team – Open Source Text Information Mining and Analysis

Technical details and publications: http://langtech.jrc.ec.europa.eu/

Applications: http://press.jrc.it/overview.html

Agenda

  • JRC: Who we are – what we do – our customers.
  • Europe Media Monitor (EMM) family of applications
    • Publicly accessible at http://press.jrc.it/overview.html
      1. Gathering of multilingual news; clustering; classification
      2. Alerting and early warning
      3. Event extraction
  • Adapting to new domains
  • Other text mining applications – Brief overview
  • Summary and Conclusion

Joint Research Centre - Who we are

  • European Commission (scientific-technical arm of public administration)
  • Non-commercial
  • Multi-disciplinary / multilingual
  • Relatively small team working on Language Technology and media monitoring

EMM media monitoring users – wide coverage, world-wide

  • European Commission (most DGs) and other EU Institutions
  • EU Agencies, e.g. Public Health (ECDC), Food Safety (EFSA), Chemicals Agency (ECHA), etc.
  • EU Member State organisations, e.g.
    • Public Health
    • law enforcement authorities
    • parliaments
    • crisis management/humanitarian
  • International and extra-European organisations, e.g.
    • various UN organisations
    • Centres for Disease Prevention and Control in the US, Canada, China, …
  • The public:
    • Ca. 20,000–30,000 anonymous internet users of publicly accessible EMM systems.
  • Combined: between 1 and 2 million hits per day

Europe Media Monitor (EMM) news gathering - A few facts

  • ~ 2500 sources (world-wide, with focus on Europe)
    • ~ 2300 news sources (web portals)
    • ~ 200 specialist medical sites
    • ~ 20 commercial newswires
    • Specialist pay-for sources (LexisMed)
  • 24/7, updated every 10 minutes
  • ~ 100,000 articles / day in ~ 50 languages
  • Converts dirty HTML with adverts, menus, HTML tags, ‘related stories’, etc. into clean and standardised UTF-8 encoded RSS format.
  • Articles are fed into the various EMM applications.
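
The clean-up step is not detailed on the slide. A minimal Python sketch, assuming the page is plain HTML and that menus, adverts and scripts sit in elements such as nav, aside or script; the SKIP_TAGS list, the function html_to_rss_item and the chosen RSS fields are hypothetical illustrations, not EMM's actual scraper:

```python
# Sketch under assumptions: SKIP_TAGS and html_to_rss_item are hypothetical,
# not part of EMM. Uses only the Python standard library.
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements treated as page furniture rather than article text (an assumption).
SKIP_TAGS = {"script", "style", "nav", "aside", "header", "footer"}


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring everything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_rss_item(title, link, html):
    """Returns one UTF-8 encoded RSS <item> holding the cleaned article text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    ET.SubElement(item, "description").text = " ".join(parser.chunks)
    return ET.tostring(item, encoding="utf-8")


page = "<html><nav>Home | World</nav><p>Flood hits town.</p><script>showAd()</script></html>"
print(html_to_rss_item("Flood", "http://example.org/a", page).decode("utf-8"))
# <item><title>Flood</title><link>http://example.org/a</link><description>Flood hits town.</description></item>
```

Counting the nesting depth of skipped elements, rather than using a single flag, means furniture nested inside other furniture (a menu inside a header) is still skipped correctly.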

Agenda

  • JRC: Who we are – what we do – our customers.
  • Europe Media Monitor (EMM) family of applications
    • Publicly accessible at http://press.jrc.it/overview.html
      1. Gathering of multilingual news; clustering; classification
      2. Alerting and early warning
      3. Event extraction
  • Adapting to new domains
  • Other text mining applications – Brief overview
  • Summary and Conclusion

EMM – NewsBrief (up to 50 languages)

  • Public site: http://emm.newsbrief.eu/
  • Categorises news into ~ 600 categories, using:
    • Boolean search word combinations
    • vicinity operators
    • optional weights
    • regular expressions
  • Clusters and tracks news live (multi-monolingually)
  • Sends out email notifications for each category
  • Detects breaking news
  • Lookup of known entities
  • Quotation recognition
  • Quotation recognition

NewsBrief Live Cluster Map

Display of latest geo-located news clusters

EMM-NewsBrief – Some environment-related categories

Environment live, at http://emm.newsbrief.eu/

EMM-NewsBrief – Example page: Ecology

MedISys – Filtering and classification in up to 50 languages

Access MedISys at http://medusa.jrc.it/

Agenda

  • JRC: Who we are – what we do – our customers.
  • Europe Media Monitor (EMM) family of applications
    • Publicly accessible at http://press.jrc.it/overview.html
      1. Gathering of multilingual news; clustering; classification
      2. Alerting and early warning
      3. Event extraction
  • Adapting to new domains
  • Other text mining applications – Brief overview
  • Summary and Conclusion

MedISys - Aggregation of multilingual information; Alerting

  • Documents in all languages are classified according to the same countries and categories.
  • An increase in the number of media reports on any country-category combination is detected, independently of the reporting language.
  • Graphs and alerts may therefore show events not yet reported in your own language.
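
The slides do not give the statistic MedISys uses to decide that an increase is significant. A minimal Python sketch, assuming each classified article carries a country, a category and a language, and assuming a simple z-score test of today's count against the counts of previous days; count_by_combination, detect_alerts and the thresholds are hypothetical:

```python
# Sketch under assumptions: the z-score test, thresholds and function names are
# hypothetical, not MedISys's actual alerting method.
from collections import Counter
from statistics import mean, pstdev


def count_by_combination(articles):
    """Counts articles per (country, category). The 'language' field is
    deliberately ignored, so reports in every language add up."""
    return Counter((a["country"], a["category"]) for a in articles)


def detect_alerts(today, history, min_count=5, z_threshold=3.0):
    """today: Counter for the current period; history: Counters for earlier periods.
    Returns (combination, count, z-score) for unusually high counts."""
    alerts = []
    for combo, n in today.items():
        past = [h.get(combo, 0) for h in history]
        mu = mean(past) if past else 0.0
        sigma = pstdev(past) if len(past) > 1 else 0.0
        # A floor of 1.0 avoids dividing by zero for combinations with a flat history.
        z = (n - mu) / max(sigma, 1.0)
        if n >= min_count and z >= z_threshold:
            alerts.append((combo, n, round(z, 2)))
    return sorted(alerts, key=lambda a: -a[2])


# Seven quiet days, then six reports in four languages on the same combination.
history = [Counter({("DE", "Norovirus"): 1}) for _ in range(7)]
today = count_by_combination(
    {"country": "DE", "category": "Norovirus", "language": lang}
    for lang in ["de", "de", "en", "fr", "it", "de"]
)
print(detect_alerts(today, history))  # [(('DE', 'Norovirus'), 6, 5.0)]
```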

Agenda

  • JRC: Who we are – what we do – our customers.
  • Europe Media Monitor (EMM) family of applications
    • Publicly accessible at http://press.jrc.it/overview.html
      1. Gathering of multilingual news; clustering; classification
      2. Alerting and early warning
      3. Event extraction
  • Adapting to new domains
  • Other text mining applications – Brief overview
  • Summary and Conclusion

EMM-NEXUS Event Extraction System

Access NEXUS at: http://emm-labs.jrc.it/ or http://emm.newsbrief.eu/geo?type=event&format=html&language=all

EMM-NEXUS – Event Extraction System

  • NEXUS: multilingual information extraction system for the extraction of structured event descriptions from online news referring to conflicts, crimes and disasters.
  • Currently 7 languages: English, French, Portuguese, Arabic, Spanish, Italian, Russian (and Chinese).
  • Near real-time: every 10 minutes, EMM clusters the latest articles about the same event and NEXUS extracts structured information from each cluster.
  • Objective: global crisis monitoring (live situation or long-term trend).

Event Extraction Output (English, French and Portuguese)

  • Baghdad car bombs kill at least 127
    Event Type: Terrorist Attack; Severity: 127 killed, 448 injured; Weapons: car bomb; Place: Baghdad

  • Johannesburg: cinq suspects arrêtés pour le meurtre du curé français (Johannesburg: five suspects arrested for the murder of the French priest)
    Event Type: Arrest; Severity: 1 killed, 0 injured; Victims: prêtre français (French priest) / Louis Blondel killed; Place: Johannesburg

  • Police search for killer bus driver
    Event Type: Man-Made Disaster; Severity: 1 killed, 6 injured; Victims: passenger killed; Place: London

  • Timor-Leste: Indonésios estão a fazer "cortina de fumo" sobre morte dos "5 de Balibó" - viúva (C/ÁUDIO) (Timor-Leste: Indonesians are creating a "smokescreen" over the death of the "Balibó Five", says widow (with audio))
    Severity: 5 killed, 0 injured; Victims: jornalistas (journalists) killed; Place: Timor-Leste

Aggregating information extracted from various articles

Car bomber strikes north Pakistan
ech-chorouk-en, Tuesday, November 10, 2009 2:23:00 PM CET
A car bomb has exploded in Pakistani's northwestern town of Charsadda killing at least 10 people....

Bomb explodes in northwestern Pakistani town
yediotaharonot, Tuesday, November 10, 2009 1:58:00 PM CET
A bomb exploded in the northwestern Pakistani town of Charsadda on Tuesday causing an unknown number of casualties, police said. "It was a bomb blast....

10 killed in Pakistan bomb
RTERadio, Tuesday, November 10, 2009 1:57:00 PM CET
A bomb has exploded in the north-western Pakistani town of Charsadda, killing 10 people....

Aggregated event record:
TYPE: Bombing
PLACE: Charsadda, Pakistan
TIME: Tuesday, November 10, 2009
DEAD COUNT: 10
DEAD DESCRIPTION: people
WOUNDED COUNT/DESC:
DISPLACED COUNT/DESC:
HOMELESS COUNT/DESC:
ARRESTED COUNT/DESC:
PERPETRATOR:
WEAPONS: Bomb
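
The slide shows the merged result but not the merging rule. One plausible rule, sketched here in Python purely as an assumption: for each slot, keep the value extracted from the most articles in the cluster, and for numeric slots break ties in favour of the larger number. The function aggregate and the slot names are hypothetical.

```python
# Sketch under assumptions: the majority-vote rule, aggregate() and the slot
# names are hypothetical, not NEXUS's documented aggregation method.
from collections import Counter


def aggregate(extractions):
    """Merges slot values extracted from the articles of one cluster.
    extractions: list of dicts (slot -> value); None means 'not found in this article'."""
    merged = {}
    for slot in sorted({s for e in extractions for s in e}):
        values = [e[slot] for e in extractions if e.get(slot) is not None]
        if not values:
            continue
        counts = Counter(values)
        if all(isinstance(v, int) for v in values):
            # Numeric slots: most frequent value; ties go to the larger number.
            merged[slot] = max(counts, key=lambda v: (counts[v], v))
        else:
            # Text slots: most frequent value (first seen wins a tie).
            merged[slot] = counts.most_common(1)[0][0]
    return merged


cluster = [  # hypothetical per-article extractions for the Charsadda example
    {"type": "Bombing", "place": "Charsadda, Pakistan", "dead": 10, "weapon": "car bomb"},
    {"type": "Bombing", "place": "Charsadda, Pakistan", "dead": None, "weapon": "bomb"},
    {"type": "Bombing", "place": "Charsadda, Pakistan", "dead": 10, "weapon": "bomb"},
]
print(aggregate(cluster))
# {'dead': 10, 'place': 'Charsadda, Pakistan', 'type': 'Bombing', 'weapon': 'bomb'}
```

The second article ("an unknown number of casualties") contributes no death toll, so the two articles reporting 10 dead decide the slot.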

Event extraction – Text Version

live

Event extraction – Display on a map

Event extraction – Visualisation using Google Earth

Event types currently recognised

Event Type Hierarchy

[Figure: event type hierarchy rooted at 'Crisis event'. Node labels: Disaster, Security-related, Humanitarian crisis, Medical events, Natural disaster, Man-made disaster/accident, Arrest, Kidnapping/Hostage taking, Hostage release, Violent event, Terrorist attack, Shooting, Armed conflict, Trial, Hostage video release, Execution, Crimes, Assassination]

Accessing the original news articles via NewsBrief / MedISys

Event Extraction Process

Event Extraction – Resources used

Light-weight and shallow process to allow coverage of many languages

  • Morphological dictionaries
    • Static resources, mainly for the grammatical structure of rules.
  • Domain-specific lexica
    • (Possibly multiword) expressions subcategorised into semantic classes relevant for the domain.
  • Surface-level extraction patterns
    • Often learned (semi-)automatically, e.g. [VICTIM] was heavily wounded
  • Finite-state grammar rules
    • To recognise person groups or other partial patterns, e.g. the actor, three Iraqi soldiers
    • To generalise the surface-level extraction patterns, e.g. has been (strongly) wounded
    • Ideally language-agnostic
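
A toy Python sketch of how a surface pattern such as "[VICTIM] was heavily wounded" could be combined with a finite-state-style person-group grammar, and how a word in parentheses generalises the pattern. The mini-lexicon (QUANTIFIER, MODIFIER, PERSON_NOUN) and the functions compile_pattern and extract_victims are hypothetical; the real NEXUS grammars and lexica are much richer and are not shown on the slides.

```python
# Sketch under assumptions: the mini-lexicon and function names are hypothetical,
# not NEXUS's real grammar. Regular expressions stand in for finite-state rules.
import re

# Person-group grammar for phrases such as "three Iraqi soldiers".
QUANTIFIER = r"(?i:the|a|an|\d+|one|two|three|four|five)\s+"
MODIFIER = r"(?:[A-Z][a-z]+\s+)?"   # optional capitalised word, e.g. a nationality
PERSON_NOUN = r"(?:soldiers?|civilians?|people|policem[ae]n|passengers?|actors?)"
PERSON_GROUP = rf"(?P<victim>\b{QUANTIFIER}{MODIFIER}{PERSON_NOUN})"


def compile_pattern(surface):
    """Turns a surface pattern such as '[VICTIM] has been (strongly) wounded' into a
    regex: [VICTIM] becomes the person-group grammar, (word) becomes optional.
    The final token of the pattern is assumed not to be optional."""
    regex = ""
    for token in surface.split():
        if token == "[VICTIM]":
            regex += PERSON_GROUP + r"\s+"
        elif token.startswith("(") and token.endswith(")"):
            regex += rf"(?:{re.escape(token[1:-1])}\s+)?"
        else:
            regex += re.escape(token) + r"\s+"
    return re.compile(regex[: -len(r"\s+")])  # drop whitespace after the last token


def extract_victims(text, patterns):
    return [m.group("victim") for p in patterns for m in p.finditer(text)]


PATTERNS = [
    compile_pattern("[VICTIM] was heavily wounded"),
    compile_pattern("[VICTIM] has been (strongly) wounded"),
    compile_pattern("[VICTIM] have been (strongly) wounded"),
]
print(extract_victims("Officials said three Iraqi soldiers have been wounded near Mosul.", PATTERNS))
# ['three Iraqi soldiers']
```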

Event extraction process

Agenda

  • JRC: Who we are – what we do – our customers.
  • Europe Media Monitor (EMM) family of applications
    • Publicly accessible at http://press.jrc.it/overview.html
      1. Gathering of multilingual news; clustering; classification
      2. Alerting and early warning
      3. Event extraction
  • Adapting to new domains, e.g. environment
  • Other text mining applications – Brief overview
  • Summary and Conclusion

Adapting the extraction tool to new domains

NEXUS Machine-Learning

  • Learning new patterns for the same slots
  • Learning semantic classes
  • Learning domain-specific words

NEXUS – learning new event patterns

Learning domain-specific lexica – Terminology learning tool

  • Input: a handful of keywords (seeds)
  • Output: a set of keywords which tend to co-occur with the seeds, ordered by weight

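  • TF.IDF-like formula for term weighting:

$$\mathrm{Weight}(term) = \mathrm{TF}(term) \cdot \mathrm{IDF}(term)^2$$

where TF(term) = Frequency(seeds, term) is the number of documents which contain both the term and at least one of the seeds, and

$$\mathrm{IDF}(term) = \log\frac{\mathrm{NumberDocuments}}{\mathrm{Frequency}(term)}$$

with Frequency(term) the number of documents containing the term.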

Example:
  • Seeds: sustainable development, sustainable energy, clean energy, environmental, greenhouse gases
  • Output: environment, emissions, climate, carbon, efficiency, water, future, projects, renewable, quality, management, technologies, differ materially, impact, global, development, developing, based, cost, economic, efficient, developed, sustainability, industry, risks and uncertainties, resources, potential, reducing, technology
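
The weighting formula can be sketched in Python as follows. The tokenisation, substring matching of seed phrases, the natural logarithm and the exclusion of the seeds' own words are assumptions, since the slide does not specify them; learn_terms is a hypothetical name and the corpus is invented.

```python
# Sketch under assumptions: tokenisation, seed matching, log base and learn_terms()
# are hypothetical choices; only the TF * IDF^2 weighting comes from the slide.
import math
import re
from collections import Counter


def learn_terms(docs, seeds, top_n=10, stop_words=frozenset()):
    """Ranks words co-occurring with the seed expressions by
    Weight = TF * IDF**2, where TF = number of documents containing the word and
    at least one seed, IDF = log(number of documents / documents containing the word)."""
    lowered = [d.lower() for d in docs]
    words = [set(re.findall(r"\w+", d)) for d in lowered]
    doc_freq = Counter(w for ws in words for w in ws)
    # A document counts as a seed document if it contains any seed phrase as a substring.
    seed_docs = [i for i, d in enumerate(lowered) if any(s.lower() in d for s in seeds)]
    tf = Counter(w for i in seed_docs for w in words[i])
    seed_words = {w for s in seeds for w in re.findall(r"\w+", s.lower())}
    weights = {
        w: co * math.log(len(docs) / doc_freq[w]) ** 2
        for w, co in tf.items()
        if w not in stop_words and w not in seed_words
    }
    # Highest weight first; ties broken alphabetically for a deterministic order.
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


docs = [  # toy corpus, invented for illustration
    "clean energy cuts carbon emissions",
    "greenhouse gases and carbon emissions rise",
    "carbon prices rise",
    "football match ends in draw",
    "election results announced",
    "clean energy policy announced",
]
seeds = ["clean energy", "greenhouse gases"]
print([w for w, _ in learn_terms(docs, seeds, top_n=3, stop_words={"and"})])
# ['cuts', 'policy', 'emissions']
```

Squaring the IDF strongly favours words that occur almost only in seed documents: in this tiny corpus 'cuts' and 'policy' (one document each) outrank 'emissions' (two seed documents), and 'carbon', which also appears outside the seed documents, falls further behind.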

Clustering the learnt lexica automatically

  • Automatic clustering of the terms, based on their immediate left and right-hand-side contexts
  • Some interesting clusters:
    1. greenhouse gas emissions, carbon emission, greenhouse gases, carbon dioxide
    2. biodegradable, recycled, recyclable
    3. environmentally sustainable, energy-efficient, environmentally responsible
    4. energy savings, fuel economy, carbon reduction, reliability, fuel efficiency, energy efficiency
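
The slide only says that terms are clustered by their immediate left and right contexts. A minimal Python sketch under stated assumptions: each term is represented by counts of its left and right neighbour words, similarity is cosine similarity, and terms are grouped greedily in one pass with a similarity threshold. The functions context_vectors, cosine and cluster and the toy sentences are hypothetical.

```python
# Sketch under assumptions: context vectors, cosine similarity and the greedy
# one-pass grouping are hypothetical choices, not the tool's documented method.
import math
import re
from collections import Counter


def context_vectors(sentences, terms):
    """For each (possibly multiword) term, counts the words directly to its left ('L:')
    and right ('R:') wherever the term occurs as a whole token sequence."""
    vectors = {t: Counter() for t in terms}
    for sentence in sentences:
        tokens = re.findall(r"[\w-]+", sentence.lower())
        for term in terms:
            parts = term.lower().split()
            k = len(parts)
            for i in range(len(tokens) - k + 1):
                if tokens[i:i + k] == parts:
                    if i > 0:
                        vectors[term]["L:" + tokens[i - 1]] += 1
                    if i + k < len(tokens):
                        vectors[term]["R:" + tokens[i + k]] += 1
    return vectors


def cosine(a, b):
    dot = sum(count * b[f] for f, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(vectors, threshold=0.5):
    """Greedy one-pass grouping: a term joins the first existing cluster that has a
    member with context similarity >= threshold, otherwise it starts a new cluster."""
    clusters = []
    for term, vec in vectors.items():
        for group in clusters:
            if any(cosine(vec, vectors[m]) >= threshold for m in group):
                group.append(term)
                break
        else:
            clusters.append([term])
    return clusters


sentences = [  # toy sentences, invented for illustration
    "cut greenhouse gases by half", "cut carbon dioxide by half",
    "reduce greenhouse gases by 2020", "reduce carbon dioxide by 2020",
    "products are recyclable and cheap", "products are biodegradable and cheap",
]
terms = ["greenhouse gases", "carbon dioxide", "recyclable", "biodegradable"]
print(cluster(context_vectors(sentences, terms)))
# [['greenhouse gases', 'carbon dioxide'], ['recyclable', 'biodegradable']]
```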

Automatic acquisition of semantic classes

  • Sometimes it is necessary to learn specific semantic classes, e.g. disasters, types of chemicals, facilities, professions, etc.
  • Language-independent system: it only needs language-specific stop word lists.
  • Method based on:
  • E.g.
    • Seeds: toxic, hazardous
    • Output (top, with scores):
      hazardous 77.20
      toxic 73.10
      radioactive 18.67
      harmful 13.78
      nuclear 12.18
      dangerous 9.68
      organic 8.63
      chemical 8.56
      poisonous 8.00
      toxic substances 7.94
      highly toxic 7.37
      solid 7.26
      carcinogenic 7.21
      noxious 6.47
      industrial 5.73
      corrosive 5.45

Some evaluation results (based on approx. 50-100 event clusters)

Precision (English): Dead 91%, Wounded 91%, Kidnapped 100%, Perpetrators 69%

F1 (Dead, Wounded, Kidnapped, Arrested):
  • Portuguese: 0.69, 0.51, 0.67, 0.47
  • Spanish: 0.46, –, 0.13
  • Italian: 0.87, 0.62, –, 0.67

Conclusion:

  • The output contains errors; manual verification is therefore necessary.
  • Some less-reported events can remain undetected.
  • Two or more events are sometimes merged into one event description.
  • The same event can be presented via several event descriptions.
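
The slides report precision for English and F1 for the other languages, but not the matching criterion. Assuming a slot value counts as correct only when it equals the manually verified value for the same event cluster, the scores could be computed as in this Python sketch; slot_scores is a hypothetical name, and F1 is the harmonic mean of precision and recall.

```python
# Sketch under assumptions: exact-match scoring per event cluster and slot_scores()
# are hypothetical; the slides do not state the evaluation protocol.
def slot_scores(gold, predicted):
    """gold: cluster id -> manually verified value of one slot (e.g. number of dead);
    predicted: cluster id -> value the system extracted (clusters with no extraction omitted).
    A prediction is correct only if it equals the verified value. Returns (P, R, F1)."""
    correct = sum(1 for cid, value in predicted.items() if gold.get(cid) == value)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    return precision, recall, 2 * precision * recall / (precision + recall)
```

For example, with four verified clusters and three extracted values of which two are correct, precision is 2/3, recall is 1/2 and F1 is 4/7 (about 0.57).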

Event moderation interface – to produce reliable event data

Live access to the EMM-NEXUS event extraction system

  • http://emm-labs.jrc.it/ (enter via the public site EMM-Labs; to see all languages, replace ‘language=en’ with ‘language=all’ in the URL)
  • http://emm.newsbrief.eu/NewsBrief/eventedition/en/latest.html (text format)
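
Switching the event map from English to all languages amounts to rewriting one query parameter. A small Python sketch using the standard urllib.parse module; set_query_param is a hypothetical helper, and the example reuses the event-map URL from the NEXUS slide with language=en.

```python
# Sketch: set_query_param is a hypothetical helper built on the standard library.
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def set_query_param(url, key, value):
    """Returns url with query parameter `key` set to `value` (added if absent)."""
    parts = urlsplit(url)
    query = [(k, value if k == key else v)
             for k, v in parse_qsl(parts.query, keep_blank_values=True)]
    if all(k != key for k, _ in query):
        query.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(set_query_param("http://emm.newsbrief.eu/geo?type=event&format=html&language=en", "language", "all"))
# http://emm.newsbrief.eu/geo?type=event&format=html&language=all
```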

Agenda

  • JRC: Who we are – what we do – our customers.
  • Europe Media Monitor (EMM) family of applications
    • Publicly accessible at http://press.jrc.it/overview.html
      1. Gathering of multilingual news; clustering; classification
      2. Alerting and early warning
      3. Event extraction
  • Adapting to new domains
  • Other text mining applications – Brief overview
  • Summary and Conclusion

Text mining in EMM-NewsExplorer

http://emm.newsexplorer.eu/

NewsExplorer – Multilingual daily news overview

live

NewsExplorer – Cross-lingual cluster linking

NewsExplorer – Time line: biggest clusters per day

NewsExplorer – Aggregation of clusters into longer ‘stories’

live

NewsExplorer – Information about people, collected from multiple languages and over time

live

NewsExplorer – Relation exploration

Example: Pervez Musharraf & Iftikhar Chaudhry

live

Ongoing work

Ongoing: Multilingual multi-document summarisation

Ongoing: Opinion mining (Sentiment Analysis)

  • E.g. detect opinions on
    • the European Constitution; EU press releases;
    • entities (persons, organisations, EU programmes and initiatives);
  • Use for social network analysis
  • Detect and display opinion differences across sources and across countries
  • Follow trends over time.

Ongoing: Monitoring social media

  • Facebook:
    • Keyword searches on publicly available posts, e.g. search for Chikungunya on openbook.org
    • Extract publicly available friend networks.
  • Twitter:
    • Keyword searches on publicly available tweets, e.g. search for Chikungunya on twitter.com
  • Blogs

Ongoing: Monitoring social media - Facebook

Ongoing: Monitoring social media - Twitter

Usefulness of monitoring social media?

  • Case study: Norovirus outbreak at Oldenburg University
  • Compare data on the Norovirus outbreak from Facebook, Twitter, media and blogs
  • Were social media faster than established methods for indicator-based surveillance of Norovirus?

[Chart: Report on the Oldenburg Norovirus Outbreak. Information source (news media, discussion forums, Facebook, Twitter) plotted against date and time, August 12 to August 21]

Data from Edward Velasco, German Robert Koch Institute, FP7 project M-ECO

Ongoing: Monitoring social media – Blogs

  • EMM-BlogBrief – Blog monitoring in EMM
  • Retrieves blog posts and comments from a manually selected list of blogs (no crawling)
  • The retrieved blog posts are processed using the same categories as in EMM.

Agenda

  • JRC: Who we are – what we do – our customers.
  • Europe Media Monitor (EMM) family of applications
    • Publicly accessible at http://press.jrc.it/overview.html
      1. Gathering of multilingual news; clustering; classification
      2. Alerting and early warning
      3. Event extraction
  • Adapting to new domains
  • Other text mining applications – Brief overview
  • Summary and Conclusion

Summary - Europe Media Monitor

  • EMM gathers and processes multilingual news, etc.:
    1. Classification; clustering; statistics
    2. Alerting and early warning
    3. Event extraction
  • Manual moderation is useful for all automatic processes.
  • Adaptation to new domains such as ‘environment’ is possible (machine learning).
  • Brief overview of other text mining applications.