Scalable Understanding of Multilingual Media, Steve Renals

Scalable Understanding of Multilingual Media. Steve Renals, University of Edinburgh. Funded by the EU H2020 ICT Programme under Grant Agreement 688139. http://summa-project.eu


SLIDE 1

Scalable Understanding of Multilingual Media

Steve Renals

University of Edinburgh

SLIDE 2

SLIDE 3

SUMMA in a nutshell

  • Significantly improve media monitoring by the automatic
    • analysis of media streams across many languages
    • aggregation and distillation of stream content
    • construction of knowledge bases from reported facts
    • supply of media data visualisations at scale

SLIDE 4

BBC Monitoring

300 journalists, each monitoring up to 4 TV channels and several online text sources

30 languages; the most important include Russian, Arabic and Farsi

SLIDE 5

Big Data

  • 250 video channels
  • 2.5 TB/day, 19 TB/week, 1 PB/year
  • BBC Monitoring has access to
    • 1,500 TV channels
    • 1,350 radio sources
  • But… ~700 free-to-air Arabic satellite channels, increasing at ~100/year
  • Current monitoring processes are largely manual and cannot keep up with the scale of the task
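
As a rough back-of-envelope check on these figures, the sketch below converts the daily volume into an average bit rate per channel and projects the number of Arabic satellite channels forward. It rests on assumptions that are not on the slide: the daily figure is read as 2.5 terabytes (10^12 bytes), the volume is spread evenly over the 250 video channels, and growth stays linear at ~100 channels per year.

```python
# Back-of-envelope arithmetic for the "Big Data" slide.
# Assumptions (not from the slides): 1 TB = 10**12 bytes, the daily volume
# is spread evenly over the 250 channels, and channel growth is linear.

SECONDS_PER_DAY = 24 * 60 * 60


def per_channel_mbit_per_s(tb_per_day: float, channels: int) -> float:
    """Average bit rate per channel, in megabits per second."""
    bits_per_day = tb_per_day * 10**12 * 8
    return bits_per_day / channels / SECONDS_PER_DAY / 10**6


def projected_channels(current: int, growth_per_year: int, years: int) -> int:
    """Linear projection of the channel count after `years` years."""
    return current + growth_per_year * years


if __name__ == "__main__":
    rate = per_channel_mbit_per_s(2.5, 250)
    print(f"~{rate:.2f} Mbit/s per channel")
    for years in (1, 3, 5):
        print(years, "years:", projected_channels(700, 100, years), "channels")
```

Under these assumptions the script reports roughly 0.93 Mbit/s per channel, and about 1,200 free-to-air Arabic satellite channels after five years.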

SLIDE 6

Use cases

  • 1. External Media Monitoring
    • identify emerging trends
    • track people in the news
    • monitor the evolution of storylines
  • 2. Internal Media Monitoring
    • manage multilingual content creation
    • reuse content efficiently across languages
  • 3. Data Journalism
    • use the SUMMA platform for data-driven journalism

SLIDE 7

SUMMA Prototypes

UI Concept 1 (annotated mock-up):

  • Channel ID & native language
  • Semantic tag word cloud: size indicates current frequency across region/group
  • Segment
  • Unique timestamp
  • Player (SD? HD?)
  • Player controller: tag instances marked
  • Tools: screen grab, snip video, save, attach
  • Translated transcript
  • “Now playing” text highlighted
  • Tags shown underlined
  • Add new tag: click pencil to ‘underline’, and enter text
  • Segment machine analysis confidence (possibly better represented graphically?)
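
The word-cloud element in this concept sizes each semantic tag by its current frequency across a region or group. The slide does not say how frequency maps to size; one simple option, sketched below, is to scale the font size linearly between a minimum and a maximum according to each tag's mention count. The function name, the 12 to 48 pt range and the sample tags are all hypothetical.

```python
from collections import Counter


def tag_font_sizes(tags, min_pt=12.0, max_pt=48.0):
    """Map each tag to a font size that grows linearly with its frequency.

    `tags` is an iterable of tag occurrences (one entry per mention) for the
    current region/group. Linear scaling is an illustrative choice only.
    """
    counts = Counter(tags)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    sizes = {}
    for tag, n in counts.items():
        # When every tag has the same count, all get the maximum size.
        frac = (n - lo) / span if span else 1.0
        sizes[tag] = min_pt + frac * (max_pt - min_pt)
    return sizes


# Hypothetical tag mentions from recent segments in one region.
mentions = ["election", "election", "election", "oil price", "oil price", "summit"]
for tag, size in sorted(tag_font_sizes(mentions).items(), key=lambda kv: -kv[1]):
    print(f"{tag}: {size:.0f} pt")
```

With these sample mentions the sketch prints 48 pt for "election", 30 pt for "oil price" and 12 pt for "summit". A logarithmic scale would be an equally plausible choice when a few tags dominate the counts.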

SLIDE 8

SUMMA Prototypes

SLIDE 9

SUMMA Prototypes

SLIDE 10

SUMMA Prototypes

SLIDE 11

Platform & Technologies

  • Ingest audio, video, text
  • Speech recognition
  • Machine translation
  • Segmentation & clustering
  • Identify entities & relations
  • Summarisation & distillation
  • Sentiment detection
  • Visualisation & prototypes
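
The components on this slide can be read as stages of a processing pipeline that each media item passes through. The sketch below chains placeholder stages over a single item to show that data flow. Every class, function and field name, and the stage order itself, is a hypothetical stand-in rather than SUMMA's actual interface; each stage only records that it ran instead of calling a real recognition, translation or analysis system.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class MediaItem:
    """Hypothetical container passed from stage to stage."""
    source: str                      # e.g. a channel ID
    language: str                    # native language code
    text: str = ""                   # transcript or article text
    annotations: dict = field(default_factory=dict)


Stage = Callable[[MediaItem], MediaItem]


def stage(name: str) -> Stage:
    """Build a placeholder stage that only records that it ran."""
    def run(item: MediaItem) -> MediaItem:
        item.annotations.setdefault("stages", []).append(name)
        return item
    return run


# Assumed ordering: the slide lists the components without specifying one.
PIPELINE: List[Stage] = [
    stage("ingest"),
    stage("speech_recognition"),
    stage("machine_translation"),
    stage("segmentation_clustering"),
    stage("entities_relations"),
    stage("summarisation"),
    stage("sentiment"),
]


def process(item: MediaItem, pipeline: List[Stage] = PIPELINE) -> MediaItem:
    """Run an item through each stage in order."""
    for step in pipeline:
        item = step(item)
    return item


result = process(MediaItem(source="channel-001", language="ru"))
print(result.annotations["stages"])
```

Visualisation is left out of the chain in this sketch because it consumes the annotated items rather than transforming them; in a real deployment the stages would more likely run as separate services connected by queues than as in-process function calls.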

SLIDE 12

Multilingual technologies

SLIDE 13

SUMMA system v0.1

SLIDE 14