

SLIDE 1

SLIDE 2

Introduction

SLIDE 3

Image credit (https://blog.bufferapp.com/)

SLIDE 4

  • What is Name Tagging?

[ORG France] defeated [ORG Croatia] in [MISC World Cup] final at [LOC Luzhniki Stadium].

  • Why important? Name tags provide inputs to downstream applications:

○ Searching
○ Recommendation
○ Knowledge graph construction
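Name taggers are usually trained on token-level BIO labels rather than bracketed text. A minimal sketch of converting the bracket notation above into (token, BIO-tag) pairs; the helper name is hypothetical:

```python
import re

def brackets_to_bio(text):
    """Convert bracket-annotated text like '[ORG France] defeated ...'
    into a list of (token, BIO-tag) pairs."""
    pairs = []
    # Match either a bracketed entity '[TYPE word word]' or a plain token.
    for m in re.finditer(r"\[(\w+) ([^\]]+)\]|(\S+)", text):
        if m.group(1):  # bracketed entity: first word gets B-, the rest I-
            etype, words = m.group(1), m.group(2).split()
            pairs.append((words[0], "B-" + etype))
            pairs.extend((w, "I-" + etype) for w in words[1:])
        else:           # token outside any entity
            pairs.append((m.group(3), "O"))
    return pairs

print(brackets_to_bio("[ORG France] defeated [ORG Croatia] in [MISC World Cup] final"))
```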
SLIDE 5

CR7 or TK8?

News vs. Tweet

  • Limited textual context
  • Name taggers perform much worse on social media data

SLIDE 6
  • Language variations: "I luv juuustin"
  • Bad segmentation: "Alison wonderlandxDiploxjuaz B2B ayee’"
  • Within-word white spaces: "LETS GO L A K E R S"

SLIDE 7

Difficult cases based on text only:

○ Modern Baseball played an intimate surprise set at Shea
○ Karl-Anthony Towns named unanimous 2015-2016 NBA Rookie of the Year

SLIDE 8

Colts Have 4th Best QB Situation in NFL with Andrew Luck #ColtStrong

[ORG Colts] Have 4th Best QB Situation in [ORG NFL] with [PER Andrew Luck] #ColtStrong

  • Multimedia input: image-sentence pair
  • Output: tagging results
SLIDE 9

Overview

SLIDE 10
  • Sequence Labeling Model

○ Bidirectional Long Short-Term Memory networks (BLSTM)

■ Word representation generation

○ Conditional Random Fields (CRF)

■ Joint tag prediction

○ State of the art for news articles

  • Visual attention model (Bahdanau et al., 2014)

○ Extracts visual features from the image regions most related to the accompanying sentence

  • Modulation gate before the CRF layer

○ Combines word representations with visual features based on their relatedness
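The CRF's joint tag prediction can be sketched as standard Viterbi decoding over per-token emission scores (e.g. from the BLSTM) and tag-transition scores. This is a generic CRF decoder sketch, not the authors' exact implementation:

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """CRF decoding: find the highest-scoring tag sequence.

    emissions:   (T, K) per-token tag scores (e.g. from a BLSTM)
    transitions: (K, K) score of moving from tag i to tag j
    Returns the best tag-index sequence of length T.
    """
    T, K = emissions.shape
    score = emissions[0].copy()            # best score ending in each tag
    backptr = np.zeros((T, K), dtype=int)  # best previous tag at each step
    for t in range(1, T):
        # total[i, j] = score of ending at tag i then moving to tag j
        total = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    # Follow back-pointers from the best final tag.
    best = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]
```

With zero transition scores this reduces to per-token argmax; a transition matrix that penalizes tag changes makes the decoder prefer consistent sequences, which is the point of joint prediction.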

SLIDE 11

Model

SLIDE 12


SLIDE 13


x_t, c_t, and h_t are the input, memory, and hidden state at time t, respectively; W and U are weight matrices; ⊙ is the element-wise product and σ is the element-wise sigmoid function.
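These symbols appear to describe the standard LSTM update, which in the usual notation reads:

```latex
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t = o_t \odot \tanh(c_t)
```

The BLSTM runs this recurrence in both directions and concatenates the two hidden states per token.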

SLIDE 14

[Figure: attention pipeline. The input sentence and input image are encoded; the convolutional layer's outputs are scored against the sentence to calculate attention and produce a context vector.]
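The pipeline on this slide can be sketched as additive (Bahdanau-style) attention over image regions. All parameter names and shapes below are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def visual_context(conv_features, sentence_vec, W_v, W_s, w_a):
    """Additive attention over image regions.

    conv_features: (R, D) one feature vector per image region
                   (e.g. R = 7*7 regions from a CNN's last conv layer)
    sentence_vec:  (S,)  a vector summarizing the sentence
    W_v (A, D), W_s (A, S), w_a (A,): projection parameters (assumed names)
    Returns (context, weights): the attended visual context vector (D,)
    and the per-region attention weights (R,).
    """
    # Score each region against the sentence: e_r = w_a . tanh(W_v v_r + W_s s)
    scores = np.tanh(conv_features @ W_v.T + W_s @ sentence_vec) @ w_a
    weights = softmax(scores)           # (R,), non-negative, sums to 1
    context = weights @ conv_features   # weighted sum of region features
    return context, weights
```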

SLIDE 15

[Figure: modulation gate combining the visual context and the word representation into a visually tuned word representation.]
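One plausible form of such a gate (a sketch under assumed parameter names and shapes, not necessarily the exact formulation used here) blends the two vectors with a learned sigmoid gate:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def modulation_gate(word_vec, visual_ctx, W_w, W_c, b):
    """Gate deciding how much visual context to mix into a word representation.

    word_vec:   (D,) word representation from the BLSTM
    visual_ctx: (D,) visual context from the attention model
    W_w, W_c:   (D, D) gate parameters, b: (D,) bias (assumed names/shapes)
    Returns the visually tuned word representation (D,).
    """
    g = sigmoid(W_w @ word_vec + W_c @ visual_ctx + b)  # per-dimension gate in (0, 1)
    return g * word_vec + (1.0 - g) * visual_ctx        # convex blend of the two
```

Because the gate is computed from both inputs, the model can suppress the visual signal for words whose image regions are unrelated, matching the "based on their relatedness" idea in the overview.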

SLIDE 16

Experiments

SLIDE 17
  • Snap Caption Dataset and Twitter Dataset (image + text)
  • Topics: sports, concerts, and other social events
  • Named entity types: Person, Organization, Location, and MISC

                      Training   Development   Testing
Snap     Sentences       4,817         1,032     1,033
         Tokens         39,035         8,334     8,110
Twitter  Sentences       4,290         1,432     1,459
         Tokens         68,655        22,872    23,051

Size of the datasets in numbers of sentences and tokens.

SLIDE 18

Model                                       Snap Captions                    Tweets
                                    Precision  Recall     F1     Precision  Recall     F1
BLSTM-CRF                               57.71   58.65   58.18        78.88   77.47   78.17
+ Global Image Vector                   61.49   57.84   59.61        79.75   77.32   78.51
+ Visual Attention                      65.53   57.03   60.98        80.81   77.36   79.05
Gate-controlled visual attention        66.67   57.84   61.94        81.62   79.90   80.75
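F1 in the table is the harmonic mean of precision and recall; a quick sanity check against the last row:

```python
def f1(p, r):
    """Harmonic mean of precision and recall (both in percent)."""
    return 2 * p * r / (p + r)

# Last row (gate-controlled visual attention):
print(round(f1(66.67, 57.84), 2))   # Snap Captions
print(round(f1(81.62, 79.90), 2))   # Tweets
```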

SLIDE 19

SLIDE 20

SLIDE 21

Future Work

SLIDE 22
  • Fine-Grained Name Tagging

San Francisco Giants / New York Giants / Belfast Giants

[PER CR7] & [PER Messi] shake hands

  • Joint Multimodal Grounding and Name Tagging

Giants won the game

SLIDE 23