ACM MM 2010: Dong Liu, Shuicheng Yan, Yong Rui and Hong-Jiang Zhang



SLIDE 1

Dong Liu, Shuicheng Yan, Yong Rui and Hong-Jiang Zhang
Harbin Institute of Technology, National University of Singapore, Microsoft Corporation

ACM MM 2010

SLIDE 2

Proliferation of images and videos on the Internet

  • Sep. 2010: 5 billion images (2,000 images/minute)
  • Sep. 2010: 120 million videos (20 hours uploaded/minute)

SLIDE 3

Internet Image Search

Year: 1990, 2000, 2001, 2002

  • 1st Paradigm: Pure Content Based (Query by Example)
  • 2nd Paradigm: Direct Text Based (Query by Surrounding Text)
  • 3rd Paradigm: Semantic Based (Query by Tag)

SLIDE 4

  • medici chapel, Firenze, Italy, ...
  • Loggia dei lanzi, sword, honeymoon, ...
  • Statue, building, sky, Italy, ...
  • Cathedral, tower, Italy, ...

SLIDE 5

  • D. Liu, X.-S. Hua and H.-J. Zhang. Content-Based Tag Processing for Internet Social Images: A Survey. Multimedia Tools and Applications.

[Figure: four tag analysis tasks (Tag Refinement, Tag-to-Region, Auto-Tagging, Tag Ranking), illustrated with example tag lists such as "alex speed leave dog 101 tree" and "dog leave tree speed alex 101"]

To discover the relationship between the tags and the underlying semantic regions in the images.

SLIDE 6

How to solve various tag analysis tasks in a unified framework?

Content-Based Tag Analysis: Auto-Tagging, Tag Refinement, Tag-to-Region

Our Strategy

  • Perform tag analysis at the granularity of image regions.
  • Propose a new concept of multi-edge graph to
      • model the parallel semantic relationships between the images;
      • propagate the tags from images to regions.

SLIDE 7

Multi-Edge Graph

[Figure: multi-edge graph whose vertices 1, 2, 3, k, t and n carry labels y1, y2, y3, yk, yt and yn; each edge carries a pair (v, f)]

A Core Equation (annotated terms):

  • probability that edge e_t is labeled as positive with tag c
  • one edge
  • all edges between vertex i and j
  • labeling information of vertex i with respect to tag c

SLIDE 8

Step 1: Bag-of-Regions Representation

[Figure: Input Image → Segmentation 1, Segmentation 2 → Bag-of-Regions]
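As a rough illustration of the bag-of-regions idea, the sketch below pools the regions produced by two segmentations of the same image into one bag. The slide names neither the segmentation algorithms nor the region features, so the label maps are given by hand and the mean-intensity feature is an assumption; `regions_from_labels` and `bag_of_regions` are hypothetical helper names.

```python
from collections import defaultdict

def regions_from_labels(label_map, seg_id):
    """Group pixel coordinates by region label for one segmentation."""
    regions = defaultdict(list)
    for r, row in enumerate(label_map):
        for c, label in enumerate(row):
            regions[(seg_id, label)].append((r, c))
    return regions

def bag_of_regions(image, label_maps):
    """Pool the regions of all segmentations into a single bag.

    Each region is described by its mean intensity, which is only a
    placeholder for whatever region feature is actually used.
    """
    bag = []
    for seg_id, label_map in enumerate(label_maps, start=1):
        regions = regions_from_labels(label_map, seg_id)
        for (seg, label), pixels in sorted(regions.items()):
            mean = sum(image[r][c] for r, c in pixels) / len(pixels)
            bag.append({"segmentation": seg, "label": label,
                        "size": len(pixels), "feature": mean})
    return bag

# Toy 2x4 grayscale image and two hand-made segmentations of it.
image = [[10, 10, 200, 200],
         [10, 10, 200, 200]]
seg1 = [[0, 0, 1, 1],
        [0, 0, 1, 1]]
seg2 = [[0, 0, 0, 1],
        [0, 0, 0, 1]]

bag = bag_of_regions(image, [seg1, seg2])
for region in bag:
    print(region)
```

Regions from different segmentations may overlap; the bag simply keeps all of them.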

SLIDE 9

Step 2: Multi-Edge Graph Construction

Given two images with bag-of-regions representation:

  • Edge construction: mutual k-Nearest Neighbor (reliable edge connection)
  • Edge affinity calculation (reliable similarity measure)
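A minimal sketch of mutual k-nearest-neighbor edge construction between the region bags of two images follows. It assumes each region is a plain feature vector compared by squared Euclidean distance and uses a Gaussian kernel for the edge affinity; the slide does not show the affinity formula, so the kernel and the parameters `k` and `sigma` are illustrative assumptions.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn(queries, targets, k):
    """For each query, the set of indices of its k nearest targets."""
    result = []
    for q in queries:
        order = sorted(range(len(targets)), key=lambda j: sq_dist(q, targets[j]))
        result.append(set(order[:k]))
    return result

def mutual_knn_edges(regions_a, regions_b, k=1, sigma=1.0):
    """Connect region i of image A and region j of image B only if each
    is among the other's k nearest neighbors (mutual kNN)."""
    a_to_b = knn(regions_a, regions_b, k)
    b_to_a = knn(regions_b, regions_a, k)
    edges = []
    for i, neighbors in enumerate(a_to_b):
        for j in sorted(neighbors):
            if i in b_to_a[j]:
                # Gaussian affinity: an assumed stand-in for the slide's measure.
                affinity = math.exp(-sq_dist(regions_a[i], regions_b[j]) / (2 * sigma ** 2))
                edges.append((i, j, affinity))
    return edges

regions_a = [(0.0, 0.0), (5.0, 5.0)]
regions_b = [(0.1, 0.0), (5.0, 5.2), (9.0, 9.0)]
edges = mutual_knn_edges(regions_a, regions_b, k=1)
for i, j, affinity in edges:
    print(i, j, round(affinity, 3))
```

Because several region pairs can pass the mutual test, two images may end up joined by more than one edge, which is what makes the graph multi-edge.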

SLIDE 10

Two images with the same tag (e.g. one tagged "dog, flower" and one tagged "dog, bird") have at least one edge connecting the two regions corresponding to the tag.

SLIDE 11

Notations


SLIDE 12

Model the cross-level tag propagation

Objective Function: Loss function + Regularization

Solving for F directly is computationally challenging, so we turn to an alternating optimization strategy.

SLIDE 13

Optimize sub-problems with cutting plane

At each iteration, we solve only for the subset of tag confidence vectors between vertex i and vertex j. This yields a sub-optimization problem. Since the max function is non-smooth, we solve it with the cutting plane method.
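To show how a cutting plane method copes with a non-smooth max term, here is a generic one-dimensional sketch of Kelley's cutting-plane method. It is not the sub-problem from the slides, whose exact form is not shown; the toy objective max(0, 1 - w) + 0.5 w^2, the search interval and the function names are all assumptions chosen for illustration.

```python
def kelley_cutting_plane(f, subgradient, lo, hi, tol=1e-6, max_iter=50):
    """Minimize a convex 1-D function on [lo, hi] with Kelley's cutting planes."""
    cuts = []                          # each cut is (slope, intercept)
    x = lo
    best_x, best_f = x, f(x)
    for _ in range(max_iter):
        fx, g = f(x), subgradient(x)
        if fx < best_f:
            best_x, best_f = x, fx
        # Linear lower bound on f: f(z) >= g * z + (fx - g * x)
        cuts.append((g, fx - g * x))

        def model(z):
            """Piecewise-linear lower model: the max over all cuts."""
            return max(a * z + b for a, b in cuts)

        # The model's minimum lies at an endpoint or where two cuts cross.
        candidates = [lo, hi]
        for i in range(len(cuts)):
            for j in range(i + 1, len(cuts)):
                (a1, b1), (a2, b2) = cuts[i], cuts[j]
                if a1 != a2:
                    z = (b2 - b1) / (a1 - a2)
                    if lo <= z <= hi:
                        candidates.append(z)
        x = min(candidates, key=model)
        # Stop once the best value found is within tol of the lower bound.
        if best_f - model(x) <= tol:
            break
    return best_x, best_f

def objective(w):
    """Toy non-smooth convex objective: hinge term plus quadratic regularizer."""
    return max(0.0, 1.0 - w) + 0.5 * w * w

def objective_subgradient(w):
    """One valid subgradient of the toy objective at w."""
    return (w - 1.0) if w < 1.0 else w

w_star, f_star = kelley_cutting_plane(objective, objective_subgradient, -2.0, 3.0)
print(w_star, f_star)
```

Each iteration adds one linear under-estimator of the objective and re-minimizes the resulting piecewise-linear model, so the non-smooth max never has to be differentiated.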

SLIDE 14

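Tag confidence vectors f1, f2, f3 of the edges:

Tag      f1    f2    f3
dog      0.1   0.2         (one of the three values is missing)
cat      0.1   0.1   0.2
apple    0.4   0.2   0.4
flower   0.3   0.3   0.2
tree     0.1   0.2   0.1

Highest-confidence tag per edge: f1: apple, f2: flower, f3: apple

Majority Voting: apple (2 times) > flower (1 time), so the result is apple.

By doing so, a series of tag analysis tasks can be performed in a coherent way.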
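The voting step can be sketched in a few lines. The confidence values are the ones shown on the slide; the "dog" row is left out because one of its values is missing. The helper names `top_tag_per_edge` and `majority_vote` are hypothetical.

```python
from collections import Counter

# Tag confidences on the three edges f1, f2, f3 (rows that are complete on the slide).
confidences = {
    "cat":    [0.1, 0.1, 0.2],
    "apple":  [0.4, 0.2, 0.4],
    "flower": [0.3, 0.3, 0.2],
    "tree":   [0.1, 0.2, 0.1],
}

def top_tag_per_edge(confidences):
    """Pick the highest-confidence tag on each edge."""
    n_edges = len(next(iter(confidences.values())))
    return [max(confidences, key=lambda tag: confidences[tag][e])
            for e in range(n_edges)]

def majority_vote(confidences):
    """Return the tag that wins on the largest number of edges."""
    votes = Counter(top_tag_per_edge(confidences))
    return votes.most_common(1)[0][0]

per_edge = top_tag_per_edge(confidences)
winner = majority_vote(confidences)
print(per_edge, winner)
```

On these values the per-edge winners are apple, flower and apple, so the vote selects apple, matching the slide.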

SLIDE 15

  • The cutting plane iteration will terminate in a constant number of steps.
  • The optimization objective is convex, resulting in a globally optimal solution.

SLIDE 16

In terms of pixel-level accuracy.

MSRC-350 and COREL-100 datasets (benchmarks for the tag-to-region assignment task).
Comparison with kNN-1 (k=49), kNN-2 (k=99) and Bi-layer sparse coding [1].

Dataset      kNN-1   kNN-2   Bi-layer [1]   M-E Graph
MSRC-350     0.45    0.37    0.63           0.73
COREL-100    0.52    0.44    0.61           0.67

[1] Liu, Cheng, Yan and Chua. Label to region by bi-layer sparsity priors. MM 2009.
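For reference, pixel-level accuracy is commonly computed as the fraction of pixels whose predicted label matches the ground truth. The sketch below assumes that plain definition; the benchmarks may apply extra conventions (for example, ignoring unlabeled pixels) that are not stated on the slide.

```python
def pixel_accuracy(predicted, ground_truth):
    """Fraction of pixels whose predicted label equals the ground-truth label."""
    total = 0
    correct = 0
    for pred_row, true_row in zip(predicted, ground_truth):
        for p, t in zip(pred_row, true_row):
            total += 1
            correct += (p == t)
    return correct / total if total else 0.0

# Toy 2x3 label maps, invented purely for illustration.
predicted = [["sky", "sky", "tree"],
             ["grass", "tree", "tree"]]
ground_truth = [["sky", "sky", "sky"],
                ["grass", "grass", "tree"]]

accuracy = pixel_accuracy(predicted, ground_truth)
print(round(accuracy, 3))
```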

SLIDE 17


SLIDE 18

In terms of Average F-Score.

On the NUS-WIDE-SUB dataset with 18,325 Flickr images.
Comparison with Baseline (initial user-provided tags), CBAR [1] and TRVSC [2].

Method      Baseline   CBAR [1]   TRVSC [2]   M-E Graph
Precision   0.47       0.50       0.52        0.54
Recall      0.49       0.52       0.53        0.57
F-Score     0.44       0.47       0.49        0.53

[1] C. Wang, L. Zhang and H.-J. Zhang. Content-based Image Annotation Refinement. CVPR 2007.
[2] D. Liu, X.-S. Hua and H.-J. Zhang. Retagging Social Images based on Visual and Semantic Consistency. WWW 2010.

SLIDE 19

In terms of Average Per-tag Precision and Recall.

MSRC, COREL and NUS-WIDE-SUB datasets. Comparison with the state-of-the-art multi-label auto-tagging methods.
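One common way to compute average per-tag precision and recall is sketched below: for each tag, compare the set of images predicted to carry it with the set that truly carries it, then average over tags. Treating an empty denominator as 0 is an assumption, and the toy data is invented for illustration.

```python
def per_tag_precision_recall(predicted, ground_truth, tags):
    """Average per-tag precision and recall over a list of tags.

    predicted and ground_truth are lists of tag sets, one per image.
    """
    precisions, recalls = [], []
    for tag in tags:
        pred = {i for i, t in enumerate(predicted) if tag in t}
        true = {i for i, t in enumerate(ground_truth) if tag in t}
        hits = len(pred & true)
        # Assumed convention: a tag never predicted (or never present) scores 0.
        precisions.append(hits / len(pred) if pred else 0.0)
        recalls.append(hits / len(true) if true else 0.0)
    n = len(tags)
    return sum(precisions) / n, sum(recalls) / n

ground_truth = [{"dog", "tree"}, {"sky"}, {"dog"}]
predicted = [{"dog"}, {"sky", "tree"}, {"dog", "sky"}]
tags = ["dog", "sky", "tree"]

avg_precision, avg_recall = per_tag_precision_recall(predicted, ground_truth, tags)
print(round(avg_precision, 3), round(avg_recall, 3))
```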


SLIDE 20

Unified Tag Analysis with Multi-Edge Graph

  • Perform tag analysis at the granularity of image regions
  • Model the parallel semantic relationship between the images
  • Realize cross-level tag propagation

SLIDE 21

  • Scalability
      • Large-scale testing
  • Correlative cross-level tag propagation
      • Semantic correlation among the tags
  • More applications
      • User behavior analysis in social networks
      • Knowledge mining from the rich information cues of multimedia documents

SLIDE 22