Jointly Aligning and Segmenting Multiple Web Photo Streams for the Inference of Collective Photo Storylines


  1. Jointly Aligning and Segmenting Multiple Web Photo Streams for the Inference of Collective Photo Storylines
     Gunhee Kim, Eric P. Xing
     School of Computer Science, Carnegie Mellon University
     June 19, 2013

  2. Outline • Problem Statement • Algorithm (Dataset and preprocessing; Alignment of Multiple Photo Streams; Large-scale Cosegmentation) • Experiments • Conclusion

  3. Background
     • Query: scuba+diving from Flickr
     • Any meaningful structural summary?
     • Photos are taken from different spatial, temporal, and personal perspectives
     • They are likely to share common storylines

  4. Our Ultimate Goal
     • Reconstructing photo storylines from large-scale online images
     • Narrative structural summary vs. independently retrieved images (cf. ranking and retrieval)
     • An example of a scuba+diving storyline (figure labels: beach on boat diving underwater dinner boat coral sunset)

  5. Objective of This Paper
     As a first technical step, jointly perform two crucial tasks that are mutually rewarding:
     • Alignment: match images from different photo streams
     • Cosegmentation: segment K common regions from aligned M images
     Example photo streams: PS2, User 1 at 10/19/2008 (Cayman Islands); PS14, User 2 at 03/19/2005 (Phuket, Thailand); PS3, User 3 at 08/27/2008 (Cozumel, Mexico)

  6. Objective of This Paper
     As a first technical step, jointly perform two crucial tasks that are mutually rewarding. How alignment helps cosegmentation:
     • Online images are too diverse to segment together at once
     • The alignment discovers the images that share common regions
     Example photo streams: PS2, User 1 at 10/19/2008 (Cayman Islands); PS14, User 2 at 03/19/2005 (Phuket, Thailand); PS3, User 3 at 08/27/2008 (Cozumel, Mexico)

  7. Objective of This Paper
     As a first technical step, jointly perform two crucial tasks that are mutually rewarding. How cosegmentation helps alignment:
     • Improve image matching by a better image similarity measure
     • Closing a loop between the two tasks

  8. Outline • Problem Statement • Algorithm (Dataset and preprocessing; Alignment of Multiple Photo Streams; Large-scale Cosegmentation) • Experiments • Conclusion

  9. Flickr Dataset
     Flickr dataset of 15 outdoor recreational activities
     • Experiments with more than 100K images of 1K photo streams
     • Larger than those of previous work by orders of magnitude
     • Chart axes: # of photo streams (13,157 in total) and # of images (1,514,976 in total)
     Classes: surfing+beach (SB), horse+riding (HR), air+ballooning (AB), rafting (RA), yacht (YA), rowing (RO), scuba+diving (SD), formula+one (FO), snowboarding (SN), safari+park (SP), mountain+camping (MC), rock+climbing (RC), London+marathon (LM), Tour de France (TF), fly+fishing (FF)

  10. Image Descriptor and Similarity Measure
      Image description
      • HSV color SIFT and HOG features on a regular grid
      • L1-normalized spatial pyramid histogram (SPH) using 300 visual words
      Image similarity measure
      • (Our assumption) Segmentation enhances the image alignment.
      • Case 1, no segmentation available: histogram intersection on SPH. Not robust against location/pose changes.
      • Case 2, segmentation available: histogram intersection on the best assignment of segments.
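
The two similarity cases on slide 10 can be illustrated with a minimal sketch. It assumes histograms are plain Python lists; the function names and the "best match per segment, then average" rule in `best_segment_similarity` are illustrative assumptions, not the authors' exact assignment procedure.

```python
def l1_normalize(hist):
    """Scale a non-negative histogram so that its bins sum to 1."""
    total = sum(hist)
    if total == 0:
        return [0.0] * len(hist)
    return [h / total for h in hist]


def histogram_intersection(h1, h2):
    """Sum of bin-wise minima; equals 1.0 for identical L1-normalized histograms."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))


def best_segment_similarity(segments1, segments2):
    """Assumed reading of 'best assignment of segments': score each segment
    histogram of image 1 against its best match in image 2 and average the
    scores. This is a simplification and is not forced to be one-to-one."""
    if not segments1 or not segments2:
        return 0.0
    scores = [
        max(histogram_intersection(s1, s2) for s2 in segments2)
        for s1 in segments1
    ]
    return sum(scores) / len(scores)
```

Because segments are matched by appearance rather than by grid position, two images whose regions swap places can still score highly, which is the robustness that case 1 lacks.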

  11. Outline • Problem Statement • Algorithm (Dataset and preprocessing; Alignment of Multiple Photo Streams; Large-scale Cosegmentation) • Experiments • Conclusion

  12. Alignment of Photo Streams
      Input: a set of photo streams (PS), $P = \{P_1, \ldots, P_L\}$
      • Photo stream: a set of photos taken in sequence by a single user in a single day
      Idea: align all photo streams at once after building a K-NN graph
      • Naïve-Bayes Nearest Neighbor (NBNN) [Boiman et al. 08] as the similarity metric

  13. Alignment of Photo Streams
      Input: a set of photo streams (PS), $P = \{P_1, \ldots, P_L\}$
      • Photo stream: a set of photos taken in sequence by a single user in a single day
      Idea: align all photo streams at once after building a K-NN graph
      • Naïve-Bayes Nearest Neighbor (NBNN) [Boiman et al. 08] as the similarity metric
      For simplicity, first consider pairwise alignment of two photo streams.
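
A rough sketch of building a K-NN graph over photo streams with an NBNN-style distance follows. It assumes each stream is a list of per-image descriptor vectors and that the stream-to-stream distance sums, over the query stream's descriptors, the squared distance to the nearest descriptor in the reference stream. How the authors adapt NBNN to photo streams is not given in the slides, so the names and this adaptation are assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nbnn_distance(query_stream, reference_stream):
    """NBNN-style distance: for every descriptor in the query stream, take the
    squared distance to its nearest descriptor in the reference stream, then sum."""
    return sum(
        min(sq_dist(q, r) for r in reference_stream)
        for q in query_stream
    )


def knn_stream_graph(streams, k):
    """Return {stream index: indices of its k nearest streams} (directed edges)."""
    graph = {}
    for a, query in enumerate(streams):
        scored = sorted(
            (nbnn_distance(query, ref), b)
            for b, ref in enumerate(streams)
            if b != a
        )
        graph[a] = [b for _, b in scored[:k]]
    return graph
```

The distance is asymmetric, so the resulting graph is directed; symmetrizing it before alignment would be a further design choice.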

  14. Pairwise Alignment
      Goal of alignment: find a matching between a pair of photo streams, $f : P_1 \to P_2 \cup \{\emptyset\}$
      • $f(I) = \emptyset$ means that $I$ in $P_1$ has no match in $P_2$
      Optimization: MRF-based energy minimization
      • Flexibility: various energy terms
      • Solved by discrete BP
      $$E(P_1, P_2) = \sum_{I_i \in P_1} d(I_i, \hat{I}_i) + \eta \sum_{I_i \in P_1} \min\big(|t(I_i) - t(\hat{I}_i)|, \tau\big) + \rho \sum_{(I_i, I_j) \in \delta} \min\big(|t(\hat{I}_i) - t(\hat{I}_j)|, \nu\big)$$
      where $\hat{I}_i$ denotes the image in $P_2$ matched to $I_i$.

  15. Pairwise Alignment
      Objective function
      $$E(P_1, P_2) = \sum_{I_i \in P_1} d(I_i, \hat{I}_i) + \eta \sum_{I_i \in P_1} \min\big(|t(I_i) - t(\hat{I}_i)|, \tau\big) + \rho \sum_{(I_i, I_j) \in \delta} \min\big(|t(\hat{I}_i) - t(\hat{I}_j)|, \nu\big)$$
      • Data term (first sum): the matched image pairs should be visually similar.
      • Time term (second sum): the matched image pairs should be temporally similar.
      • Smoothness term (third sum): the matched images to neighbors in $P_1$ should be neighbors in $P_2$.
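
To make the three energy terms concrete, here is a small sketch that evaluates the energy for a given matching and then finds the minimizer by exhaustive search. Exhaustive search is a stand-in that only works for tiny streams; the slides solve the problem with discrete BP. The `unmatched_cost` penalty, the use of time-of-day in seconds, and all function names are assumptions for illustration, since the slides do not say how unmatched images are charged.

```python
from itertools import product


def pairwise_energy(match, times1, times2, dist, neighbors,
                    eta, rho, tau, nu, unmatched_cost):
    """Energy of one matching. match[i] is an index into P2 or None (no match).
    dist(i, j) is the visual distance between image i of P1 and image j of P2.
    neighbors lists index pairs (i, j) of neighboring images in P1."""
    energy = 0.0
    for i, m in enumerate(match):
        if m is None:
            energy += unmatched_cost  # assumed penalty, not from the slides
            continue
        energy += dist(i, m)                                    # data term
        energy += eta * min(abs(times1[i] - times2[m]), tau)    # time term
    for i, j in neighbors:
        mi, mj = match[i], match[j]
        if mi is not None and mj is not None:
            # smoothness term: neighbors in P1 should map to temporal neighbors in P2
            energy += rho * min(abs(times2[mi] - times2[mj]), nu)
    return energy


def brute_force_alignment(times1, times2, dist, neighbors, **params):
    """Try every matching (exponential cost); a toy replacement for discrete BP."""
    labels = list(range(len(times2))) + [None]
    best = min(
        product(labels, repeat=len(times1)),
        key=lambda match: pairwise_energy(
            match, times1, times2, dist, neighbors, **params),
    )
    return list(best)
```

The truncation constants `tau` and `nu` cap how much a single bad temporal match can contribute, which keeps one outlier from dominating the energy.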

  16. Alignment of Multiple Photo Streams
      Objective: MRF-based energy minimization
      $$E_{\mathrm{All}} = \sum_{(P_i, P_j) \in \Xi} E(P_i, P_j)$$
      where $\Xi$ is the set of all pairs of nearest-neighbor (NN) photo streams.
      • Message-passing based optimization, run until convergence or for a fixed number of iterations

  18. Outline • Problem Statement • Algorithm (Dataset and preprocessing; Alignment of Multiple Photo Streams; Large-scale Cosegmentation) • Experiments • Conclusion

  19. Build an Image Graph
      Idea: connect the images that are similar enough to be cosegmented
      Image graph $G = (I, E)$
      • $I$: the set of images. $E$: the set of edges.
      • $E = E_B \cup E_W$
      • $E_B$: edges between different photo streams (results of alignment)
      • $E_W$: edges within a photo stream. For each image $I$, consider the images $I_i$ such that $|t(I) - t(I_i)| \le \delta$, and link $I$ with its K-NN among them.
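
The graph construction on slide 19 can be sketched as below. The data layout (streams as lists of `(image_id, timestamp)` pairs), the `similarity` callback, and the function name are assumptions; edges are stored as undirected `frozenset` pairs.

```python
def build_image_graph(streams, aligned_pairs, similarity, delta, k):
    """Return the edge set E = E_B | E_W.

    streams: list of photo streams, each a list of (image_id, timestamp).
    aligned_pairs: (image_id, image_id) matches produced by alignment (E_B).
    similarity(a, b): visual similarity between two image ids (higher is closer).
    delta: time window for within-stream candidates; k: neighbors kept per image.
    """
    edges_between = {frozenset(pair) for pair in aligned_pairs}
    edges_within = set()
    for stream in streams:
        for img, t in stream:
            # Candidates: other images of the same stream taken within delta of img.
            candidates = [
                other for other, t_other in stream
                if other != img and abs(t - t_other) <= delta
            ]
            candidates.sort(key=lambda other: similarity(img, other), reverse=True)
            for other in candidates[:k]:
                edges_within.add(frozenset((img, other)))
    return edges_between | edges_within
```

Restricting within-stream candidates to a time window keeps the graph sparse, so each image only exchanges information with a handful of plausible partners.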

  20. Scalable Cosegmentation
      Iteratively run the MFC algorithm [Kim and Xing, 2012] on the image graph
      Review of the MFC algorithm
      • Cosegmentation: jointly segment M images into K+1 regions (K foregrounds (FG) + background (BG))
      • Iterate between two steps:
        - Foreground modeling: learn appearance models of the K FGs and the BG. Any region classifiers or their combination can be used, e.g. Gaussian mixture on RGB, linear SVM on SPH.
        - Region assignment: allocate the regions of an image into one of the K FGs or the BG. Solved very efficiently using the idea of combinatorial auction.
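
The alternation between foreground modeling and region assignment can be illustrated with a deliberately simplified sketch. It is not the MFC algorithm: per-class mean vectors stand in for the appearance models (Gaussian mixtures or SVMs in the slides), and nearest-mean labeling stands in for the combinatorial-auction assignment. All names are hypothetical.

```python
def learn_models(features, labels, num_classes):
    """Modeling step (simplified): one mean feature vector per class.
    Classes with no assigned region get None."""
    models = []
    for c in range(num_classes):
        members = [f for f, lab in zip(features, labels) if lab == c]
        if not members:
            models.append(None)
            continue
        dim = len(members[0])
        models.append(
            [sum(m[d] for m in members) / len(members) for d in range(dim)]
        )
    return models


def assign_regions(features, models):
    """Assignment step (simplified): give each region to the closest class mean."""
    def cost(f, model):
        return sum((x - y) ** 2 for x, y in zip(f, model))

    labels = []
    for f in features:
        candidates = [(cost(f, m), c) for c, m in enumerate(models) if m is not None]
        labels.append(min(candidates)[1])
    return labels


def alternate(features, init_labels, num_classes, max_iters=10):
    """Iterate modeling and assignment until labels stop changing
    or max_iters is reached. Class indices cover the K FGs plus the BG."""
    labels = list(init_labels)
    for _ in range(max_iters):
        models = learn_models(features, labels, num_classes)
        new_labels = assign_regions(features, models)
        if new_labels == labels:
            break
        labels = new_labels
    return labels
```

For example, with one foreground and one background class, `alternate([(0.0,), (0.1,), (1.0,), (0.9,)], [0, 1, 1, 1], 2)` regroups the two low-valued regions into class 0 and the two high-valued regions into class 1.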

  21. Scalable Cosegmentation on Image Graph
      Message-passing based optimization. Iteratively solve:
      • Learn FG models from the neighbors of $I_i$.
      • Run region assignment on $I_i$.
      (Figure labels: FG 1 (car), FG 2 (road), FG 3 (BG))

  22. Scalable Cosegmentation on Image Graph
      Initialization
      • Supervised: start from seed labels
      • Unsupervised: use the algorithm of CoSand [Kim et al. 2011]
      Message-passing based optimization. Iteratively solve:
      • Learn FG models from the neighbors of $I_i$.
      • Run region assignment on $I_i$.
      (Figure labels: FG 1 (car), FG 2 (road), FG 3 (BG))

  23. Outline • Problem Statement • Algorithm (Dataset and preprocessing; Alignment of Multiple Photo Streams; Large-scale Cosegmentation) • Experiments • Conclusion

  24. Evaluation – Two Experiments
      Evaluation for alignment
      • Very hard to obtain ground truth: correspondences between two sets of thousands of images?
      • Task: temporal localization (inspired by geo-location estimation [Hays and Efros. 2008])
        - Geo-location estimation: where is it likely to be taken?
        - Temporal localization: when are they likely to be taken?
      Evaluation for cosegmentation
      • Task: foreground detection
      • We manually annotate 100 images per class
      • Accuracy is measured by intersection-over-union:
        $$\mathrm{Acc} = \frac{GT_i \cap R_i}{GT_i \cup R_i}$$
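
The intersection-over-union accuracy can be computed as in the following sketch, which assumes binary masks stored as nested lists of 0/1 and counts pixels for both the intersection and the union. Returning 1.0 when both masks are empty is a convention chosen here, not something the slides specify.

```python
def intersection_over_union(gt_mask, pred_mask):
    """IoU between a ground-truth mask and a predicted foreground mask.
    Both masks are 2D lists of 0/1 with the same shape."""
    inter = 0
    union = 0
    for gt_row, pred_row in zip(gt_mask, pred_mask):
        for g, p in zip(gt_row, pred_row):
            if g and p:
                inter += 1
            if g or p:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0
```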

  25. Evaluation of Alignment
      Procedures of temporal localization
      (1) Given a set of photo streams, randomly split them into training (80%) and test (20%) sets
      (2) Run alignment
      (3) Estimate timestamps of all images in test photo streams
      (4) Temporal localization is correct if $|t_{gt} - t_{est}| \le \varepsilon$
      Better temporal localization ≠ Better Alignment
      Baselines
      • BPS: our alignment + cosegmentation
      • BP: our alignment only (comparing BPS with BP justifies closing a loop)
      • KNN: K-nearest neighbors, image similarity only (the simplest)
      • HMM: Hidden Markov Models
      • DTW: Dynamic Time Warping (HMM and DTW are popular multiple sequence alignment methods)
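
The split in step (1) and the correctness test in step (4) can be sketched as follows. The function names, the fixed random seed, and the choice to express timestamps and `epsilon` in the same unit (for example minutes) are assumptions for illustration.

```python
import random


def split_streams(stream_ids, train_fraction=0.8, seed=0):
    """Randomly split photo-stream ids into training and test sets."""
    ids = list(stream_ids)
    random.Random(seed).shuffle(ids)
    cut = int(round(train_fraction * len(ids)))
    return ids[:cut], ids[cut:]


def localization_accuracy(true_times, estimated_times, epsilon):
    """Fraction of test images whose estimated timestamp lies within
    epsilon of the ground truth, i.e. |t_gt - t_est| <= epsilon."""
    if not true_times:
        raise ValueError("no test images to evaluate")
    hits = sum(
        1 for t_gt, t_est in zip(true_times, estimated_times)
        if abs(t_gt - t_est) <= epsilon
    )
    return hits / len(true_times)
```

Sweeping `epsilon` over a range of tolerances gives an accuracy curve per method, which is one way to compare the baselines listed above.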

  26. Evaluation of Alignment
      • Temporal localization is correct if $|t_{gt} - t_{est}| \le \varepsilon$ (e.g. $\varepsilon$ = 60 min)
      (Figure: results plotted against $\varepsilon$ for the methods below)
      • BPS: our alignment + cosegmentation
      • BP: our alignment only
      • KNN: K-nearest neighbors
      • HMM: Hidden Markov Models
      • DTW: Dynamic Time Warping

  27. Evaluation of Cosegmentation
      Task: foreground detection
      • BP+MFC: (proposed) alignment + cosegmentation
      • MFC: our cosegmentation without alignment
      • COS: submodular optimization [Kim et al. ICCV11]
      • LDA: LDA-based localization [Russell et al. CVPR06]
      (Figure: examples)

  28. Outline • Problem Statement • Algorithm (Dataset and preprocessing; Alignment of Multiple Photo Streams; Large-scale Cosegmentation) • Experiments • Conclusion

  29. Conclusion
      Ultimate goal: building photo storylines from large-scale online images
      (Figure examples: horse+riding, safari+park)
