Telefonica Research @ Trecvid 2011 Xavier Anguera, Daru Xu - PowerPoint PPT Presentation

Telefonica Research @ Trecvid 2011
Xavier Anguera, Daru Xu¹ and Tomasz Adamek
(With the collaboration of Juan Manuel Barrios, Prisma Group)
¹ Daru Xu is a graduate student at the Ming-Hsieh Department of Electrical Engineering, University of Southern California, USA

Outline of the talk
• Telefonica 2011 Video-copy detection system
  – Overall system
  – Video-copy detection
  – Audio-copy detection
  – Fusion algorithm
  – Results
• Multi-systems fusion experiment

Multimodal Video-copy detection

Video-based System
[Block diagram, blocks as labeled on the slide:]
• Video query
• Key-frame extraction
• DART* local features extraction
• Inserted static text & banners filtering
• Subtitle filtering
• Temporal stability & scale filtering
• Key-frame matching (against the reference video indexing info.)
• Temporal consistency post-processing
• Matched video segments
Differences from last year:
• Software refactoring
• Elimination of temporary files
* D. Marimon, A. Bonnin, T. Adamek, and R. Gimeno, “DARTs: Efficient scale-space extraction of daisy key-points”, CVPR 2009.

Audio-based System

MASK acoustic fingerprint extraction (I)
1) Audio track extraction using FFMPEG
2) FFT, bandwidth limited to 300 Hz-3 kHz (10 ms, 100 ms window; 32 MEL-spectrum bands)
3) Find spectrogram peaks.

Acoustic fingerprint extraction (II)
4) Apply a mask at each maximum location
5) Construct the fingerprint
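Steps 2 and 3 can be sketched in a few lines of NumPy. This is only an illustrative front end, not the MASK implementation: it assumes that "10 ms, 100 ms window" means a 10 ms hop with a 100 ms analysis window, it uses plain rectangular bands equally spaced on the mel scale, and it takes a peak to be a point larger than its four time/frequency neighbours. The function names are hypothetical, and the mask and fingerprint construction of steps 4 and 5 are not covered because the slides do not detail them.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_edges(n_bands=32, fmin=300.0, fmax=3000.0):
    """Band edges in Hz, equally spaced on the mel scale between fmin and fmax."""
    return mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_bands + 1))

def mel_band_energies(signal, sr, win_s=0.100, hop_s=0.010,
                      n_bands=32, fmin=300.0, fmax=3000.0):
    """Return an array of shape (frames, bands) with band-limited energies.

    Assumption: 100 ms Hann window, 10 ms hop, rectangular mel bands.
    """
    win = int(round(win_s * sr))
    hop = int(round(hop_s * sr))
    n_frames = max(0, 1 + (len(signal) - win) // hop)
    window = np.hanning(win)
    freqs = np.fft.rfftfreq(win, d=1.0 / sr)
    edges = mel_band_edges(n_bands, fmin, fmax)
    out = np.zeros((n_frames, n_bands))
    for t in range(n_frames):
        frame = signal[t * hop:t * hop + win] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        for b in range(n_bands):
            in_band = (freqs >= edges[b]) & (freqs < edges[b + 1])
            out[t, b] = power[in_band].sum()
    return out

def find_spectrogram_peaks(spec):
    """Return (frame, band) pairs strictly greater than their 4 neighbours."""
    peaks = []
    n_frames, n_bands = spec.shape
    for t in range(1, n_frames - 1):
        for b in range(1, n_bands - 1):
            v = spec[t, b]
            if (v > spec[t - 1, b] and v > spec[t + 1, b]
                    and v > spec[t, b - 1] and v > spec[t, b + 1]):
                peaks.append((t, b))
    return peaks
```

In practice the input samples would come from the audio track extracted in step 1; here any mono signal as a NumPy array and its sampling rate will do.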

Multimodal Fusion Algorithm
• Fusion of different modalities at decision level
  – Agnostic of the systems’ internal behavior
• No limit on the number of systems to be combined
  – provided each system is better than random
• To work optimally it needs N-best matches from each system. It returns the best fused matches (N=20)
  – Makes use of the individual scores and the rank within each modality.
Paper at ACM MM 2011: “Multimodal Fusion for Video Copy Detection”, Xavier Anguera, Juan Manuel Barrios, Tomasz Adamek and Nuria Oliver

Data preprocessing
[Figure: audio scores histogram, local video scores histogram, global video scores histogram]

N-best flooring and L1 Normalization
L1 normalization of the matching scores (MScores) of each system:
$$\mathrm{MScore}_i = \frac{\mathrm{MScore}_i}{\sum_{j=1}^{N_{\mathrm{best}}} \mathrm{MScore}_j}$$
[Figure: example MScores before and after L1 normalization alone, and after N-best flooring followed by L1 normalization]
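The L1 normalization step divides each matching score by the sum of the scores in that system's N-best list, so the normalized scores of one system sum to 1. A minimal sketch follows, assuming non-negative scores; the function name and the `nbest` parameter are hypothetical (the default of 20 follows the N=20 mentioned earlier in the talk). The N-best flooring step is left out because the slides do not specify how the floor is chosen.

```python
def l1_normalize_nbest(scores, nbest=20):
    """Keep the nbest highest scores and scale them so they sum to 1.

    Assumption: scores are non-negative and at least one is positive.
    """
    top = sorted(scores, reverse=True)[:nbest]
    total = sum(top)
    if total <= 0:
        raise ValueError("scores must contain at least one positive value")
    return [s / total for s in top]
```

For example, the scores 6, 3 and 1 become roughly 0.6, 0.3 and 0.1, which makes lists from different modalities comparable before they are fused.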

Overlapping Segments Merge
Segment Q $[B_r^Q, E_r^Q]$ + Segment R $[B_r^R, E_r^R]$ = Merged segment $[B_r, E_r]$
Merge condition:
$$\frac{\min\{E_k^Q(r), E_k^R(r)\} - \max\{B_k^Q(r), B_k^R(r)\}}{\max\{E_k^Q(r), E_k^R(r)\} - \min\{B_k^Q(r), B_k^R(r)\}} > 0.5$$
Examples: multimodal overlap, missing modality, non-overlapping modalities
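The merge criterion on this slide reads as a ratio between the temporal intersection of two segments and the total span they cover, compared against 0.5. The sketch below implements that reading. Segments are hypothetical `(begin, end)` tuples in seconds, and taking the merged segment as the full span of both inputs is an assumption, since the slide only shows the merged boundaries graphically.

```python
from typing import Optional, Tuple

Segment = Tuple[float, float]  # (begin, end), hypothetical representation

def overlap_ratio(q: Segment, r: Segment) -> float:
    """Intersection length divided by the total span of the two segments."""
    intersection = min(q[1], r[1]) - max(q[0], r[0])
    span = max(q[1], r[1]) - min(q[0], r[0])
    return intersection / span if span > 0 else 0.0

def merge_if_overlapping(q: Segment, r: Segment,
                         threshold: float = 0.5) -> Optional[Segment]:
    """Return the merged segment, or None when the overlap is too small.

    Assumption: the merged segment spans both inputs.
    """
    if overlap_ratio(q, r) > threshold:
        return (min(q[0], r[0]), max(q[1], r[1]))
    return None
```

Two segments at 10-30 s and 15-35 s overlap for 15 s out of a 25 s span (ratio 0.6) and would be merged; disjoint segments give a negative ratio and are kept apart.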

Output score computation
[Equation: annotated terms as labeled on the slide]
• Resulting score for fused match
• Number of matches
• Rank [1 to N_k]
• Normalized matching score at rank r
• A-priori weight for each modality
• Best normalized matching score for each modality

Official evaluation results
Optimum scores, balanced profile:
System        | Profile  | Min NDCR | FA count | Miss count | True positives | Opt F1 score
Audio system  | BALANCED | 0.662    | 0.66     | 54.75      | 54.78          | 0.729
Multimodal    | BALANCED | 0.610    | 0.80     | 11.73      | 63.69          | 0.947
Joint         | BALANCED | 0.268    | 0.23     | 4.71       | 101.4          | 0.957
Choosing only 1st-best results:
System        | Profile  | Min NDCR | FA count | Miss count | True positives | Opt F1 score
Audio system  | BALANCED | 0.477    | 0.14     | 55.89      | 72.05          | 0.712

Multi-systems fusion experiment
• We tested the fusion algorithm with many system outputs
• We asked participants in TRECVID 2011 for their submitted runs
  – 10 teams contributed their results: PKU-IDM, CRIM, INRIA-TEXMEX/LEAR, FT, prisma, ATTLabs, kddi, iupr-dti, brno, Telefonica Research
  – I used the “Balanced” runs: 17 runs

Status of the runs
• The fusion algorithm works optimally when N-best results are available for each fused output.
  – The contributed systems often provided only 1-best results, which is suboptimal for the fusion.

Individual results (Min NDCR)
[Bar chart: Min_NDCR (0 to 1) for each of the 17 runs; labeled values 0.991 and 0.053]
• Labeled from 1 to 17, to anonymize them.

Individual results (optimum F1)
[Bar chart: one bar for each of the 17 runs; vertical axis from 0.6 to 1, labeled Min_NDCR on the slide]

Incremental fusion
[Chart: Min_NDCR (0 to 0.16) versus number of systems fused (1 to 17); labeled values 0.0532 and 0.0333]
• We incrementally added systems and computed the fusion
• Systems 5 and 15 are the only ones making the fusion worse
• Final Min_NDCR=0.0333

Fusion of all minus 1
[Chart: Min_NDCR (0.02 to 0.05) when each of the 17 systems in turn is left out of the fusion, with the baseline (fusion of all) marked]
We obtain an order from worst to best in the fusion (the worst here is system 15)

Incremental elimination
[Chart: Min_NDCR (0 to 0.45) as systems are eliminated in the order 15, 16, 11, 7, 6, 14, 3, 4, 9, 10, 2, 17, 13, 5, 1, 8 (0 = no system removed); labeled values 0.0333, 0.0195 and 0.0685]
• With only 5 systems we achieve pretty decent results
• The best result is 0.0195, although this is “cheating”
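The "all minus 1" ranking and the incremental elimination can be sketched generically. In the sketch below, `fuse_score` is a hypothetical callable that fuses a list of systems and returns the resulting Min_NDCR (lower is better); it stands in for the fusion algorithm plus the evaluation, neither of which is reproduced here. Using one fixed leave-one-out order for the whole elimination, rather than re-ranking after each removal, is an assumption.

```python
from typing import Callable, List, Optional, Sequence, Tuple

def leave_one_out_order(systems: Sequence[str],
                        fuse_score: Callable[[List[str]], float]) -> List[str]:
    """Order systems from worst to best contributor to the fusion.

    A system is 'worst' when leaving it out yields the lowest fused score.
    """
    scored = []
    for s in systems:
        rest = [x for x in systems if x != s]
        scored.append((fuse_score(rest), s))
    scored.sort(key=lambda pair: pair[0])
    return [s for _, s in scored]

def incremental_elimination(systems: Sequence[str],
                            fuse_score: Callable[[List[str]], float]
                            ) -> List[Tuple[Optional[str], float]]:
    """Remove systems one by one in leave-one-out order, scoring each step.

    Returns (removed_system, fused_score) pairs; the first entry uses None
    for 'nothing removed'. Stops when a single system is left.
    """
    order = leave_one_out_order(systems, fuse_score)
    remaining = list(systems)
    trace: List[Tuple[Optional[str], float]] = [(None, fuse_score(remaining))]
    for s in order[:-1]:
        remaining.remove(s)
        trace.append((s, fuse_score(remaining)))
    return trace
```

The minimum of the returned trace corresponds to the best subset found this way. Because the order is derived from scores on the evaluation data itself, that minimum is optimistic, which is the "cheating" the slide refers to.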

Conclusions
• The fusion algorithm can extract knowledge and make results better
  – Even when fusing systems that have weaker NDCR results, the fusion results in good scores.
• FUTURE WORK: automatically identify which modalities bring novelty.

