 
Understand Basketball Games 2018.6.15 吴浩贤 朱文韬
Sports Videos ‣ Large quantity, high quality ‣ Practical utility ‣ Stereotypical
Sports Videos Stereotypical: [Pass, Score(2-Pointer)] [Pass, Pass, Score(3-Pointer)] [Pass, Pass, Score(3-Pointer)] [Pass, Score(2-Pointer)]
Sports Videos Recognition
Google Basketball Dataset ‣ 100+ GB of video on YouTube ‣ 250+ NCAA basketball games from 1988 to 2011 ‣ 14,000+ event annotations (endpoints) ‣ Player bounding boxes (optional) ‣ Event classes: 11 → 7 ‣ Free Throw Made/Miss ‣ 2-Pointer Made/Miss ‣ 3-Pointer Made/Miss ‣ Steal http://basketballattention.appspot.com/dataset_browser.html Detecting events and key actors in multi-person videos, Vignesh Ramanathan, Jonathan Huang, Sami Abu-El-Haija, Alexander Gorban, Kevin Murphy, Li Fei-Fei, CVPR 2016, https://arxiv.org/abs/1511.02917
Google Basketball Dataset Challenges: ‣ Low resolution & noisy ‣ Imbalanced categories ‣ Varying number of people per frame
Basic Idea Long-term Recurrent Convolutional Networks Event Label Long-term Recurrent Convolutional Networks for Visual Recognition and Description, Jeff Donahue, Lisa Anne Hendricks, Marcus Rohrbach, Subhashini Venugopalan, Sergio Guadarrama, Kate Saenko, Trevor Darrell, CVPR 2015
Basic Idea 🏁 ⛹ Event Label
Basic Idea Detection and/or Tracking 🏁⛹
Ball Detection In your imagination…
Ball Detection In dataset…
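Ball Detection The ball's color and shape are relatively fixed, so consider a traditional method: Canny edge detection + Hough transform applied to curve detection, using the circle equation $(x - c_1)^2 + (y - c_2)^2 = r^2$. Binarize the image, run Canny edge detection, then at each edge pixel $(x, y)$ enumerate $c_1, c_2$ and accumulate votes for the triple $(c_1, c_2, r)$ determined by that pixel to detect circles.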
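The voting step described above can be sketched in plain Python. This is only an illustration under stated assumptions: the edge pixels are taken as already produced by Canny, the names `hough_circle` and `circle_points` are hypothetical, and the brute-force loop over every candidate centre is chosen for clarity, not speed.

```python
import math
from collections import Counter


def hough_circle(edge_points, width, height, r_min, r_max):
    """Vote for circle parameters (c1, c2, r) from edge pixels (x, y).

    For every edge pixel and every candidate centre (c1, c2) on the image
    grid, the radius r is fixed by (x - c1)^2 + (y - c2)^2 = r^2, so each
    pixel casts one vote per candidate centre.
    Returns ((c1, c2, r), votes) for the strongest circle, or None.
    """
    votes = Counter()
    for x, y in edge_points:
        for c1 in range(width):
            for c2 in range(height):
                r = round(math.hypot(x - c1, y - c2))
                if r_min <= r <= r_max:
                    votes[(c1, c2, r)] += 1
    return votes.most_common(1)[0] if votes else None


def circle_points(c1, c2, r, n=90):
    """Synthetic stand-in for a Canny edge map: pixels on one circle."""
    pts = set()
    for k in range(n):
        theta = 2 * math.pi * k / n
        pts.add((round(c1 + r * math.cos(theta)),
                 round(c2 + r * math.sin(theta))))
    return sorted(pts)


if __name__ == "__main__":
    edges = circle_points(20, 15, 8)
    print(hough_circle(edges, width=40, height=30, r_min=5, r_max=12))
```

On real frames the accumulator would be filled from an actual edge map, and several high-scoring triples would usually be kept as candidates instead of a single maximum.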
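Ball Detection During fast motion the ball deforms and changes color; against complex backgrounds there are multiple candidate circular targets.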
Ball Detection YOLO = You Only Look Once
Model Architecture 🏁 ⛹ Event Label
Feature Extraction Frame Feature: CNN (2048,) ResNet (no top)
Feature Extraction Player Feature: CNN (512,) for player VGG19 (no top) spatial histogram with pyramid (1365,) for player Concat Weighted (1877,) for player
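The (1365,) size is consistent with a six-level spatial pyramid over the frame, since 1 + 4 + 16 + 64 + 256 + 1024 = 1365. The sketch below shows one way such a feature could be built from a player bounding box and concatenated with a (512,) appearance vector to reach (1877,). The function names, the binary occupancy encoding, and the single scalar `weight` are assumptions made for illustration; the slide does not specify how the weighting is applied.

```python
def spatial_pyramid(box, frame_w, frame_h, levels=6):
    """Binary occupancy of a bounding box over a pyramid of grids.

    box is (x_min, y_min, x_max, y_max) in pixels. Level l splits the
    frame into 2**l x 2**l cells; a cell is 1.0 if the box overlaps it.
    With levels=6 the output length is 1 + 4 + ... + 1024 = 1365.
    """
    x0, y0, x1, y1 = box
    feature = []
    for level in range(levels):
        n = 2 ** level
        cell_w, cell_h = frame_w / n, frame_h / n
        for row in range(n):
            for col in range(n):
                overlap_x = min(x1, (col + 1) * cell_w) - max(x0, col * cell_w)
                overlap_y = min(y1, (row + 1) * cell_h) - max(y0, row * cell_h)
                feature.append(1.0 if overlap_x > 0 and overlap_y > 0 else 0.0)
    return feature


def concat_weighted(appearance, spatial, weight=1.0):
    """Concatenate appearance and spatial parts (weight is hypothetical)."""
    return list(appearance) + [weight * s for s in spatial]


if __name__ == "__main__":
    spatial = spatial_pyramid((100, 50, 140, 150), frame_w=640, frame_h=360)
    appearance = [0.0] * 512  # placeholder for the VGG19 player feature
    player_feature = concat_weighted(appearance, spatial, weight=0.5)
    print(len(spatial), len(player_feature))
```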
LSTM (2048,) frame feature … (1877,) player feature
Model Architecture 🏁 In a clip 🏁 Trajectory ⛹ Extra constant vector as context of sequence
Model Optimization Bidirectional LSTM: compute a global (clip-level) context feature for each frame
Model Optimization Next, we use a unidirectional LSTM with an extra input to represent the state of the event at time t
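One plausible reading of the extra input is that a fixed context vector is appended to the features at every time step before they reach the unidirectional LSTM. The helper below sketches only that input preparation, with no LSTM inside; the name `add_context` and the choice of concatenation are assumptions, not something the slides confirm.

```python
def add_context(sequence, context):
    """Append the same context vector to every time step of a sequence.

    sequence: list of per-frame feature vectors (lists of floats).
    context:  one vector that stays constant across the clip.
    """
    return [list(step) + list(context) for step in sequence]


if __name__ == "__main__":
    # Toy sizes; in the deck the frame feature would be (2048,).
    frames = [[0.1, 0.2, 0.3, 0.4] for _ in range(3)]
    context = [1.0, -1.0]
    lstm_input = add_context(frames, context)
    print(len(lstm_input), len(lstm_input[0]))
```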
Model Optimization Gradient Clipping Exploding gradients ❌ Clip the gradient before the parameter update. Figure: the cliff (I. Goodfellow, Y. Bengio, Deep Learning)
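Clipping before the update can be illustrated with norm-based clipping on a flat list of gradients. This is a minimal sketch: the slides do not say whether clipping was by norm or by value, so the norm variant and the names `clip_by_global_norm` and `sgd_step` are assumptions.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale grads so their L2 norm is at most max_norm.

    Returns (clipped_grads, original_norm). The direction is preserved;
    only the step length is limited.
    """
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads), norm
    scale = max_norm / norm
    return [g * scale for g in grads], norm


def sgd_step(params, grads, lr, max_norm):
    """Plain SGD update that clips the gradient first."""
    clipped, _ = clip_by_global_norm(grads, max_norm)
    return [p - lr * g for p, g in zip(params, clipped)]


if __name__ == "__main__":
    clipped, norm = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
    print(norm, clipped)
```

Because the whole gradient is scaled by one factor, a step taken near a cliff in the loss surface stays bounded while still pointing in the descent direction.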
Results
            LRCN   Spatial Only   Combined (Context)   Combined (Concat)
Top_1_acc   0.44   0.35           0.47                 0.41
Top_2_acc   0.69   0.59           0.70                 0.62
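For reference, top-k accuracy counts a clip as correct when its true event class is among the k highest-scoring predictions. The sketch below uses toy scores for illustration only; they are not the results reported above, and `top_k_accuracy` is a hypothetical helper name.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k top scores.

    scores: list of per-class score lists, one per sample.
    labels: list of true class indices.
    """
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)


if __name__ == "__main__":
    toy_scores = [[0.1, 0.7, 0.2],
                  [0.5, 0.3, 0.2],
                  [0.2, 0.3, 0.5]]
    toy_labels = [1, 1, 0]
    print(top_k_accuracy(toy_scores, toy_labels, k=1))
    print(top_k_accuracy(toy_scores, toy_labels, k=2))
```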
Thank You