Xiao CHU (初晓), Supervisor: Xiaogang Wang, The Chinese University of Hong Kong (PowerPoint presentation)

slide-1
SLIDE 1

Xiao CHU (初晓)

  • Supervisor: Xiaogang Wang
  • The Chinese University of Hong Kong
  • 4th year Ph.D. student
  • Computer vision, Human pose estimation
  • 1. Structured Feature Learning for Pose Estimation. Xiao Chu, Wanli Ouyang, Hongsheng Li, and Xiaogang Wang, CVPR, 2016.
  • 2. CRF-CNN: Modeling Structured Information in Human Pose Estimation. Xiao Chu, Wanli Ouyang, Hongsheng Li, and Xiaogang Wang, NIPS, 2016.

slide-2
SLIDE 2

Structured Feature Learning for Human Pose Estimation

Xiao Chu, Wanli Ouyang, Hongsheng Li, and Xiaogang Wang

slide-3
SLIDE 3

Human pose estimation is the task of estimating the joint locations of each body part.

slide-4
SLIDE 4

Structured Prediction

CNN + Structured Prediction

  • Tompson et al., NIPS'2014
  • Chen & Yuille, NIPS'2014
  • Yang et al., CVPR'2016
  • Fan et al., CVPR'2015

Structured Feature

  • 1. Build up structure at the feature level
  • 2. Pass messages with geometrical transfer kernels
  • 3. Bidirectional tree
slide-5
SLIDE 5

VGG

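conv1~6\{pool4,5} (VGG convolutional layers without pool4 and pool5); input image 448 × 448; fully convolutional layers, 1 × 1 kernel; fconv7; prediction 56 × 56.

Fully convolutional net for human pose estimation

[Figure: example part scores 0.9 and 0.02; labels Head, Neck, Wrist]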
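The sizes on this slide fit together if `conv1~6\{pool4,5}` is read as VGG with pool4 and pool5 removed. Three stride-2 poolings then remain, and 448 / 2³ = 56. The sketch below checks that arithmetic and reads one location per part off its score map. It rests on a few assumptions that are not from the slides: each joint is taken at the arg-max of its map, convolutions are treated as size-preserving, and `output_size` and `heatmaps_to_joints` are hypothetical helpers.

```python
import numpy as np


def output_size(input_size: int, num_pools: int) -> int:
    """Spatial size after `num_pools` stride-2 poolings (convs assumed size-preserving)."""
    return input_size // (2 ** num_pools)


def heatmaps_to_joints(heatmaps: np.ndarray, stride: int):
    """Return (x, y, score) per part, with (x, y) mapped back to input-image pixels.

    heatmaps: array of shape (num_parts, H, W) holding per-location part scores.
    """
    joints = []
    for hm in heatmaps:
        # Location of the highest score for this part (row = y, column = x).
        y, x = np.unravel_index(np.argmax(hm), hm.shape)
        joints.append((int(x) * stride, int(y) * stride, float(hm[y, x])))
    return joints


size = output_size(448, num_pools=3)   # pool1..pool3 kept
stride = 448 // size
maps = np.zeros((2, size, size))
maps[0, 10, 20] = 0.9                  # e.g. a confident head response
maps[1, 5, 5] = 0.02                   # e.g. a weak wrist response
print(size, stride)                    # 56 8
print(heatmaps_to_joints(maps, stride))
```

The arg-max readout is only the simplest choice. Any refinement of the peak (sub-pixel interpolation, for example) would change the returned coordinates but not the 448 → 56 arithmetic.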

slide-6
SLIDE 6

VGG

conv1~6\{pool4,5}

Fully convolutional net for human pose estimation (Exclusive / Consistent)

[Figure: feature channels e1–e7 and h1–h7, with e5, e4, h2 and h6 highlighted]

slide-7
SLIDE 7

VGG

conv1~6\{pool4,5}

Structured Prediction vs. Structured Feature

slide-8
SLIDE 8

VGG

conv1~6\{pool4,5}

Structured Feature Learning

[Figure: part nodes A1–A10 and B1–B10; labels: Positive Direction, Reverse Direction]

slide-9
SLIDE 9

[Figure: input image; feature maps for the downward lower arm; feature maps for the elbow; learned kernel; shifted feature maps; updated feature maps for the elbow]
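The figure shows one part's feature maps being moved by a learned kernel and then used to update another part's maps. The following is a minimal sketch of that step, under these assumptions:

- The shifted maps are combined with the target maps by addition followed by a ReLU.
- The kernel is a fixed array here, whereas the model learns it.
- `conv2d_same` and `pass_message` are hypothetical helper names.

A kernel with a single non-zero tap acts as a pure shift.

```python
import numpy as np


def conv2d_same(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """2-D cross-correlation with zero padding; output has the same shape as x.

    Assumes an odd-sized kernel so that the centre tap is well defined.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros(x.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            # Each tap adds a shifted copy of x, weighted by kernel[i, j].
            out += kernel[i, j] * padded[i:i + x.shape[0], j:j + x.shape[1]]
    return out


def pass_message(target: np.ndarray, source: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Update target maps with source maps shifted by the kernel, then apply ReLU."""
    return np.maximum(0.0, target + conv2d_same(source, kernel))


lower_arm = np.zeros((5, 5))
lower_arm[2, 2] = 1.0             # lower-arm response at the centre
kernel = np.zeros((3, 3))
kernel[0, 1] = 1.0                # a single tap: shifts the map one row down
elbow = np.zeros((5, 5))
updated = pass_message(elbow, lower_arm, kernel)
print(np.argwhere(updated > 0))   # [[3 2]]
```

With a learned kernel the weights spread over many taps, so the source response is shifted and blurred toward the locations where the target part is likely to be.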

slide-10
SLIDE 10

Experimental Results on FLIC dataset

Percentage of Correct Parts (strict PCP)

Method               U.ARMS   L.ARMS   MEAN
MODEC [1]              84.4     52.1   68.3
Tompson et al. [2]     93.7     80.9   87.3
Tompson et al. [3]     94.7     82.8   88.8
Chen & Yuille [4]      97.0     86.8   91.9
Ours                   97.9     92.4   95.2

[Figure: Percentage of Detected Joints (PDJ) curves]
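For reference, here is a minimal sketch of strict PCP using the commonly used definition: a part counts as correct only when both predicted endpoints lie within 50% of the ground-truth part length of their true positions. The function name, argument layout and the toy arm are illustrative assumptions, not the evaluation code behind the table.

```python
import numpy as np


def strict_pcp(pred, gt, parts, alpha=0.5):
    """Fraction of parts whose two predicted endpoints both lie within
    alpha * (ground-truth part length) of the true endpoints.

    pred, gt: arrays of shape (num_joints, 2) with (x, y) coordinates.
    parts: list of (joint_a, joint_b) index pairs, one per body part.
    """
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    correct = 0
    for a, b in parts:
        limit = alpha * np.linalg.norm(gt[a] - gt[b])
        ok_a = np.linalg.norm(pred[a] - gt[a]) <= limit
        ok_b = np.linalg.norm(pred[b] - gt[b]) <= limit
        correct += int(ok_a and ok_b)
    return correct / len(parts)


# Toy arm: shoulder (0), elbow (1), wrist (2); parts = upper arm, lower arm.
gt = [[0, 0], [10, 0], [20, 0]]
pred = [[1, 0], [10, 4], [20, 6]]
print(strict_pcp(pred, gt, [(0, 1), (1, 2)]))  # 0.5: the wrist is 6 px off, limit is 5 px
```

The elbow error (4 px) is shared by both parts, but only the lower arm fails, because its wrist endpoint exceeds the 5 px limit.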

slide-11
SLIDE 11

Experimental Results on LSP dataset

Percentage of Correct Parts (strict PCP)

Method                   TORSO   HEAD   U.ARMS   L.ARMS   U.LEGS   L.LEGS   MEAN
Andriluka et al. [5]      80.9   74.9     46.5        –     67.1     60.7   55.7
Yang & Ramanan [6]        82.9   79.3     56.0     39.8     70.3     67.0   62.8
Pishchulin et al. [7]     87.5   78.1     54.2     33.9     75.7     68.0   62.9
Eichner & Ferrari [8]     86.2   80.1     56.5     37.4     74.3     69.3   64.3
Ouyang et al. [9]         85.8   83.1     63.3     46.6     76.5     72.2   68.6
Pishchulin et al. [10]    88.7   85.1     61.8     45.0     78.9     73.2   69.2
Chen & Yuille [4]         92.7   87.8     69.2     55.4     82.9     77.0   75.0
Ours                      95.4   89.6     77.0     65.2     87.6     83.2   81.1

Mean PCP improves by 6.1% over Chen & Yuille [4] (75.0 → 81.1).
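The MEAN column is not the plain average of the six columns. It is consistent with a weighted average over the ten LSP parts, where torso and head count once and each limb type counts twice (left and right). This quick check assumes that weighting; it is a reading of the numbers, not a statement of the authors' evaluation script.

```python
# Per-column PCP values copied from the LSP table (strict PCP, %).
CHEN_YUILLE = {"torso": 92.7, "head": 87.8, "u_arms": 69.2,
               "l_arms": 55.4, "u_legs": 82.9, "l_legs": 77.0}
OURS = {"torso": 95.4, "head": 89.6, "u_arms": 77.0,
        "l_arms": 65.2, "u_legs": 87.6, "l_legs": 83.2}

# Assumed weighting: torso and head once, each limb type twice (left + right).
WEIGHTS = {"torso": 1, "head": 1, "u_arms": 2, "l_arms": 2, "u_legs": 2, "l_legs": 2}


def mean_pcp(row):
    """Weighted mean over the 10 LSP parts."""
    return sum(WEIGHTS[k] * row[k] for k in WEIGHTS) / sum(WEIGHTS.values())


def column_average(row):
    """Plain average of the six columns, for comparison."""
    return sum(row.values()) / len(row)


print(round(mean_pcp(OURS), 2), round(column_average(OURS), 2))  # 81.1 83.0
```

The weighted version reproduces the reported 81.1, while the unweighted average would give 83.0.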

slide-12
SLIDE 12

Robust to disturbance. Robust to occlusion.

Pose estimation results on the FLIC dataset

slide-13
SLIDE 13

Correct reasoning on extreme poses.

Pose estimation results on the LSP dataset

slide-14
SLIDE 14

Xiao Chu, Wanli Ouyang, Hongsheng Li, and Xiaogang Wang

CRF-CNN: Modeling Structured Information in Human Pose Estimation

slide-15
SLIDE 15

Structure

  • 1. Making CNNs deeper with more advanced structured designs
  • 2. Building up structure at the feature level or the prediction level

[Figure: structured designs compared (FCN, CRF-RNN); part labels: Lower Arm, Upper Arm, Elbow, Wrist]

We need a graphical model at the feature level to guide the design of structured features.

Tree-structured graphical model: a conditional random field solved with mean-field approximation.

slide-16
SLIDE 16

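[Figure: graphical models over the image I, hidden variables h and outputs z, with edge sets ε_zh, ε_z and ε_h]

(a) Multi-layer neural network (b) Structured output space (c) Structured hidden layer (d) Our implementation

Model (a):

$$E_n(\mathbf{z},\mathbf{h},\mathbf{I},\Theta) = \sum_{(i,k)\in\varepsilon_{zh}} \psi_{zh}(z_i, h_k) + \sum_{k} \phi_h(h_k, \mathbf{I})$$

Model (b):

$$E_n(\mathbf{z},\mathbf{h},\mathbf{I},\Theta) = \sum_{(i,j)\in\varepsilon_{z},\, i<j} \psi_{z}(z_i, z_j) + \sum_{(i,k)\in\varepsilon_{zh}} \psi_{zh}(z_i, h_k) + \sum_{k} \phi_h(h_k, \mathbf{I})$$

Model (c):

$$E_n(\mathbf{z},\mathbf{h},\mathbf{I},\Theta) = \sum_{(i,j)\in\varepsilon_{z},\, i<j} \psi_{z}(z_i, z_j) + \sum_{(k,l)\in\varepsilon_{h},\, k<l} \psi_{h}(h_k, h_l) + \sum_{(i,k)\in\varepsilon_{zh}} \psi_{zh}(z_i, h_k) + \sum_{k} \phi_h(h_k, \mathbf{I})$$

Model (d):

$$E_n(\mathbf{z},\mathbf{h},\mathbf{I},\Theta) = \sum_{(k,l)\in\varepsilon_{h},\, k<l} \psi_{h}(h_k, h_l) + \sum_{(i,j)\in\varepsilon_{z},\, i<j} \psi_{z}(z_i, z_j) + \sum_{i} \psi_{zh}(z_i, h_i) + \sum_{k} \phi_h(h_k, \mathbf{I})$$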
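To make the Model (d) energy concrete, here is a small sketch that evaluates it for discrete states, with every potential stored as a lookup table. This is an assumption for illustration only: in the model the potentials are computed by the network, and φ_h's dependence on the image I is folded into a precomputed table. `energy_model_d` and the table layout are hypothetical.

```python
import numpy as np


def energy_model_d(z, h, phi_h, psi_h, psi_z, psi_zh):
    """E_n for Model (d) with discrete states; lower energy = more probable.

    z, h:   state index per output / hidden variable (same length in Model (d)).
    phi_h:  phi_h[k][s], unary cost of h_k = s (image term, assumed precomputed).
    psi_h:  {(k, l): table} with k < l, pairwise cost between hidden variables.
    psi_z:  {(i, j): table} with i < j, pairwise cost between output variables.
    psi_zh: psi_zh[i][a, b], cost of z_i = a together with h_i = b.
    """
    energy = sum(t[h[k], h[l]] for (k, l), t in psi_h.items())
    energy += sum(t[z[i], z[j]] for (i, j), t in psi_z.items())
    energy += sum(psi_zh[i][z[i], h[i]] for i in range(len(z)))
    energy += sum(phi_h[k][h[k]] for k in range(len(h)))
    return float(energy)


phi_h = [np.array([0.3, 0.1])] * 2
psi_h = {(0, 1): np.array([[0.0, 1.0], [1.0, 0.0]])}
psi_z = {(0, 1): np.array([[0.0, 2.0], [2.0, 0.0]])}
psi_zh = [np.array([[0.0, 0.5], [0.5, 0.0]])] * 2
print(energy_model_d((1, 0), (1, 1), phi_h, psi_h, psi_z, psi_zh))  # ≈ 2.7
```

The four sums map one-to-one onto the four terms of Model (d). Each output z_i is tied only to its own hidden variable h_i, which is what separates (d) from (c).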

slide-17
SLIDE 17

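Mean Field Approximation:

$$Q(\mathbf{h}_i \mid \mathbf{I},\Theta) = \frac{1}{Z_{h,i}} \exp\Big(-\sum_{h_k\in\mathbf{h}_i} \phi_h(h_k, \mathbf{I}) - \sum_{(i,j)\in\varepsilon_h,\, i<j} \psi_h\big(\mathbf{h}_i, Q(\mathbf{h}_j \mid \mathbf{I},\Theta)\big)\Big)$$

Target:

$$p(\mathbf{h} \mid \mathbf{I},\Theta) \approx \prod_i Q(\mathbf{h}_i \mid \mathbf{I},\Theta)$$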
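A minimal sketch of naive mean field for a small discrete pairwise model follows. The term ψ_h(h_i, Q(h_j)) is read as the expected pairwise energy under the current Q_j, which is the standard mean-field update. The nodes are updated one at a time, and the result is the factorised product of Q_i that is being targeted.

This is an assumption-laden toy, not the network implementation:

- The slides operate on feature maps, while this sketch uses small discrete state spaces.
- `mean_field` and `normalise_exp_neg` are hypothetical names.

```python
import numpy as np


def normalise_exp_neg(energy):
    """exp(-energy) / Z, computed stably by shifting with the minimum energy."""
    w = np.exp(-(energy - energy.min()))
    return w / w.sum()


def mean_field(unary, pairwise, edges, iters=20):
    """Naive mean field for a discrete pairwise model.

    unary[i]:         energy vector phi_i(x_i).
    pairwise[(i, j)]: energy matrix psi_ij[x_i, x_j] for each edge (i, j) in `edges`.
    Returns Q[i], the factorised approximation of the marginal of node i.
    """
    nbrs = {i: [] for i in unary}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    q = {i: normalise_exp_neg(np.asarray(u, float)) for i, u in unary.items()}
    for _ in range(iters):
        for i in unary:
            energy = np.asarray(unary[i], float).copy()
            for j in nbrs[i]:
                psi = pairwise[(i, j)] if (i, j) in pairwise else pairwise[(j, i)].T
                # Expected pairwise energy under the current Q_j.
                energy += psi @ q[j]
            q[i] = normalise_exp_neg(energy)
    return q


# Chain 0 - 1 - 2: node 0 prefers state 0, and disagreeing neighbours cost 1.5.
unary = {0: [0.0, 2.0], 1: [1.0, 1.0], 2: [1.0, 1.0]}
coupling = np.array([[0.0, 1.5], [1.5, 0.0]])
edges = [(0, 1), (1, 2)]
q = mean_field(unary, {(0, 1): coupling, (1, 2): coupling}, edges)
print({i: np.round(v, 3) for i, v in q.items()})
```

Nodes 1 and 2 have flat unaries, so their preference for state 0 comes entirely from the coupling to node 0, propagated through the expected pairwise energies.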

slide-18
SLIDE 18

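[Figure: graph over hidden nodes h1 to h5 with factors f_a, f_b, f_c, f_d]

Flooding update:

  • h1′ ← h2, h4, h5
  • h2′ ← h1, h3, h4
  • h3′ ← h2
  • h4′ ← h1, h2
  • h5′ ← h1

$$Q_{t+1}(\mathbf{h}_i) = \tau\Big(\phi(\mathbf{h}_i) + \sum_{i' \in \mathcal{V}_N(i) \setminus i} Q_t(\mathbf{h}_{i'}) \otimes \mathbf{w}_{i' \to i}\Big)$$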
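In a flooding schedule every node is recomputed from the previous iteration's values. As a result, information travels only one hop per iteration. The sketch below makes that visible using the neighbourhoods listed above. It makes three assumptions:

- ⊗ (presumably a convolution with the transfer kernel w) is replaced by a matrix-vector product on small vectors.
- τ is taken to be a ReLU.
- `flooding_step` is a hypothetical name.

```python
import numpy as np

# Neighbourhoods read off the flooding-update slide (h1..h5).
NEIGHBOURS = {
    "h1": ["h2", "h4", "h5"],
    "h2": ["h1", "h3", "h4"],
    "h3": ["h2"],
    "h4": ["h1", "h2"],
    "h5": ["h1"],
}


def relu(x):
    return np.maximum(x, 0.0)


def flooding_step(q, unary, weights, act=relu):
    """One flooding iteration: every node is recomputed from iteration-t values only.

    q[i]:            current state vector of node i, i.e. Q_t(h_i).
    unary[i]:        unary input phi(h_i).
    weights[(j, i)]: matrix standing in for the transfer kernel w_{j->i}.
    """
    return {
        i: act(unary[i] + sum(weights[(j, i)] @ q[j] for j in NEIGHBOURS[i]))
        for i in NEIGHBOURS
    }


dim = 2
weights = {(j, i): np.eye(dim) for i in NEIGHBOURS for j in NEIGHBOURS[i]}
unary = {i: np.zeros(dim) for i in NEIGHBOURS}
q = {i: np.zeros(dim) for i in NEIGHBOURS}
q["h1"] = np.array([1.0, 0.0])   # only h1 starts active

for t in range(2):
    q = flooding_step(q, unary, weights)
    print(t + 1, {i: v.tolist() for i, v in q.items()})
```

After the first iteration h3 is still zero, because it only saw h2's old value. It picks up h1's signal in the second iteration, which is why the flooding variants on the component-analysis slide are run for a fixed number of iterations.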

slide-19
SLIDE 19

[Figure: tree over hidden nodes h1, h2, h3, h4 with factors f_a, f_b, f_c]

h1 → h2

Serial update

slide-20
SLIDE 20

[Figure: tree over hidden nodes h1, h2, h3, h4 with factors f_a, f_b, f_c]

h1 → h2, h4 → h2

Serial update

slide-21
SLIDE 21

[Figure: tree over hidden nodes h1, h2, h3, h4 with factors f_a, f_b, f_c]

h1 → h2, h4 → h2, h2′ → h3

h3 is marginalized.

Serial update

slide-22
SLIDE 22

[Figure: tree over hidden nodes h1, h2, h3, h4 with factors f_a, f_b, f_c]

h1 → h2, h4 → h2, h2′ → h3

h3 is marginalized.

h3 → h2′

h2 is marginalized.

Serial update

slide-23
SLIDE 23

[Figure: tree over hidden nodes h1, h2, h3, h4 with factors f_a, f_b, f_c]

h1 → h2, h4 → h2, h2′ → h3

h3 is marginalized.

h3 → h2′

h2 is marginalized.

h2″ \ h1 → h1

h2″ \ h4 → h4

h1 and h4 are marginalized.

Serial update
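One way to read this serial schedule is as the two-pass sum-product schedule on a tree. Messages are first collected toward h3 and then distributed back. The "\ h1" and "\ h4" notation matches the usual rule that a message to a node excludes what that node sent.

The sketch below runs exactly the message order shown on these slides and checks the resulting marginals against brute-force enumeration. The check is intended to match, since sum-product is exact on trees. The analogy has limits:

- The slides pass messages between feature maps, whereas this sketch uses discrete variables with random positive potentials.
- All function names are hypothetical.

```python
import itertools

import numpy as np

# Tree from the serial-update slides: h2 is linked to h1, h3 and h4.
EDGES = [("h1", "h2"), ("h4", "h2"), ("h2", "h3")]
# Message order from the slides: h1->h2, h4->h2, h2->h3 (h3 done),
# then h3->h2 (h2 done), then h2->h1 and h2->h4 (h1, h4 done).
SCHEDULE = [("h1", "h2"), ("h4", "h2"), ("h2", "h3"),
            ("h3", "h2"), ("h2", "h1"), ("h2", "h4")]


def table(psi, i, j):
    """Pairwise table indexed [x_i, x_j], whichever way the edge was stored."""
    return psi[(i, j)] if (i, j) in psi else psi[(j, i)].T


def neighbours(nodes):
    nbrs = {n: [] for n in nodes}
    for a, b in EDGES:
        nbrs[a].append(b)
        nbrs[b].append(a)
    return nbrs


def serial_sum_product(unary, psi):
    nbrs = neighbours(unary)
    msgs = {}
    for i, j in SCHEDULE:
        # Belief of h_i excluding what j sent (the "\ h_j" on the slides).
        belief = unary[i].copy()
        for k in nbrs[i]:
            if k != j:
                belief = belief * msgs[(k, i)]
        m = table(psi, i, j).T @ belief   # sum over the states of h_i
        msgs[(i, j)] = m / m.sum()
    marginals = {}
    for n in unary:
        b = unary[n].copy()
        for k in nbrs[n]:
            b = b * msgs[(k, n)]
        marginals[n] = b / b.sum()
    return marginals


def brute_force(unary, psi):
    """Exact marginals by enumerating every joint state (feasible only for tiny graphs)."""
    nodes = list(unary)
    marg = {n: np.zeros(len(unary[n])) for n in nodes}
    for states in itertools.product(*(range(len(unary[n])) for n in nodes)):
        x = dict(zip(nodes, states))
        p = np.prod([unary[n][x[n]] for n in nodes])
        for a, b in EDGES:
            p *= psi[(a, b)][x[a], x[b]]
        for n in nodes:
            marg[n][x[n]] += p
    return {n: m / m.sum() for n, m in marg.items()}


rng = np.random.default_rng(0)
unary = {n: rng.random(3) + 0.1 for n in ("h1", "h2", "h3", "h4")}
psi = {e: rng.random((3, 3)) + 0.1 for e in EDGES}
bp, exact = serial_sum_product(unary, psi), brute_force(unary, psi)
print(all(np.allclose(bp[n], exact[n]) for n in bp))  # True
```

Six messages suffice here, one per direction of each edge. A flooding schedule on the same tree would need repeated passes before every node has heard from every other.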

slide-24
SLIDE 24

CVPR'16 vs. NIPS'16

[Figure: part nodes A1–A10 and B1–B10; labels: Positive Direction, Reverse Direction, 1 path]

slide-25
SLIDE 25

RESULTS ON LSP (PCP)

Method                      TORSO   HEAD   U.ARMS   L.ARMS   U.LEGS   L.LEGS   MEAN
Chen & Yuille, NIPS'2014     92.7   87.8     69.2     55.4     82.9     77.0   75.0
Yang et al., CVPR'2016       96.5   83.1     78.8     66.7     88.7     81.7   81.1
Chu et al., CVPR'2016        95.4   89.6     76.9     65.2     87.6     83.2   81.1
Ours                         96.0   91.3     80.0     67.1     89.5     85.0   83.1

slide-26
SLIDE 26

COMPONENT ANALYSIS (PCP)

Variant                     TORSO   HEAD   U.ARMS   L.ARMS   U.LEGS   L.LEGS   MEAN
Flooding-2itrs-tree          93.5   86.7     73.0     59.8     83.7     79.0   77.1
Flooding-2itrs-loopy         94.0   88.2     74.4     62.1     84.3     80.0   78.4
Serial-tree(ReLU)            95.5   88.9     75.9     63.8     87.1     81.4   80.1
Serial-tree(Softmax)         96.0   91.3     80.0     67.1     89.5     85.0   83.1

slide-27
SLIDE 27

[Figure: qualitative comparison. (a) Flooding-2itr-tree, (b) Flooding-2itr-loopy, (c) final model]

slide-28
SLIDE 28
slide-29
SLIDE 29

Thank you!

slide-30
SLIDE 30

Conditional Random Field

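$$p(\mathbf{z} \mid \mathbf{I},\Theta) = \sum_{\mathbf{h}} p(\mathbf{z},\mathbf{h} \mid \mathbf{I},\Theta), \qquad p(\mathbf{z},\mathbf{h} \mid \mathbf{I},\Theta) = \frac{e^{-E_n(\mathbf{z},\mathbf{h},\mathbf{I},\Theta)}}{\sum_{\mathbf{z}\in\mathcal{Z},\, \mathbf{h}\in\mathcal{H}} e^{-E_n(\mathbf{z},\mathbf{h},\mathbf{I},\Theta)}}$$

where E_n(z, h, I, Θ) is the energy of the model.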
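On a toy problem, the two formulas can be evaluated directly. This is the baseline that mean-field inference approximates when enumeration is infeasible. The sketch makes these assumptions:

- Binary states.
- The image I and parameters Θ are folded into a plain `energy(z, h)` callable.
- `crf_marginal_z` and `toy_energy` are hypothetical, made-up names, not the paper's energy.

```python
import itertools
import math


def crf_marginal_z(energy, n_z, n_h, states=(0, 1)):
    """p(z | I, Theta) by brute force, with I and Theta folded into `energy`.

    p(z, h | I) = exp(-E(z, h)) / sum_{z', h'} exp(-E(z', h'))
    p(z | I)    = sum_h p(z, h | I)
    """
    zs = list(itertools.product(states, repeat=n_z))
    hs = list(itertools.product(states, repeat=n_h))
    # Unnormalised mass of each z after summing out the hidden variables.
    unnorm = {z: sum(math.exp(-energy(z, h)) for h in hs) for z in zs}
    partition = sum(unnorm.values())   # sum over all z in Z and h in H
    return {z: w / partition for z, w in unnorm.items()}


# Toy energy (hypothetical): each active output costs 1, and an output that
# disagrees with its own hidden variable costs 0.5.
def toy_energy(z, h):
    return sum(z) + 0.5 * sum(zi != hi for zi, hi in zip(z, h))


p = crf_marginal_z(toy_energy, n_z=2, n_h=2)
print({z: round(v, 3) for z, v in p.items()})
print(round(sum(p.values()), 6))   # 1.0
```

Enumeration costs |Z|·|H| energy evaluations, which grows exponentially with the number of variables. That cost is what motivates the mean-field approximation used in the model.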