‐ On Seeing Stuff: The Perception of Materials by Humans and Machines,
By Adelson
‐ Semantic Texton Forests for Image Categorization and Segmentation,
By Shotton et al.
Presented by Mani Golparvar‐Fard
4/9/2009 1 CS598 ‐ Visual Scene Understanding
Concrete Foundation Wall
Different illumination and viewing directions
Plaster‐a
Crumpled Paper
Concrete Plaster‐b (zoomed)
Source: Leung and Malik, ICCV '99, Corfu, Greece
As‐planned Material
Under Progress Material Other Material
Materials Database (Concrete, Forms, Steel, etc.)
Check Material Process/Result Schedule Information
[Diagram: work-flow model (work to do, work awaiting quality management, work released, work awaiting RFI reply; upstream and downstream checks over time)]
Material‐Based Image Retrieval Engine
Relevancy to concrete: 96%
Photographed in the same room with the same lighting
Shiny sphere (with and without specularities), generated by computer graphics
Visual cues tell more than optical qualities – maybe also the mechanical properties of the material?
Blobs of hand cream vs. cream cheese
[Diagram labels: initial state, intrinsic mechanics, extrinsic mechanics, shape; intrinsic, extrinsic]
– We expect these properties to manifest themselves in the specular reflections
‐ A grassfire algorithm was used to compute the distance from the contour, and a smoothing algorithm was then applied
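A minimal sketch of the grassfire idea, assuming a binary mask (1 inside the shape, 0 outside) and city-block distance. The function names `grassfire_distance` and `box_smooth` are hypothetical, and the mean filter stands in for whatever smoothing the original work used, which the slide does not specify.

```python
def grassfire_distance(mask):
    """City-block distance from each foreground cell to the nearest background cell.

    mask is a list of rows of 0/1 values; 1 marks the inside of the shape.
    Cells outside the grid are ignored (they do not count as background).
    """
    h, w = len(mask), len(mask[0])
    big = h + w  # exceeds any city-block distance inside the grid
    dist = [[big if v else 0 for v in row] for row in mask]
    # Forward pass: the "fire" spreads from the top-left.
    for y in range(h):
        for x in range(w):
            if dist[y][x]:
                up = dist[y - 1][x] if y > 0 else big
                left = dist[y][x - 1] if x > 0 else big
                dist[y][x] = min(dist[y][x], up + 1, left + 1)
    # Backward pass: the "fire" spreads from the bottom-right.
    for y in reversed(range(h)):
        for x in reversed(range(w)):
            if dist[y][x]:
                down = dist[y + 1][x] if y < h - 1 else big
                right = dist[y][x + 1] if x < w - 1 else big
                dist[y][x] = min(dist[y][x], down + 1, right + 1)
    return dist


def box_smooth(grid, radius=1):
    """Mean filter; neighbours that fall outside the grid are skipped."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [grid[j][i]
                    for j in range(max(0, y - radius), min(h, y + radius + 1))
                    for i in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = sum(vals) / len(vals)
    return out
```

Two raster passes are enough for the city-block metric because every shortest path to the contour can be split into a part that moves up/left and a part that moves down/right.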
Material‐Based Image Retrieval Engine
As‐planned Material
Under Progress Material Other Material
Materials Database (Concrete, Forms, Steel, etc.)
Check Material Process/Result Schedule Information
Relevancy to forms: 94%
Concrete Rejections: 20%
[shotton‐eccv‐08] [shotton‐cvpr‐06]
– Compute feature descriptors
– Cluster
– Nearest‐neighbor assignment
– Inference is always a bottleneck
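The three texton steps listed above (descriptors, clustering, nearest-neighbor assignment) can be sketched as follows. This is an illustration under assumptions: descriptors are taken as already computed numeric vectors, clustering is plain k-means, and the names `kmeans`, `nearest`, and `texton_histogram` are hypothetical rather than taken from the papers.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centers, v):
    """Index of the cluster center (texton) closest to descriptor v."""
    return min(range(len(centers)), key=lambda k: sq_dist(centers[k], v))


def kmeans(descriptors, k, iters=20, seed=0):
    """Cluster descriptors into k textons with plain k-means."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(descriptors, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in descriptors:
            groups[nearest(centers, v)].append(v)
        for j, g in enumerate(groups):
            if g:  # keep the old center if a cluster went empty
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers


def texton_histogram(descriptors, centers):
    """Bag-of-textons: count how many descriptors map to each texton."""
    hist = [0] * len(centers)
    for v in descriptors:
        hist[nearest(centers, v)] += 1
    return hist
```

The nearest-neighbor search runs once per descriptor over all centers, which is one reason this pipeline is slow at test time.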
Daniel Munoz’s slide at CMU
Slide from CLSP, Johns Hopkins University
α α α α α α β β β β β T1 T2 T3
Daniel Munoz’s slide at CMU
Potential Features
(1) Its value in a color channel (CIELab)
(2) The sum of two points in the patch
(3) The difference of two points in the patch
(4) The absolute difference of two points in the patch
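The four candidate features can be written as one small function. This is a sketch under assumptions: the patch is stored as `patch[y][x][channel]`, each point is a `(y, x, channel)` triple, and the names `pixel_feature`, `random_feature`, and `KINDS` are hypothetical.

```python
import random

KINDS = ("value", "sum", "diff", "absdiff")


def pixel_feature(patch, kind, p1, p2=None):
    """Evaluate one of the four simple pixel features on a patch."""
    a = patch[p1[0]][p1[1]][p1[2]]
    if kind == "value":
        return a
    b = patch[p2[0]][p2[1]][p2[2]]
    if kind == "sum":
        return a + b
    if kind == "diff":
        return a - b
    if kind == "absdiff":
        return abs(a - b)
    raise ValueError(f"unknown feature kind: {kind}")


def random_feature(patch_size, channels=3, rng=random):
    """Draw a random feature: its kind plus one or two random points."""
    def point():
        return (rng.randrange(patch_size),
                rng.randrange(patch_size),
                rng.randrange(channels))

    kind = rng.choice(KINDS)
    return kind, point(), (None if kind == "value" else point())
```

Because these features read raw pixel values directly, no filter bank or descriptor has to be computed before a patch is pushed down a tree.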
training data
Daniel Munoz’s slide at CMU
– Take a random subset of the training data
– Generate random features f from the list above
– Generate a random threshold t
– Split the data into left I_l and right I_r subsets according to whether the feature response f falls below t
– Repeat for each side
The feature and threshold kept at each node are the ones that maximize information gain.
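A sketch of choosing one node's split by information gain, under simplifying assumptions: each training sample is a precomputed feature vector, a candidate "feature" is just a randomly chosen dimension of that vector, and the names `entropy`, `info_gain`, and `best_random_split` are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(labels, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(labels)
    return (entropy(labels)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))


def best_random_split(samples, labels, n_candidates=50, seed=0):
    """Try random (dimension, threshold) pairs and keep the one with highest gain.

    Returns (gain, dimension, threshold), or None if no candidate split the data.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(n_candidates):
        dim = rng.randrange(len(samples[0]))
        values = [s[dim] for s in samples]
        t = rng.uniform(min(values), max(values))
        left = [y for s, y in zip(samples, labels) if s[dim] < t]
        right = [y for s, y in zip(samples, labels) if s[dim] >= t]
        if not left or not right:
            continue  # degenerate split, nothing learned
        gain = info_gain(labels, left, right)
        if best is None or gain > best[0]:
            best = (gain, dim, t)
    return best
```

Growing a full tree would call `best_random_split` again on the left and right subsets until a depth limit or a pure node is reached.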
patches from the training data that fell into that leaf.
– MSRC‐21 dataset
– Increase one bin in the histogram at a time
– Increase multiple bins in the histogram at a time
1) Average the leaf class histograms over region r to obtain P(c|r)
2) Create a hierarchy histogram of node counts H_r(n), counting the nodes n visited in the tree for each classified pixel in region r
to match
Daniel Munoz’s slide at CMU
Level 0
Slides from Grauman’s ICCV talk
Level 1
Slides from Grauman’s ICCV talk
Level 2
Slides from Grauman’s ICCV talk
– Using histogram matching approach
– End result is an image‐level prior
– Unfair? RBF uses only leaf‐level counts, PMK uses entire histogram
– Kc = An idea to account for unbalanced classes
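To make the PMK idea concrete, here is a sketch of a pyramid match over tree depths built on histogram intersection. The weighting (matches first found at deeper, finer levels count more, halving per level toward the root) follows the usual pyramid-match form and is an assumption here, not a transcription of the formula on the slides. The names `intersection`, `pyramid_match`, and `normalized_pyramid_match` are hypothetical.

```python
import math


def intersection(p, q):
    """Histogram intersection: number of matches between two histograms."""
    return sum(min(a, b) for a, b in zip(p, q))


def pyramid_match(p_levels, q_levels):
    """Unnormalized pyramid match.

    p_levels[d - 1] is the histogram over tree nodes at depth d (1 = coarsest).
    """
    depth = len(p_levels)
    score = 0.0
    finer = 0.0  # intersection one level deeper; zero below the deepest level
    for d in range(depth, 0, -1):
        here = intersection(p_levels[d - 1], q_levels[d - 1])
        new_matches = here - finer  # matches first seen at this depth
        score += new_matches / 2 ** (depth - d + 1)
        finer = here
    return score


def normalized_pyramid_match(p_levels, q_levels):
    """Divide by the self-similarities so that K(P, P) = 1."""
    z = pyramid_match(p_levels, p_levels) * pyramid_match(q_levels, q_levels)
    return pyramid_match(p_levels, q_levels) / math.sqrt(z) if z > 0 else 0.0
```

Unlike a kernel on leaf counts alone, this score still credits two images whose patches reach different leaves but share ancestors higher in the tree.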
with neighboring regions’ classes
– Use different weak features:
– Shape filters learn: cow is adjacent to green‐like texture
– Segmentation forest learns: cow is adjacent to grass
– Convert SVM decision to probability
Daniel Munoz’s slide at CMU
distributions (34.5%); this shows the power of the localized BoSTs that exploit semantic context.
adding invariance.
STFs allow good segmentations.
27 – TextonBoost, Shotton et al., 2007
32 – Verbeek and Triggs, Region classification with Markov field aspect models, CVPR 2007
– Simple concept
– Good result
– Works fast (testing and training)
– Difficult to understand
– Low‐resolution classification
– Test‐time inference is dependent on amount of training
– Many “Implementation Details”.
Partly based on Daniel Munoz’s slide at CMU
– I attended the demo of semantic texton forests at CVPR 2008. It was very cool. It could recognize and segment objects in real time and with reasonable accuracy. Random forests are a powerful and efficient tool, even for such a low‐level feature representation.
– For classification, they use nonlinear kernels, which makes it difficult to scale training to large amounts of data in practice.
– Upon inspection of the segmentation results for the background class in Pascal VOC 2007, the "image level prior" decreases performance significantly. Ideally, this prior should be used to suppress classes that the image‐wide statistics don't support. One would expect the background to appear in almost all images, and since modeling the background is difficult, perhaps this prior could be excluded from the background predictor.
distributions of class labels), even with the normalization, won't some trees be better at identifying some classes than others? Why average, then? Why not weight the output P(C|L) by the confidence in predicting that class label?
them to ensure they do not overfit. In the trees here there is obviously a lot of variance. Since the splits made at each stage necessarily increase the "purity" of the child nodes, I wonder if there is a danger of overfitting the data, i.e. the decision rules/thresholds chosen may not translate well to novel examples.
viewpoint and appearance from natural categories. If we have more black dogs than black cats in our training set, won't it infer that black patches => high likelihood of dogs vs. cats?
bother accumulating statistics at node n across all trees? Don't they represent different things? It doesn't make sense to me.