7. Video databases




  1. 7. Video databases: Video data representations
     • Video = time-ordered sequence of correlated images (frames).
     • Video signal representations originate from TV technology; different standards in USA (NTSC) and Europe (PAL, SECAM).
     • 25-30 frames/sec.
     • Interlaced presentation of even/odd rows to avoid flickering.
     • Frame size levels: 352 x 240, 768 x 576 (PAL), 720 x 576 (CCIR 601), 720 x 480 (NTSC), 1440 x 1152, 1920 x 1080 (HDTV).
     • Aspect ratios: 4:3, 16:9 (widescreen).
     • Color videos: decomposition into luminance and chrominance.
     • Typical sampling rates for SD video: 720 samples per line for luminance, 360 samples per line for each chrominance signal.
     MMDB-7 J. Teuhola 2012 168
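
The sampling figures above give a feel for why compression is needed. The following is a rough back-of-the-envelope sketch, not part of the slides: it assumes 576 active lines, 25 frames/sec, 8 bits per sample and two chrominance signals, and the function name `raw_bit_rate` is made up for illustration.

```python
# Raw (uncompressed) data rate of SD video from the per-line sampling figures.
# Assumptions (not from the slides): 576 lines, 25 frames/s, 8 bits per sample.

def raw_bit_rate(luma_per_line, chroma_per_line, lines, fps, bits_per_sample=8):
    """Bits per second for one luminance and two chrominance signals."""
    samples_per_line = luma_per_line + 2 * chroma_per_line
    return samples_per_line * lines * fps * bits_per_sample

rate = raw_bit_rate(720, 360, lines=576, fps=25)
print(f"{rate / 1e6:.1f} Mbit/s")  # 165.9 Mbit/s under these assumptions
```

Under these assumptions the raw stream is well over 100 Mbit/s, which is far above the bit rates that broadcast and storage media typically offer.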

  2. Video compression
     • Not just coding of a sequence of independent images (as in Motion-JPEG), because subsequent images are correlated (temporal redundancy).
     • Motion compensation: blocks (e.g. 8 x 8 pixels) in a frame are predicted by blocks in a previously reconstructed frame.
     • Compression artifacts that disturb the human eye may be different from those in still images.
     • Different techniques for different application areas (TV, DVD/BD, internet, videoconferencing).
     • Important issues:
       • Speed of compression/decompression
       • Robustness (error sensitivity)
     • Most of the standards are based on DCT (Discrete Cosine Transform).
     • Typical compression ratios from 50:1 to 100:1; the decompressed video is almost indistinguishable from the original.
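
Motion compensation can be illustrated with a minimal block-matching sketch. This is only one simple way to find a predictor block (exhaustive search with the sum of absolute differences); the slides do not prescribe a search method, and the names `sad` and `best_motion_vector`, the search range and the toy frames are assumptions made for the example.

```python
# Block matching: for a block of the current frame, find the best-matching
# block in a previously reconstructed (reference) frame.
# Frames are lists of rows of grey values; the returned vector (dx, dy)
# points from the current block position to its predictor in the reference.

def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between two n x n blocks."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))

def best_motion_vector(cur, ref, cx, cy, n=8, search=4):
    """Exhaustive search within +/- search pixels; returns ((dx, dy), cost)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if 0 <= rx <= width - n and 0 <= ry <= height - n:
                cost = sad(cur, ref, cx, cy, rx, ry, n)
                if best is None or cost < best[1]:
                    best = ((dx, dy), cost)
    return best

# Toy data: a 16 x 16 reference frame with distinct values, and a current
# frame in which the content has moved two pixels to the right.
ref = [[16 * y + x for x in range(16)] for y in range(16)]
cur = [[ref[y][x - 2] if x >= 2 else 0 for x in range(16)] for y in range(16)]

vector, cost = best_motion_vector(cur, ref, cx=4, cy=4)
print(vector, cost)  # (-2, 0) 0: the predictor lies two pixels to the left
```

An encoder would then code only the motion vector and the (here zero) prediction error instead of the block itself.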

  3. Standardization of video compression
     ISO/IEC MPEG (Moving Picture Experts Group)
     • Standard includes both video and audio compression.
     • Started 1988; steps:
       • MPEG-1: Rates up to 1.5 Mbits/sec (VHS quality)
       • MPEG-2: Rates up to 10 Mbits/sec (Digi-TV, DVD, HDTV)
       • MPEG-3: Planned but dropped (found to be unnecessary)
       • MPEG-4: Object-based (separation from scene, animation, 3D, face modelling, interactivity, etc.)
     ITU-T (International Telecommunication Union):
     • H.261: Low bit-rates (e.g. videoconferencing)
     • H.262 = MPEG-2
     • H.263: Low bit-rates (improved)
     • H.264 = MPEG-4 Part 10, high compression power

  4. Random access from compressed video
     • Broadcasting or accessing video from storage: it should be possible to start from (almost) any frame.
     • MPEG solution: three kinds of frames:
       • I-frame: Coded without temporal correlation (prediction); gives the lowest compression gain.
       • P-frame: Motion-compensated prediction from the last (closest) I- or P-frame.
       • B-frame: Bidirectional prediction from the previous and/or the next I- or P-frame:
         • highest compression gain
         • copes with sudden changes
         • errors do not propagate.
     • GOP = Group Of Pictures = smallest random-access unit; must be decodable independently (usually starts with an I-frame).

  5. Example of frame order in MPEG
     [Figure: frame sequence I B B B P B B B P B B B I, with arrows marking forward prediction and bidirectional prediction]
     • Two orders of frames:
       • Display order
       • Bitstream order
     • Buffering is needed to convert from bitstream order into display order; a small delay is involved.
     • The predictor and predicted frame need not be adjacent.
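
The relation between the two orders can be sketched as follows. The sketch assumes that every B-frame is predicted from the nearest I- or P-frame on each side, so that it can be transmitted only after the later of the two; real encoders may order frames differently, and the function names are hypothetical.

```python
# Convert between display order and bitstream order for the example sequence.
# Assumption: a B-frame is sent after the next I- or P-frame (its later anchor).

def display_to_bitstream(display):
    """Return (display_index, type) pairs in transmission order."""
    out, pending_b = [], []
    for idx, ftype in enumerate(display):
        if ftype == "B":
            pending_b.append((idx, ftype))   # wait for the next anchor
        else:                                # I or P: an anchor frame
            out.append((idx, ftype))
            out.extend(pending_b)
            pending_b.clear()
    out.extend(pending_b)                    # trailing B-frames, if any
    return out

def bitstream_to_display(bitstream):
    """The decoder buffers frames and outputs them by display index."""
    return sorted(bitstream)

display = list("IBBBPBBBPBBBI")
bitstream = display_to_bitstream(display)
print("".join(t for _, t in bitstream))      # IPBBBPBBBIBBB
```

The buffering delay mentioned above corresponds to the B-frames that have to wait in `pending_b` (or, on the decoder side, to the anchor that is held back until the B-frames before it have been shown).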

  6. Organizing and querying content of a video database
     Questions to be answered:
     • Which aspects of videos are likely to be of interest?
     • How should these aspects be represented and stored?
     • What kind of query languages are suitable?
     • Is the content extraction process manual or automatic?
     Possible aspects of interest:
     • Animate objects (people, etc.)
     • Inanimate objects (houses, cars, etc.)
     • Activities and events (walking, driving, etc.)
     Properties of objects:
     • Frame-dependent: valid in a subset of frames.
     • Frame-independent: valid for the video as a whole.

  7. Query types from a video database
     (a) Retrieve a complete video by name.
     (b) Find frame sequences ('clips', 'shots') containing certain objects or activities.
     (c) Find all videos/sequences containing objects/activities with certain properties.
     (d) Given a frame sequence, find all objects (of a certain type) occurring in some or all of the frames of the segment.
     (e) Given a frame sequence, find all activities (of a certain type) occurring in it.
     NOTE: Video is a multimedia tool: images + audio + possible text. The audio channel can be extremely important in detecting events. Textual components (e.g. subtitles) are invaluable keyword sources.

  8. Indexing of video content
     • Content descriptions are not usually built on a frame-by-frame basis, due to the high number of frames.
     • Compact representations are needed.
     • Concepts:
       • Frame sequence: A contiguous subset of frames (e.g. a 'shot').
       • Well-ordered set of frame sequences: Temporal order, no overlaps.
       • Solid set of frame sequences: Well-ordered, non-empty gaps between sequences ('scene').
       • Frame sequence association map: For each object and activity, a solid set of frame sequences is attached, showing the frames in which it appears.
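
The concepts above can be made concrete with a small sketch. It represents a frame sequence as a half-open pair `(start, end)`, which is an assumption of this example, and all function names are illustrative rather than taken from the slides.

```python
# Frame sequences as half-open intervals (start, end) of frame numbers.

def is_well_ordered(seqs):
    """Temporal order and no overlaps (touching sequences are allowed)."""
    return all(s < e for s, e in seqs) and all(
        a_end <= b_start for (_, a_end), (b_start, _) in zip(seqs, seqs[1:]))

def is_solid(seqs):
    """Well-ordered, with a non-empty gap between consecutive sequences."""
    return is_well_ordered(seqs) and all(
        a_end < b_start for (_, a_end), (b_start, _) in zip(seqs, seqs[1:]))

def make_solid(seqs):
    """Merge overlapping or touching sequences into a solid set."""
    merged = []
    for start, end in sorted(seqs):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

shots = [(0, 500), (500, 2000), (3000, 3500)]
print(is_well_ordered(shots), is_solid(shots))  # True False
print(make_solid(shots))                        # [(0, 2000), (3000, 3500)]

# A frame sequence association map: one solid set per object or activity.
association_map = {"obj. 1": make_solid(shots)}
```

Storing solid sets keeps the map compact: adjacent shots in which the same object appears collapse into a single sequence.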

  9. Frame segment tree
     • Binary tree
     • Special (1-dimensional) case of the spatial clipping approach.
     • Leaves represent basic intervals of the frame sequence:
       • Leaves are well ordered, and they cover the whole video.
       • Their endpoints include all endpoints of the sequences.
     • An internal node represents the concatenation of its children.
     • The root represents the whole video.
     • Example of objects and activities:
       [Figure: occurrences of obj. 1, obj. 2 and act. 1 drawn as segments over a frame-number axis marked 1000, 2000, 3000, 4000, 5000]

  10. Frame segment tree: example
      Node number, frame interval, and attached objects (o) / activities (a):
      1: 0-5000
        2: 0-3000
          4: 0-2000 (o2)
            8: 0-500
            9: 500-2000 (o1, a1)
          5: 2000-3000
            10: 2000-2500 (o2, a1)
            11: 2500-3000
        3: 3000-5000
          6: 3000-4000 (o1)
            12: 3000-3500 (a1)
            13: 3500-4000 (o2)
          7: 4000-5000 (a1)
            14: 4000-4500 (o2)
            15: 4500-5000 (o1)
      Indexing:
      • Obj. 1 → 6, 9, 15
      • Obj. 2 → 4, 10, 13, 14
      • Act. 1 → 7, 9, 10, 12
      Note: Actually the intervals are half-open, e.g. [0, 500) = 0..499.

  11. Indexing in the frame segment tree
      • For each object and activity record, there is a list of pointers to the nodes of the frame segment tree.
      • Objects and activities themselves may be indexed in traditional ways.
      • Each node of the frame segment tree points to a linked list of pointers to the objects and activities that appear throughout the whole segment that this node represents (but only partially in the parent segment). In the previous example:
        node 4 → obj. 2
        node 6 → obj. 1
        node 7 → act. 1
        node 9 → obj. 1, act. 1
        node 10 → obj. 2, act. 1
        node 12 → act. 1
        node 13 → obj. 2
        node 14 → obj. 2
        node 15 → obj. 1
      • This can be generalized to a set of videos (common frame segment tree, combined object/activity set, extended pointers).
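
The node lists of the example can be reproduced with a short sketch. It assumes a heap-style numbering (root 1, children of node i are 2i and 2i+1) and a power-of-two number of basic intervals, which happens to fit the example. The occurrence intervals in `OCCURRENCES` are not given in the text; they are inferred from the node lists of the example and should be read as an assumption.

```python
# Build the frame segment tree of the example and attach each occurrence
# interval to the highest nodes that it covers completely.

ENDPOINTS = [0, 500, 2000, 2500, 3000, 3500, 4000, 4500, 5000]

OCCURRENCES = {  # half-open intervals, inferred from the example (assumption)
    "obj. 1": [(500, 2000), (3000, 4000), (4500, 5000)],
    "obj. 2": [(0, 2500), (3500, 4500)],
    "act. 1": [(500, 2500), (3000, 3500), (4000, 5000)],
}

def build_intervals(endpoints):
    """Map node number -> (lo, hi); leaves are the basic intervals."""
    n = len(endpoints) - 1
    if n & (n - 1) != 0:
        raise ValueError("this sketch needs a power-of-two number of leaves")
    nodes = {n + k: (endpoints[k], endpoints[k + 1]) for k in range(n)}
    for i in range(n - 1, 0, -1):            # internal node = its two children
        nodes[i] = (nodes[2 * i][0], nodes[2 * i + 1][1])
    return nodes

def covering_nodes(nodes, start, end, i=1):
    """Nodes fully inside [start, end) whose parent is only partly inside."""
    lo, hi = nodes[i]
    if hi <= start or end <= lo:             # disjoint
        return []
    if start <= lo and hi <= end:            # fully covered
        return [i]
    if 2 * i not in nodes:                   # partly covered leaf
        raise ValueError("interval endpoint is not a leaf endpoint")
    return (covering_nodes(nodes, start, end, 2 * i)
            + covering_nodes(nodes, start, end, 2 * i + 1))

nodes = build_intervals(ENDPOINTS)
index = {name: sorted(n for s, e in seqs for n in covering_nodes(nodes, s, e))
         for name, seqs in OCCURRENCES.items()}

# The reverse direction: each node's list of objects and activities.
node_lists = {}
for name, node_ids in index.items():
    for n in node_ids:
        node_lists.setdefault(n, []).append(name)

print(index["obj. 1"], index["obj. 2"], index["act. 1"])
```

With these inferred intervals the sketch yields obj. 1 → 6, 9, 15, obj. 2 → 4, 10, 13, 14 and act. 1 → 7, 9, 10, 12, matching the example.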

  12. Queries using a frame segment tree
      (a) Find segments where a given object/activity occurs (trivial; just follow the pointers).
      (b) Find objects occurring between frames s and e: Walk the tree in preorder, denote the current node interval by I.
          • If I ∩ [s, e) = ∅, then this subtree can be skipped.
          • If I ⊆ [s, e), then walk through the whole subtree (including the current node) and report all its objects.
          • Otherwise report the objects and activities of the current node, and continue the search to both subtrees.
      (c) Find objects/activities occurring together with object x: Scan the segments where x occurs, and report the objects/activities occurring in these segments and their ancestors.
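
Query (b) can be sketched directly from the three cases above. The tree is hard-coded from the earlier example, using heap-style numbering (children of node i are 2i and 2i+1), which is an assumption of this sketch; node 14 is taken to hold obj. 2, as in the example's index list. The names `INTERVALS`, `LABELS` and `query` are illustrative.

```python
# Preorder range query on the example frame segment tree.

INTERVALS = {  # node -> half-open frame interval [lo, hi)
    1: (0, 5000), 2: (0, 3000), 3: (3000, 5000),
    4: (0, 2000), 5: (2000, 3000), 6: (3000, 4000), 7: (4000, 5000),
    8: (0, 500), 9: (500, 2000), 10: (2000, 2500), 11: (2500, 3000),
    12: (3000, 3500), 13: (3500, 4000), 14: (4000, 4500), 15: (4500, 5000),
}
LABELS = {  # node -> objects/activities present throughout its interval
    4: ["obj. 2"], 6: ["obj. 1"], 7: ["act. 1"],
    9: ["obj. 1", "act. 1"], 10: ["obj. 2", "act. 1"], 12: ["act. 1"],
    13: ["obj. 2"], 14: ["obj. 2"], 15: ["obj. 1"],
}

def report_subtree(i, found):
    """Report the labels of node i and of all its descendants."""
    if i in INTERVALS:
        found.update(LABELS.get(i, ()))
        report_subtree(2 * i, found)
        report_subtree(2 * i + 1, found)

def query(s, e, i=1, found=None):
    """Objects and activities occurring somewhere in frames [s, e)."""
    if found is None:
        found = set()
    if i not in INTERVALS:
        return found
    lo, hi = INTERVALS[i]
    if hi <= s or e <= lo:                 # I and [s, e) are disjoint: skip
        return found
    if s <= lo and hi <= e:                # I inside [s, e): whole subtree
        report_subtree(i, found)
    else:                                  # partial overlap
        found.update(LABELS.get(i, ()))
        query(s, e, 2 * i, found)
        query(s, e, 2 * i + 1, found)
    return found

print(sorted(query(3000, 3500)))           # ['act. 1', 'obj. 1']
```

In the printed query, obj. 1 is found at the partially overlapping node 6 and act. 1 at the fully covered leaf 12, which shows why labels must be reported in both of the last two cases.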

  13. R-segment tree (RS-tree)
      • Special case of R-tree
      • Two possible implementations:
        (a) 1-dimensional space (dimension = time)
        (b) 2-dimensional space, where the other dimension is just an enumeration of objects/activities (not a true spatial dimension):
        [Figure: occurrences of obj. 1, obj. 2 and act. 1 over a frame-number axis marked 1000 to 5000, grouped into bounding rectangles R1, R2 and R3]

  14. Computer-assisted video analysis
      Video segmentation:
      • Division of videos into homogeneous sequences.
      • Typical segments are so-called shots, filmed without interruption.
      • Segmentation = detection of shot boundaries.
      • Sharp cuts are easier to detect than gradual transitions (e.g. crossfade).
      • Features for automatic segmentation:
        • Similarity of color histograms of subsequent frames: simple and effective, but sensitive to varying illumination.
        • Edge features: similarity of shapes.
        • Motion vectors: restricted vector lengths within a shot.
        • Corner points: similarity of landmark points in frames.
      • The actual segmentation can be based on thresholds for similarity, but machine learning techniques have also been used widely.
      • Higher-level segmentation into scenes, also called story units.
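
Threshold-based cut detection on histograms can be sketched as follows. This is a minimal illustration with grey-level frames given as flat lists of pixel values; the bin count, the distance measure (half the L1 distance) and the threshold are assumptions of the example, not values from the slides.

```python
# Detect sharp cuts by comparing histograms of subsequent frames.

def histogram(frame, bins=4, max_value=256):
    """Normalized histogram of a frame given as a flat list of pixel values."""
    counts = [0] * bins
    for v in frame:
        counts[v * bins // max_value] += 1
    return [c / len(frame) for c in counts]

def histogram_distance(h1, h2):
    """Half the L1 distance: 0 for identical, 1 for disjoint distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))

def detect_cuts(frames, threshold=0.5):
    """Indices i such that a cut is assumed between frames i-1 and i."""
    hists = [histogram(f) for f in frames]
    return [i for i in range(1, len(hists))
            if histogram_distance(hists[i - 1], hists[i]) > threshold]

dark_a = [10, 20, 30, 40]
dark_b = [12, 22, 28, 45]         # same shot, slightly different pixels
bright = [200, 210, 220, 250]     # a different shot
print(detect_cuts([dark_a, dark_b, bright, bright]))  # [2]
```

A gradual transition such as a crossfade spreads the change over many frames, so each pairwise distance may stay below the threshold; this is one reason why gradual transitions are harder to detect than sharp cuts.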

  15. Computer-assisted video analysis (cont.)
      Keyframes:
      • Representative frames within shots, containing the essential elements for retrieval.
      • Scene-level segmentation often uses keyframe features, and operates e.g. in a top-down or bottom-up manner.
      Choosing keyframes:
      • Fuzzy task: no definite optimum.
      • Can be based on the same features as segmentation.
      • Various algorithmic approaches:
        • Sequential comparison
        • Clustering
        • Trajectory-based
        • Decision in the context of object/event detection
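
One possible reading of the sequential comparison approach is sketched below: a frame becomes a new keyframe when its feature differs from that of the latest keyframe by more than a threshold. The slides do not spell out the method, so the rule, the scalar features and the function name `select_keyframes` are all assumptions of this example.

```python
# Keyframe selection by sequential comparison against the latest keyframe.

def select_keyframes(features, threshold, distance):
    """Return the indices of the frames chosen as keyframes."""
    if not features:
        return []
    keyframes = [0]                        # the first frame starts the shot
    for i in range(1, len(features)):
        if distance(features[keyframes[-1]], features[i]) > threshold:
            keyframes.append(i)
    return keyframes

# Toy per-frame features (e.g. a mean brightness); real features could be
# color histograms compared with a histogram distance.
features = [0.0, 0.1, 0.15, 0.6, 0.65, 1.2]
print(select_keyframes(features, 0.4, lambda a, b: abs(a - b)))  # [0, 3, 5]
```

The threshold controls how many keyframes a shot produces, which reflects the fuzzy nature of the task: there is no single correct setting.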
