Certifying Video Provenance, Ashish Gehani, I3P/SRI (PowerPoint presentation)

SLIDE 1

Certifying Video Provenance

Ashish Gehani I3P/SRI

SLIDE 2

INTRODUCTION : Why certify?

  • Video cameras / editors ubiquitous
  • Data originates from multiple sources
  • Consumer trusts some producers
  • Must be able to validate provenance
  • Examples:

– Surveillance camera, police matches, lawyer uses
– Tank camera, classification edits, WMD analyst uses

SLIDE 3

INTRODUCTION : Why embed?

  • Auxiliary files:

– Consistent copies
– Synchronized access
– Storage overhead
– Loose coupling → Error prone

  • Embedding agnostic to:

– Administrative domain
– Transfer between hosts, operating systems

  • Video Embedding of Information for Lineage (VEIL):

– Inband encoding
– Interoperable with legacy applications, libraries
– Facilitate uptake

SLIDE 4

PROVENANCE : Desired properties

  • Sound

– Signed elements

  • Necessary

– Prune elements when possible

  • Complete

– Don’t prune when output changes

SLIDE 5

PROVENANCE : Lineage tree

[Figure: lineage tree. (a) Primitive operation: an Operation node connects Input 1, ..., Input n to an Output. (b) Compound operation tree.]

SLIDE 6

PROVENANCE : Granularity

  • (Output, Executor, Input_1, ..., Input_n)
  • Level - Assembler, System call, File?
  • Fine → High overhead
  • Coarse → False positives
  • Can’t prune intra-process side-effects dynamically

SLIDE 7

PROVENANCE : Metadata format

  • Primitive operation format:

[Figure: record fields: Executor, Signature, Output, Input 1, ..., Input n, End. Each file field consists of IP Address, Inode, Time.]

  • Executor: 32 bit IPv4 address, 32 bit user ID
  • Signature: 160 bits

\[ S = \mathrm{SIGN}_{K_E}(O, I_1, \ldots, I_n) \]

  • Input / Output File:

– 32 bit IPv4 address
– 32 bit inode
– 32 bit time (Unix seconds count from 1 Jan, 1970)
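
The field widths on this slide are enough to sketch how one primitive operation record could be serialized. The following is a minimal illustration, not the VEIL encoder: the field order (executor, signature, output, inputs), the network byte order, the omission of the End marker, and the names `FileId` and `pack_record` are all assumptions made for the example.

```python
import socket
import struct
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileId:
    ip: str      # dotted-quad IPv4 address of the host holding the file
    inode: int   # 32-bit inode number
    time: int    # 32-bit Unix time in seconds

    def pack(self) -> bytes:
        # 4 bytes address + 4 bytes inode + 4 bytes time = 96 bits
        return socket.inet_aton(self.ip) + struct.pack("!II", self.inode, self.time)


def pack_record(executor_ip: str, user_id: int, signature: bytes,
                output: FileId, inputs: List[FileId]) -> bytes:
    """Serialize one primitive operation (field order is an assumption)."""
    if len(signature) != 20:
        raise ValueError("signature must be 160 bits (20 bytes)")
    # Executor: 32-bit IPv4 address followed by 32-bit user ID = 64 bits
    executor = socket.inet_aton(executor_ip) + struct.pack("!I", user_id)
    body = executor + signature + output.pack()
    for f in inputs:
        body += f.pack()
    return body


record = pack_record(
    "192.0.2.1", 1000, bytes(20),            # placeholder all-zero signature
    FileId("192.0.2.1", 42, 1_100_000_000),
    [FileId("192.0.2.7", 7, 1_099_999_000)],
)
record_bits = len(record) * 8                # 64 + 160 + 96 + 96 bits
```

Under these assumptions a record with n inputs occupies 320 + 96n bits, so a one-input record is 416 bits and a four-input record is 704 bits.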

SLIDE 8

PROVENANCE : Collection

  • Manually:

> veil -o Output.mov -i Input1.mov Input2.mov Input3.mov

  • Automatically:

[Figure: timeline of one process execution along a Time axis. File 1 and File 2 are each opened with open(), read, and closed with close(); File 3 is opened with open(), written, and closed with close().]
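
One way to read the timeline is that intercepted open() and close() calls are turned into (Output, Executor, Inputs) tuples. The sketch below assumes a simple rule that the slides do not spell out: every file the process has opened for reading before a written file is closed counts as an input to that file. `ProcessTracer` and its method names are hypothetical.

```python
from typing import List, Set, Tuple


class ProcessTracer:
    """Hypothetical per-process tracker fed by intercepted open()/close() calls."""

    def __init__(self, executor: str) -> None:
        self.executor = executor
        self.read_so_far: List[str] = []        # files opened for reading, in order
        self.open_for_write: Set[str] = set()   # files currently open for writing
        self.records: List[Tuple[str, str, Tuple[str, ...]]] = []

    def on_open(self, path: str, mode: str) -> None:
        if mode == "r":
            if path not in self.read_so_far:
                self.read_so_far.append(path)
        elif mode == "w":
            self.open_for_write.add(path)
        else:
            raise ValueError("this sketch only handles modes 'r' and 'w'")

    def on_close(self, path: str) -> None:
        # Closing a written file emits one (output, executor, inputs) record.
        if path in self.open_for_write:
            self.open_for_write.remove(path)
            self.records.append((path, self.executor, tuple(self.read_so_far)))


# Replay a timeline like the one in the figure.
tracer = ProcessTracer("alice")
tracer.on_open("File1.mov", "r")
tracer.on_open("File2.mov", "r")
tracer.on_close("File2.mov")
tracer.on_open("File3.mov", "w")
tracer.on_close("File3.mov")
tracer.on_close("File1.mov")
```

This rule is deliberately coarse. It may list a file that did not actually influence the output, which is the false-positive cost of file-level granularity noted on the granularity slide.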

SLIDE 9

RELATED WORK :

  • Provenance

– Single host semantics

  • Steganography

– Secrecy → Low capacity

  • Watermarking

– Robustness → Low capacity

SLIDE 10

EMBEDDING : Interposition point

[Figure: video codec pipeline showing where VEIL interposes. Compression stages shown: Signal to Spatial Domain, Frame Decomposition, Motion Estimation, Interpolation (VEIL), Residue Calculation, Quantization, Entropy Coding. Decompression stages shown: Entropy Decoding, Dequantization, Frame / Block Reconstitution, Interpolation (VEIL Subpixel Analysis), Spatial to Signal Domain.]

SLIDE 11

EMBEDDING : Overview

[Figure: metadata encoding operation. A sequence of primitive operation records (Executor, Signature, Output, Input 1, ..., Input n, End) is encoded into an input video frame.]

SLIDE 12

EMBEDDING : Subpixel displacement

  • Human visual system tolerates subpixel shift:
[Figure: an object in the original frame and in the interpolated frame, with the outline of the VEIL block in each.]

SLIDE 13

EMBEDDING : Registration error compensation

[Figure: a camera views the surface of an object through a registration grid; points on the surface are mapped to the first, second, and third frames.]

  • v_1, v_2 : Adjacent pixel intensities
  • v_i : Interpolated pixel intensity
  • v_i = v_1 + (δx − δi)(v_2 − v_1)

SLIDE 14

EXTRACTION : Checking lineage

Algorithm : CHECKLINEAGE(D)
    {E, S, O, I_1, ..., I_n} ← GETROOT(D)
    OUTPUT(E)
    P_E ← PKILOOKUP(E)
    if {I_1, ..., I_n} = {} then
        VERIFYSIGNATURE(P_E, S, O)
    else
        Result ← VERIFYSIGNATURE(P_E, S, O | I_1 | ... | I_n)
        if Result = TRUE then
            for i ← 1 to n do
                CHECKLINEAGE(I_i)
        else
            CheckFailed
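
The recursion in CHECKLINEAGE can be illustrated with a small runnable sketch. Two substitutions are made here and are not part of the slides: HMAC-SHA1 from the Python standard library stands in for the public-key signature (it happens to produce a 160-bit tag, but it is a symmetric primitive), and a dictionary named `KEYS` stands in for PKILOOKUP. The `Record` type, the file names, and the keys are illustrative assumptions.

```python
import hashlib
import hmac
from typing import Dict, List, NamedTuple


class Record(NamedTuple):
    executor: str
    signature: bytes
    output: str
    inputs: List[str]


# Stand-in for PKILOOKUP: maps an executor to key material (hypothetical data).
KEYS: Dict[str, bytes] = {"camera": b"k-camera", "editor": b"k-editor"}


def sign(key: bytes, output: str, inputs: List[str]) -> bytes:
    # The signed message is O | I_1 | ... | I_n, or just O for a leaf.
    message = "|".join([output, *inputs]).encode()
    return hmac.new(key, message, hashlib.sha1).digest()   # 160-bit tag


def check_lineage(tree: Dict[str, Record], data: str, seen: List[str]) -> bool:
    rec = tree[data]                          # GETROOT(D)
    seen.append(rec.executor)                 # OUTPUT(E)
    key = KEYS[rec.executor]                  # PKILOOKUP(E)
    expected = sign(key, rec.output, rec.inputs)
    if not hmac.compare_digest(expected, rec.signature):
        return False                          # CheckFailed
    # Recurse into every input; a leaf has no inputs and passes trivially.
    return all(check_lineage(tree, i, seen) for i in rec.inputs)


raw = Record("camera", sign(KEYS["camera"], "raw.mov", []), "raw.mov", [])
cut = Record("editor", sign(KEYS["editor"], "cut.mov", ["raw.mov"]),
             "cut.mov", ["raw.mov"])
tree = {"raw.mov": raw, "cut.mov": cut}

executors: List[str] = []
ok = check_lineage(tree, "cut.mov", executors)
```

A record whose input list is changed after signing fails the check, because the tag no longer covers the same O | I_1 | ... | I_n string.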

SLIDE 15

EXTRACTION : Interpolation estimation

  • Subpixel alignment minimizes visual distance:

[Figure: subpixel displacement dx]

SLIDE 16

EXTRACTION : Calculating subpixel displacement

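Theorem: δx can be calculated in O(β) time using O(log β) space, where β is the block size.

Proof:

\[ \mathrm{MSE}(\delta_x) = \frac{1}{\beta} \sum_{x=x_{\min}}^{x_{\max}} \sum_{y=y_{\min}}^{y_{\max}} \left[ F(x + \delta_x, y) - G(x, y) \right]^2 \]

\[ \frac{d}{d(\delta_x)} \mathrm{MSE}(\delta_x) = 0 \]

\[ \delta_x = \frac{\sum_{x=x_{\min}}^{x_{\max}} \sum_{y=y_{\min}}^{y_{\max}} \left[ F(x + 1, y) - F(x, y) \right] \left[ F(x, y) - G(x, y) \right]}{\sum_{x=x_{\min}}^{x_{\max}} \sum_{y=y_{\min}}^{y_{\max}} \left[ F(x + 1, y) - F(x, y) \right]^2} \]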
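
The O(β) time bound corresponds to a single pass over the block that maintains two running sums. The sketch below assumes linear interpolation, F(x + δx, y) ≈ F(x, y) + δx [F(x + 1, y) − F(x, y)], and minimizes the MSE under that model. With this convention the numerator comes out as ΔF · (G − F); the slide writes F − G, so the sign depends on the direction in which the displacement is measured. The function name and the list-of-rows representation are assumptions.

```python
from typing import Sequence


def estimate_dx(F: Sequence[Sequence[float]], G: Sequence[Sequence[float]]) -> float:
    """Least-squares subpixel shift of block G relative to F along x.

    Each row of F must be one sample longer than the matching row of G,
    so that F[y][x + 1] exists for every x in the block.
    """
    num = 0.0   # running sum of dF * (G - F)
    den = 0.0   # running sum of dF ** 2
    for row_f, row_g in zip(F, G):
        for x, g in enumerate(row_g):
            d_f = row_f[x + 1] - row_f[x]
            num += d_f * (g - row_f[x])
            den += d_f * d_f
    if den == 0.0:
        raise ValueError("block has no horizontal gradient")
    return num / den


# Synthetic check: build G by shifting F a quarter pixel with linear interpolation.
F = [[10.0, 20.0, 40.0, 80.0],
     [5.0, 5.0, 15.0, 35.0]]
G = [[row[x] + 0.25 * (row[x + 1] - row[x]) for x in range(3)] for row in F]
dx = estimate_dx(F, G)
```

Only the two accumulators are kept between pixels. Their bit width grows with the logarithm of the number of terms summed, which is consistent with the O(log β) space claim.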

SLIDE 17

CONCLUSION :

– No auxiliary files to manage
– No storage overhead
– Legacy decoders work

– Fraction of data needed for lineage
– All data needed to bind lineage to data
– Legacy writes detectable

– Real-time speed
– 21 writers, 4 inputs each, 8x8 blocks, 4 bits/block
– Tree extraction: 0.06 seconds

SLIDE 18

More?

SLIDE 19

DESIGN : Why not watermark?

  • Security

– Watermark protects producer
– VEIL protects consumer

  • Robustness

– Watermark is:
  ∗ Robust - survives aggressive distortion
  ∗ Fragile - localizes distortion
– Consumer not adversary, constraints irrelevant

  • Visibility

– Watermark introduces flickering, frame noise
– VEIL avoids visual distraction

SLIDE 20

DESIGN : Why not watermark?

[Continued]

  • Capacity

– Video has heterogeneous signal, noise characteristics
– Limited areas with:
  ∗ Low visual distortion
  ∗ High robustness
  ∗ High capacity
– VEIL needs large capacity

SLIDE 21

PROVENANCE : Space requirement

  • Space requirement (in KB):

Levels \ Fan-in      1       2       3       4
2                 0.09    0.14    0.19    0.24
3                 0.14    0.34    0.65    1.05
4                 0.19    0.75    2.02    4.30
5                 0.24    1.56    6.13   17.30
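
These figures can be reproduced from the field widths on the metadata format slide under one assumption: the tree is full, every leaf is a record with no inputs, and no bits are charged for the End marker. That reading is an inference from the numbers rather than something stated on the slides. The function names are illustrative.

```python
EXECUTOR_BITS = 32 + 32        # IPv4 address + user ID
SIGNATURE_BITS = 160
FILE_BITS = 32 + 32 + 32       # IPv4 address + inode + time


def tree_bits(fan_in: int, levels: int) -> int:
    """Bits for a full lineage tree whose leaves are records with no inputs."""
    leaves = fan_in ** (levels - 1)
    internal = sum(fan_in ** depth for depth in range(levels - 1))
    leaf_bits = EXECUTOR_BITS + SIGNATURE_BITS + FILE_BITS   # 320 bits
    internal_bits = leaf_bits + fan_in * FILE_BITS           # plus 96 bits per input
    return internal * internal_bits + leaves * leaf_bits


def tree_kb(fan_in: int, levels: int) -> float:
    return tree_bits(fan_in, levels) / 8 / 1024


# Print the table: one row per level count, one column per fan-in.
for levels in range(2, 6):
    print(levels, " ".join(f"{tree_kb(f, levels):6.2f}" for f in range(1, 5)))
```

With fan-in 4 and 4 levels this gives 21 internal records of 704 bits and 64 leaf records of 320 bits, 35,264 bits in total, which is about 4.30 KB.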

SLIDE 22

EVALUATION : Extraction time

  • δx extraction in C, Mac OS 10.4, 2 GHz Intel
  • Time depends on block size:

Block size   Time to compute δx (in µs)   Overhead (in sec) for 1 sec of video (640x480 resolution, 30 fps)
4x4            1.3                        0.723
8x8            5.2                        0.746
16x16         18.2                        0.648
32x32         72.2                        0.649
64x64        292.1                        0.657

SLIDE 23

EVALUATION : Embedding capacity

  • Tree - Fan-in: 4, Levels: 4, Storage: 35,264 bits
  • I frames: 5%

Bits encoded    Block size     Redundant copies (ρ)   Video length needed (in sec)
per block (α)   (β in bits)    in 1 min video         to reconstruct lineage tree
2               8x8             465                   0.13
4               8x8             931                   0.06
4               16x16           232                   0.26
6               8x8            1396                   0.04
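
The table values follow from simple arithmetic under two assumptions: the video is the 640x480, 30 fps stream used on the extraction-time slide, and the 5% of frames that are I frames carry no payload. Neither assumption is stated on this slide, and the function names are illustrative.

```python
WIDTH, HEIGHT, FPS = 640, 480, 30     # assumed, taken from the extraction-time slide
TREE_BITS = 35_264                    # fan-in 4, levels 4
I_FRAME_PERCENT = 5                   # assumed to carry no embedded bits


def redundant_copies(bits_per_block: int, block_side: int, seconds: int = 60) -> int:
    """Whole copies of the lineage tree that fit in `seconds` of video."""
    blocks_per_frame = (WIDTH // block_side) * (HEIGHT // block_side)
    usable_frames = seconds * FPS * (100 - I_FRAME_PERCENT) // 100
    return bits_per_block * blocks_per_frame * usable_frames // TREE_BITS


def seconds_for_one_copy(bits_per_block: int, block_side: int, seconds: int = 60) -> float:
    """Video length needed to carry one copy of the tree."""
    return seconds / redundant_copies(bits_per_block, block_side, seconds)


for alpha, side in [(2, 8), (4, 8), (4, 16), (6, 8)]:
    print(alpha, f"{side}x{side}", redundant_copies(alpha, side),
          f"{seconds_for_one_copy(alpha, side):.2f}")
```

For 8x8 blocks at 4 bits per block this gives 931 copies per minute, or about 0.06 seconds of video per copy, the same figure quoted for tree extraction in the conclusion.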