SLIDE 1
Certifying Video Provenance
Ashish Gehani I3P/SRI
1
SLIDE 2 INTRODUCTION : Why certify?
- Video cameras / editors ubiquitous
- Data originates from multiple sources
- Consumer trusts some producers
- Must be able to validate provenance
- Examples:
– Surveillance camera, police matches, lawyer uses
– Tank camera, classification edits, WMD analyst uses
SLIDE 3 INTRODUCTION : Why embed?
– Consistent copies
– Synchronized access
– Storage overhead
– Loose coupling → Error prone
– Administrative domain
– Transfer between hosts, operating systems
- Video Embedding of Information for Lineage (VEIL):
– Inband encoding
– Interoperable with legacy applications, libraries
– Facilitate uptake
SLIDE 4 PROVENANCE : Desired properties
– Signed elements
– Prune elements when possible
– Don’t prune when output changes
SLIDE 5
PROVENANCE : Lineage tree
[Figure: (a) a primitive operation mapping Input 1 … Input n to an Output; (b) a compound operation tree built from such primitives.]
SLIDE 6 PROVENANCE : Granularity
- (Output, Executor, Input 1, ..., Input n)
- Level - Assembler, System call, File?
- Fine → High overhead
- Coarse → False positives
- Can’t prune intra-process side-effects dynamically
SLIDE 7 PROVENANCE : Metadata format
- Primitive operation format:
[Record layout: Executor | Output | Input 1 … Input n | Signature | End, where Output and each Input is an (IP address, inode, time) identifier.]
- Executor: 32 bit IPv4 address, 32 bit user ID
- Signature: 160 bits
S = Sign_{K_E}(O, I1, ..., In)
– 32 bit IPv4 address
– 32 bit inode
– 32 bit time (Unix seconds since 1 Jan 1970)
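The record format can be sketched in Python. The slides fix only the sizes (96-bit element identifiers, 160-bit signature), not the signature scheme, so HMAC-SHA1 (which happens to emit 160 bits) stands in for the executor's key operation; a real deployment would sign with the executor's private key K_E.

```python
import hashlib
import hmac
import struct

def element_id(ipv4: int, inode: int, unix_time: int) -> bytes:
    # 96-bit element identifier: 32-bit IPv4 address, 32-bit inode, 32-bit time.
    return struct.pack(">III", ipv4, inode, unix_time)

def sign_record(executor_key: bytes, output: bytes, inputs: list) -> bytes:
    # 160-bit signature S = Sign_{K_E}(O, I1, ..., In).
    # HMAC-SHA1 is a stand-in for the unspecified signing primitive.
    return hmac.new(executor_key, output + b"".join(inputs),
                    hashlib.sha1).digest()
```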
SLIDE 8 PROVENANCE : Collection
> veil -o Output.mov -i Input1.mov Input2.mov Input3.mov
[Figure: timeline of a process execution that reads File 1 and File 2 and writes File 3; provenance is collected at each close().]
SLIDE 9 RELATED WORK :
– Single host semantics
– Secrecy → Low capacity
– Robustness → Low capacity
SLIDE 10
EMBEDDING : Interposition point
[Figure: codec pipeline. Compression: signal-to-spatial-domain conversion, frame decomposition, motion estimation, subpixel interpolation (VEIL interposition point), residue calculation, quantization, entropy coding. Decompression: entropy decoding, dequantization, frame and block reconstitution, interpolation analysis (VEIL extraction point), spatial-to-signal domain conversion.]
SLIDE 11 EMBEDDING : Overview
Metadata encoding operation
[Figure: the metadata encoding operation embeds provenance records (Executor | Output | Input 1 … Input n | Signature | End) into the video frames of the operation's output.]
SLIDE 12 EMBEDDING : Subpixel displacement
- Human visual system tolerates subpixel shift:
[Figure: an object in the original frame vs. the VEIL-interpolated frame; the outline of a block is displaced by a subpixel amount.]
SLIDE 13 EMBEDDING : Registration error compensation
[Figure: points on the surface of an object mapped by the camera onto the registration grid of three successive frames.]
- v1, v2: adjacent pixel intensities
- vi: interpolated pixel intensity, shifted by δx and compensated for the registration error δi:
- vi = v1 + (δx − δi)(v2 − v1)
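The compensated interpolation above is a one-line computation; a minimal sketch:

```python
def interpolate(v1: float, v2: float, dx: float, di: float) -> float:
    # Linear interpolation between adjacent pixel intensities v1 and v2,
    # shifted by the intended subpixel displacement dx and corrected for
    # the registration error di:  vi = v1 + (dx - di) * (v2 - v1)
    return v1 + (dx - di) * (v2 - v1)
```

With no registration error (di = 0) this reduces to plain linear interpolation between the two pixels.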
SLIDE 14
EXTRACTION : Checking lineage
Algorithm: CHECKLINEAGE(D)
  {E, S, O, I1, ..., In} ← GETROOT(D)
  OUTPUT(E)
  PE ← PKILOOKUP(E)
  if {I1, ..., In} = {} then
    VERIFYSIGNATURE(PE, S, O)
  else
    Result ← VERIFYSIGNATURE(PE, S, O | I1 | ... | In)
    if Result = TRUE then
      for i ← 1 to n do
        CHECKLINEAGE(Ii)
    else
      CheckFailed
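A sketch of the recursive check in Python. The tree encoding, the PKI lookup, and the signature primitive are all stand-ins (a node dictionary and HMAC-SHA1 here); only the control flow mirrors the algorithm.

```python
import hashlib
import hmac

def verify_signature(key: bytes, sig: bytes, data: bytes) -> bool:
    # Stand-in for VERIFYSIGNATURE: HMAC-SHA1 replaces public-key verification.
    return hmac.compare_digest(sig, hmac.new(key, data, hashlib.sha1).digest())

def check_lineage(node: dict, pki: dict) -> bool:
    # node is a hypothetical dict encoding of one provenance record:
    # {"executor": ..., "signature": ..., "output": ..., "inputs": [subtrees]}
    executor = node["executor"]                    # E
    print(executor)                                # OUTPUT(E)
    key = pki[executor]                            # PE <- PKILOOKUP(E)
    data = node["output"] + b"".join(ch["output"] for ch in node["inputs"])
    if not verify_signature(key, node["signature"], data):  # over O|I1|...|In
        return False                               # CheckFailed
    return all(check_lineage(ch, pki) for ch in node["inputs"])
```

A leaf (primitive operation) has no inputs, so its signature is checked over the output alone, matching the first branch of the algorithm.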
SLIDE 15 EXTRACTION : Interpolation estimation
- Subpixel alignment minimizes visual distance:
[Figure: visual distance plotted against the candidate subpixel displacement δx; the minimum identifies the embedded alignment.]
SLIDE 16 EXTRACTION : Calculating subpixel displacement
Theorem: δx can be calculated in O(β) time using O(log β) space, where β is the block size.

Proof sketch: Using the linear approximation F(x + δx, y) ≈ F(x, y) + δx [F(x+1, y) − F(x, y)], the mean squared error between the shifted block F and the observed block G is

  MSE(δx) = (1/β) Σ_{x=xmin}^{xmax} Σ_{y=ymin}^{ymax} [F(x + δx, y) − G(x, y)]²

Setting d/d(δx) MSE(δx) = 0 and solving:

  δx = ( Σ_{x=xmin}^{xmax} Σ_{y=ymin}^{ymax} [F(x+1, y) − F(x, y)] [G(x, y) − F(x, y)] ) / ( Σ_{x=xmin}^{xmax} Σ_{y=ymin}^{ymax} [F(x+1, y) − F(x, y)]² )

Both sums accumulate in a single pass over the β pixels of the block, and each accumulator needs only O(log β) bits, giving the stated bounds.
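The closed form can be sketched as a single pass over the block. F and G are hypothetical row-major pixel blocks; the forward difference stops one column early because it references x+1.

```python
def subpixel_displacement(F, G):
    # Closed-form delta_x minimizing the MSE between F shifted by delta_x
    # and G, under the linearization
    #   F(x + dx, y) ~ F(x, y) + dx * (F(x+1, y) - F(x, y)).
    # One pass over the block: O(beta) time, O(log beta)-bit accumulators.
    num = den = 0.0
    for y in range(len(F)):
        for x in range(len(F[0]) - 1):      # forward difference needs x+1
            dF = F[y][x + 1] - F[y][x]
            num += dF * (G[y][x] - F[y][x])
            den += dF * dF
    return num / den
```

On a synthetic intensity ramp shifted by 0.3 pixels, the estimator recovers the shift exactly, since the linearization is exact for a linear ramp.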
SLIDE 17 CONCLUSION :
– No auxiliary files to manage
– No storage overhead
– Legacy decoders work
– Fraction of data needed for lineage
– All data needed to bind lineage to data
– Legacy writes detectable
– 21 writers, 4 inputs each, 8x8 blocks, 4 bits/block
– Tree extraction: 0.06 seconds
SLIDE 18
More?
SLIDE 19 DESIGN : Why not watermark?
– Watermark protects producer
– VEIL protects consumer
– Watermark is:
∗ Robust - survives aggressive distortion
∗ Fragile - localizes distortion
– Consumer not adversary, constraints irrelevant
– Watermark introduces flickering, frame noise
– VEIL avoids visual distraction
SLIDE 20 DESIGN : Why not watermark?
[Continued]
– Video has heterogeneous signal, noise characteristics
– Limited areas with:
∗ Low visual distortion
∗ High robustness
∗ High capacity
– VEIL needs large capacity
SLIDE 21 PROVENANCE : Space requirement
- Space requirement (in KB):
Levels \ Fan-in      1       2       3       4
      2            0.09    0.14    0.19    0.24
      3            0.14    0.34    0.65    1.05
      4            0.19    0.75    2.02    4.30
      5            0.24    1.56    6.13   17.30
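The table can be reproduced from the field sizes on the metadata-format slide, assuming each record carries a 64-bit executor, 96-bit element identifiers, and a 160-bit signature, internal nodes have fan-in inputs, and leaves are primitive records with no inputs (End-marker overhead ignored); these layout assumptions are the author's, inferred from the deck.

```python
def tree_space_kb(fan_in: int, levels: int) -> float:
    # Provenance tree size in KB. Per-record bits:
    #   internal: executor(64) + output id(96) + fan_in input ids(96) + sig(160)
    #   leaf:     executor(64) + output id(96) + sig(160)
    internal = 64 + 96 + fan_in * 96 + 160
    leaf = 64 + 96 + 160
    bits = 0
    nodes = 1                              # nodes at the current level
    for level in range(1, levels + 1):
        bits += nodes * (leaf if level == levels else internal)
        nodes *= fan_in
    return bits / 8 / 1024
```

Under these assumptions the formula matches every entry of the table to two decimal places.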
SLIDE 22 EVALUATION : Extraction time
- δx extraction in C, Mac OS 10.4, 2 GHz Intel
- Time depends on block size:
Block size   Time to compute δx (µs)   Overhead (sec) per 1 sec of video (640x480, 30 fps)
   4x4               1.3                     0.723
   8x8               5.2                     0.746
  16x16             18.2                     0.648
  32x32             72.2                     0.649
  64x64            292.1                     0.657
SLIDE 23 EVALUATION : Embedding capacity
- Tree - Fan-in: 4, Levels: 4, Storage: 35,264 bits
- I frames: 5%
Bits encoded    Block size   Redundant copies (ρ)   Video length (sec) needed
per block (α)      (β)        in 1 min video        to reconstruct lineage tree
     2             8x8             465                      0.13
     4             8x8             931                      0.06
     4            16x16            232                      0.26
     6             8x8            1396                      0.04
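The capacity figures above can be reproduced under two assumptions not stated on this slide: 640x480 resolution at 30 fps (the parameters of the extraction-time slide), and that only non-I frames (95%) carry embedded data.

```python
def copies_per_minute(alpha: int, block: int, width: int = 640,
                      height: int = 480, fps: int = 30,
                      i_frame_frac: float = 0.05,
                      tree_bits: int = 35264) -> float:
    # Redundant copies (rho) of the lineage tree embeddable in one minute
    # of video; alpha = bits per block, block = block side length in pixels.
    frames = fps * 60 * (1 - i_frame_frac)         # usable frames per minute
    blocks = (width // block) * (height // block)  # blocks per frame
    return frames * blocks * alpha / tree_bits

def seconds_to_reconstruct(rho: float) -> float:
    # Video length holding one full copy of the lineage tree.
    return 60 / rho
```

With these assumptions the computed ρ values match all four rows of the table.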