


slide-1
SLIDE 1

ImageProof: Enabling Authentication for Large-Scale Image Retrieval

Shangwei Guo (1), Jianliang Xu (1), Ce Zhang (1), Cheng Xu (1), Tao Xiang (2)

(1) Department of Computer Science, Hong Kong Baptist University
(2) College of Computer Science, Chongqing University

{csswguo,xujl,cezhang,chengxu}@comp.hkbu.edu.hk, txiang@cqu.edu.cn

ICDE 2019

Guo et al. | ImageProof: Enabling Authentication for Large-Scale Image Retrieval

slide-3
SLIDE 3

Background

  • Content-based image retrieval (CBIR) is widely used in business
  • Data-as-a-Service (DaaS) enables companies to build image retrieval systems and then outsource them to cloud platforms

[Figure: system model with the image owner, the client, and the service provider; labels: "Database", "Similar images"]

  • Security threat:
      • Query result integrity is not guaranteed because of software/hardware malfunctions or attacks
  • Examples
      • Product image search
      • Medical image search

slide-6
SLIDE 6

SIFT-Based Image Retrieval

  • Detect and extract local features using the scale-invariant feature transform (SIFT) and its variants
  • Two steps
      • Bag-of-visual-words (BoVW) encoding
          • Approximate k-means (AKM) using randomized k-d trees
      • Inverted index search: search for similar images with an impact-ordered inverted index

[Figure: pipeline stages Feature Extraction, BoVW Encoding (AKM), Inverted Index Search]
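The BoVW encoding step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses exact brute-force nearest-centroid search in place of AKM with randomized k-d trees, an L2-normalized term-frequency vector, and toy 2-D features instead of real SIFT descriptors. The names `nearest_word` and `bovw_encode` are hypothetical.

```python
import math
from collections import Counter


def nearest_word(feature, codebook):
    """Index of the closest codeword (squared Euclidean distance, brute force)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((f - c) ** 2 for f, c in zip(feature, codebook[i])),
    )


def bovw_encode(features, codebook):
    """Map local features to visual words and return a sparse, L2-normalized vector."""
    counts = Counter(nearest_word(f, codebook) for f in features)
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {word: count / norm for word, count in sorted(counts.items())}


# Toy data (assumption): two visual words and three local features.
codebook = [(0.0, 0.0), (10.0, 10.0)]
features = [(1.0, 1.0), (0.0, 2.0), (9.0, 9.0)]
vector = bovw_encode(features, codebook)
print(vector)
```

Two features fall into word 0 and one into word 1, so the sparse vector has two non-zero entries. In the retrieval pipeline this vector is what the inverted index search consumes.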

slide-9
SLIDE 9

Problem Model

  • Malicious threat model
      • The service provider (SP) could return incorrect results (e.g., fake or low-ranked images)

[Figure: system model with the image owner, the client, and the service provider; labels: "Database", "Similar images"]

  • Query authentication for SIFT-based image retrieval with top-k queries
  • Challenges
      • Designing a query authentication scheme for a large and complex retrieval system is challenging in itself
      • The client usually has only limited storage, communication, and computation resources

slide-10
SLIDE 10

Problem Model

[Figure: system model with the image owner, the client, and the service provider; labels: "Database and ADS", "Similar images & VO"]

  • Our solution:
      • Using authenticated data structures (ADSs), the SP returns a verification object (VO) to prove
          • Soundness: the returned results are images that have not been tampered with
          • Completeness: the results include the k most similar images

slide-13
SLIDE 13

Our Contributions

  • Propose an efficient authentication scheme, ImageProof, for SIFT-based image retrieval with large or medium-sized codebooks
  • Two novel ADS components:
      • Merkle randomized k-d tree
      • Merkle inverted index with cuckoo filters
  • Develop several optimization techniques to further reduce the costs of both the SP and the client

slide-15
SLIDE 15

Preliminaries

  • Merkle Hash Tree
      • An authenticated binary tree that enables users to verify individual data objects without retrieving the entire database
  • Cuckoo Filter
      • An efficient data structure for approximate set membership tests
      • Two hash values per item
      • Supports the delete operation

[Merkle hash tree with nodes N1 to N7 over data objects o1 to o8]

$h_{N_7} = h(h(o_7) \mid h(o_8))$
$h_{N_2} = h(h_{N_4} \mid h_{N_5})$
$h_{N_1} = h(h_{N_2} \mid h_{N_3})$
$sig_{mht} = \mathrm{sign}(sk, h_{N_1})$

Figure 1: An example of a Merkle hash tree.
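The Merkle hash tree in Figure 1 can be sketched in a few lines. This is an illustrative sketch, not the authors' code: it assumes SHA-256 as h, plain byte concatenation for "|", and a power-of-two number of objects. The signature over the root (sig_mht) is omitted, and `build_levels`, `prove`, and `verify` are hypothetical names.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(objects):
    """levels[0] holds the leaf digests, levels[-1] holds the single root digest."""
    level = [h(o) for o in objects]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels, index):
    """Sibling digests on the path from leaf `index` up to the root."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        path.append((level[sibling], sibling < index))  # (digest, sibling is on the left)
        index //= 2
    return path


def verify(obj, path, root):
    """Recompute the root from one object and its sibling digests."""
    digest = h(obj)
    for sibling, sibling_is_left in path:
        digest = h(sibling + digest) if sibling_is_left else h(digest + sibling)
    return digest == root


objects = [f"o{i}".encode() for i in range(1, 9)]  # o1 .. o8 as in Figure 1
levels = build_levels(objects)
root = levels[-1][0]
proof = prove(levels, 6)  # proof for o7
print(verify(b"o7", proof, root), verify(b"forged", proof, root))
```

The proof for one object contains only three digests for eight objects, which is why the client never needs the whole database.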

[Cuckoo filter with slots numbered 1 to 7: an item x maps to two candidate positions h1(x) and h2(x); the slots store fingerprints such as fp_x and fp_z; insert and delete operations are shown]

Figure 2: A cuckoo filter, two hash values per item.
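A cuckoo filter with two candidate buckets per item and delete support can be sketched as below. This is a simplified illustration under assumptions of its own: SHA-256-derived hashes, 8-bit fingerprints, a power-of-two bucket count so that the XOR step maps each bucket to its alternate and back, and a class name `CuckooFilter` chosen for this example. If an insert fails after `max_kicks` relocations, this sketch drops the last evicted fingerprint.

```python
import hashlib
import random


class CuckooFilter:
    def __init__(self, num_buckets=16, bucket_size=4, max_kicks=100, seed=0):
        assert num_buckets & (num_buckets - 1) == 0, "bucket count must be a power of two"
        self.m = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(seed)

    @staticmethod
    def _hash(data: bytes) -> int:
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _fingerprint(self, item: str) -> int:
        return self._hash(b"fp:" + item.encode()) % 255 + 1  # 1..255, never zero

    def _alt(self, index: int, fp: int) -> int:
        # Applying _alt twice returns the original index.
        return index ^ (self._hash(bytes([fp])) % self.m)

    def _candidates(self, item: str):
        fp = self._fingerprint(item)
        i1 = self._hash(item.encode()) % self.m
        return fp, i1, self._alt(i1, fp)

    def insert(self, item: str) -> bool:
        fp, i1, i2 = self._candidates(item)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            j = self.rng.randrange(len(self.buckets[i]))
            fp, self.buckets[i][j] = self.buckets[i][j], fp  # evict a resident fingerprint
            i = self._alt(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        return False  # filter is considered full

    def contains(self, item: str) -> bool:
        fp, i1, i2 = self._candidates(item)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def delete(self, item: str) -> bool:
        fp, i1, i2 = self._candidates(item)
        for i in (i1, i2):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False


cf = CuckooFilter()
for image_id in ("img1", "img3", "img4"):
    cf.insert(image_id)
present_before = cf.contains("img3")
deleted = cf.delete("img3")
```

Membership tests can return false positives but never false negatives for items that are still stored, which is the property a bound-tightening use of the filter relies on.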

slide-17
SLIDE 17

Scheme Overview

  • Ensure the integrity of query processing at each step
  • Two novel ADS components:
      • Merkle randomized k-d tree
      • Merkle inverted index with cuckoo filters

[Figure: the Merkle randomized k-d tree supports authenticated BoVW encoding; the Merkle inverted index supports authenticated inverted index search]

slide-19
SLIDE 19

Merkle Randomized k-d Tree (MRKD-tree)

  • ADS
      • Internal nodes and leaf nodes

[MRKD-tree with split nodes l1 to l7, leaf objects o1 to o8 holding clusters c1 to c8, and two query points q1 and q2]

$o_8 = \{c_8, h_{\Gamma_{c_8}}\}$, $h_{o_8} = h(c_8 \mid h_{\Gamma_{c_8}})$
$h_1 = h(l_1 \mid h_2 \mid h_3)$
$h_7 = h(l_7 \mid h_{o_7} \mid h_{o_8})$

The VO for q1, q2 is a nested list built from l1, h2, l3, l6, hΓc5, hΓc6, and h7.

Figure 3: An example of the MRKD-tree and VO generation for queries q1 and q2.

  • Authenticated Query Processing
      • Given a set of feature vectors, calculate the BoVW vector
      • Generate a single verification object (VO) for all feature vectors by maximizing the use of shared tree nodes
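The node digests above suggest the following sketch of a Merkle k-d tree with VO generation and client-side checking. It is a deliberately simplified illustration, not the MRKD-tree algorithm itself: it assumes SHA-256 as h, a single tree, one query at a time, and a plain root-to-leaf descent without nearest-neighbor backtracking or node sharing across queries. `Leaf`, `Split`, `query`, and `root_from_vo` are hypothetical names.

```python
import hashlib


def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"|".join(parts)).digest()


class Leaf:
    def __init__(self, centroid, list_digest: bytes):
        self.centroid = centroid          # cluster centre c_i
        self.list_digest = list_digest    # stand-in for the inverted-list digest
        self.digest = h(repr(centroid).encode(), list_digest)


class Split:
    def __init__(self, dim, value, left, right):
        self.dim, self.value = dim, value  # split information l_i
        self.left, self.right = left, right
        self.digest = h(f"{dim}:{value}".encode(), left.digest, right.digest)


def query(node, q):
    """SP side: descend to one leaf and record the path as a nested VO."""
    if isinstance(node, Leaf):
        return node.centroid, ("leaf", node.centroid, node.list_digest)
    go_left = q[node.dim] <= node.value
    child, other = (node.left, node.right) if go_left else (node.right, node.left)
    centroid, sub_vo = query(child, q)
    return centroid, ("split", node.dim, node.value, go_left, sub_vo, other.digest)


def root_from_vo(vo, q):
    """Client side: re-check every branch decision and recompute the root digest."""
    if vo[0] == "leaf":
        _, centroid, list_digest = vo
        return h(repr(centroid).encode(), list_digest)
    _, dim, value, go_left, sub_vo, other_digest = vo
    if (q[dim] <= value) != go_left:
        raise ValueError("VO follows the wrong branch")
    sub = root_from_vo(sub_vo, q)
    label = f"{dim}:{value}".encode()
    return h(label, sub, other_digest) if go_left else h(label, other_digest, sub)


# Toy tree (assumption): four 2-D cluster centres with placeholder list digests.
centres = [(1.0, 1.0), (1.0, 8.0), (7.0, 2.0), (9.0, 9.0)]
leaves = [Leaf(c, h(f"list-{i}".encode())) for i, c in enumerate(centres)]
tree = Split(0, 5.0, Split(1, 4.0, leaves[0], leaves[1]), Split(1, 5.0, leaves[2], leaves[3]))
q = (8.0, 7.5)
centroid, vo = query(tree, q)
print(centroid, root_from_vo(vo, q) == tree.digest)
```

Subtrees that the query does not visit appear in the VO only as digests, so when several queries share a path the shared nodes need to be shipped once.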

slide-21
SLIDE 21

Merkle Inverted Index With Cuckoo Filters

  • ADS
      • Each Merkle inverted list Γci consists of five components: the associated cluster ci, the digest hΓci (computed over wci, h(Θci), and the first posting digest), the cluster weight wci, the cuckoo filter Θci, and its posting list

Table 1: An example of the Merkle inverted lists.

ci   hΓci                        wci   Θi    Posting list
c5   h(2√2 | h(Θc5) | hpos5,1)   2√2   Θc5   (1, 0.34, hpos5,1) (3, 0.26, hpos5,2) (4, 0.25, hpos5,3) ...
c6   h(√2 | h(Θc6) | hpos6,1)    √2    Θc6   (5, 0.41, hpos6,1) (8, 0.32, hpos6,2) (3, 0.28, hpos6,3) ...

  • Authenticated Query Processing
      • Find the top-k most similar images and generate the VO for the inverted index search
      • With the help of cuckoo filters, ensure the integrity of the top-k search while accessing fewer postings
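The list digest in Table 1 can be illustrated with a short sketch. The slide gives only the list-level formula h(w | h(Θ) | hpos_1); the chaining of posting digests used here, hpos_j = h(id_j | impact_j | hpos_{j+1}), is an assumption made for illustration, as are SHA-256, the `END` sentinel, the placeholder filter digest, and all function names.

```python
import hashlib
import math


def h(*parts) -> str:
    return hashlib.sha256("|".join(str(p) for p in parts).encode()).hexdigest()


END = h("end-of-list")  # sentinel digest after the last posting (assumption)


def posting_digests(postings):
    """digests[j] covers postings[j:]; built from the tail of the list."""
    digests = [END]
    for image_id, impact in reversed(postings):
        digests.append(h(image_id, impact, digests[-1]))
    digests.reverse()
    return digests


def list_digest(weight, filter_digest, first_posting_digest):
    return h(weight, filter_digest, first_posting_digest)


def verify_prefix(weight, filter_digest, prefix, next_digest, expected):
    """Client: rebuild the list digest from a prefix plus the digest of the unseen tail."""
    digest = next_digest
    for image_id, impact in reversed(prefix):
        digest = h(image_id, impact, digest)
    return list_digest(weight, filter_digest, digest) == expected


# Inverted list of c5 from Table 1, extended with two more postings.
w_c5 = 2 * math.sqrt(2)
postings_c5 = [(1, 0.34), (3, 0.26), (4, 0.25), (10, 0.17), (7, 0.11)]
filter_digest_c5 = h("serialized cuckoo filter of c5")  # placeholder value
hpos = posting_digests(postings_c5)
gamma_c5 = list_digest(w_c5, filter_digest_c5, hpos[0])

ok = verify_prefix(w_c5, filter_digest_c5, postings_c5[:2], hpos[2], gamma_c5)
bad = verify_prefix(w_c5, filter_digest_c5, [(1, 0.34), (3, 0.99)], hpos[2], gamma_c5)
print(ok, bad)
```

With such a chain the SP can reveal only the first few postings of an impact-ordered list, and the client can still confirm that nothing in that prefix was altered, reordered, or skipped.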

slide-24
SLIDE 24

Merkle Inverted Index With Cuckoo Filters

  • Main Idea
      • Termination conditions:
          1. $s_k^L \ge S^U(Q, I)$, the upper bound of the similarity scores of the popped images, where $s_k^L$ is the lower bound of the k-th highest similarity score
          2. $s_k^L \ge$ the upper bound of the similarity scores of the images not yet popped

ci   hΓci                        wci   Θi    Posting list
c5   h(2√2 | h(Θc5) | hpos5,1)   2√2   Θc5   (1, 0.34, hpos5,1) (3, 0.26, hpos5,2) (4, 0.25, hpos5,3) (10, 0.17, hpos5,4) (7, 0.11, hpos5,5) ...
c6   h(√2 | h(Θc6) | hpos6,1)    √2    Θc6   (5, 0.41, hpos6,1) (8, 0.32, hpos6,2) (3, 0.28, hpos6,3) (6, 0.25, hpos6,4) (4, 0.10, hpos6,5) ...

  • Estimate the similarity bounds using the cuckoo filters

Table 2: Example: the postings for S(Q, 5).

                        S^U(Q, 5)                                 S^L(Q, 5)
Without cuckoo filter   (5, 0.41, hpos6,1), (4, 0.25, hpos5,3)    (5, 0.41, hpos6,1)
With cuckoo filter      (5, 0.41, hpos6,1)                        (5, 0.41, hpos6,1)
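The effect shown in Table 2 can be reproduced with a small sketch. Several simplifications are assumptions of this example rather than the slides' definitions: the similarity score is taken as the plain sum of impacts over the query's clusters (query-side weights are ignored), Python sets stand in for the cuckoo filters, and the state is the one implied by Table 2, with two postings popped from c5 and one from c6.

```python
lists = {
    "c5": [(1, 0.34), (3, 0.26), (4, 0.25), (10, 0.17), (7, 0.11)],
    "c6": [(5, 0.41), (8, 0.32), (3, 0.28), (6, 0.25), (4, 0.10)],
}
# Stand-in for the cuckoo filters: exact sets of image ids per cluster.
filters = {c: {img for img, _ in postings} for c, postings in lists.items()}


def bounds(image, popped, frontier, filters=None):
    """Lower and upper bounds of an image's score.

    popped:   {cluster: {image: impact}} for postings already popped
    frontier: {cluster: impact of the next unpopped posting}
    """
    lower = sum(seen.get(image, 0.0) for seen in popped.values())
    upper = lower
    for cluster, seen in popped.items():
        if image in seen:
            continue  # exact contribution already counted
        if filters is not None and image not in filters[cluster]:
            continue  # filter says the image cannot appear in this list
        upper += frontier[cluster]  # it could still appear at or below the frontier
    return lower, upper


popped = {"c5": dict(lists["c5"][:2]), "c6": dict(lists["c6"][:1])}
frontier = {"c5": lists["c5"][2][1], "c6": lists["c6"][1][1]}

without_filter = bounds(5, popped, frontier)
with_filter = bounds(5, popped, frontier, filters)
print(without_filter, with_filter)
```

Without the filter, the upper bound for image 5 must include the frontier posting of c5 (impact 0.25); with the filter, image 5 is known to be absent from c5, so the upper bound collapses onto the lower bound. A false positive in a real cuckoo filter would only leave the bound looser, never make it unsound, because stored items are never reported as absent.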

slide-27
SLIDE 27

ImageProof

  • ADS Generation
      • Build Merkle inverted lists {Γci} and MRKD-trees {Ti}
  • Authenticated Query Processing
      • Search for the top-k images and generate the VOs for both the BoVW encoding and the inverted index search
      • Send the VOs, together with the top-k results, to the client
  • Result Verification
      • Check the integrity of image retrieval
      • Verify the integrity of raw image data

[ADS overview: n_t MRKD-trees over the clusters, the Merkle inverted lists Γc1, Γc2, Γc3, ..., ΓcnC, and the BoVW vector]

$\mathrm{sign}(sk, h(h_{Root_1} \mid h_{Root_2} \mid \cdots \mid h_{Root_{n_t}}))$

Figure 4: An overview of ADSs for ImageProof.
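The value that is signed in Figure 4 is a single digest over all MRKD-tree roots. The sketch below shows only that composition and the client-side comparison. SHA-256, the "|" byte separator, the placeholder root values, and the name `combined_root_digest` are assumptions; the signature with sk is left out because it needs a digital-signature library outside the standard library.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def combined_root_digest(root_digests):
    """Digest over the concatenated roots of all MRKD-trees; this is what the owner signs."""
    return h(b"|".join(root_digests))


# Owner side: placeholder root digests for n_t = 3 trees.
owner_roots = [h(f"tree-{i}".encode()) for i in range(3)]
published = combined_root_digest(owner_roots)

# Client side: the roots recomputed from the VOs must reproduce the published digest.
recomputed_ok = combined_root_digest(list(owner_roots)) == published
forged_roots = [owner_roots[0], h(b"tampered tree"), owner_roots[2]]
recomputed_forged = combined_root_digest(forged_roots) == published
print(recomputed_ok, recomputed_forged)
```

Because one signature covers every tree root, the client has to verify only a single signature per query, whichever trees the VO touches.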

slide-29
SLIDE 29

Optimization

  • Compressing nearest neighbor candidates

[Figure: candidates and nearest neighbors before and after the optimization]

  • Frequency-grouped inverted index

Component                          Value
ci                                 c5
hΓ^f_ci                            h(2√2 | h(Θc5) | hpos^f_5,1)
wci                                2√2
Θci                                Θc5
Posting list (original)            (1, 0.34, hpos5,1) (3, 0.26, hpos5,2) (4, 0.25, hpos5,3) ...
Posting list (frequency-grouped)   (4, (1, 33.3; 10, 66.6), hpos^f_5,1) (5, (3, 54.4), hpos^f_5,2) (3, (4, 33.9; 7, 77.1; 2, 94.3), hpos^f_5,3) ...
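The frequency-grouped posting list can be rebuilt from per-image entries with a short sketch. The slide does not define the second number in each pair; this example assumes it is a per-image normalization value and that impact = wci × frequency / norm, an interpretation that reproduces the impacts 0.34, 0.26, 0.25, 0.17, and 0.11 of the original list. The function names and the group ordering rule (by the best impact in each group) are likewise assumptions.

```python
import math
from collections import defaultdict

W_C5 = 2 * math.sqrt(2)  # cluster weight of c5 from the table

# (image id, frequency, norm) for cluster c5, taken from the grouped posting list.
entries = [(1, 4, 33.3), (10, 4, 66.6), (3, 5, 54.4), (4, 3, 33.9), (7, 3, 77.1), (2, 3, 94.3)]


def impact(weight, freq, norm):
    return weight * freq / norm


def group_by_frequency(entries, weight):
    """Group images that share a frequency; order groups by their best impact."""
    groups = defaultdict(list)
    for image_id, freq, norm in entries:
        groups[freq].append((image_id, norm))
    grouped = [
        (freq, sorted(members, key=lambda m: m[1]))  # smaller norm means larger impact
        for freq, members in groups.items()
    ]
    grouped.sort(key=lambda g: impact(weight, g[0], g[1][0][1]), reverse=True)
    return grouped


grouped = group_by_frequency(entries, W_C5)
for freq, members in grouped:
    print(freq, [(img, round(impact(W_C5, freq, norm), 2)) for img, norm in members])
```

Under this reading, each group stores the frequency once and needs one posting digest per group rather than one per image.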

slide-30
SLIDE 30

Performance Evaluation

  • Experimental Setup
      • Dataset: MirFlickr1M
      • Algorithms
          • Baseline: combines the proposed MRKD-trees without node sharing and the authenticated inverted index search from PVLDB 2008
          • ImageProof: the proposed scheme
          • Optimized: the optimized ImageProof

slide-31
SLIDE 31

BoVW Performance

[Three panels plotted against the number of feature vectors (400 to 1000): SP CPU time (s), client CPU time (s), and VO size (MB) with a ratio axis (0 to 1). Series: Without Sharing, MRKDSearch, Optimized.]

Figure 5: BoVW performance as the number of feature vectors increases.

slide-32
SLIDE 32

Overall Performance

[Three panels plotted against the dataset size (0.4 to 1.0 million images): SP CPU time (s), client CPU time (s), and VO size (MB). Series: Baseline, ImageProof, Optimized (BoVW), Optimized (Both).]

Figure 6: Overall performance as dataset size increases.

slide-33
SLIDE 33

Summary

  • Focus on the query authentication problem in SIFT-based image retrieval
  • Two authenticated data structures (ADSs), one for BoVW encoding and one for inverted index search
  • Extensive experiments on a real-world image dataset

slide-34
SLIDE 34

Thanks

Q&A