Asymmetric Caching: Improved Network Deduplication for Mobile Devices (PowerPoint presentation)



SLIDE 1

Asymmetric Caching: Improved Network Deduplication for Mobile Devices

Shruti Sanadhya,¹ Raghupathy Sivakumar,¹ Kyu-Han Kim,² Paul Congdon,² Sriram Lakshmanan,¹ Jatinder P. Singh³

¹ Georgia Institute of Technology, Atlanta, GA, USA  ² HP Labs, Palo Alto, CA, USA  ³ Xerox PARC, Palo Alto, CA, USA


SLIDE 2
  • Network traffic has a lot of redundancy
    – 20% of HTTP content accessed on smartphones is redundant¹
  • Network deduplication (dedup) leverages this redundancy to conserve network bandwidth

¹ Qian et al., “Web Caching on Smartphones: Ideal vs. Reality”, MobiSys 2012


Introduction


[Diagram: the sender's data is split into chunks C1, C2, C3 by Rabin fingerprinting and hashed to H1, H2, H3. The dedup source compresses the stream by replacing the already-cached chunk C2 with its hash H2; the dedup destination (mobile) uses its cache to inflate the packet back to C1, C2, C3. Caches sit at the SGSN and at the mobile.]
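The chunk-and-hash pipeline in this diagram can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: fixed-size chunking stands in for Rabin fingerprinting, and SHA-1 stands in for whatever hash function the system actually uses.

```python
import hashlib

CHUNK = 64  # illustrative chunk size; real dedup uses Rabin fingerprinting
            # to find content-defined chunk boundaries

def chunks(data: bytes):
    # Fixed-size chunking stands in for Rabin fingerprinting here.
    return [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]

def dedup_encode(data: bytes, cache: dict) -> list:
    """At the dedup source: replace already-cached chunks with their hashes."""
    out = []
    for c in chunks(data):
        h = hashlib.sha1(c).digest()
        if h in cache:
            out.append(("hash", h))      # redundant chunk: send the hash only
        else:
            cache[h] = c                 # new chunk: cache it and send the bytes
            out.append(("raw", c))
    return out

def dedup_decode(tokens: list, cache: dict) -> bytes:
    """At the dedup destination: inflate the token stream back into bytes."""
    data = b""
    for kind, payload in tokens:
        if kind == "hash":
            data += cache[payload]       # look the chunk up in the mobile cache
        else:
            cache[hashlib.sha1(payload).digest()] = payload
            data += payload
    return data
```

A second transfer of the same content is then encoded entirely as hashes, which is where the bandwidth saving comes from.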

SLIDE 3
  • What happens when the mobile cache is more populated than the cache at the dedup source? How can all the past cached information at the mobile be leveraged for dedup by any given dedup source?

The Asymmetry Problem


[Diagram: the mobile cache at the dedup destination holds many hash–chunk pairs (H2–C2 through H8–C8), while the regular cache at the dedup source holds only H2–C2.]

SLIDE 4
  • Multi-homed devices

Motivational Scenarios


[Diagram: a multi-homed mobile device reaches separate caches through a WiFi access point and a 3G base station (BS).]

SLIDE 5
  • Multi-homed devices
  • Resource pooling

    – BS: Base Station
    – RNC: Radio Network Controller
    – SGSN: Serving GPRS Support Node

Motivational Scenarios


[Diagram: resource pooling, where the mobile device's traffic can traverse different BS/RNC/SGSN paths, each with its own cache.]

SLIDE 6
  • Multi-homed devices
  • Resource pooling
  • Memory scalability

    – BS: Base Station
    – RNC: Radio Network Controller
    – SGSN: Serving GPRS Support Node

Motivational Scenarios


[Diagram: memory scalability, with caches at every BS, RNC, and SGSN in the hierarchy.]

SLIDE 7

Scope and Goals

  • Scope
    – Laptops/smartphones using 3G/WiFi
    – Conserving cellular bandwidth
    – Downstream and unencrypted traffic

  • Goals
    – Overall efficiency: use downstream and upstream bandwidth more efficiently
    – Application agnostic: applicable to any application
    – Limited overheads: deployable computational and memory complexity


SLIDE 8
  • The mobile cache is more populated than the dedup source's cache
  • On receiving downstream traffic, the mobile selectively advertises portions of its cache to the dedup source
  • The dedup source also maintains a feedback cache
  • Both the regular and feedback caches are used for dedup

Asymmetric Caching - Overview


[Diagram: the mobile (dedup destination) sends feedback hashes, e.g. H4, from its cache (H2–C2 through H5–C5); the dedup source stores them in a feedback cache alongside its regular cache.]

SLIDE 9
  • Feedback is sent reactively
  • Feedback is sent only when there is downstream traffic
  • The feedback sent is specific to the ongoing traffic

When is feedback sent?


[Diagram: feedback flows from the dedup destination to the dedup source only in response to downstream traffic.]

SLIDE 10
  • Hashes at the dedup destination can be organized by:
    – Order of arrival
    – Same flow (Src IP, Dest IP, Src Port, Dest Port)
    – Same object (HTML, JPEG or CSS)
  • Objects help in effectively matching new and old content
  • Flowlets are an application-agnostic estimate of objects

Where from is feedback selected?


[Diagram: the same ten hashes H1–H10 grouped three ways: by order of arrival, by flow, and by object.]
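As a small illustration of the per-flow organization, arriving chunk hashes can be indexed under their flow's 4-tuple so that feedback can later be drawn from the same (or a similar) flow. The function name and packet format here are hypothetical, not the paper's interface:

```python
from collections import defaultdict

def index_by_flow(packets):
    """Group chunk hashes by flow 4-tuple.

    packets: iterable of (src_ip, dst_ip, src_port, dst_port, chunk_hash).
    Returns a dict mapping each flow 4-tuple to its ordered hash list.
    """
    flows = defaultdict(list)
    for src_ip, dst_ip, sport, dport, h in packets:
        flows[(src_ip, dst_ip, sport, dport)].append(h)
    return flows
```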

SLIDE 11
  • The sequence of bytes in a flow is a time series
  • Flowlets are piecewise-stationary segments of a flow
  • Check for a flowlet boundary at the start of each packet
  • Consider the byte series B[0:m] (1st packet), B[m+1:n] (2nd packet) and B[0:n] as autoregressive processes of order p:
    B_i = Σ_{1<=j<=p} a_j B_{i-j} + σε, where ε is white noise
  • d[0:m:n] = gain(B[0:n]) – gain(B[0:m]) – gain(B[m+1:n])
    This is the gain in noise power when B[0:n] forms one flowlet instead of two: B[0:m] and B[m+1:n]
  • If d[0:m:n] > d_thresh, then a flowlet boundary exists at m

How are flowlets extracted?


[Diagram: byte series B0, B1, …, Bm, Bm+1, …, Bn split into the candidate segments B[0:m] and B[m+1:n] within B[0:n].]
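One plausible reading of this boundary test, in the style of generalized-likelihood-ratio segmentation: fit an AR(p) model to each segment and to their concatenation, score each fit by segment length times the log of the residual noise power, and declare a boundary when d[0:m:n] exceeds d_thresh. The least-squares fitting choice, the model order, and the threshold value below are assumptions for illustration, not the paper's code:

```python
import numpy as np

P = 2            # AR model order p (illustrative choice)
D_THRESH = 1.0   # d_thresh; in practice tuned per trace (assumed value)

def ar_noise_power(x, p=P):
    """Least-squares AR(p) fit; returns the residual (white-noise) power."""
    x = np.asarray(x, dtype=float)
    if len(x) <= p + 1:
        return 1e-12            # too short to fit; treat as (near) noiseless
    y = x[p:]                   # predict x[i] from x[i-1], ..., x[i-p]
    X = np.column_stack([x[p - 1 - j: len(x) - 1 - j] for j in range(p)])
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ a
    return max(float(np.mean(resid ** 2)), 1e-12)  # floor avoids log(0)

def gain(x):
    # Segment score: length times log noise power (GLR-style).
    return len(x) * np.log(ar_noise_power(x))

def is_flowlet_boundary(first_pkt, second_pkt, d_thresh=D_THRESH):
    """d[0:m:n] = gain(B[0:n]) - gain(B[0:m]) - gain(B[m+1:n])."""
    joint = list(first_pkt) + list(second_pkt)
    d = gain(joint) - gain(first_pkt) - gain(second_pkt)
    return bool(d > d_thresh)
```

Two packets drawn from the same stationary pattern fit one joint AR model as well as two separate ones (d near zero), while a change in the byte statistics makes the joint fit pay a large residual penalty, pushing d above the threshold.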

SLIDE 12
  • Find best matching past flowlet

How is feedback selected?


SLIDE 13
  • Find best matching past flowlet

How is feedback selected?


SLIDE 14
  • Find best matching past flowlet

How is feedback selected?


SLIDE 15

  • Find best matching past flowlet

How is feedback selected?

[Diagram: the ongoing flow's hashes H1, H2 are looked up against the past flowlets F1: H1, H2, H4, H5, H6, H7, H8, H9, H10, H11, H12, …; F2: H2, H5, H10, …; F3: H5, H8, H11, H12, …. "Last hash matched" marks H2.]

SLIDE 16

  • Find best matching past flowlet

How is feedback selected?

SLIDE 17

  • Find best matching past flowlet
    – Flowlet 1 (F1) is best matched

How is feedback selected?

SLIDE 18

  • Find best matching past flowlet
    – Flowlet 1 (F1) is best matched
  • Find start of next feedback in the best matching flowlet

How is feedback selected?

SLIDE 19

  • Find best matching past flowlet
    – Flowlet 1 (F1) is best matched
  • Find start of next feedback in the best matching flowlet

How is feedback selected?

[Diagram: best matching past flowlet F1 = H1, H2, H4, H5, H6, H7, H8, H9, H10, H11, H12, H13, …; "last hash matched" points at H2.]

SLIDE 20

  • Find best matching past flowlet
    – Flowlet 1 (F1) is best matched
  • Find start of next feedback in the best matching flowlet

How is feedback selected?

SLIDE 21

  • Find best matching past flowlet
    – Flowlet 1 (F1) is best matched
  • Find start of next feedback in the best matching flowlet

How is feedback selected?

[Diagram: in F1, both "last hash matched" (H2) and "last hash advertised" are marked.]

SLIDE 22

  • Find best matching past flowlet
    – Flowlet 1 (F1) is best matched
  • Find start of next feedback in the best matching flowlet
    – δ: temporal offset

How is feedback selected?

[Diagram: the temporal offset δ separates the "last hash matched" and "last hash advertised" markers in F1.]

SLIDE 23

  • Find best matching past flowlet
    – Flowlet 1 (F1) is best matched
  • Find start of next feedback in the best matching flowlet
    – δ: temporal offset

How is feedback selected?

[Diagram: the "start of next feedback" lies δ past the later of "last hash matched" and "last hash advertised" in F1.]
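The selection walk on the last few slides can be sketched as follows. The function names, the overlap-count matching score, and the delta/size parameters are illustrative assumptions, not the paper's interface:

```python
def best_matching_flowlet(current_hashes, past_flowlets):
    """Pick the past flowlet sharing the most hashes with the ongoing flow.

    past_flowlets: dict mapping flowlet name -> ordered list of hashes.
    """
    current = set(current_hashes)
    return max(past_flowlets,
               key=lambda name: len(current & set(past_flowlets[name])))

def next_feedback(current_hashes, flowlet, last_advertised=-1, delta=0, size=4):
    """Advertise hashes starting just past the later of the last match and
    the last hash already advertised, shifted by the temporal offset delta."""
    last_match = max((flowlet.index(h) for h in current_hashes if h in flowlet),
                     default=-1)
    start = max(last_match, last_advertised) + 1 + delta
    return flowlet[start:start + size]

# With the flowlets from the slides, F1 wins (it shares both H1 and H2),
# and the next feedback starts right after the last matched hash H2:
flowlets = {"F1": ["H1", "H2", "H4", "H5", "H6", "H7",
                   "H8", "H9", "H10", "H11", "H12"],
            "F2": ["H2", "H5", "H10"],
            "F3": ["H5", "H8", "H11", "H12"]}
best = best_matching_flowlet(["H1", "H2"], flowlets)    # -> "F1"
feedback = next_feedback(["H1", "H2"], flowlets[best])  # -> H4, H5, H6, H7
```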

SLIDE 24
  • The dedup source maintains a feedback cache along with the regular cache of baseline dedup

How is the feedback used?

[Diagram: the dedup source with an empty regular cache and an empty feedback cache.]

SLIDE 25
  • The dedup source maintains a feedback cache along with the regular cache of baseline dedup
  • The regular cache is populated by downstream data

How is the feedback used?

[Diagram: the regular cache now holds H1, H2.]

SLIDE 26
  • The dedup source maintains a feedback cache along with the regular cache of baseline dedup
  • The regular cache is populated by downstream data

How is the feedback used?

SLIDE 27
  • The dedup source maintains a feedback cache along with the regular cache of baseline dedup
  • The regular cache is populated by downstream data
  • Feedback hashes are inserted in the feedback cache

How is the feedback used?

[Diagram: the feedback cache holds H3, H4; the regular cache holds H1, H2.]

SLIDE 28
  • The dedup source maintains a feedback cache along with the regular cache of baseline dedup
  • The regular cache is populated by downstream data
  • Feedback hashes are inserted in the feedback cache
  • Every downstream packet is deduped using both the regular and feedback caches

How is the feedback used?

[Diagram: lookups consult both caches at the dedup source (H1, H2 in the regular cache; H3, H4 in the feedback cache).]
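The lookup step on this slide can be sketched as below (a hypothetical interface, with SHA-1 as a stand-in hash). Note that the feedback cache only needs to hold hashes: the dedup source never saw those bytes, but the mobile that advertised them holds the content and can inflate the packet:

```python
import hashlib

def encode_chunk(chunk, regular_cache, feedback_hashes):
    """Dedup one downstream chunk against BOTH caches at the dedup source.

    regular_cache:   dict hash -> bytes, populated by downstream data
    feedback_hashes: set of hashes advertised by the mobile
    """
    h = hashlib.sha1(chunk).digest()
    if h in regular_cache or h in feedback_hashes:
        return ("hash", h)       # hit in either cache: send the hash only
    regular_cache[h] = chunk     # miss: cache the downstream data
    return ("raw", chunk)
```

A chunk the source has never transmitted can still be compressed away if the mobile advertised its hash, which is exactly what lets the asymmetry be exploited.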

SLIDE 29

Design Summary

  • When is the feedback sent? Reactively
  • Where from is the feedback chosen? Flowlets at the dedup destination
  • How are flowlets extracted? Stationarity properties
  • How is the feedback selected? Best matching flowlet and pointers into past flowlets
  • How is the feedback used? Stored in the feedback cache for dedup

SLIDE 30
  • Data collection
    – 25 laptop and 5 smartphone users over 3 months, giving 26GB of unsecured downlink data
    – WiFi as well as 3G networks
    – Packet sniffing through Wireshark and tcpdump
  • Trace analysis
    – Custom analyzer implemented in Python
    – Mimic mobility by splitting each trace into two halves: past and present
    – The past trace populates the initial cache at the dedup destination; this is the data remembered from previous network access
    – 30 random connections from the present create the ongoing traffic
    – Dedup is performed using asymmetric caching

Trace Based Analysis

SLIDE 31
  • Redundancy identified

Trace Analysis Results - I

Asymmetric caching leverages a significant portion of the achievable redundancy

[Chart: redundancy identified (%) = (# redundant bytes found by asymmetric caching / actual # redundant bytes) × 100, with averages shown for asymmetric and symmetric caching.]

SLIDE 32
  • Feedback efficiency

Trace Analysis Results - II

Asymmetric caching generates efficient and relevant feedback

[Chart: feedback efficiency = # bytes saved downstream / # bytes sent upstream, plus the split of total hits across the caches at the dedup source.]

SLIDE 33
  • Network layer approaches
    – Spring et al., “A protocol-independent technique for eliminating redundant network traffic”, SIGCOMM 2000
    – Aggarwal et al., “EndRE: an end-system redundancy elimination service for enterprises”, NSDI 2010
    – Shen et al., “REfactor-ing content overhearing to improve wireless performance”, MobiCom 2011
  • Transport layer approaches
    – Zohar et al., “The power of prediction: cloud bandwidth and cost reduction”, SIGCOMM 2011
  • Application layer approaches
    – Web browser caches and proxies
    – Content Distribution Networks (CDNs)

Related Work

SLIDE 34
  • A dedup strategy that leverages past data remembered on mobile devices to perform dedup at any dedup source
  • Application-agnostic estimation of the different objects in a flow using the stationarity properties of different content
  • Trace analysis of 30 users shows that asymmetric caching:
    – Leverages 89% of the achievable redundancy
    – Gives 6x feedback efficiency
  • Prototype implementation on a Linux desktop and an Android smartphone with deployable overheads
  • Future work:
    – Upstream dedup, i.e. reducing redundant content sent upstream
    – Extending dedup to end-to-end encrypted traffic
    – Studying the energy impact of asymmetric caching on mobile devices

Conclusion and Future Work

SLIDE 35

Questions?
