Application-specific Compression for Remote Visualization of Genomics Applications



slide-1
SLIDE 1

Application-specific Compression for Remote Visualization of Genomics Applications

Lars Ailo Bongo, Kai Li, Olga Troyanskaya, Tore Larsen and Grant Wallace

slide-2
SLIDE 2

Outline

  • Motivation
      • Genomics applications
      • WAN challenges
  • Compression
  • Methodology
  • Compression results
  • System
  • Conclusion and future work

slide-3
SLIDE 3

Functional Genomics

  • Describe the function and interaction of genes.
  • Search for patterns in microarray data.
  • Hundreds of measurements for thousands of genomes.
  • Visualizations are important.

slide-4
SLIDE 4

Genomic Applications

  • Example screenshots

slide-5
SLIDE 5

Remote Collaboration

  • Important.
      • "The sequence of the human genome," by J. Craig Venter and 284 others, Science, 291(5507):1304-51, 16 February 2001.
  • Challenges:
      • Performance: bandwidth and latency.
      • Privacy.
      • Security.
      • Ease of use.

slide-6
SLIDE 6

Goal

[Figure: diagram labeled "WAN"]

slide-7
SLIDE 7

Thin-client Remote Visualization

  • Only share pixels
      • Put rectangle of pixel data at a given x, y position
  • Examples
      • VNC
      • Microsoft Remote Desktop
      • Microsoft Livemeeting
  • Advantages:
      • Only share visualizations, not raw data.
      • Very simple clients: portable
      • Thick servers: easy data management
  • Disadvantage
      • Low performance
      • High bandwidth requirements
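The "put rectangle of pixel data at a given x, y position" operation can be illustrated with a minimal sketch. The `RectUpdate` type and `apply_update` function are hypothetical names chosen for this example; they are not taken from VNC or any other product listed above.

```python
from dataclasses import dataclass


@dataclass
class RectUpdate:
    """One thin-client update: a block of pixels and where to put it."""
    x: int
    y: int
    width: int
    height: int
    pixels: list  # `height` rows, each a list of `width` pixel values


def apply_update(framebuffer, update):
    """Copy the rectangle into the client's framebuffer in place."""
    for row in range(update.height):
        target = framebuffer[update.y + row]
        target[update.x:update.x + update.width] = update.pixels[row]


# A 6x4 framebuffer receives a 3x2 rectangle at x=2, y=1.
fb = [[0] * 6 for _ in range(4)]
apply_update(fb, RectUpdate(x=2, y=1, width=3, height=2,
                            pixels=[[7, 7, 7], [8, 8, 8]]))
```

The client needs no knowledge of the application or its data, which is why such clients can stay very simple.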

slide-8
SLIDE 8

Bandwidth Requirements

slide-9
SLIDE 9

Outline

  • Motivation
  • Compression
      • Lossless compression algorithms
      • Rabin fingerprints
      • 2D anchoring schemes
  • Methodology
  • Compression results
  • System
  • Conclusion and future work

slide-10
SLIDE 10

Compression

  • Lossy
      • Update frequency reduction
      • Color reduction
      • JPEG (frequency downsampling)
  • Lossless
      • Diff
      • Run-length encoding (RLE)
      • Fingerprinting (FP)
  • Our approach: diff + FP + RLE

slide-11
SLIDE 11

Diff

  • Only send what has been changed since last update
  • VNC does this: works well for text editing
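A minimal sketch of the diff idea, assuming frames are lists of pixel rows and that changes are detected per fixed-size tile. The `changed_tiles` name and the 2x2 tile size are illustrative assumptions, not VNC's actual implementation.

```python
def changed_tiles(prev, curr, tile=2):
    """Return (x, y, block) for every tile that differs between two frames."""
    updates = []
    for y in range(0, len(curr), tile):
        for x in range(0, len(curr[0]), tile):
            old = [row[x:x + tile] for row in prev[y:y + tile]]
            new = [row[x:x + tile] for row in curr[y:y + tile]]
            if old != new:
                updates.append((x, y, new))
    return updates


# One pixel changes in a 4x4 frame, so only one 2x2 tile is sent.
prev = [[0] * 4 for _ in range(4)]
curr = [row[:] for row in prev]
curr[0][3] = 1
updates = changed_tiles(prev, curr)
```

For small localized edits such as typing, only the tiles under the cursor change, which matches the observation that diff works well for text editing.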

slide-12
SLIDE 12

Diff (2)

  • Problem: how to detect what has been changed?
  • Scrolling only moves pixels.
  • Scrolling important for Genomics applications.
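The scrolling problem can be made concrete with a small sketch. It compares a position-based diff with a content-based lookup on a frame that has scrolled up by one row; the function names and the tiny 5-row frame are assumptions made for illustration.

```python
def changed_rows(prev, curr):
    """Position-based diff: rows whose pixels differ at the same y."""
    return [y for y, (old, new) in enumerate(zip(prev, curr)) if old != new]


def unseen_rows(prev, curr):
    """Content-based view: rows of curr that appear nowhere in prev."""
    seen = {tuple(row) for row in prev}
    return [y for y, row in enumerate(curr) if tuple(row) not in seen]


# Each row has distinct content; scrolling up by one row reveals one new row.
prev = [[r] * 4 for r in range(5)]
curr = prev[1:] + [[5] * 4]

positional = changed_rows(prev, curr)   # every row differs by position
new_content = unseen_rows(prev, curr)   # only the bottom row is new data
```

A position-based diff would resend the whole frame here, although only one row carries data the client has not already received.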

slide-13
SLIDE 13

Bits Changed Between Synchronization Events

slide-14
SLIDE 14

Run-length encoding (RLE)

  • Zlib
      • DEFLATE = LZ77 + Huffman
      • Example:
          • AAAAABBBBCCCDDE → 5*A 4*B 3*C DDE
          • A = 10001001 → A = 01
  • VNC Hextile
      • Raw pixels and encoded pixels
      • Rectangles with a single color
      • Rectangles with same color as previous
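The run-length example on this slide can be reproduced with a short sketch. The rule that only runs of three or more characters are replaced is an assumption inferred from the slide's output (where "DDE" stays literal), and the `n*c` token format is for illustration only: it is ambiguous for input that itself contains digits or `*`. The second part uses Python's standard `zlib` module for a DEFLATE round trip.

```python
import zlib
from itertools import groupby


def rle_encode(text, min_run=3):
    """Replace runs of at least min_run equal characters with 'n*c' tokens."""
    tokens = []
    literal = ""
    for char, group in groupby(text):
        count = len(list(group))
        if count >= min_run:
            if literal:              # flush pending literal characters first
                tokens.append(literal)
                literal = ""
            tokens.append(f"{count}*{char}")
        else:
            literal += char * count  # short runs are cheaper left as-is
    if literal:
        tokens.append(literal)
    return " ".join(tokens)


encoded = rle_encode("AAAAABBBBCCCDDE")

# DEFLATE (LZ77 + Huffman) round trip on highly repetitive input.
raw = b"AAAAABBBBCCCDDE" * 100
packed = zlib.compress(raw)
restored = zlib.decompress(packed)
```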

slide-15
SLIDE 15

Fingerprinting - Example

  • All work and no play makes Jack a dull boy. All work and no play makes Jack a dull boy. All work and no play makes Jack a dull boy. All work and no play makes Jack a dull boy. All work and no play makes Jack a dull boy. All work and no play makes Jack a dull boy. All work and no play makes Jack a dull boy. All work and no play makes Jack a dull boy.

slide-16
SLIDE 16

Select Anchor Points

  • All work and no play makes Jack a dull boy. All work and no play makes Jack a dull boy. All work and no play makes Jack a dull boy. All work and no play makes Jack a dull boy. All work and no play makes Jack a dull boy. All work and no play makes Jack a dull boy. All work and no play makes Jack a dull boy. All work and no play makes Jack a dull boy.

slide-17
SLIDE 17

Calculate Hash for Regions

  • hash("All work and no play makes Jack a dull boy. ") → 0x12ad82b3
  • Send:
      • (0x12ad82b3, "All work and no play makes Jack a dull boy. ")
      • 0x12ad82b3
      • 0x12ad82b3
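The send sequence on this slide (hash plus data the first time, hash alone afterwards) can be sketched as a pair of caches. The `Sender` and `Receiver` classes are hypothetical, and SHA-1 from Python's `hashlib` stands in for whatever hash produced the 0x12ad82b3 value shown on the slide.

```python
import hashlib


class Sender:
    """Remembers which regions the receiver already holds."""

    def __init__(self):
        self.sent = set()

    def encode(self, region):
        key = hashlib.sha1(region).hexdigest()
        if key in self.sent:
            return (key, None)      # receiver has it: send the hash only
        self.sent.add(key)
        return (key, region)        # first occurrence: send hash and data


class Receiver:
    """Rebuilds regions from its cache when only a hash arrives."""

    def __init__(self):
        self.cache = {}

    def decode(self, message):
        key, region = message
        if region is not None:
            self.cache[key] = region
        return self.cache[key]


sentence = b"All work and no play makes Jack a dull boy. "
sender, receiver = Sender(), Receiver()
messages = [sender.encode(sentence) for _ in range(3)]
received = b"".join(receiver.decode(m) for m in messages)
```

After the first message, each repetition costs one hash value on the wire instead of the full region.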

slide-18
SLIDE 18

Rabin Fingerprints

  • Fast sliding window algorithm
  • All work and no play makes Jack a dull boy.

0x83af

slide-19
SLIDE 19

Rabin Fingerprints

  • Fast sliding window algorithm
  • All work and no play makes Jack a dull boy.

0x83af, 0x3241

slide-20
SLIDE 20

Rabin Fingerprints

  • Fast sliding window algorithm
  • All work and no play makes Jack a dull boy.

0x83af, 0x3241, 0x31fa
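The sliding-window idea can be sketched with a rolling hash. Note the substitution: true Rabin fingerprints use polynomial arithmetic over GF(2), whereas this sketch uses the simpler Rabin-Karp style polynomial hash modulo a prime. The window size, base and modulus are arbitrary illustrative choices. What it shares with the real algorithm is the property that each new fingerprint is derived from the previous one in constant time.

```python
BASE = 257
MOD = (1 << 31) - 1  # a Mersenne prime, chosen only for illustration


def direct_fingerprint(chunk):
    """Hash one window from scratch (used here as a reference)."""
    fp = 0
    for byte in chunk:
        fp = (fp * BASE + byte) % MOD
    return fp


def rolling_fingerprints(data, window):
    """Fingerprint every window of `window` bytes, updating incrementally."""
    if len(data) < window:
        return []
    high = pow(BASE, window - 1, MOD)   # weight of the byte leaving the window
    fp = direct_fingerprint(data[:window])
    result = [fp]
    for i in range(window, len(data)):
        fp = (fp - data[i - window] * high) % MOD   # drop the oldest byte
        fp = (fp * BASE + data[i]) % MOD            # shift in the newest byte
        result.append(fp)
    return result


text = b"All work and no play makes Jack a dull boy."
fps = rolling_fingerprints(text, 8)
```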

slide-21
SLIDE 21

Rabin Fingerprints (2)

  • All work and no play makes Jack a dull boy.
  • Fixed window size [Spring and Wetherall]:
      • Use Rabin fingerprints as cache index
      • Select fingerprints based on k least significant bits.
  • Variable window size [Manber]:
      • Use Rabin fingerprints as anchor points
      • Select fingerprints based on k least significant bits.
      • Calculate SHA-1 hash value for region between anchor points
      • Use SHA-1 value as cache index

0x83af, 0x3241, 0x31fa, 0x32ab, 0x3210, 0x9421, 0xab21, 0x32da, 0x31ab
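A minimal sketch of the variable-window scheme, under several assumptions: the rolling hash is a Rabin-Karp style polynomial hash rather than a true Rabin fingerprint, a fingerprint is "selected" when its k least significant bits are all zero (one common convention; the slide does not state the exact test), and the window size and k are arbitrary. Regions between anchors are then keyed by SHA-1, as the slide describes.

```python
import hashlib

BASE = 257
MOD = (1 << 31) - 1


def anchors(data, window=8, k=4):
    """Positions just after each window whose fingerprint has k zero low bits."""
    mask = (1 << k) - 1
    high = pow(BASE, window - 1, MOD)
    fp = 0
    points = []
    for i, byte in enumerate(data):
        if i >= window:
            fp = (fp - data[i - window] * high) % MOD  # drop the oldest byte
        fp = (fp * BASE + byte) % MOD
        if i >= window - 1 and fp & mask == 0:
            points.append(i + 1)
    return points


def split_regions(data, window=8, k=4):
    """Cut the data at the anchor points; regions have variable length."""
    cuts = [0] + anchors(data, window, k) + [len(data)]
    return [data[a:b] for a, b in zip(cuts, cuts[1:]) if a < b]


def region_keys(data, window=8, k=4):
    """SHA-1 of each region, usable as a cache index."""
    return [hashlib.sha1(r).hexdigest() for r in split_regions(data, window, k)]


data = b"All work and no play makes Jack a dull boy. " * 8
shifted = b"XYZ" + data
```

Because an anchor depends only on the bytes inside its window, inserting a prefix shifts the anchors along with the content instead of invalidating them. This is the property that makes the scheme robust to moved data.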

slide-22
SLIDE 22

2D Data

  • Previous work mostly for 1D bytestreams.
  • How to vectorize 2D array?
  • Our approach: anchor then vectorize

[Figure: 2D array with cells labeled A to Z and 1 to 9]

slide-23
SLIDE 23

Anchoring Schemes

  • Fixed to glass (VNC hextile with caching)
  • Splatter (fixed window size)
  • (Supertile)
  • Supercolumn

slide-24
SLIDE 24

Fixed to Glass

slide-25
SLIDE 25

Splatter

slide-26
SLIDE 26

Supercolumn

slide-27
SLIDE 27

Outline

  • Motivation
  • Compression
  • Methodology
      • Trace capturing and playback
  • Compression results
  • System
  • Conclusion and future work

slide-28
SLIDE 28

Methodology

  • Record user interaction, playback trace
  • Trace capturing
      • Java GUI instrumentation
      • VNC client recorder
  • Trace playback
      • Simulation: compress screenshots
      • Java Robot class
      • Pixel based synchronization (slow down emulated user if system is slow)
      • VNC Playback

slide-29
SLIDE 29

Traces

  • Java GUI events for three Genomics applications
      • Synthetic traces
      • ~10-15 minutes each
      • Single screen and on display wall
  • VNC client recordings
      • Six biologists doing real work
      • Several-hour-long traces
      • Work in progress

slide-30
SLIDE 30

Outline

  • Motivation
  • Compression
  • Methodology
  • Compression results
      • Applications
      • Compression ratios
  • System
  • Conclusion and future work

slide-31
SLIDE 31

GeneVaND

slide-32
SLIDE 32

Genomic Applications

  • Example screenshots

TreeView

slide-33
SLIDE 33

TIGR MeV

slide-34
SLIDE 34

Compression Results

Compression       GeneVaND    TreeView    TIGR MeV
Hextile*          1 (1.6)     1 (6.3)     1 (3.2)
Zlib              13.6        19.2        14.8
Fixed-to-glass    15.9        30.6        16.1
Splatter          9.5 (single value; column not recoverable)
Supertile         18.3 (single value; column not recoverable)
Supercolumn       23.8        90.0        17.3
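As a worked check of the figures on this slide, the sketch below computes how much better Supercolumn does than Zlib for each application. The assignment of values to applications is an assumption: it reads the flattened table with the columns in the order GeneVaND, TreeView, TIGR MeV.

```python
# Compression ratios relative to Hextile, as read from the slide (assumed column order).
zlib_ratio = {"GeneVaND": 13.6, "TreeView": 19.2, "TIGR MeV": 14.8}
supercolumn_ratio = {"GeneVaND": 23.8, "TreeView": 90.0, "TIGR MeV": 17.3}

# Improvement of Supercolumn over plain Zlib, per application.
improvement = {app: supercolumn_ratio[app] / zlib_ratio[app] for app in zlib_ratio}
best_app = max(improvement, key=improvement.get)
```

Under this reading the largest improvement is for TreeView, at 90.0 / 19.2 ≈ 4.7.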

slide-35
SLIDE 35

Compression Cost Breakdown (Server)

Operation            Zlib            Supertile
Rabin fingerprint    0 ms            26 ms (34%)
Select anchors       0 ms            1 ms (2%)
Diff                 18 ms (18%)     18 ms (23%)
SHA-1                0 ms            2 ms (3%)
Cache lookup         0 ms            0.06 ms (0%)
Zlib                 88 ms (82%)     30 ms (38%)
SUM                  106 ms          77 ms
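The per-operation costs can be cross-checked against the stated sums with a few lines of arithmetic. The assignment of each value to the Zlib or Supertile column is an assumption, chosen so that each column adds up to its stated total (106 ms and 77 ms).

```python
# Server-side cost per update in milliseconds (assumed column assignment).
zlib_only = {"Diff": 18, "Zlib": 88}
supertile = {
    "Rabin fingerprint": 26,
    "Select anchors": 1,
    "Diff": 18,
    "SHA-1": 2,
    "Cache lookup": 0.06,
    "Zlib": 30,
}

total_zlib = sum(zlib_only.values())
total_supertile = sum(supertile.values())

# Fingerprinting adds work, but Zlib then has far less data to compress.
zlib_time_saved = zlib_only["Zlib"] - supertile["Zlib"]
speedup = total_zlib / total_supertile
```

With these figures, fingerprinting costs about 29 ms in extra operations but saves 58 ms of Zlib time, so the total time per update is lower.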

slide-36
SLIDE 36

Bandwidth Reduction

slide-37
SLIDE 37

Outline

  • Motivation
  • Compression
  • Methodology
  • Compression results
  • System
      • Design
  • Conclusion and future work

slide-38
SLIDE 38

System Architecture

[Figure: system architecture diagram. Labels: VNC Server, VNC/LAN, VNC2DSL, Anchor, FP & Cache, Data Wrapper, Data Storage, Custom/WAN, DSL2VNC, Client Cache, VNC/LAN, VNC Client]

slide-39
SLIDE 39

Outline

  • Motivation
  • Compression
  • Methodology
  • Compression results
  • System
  • Conclusion and future work

slide-40
SLIDE 40

Conclusion

  • Genomics applications have higher bandwidth requirements than Office applications.
  • Fingerprint + Zlib compression ratio is up to 4.7 times better than Zlib.
      • Anchoring scheme important.
      • Works best for scrolling.
  • Fingerprint + Zlib compression can have higher throughput than Zlib.
  • VNC bandwidth usage limited by VNC server performance.

slide-41
SLIDE 41

Current and Future Work

  • Improve fingerprinting performance
      • Faster Rabin implementation
      • Reorder compression operations
  • Full system performance evaluation
      • Real trace evaluation
      • PlanetLab deployment
  • Access control