The University of Edinburgh
Fast and Secure Laptop Backups with Encrypted De-duplication
Le Zhang <zhang.le@ed.ac.uk>
Paul Anderson <dcspaul@ed.ac.uk>
LISA 2010
Laptop Backup Options
External Hard Drive
No offsite storage? What if I have a break-in? Or there is a fire? I need a very large capacity to handle archival storage as well ...
Laptop Backup Options
Recordable CD/DVD
I have to make multiple copies if I want offsite storage ... DVDs are only small - I can only back up subsets of files ...
Laptop Backup Options
Cloud Storage
Broadband upload speeds are slow - 30 DAYS to upload 300GB to cloud storage is typical ... Often, there is a transfer cost as well as a storage cost ...
Laptop Backup Options
External Hard Drive
Recordable CD/DVD
Cloud Storage
What do people do?
[Pie chart of survey responses: Store no vital data, Regular full backups, Partial backups, Keep copy on University machine, Don't do backups, Don't use laptop - 11%, 5%, 16%, 33%, 25%, 11%]
When people bother keeping backups, they are mostly ad-hoc - and usually only involve hand-selected subsets
What kind of data?
[Pie chart of data breakdown: User files, Applications, System files - 29%, 8%, 63%]
Perhaps a lot of the system files and application files (at least) are common?
From our sample of academic Mac laptop users
Shared Data
It seems like there is a good deal of duplication among the system and application files, and this increases with the number of machines. But it is interesting that a good many files are not common! So is it a good idea not to back up these categories?
[Plots: SYS storage saving and APP storage saving vs number of machines added - actual storage (TB) and saved storage (TB)]
Shared Data
Obviously, there is less sharing among the user data - but the overall saving is still significant. And we might expect a higher degree of sharing among the user data for different communities - for example, common music files would make a big difference ...
[Plots: USR storage saving and overall storage saving vs number of machines added - actual storage (TB) and saved storage (TB)]
Deduplication
“Deduplication” is becoming very popular for saving space when storing multiple copies of the same file. A “hash” (digital signature) is generated from the contents of the file. Two files with the same content will have the same hash. Two files with different contents have a very high chance of having different hashes. Use the hash as the name of the stored file.
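As a rough illustration of the idea, here is a minimal sketch of hash-addressed storage in Python; the store path and the choice of SHA-256 are assumptions for the example, not details taken from the prototype.

    import hashlib
    import os

    STORE = "/var/backup/objects"   # hypothetical object store location

    def store_file(path):
        """Store a file under the hash of its contents.

        Two files with identical contents map to the same object name,
        so the second copy costs no extra space.
        """
        with open(path, "rb") as f:
            data = f.read()
        name = hashlib.sha256(data).hexdigest()   # hash acts as the object name
        dest = os.path.join(STORE, name)
        if not os.path.exists(dest):              # already stored: deduplicated
            with open(dest, "wb") as out:
                out.write(data)
        return name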
Block sizes
[Plot: file size distribution (in log10 domain), frequency vs file size from 10 bytes to 10GB]
[Plots vs block size (128K, 256K, 512K, 1024K, whole file): a. data duplication rate (%); b. actual storage needed (TB); c. number of backup objects (millions), all objects vs stored objects]
Deduplicating at the block level is more efficient than at the file level. What is an appropriate block size?
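The prototype's exact chunking scheme isn't shown here; the following is a minimal sketch of fixed-size block deduplication, assuming a 128K block size purely for illustration. A smaller block finds more duplicate data but produces more objects to track - the trade-off the plots above explore.

    import hashlib

    BLOCK_SIZE = 128 * 1024   # assumed block size; the right value is an empirical trade-off

    def file_blocks(path):
        """Split a file into fixed-size blocks and hash each one.

        Returns a list of (hash, block) pairs; the ordered list of hashes
        is enough to reconstruct the file from a block store.
        """
        blocks = []
        with open(path, "rb") as f:
            while True:
                block = f.read(BLOCK_SIZE)
                if not block:
                    break
                blocks.append((hashlib.sha256(block).hexdigest(), block))
        return blocks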
Deduplication problems?
Most de-duplication systems work at the storage level. This has two problems in our application: (1) if the data is encrypted “at source” (with different keys) then the deduplication is defeated (the cipher text will be different); (2) the full data still has to be transmitted to the “server” - and this time is a more significant problem than the storage!
Convergent Encryption
“Convergent Encryption” neatly solves the first problem ... Files are encrypted using the hash of the data as the key. Files containing the same data will encrypt to the same cypher text, and hence deduplication continues to work. File owners will have the key (because they originally had the data) and will be able to decrypt the data - others won't.
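A minimal sketch of the convergent encryption idea, using the `cryptography` package and assuming a deterministic IV derived from the content key; the prototype's actual construction may differ.

    import hashlib
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    def convergent_encrypt(plaintext: bytes):
        """Encrypt data with a key derived from its own content.

        Returns (object_id, key, ciphertext). Identical plaintexts always
        yield identical ciphertexts, so deduplication still works.
        """
        key = hashlib.sha256(plaintext).digest()      # content-derived key
        iv = hashlib.sha256(key).digest()[:16]        # deterministic IV (assumption)
        enc = Cipher(algorithms.AES(key), modes.CTR(iv)).encryptor()
        ciphertext = enc.update(plaintext) + enc.finalize()
        object_id = hashlib.sha256(ciphertext).hexdigest()  # store under this name
        return object_id, key, ciphertext

    def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
        iv = hashlib.sha256(key).digest()[:16]
        dec = Cipher(algorithms.AES(key), modes.CTR(iv)).decryptor()
        return dec.update(ciphertext) + dec.finalize()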
Managing keys
Each (unique) file now has a separate key which we need to manage. Our solution creates a “data object” for each directory which contains the keys for the children, as well as their metadata. The directory object is then encoded and stored in the same way as a normal file. The user only has to record the key for the root object.
Entire duplicate subtrees can be detected
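For illustration, a directory object might look like the sketch below; the field names and JSON encoding are assumptions, not the prototype's actual on-disk format.

    import json

    def make_directory_object(children):
        """Build a directory "data object" listing each child's key and metadata.

        `children` maps a name to a dict such as
        {"type": "file", "key": "<hex key>", "object_id": "<hash>", "size": 123, "mtime": 0}.
        The encoded object is then convergently encrypted and stored exactly like an
        ordinary file, so identical subtrees deduplicate to a single object and the
        user only needs to remember the root key.
        """
        return json.dumps({"entries": children}, sort_keys=True).encode()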
Avoiding Transmission
To avoid transmitting data which already exists on the server, we need to do the deduplication on the source system.
Many services (e.g. Amazon) don't provide the necessary interfaces for the client to communicate directly. There are several approaches to this, depending on the specific application, as sketched below ...
- A private server
- A local “caching” server for a remote cloud service
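As an illustration of source-side deduplication against a private server, the client could ask which block hashes the server already holds before uploading; the endpoint names and wire format below are hypothetical, not an interface described in the talk.

    import requests   # hypothetical HTTP interface to a private backup server

    SERVER = "https://backup.example.org"   # placeholder URL

    def upload_missing_blocks(blocks):
        """Send only the blocks the server does not already have.

        `blocks` is a list of (hash, data) pairs from the chunking step.
        """
        hashes = [h for h, _ in blocks]
        # Ask the server which hashes it is missing (hypothetical endpoint).
        missing = set(requests.post(f"{SERVER}/have", json=hashes).json()["missing"])
        for h, data in blocks:
            if h in missing:
                requests.put(f"{SERVER}/block/{h}", data=data)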
A Prototype
[Architecture diagram: FS events from the local disk drive a Backup Manager, which applies optional data compression and symmetric encryption with a key generated from block content; the list of files to back up and backup status updates are tracked in a local meta DB, and upload threads send encrypted blocks and metadata updates from the upload queue to the backup server]
A Mac OS X client. A local (departmental, home) server which performs hash checking, authentication and high-speed caching before forwarding unique blocks to the cloud.
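The caching server's behaviour could be sketched roughly as below; the class shape and the cloud `put` call are assumptions for illustration, not the prototype's code.

    class CachingServer:
        """Local server that deduplicates and caches blocks before the cloud.

        A block is forwarded to the remote cloud store only the first time
        its hash is seen, so the slow upstream link carries unique data only.
        """
        def __init__(self, cloud):
            self.cloud = cloud        # object with a put(block_hash, data) method (assumed)
            self.seen = set()         # hashes already cached/forwarded
            self.cache = {}           # local high-speed cache of block data

        def receive_block(self, block_hash, data):
            if block_hash in self.seen:
                return                 # duplicate: nothing to store or forward
            self.seen.add(block_hash)
            self.cache[block_hash] = data
            self.cloud.put(block_hash, data)   # forward the unique block upstream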
Where next?
Performance depends heavily on the characteristics of the data itself and the underlying network/storage (e.g. latency)
- We would like to study this more
We would like to develop a production-quality client, and investigate a possible service in a datacentre
- we are looking for possible funding/partners