Design & Implementation of a Portable File Synchronisation - PowerPoint PPT Presentation


  1. National Technical University of Athens. Design & Implementation of a Portable File Synchronisation Mechanism for a Cloud Storage Environment. Supervisor: Prof. Nektarios Koziris. Assistant Supervisor: Dr. Vangelis Koukis. Candidate: Vasilis Gerakaris. 2/9/2015

  2. Table of Contents
     • Introduction
     • Design & Implementation: Syncing algorithm, Core Classes / API
     • Optimisations: Request Queuing, Directory Monitoring, Local Block Storage, Local deduplication - FUSE
     • Comparison with existing software
     • Future Work

  4. Introduction (i) - The problem
     File Synchronisation: The process of updating files in two or more different locations, following certain rules.
     Why is it needed?
     • Copying files between different computers
     • Backups
     Important Qualities
     ✓ Needs to be efficient
     ✓ Needs to be reliable (no errors)
     ✓ Needs to detect & handle update conflicts/renames/deletions

  5. Introduction (i) - The problem (cont)
     File Synchronisation: The process of updating files in two or more different locations, following certain rules.
     Software designed for that purpose already exists, namely:
     • rsync
     • ownCloud
     • Dropbox
     • Google Drive
     We focus on a more specific aspect of the problem.

  6. Large Similar Files (i) - Definition
     What are they? Files that satisfy the following two requirements:
     • Are large in size (several GBs)
     • Have a lot of their data in common
     Examples: VM images, VM snapshots
     Why are they important? Many VMs are being used on cloud service providers (Amazon AWS, ~okeanos, etc.) and there should be a way to efficiently synchronise their images and snapshots.

  7. Large Similar Files (ii) - Definition (cont)
     [Diagram. Labels: User A, User B, Compute Service, Object Storage Service, VMs, Custom image, Image File, Snapshot File, Files; actions: Connect, Upload, Register, Clone, Snapshot, Store]

  8. Large Similar Files (iii) - Definition (cont)
     [Figure: Snapshot t0, Snapshot t1, Snapshot t2]
     We can use these similarities to optimise the synchronisation!
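
One way the similarity between successive snapshots could be exploited is sketched below. It assumes fixed-size blocks identified by SHA-256 digests; the tiny block size and the function names `block_digests` and `blocks_to_transfer` are illustrative assumptions, not taken from the slides.

```python
import hashlib
from typing import List

BLOCK_SIZE = 4  # assumed, unrealistically small so the example is easy to follow


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Split data into fixed-size blocks and return one SHA-256 digest per block."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def blocks_to_transfer(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the indices of blocks in `new` whose content does not occur in `old`."""
    known = set(block_digests(old, block_size))
    return [
        index
        for index, digest in enumerate(block_digests(new, block_size))
        if digest not in known
    ]


snapshot_t0 = b"AAAABBBBCCCC"
snapshot_t1 = b"AAAAXXXXCCCCDDDD"
# Only the blocks that changed or were appended need to be sent.
print(blocks_to_transfer(snapshot_t0, snapshot_t1))
```

With several-GB snapshots that differ in a few blocks, only those blocks would have to cross the network under this scheme.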

  10. Syncing algorithm (i) - Modification detection
      Modification detection: Comparison of hash digests
      ✓ Reliable
      ✗ Very slow, especially on large files
      Faster alternative: Use last modification time as an indicator.
      Why we need history data: Need to know what to do in the following cases:
      • File exists on both locations and is different
      • File exists on A but not on B (or vice-versa)
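
The trade-off between the two detection methods can be sketched as follows. This is a minimal illustration, assuming the last known modification time and SHA-256 digest are available from an earlier sync; `sha256_of_file` and `is_modified` are hypothetical helper names.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1 << 20):
    """Reliable but slow: reads the whole file, one chunk at a time."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_modified(path, known_modtime, known_etag, verify_hash=False):
    """Fast check on modification time, with an optional hash comparison."""
    current_modtime = int(os.stat(path).st_mtime)
    if current_modtime != known_modtime:
        # A different modification time is taken as an indicator of change.
        return True
    if verify_hash:
        # Same modification time: fall back to the slow, reliable comparison.
        return sha256_of_file(path) != known_etag
    return False
```

Checking the modification time costs one `stat` call regardless of file size, which is why it suits large files better than hashing on every pass.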

  11. Syncing algorithm (ii) - Initial algorithm

      (a) File change detection between two points in time
      Time T1            | Time T2            | Change
      Does not Exist     | Exists             | Created
      Exists             | Does not Exist     | Deleted
      Exists (ETag = J)  | Exists (ETag = J)  | No Change
      Exists (ETag = J)  | Exists (ETag = K)  | Modified

      (b) Syncing actions based on file states
      File replica A      | File replica B      | Action
      No Change           | No Change           | No Action
      Created (ETag = J)  | Created (ETag = J)  | No Action
      Created (ETag = J)  | Created (ETag = K)  | Merge ∗
      Deleted             | Deleted             | No Action
      Deleted             | No Change           | Delete B
      Modified            | No Change           | Update B
      Modified (ETag = J) | Modified (ETag = K) | Merge ∗
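
The two tables of the initial algorithm can be expressed as a pair of small functions. This is a sketch: a missing file is represented by `None`, and the mirrored rows (`Update A`, `Delete A`), the handling of two identical modifications, and the `Unspecified` fallback are assumptions added for completeness, since the slide lists only the rows where replica A changed.

```python
from typing import Optional


def detect_change(etag_t1: Optional[str], etag_t2: Optional[str]) -> str:
    """Table (a): compare one replica at two points in time (None = does not exist)."""
    if etag_t1 is None and etag_t2 is None:
        return "No Change"  # assumption: this case is not listed in the table
    if etag_t1 is None:
        return "Created"
    if etag_t2 is None:
        return "Deleted"
    return "No Change" if etag_t1 == etag_t2 else "Modified"


def sync_action(change_a: str, etag_a: Optional[str],
                change_b: str, etag_b: Optional[str]) -> str:
    """Table (b): decide the action from the change state of both replicas."""
    if change_a == change_b:
        if change_a in ("No Change", "Deleted"):
            return "No Action"
        # Both Created or both Modified: identical content needs nothing.
        # (The slide lists this only for Created; Modified is assumed analogous.)
        return "No Action" if etag_a == etag_b else "Merge"
    if change_b == "No Change":
        if change_a == "Deleted":
            return "Delete B"
        if change_a == "Modified":
            return "Update B"
    if change_a == "No Change":  # mirrored rows, assumed by symmetry
        if change_b == "Deleted":
            return "Delete A"
        if change_b == "Modified":
            return "Update A"
    return "Unspecified"  # combination not covered by the slide


state_a = detect_change("J", "K")   # replica A was modified
state_b = detect_change("J", "J")   # replica B is unchanged
print(sync_action(state_a, "K", state_b, "J"))
```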

  12. Syncing algorithm (iii) - What we propose
      Limitations
      ✗ Can't detect renames (or worse, renames & modifications)
      Our solution for syncing with a central metadata server
      • Store the metadata of all files, as they were during the last successful sync, on a local state database (StateDB).
      • Reconcile local directory replicas (Local) and remote server replicas (Remote) in three steps:
        1. Detect updates from Local Directory
        2. Detect updates from StateDB
        3. Detect updates from Remote Directory
      • FCFS updates on conflicts, with conflicting copies being renamed.
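
A local state database of this kind could look like the sketch below. The slides do not name a storage engine; SQLite, the table layout, and the method names `record`, `by_phash`, `by_inode` and `remove` are assumptions made for illustration. The columns follow the FileStat fields listed on slide 17.

```python
import sqlite3


class StateDB:
    """Stores file metadata as it was at the last successful sync (sketch)."""

    def __init__(self, db_path=":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS files ("
            "phash INTEGER PRIMARY KEY, path TEXT, inode INTEGER, "
            "modtime INTEGER, type INTEGER, etag TEXT)"
        )

    def record(self, phash, path, inode, modtime, ftype, etag):
        """Insert or overwrite the entry for one file after a successful sync."""
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?, ?, ?)",
                (phash, path, inode, modtime, ftype, etag),
            )

    def by_phash(self, phash):
        """Look up a file by the hash of its path; returns a row tuple or None."""
        cur = self.conn.execute("SELECT * FROM files WHERE phash = ?", (phash,))
        return cur.fetchone()

    def by_inode(self, inode):
        """Look up a file by inode, e.g. to recognise a rename; row tuple or None."""
        cur = self.conn.execute("SELECT * FROM files WHERE inode = ?", (inode,))
        return cur.fetchone()

    def remove(self, phash):
        """Forget a file once its deletion has been synchronised."""
        with self.conn:
            self.conn.execute("DELETE FROM files WHERE phash = ?", (phash,))
```

Keeping both a path-hash lookup and an inode lookup is what lets the three-step procedure tell a renamed file from a new one, which the initial algorithm could not do.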

  13. 3-step synchronisation (i) - Updates from Local Directory
      [Flowchart]
      Decisions: phash exists in StateDB? · Local modtime == StateDB modtime? · inode exists in StateDB? · File exists on Remote? (twice) · StateDB ETag == Remote ETag?
      Outcomes: New local file · No local change · Local modified (twice) · Renamed · Conflict (twice)
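
The transcript keeps the decision nodes and outcomes of this flowchart but not its edges. The sketch below is one plausible reading that uses each decision and outcome exactly as often as it appears on the slide; the branch order is an assumption, and `classify_local` and its dictionary arguments are hypothetical.

```python
def classify_local(phash, inode, modtime, db_by_phash, db_inodes, remote_etags):
    """Classify one local file (assumed branch order, see lead-in).

    db_by_phash:  phash -> (modtime, etag) as stored in the StateDB
    db_inodes:    set of inodes stored in the StateDB
    remote_etags: phash -> ETag for every file that exists on Remote
    """
    entry = db_by_phash.get(phash)
    if entry is None:
        # The path is unknown to the StateDB.
        if inode in db_inodes:
            return "Renamed"  # same inode seen before under another path
        # Unknown path and inode: new locally, unless Remote already has it.
        return "Conflict" if phash in remote_etags else "New local file"

    db_modtime, db_etag = entry
    if modtime == db_modtime:
        return "No local change"
    if phash not in remote_etags:
        return "Local modified"  # nothing on Remote to clash with
    # Remote copy exists: safe only if Remote is unchanged since the last sync.
    return "Local modified" if remote_etags[phash] == db_etag else "Conflict"


db = {10: (1000, "J")}
print(classify_local(10, 5, 1000, db, {5}, {10: "J"}))  # unchanged file
print(classify_local(10, 5, 2000, db, {5}, {10: "K"}))  # both sides changed
```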

  14. 3-step synchronisation (ii) - Updates from StateDB
      [Flowchart]
      First decision: File exists on local/remote?
      • Local exists, Remote exists
      • Local exists, Remote doesn't exist
      • Local doesn't exist, Remote exists
      • Local doesn't exist, Remote doesn't exist
      Further decisions: Local modtime == StateDB modtime? · inode exists in StateDB? · Remote ETag == StateDB ETag?
      Outcomes: No change · Local modified · Remote modified · Renamed / Deleted · Deleted (three times)

  15. 3-step synchronisation (iii) - Updates from Remote Directory
      [Flowchart]
      Decisions: phash exists in StateDB? · Remote ETag == StateDB ETag? · Local modtime == StateDB modtime?
      Outcomes: New remote file · No remote changes · Remote modified · Conflict
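
With three decisions and four outcomes, this flowchart admits a natural reading as a chain of checks, sketched below. The edge directions did not survive in the transcript, so the branch order is an assumption; `classify_remote` and its arguments are hypothetical.

```python
def classify_remote(phash, remote_etag, db_by_phash, local_modtimes):
    """Classify one remote file (assumed branch order, see lead-in).

    db_by_phash:    phash -> (modtime, etag) as stored in the StateDB
    local_modtimes: phash -> current modification time of the local copy
    """
    entry = db_by_phash.get(phash)
    if entry is None:
        return "New remote file"  # never synchronised before

    db_modtime, db_etag = entry
    if remote_etag == db_etag:
        return "No remote changes"
    # Remote changed; it can be applied only if Local is untouched.
    if local_modtimes.get(phash) == db_modtime:
        return "Remote modified"
    return "Conflict"


db = {10: (1000, "J")}
print(classify_remote(10, "K", db, {10: 1000}))  # only Remote changed
print(classify_remote(10, "K", db, {10: 2000}))  # both sides changed
```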

  16. Core Classes / API
      What we have done:
      • Built a cross-platform framework in Python that can be used to synchronise files with any cloud storage service, as long as some API functions are implemented.
      • Created abstract classes for representations of files, filesystem directories and cloud storage services.
      • Implemented a class that uses the Synnefo (Pithos) API as an example.
      • Created a proof-of-concept application that syncs a local directory with the Pithos+ service offered by ~okeanos.

  17. Core Classes / API (i) - FileStat
      The core class used in this framework to represent file objects.
      FileStat
        phash: int
        path: str
        inode: int
        modtime: int
        type: int
        etag: str
      • phash: The (integer) hash digest of the relative path string. It is used for fast indexing in the StateDB. Assumed unique for each file path.
      • etag: The ETag (sha-256 digest) of the file. Assumed unique for each file version.
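
The FileStat fields map directly onto a small Python data class, sketched below. The field names and types are from the slide; how `phash` is derived is not stated there, so `path_hash` (63 bits of the SHA-256 of the relative path) is an assumption.

```python
import hashlib
from dataclasses import dataclass


def path_hash(rel_path: str) -> int:
    """Hypothetical phash: an integer digest of the relative path string.

    Assumption: the first 8 bytes of SHA-256, masked to stay below 2**63
    so the value fits a signed 64-bit integer column.
    """
    digest = hashlib.sha256(rel_path.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") & 0x7FFFFFFFFFFFFFFF


@dataclass(frozen=True)
class FileStat:
    phash: int    # integer hash of the relative path, used as the StateDB index
    path: str     # path relative to the sync directory
    inode: int    # inode number, stable across renames
    modtime: int  # last modification time
    type: int     # file type marker (e.g. regular file or directory)
    etag: str     # SHA-256 digest of the file content


stat = FileStat(path_hash("images/debian.img"), "images/debian.img",
                inode=42, modtime=1441152000, type=0, etag="<sha-256 digest>")
print(stat.path, stat.phash == path_hash("images/debian.img"))
```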

  18. Core Classes / API (ii) - LocalDirectory
      LocalDirectory
        sync_dir: str
        + get_all_objects_fstat()
        + get_modified_objects_fstat()
        + get_file_fstat(str path)
      • get_all_objects_fstat: Returns all local files' metadata as FileStat objects.
      • get_modified_objects_fstat: Returns file metadata only for the files that were modified since the last sync.
      • get_file_fstat: Returns the FileStat object for the file path if it exists, else returns None.
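
A minimal sketch of the three LocalDirectory methods follows. The method names and `sync_dir` come from the slide; everything else is assumed for illustration: the `last_sync_modtimes` mapping standing in for the StateDB, the `type` value 0 for regular files, the path hash, and the fact that directories are skipped.

```python
import hashlib
import os
from collections import namedtuple

FileStat = namedtuple("FileStat", "phash path inode modtime type etag")


class LocalDirectory:
    def __init__(self, sync_dir, last_sync_modtimes=None):
        self.sync_dir = sync_dir
        # Assumption: path -> modtime recorded at the last successful sync.
        self.last_sync_modtimes = last_sync_modtimes or {}

    def get_file_fstat(self, path):
        """Return the FileStat for a relative path, or None if no such file."""
        full_path = os.path.join(self.sync_dir, path)
        if not os.path.isfile(full_path):
            return None
        st = os.stat(full_path)
        digest = hashlib.sha256()
        with open(full_path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        phash = int.from_bytes(hashlib.sha256(path.encode("utf-8")).digest()[:8], "big")
        return FileStat(phash, path, st.st_ino, int(st.st_mtime), 0, digest.hexdigest())

    def get_all_objects_fstat(self):
        """Return a FileStat for every regular file below sync_dir."""
        result = []
        for root, _dirs, files in os.walk(self.sync_dir):
            for name in files:
                rel_path = os.path.relpath(os.path.join(root, name), self.sync_dir)
                fstat = self.get_file_fstat(rel_path)
                if fstat is not None:
                    result.append(fstat)
        return result

    def get_modified_objects_fstat(self):
        """Return FileStats only for files whose modtime differs from the last sync."""
        return [
            fstat for fstat in self.get_all_objects_fstat()
            if self.last_sync_modtimes.get(fstat.path) != fstat.modtime
        ]
```

This version hashes every file it visits, which is exactly the slow path the modification-time check is meant to avoid; a fuller implementation would compute the ETag only for files that appear modified.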

  19. Core Classes / API (iii) - CloudClient
      Closely resembles the OpenStack API (used by Synnefo as well).
      To properly handle race conditions:
      • upload_object() is used for new files
      • update_object() is used for existing files
      CloudClient
        + get_object_fstat(str path)
        + get_all_objects_fstat()
        + download_object(str path, file fd)
        + upload_object(str rel_path, str sync_dir)
        + update_object(str rel_path, str sync_dir, str etag)
        + delete_object(str path)
        + rename_object(str old_path, str new_path)
      PithosClient
        pithos: PithosClient
        + init(str auth_URL, str auth_token, str ca_certs_path)
        - _modtime_from_remote(dict remote_obj)
        - _is_directory_from_remote(dict remote_obj)
        - _etag_from_remote(dict remote_obj)
        - _fstat_from_metadata(dict obj_metadata, str path)
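
The slide does not spell out how separating `upload_object()` from `update_object()` handles race conditions. One plausible reading, sketched below, is that an upload must fail if the object already exists remotely, and an update must fail unless the remote ETag still matches the one recorded at the last sync. Only these two methods of the abstract class are shown; `InMemoryClient` and the exceptions it raises are hypothetical stand-ins, not the Pithos client.

```python
import hashlib
import os
from abc import ABC, abstractmethod


class CloudClient(ABC):
    """Abstract cloud storage client (only the upload-related methods shown)."""

    @abstractmethod
    def upload_object(self, rel_path, sync_dir):
        """Upload a new file; expected to fail if it already exists remotely."""

    @abstractmethod
    def update_object(self, rel_path, sync_dir, etag):
        """Overwrite an existing file, given the ETag known at the last sync."""


class InMemoryClient(CloudClient):
    """Hypothetical client that keeps objects in a dict, for illustration."""

    def __init__(self):
        self.objects = {}  # rel_path -> content bytes

    def _etag(self, rel_path):
        return hashlib.sha256(self.objects[rel_path]).hexdigest()

    @staticmethod
    def _read(rel_path, sync_dir):
        with open(os.path.join(sync_dir, rel_path), "rb") as f:
            return f.read()

    def upload_object(self, rel_path, sync_dir):
        if rel_path in self.objects:
            # Someone else created the object first: report it, do not overwrite.
            raise FileExistsError(rel_path)
        self.objects[rel_path] = self._read(rel_path, sync_dir)
        return self._etag(rel_path)

    def update_object(self, rel_path, sync_dir, etag):
        if rel_path not in self.objects or self._etag(rel_path) != etag:
            # The remote copy changed since the last sync: report a conflict.
            raise RuntimeError("remote object changed since last sync")
        self.objects[rel_path] = self._read(rel_path, sync_dir)
        return self._etag(rel_path)
```

Under this reading, a client that loses the race gets an error instead of silently overwriting the other copy, which fits the first-come-first-served conflict policy described on slide 12.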
