Synchronisation solutions, NDS2 Cryptobox Kamil Guryn, BIAMAN, - - PowerPoint PPT Presentation



SLIDE 1

Synchronisation solutions,

NDS2 Cryptobox

Kamil Guryn, BIAMAN, www.pb.edu.pl Maciej Brzeźniak, PSNC, www.psnc.pl & NDS2 project partners:

SLIDE 2
  • Background
  • Features of good synchronisation mechanisms
  • Discussion of features, example scenarios
  • Comparison of existing solutions
  • Cryptobox:
    ○ Motivation: why yet another solution?
    ○ Features of Cryptobox
    ○ Live demo

Presentation agenda

SLIDE 3
  • Some NRENs offer data sync & share solutions
  • Some NRENs are considering providing them:
    ○ purchase of hosted services (e.g. NORDUnet: Box.com)
    ○ purchase of licenses/support for private/community cloud services:
      ■ ownCloud
      ■ PowerFolder
    ○ other initiatives
  • We need to understand the technical aspects of sync solutions
  • The aim of this presentation is to:
    ○ discuss selected features of synchronisation algorithms
    ○ present the algorithms implemented in Cryptobox, developed in the NDS2 project

Background

SLIDE 4

Steps of the typical synchronisation process:

How it works:

  ○ filespace enumeration
  ○ detection = comparison of the current and past state
  ○ reconciliation:
    ■ planning propagation
    ■ conflict resolution
  ○ propagation
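The four steps can be illustrated with a minimal in-memory sketch. It assumes each replica's filespace has already been enumerated into a dict mapping path to a version tag (for example a hash or an mtime), and that the state database is simply the snapshot taken after the last successful sync. The names `detect`, `reconcile` and `propagate` are hypothetical, not Cryptobox's API, and renames/moves are ignored here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    op: str    # "upload", "download", "delete_remote", "delete_local" or "conflict"
    path: str


def detect(current: dict[str, str], past: dict[str, str]) -> dict[str, str]:
    """Detection: compare the enumerated state with the state DB snapshot."""
    changes = {}
    for path, version in current.items():
        if path not in past:
            changes[path] = "created"
        elif past[path] != version:
            changes[path] = "modified"
    for path in past.keys() - current.keys():
        changes[path] = "deleted"
    return changes


def reconcile(local: dict[str, str], remote: dict[str, str],
              local_changes: dict[str, str], remote_changes: dict[str, str]) -> list[Action]:
    """Reconciliation: plan propagation and flag conflicts."""
    plan = []
    for path in sorted(local_changes.keys() | remote_changes.keys()):
        if path in local_changes and path in remote_changes:
            if local.get(path) != remote.get(path):   # both sides changed, differently
                plan.append(Action("conflict", path))
        elif path in local_changes:
            op = "delete_remote" if local_changes[path] == "deleted" else "upload"
            plan.append(Action(op, path))
        else:
            op = "delete_local" if remote_changes[path] == "deleted" else "download"
            plan.append(Action(op, path))
    return plan


def propagate(plan: list[Action], local: dict[str, str], remote: dict[str, str]) -> None:
    """Propagation: apply the plan; conflicts are left for conflict resolution."""
    for action in plan:
        if action.op == "upload":
            remote[action.path] = local[action.path]
        elif action.op == "download":
            local[action.path] = remote[action.path]
        elif action.op == "delete_remote":
            remote.pop(action.path, None)
        elif action.op == "delete_local":
            local.pop(action.path, None)
```

A real client would additionally rebuild the state DB after propagation and re-check each replica before writing to it.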

SLIDE 5

Synchronization problems - overview

  • Tracking changes on the stores:
    ○ file identifier support is not present everywhere
    ○ asynchronous event reporting mechanisms: inotify or FSWatcher
  • Anchor-based sync needs a specialised datastore:
    ○ e.g. a simple SFTP server is not capable
    ○ concurrency during sync must be handled, as it can break anchor-based logic
    ○ consistency, missed changes, complexity
  • Conflict detection and resolution:
    ○ unresolved items may lead to an endless sync loop
  • Concurrency of data access and synchronisation:
    ○ locking for sync can interfere with user activities
    ○ concurrent file activities should be supported
  • Unreliable networks, sync interruptions & errors
  • Endless sync loops, state DB loss or corruption
  • and many more…
SLIDE 6

What is needed to synchronise two data stores:

  • Change detection:
    ○ client- and/or server-side rename/move detection:
      ■ id-based vs hash-based
      ■ relative ordering of asynchronous events, e.g. a rename within the same dir is detected as a rename, while a move outside the dir shows up as delete + create
      ■ combination of id/hash and FS events (e.g. Dropbox)
    ○ full enumeration vs anchor-based change detection
  • Client-side state database:
    ○ needed for change detection
  • Data transmission:
    ○ proprietary vs open protocols

Data synchronisation methods and algorithms
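Anchor-based change detection, as opposed to full enumeration, can be sketched as a server-side change log: every change gets a sequence number, and the client only stores the last number it has seen (the anchor). This is also why a plain SFTP server is not enough, since it keeps no such log. `ChangeLogStore` and its methods are hypothetical names used for illustration only.

```python
class ChangeLogStore:
    """Hypothetical datastore that records every change with a sequence number."""

    def __init__(self) -> None:
        self.files = {}   # path -> version
        self.log = []     # (sequence number, op, path)
        self.seq = 0

    def apply(self, op: str, path: str, version: str = "") -> None:
        self.seq += 1
        if op == "delete":
            self.files.pop(path, None)
        else:
            self.files[path] = version
        self.log.append((self.seq, op, path))

    def changes_since(self, anchor: int) -> tuple[list[tuple[str, str]], int]:
        """Return the changes newer than `anchor` and the anchor to store next.

        The new anchor comes from the last change actually returned, so a change
        appended after the log was scanned is not skipped on the next call.
        """
        newer = [entry for entry in self.log if entry[0] > anchor]
        new_anchor = newer[-1][0] if newer else anchor
        return [(op, path) for _, op, path in newer], new_anchor
```

The client never lists the whole namespace; it only asks "what changed since anchor N?", which is cheap as long as the datastore maintains the log reliably.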

SLIDE 7
  • Intelligent name change / move detection on both sides
  • Concurrency resistance during detection and synchronisation
  • Conflict resolution
  • Efficient indexing of large datastores
  • Support for advanced scenarios:
    ○ graceful cancellation of a running sync
    ○ preview mode of the incremental sync operation, without committing changes to the replicas
    ○ support for sync with partially equal file hierarchies
  • Security and privacy: client-side cryptography
    ○ Wuala and SpiderOak do it client-side
    ○ Dropbox/Box do it server-side (PRISM…)
    ○ ownCloud does not plan to support it
    ○ Cryptobox does it client-side

Important features of a sync solution

SLIDE 8

Intelligent rename/move detection: the key to synchronisation

Example scenario:

  • directory folderX contains 10 GB of data
  • user on machine A moves folderX to folderZ
  • changes are synchronised to the server as a move operation
  • in some solutions:
    ○ the changes are misinterpreted by the sync client on machine B
    ○ this may result in an unnecessary download of the data instead of propagating the name change / move
    ○ while acceptable for small files, this may be a killer for big data volumes
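With file IDs, the folderX to folderZ scenario reduces to a single move. The sketch below assumes two snapshots of one replica, each mapping a (hypothetical) integer file ID to a relative path with `/` separators. A moved folder keeps its ID, so the path changes of its children are recognised as implied by the folder move instead of being reported as deletes plus creates that would trigger a re-download. `detect_moves` is a hypothetical helper.

```python
def detect_moves(old: dict[int, str], new: dict[int, str]):
    """Classify changes between two {file_id: path} snapshots."""
    moves = {fid: (old[fid], new[fid])
             for fid in old.keys() & new.keys() if old[fid] != new[fid]}
    created = sorted(new[fid] for fid in new.keys() - old.keys())
    deleted = sorted(old[fid] for fid in old.keys() - new.keys())

    def implied_by_parent(src: str, dst: str) -> bool:
        # True if the move of an ancestor folder already explains this path change.
        return any(src.startswith(p_src + "/") and dst == p_dst + src[len(p_src):]
                   for p_src, p_dst in moves.values())

    top_level_moves = sorted((s, d) for s, d in moves.values() if not implied_by_parent(s, d))
    return top_level_moves, created, deleted
```

Machine B then only needs to execute one rename of folderX, however much data it contains.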


SLIDE 9

FileID-based - lightweight:

  ○ no need for file content analysis
  ○ does not require a lot of I/O
  ○ does not load the CPU

Hash-based - resource consuming:

  ○ needs file content analysis
  ○ I/O needed to read file contents
  ○ CPU load for calculating hashes

Intelligent change detection explained: FileID-based vs hash-based
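For comparison, hash-based detection has to read every byte to build its snapshot. This sketch uses SHA-256 from Python's `hashlib` (the slides do not say which hash function the compared tools use) and pairs each deleted path with a created path of identical content. Files with identical contents make the pairing ambiguous, which is one more reason to prefer file IDs where available. All function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks: this I/O and CPU cost is paid for every file."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(root: Path) -> dict[str, str]:
    """Map each file's relative path to its content hash."""
    return {p.relative_to(root).as_posix(): file_digest(p)
            for p in root.rglob("*") if p.is_file()}


def match_moves(old: dict[str, str], new: dict[str, str]) -> list[tuple[str, str]]:
    """Pair deleted and created paths whose content hashes are equal."""
    deleted_by_hash: dict[str, list[str]] = {}
    for path, digest in old.items():
        if path not in new:
            deleted_by_hash.setdefault(digest, []).append(path)
    moves = []
    for path in sorted(new.keys() - old.keys()):
        candidates = deleted_by_hash.get(new[path])
        if candidates:   # ambiguous if several files share the same content
            moves.append((candidates.pop(), path))
    return moves
```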

SLIDE 10
  • FileID must be provided/supported by (file)system
  • Works in:
  • Windows: NTFS, ReFS
  • Linux: EXT2/3/4
  • OS X: HFS+

Intelligent change detection explained: FileID-based
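On such file systems a file ID can be read without touching file contents. In Python, `os.stat` exposes it as `st_ino` (on Windows, Python 3.5+ fills it with the file index), paired here with `st_dev` so that IDs from different volumes do not collide. The helper names are hypothetical; on file systems such as FAT32 the value may not be a stable identifier.

```python
import os


def file_id(path: str) -> tuple[int, int]:
    """(device, file index) pair; on the file systems above it survives renames within a volume."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def index_tree(root: str) -> dict[tuple[int, int], str]:
    """Build the {file_id: relative path} map used for rename/move detection."""
    ids = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            ids[file_id(full)] = os.path.relpath(full, root)
    return ids
```

Only metadata is read here, which is what keeps FileID-based detection lightweight compared with hashing.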

SLIDE 11
  • FS event reporting (inotify, FSWatcher) helps in detecting changes
  • BUT reliable interpretation of user activity is not trivial:
    ○ Common issues:
      ■ FS events are reported asynchronously (unordered)!
      ■ all events have to be analysed to ensure consistency
      ■ reliable interpretation of events is not trivial
      ■ sync in a concurrent environment (opposite concurrent file-activity events, opposite folder-hierarchy events)
    ○ Client-side issues:
      ■ buffer overflows when many operations happen in a short period of time may lead to missed events…
    ○ Server-side issues:
      ■ needs specialised server functionality
      ■ server-to-client event notification mechanisms are needed to avoid costly full namespace scans

Sync issues: FS events analysis (1):
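Because event buffers can overflow, a watcher-based client needs a fallback. Real APIs report this condition (inotify delivers `IN_Q_OVERFLOW`; .NET's FileSystemWatcher raises an `InternalBufferOverflowException` through its `Error` event). The minimal sketch below simulates such a bounded buffer and falls back to a full scan whenever events may have been lost. `EventBuffer` and `changed_paths` are hypothetical.

```python
from collections import deque
from typing import Callable


class EventBuffer:
    """Bounded FS event buffer; an overflow means some events were dropped."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.events = deque()
        self.overflowed = False

    def push(self, kind: str, path: str) -> None:
        if len(self.events) >= self.capacity:
            self.overflowed = True   # the event is lost, similar to IN_Q_OVERFLOW
            return
        self.events.append((kind, path))

    def drain(self) -> tuple[list[tuple[str, str]], bool]:
        events, overflowed = list(self.events), self.overflowed
        self.events.clear()
        self.overflowed = False
        return events, overflowed


def changed_paths(buffer: EventBuffer, full_scan: Callable[[], list[str]]) -> list[str]:
    """Trust events only if none were lost; otherwise enumerate and compare."""
    events, overflowed = buffer.drain()
    if overflowed:
        return full_scan()
    return sorted({path for _, path in events})
```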

SLIDE 12
| Operation | Event logged |
|---|---|
| rename folder\a.txt -> folder\b.txt | rename |
| move folder\a.txt -> folder2\a.txt | delete folder\a.txt & create folder2\a.txt |
| copy a.txt to local sync folder | create (there is no event about releasing the file lock, i.e. closing the file) |

Example scenario where event logging fails to provide a consistent state:
  • Client side: copy a.txt to folderA\folderB\folderC
  • Server side: rename folderA to folderX
  • "copy a.txt to folderA\folderB\folderC" is no longer valid; the operations should be propagated in the correct order (plus concurrency resistance…)

Sync issues: FS events analysis (2):
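One way to keep the pending client operation valid in this scenario is to rebase its path onto the server-side folder renames before propagating it. A minimal sketch, using `PureWindowsPath` because the example uses Windows separators; `rebase` is a hypothetical helper, and the renames are assumed to be given in the order they happened.

```python
from pathlib import PureWindowsPath


def rebase(path: str, renames: list[tuple[str, str]]) -> str:
    """Rewrite `path` so it follows folder renames already applied on the other replica."""
    p = PureWindowsPath(path)
    for old, new in renames:                     # apply in chronological order
        old_p = PureWindowsPath(old)
        if p == old_p or old_p in p.parents:     # whole-component match, not a string prefix
            p = PureWindowsPath(new) / p.relative_to(old_p)
    return str(p)
```

Matching on path components rather than string prefixes avoids treating `folderAB` as if it were inside `folderA`.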

SLIDE 13

Scenario:

  • 1. File A.txt is synchronized across replicas
  • 2. Bob updates A.txt on machine1 (client)
  • 3. Alice updates A.txt on the server (by updating it on machine2 and performing a sync) while Bob's changes are still pending

This should not lead to the loss of Alice's changes.

Concurrency during detection and synchronization
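Avoiding lost updates in this scenario requires the common ancestor version kept in the client's state database: a three-way comparison tells "only Bob changed" apart from "both changed". The sketch also re-checks the remote version right before writing. Version tags and helper names are hypothetical, and the re-check is only race-free if the server performs it atomically (a compare-and-swap), which this in-memory sketch merely imitates.

```python
from enum import Enum


class Decision(Enum):
    IN_SYNC = "nothing to do"
    UPLOAD = "upload local change"
    DOWNLOAD = "download remote change"
    CONFLICT = "both changed: keep both versions"


def decide(base: str, local: str, remote: str) -> Decision:
    """Three-way comparison against the version recorded at the last sync."""
    local_changed, remote_changed = local != base, remote != base
    if local_changed and remote_changed:
        return Decision.IN_SYNC if local == remote else Decision.CONFLICT
    if local_changed:
        return Decision.UPLOAD
    if remote_changed:
        return Decision.DOWNLOAD
    return Decision.IN_SYNC


def upload_if_unchanged(remote: dict[str, str], path: str, expected: str, new: str) -> bool:
    """Write only if the remote still holds the version the plan was based on."""
    if remote.get(path) != expected:
        return False    # remote changed meanwhile: re-run detection instead of overwriting
    remote[path] = new
    return True
```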

SLIDE 14

Scenario:

  • 1. Bob creates file A.txt on local machine1
  • 2. Alice also creates A.txt on the server (by creating it locally on machine2 and performing a sync) while Bob's changes are still pending

This leads to a name conflict, which should be resolved automatically with a no-data-loss policy.

Conflict resolution
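A no-data-loss policy can be as simple as never overwriting: if the incoming file differs from the one already present, it is stored under a new, unique name. The naming scheme `A (conflict bob).txt` used below is a hypothetical example, not Cryptobox's actual convention.

```python
import os


def conflict_name(name: str, who: str, existing: set[str]) -> str:
    """Return an unused name such as 'A (conflict bob).txt'."""
    stem, ext = os.path.splitext(name)
    candidate = f"{stem} (conflict {who}){ext}"
    n = 2
    while candidate in existing:
        candidate = f"{stem} (conflict {who} {n}){ext}"
        n += 1
    return candidate


def store_without_loss(folder: dict[str, bytes], name: str, data: bytes, who: str) -> str:
    """Store `data` under `name`, or under a conflict name if a different file exists."""
    if name in folder and folder[name] != data:
        name = conflict_name(name, who, set(folder))
    folder[name] = data
    return name
```

Both Alice's and Bob's A.txt survive; the users decide later which one to keep.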

SLIDE 15

Security and privacy: client-side cryptography

  • Support for client-side encryption is desired these days:
    ○ people are concerned about privacy
    ○ some organisations ;) are known to analyse our data
  • AES-256 CTR encryption:
    ○ AES-256 recommended by NIST to 2031
    ○ CTR mode recommended by Niels Ferguson and Bruce Schneier
    ○ AES-NI supported on an increasing number of CPU platforms
  • File names must be encrypted too, as they may be meaningful!
  • Integrity control:
    ○ lets users verify the consistency of the data
    ○ enables detecting failures and intended tweaks
    ○ SHA-512 considered to be safe beyond 2012 (SHA-1 only to 2012)
    ○ fast implementations possible using regular CPUs

SLIDE 16

Synchronisation solution – comparison *

| Feature | NDS Cryptobox | ownCloud | Box.com | Dropbox | SpiderOak | PowerFolder |
|---|---|---|---|---|---|---|
| Client & server-side move detection | YES | NO | YES | YES | YES | NO |
| Real-time change detection | YES | NO (client side only) | YES | YES | YES | YES |
| Files under user control | YES | YES | NO | NO | NO | YES |
| Client-side encryption | YES | NO | NO | NO | YES | NO |
| Concurrency resistant on detection and propagation | YES | NO (may lead to chaos) | YES (but goes through temp) | YES | ??? | YES |
| Sync files of any size | YES | YES (in multiple parts, then merged) | NO | YES | YES | YES |
| Synchronisation of any folder | YES | YES | NO | NO | YES | YES |
| No file-locking policy | YES (Dropbox way) | NO | NO (sync goes through temp) | YES | ??? | NO |
| Preview mode | YES | NO | NO | NO | NO | NO |
| Live-sync | NO | NO | NO | YES | NO | NO |

* Comparison made to the best of our knowledge, based on documentation analysis and tests. If you notice any inconsistency, please contact us.

SLIDE 17
  • Motivation: we were not happy with existing solutions: services, tools, applications and libraries (we tested most of them)
  • Issues:
    ○ simplistic / unreliable sync algorithms
    ○ lack of client- & server-side move/rename discovery
    ○ lack of client-side encryption
    ○ slow detection and propagation of changes
    ○ reliability issues under concurrent changes
    ○ file-locking policy makes the local sync folder harder to use
    ○ some solutions are not able to pass even simple tests, e.g.:
      ■ multiple folder renames while uploading lead to chaos on the local computer
      ■ concurrent changes occurring on one replica during detection and propagation
    ○ no way to benefit from special features of our NDS2 system:
      ■ inefficient enumeration makes sync too expensive for the storage backend
      ■ fast & lightweight metadata enumeration needed

What is NDSCryptobox? (1)

SLIDE 18
  • About the NDS2 project:
    ○ Secure Storage Cloud with efficient and easy data access
    ○ More information:
      ■ nds.psnc.pl
      ■ NDS2 paper published at TNC2013
  • NDSCryptobox in the NDS2 architecture:
    ○ one of the client applications…
    ○ … accessing the system using SFTP
    ○ + server-side mechanisms: deltas, enumeration etc.

What is NDSCryptobox? (2)

SLIDE 19
  • AES 256 CBC encryption using per-file keys for data privacy:
    ○ key hierarchy makes security high while keeping the system easy to use
    ○ file keys stored in the file header, protected with the user's private RSA key
    ○ file names encrypted using per-folder keys
  • SHA-512 digests for integrity check:
    ○ digests calculated per 64-byte logical block
    ○ stored with the data, encrypted
  • Data encrypted with AES-256 CTR
  • Plus key management and protection mechanisms to prevent data loss

More information: NDS2 paper submitted to TNC2013

Key NDS2 features: Security, client-side cryptography
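Per-block integrity digests, as described above, let a client pinpoint which part of a file was damaged or tampered with instead of only rejecting the whole file. A minimal sketch with `hashlib.sha512`; the block size is left as a parameter, the encryption of the stored digests is omitted, and the function names are hypothetical.

```python
import hashlib


def block_digests(data: bytes, block_size: int) -> list[bytes]:
    """SHA-512 digest of each logical block (the last block may be shorter)."""
    return [hashlib.sha512(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def damaged_blocks(data: bytes, stored: list[bytes], block_size: int) -> list[int]:
    """Indices of blocks whose digest no longer matches the stored one."""
    current = block_digests(data, block_size)
    count = max(len(current), len(stored))
    return [i for i in range(count)
            if i >= len(current) or i >= len(stored) or current[i] != stored[i]]
```

A truncated file shows up as a mismatch in its last partial block plus every missing block after it.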

SLIDE 20
  • Efficient enumeration thanks to integration of the sync client with NDS2 system metadata mechanisms:
    ○ enumerating 169 546 elements takes:
      ■ 46.476 s with "find", which requires inode lookups over NFS (tested on a BlueArc Mercury 100 NAS filer)
      ■ 3.017 s (!) by speaking directly to the NDS2 metadata (DB: "SELECT * on 169 546 elements"), i.e. about 15× faster
    ○ no overload of the NDS2 system even with many concurrent clients enumerating files (we are curious how others scale…)
  • Other approaches:
    ○ e.g. PROPFIND with Depth: infinity on WebDAV may cause high load on the filesystem and on the Apache/PHP stack
    ○ request/response XML/HTTP overhead may lead to poor on-the-wire performance, timeouts, etc.

Some tricks of NDSCryptobox
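The speed-up comes from replacing one metadata lookup per file with a single query over a metadata table. The sketch uses an in-memory SQLite database with a hypothetical `entries` table; the actual NDS2 metadata schema is not described in these slides.

```python
import sqlite3

# Hypothetical metadata table standing in for the NDS2 metadata DB.
SCHEMA = """CREATE TABLE entries (
    file_id INTEGER PRIMARY KEY,
    path    TEXT NOT NULL,
    size    INTEGER NOT NULL,
    mtime   REAL NOT NULL
)"""


def enumerate_namespace(conn: sqlite3.Connection) -> dict[str, tuple[int, int, float]]:
    """One query returns the whole namespace: no per-file inode lookups (stat calls)."""
    rows = conn.execute("SELECT file_id, path, size, mtime FROM entries")
    return {path: (file_id, size, mtime) for file_id, path, size, mtime in rows}
```

The result already carries file IDs, so it can feed ID-based move detection directly.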

SLIDE 21
  • May talk to a regular SFTP server:
    ○ easy to deploy in the user's infrastructure
    ○ login/password or keys can be used for authorisation
    ○ easy to integrate with existing auth systems: LDAP, etc.
  • Other solutions support:
    ○ ownCloud: WebDAV (with hooks)
    ○ proprietary protocols: SpiderOak, PowerFolder, (Drop)box
  • Virtual-IO:
    ○ enables developing drivers for other backends:
      ■ currently: SFTP, NDS2
      ■ in future: WebDAV, WebDAV/ownCloud, S3, Mega…
  • Supports syncing multiple directories:
    ○ SugarSync does, many others don't

Other features of NDS Cryptobox
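A Virtual-IO layer means the sync logic is written against an abstract storage interface and each backend is a driver. The sketch below is hypothetical (Cryptobox's actual interface is not shown in the slides): a local-folder driver and an in-memory driver stand in for SFTP or NDS2 drivers, and `copy_missing` works unchanged with either.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class Backend(ABC):
    """Hypothetical Virtual-IO interface used by the sync logic."""

    @abstractmethod
    def list_files(self) -> set[str]: ...

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...


class LocalBackend(Backend):
    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def list_files(self) -> set[str]:
        return {p.relative_to(self.root).as_posix() for p in self.root.rglob("*") if p.is_file()}

    def read(self, path: str) -> bytes:
        return (self.root / path).read_bytes()

    def write(self, path: str, data: bytes) -> None:
        target = self.root / path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)


class MemoryBackend(Backend):
    """Stands in for a remote driver (e.g. SFTP) in this sketch."""

    def __init__(self) -> None:
        self.files = {}

    def list_files(self) -> set[str]:
        return set(self.files)

    def read(self, path: str) -> bytes:
        return self.files[path]

    def write(self, path: str, data: bytes) -> None:
        self.files[path] = data


def copy_missing(src: Backend, dst: Backend) -> list[str]:
    """Backend-agnostic step: copy files that exist in src but not in dst."""
    missing = sorted(src.list_files() - dst.list_files())
    for path in missing:
        dst.write(path, src.read(path))
    return missing
```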

SLIDE 22

Cryptobox limitations

It does not yet resolve file vs folder name clashes, which causes an endless sync loop (the operation cannot be resolved on the other replica).

SLIDE 23
  • Linux/OS X support (currently only Windows)
  • Further optimisation of the sync algorithm
  • Support for read-only files (Dropbox way)
  • Support for file vs folder name clashes
  • Improvements in reliability
  • Internal recycle bin for SFTP mode (auto backup of replaced/overwritten/deleted files)

  • Dropbox-like icon overlays
  • Shell integration (share/export/publish a file option)
  • Secure data sharing support – NDS2 feature
  • Data import/export – NDS2’s FileSender-like feature
  • Data publication – NDS2 publication feature
  • Console version

Plans/roadmap

SLIDE 24

Cryptobox recipe

  • Did research on existing solutions
  • Got frustrated ;)
  • Took SyncSharp (https://code.google.com/p/syncsharp/, GNU GPL v3):
    ○ Adopted:
      ■ interface (GUI) with minor modifications
      ■ synchronisation logic with major modifications
    ○ SyncSharp was originally developed for syncing local folders
    ○ Added:
      ■ FileID support
      ■ concurrency, error and interrupt resistance
      ■ graceful cancellation of a running sync
      ■ client-side encryption
      ■ Virtual IO, SFTP client
      ■ autosync feature, real-time change detection
      ■ and many more…

SLIDE 25

Cryptobox backup mode

  • In addition to synchronisation, Cryptobox supports a backup mode, similarly to e.g. Wuala
  • CryptoBox supports a so-called contribute mode:
    ○ changes from the left folder are added to the right folder
    ○ new and updated files are copied left to right
    ○ no deletions!
    ○ folder creates and folder updates on the left are repeated on the right
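The contribute mode rules translate into a short planning function. A minimal sketch, assuming each side is described by a {path: modification time} map for files and a set of folder paths; "updated" is interpreted here as "newer on the left", folder updates (e.g. attribute changes) are left out, and the function name is hypothetical.

```python
def contribute_plan(left_files: dict[str, float], right_files: dict[str, float],
                    left_dirs: set[str], right_dirs: set[str]) -> list[tuple[str, str]]:
    """Plan left-to-right actions; nothing on the right is ever deleted."""
    actions = [("create_folder", d) for d in sorted(left_dirs - right_dirs)]
    for path in sorted(left_files):
        if path not in right_files:
            actions.append(("copy_new", path))
        elif left_files[path] > right_files[path]:
            actions.append(("copy_updated", path))
    # Files and folders that exist only on the right are deliberately left alone.
    return actions
```

Because no delete action is ever emitted, the right folder accumulates everything contributed from the left, which is what makes the mode usable for backups.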
SLIDE 26

CryptoBox GUI: simple to use, with drag&drop support

  • Many sessions/connections to many SFTP servers
  • On-demand synchronisation
  • Preview mode
  • Auto sync
  • Support for multiple synchronisation tasks (local folders vs servers)
  • System tray integration
  • Folders to be synced can be dragged here

SLIDE 27

CryptoBox GUI - Sessions

  • Sessions window lets you configure the server access
SLIDE 28

CryptoBox GUI - Sync tasks

  • In the sync tasks window you can configure the client- and server-side folders to be synchronised and decide whether client-side encryption should be used

SLIDE 29

CryptoBox GUI: Preview mode

  • Shows the actions to be performed during the synchronisation process
  • In the example shown, propagation of create / rename actions and the transfer of a small file are planned

SLIDE 30

Cryptobox’s auto-detect server capabilities function

  • Cryptobox can use SFTP servers for sync
  • It may also speak to NDS2 servers supporting advanced features
  • Client detects type of the server and behaves accordingly

This is a regular server that does not support versioning, link share, share & collaborate, geo-replication etc.

SLIDE 31

Cryptobox change tracking compatibility

  • FileID-based:
    ○ client side: file-ID-based change detection on NTFS, EXT, HFS+
    ○ server side:
      ■ with NDS2 servers: makes use of file-ID support and fast, lightweight enumeration
      ■ with regular SFTP servers: uses shell access for fast change detection
  • Hash-based:
    ○ used where FileID-based detection is not possible, e.g. clients with a FAT32 file system
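These compatibility rules amount to a small decision table. The sketch below encodes them; the file system list is taken from the FileID slide, while the function name and strategy labels are hypothetical. Unknown server types raise an error rather than guessing a fallback, since the slides do not describe one.

```python
FILE_ID_FILESYSTEMS = {"NTFS", "REFS", "EXT2", "EXT3", "EXT4", "HFS+"}

SERVER_TRACKING = {
    "nds2": "file-ID support + fast NDS2 metadata enumeration",
    "sftp": "shell access for fast change detection",
}


def choose_tracking(client_fs: str, server_type: str) -> tuple[str, str]:
    """Return the (client-side, server-side) change-tracking strategies."""
    client = "file-ID-based" if client_fs.upper() in FILE_ID_FILESYSTEMS else "hash-based"
    try:
        server = SERVER_TRACKING[server_type.lower()]
    except KeyError:
        raise ValueError(f"unknown server type: {server_type!r}") from None
    return client, server
```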

SLIDE 32

Conclusions (general)

  • Reliable, secure & fast sync of large data is not trivial
  • Even popular solutions use simplistic sync algorithms:
    ○ lack of intelligent change detection (rename/move discovery on both sides) prevents them from scaling to really large data volumes
  • Only some solutions use encryption: what about users' data privacy?
    ○ some solutions use server-side encryption only (e.g. Dropbox)
    ○ authors of some other solutions assume that data on the user-controlled server are safe (what about hackers…?)
  • In our opinion, end-to-end encryption is the only solution!
  • We evaluated the most popular solutions:
    ○ general results in the table, details available on request
    ○ we are also considering writing a report to be shared with the wider public
    ○ co-authors welcome!
  • This work can be continued within TF-Storage:
    ○ we will start by putting our results online (TF-Storage wiki)
    ○ feedback and discussion welcome!
    ○ pilots to be run in collaboration with other NRENs within TF-Storage?
SLIDE 33

Conclusions (NDS Cryptobox)

  • In the NDS2 project we implemented Cryptobox:
  • open source solution for secure and reliable data sync
  • working with standard and specialised SFTP servers
  • able to exploit NDS2 features for fast & lightweight enumeration
  • able to cope with concurrency / conflict scenarios
  • supporting client-side encryption!
  • Results of this work:
  • Developed alpha version of the sync client for Windows
  • Gained experience & knowledge about sync & share problems
  • We are open to share this knowledge with TF-Storage
SLIDE 34

Synchronisation solutions,

NDS2 Cryptobox

THANK YOU!!!