A P2P Dropbox

@mafintosh

8 person team

Based in 5 countries

>1500 npm modules

>1500 npm modules (~0.5% of npm)

We make tools that help scientists share data (and other people as well)

Data === Files

Existing great file sharing tools

Dropbox
• Extremely easy to use
• Centralised / High cost
• Who owns the data?
• Sustainable?

BitTorrent
• Decentralised / P2P
• Massively adopted / Simple protocol
• Only works for static files
• Scales worse on really big data sets
• No diffs

We can do better

• Easy to use, but not centralised like Dropbox
• Decentralised / P2P, but not for piracy like BitTorrent
• Built for modern (scientific) use cases

A next generation file sharing tool

Real time / Live data (get only the data you need and get updates when it changes)

Decentralised (no servers / data centers needed, actually serverless)

Diffable (sharing two similar data sets should only share the diff)

npm install -g dat

Append only logs, or really append only lists (a list of data you only ever append to, get it?)

(Append item to list) Data item #0

Data item #0 (Append item to list) Data item #1

Data item #0 Data item #1 (Append item to list) Data item #2

Why “Append Only Logs”?

• A simple data structure
• Immutable
• Logical ordering
• Easy to digest / index
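The properties above can be sketched with a tiny in-memory log. This is only an illustration: the `AppendOnlyLog` class and its method names are made up for this example and are not the API of any Dat module.

```javascript
// Minimal in-memory append-only log (hypothetical class, for illustration only).
class AppendOnlyLog {
  constructor () {
    this.items = []
  }

  // Append an item and return its index. Existing entries are never changed.
  append (data) {
    this.items.push(Object.freeze({ index: this.items.length, data }))
    return this.items.length - 1
  }

  // Read an entry by its position in the log.
  get (index) {
    return this.items[index].data
  }

  get length () {
    return this.items.length
  }
}

const log = new AppendOnlyLog()
log.append('Data item #0')
log.append('Data item #1')
log.append('Data item #2')
console.log(log.length, log.get(1))
```

Because entries are only ever added at the end, an index is a stable, logically ordered name for a piece of data.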

How can we share append only logs? (over a p2p network where we don’t trust other people)

Merkle Trees (a tree structure that verifies data) (unrelated to Angela Merkel)

Data #0

Root hash #0
Hash #0
Data #0

Root hash #1
Hash #1
Hash #0, Hash #2
Data #0, Data #1

Root hash #2
Hash #1
Hash #0, Hash #2, Hash #4
Data #0, Data #1, Data #2

Root hash #3
Hash #3
Hash #1, Hash #5
Hash #0, Hash #2, Hash #4, Hash #6
Data #0, Data #1, Data #2, Data #3

Root hash #3 verifies all the data
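A rough sketch of how such a tree could be built, using Node's built-in `crypto` module. The numbering follows the slides (leaf hashes at even indices, parents in between). The choice of SHA-256, the `'leaf'` / `'parent'` prefixes, and the helper names are assumptions made for this example. For brevity it only handles a power-of-two number of items and treats the top node as the root.

```javascript
const crypto = require('crypto')

// Hash any number of parts together (SHA-256 is an assumption of this sketch).
function sha256 (...parts) {
  const h = crypto.createHash('sha256')
  for (const p of parts) h.update(p)
  return h.digest()
}

// Index of a node in the in-order numbering used on the slides.
// depth 0 holds the leaf hashes: 0, 2, 4, 6, ...
function flatIndex (depth, offset) {
  return (2 * offset + 1) * 2 ** depth - 1
}

// Build every hash in the tree. Assumes items.length is a power of two.
function buildTree (items) {
  const nodes = new Map()
  let level = items.map((item, i) => {
    const hash = sha256('leaf', item)
    nodes.set(flatIndex(0, i), hash)
    return hash
  })
  let depth = 0
  while (level.length > 1) {
    depth++
    const next = []
    for (let i = 0; i < level.length; i += 2) {
      const hash = sha256('parent', level[i], level[i + 1])
      nodes.set(flatIndex(depth, i / 2), hash)
      next.push(hash)
    }
    level = next
  }
  return { nodes, root: level[0] }
}

const tree = buildTree(['Data #0', 'Data #1', 'Data #2', 'Data #3'])
console.log('root:', tree.root.toString('hex'))
```

Changing any single data item changes its leaf hash, every parent above it, and therefore the root, which is why one trusted root hash is enough to verify all the data.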

👪 wants to share Data #2 with another peer

The peer trusts Root hash #3; 👪 wants to share Data #2
Root hash #3
Hash #3
Hash #1, Hash #5
Hash #0, Hash #2, Hash #4, Hash #6
Data #0, Data #1, Data #2, Data #3

The peer trusts Root hash #3; 👪 needs to share Hash #1, Hash #6 and Data #2

The peer computes Hash #4 from Data #2

The peer computes Hash #5 from Hash #4 and Hash #6

The peer computes Hash #3 from Hash #1 and Hash #5

The peer checks that Hash #3 matches Root hash #3

👪 only needs to send O(log(n)) hashes to the other peer (can easily be optimised to never send the same hash twice) (come ask me later, I’m fun at parties)
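The four-item walkthrough can be sketched as follows. The sender ships Data #2 plus Hash #6 and Hash #1, and the receiver recomputes Hash #4, Hash #5 and Hash #3. SHA-256, the `'leaf'` / `'parent'` prefixes and all function names are assumptions of this example, not the actual wire format.

```javascript
const crypto = require('crypto')

function hashLeaf (data) {
  return crypto.createHash('sha256').update('leaf').update(data).digest()
}

function hashParent (left, right) {
  return crypto.createHash('sha256').update('parent').update(left).update(right).digest()
}

// Sender side: the full tree over four items, numbered as on the slides.
const items = ['Data #0', 'Data #1', 'Data #2', 'Data #3']
const h0 = hashLeaf(items[0])
const h2 = hashLeaf(items[1])
const h4 = hashLeaf(items[2])
const h6 = hashLeaf(items[3])
const h1 = hashParent(h0, h2)
const h5 = hashParent(h4, h6)
const h3 = hashParent(h1, h5) // the receiver already trusts this value

// What the sender transmits for Data #2: the data and two hashes.
const proof = { data: items[2], hash6: h6, hash1: h1 }

// Receiver side: rebuild the path from the data up to the root.
function verify (trustedRoot, p) {
  const hash4 = hashLeaf(p.data)
  const hash5 = hashParent(hash4, p.hash6)
  const hash3 = hashParent(p.hash1, hash5)
  return hash3.equals(trustedRoot)
}

console.log(verify(h3, proof))
console.log(verify(h3, { ...proof, data: 'tampered' }))
```

Only one hash per tree level is needed, which is where the O(log(n)) figure comes from.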

Real time

Every time we append data, the root hash changes

Crypto to the rescue

Generate a key pair: Secret Key + Public Key

The other peer trusts the Public Key

👪 signs the root (Root hash #2) with the Secret Key
Root hash #2
Hash #1
Hash #0, Hash #2, Hash #4
Data #0, Data #1, Data #2

👪 signs the new root (Root hash #3) with the Secret Key
Root hash #3
Hash #3
Hash #1, Hash #5
Hash #0, Hash #2, Hash #4, Hash #6
Data #0, Data #1, Data #2, Data #3

The other peer uses the Public Key to verify the signature on each Root hash
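A minimal sketch of signing each new root hash, using Node's built-in `crypto` module. The slides do not say which signature scheme is used, so Ed25519 is an assumption here, and the root hashes are placeholder values rather than real tree roots.

```javascript
const crypto = require('crypto')

// Generate a key pair: the secret key stays with the writer,
// the public key is what other peers trust.
const { publicKey, privateKey } = crypto.generateKeyPairSync('ed25519')

// Placeholder root hashes standing in for the tree after 3 and 4 appends.
const rootHash2 = crypto.createHash('sha256').update('tree with three items').digest()
const rootHash3 = crypto.createHash('sha256').update('tree with four items').digest()

// The writer signs every new root as the log grows.
const signature2 = crypto.sign(null, rootHash2, privateKey)
const signature3 = crypto.sign(null, rootHash3, privateKey)

// A reader who only knows the public key can check each root it receives.
const latestOk = crypto.verify(null, rootHash3, publicKey, signature3)
const olderOk = crypto.verify(null, rootHash2, publicKey, signature2)
const forgedOk = crypto.verify(null, Buffer.from('some other root'), publicKey, signature3)

console.log(latestOk, olderOk, forgedOk)
```

With this in place the reader no longer needs to trust one fixed root hash. It trusts the public key and accepts any root that carries a valid signature.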

npm install hypercore

(demo)

How do we turn append only logs into a file sharing tool?

Take a file: ~/cool.data

Cut it into pieces

Insert each piece into the log: Data #0, Data #1, Data #2, Data #3, Data #4
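The simplest way to cut a file into pieces is fixed-size chunking, sketched below. The chunk size, the sample contents and the plain array standing in for the log are all assumptions of this example.

```javascript
// Cut a buffer into fixed-size pieces (the last piece may be shorter).
function chunkFixed (buffer, chunkSize) {
  const chunks = []
  for (let offset = 0; offset < buffer.length; offset += chunkSize) {
    chunks.push(buffer.subarray(offset, offset + chunkSize))
  }
  return chunks
}

// Stand-in for the contents of ~/cool.data.
const file = Buffer.from('contents of cool.data, cut into pieces')

// Stand-in for the append-only log: each piece becomes one entry.
const log = []
for (const piece of chunkFixed(file, 8)) {
  log.push(piece)
}

console.log(log.length, 'entries')
```

Fixed-size pieces are easy to produce, but inserting a single byte near the start of the file shifts every later piece, which is the problem the "Diffable" slides address.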

Diffable

Divide a file into chunks that are unlikely to change when the file is updated

Example: git

function hello () {
  var world = 'world'
  console.log('hello', world)
}
(One line per chunk)

function hello () {
  var world = 'universe'
  console.log('hello', world)
}
(Edit one line: 3/4 chunks unchanged)
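The one-line-per-chunk idea can be sketched by hashing every line and counting how many hashes from the new version were already known. This is only an illustration of the principle, not how git stores data internally.

```javascript
const crypto = require('crypto')

const before = [
  'function hello () {',
  "  var world = 'world'",
  "  console.log('hello', world)",
  '}'
]

const after = [
  'function hello () {',
  "  var world = 'universe'",
  "  console.log('hello', world)",
  '}'
]

// One chunk per line, identified by the hash of its content.
const hashLine = (line) => crypto.createHash('sha256').update(line).digest('hex')

const known = new Set(before.map(hashLine))
const unchanged = after.filter((line) => known.has(hashLine(line))).length

console.log(`${unchanged}/${after.length} chunks unchanged`)
```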

Only works for text files

Rabin fingerprinting (Content defined chunking)

Scans through the file and creates chunks based on the actual file content

(A new part is inserted in the middle of the file)

(Only the neighbouring chunks are changed)
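A toy sketch of content-defined chunking. It uses a plain rolling sum over the last few bytes as a stand-in for a real Rabin fingerprint, so it shows the principle only and is not the algorithm in the `rabin` module. The window size, the mask and the generated sample data are assumptions of this example.

```javascript
const crypto = require('crypto')

const WINDOW = 16 // bytes covered by the rolling sum
const MASK = 31 // a boundary roughly every 32 bytes on random data

// Cut wherever the rolling sum of the last WINDOW bytes hits a fixed pattern.
// The decision at each position depends only on nearby content.
function chunkByContent (buf) {
  const chunks = []
  let start = 0
  let sum = 0
  for (let i = 0; i < buf.length; i++) {
    sum += buf[i]
    if (i >= WINDOW) sum -= buf[i - WINDOW]
    if ((sum & MASK) === MASK) {
      chunks.push(buf.subarray(start, i + 1))
      start = i + 1
    }
  }
  if (start < buf.length) chunks.push(buf.subarray(start))
  return chunks
}

// Deterministic pseudo-random sample data (4096 bytes).
const parts = []
for (let i = 0; i < 128; i++) {
  parts.push(crypto.createHash('sha256').update(String(i)).digest())
}
const original = Buffer.concat(parts)

// Insert a new part in the middle of the file.
const inserted = Buffer.from('NEW PART')
const edited = Buffer.concat([original.subarray(0, 2000), inserted, original.subarray(2000)])

const id = (chunk) => crypto.createHash('sha256').update(chunk).digest('hex')
const known = new Set(chunkByContent(original).map(id))
const editedChunks = chunkByContent(edited)
const changed = editedChunks.filter((chunk) => !known.has(id(chunk))).length

console.log(`${changed} of ${editedChunks.length} chunks are new`)
```

Because each cut depends only on the bytes just before it, chunk boundaries realign shortly after the inserted part and the chunks further away hash to the same values as before.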

npm install rabin

Each Rabin chunk is an entry in our append only log

Data #0 Data #1 Data #2 …

Merkle trees + Rabin = ❤

Hash #3
Hash #1, Hash #5
Hash #0, Hash #2, Hash #4, Hash #6
Data #0, Data #1, Data #2, Data #3

Change some data: Data #1 (marked *)
Rabin makes sure the other entries do not change

Only a few hashes change (marked *)
* Hash #3
* Hash #1, Hash #5
Hash #0, * Hash #2, Hash #4, Hash #6
Data #0, * Data #1, Data #2, Data #3

Keep an index: Hash → Data

See the same hash twice? Just copy the data (no need to re-download it) (can be … easily … optimised for space)
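The index can be sketched as a map from hash to data that is consulted before anything is fetched. The `download` function below is a hypothetical stand-in for fetching a chunk from a peer, and the other names are made up for this example as well.

```javascript
const crypto = require('crypto')

const index = new Map() // hash (hex) -> data we already have locally
let downloads = 0

// Hypothetical stand-in for fetching a chunk over the network.
function download (remoteChunk) {
  downloads++
  return Buffer.from(remoteChunk)
}

// Fetch a chunk by hash, reusing the local copy when the hash is known.
function fetchChunk (hash, remoteChunk) {
  if (index.has(hash)) return index.get(hash)
  const data = download(remoteChunk)
  index.set(hash, data)
  return data
}

const hashOf = (chunk) => crypto.createHash('sha256').update(chunk).digest('hex')

// Two data sets that share most of their chunks.
const first = ['chunk A', 'chunk B', 'chunk C']
const second = ['chunk A', 'chunk B changed', 'chunk C']

for (const chunk of first) fetchChunk(hashOf(chunk), chunk)
for (const chunk of second) fetchChunk(hashOf(chunk), chunk)

console.log(downloads, 'downloads for', first.length + second.length, 'chunks')
```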

npm install hyperdrive

(demo)

Dat is a CLI tool and desktop app that manages hyperdrives

(demo)

Great apps built on Dat

Beaker browser https://github.com/beakerbrowser/beaker

Science Fair https://github.com/codeforscience/sciencefair
