
Introduction to BitTorrent

Arvid Norberg

arvid@cs.umu.se http://libtorrent.net

Distributed systems C, TDBC85, Umeå University, Fall 2006

Bittorrent

– Introduction
– Efficiency & Reliability
– The incentive mechanism
– Trackerless with DHT

Introduction

Bittorrent is a system for efficient and scalable replication of large amounts of static data

– Scalable: the throughput increases with the number of downloaders
– Efficient: it utilises a large amount of the available network bandwidth

Introduction

The file to be distributed is split into pieces, and a SHA-1 hash is calculated for each piece

[Diagram: pieces 0, 1, 2, … each with its 20-byte SHA-1 hash]
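As a minimal sketch, hashing the pieces takes only a few lines of Python; the 256 KiB piece size below is an assumption, since the piece size is chosen by whoever creates the torrent:

    import hashlib

    PIECE_SIZE = 256 * 1024  # assumed; the torrent's creator picks the piece size

    def piece_hashes(path):
        """Split a file into fixed-size pieces and SHA-1 hash each one."""
        hashes = []
        with open(path, "rb") as f:
            while True:
                piece = f.read(PIECE_SIZE)
                if not piece:
                    break
                hashes.append(hashlib.sha1(piece).digest())
        return hashes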


Introduction

A metadata file (.torrent) is distributed to all peers

– Usually via HTTP

The metadata contains:

– The SHA-1 hashes of all pieces
– A mapping of the pieces to files
– A tracker reference
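In Python terms, the decoded (bencoded) metadata looks roughly like this, reusing piece_hashes from the sketch above; the tracker URL and file name are placeholders:

    torrent = {
        "announce": "http://tracker.example.com/announce",  # tracker reference
        "info": {
            "name": "example.iso",       # placeholder file name
            "piece length": 256 * 1024,  # bytes per piece
            "pieces": b"".join(piece_hashes("example.iso")),  # 20 bytes per piece, concatenated
            "length": 734003200,         # single-file torrent: total size in bytes
        },
    }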

Introduction

The tracker is a central server keeping a list of all peers participating in the swarm

A swarm is the set of peers that are participating in distributing the same files

A peer joins a swarm by asking the tracker for a peer list and connecting to those peers
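The announce is a plain HTTP GET; here is a sketch of building one. The parameter names come from the tracker protocol, the counters are zeroed for a freshly joining peer:

    import urllib.parse

    def announce_url(tracker, info_hash, peer_id, port):
        """Build the HTTP GET announce a peer sends to the tracker."""
        params = urllib.parse.urlencode({
            "info_hash": info_hash,  # 20-byte SHA-1 identifying the torrent
            "peer_id": peer_id,      # 20-byte identifier chosen by the client
            "port": port,            # port we accept incoming peer connections on
            "uploaded": 0,           # fresh peer: no traffic counted yet
            "downloaded": 0,
            "left": 734003200,       # bytes we still need
        })
        return tracker + "?" + params

The tracker's response is a bencoded peer list that the client then connects to.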

Introduction

[Diagrams: peers in a swarm announcing to the central tracker, then connecting to each other]


Goals

Efficiency

– Fast downloads

Reliability

– Tolerant to dropping peers
– Ability to verify data integrity (SHA-1 hashes)

Efficiency

The ability to download from many peers yields fast downloads

Minimise piece overlap among peers, to allow each peer to exchange pieces with as many other peers as possible

Piece overlap

[Diagram: piece distribution across Peers 1-4 with small and big overlap]

Small overlap

– Every peer can exchange pieces with all other peers
– The bandwidth can be well utilised

Big overlap

– Only a few peers can exchange pieces
– The bandwidth is underutilised

Piece overlap

To minimise piece overlap:

– Download random pieces
– Prioritise the rarest pieces, aiming towards a uniform piece distribution


Reliability

Be tolerant of dropping peers

– Each dropped peer means decreased piece availability

Maximise piece redundancy

– Maximise the number of distributed copies

Distributed copies

The number of distributed copies is the number of copies of the rarest piece

[Diagram: two example swarms across Peers 1-4, one with distributed copies = 2 and one with distributed copies = 1]
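With each peer's pieces represented as a bitfield, the count follows directly; a simplified sketch (real clients also track fractional copies):

    def availability(bitfields):
        """Count, piece by piece, how many peers have each piece."""
        return [sum(column) for column in zip(*bitfields)]

    def distributed_copies(bitfields):
        """The rarest piece bounds the number of complete distributed copies."""
        return min(availability(bitfields))

    # Four peers, four pieces: every piece exists exactly twice
    print(distributed_copies([[1, 1, 0, 0],
                              [0, 0, 1, 1],
                              [1, 0, 1, 0],
                              [0, 1, 0, 1]]))  # -> 2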

Distributed copies

To maximise the distributed copies, maximise the availability of the rarest pieces

To increase the availability of a piece, download it

To maximise the distributed copies:

– Download the rarest pieces first

Rarest first

The piece picking algorithm used in Bittorrent is called rarest first

It picks a random piece from the set of rarest pieces

No peer has global knowledge of piece availability; it is approximated by the availability among neighbours


Rarest first

Pick a random piece from the set of rarest pieces, {2, 3} in the example below

Ignore pieces that we already have

[Example: availability of pieces 0-5 among Us and Peers 1-3; after ignoring the pieces we already have, the rarest set is {2, 3}]
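A minimal sketch of the picker, assuming we track per-piece availability among our neighbours:

    import random

    def rarest_first(availability, have):
        """Pick a random piece from the rarest set, skipping pieces we own."""
        candidates = [i for i, count in enumerate(availability)
                      if i not in have and count > 0]
        if not candidates:
            return None  # nothing useful to request right now
        rarest = min(availability[i] for i in candidates)
        return random.choice([i for i in candidates
                              if availability[i] == rarest])

The random choice among equally rare pieces matters: if every peer deterministically picked the same piece, overlap would increase rather than decrease.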

The incentive to share

All peer connections are symmetric

Both peers have an interest in exchanging data

Peers may prefer to upload to peers from whom they can download

– Leads to slow starts
– Fixed in a recent extension

The incentive to share

There is a loose connection between upload and download speed

Each peer has an incentive to upload
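That loose coupling is commonly implemented as a choking policy: upload slots go to the peers we currently download fastest from. A rough sketch; the four-slot default mirrors common clients, and real clients also rotate an optimistic unchoke slot:

    def pick_unchoked(download_rate, slots=4):
        """Keep upload slots for the peers we download fastest from."""
        ranked = sorted(download_rate, key=download_rate.get, reverse=True)
        return set(ranked[:slots])

    print(pick_unchoked({"peer-a": 90.0, "peer-b": 10.0, "peer-c": 55.0},
                        slots=2))  # -> {'peer-a', 'peer-c'}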

Trackerless torrents

Common problems with trackers

– Single point of failure
– Bandwidth bottleneck for publishers

Solutions

– Multiple trackers
– UDP trackers
– DHT tracker


DHT (distributed hash table)

Works as a hash table with SHA-1 hashes as keys

The key is the info-hash, the hash of the metadata's info section; it uniquely identifies a torrent

The data is the peer list of the peers in the swarm
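Conceptually, the whole DHT behaves like one big dictionary that happens to be spread across all nodes; a toy, single-process stand-in:

    # info-hash (20 bytes) -> list of (ip, port) peers in that swarm
    tracker_table = {}

    def dht_announce(info_hash, peer):
        """Record a peer under the torrent's info-hash."""
        tracker_table.setdefault(info_hash, []).append(peer)

    def dht_get_peers(info_hash):
        """Return the peer list stored for this torrent, if any."""
        return tracker_table.get(info_hash, [])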

DHT (distributed hash table)

Each node is assigned an ID

– in the key space (160 bit numbers)

Nodes order themselves in a defined topology

– Makes it possible to search for IDs by traversing the node topology

Bittorrent uses Kademlia as its DHT

Kademlia bootstrap

Each node bootstraps by looking for its own ID

– The search is done recursively until no closer nodes can be found
– The nodes passed on the way are stored in the routing table
– The routing table has more room for close nodes than for distant nodes

Kademlia routing table

Each node knows much more about close nodes than about distant nodes

– The key space a bucket covers doubles with each step in distance
– Querying a node for a specific ID will, on average, halve the distance to the target ID at each step

[Diagram: node buckets by distance from our node-id]
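The bucket a node falls into can be read straight off the XOR distance; a sketch assuming 160-bit IDs passed as bytes:

    def bucket_index(our_id, node_id):
        """Bucket i holds nodes at XOR distance in [2**i, 2**(i + 1)).

        Each bucket spans twice the key space of the previous one, so
        the table keeps fine-grained knowledge only about nearby IDs.
        """
        distance = int.from_bytes(our_id, "big") ^ int.from_bytes(node_id, "big")
        return distance.bit_length() - 1  # undefined for our own ID (distance 0)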


Kademlia routing table

The distance metric is defined as XOR

– In practice, the distance is on the order of 2 to the power of (key length minus the length of the common bit prefix)

100110110011101010110001
100110110010101110101100

Common prefix = 11 bits, so the distance is about 2^13 (= 2^(24 - 11))
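The metric itself is a one-liner, and the prefix length follows from the distance's bit length; the values below reproduce the 24-bit example (real Kademlia uses 160-bit IDs):

    def common_prefix_len(a, b, bits=160):
        """Shared leading bits of two IDs; the distance is then < 2**(bits - prefix)."""
        d = a ^ b  # the Kademlia distance metric
        return bits if d == 0 else bits - d.bit_length()

    a = 0b100110110011101010110001
    b = 0b100110110010101110101100
    print(common_prefix_len(a, b, bits=24))  # -> 11, so distance < 2**13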

Kademlia routing table

[Diagram: the 160-bit key space split into buckets by distance from our node-id (159 levels)]

Kademlia search

Each search step increases the common bit prefix by at least one

– Search complexity: O(log n)
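A toy simulation of the iterative lookup over an in-memory network: each hop moves to the closest node the current node knows, so the XOR distance shrinks geometrically. The node count and contact-list size are arbitrary assumptions:

    import random

    random.seed(1)
    ids = random.sample(range(2 ** 20), 200)          # toy 20-bit ID space
    knows = {n: random.sample(ids, 20) for n in ids}  # each node's contacts

    def lookup(start, target):
        """Hop to the closest node the current node knows until stuck."""
        current, hops = start, 0
        while True:
            best = min(knows[current] + [current], key=lambda n: n ^ target)
            if best == current:
                return current, hops  # no closer node known
            current, hops = best, hops + 1

    print(lookup(ids[0], target=123456))  # lands near the target in a few hops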

Kademlia distributed tracker

Each peer announces itself with the distributed tracker

– by looking up the 8 nodes closest to the info-hash of the torrent
– and sending an announce message to them
– Those 8 nodes will then add the announcing peer to the peer list stored at that info-hash
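Selecting where to store the announce is again a question of XOR distance; a sketch over plain integer IDs for brevity:

    K = 8  # replication factor used by the BitTorrent DHT

    def storing_nodes(node_ids, info_hash, k=K):
        """The k nodes XOR-closest to the info-hash hold the peer list;
        the announce message is sent to each of them."""
        return sorted(node_ids, key=lambda n: n ^ info_hash)[:k]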


Kademlia distributed tracker

A peer joins a torrent by looking up the peer list at a specific info-hash

– Like a search, but nodes return the peer list if they have it

Kademlia distributed tracker

8 nodes is considered enough to minimise the probability that all of them drop from the network within the announce interval

– Each announce looks up new nodes, in case nodes have joined the network with IDs closer to the info-hash than a previous node