CS555: Distributed Systems [Fall 2019]
Dept. of Computer Science, Colorado State University

CS 555: DISTRIBUTED SYSTEMS
[BITTORRENT & DISTRIBUTED COMPUTING ECONOMICS]

Shrideep Pallickara
Computer Science, Colorado State University



September 24, 2019


Frequently asked questions from the previous class survey

¨ Difference in routing in the network space vs. the ID space
¨ Can Gnutella be viewed as a semi-structured P2P system?


Topics covered in this lecture

¨ BitTorrent
¨ Distributed Computing Economics


BITTORRENT


BitTorrent: Traffic statistics

¨ In November 2004
  ¤ Responsible for 25% of all Internet traffic
¨ February 2013
  ¤ 3.35% of all worldwide bandwidth
  ¤ > 50% of the 6% total bandwidth dedicated to file sharing


BitTorrent

¨ Designed for downloading large files
¨ Not intended for real-time routing of content
¨ Relies on capabilities of ordinary user machines


BitTorrent: Key concepts

¨ Instead of downloading a file from a single source server
  ¤ Users join a swarm of hosts to upload to and download from simultaneously
¨ Several basic commodity computers can replace large, customized servers
  ¤ Without compromising on efficiency
  ¤ In fact, the lower bandwidth usage with swarms prevents large Internet traffic spikes


Segmented file transfer [1/2]

¨ File being transferred is divided into fixed-size segments called chunks (or pieces)
  ¤ Chunks are of the same size throughout a single download (10 MB file: ten 1 MB chunks or forty 256 KB chunks)
¨ Chunks are downloaded non-sequentially and rearranged into the correct order by BitTorrent
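
The chunk arithmetic on this slide can be checked with a short sketch. The helper name `split_into_chunks` is hypothetical, and the sketch assumes 1 MB = 1024 KB; a real client would read pieces from disk rather than hold the whole file in memory.

```python
def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Split data into fixed-size chunks; only the last chunk may be shorter."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


KB = 1024
MB = 1024 * KB

payload = bytes(10 * MB)  # stand-in for a 10 MB file
chunks_1mb = split_into_chunks(payload, 1 * MB)
chunks_256kb = split_into_chunks(payload, 256 * KB)

# Chunks may arrive in any order (here: reversed); keeping each chunk's
# index is enough to put the file back together in the correct order.
received = {index: chunk for index, chunk in reversed(list(enumerate(chunks_256kb)))}
reassembled = b"".join(received[i] for i in range(len(received)))
```

Because every chunk carries its index, the arrival order has no effect on the reassembled file.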


Segmented file transfer [2/2]

¨ Advantages:
  ¤ File transfers can be stopped at any time and resumed
    ■ Without loss of previously downloaded content
  ¤ Clients seek out readily available chunks, rather than waiting for an unavailable (next in sequence) chunk


BitTorrent: Protocol summary

¨ Splits files into fixed-size chunks
¨ Chunks are then made available at various peers across the P2P network
¨ Clients can download a number of chunks in parallel from different sites
  ¤ Reduces the burden on a particular peer to service the entire download


The BitTorrent protocol

¨ When a file is made available in BitTorrent, a .torrent file is created
  ¤ Holds metadata associated with that file
¨ Metadata
  ¤ The name and length of the file
  ¤ Location of a tracker (URL)
    ■ Centralized server that manages downloads for that file
  ¤ Checksum
    ■ Associated with each chunk
    ■ Generated using the SHA-1 algorithm


Advantages of hashing chunks

¨ Each chunk has a cryptographic hash in the torrent descriptor
¨ Modifications of chunks can be reliably detected
  ¤ Prevents accidental and malicious modifications
¨ If a node starts with an authentic/legitimate torrent descriptor?
  ¤ It can verify the authenticity of the entire file that it receives
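
Per-chunk verification can be sketched as follows. This is a minimal illustration, not the on-disk .torrent format: the function names and the in-memory `descriptor` list are assumptions, and only the use of SHA-1 per chunk comes from the slides.

```python
import hashlib


def piece_hashes(data: bytes, chunk_size: int) -> list[bytes]:
    """Return the SHA-1 digest of each fixed-size chunk, in order."""
    return [
        hashlib.sha1(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]


def verify_chunk(index: int, chunk: bytes, expected: list[bytes]) -> bool:
    """Accept a received chunk only if its digest matches the descriptor."""
    return hashlib.sha1(chunk).digest() == expected[index]


original = b"abcdefghij" * 100            # stand-in for file contents (1000 bytes)
descriptor = piece_hashes(original, 256)  # digests a trusted descriptor would carry

good = original[256:512]                  # chunk 1 as published
tampered = b"X" + good[1:]                # same chunk with one byte changed
ok_good = verify_chunk(1, good, descriptor)
ok_bad = verify_chunk(1, tampered, descriptor)
```

A single changed byte yields a different digest, so a peer holding an authentic descriptor can reject the tampered chunk and request it again from another peer.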


The swarm or torrent for a particular file includes

¨ Tracker
¨ Seeders
¨ Leechers


Trackers

¨ The use of trackers compromises a core P2P principle
  ¤ But simplifies the system
¨ Trackers are responsible for tracking the download status for a particular file


The roles of participants in BitTorrent: Seeder

¨ A peer with a complete version of a file (i.e., with all its chunks) is known as a seeder
¨ The peer that initially creates the file provides the initial seed for file distribution


The roles of participants in BitTorrent: Leechers

¨ Peers that want to download a file are known as leechers
  ¤ A given leecher, at any given time, contains a number of chunks for that file
¨ Once a leecher downloads all chunks for a file, it can become a seeder for subsequent downloads
  ¤ Files spread virally based on demand


When a peer wants to download a file

¨ Contacts the tracker
¨ Is given a partial view of the torrent
  ¤ The set of peers that can support the download
  ¤ The tracker does not participate in scheduling the downloads
    ■ Decentralized
¨ Chunks are requested and transmitted in any order


Incentive mechanism: Quid pro quo

¨ Gives downloading preference to peers who have previously uploaded to the site
  ¤ Encourages concurrent uploads/downloads to make better use of bandwidth
¨ A peer supports downloads from n simultaneous peers by unchoking these peers
  ¤ Decisions based on rolling calculations of download rates
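
The unchoking decision can be sketched as "keep the n peers that have recently served us best". The function names, the exponentially weighted average, and the peer labels below are assumptions made for illustration; the slides only state that decisions use rolling download rates, and real clients apply further rules that are omitted here.

```python
def rolling_rate(previous: float, sample: float, weight: float = 0.5) -> float:
    """Exponentially weighted average; one possible 'rolling' calculation."""
    return weight * sample + (1 - weight) * previous


def choose_unchoked(download_rates: dict[str, float], n: int) -> set[str]:
    """Pick the n peers that have recently given us the best download rates."""
    ranked = sorted(download_rates, key=lambda peer: download_rates[peer], reverse=True)
    return set(ranked[:n])


# Observed download rates from each peer, in KB/s (hypothetical values).
rates = {"peer_a": 120.0, "peer_b": 15.0, "peer_c": 300.0, "peer_d": 80.0}

# peer_b starts sending us data quickly, so its rolling rate rises.
rates["peer_b"] = rolling_rate(rates["peer_b"], 400.0)

unchoked = choose_unchoked(rates, n=2)
```

A peer that begins uploading quickly moves up the ranking and earns an upload slot in return, which is the quid pro quo the slide describes.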


Scheduling downloads

¨ Rarest-first scheduling policy
¨ A peer prioritizes the chunk that is rarest among its set of connected peers
¨ Ensures that chunks that are not widely available spread rapidly
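
A minimal sketch of rarest-first selection, assuming each connected peer advertises the set of chunk indices it holds. The function name, the peer labels, and the tie-breaking rule (lowest index wins) are illustrative choices, not part of the protocol description in the slides.

```python
from collections import Counter
from typing import Optional


def rarest_first(needed: set[int], peer_chunks: dict[str, set[int]]) -> Optional[int]:
    """Return the needed chunk held by the fewest connected peers, if any."""
    availability = Counter(
        chunk
        for chunks in peer_chunks.values()
        for chunk in chunks
        if chunk in needed
    )
    if not availability:
        return None  # no connected peer has anything we still need
    # Ties are broken by lowest chunk index to keep the result deterministic.
    return min(availability, key=lambda chunk: (availability[chunk], chunk))


peers = {
    "peer_a": {0, 1, 2},
    "peer_b": {0, 1},
    "peer_c": {0, 3},
}
next_chunk = rarest_first(needed={0, 1, 2, 3}, peer_chunks=peers)
nothing_available = rarest_first(needed={5}, peer_chunks=peers)
```

Chunks 2 and 3 are each held by only one peer, so one of them is fetched before the widely replicated chunk 0.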


How BitTorrent differs from a classic download

                 BitTorrent                                   Classic download
Connections      Many small data requests over different      One TCP connection to one machine
                 IP connections to different machines
Download order   Random or “rarest first” to ensure           Sequential
                 high availability

** Allows BitTorrent to achieve lower cost, higher redundancy, and resistance to abuse


BitTorrent: Advantages

¨ Advantages
  ¤ Lower costs, greater redundancy, higher resistance to abuse or “flash crowds”
¨ Shortcomings
  ¤ Non-contiguous download precludes progressive download
  ¤ No streaming playback
    ■ A beta BitTorrent Streaming protocol was made available for testing in 2013; this was not successful
    ■ A new service, BitTorrent Live, was released as a public beta in Spring 2019


BitTorrent: Shortcomings

¨ Downloads can take time to rise to full speed
  ¤ May take time for enough peer connections to be established
  ¤ Takes time for a node to receive sufficient data to become an effective uploader
¨ Regular (non-BitTorrent/traditional) downloads, on the other hand:
  ¤ Rise to full speed very quickly and maintain this speed throughout


But how do you find a torrent?

¨ By browsing the web or by some other means
  ¤ Open it with a BitTorrent client
¨ The client connects to the trackers listed in the torrent file and finds peers
  ¤ If the swarm contains only the initial seeder, the client connects directly to it and begins to request pieces


Support for trackerless Torrents

¨ Azureus (now Vuze) supported this first
¨ Mainline BitTorrent provides a DHT-based implementation
  ¤ Mainline DHT
  ¤ Kademlia-based Distributed Hash Table (DHT) used by BitTorrent clients


DISTRIBUTED COMPUTING ECONOMICS

JIM GRAY. Distributed Computing Economics. Technical Report: MSR-TR-2003-24. Microsoft Research. July 2003.


Outsourcing allows smaller services to benefit from mega services

¨ Automate the routine
  ¤ Harness economies of scale
¨ Companies outsource payroll, insurance, web presence, and e-mail
  ¤ Universities have tied up with Google for e-mail, for instance


Outsourcing works under certain conditions …

¨ Should be a service business
  ¤ And computing should be CENTRAL
    ■ To operating and supporting the customer
¨ Application should be nearly identical across companies
  ¤ Payroll, e-mail
    ■ The exception, not the rule


Distributed computing does not have an outsourcing or business model

¨ Designed for computer-to-computer interactions
  ¤ No eyeballs involved
¨ Need new business models to make a profit
  ¤ Enter the notion of leasing in modern Cloud systems


Baseline hardware parameters

¨ 2 GHz CPU with 2 GB RAM = $2000
¨ 200 GB disk = $200
  ¤ 100 accesses/sec
  ¤ 50 MB/sec transfer speed
¨ 1 Gbps Ethernet port-pair = $200
¨ 1 Mbps WAN link = $50/month

Note: Numbers are circa 2003


1 dollar buys you

¨ 1 GB transfer over WAN
¨ 8 hours of CPU time
¨ 10 Tops (tera CPU operations)
¨ 1 GB disk space for 3 years
¨ 10 M database accesses
¨ 10 TB of sequential disk access
¨ 10 TB of LAN bandwidth (bulk)
¨ 10 kWh = 4 days of computer time


Caveats

¨ Beowulf clusters have different networking economics
  ¤ Networking costs comparable to disk bandwidth
    ■ 10,000 times cheaper than the price of Internet transport
  ¤ Do not confuse with Internet-scale computations
¨ If telecom costs drop faster than Moore’s law … the analysis fails
  ¤ Over the past 40 years, telecom costs have fallen the slowest


The right abstraction level for Internet Distributed Computing

¨ Disk block? No
¨ File? No
¨ Database? No
¨ Applications? Yes
  ¤ BLAST search
  ¤ Google search
  ¤ Send/GET e-mail


Computing on-demand enables mobile applications

¨ Tasks are mobile
¨ Computing is dynamically provisioned
¨ Write-once-run-anywhere (WORA)
  ¤ Java
  ¤ COBOL


A computation task has 4 demands that must be met

① Networking: Questions & Answers
② Computation: Transform data/info into new information
③ Database/File Access: Access to reference information
④ Database/File Storage: Long-term storage


Ratios of demands and the relative costs are pivotal

¨ OK to send GBs of data if it saves years of computation
¨ NOT OK to send KBs of data over the network
  ¤ If the computation can be performed locally


Ideal mobile computation task

¨ Stateless
  ¤ No disk access
¨ Tiny network input or output
¨ Huge computational demand
¨ Examples:
  ¤ Cryptographic search
    ■ {encrypted text, clear text, key search range}
  ¤ Monte Carlo simulation
  ¤ SETI@HOME


Why SETI@HOME is a good deal

¨ Sends out 10^{9} jobs: each is 300 KB
¨ Network costs
  ¤ 1 GB = $1
  ¤ 1 MB = 10^{-3} dollars
  ¤ 100 KB = 10^{-4} dollars
¨ Compute cost = $0.50
¨ Compute cost / network cost = 0.5 / (3 × 10^{-4})
  ¤ Approx. 1600:1
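
The ratio on this slide can be recomputed directly. The sketch assumes decimal units (1 GB = 10^6 KB), which is what the slide's own arithmetic implies; the variable names are illustrative.

```python
DOLLARS_PER_GB = 1.0     # WAN cost from the slide: 1 GB = $1
KB_PER_GB = 1_000_000    # decimal units (assumption)

job_size_kb = 300        # size of one job shipped to a volunteer
compute_cost = 0.50      # dollars of CPU time per job, from the slide

network_cost = job_size_kb / KB_PER_GB * DOLLARS_PER_GB  # dollars per job
ratio = compute_cost / network_cost                      # compute : network
```

The exact quotient is about 1667, which the slide rounds to roughly 1600:1. Either way, compute cost exceeds network cost by three orders of magnitude per job.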


How do you move a Terabyte?

Context      Speed (Mbps)   Rent ($/month)   $/Mbps   $/TB sent   Time/TB
Home phone   0.04           40               1,000    3,086       6 years
Home DSL     0.6            50               117      360         5 months
T1           1.5            1,200            800      2,469       2 months
T3           43             28,000           651      2,010       2 days
100 Mbps     100                                                  1 day
OC3          155            49,000           316      976         14 hours
Gbps         1000                                                 2.2 hours
OC 192       9600           1,920,000        200      617         14 minutes

Source: TeraScale SneakerNet, Microsoft Research. Jim Gray; Wyman Chong; Tom Barclay; Alex Szalay; Jan vandenBerg
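
The Time/TB and $/TB columns follow from link speed and monthly rent. The sketch below assumes 1 TB = 10^12 bytes, a fully utilized link, and a 30-day billing month; the function names are hypothetical. Under those assumptions it reproduces the T1, T3, OC3, and OC 192 figures.

```python
BITS_PER_TB = 8 * 10**12             # assumes 1 TB = 10^12 bytes
SECONDS_PER_MONTH = 30 * 24 * 3600   # assumes a 30-day billing month


def seconds_per_tb(speed_mbps: float) -> float:
    """Time to push one terabyte through a fully utilized link."""
    return BITS_PER_TB / (speed_mbps * 10**6)


def dollars_per_tb(speed_mbps: float, rent_per_month: float) -> float:
    """Rent paid for the link during the time it takes to send one terabyte."""
    return rent_per_month * seconds_per_tb(speed_mbps) / SECONDS_PER_MONTH


t1_cost = dollars_per_tb(1.5, 1_200)          # T1 link
oc3_hours = seconds_per_tb(155) / 3600        # OC3 link
oc192_minutes = seconds_per_tb(9600) / 60     # OC 192 link
```

Slow links are expensive per terabyte because the rent accrues for the entire, lengthy transfer.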


Consequences

¨ The cheapest & fastest way to move terabytes cross-country is sneakernet
  ¤ 24 hours
  ¤ $50 shipping vs. $1000 WAN cost
¨ Sending 10 PB of CERN data via the network is silly:
  ① Buy disk bricks in Geneva
  ② Fill them
  ③ Ship them

TeraScale SneakerNet: Using Inexpensive Disks for Backup, Archiving, and Data Exchange. Jim Gray; Wyman Chong; Tom Barclay; Alex Szalay; Jan vandenBerg. Microsoft Technical Report MSR-TR-2002-54, May 2002. http://research.microsoft.com/research/pubs/view.aspx?tr_id=569


Web Data processing systems

¨ Network or state intensive
¨ 100 MB FTP task = 10 cents
  ¤ 99% network cost
¨ HTML webpage access
  ¤ 10^{-6} dollars, 88% network cost
¨ Hotmail
  ¤ 10^{-5} dollars; some balance in CPU and network costs


Why Napster was a good deal

¨ 5 MB song
  ¤ Network cost = 5 × 10^{-3} dollars = ½ a penny
¨ Both sender and receiver could afford it
¨ Yahoo! serving web pages
  ¤ 10^{-3} dollars in advertising revenue per page
  ¤ 10^{-5} dollars total cost in serving a web page
  ¤ ROI: 100:1


Computations that are not economically viable

¨ Data loading and data scanning tasks
  ¤ CPU-intensive, but also data-intensive
  ¤ Therefore not economically viable as mobile applications


Break even point for mobile computation tasks

¨ 10 Tops & 1 GB of networking both cost $1
¨ Break-even point
  ¤ 10,000 instructions per byte of network traffic
¨ Outsourcing becomes attractive when the cost-benefit ratio involves
  ¤ 30,000 instructions per byte
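
The break-even figure follows from dividing what a dollar buys in CPU operations by what it buys in network bytes. The sketch below works that out and applies it to two made-up tasks; the helper names and the example workloads are hypothetical.

```python
TERA = 10**12
GIGA = 10**9

ops_per_dollar = 10 * TERA     # "10 Tops" of CPU per dollar
bytes_per_dollar = 1 * GIGA    # 1 GB of WAN traffic per dollar

break_even = ops_per_dollar // bytes_per_dollar  # instructions per byte


def instruction_density(instructions: float, bytes_moved: float) -> float:
    """Instructions executed per byte of network traffic for a task."""
    return instructions / bytes_moved


def worth_shipping(instructions: float, bytes_moved: float,
                   threshold: float = break_even) -> bool:
    """True when the CPU work outweighs the cost of moving the task's data."""
    return instruction_density(instructions, bytes_moved) > threshold


# Hypothetical tasks: (instructions, bytes of network traffic)
cpu_heavy = worth_shipping(5e15, 1e9)    # 5,000,000 instructions per byte
data_heavy = worth_shipping(1e12, 1e9)   # 1,000 instructions per byte
```

Passing `threshold=30_000` applies the more conservative figure at which the slide says outsourcing becomes attractive.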


The type of network also matters

¨ LAN is 10,000 times cheaper than WAN
¨ Computational Fluid Dynamics
  ¤ Simulate crack propagation in an object
  ¤ 100 MB input, 10 GB output, 7 CPU years
  ¤ 10^{6} instructions per byte: so good for WAN
    ■ But needs to be executed in a tightly connected cluster
    ■ Cluster networking is free when compared to WAN networking


Toy Story 2

¨ A 200 MB image takes several CPU hours to render
¨ Instruction density
  ¤ 200-600 × 10^{3} instructions per byte
¨ Send 50 MB task; compute for 10 hours
  ¤ Return 200 MB image!


Bioinformatics systems

¨ BLAST, FASTA, and Smith-Waterman
  ¤ Algorithms for matching DNA sequences against a database (GenBank or SwissProt)
  ¤ Database sizes: 50 GB
¨ Does it make sense to send SwissProt (40 GB) to a server if processing (7220 hrs) is free?
  ¤ Yes


Do not provision databases, provision the searches instead

¨ Does NOT make sense to provision databases on demand
¨ Set up dedicated servers instead
  ¤ Use inexpensive servers and processors
  ¤ Provision searches!
¨ 40 GB server costs $20K
  ¤ Can deliver complex 1-hour searches for $1


What does this imply?

¨ Put the computations near the data
  ¤ Instruction density must exceed 10^{5} instructions per byte
¨ Combining data from multiple sites
  ¤ PUSH processing to data sources
    ■ Filter the data early


The contents of this slide-set are based on the following references

¨ Distributed Systems: Concepts and Design. George Coulouris, Jean Dollimore, Tim Kindberg, Gordon Blair. 5th Edition. Addison Wesley. ISBN: 978-0132143011. [Chapter 10]
¨ JIM GRAY. Distributed Computing Economics. Technical Report: MSR-TR-2003-24. Microsoft Research. July 2003.