Attila Szegedi, Software Engineer @asz 1 Wednesday, November 23, - - PowerPoint PPT Presentation

SLIDE 1

Attila Szegedi, Software Engineer @asz

Wednesday, November 23, 11

SLIDE 2

Twitter’s Open Source Involvements

SLIDE 3

Both users and producers

  • Twitter’s systems are almost completely based on Open Source software
  • our finance department runs Windows and Outlook, though…

SLIDE 4

SLIDE 5

Contributor agreements

  • Twitter has signed company-wide contributor agreements with:
  • Oracle (OpenJDK)
  • Eclipse Foundation
  • Apache Software Foundation
  • Our employees can contribute to these projects automatically, with no further red tape involved.

SLIDE 6

Twitter’s own Open Source projects

https://github.com/twitter

SLIDE 7

Twitter’s own Open Source projects

  • We use these projects internally, and either:
  • develop on GitHub, or…
  • … frequently sync to GitHub
  • You get access to the same versions that we use.
  • Lots of things for both front-end presentation and back-end capacity and scalability.

SLIDE 8

Hearst Castle

SLIDE 9

Hearst Castle

  • William Randolph Hearst had it built between 1919 and 1947
  • “The estate is a pastiche of historic architectural styles that its owner admired in his travels around Europe. Hearst was an omnivorous buyer who did not so much purchase art and antiques to furnish his home as built his home to get his bulging collection out of warehouses… The floor plan of the Main Building is chaotic due to his habit of buying centuries-old ceilings, which dictated the proportions and decor of various rooms.” --Wikipedia

SLIDE 10

SLIDE 11

Bootstrap

https://twitter.github.com/bootstrap

SLIDE 12

Bootstrap

  • Bootstrap is Twitter's front-end HTML, CSS and JavaScript toolkit for kickstarting websites.
  • It includes base CSS styles for typography, forms, buttons, tables, grids, navigation, alerts, and more.
  • Supports IE7 and up
  • Very small (CSS is ~7 KB)

SLIDE 13

Incredibly popular

  • 3rd most watched GitHub project (after Ruby on Rails and Node.js)

SLIDE 14

SLIDE 15

SLIDE 16

SLIDE 17

SLIDE 18

SLIDE 19

Built around a complete styleguide

  • Scaffolding
  • grid, fixed-width and variable width
  • Typography
  • headings, body text, quotes, lists, code, labels
  • Navigation
  • fixed topbar, tab and pill navigation, breadcrumbs, pagination
  • Alerts, dialogs
  • Media thumbnails, tables, forms, buttons

SLIDE 20

SLIDE 21

SLIDE 22

SLIDE 23

Bootstrap

  • Lets you quickly build websites that have a consistent, beautiful look.

SLIDE 24

Finagle

https://twitter.github.com/finagle

SLIDE 25

Finagle

  • Switching gears from front end to back end now…
  • Finagle is a library for building asynchronous RPC servers and clients on the JVM.

SLIDE 26

Finagle

  • Built on top of Netty
  • Supports request-response, streaming, pipelining.
  • Supports stateful RPC styles.

SLIDE 27

Client features

  • Connection pooling
  • Load balancing
  • Failure detection
  • Failover/retry
  • Distributed tracing
  • Service discovery
  • Sharding
  • Native OpenSSL support
  • Rich statistics
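The failover/retry and load-balancing bullets can be illustrated with a small sketch. This is not Finagle's API: hosts are modeled as plain `java.util.function.Function`s that may throw, and the class name `RoundRobinRetryClient` is hypothetical. It picks hosts in round-robin order and, when a call fails, retries on the next host up to a configurable number of extra attempts.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch, not Finagle's API: round-robin load balancing with
// failover to the next host when a call throws.
final class RoundRobinRetryClient<Req, Rep> {
    private final List<Function<Req, Rep>> hosts; // each host modeled as a function that may throw
    private final int retries;                    // extra attempts after the first one
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinRetryClient(List<Function<Req, Rep>> hosts, int retries) {
        if (hosts.isEmpty() || retries < 0) {
            throw new IllegalArgumentException("need at least one host and retries >= 0");
        }
        this.hosts = List.copyOf(hosts);
        this.retries = retries;
    }

    Rep apply(Req request) {
        RuntimeException lastFailure = null;
        for (int attempt = 0; attempt <= retries; attempt++) {
            // Next host in round-robin order; floorMod stays non-negative if the counter overflows.
            int i = Math.floorMod(next.getAndIncrement(), hosts.size());
            try {
                return hosts.get(i).apply(request);
            } catch (RuntimeException e) {
                lastFailure = e; // failure detected: fail over to the next host
            }
        }
        throw lastFailure; // every attempt failed
    }
}
```

With one failing host and one healthy host, a request that first lands on the failing host is transparently retried on the healthy one.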

SLIDE 28

Server features

  • Backpressure (against abusive clients)
  • Service registration
  • Native OpenSSL bindings
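A minimal sketch of backpressure, assuming a simple policy of rejecting requests once a fixed number are in flight. The class `ConcurrencyLimiter` and its `rejected` fallback value are hypothetical, not Finagle's API. It uses `java.util.concurrent.Semaphore.tryAcquire`, which never blocks, so an abusive client gets an immediate refusal instead of piling up queued work on the server.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical sketch of server-side backpressure: cap in-flight requests
// and reject the excess immediately instead of queueing it.
final class ConcurrencyLimiter {
    private final Semaphore permits;

    ConcurrencyLimiter(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    // Runs the handler if a permit is free; otherwise returns the rejection value at once.
    <T> T handle(Supplier<T> handler, T rejected) {
        if (!permits.tryAcquire()) {
            return rejected; // over capacity: push back on the client
        }
        try {
            return handler.get();
        } finally {
            permits.release(); // always give the permit back, even if the handler throws
        }
    }

    int available() {
        return permits.availablePermits();
    }
}
```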

SLIDE 29

Protocol support

  • HTTP
  • Streaming HTTP (“Comet”)
  • Thrift
  • Memcached
  • Kestrel
  • In no way limited to these…

SLIDE 30

Minimal HTTP server:

    val service: Service[HttpRequest, HttpResponse] = new Service[HttpRequest, HttpResponse] {
      def apply(request: HttpRequest) =
        Future(new DefaultHttpResponse(HTTP_1_1, OK))
    }

    val server: Server[HttpRequest, HttpResponse] = ServerBuilder()
      .codec(Http)
      .bindTo(new InetSocketAddress(10000))
      .name("HttpServer")
      .build(service)

… same in Java:

    Service<HttpRequest, HttpResponse> service = new Service<HttpRequest, HttpResponse>() {
      public Future<HttpResponse> apply(HttpRequest request) {
        return Future.value(
          new DefaultHttpResponse(HttpVersion.HTTP_1_1, HttpResponseStatus.OK));
      }
    };

    Server server = ServerBuilder.safeBuild(service, ServerBuilder.get()
      .codec(Http.get())
      .name("HttpServer")
      .bindTo(new InetSocketAddress("localhost", 10000)));

SLIDE 31

Minimal HTTP client

    val client: Service[HttpRequest, HttpResponse] = ClientBuilder()
      .codec(Http)
      .hosts(address)
      .hostConnectionLimit(1)
      .build()

    // Issue a request, get a response:
    val request: HttpRequest = new DefaultHttpRequest(HTTP_1_1, GET, "/")
    val responseFuture: Future[HttpResponse] = client(request)

    responseFuture onSuccess { response =>
      println("Received response: " + response)
    }

SLIDE 32

Robust client

    val client = ClientBuilder()
      .codec(Http)
      .hosts("localhost:10000,localhost:10001,localhost:10003")
      .hostConnectionLimit(1)             // max num of connections at a time to a host
      .connectionTimeout(1.second)        // max time to spend establishing a conn
      .retries(2)                         // (1) per-request retries
      .reportTo(new OstrichStatsReceiver) // export host-level load data
      .logger(Logger.getLogger("http"))
      .build()

SLIDE 33

Architecture

SLIDE 34

Architecture

SLIDE 35

Futures

  • Unifying abstraction for asynchronous computation
  • A computation that has not yet completed
  • can succeed or fail
  • Either block and wait for it to return, or…
  • … register a completion callback.
  • completion callbacks provide scaling, timeouts, scatter-gather, etc.
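To make the callback style concrete in Java, the sketch below uses the JDK's `java.util.concurrent.CompletableFuture` as a stand-in for Twitter's own `Future`, which is a different class with a different API. The helper name `FutureCallbacks.observe` is hypothetical: it registers a completion callback that records whether the future succeeded or failed, without blocking the caller.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

// Sketch using the JDK's CompletableFuture as a stand-in for Twitter's Future.
final class FutureCallbacks {
    // Hypothetical helper: registers a completion callback and returns a holder
    // that reads "pending" until the future succeeds or fails.
    static AtomicReference<String> observe(CompletableFuture<Integer> future) {
        AtomicReference<String> outcome = new AtomicReference<>("pending");
        // whenComplete does not block the caller; the callback runs once the future completes.
        future.whenComplete((value, error) ->
            outcome.set(error == null ? "success: " + value : "failure: " + error.getMessage()));
        return outcome;
    }
}
```

Calling `future.join()` instead would be the blocking style: it waits for the result and rethrows a failure wrapped in a `CompletionException`.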

SLIDE 36

Futures

  • Socket handler is not blocked while the response is being generated.
  • Socket handler can time out if the operation takes too long.
  • Response generator can scatter its operation, and return once every sub-operation has completed or timed out.
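The scatter-gather case can be sketched with `CompletableFuture` as well (Java 9 or later, for `completeOnTimeout`). The `ScatterGather.gather` helper and its fallback-value policy are assumptions for illustration: each sub-operation that has not finished within the timeout contributes a fallback value, and the combined future completes once every part has either completed or timed out.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

// Hypothetical scatter-gather helper built on the JDK's CompletableFuture.
final class ScatterGather {
    // Waits for every sub-operation; any part still running after timeoutMillis
    // is completed with the fallback value instead.
    static <T> CompletableFuture<List<T>> gather(
            List<CompletableFuture<T>> parts, T fallback, long timeoutMillis) {
        // completeOnTimeout completes each original future in place and returns it.
        List<CompletableFuture<T>> bounded = parts.stream()
            .map(p -> p.completeOnTimeout(fallback, timeoutMillis, TimeUnit.MILLISECONDS))
            .collect(Collectors.toList());
        // allOf fires when every part is done; join() then cannot block.
        return CompletableFuture.allOf(bounded.toArray(new CompletableFuture<?>[0]))
            .thenApply(ignored -> bounded.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList()));
    }
}
```

If a sub-operation fails outright rather than timing out, `allOf` completes exceptionally and the failure propagates to the caller.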

SLIDE 37

Futures

  • Blocking style

    val future = dispatch(request)
    val response = future() // blocks

  • Event handler style

    val future = dispatch(request)
    future onSuccess { value =>
      // do something asynchronously
    }

  • Non-blocking style

    val future = dispatch(request)
    if (future.isDefined) {
      val response = future()
    } else {
      // do something - timeout?
    }

SLIDE 38

Cassandra

SLIDE 39

Cassandra

  • Onto distributed storage...
  • Cassandra is a decentralized, fault-tolerant, highly scalable distributed database
  • Multi-master, multi-datacenter
  • Linearly scalable
  • High performance

SLIDE 40

Project

  • Multiple committers at Twitter
  • Twitter is one of the largest users
  • Has contributed major patches in performance, scalability, and operational efficiency.
  • Hundreds of nodes in production
  • Serving millions of reads/writes per second!

SLIDE 41

Use Cases

  • Spiderduck (real-time crawler)
  • Cuckoo (real-time monitoring/alerting engine for Twitter infrastructure)

  • Tweet button
  • Geolocation
  • Distributed RPC tracing store
  • Real-time spam/IP store
  • and more!

SLIDE 42

Features

  • Supports eventual AND strong consistency!
  • Distributed counters
  • CQL (SQL-like interface, e.g. SELECT * FROM table)
  • Secondary Indexing
  • Hadoop support
  • Compression

SLIDE 43

Twitter at Scale

  • Add capacity by racks, not servers
  • Measure everything in percentiles (p95, p99, p999)
  • Tune Cassandra to better integrate with the kernel and our hardware platforms
  • Profile, profile and profile!
  • Agile build deployment processes (Jenkins, BitTorrent)
  • Automated performance and distributed testing
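Measuring in percentiles means reporting, for example, the latency that 99% of requests stay under, rather than the mean. A minimal nearest-rank sketch follows; the class and method names are hypothetical, and a production system would more likely keep an approximate histogram than sort every sample.

```java
import java.util.Arrays;

// Hypothetical nearest-rank percentile helper.
final class Percentiles {
    // Returns the smallest sample such that at least p percent of samples are <= it.
    static long percentile(long[] samples, double p) {
        if (samples.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        long[] sorted = samples.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }
}
```

With only 100 samples, p999 collapses to the maximum value, which is one reason high percentiles need large sample counts.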

SLIDE 44

FlockDb

https://twitter.github.com/flockdb

SLIDE 45

FlockDb

Distributed graph database for storing adjacency lists

SLIDE 46

FlockDb goals

  • Support a high rate of add/update/remove operations
  • Support potentially complex set arithmetic queries
  • Support paging through query result sets containing millions of entries
  • Ability to “archive” and later restore archived edges
  • Horizontal scaling including replication
  • Online data migration
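Set arithmetic over adjacency lists and cursor-based paging can be sketched together. The example below is not FlockDB's API: it assumes each adjacency list is an array of non-negative ids sorted ascending, computes their intersection (for instance, accounts followed by both of two users) with a linear merge, and returns results a page at a time, using the last id returned as the cursor for the next page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not FlockDB's API.
final class AdjacencyQueries {
    // One page of results plus the cursor to resume from (-1 when exhausted).
    static final class Page {
        final List<Long> ids;
        final long nextCursor;

        Page(List<Long> ids, long nextCursor) {
            this.ids = ids;
            this.nextCursor = nextCursor;
        }
    }

    // Intersects two adjacency lists (non-negative ids, sorted ascending) and returns
    // at most pageSize matches greater than cursor; pass -1 to start from the beginning.
    static Page intersect(long[] a, long[] b, long cursor, int pageSize) {
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
        List<Long> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] <= cursor) { i++; continue; } // skip ids returned on earlier pages
            if (b[j] <= cursor) { j++; continue; }
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else {                                // id present in both lists
                if (out.size() == pageSize) {       // page is full and more matches exist
                    return new Page(out, out.get(out.size() - 1));
                }
                out.add(a[i]);
                i++;
                j++;
            }
        }
        return new Page(out, -1);                   // no further matches
    }
}
```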

SLIDE 47

FlockDb

  • Simpler, because it solves fewer problems than generic graph databases.
  • Scales horizontally, and is designed for low-latency, high-throughput environments.
  • Twitter uses it to store its social graph (“follows” and “blocks” relations).

SLIDE 48

Gizzard

https://twitter.github.com/gizzard

SLIDE 49

Gizzard

A Scala framework for creating fault-tolerant distributed databases.

SLIDE 50

Gizzard

  • Lots of Open Source eventually-consistent distributed databases lately.
  • Gizzard turns it around:
  • it’s a middleware framework sitting between clients and your replicated/partitioned storage
  • but it doesn’t dictate or limit your:
  • data storage choice
  • sharding and replication strategy choice

SLIDE 51

Gizzard

  • Any storage backend:
  • MySQL, Redis, Lucene, …
  • Stateless: run as many instances as necessary

SLIDE 52

Gizzard partitioning

  • Hash function + forwarding table
  • Not “consistent hashing”
  • Allows heterogeneously sized partitions
  • easy hotspot management
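A hash function plus a forwarding table can be sketched as a sorted map from the start of each hash range to the shard that owns it. The class name and the bit-mixing hash below are assumptions for illustration, not Gizzard's own code. Because ranges are explicit rather than derived from a hash ring, they can have any size, so a hot range can be split off and forwarded to its own shard.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of "hash function + forwarding table", not Gizzard's code.
final class ForwardingTable {
    // Start of each hash range -> shard owning hashes from that start up to the next start.
    private final TreeMap<Long, String> table = new TreeMap<>();

    void forward(long rangeStart, String shard) {
        table.put(rangeStart, shard);
    }

    // Assumed hash: any stable function into [0, 2^31) would do.
    static long hash(long id) {
        return Long.hashCode(id * 0x9E3779B97F4A7C15L) & 0x7FFFFFFFL;
    }

    String shardForHash(long h) {
        Map.Entry<Long, String> e = table.floorEntry(h); // greatest range start <= h
        if (e == null) throw new IllegalStateException("no range covers hash " + h);
        return e.getValue();
    }

    String shardFor(long id) {
        return shardForHash(hash(id));
    }
}
```

To relieve a hotspot, one inserts an entry that carves a narrow range out of a busy one and points it at another shard; only keys whose hashes fall in that range change owner.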

SLIDE 53

Gizzard replication

  • Each shard is either physical or logical
  • Logical shards are trees of other shards, with propagation strategies for reads and writes.
  • You can code your own strategies for transaction coordination, quorum, etc., or use standard ones.
  • Standard: “Replicate” (write to all children, load balance reads), Write-Only, Read-Only, Blocked.
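The physical/logical split can be sketched as a small composite pattern. The names `Shard`, `PhysicalShard` and `ReplicatingShard` are hypothetical, not Gizzard's classes; the replicating node mimics the “Replicate” behaviour listed above by writing to every child and rotating reads across children. Because a `ReplicatingShard` is itself a `Shard`, nodes nest into trees. The sketch is single-threaded and does no failure handling.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Gizzard's classes.
interface Shard {
    String read(String key);
    void write(String key, String value);
}

// Physical shard: actually stores data (stand-in for a MySQL table, Redis instance, ...).
final class PhysicalShard implements Shard {
    private final Map<String, String> data = new HashMap<>();
    int reads = 0; // counts reads so the load balancing is observable

    public String read(String key) {
        reads++;
        return data.get(key);
    }

    public void write(String key, String value) {
        data.put(key, value);
    }
}

// Logical shard with a "Replicate" strategy: write to all children, rotate reads.
final class ReplicatingShard implements Shard {
    private final List<Shard> children;
    private int next = 0;

    ReplicatingShard(List<? extends Shard> children) {
        if (children.isEmpty()) throw new IllegalArgumentException("no children");
        this.children = List.copyOf(children);
    }

    public String read(String key) {
        Shard child = children.get(next);
        next = (next + 1) % children.size();
        return child.read(key);
    }

    public void write(String key, String value) {
        for (Shard child : children) {
            child.write(key, value);
        }
    }
}
```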

SLIDE 54

Gizzard replication

  • Replication topologies can vary per partition, e.g.:
  • a higher replication level for hotter partitions
  • backends can mirror each other, or…
  • … stripe partitions across machines
  • better fault tolerance, higher configuration complexity

SLIDE 55

Gizzard fault tolerance

  • If a partition replica crashes, requests are rerouted to healthy ones.
  • If all replicas of a partition crash, then
  • reads from that partition are unavailable, but the other partitions are unaffected.
  • writes can be buffered in a durable journal with an error queue.
  • Requires writes to be idempotent and commutative.
  • Data modeling needs to account for write conflicts.
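One common way to make writes idempotent and commutative is to tag each write with a timestamp and keep only the winning write per key (last-writer-wins). This is an assumed technique chosen for illustration, not necessarily what Gizzard-based systems do, and the class `LwwStore` is hypothetical. Replaying a journal in a different order, or replaying a write twice, converges to the same state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical last-writer-wins store; an assumed technique, not Gizzard's code.
final class LwwStore {
    static final class Write {
        final String key;
        final String value;
        final long timestamp;

        Write(String key, String value, long timestamp) {
            this.key = key;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final Map<String, Write> latest = new HashMap<>();

    // Keeps whichever write wins under a total order (timestamp, then value), so applying
    // the same writes in any order, or applying one twice, gives the same result.
    void apply(Write w) {
        latest.merge(w.key, w, (current, incoming) -> wins(incoming, current) ? incoming : current);
    }

    private static boolean wins(Write a, Write b) {
        if (a.timestamp != b.timestamp) return a.timestamp > b.timestamp;
        return a.value.compareTo(b.value) > 0; // deterministic tie-break
    }

    String get(String key) {
        Write w = latest.get(key);
        return w == null ? null : w.value;
    }
}
```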

SLIDE 56

The rest

SLIDE 57

The rest

  • Kestrel: queueing system
  • Fault tolerant, robust
  • Not JMS compliant!
  • Loosely ordered; no cross-communication within the cluster.

  • https://github.com/robey/kestrel
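“Loosely ordered” follows from the servers not coordinating: each item lands on a single server, and each server is FIFO only for its own items. The in-memory model below is hypothetical (the class name is invented, and the network between clients and servers is not modeled); it places items round-robin and lets a consumer read from whichever server it picks.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical in-memory model of a loosely ordered queue cluster.
final class LooselyOrderedCluster {
    private final List<Queue<String>> servers = new ArrayList<>();
    private int next = 0;

    LooselyOrderedCluster(int serverCount) {
        for (int i = 0; i < serverCount; i++) {
            servers.add(new ArrayDeque<>()); // each server keeps its own independent FIFO
        }
    }

    // Each item goes to exactly one server; servers never talk to each other.
    void put(String item) {
        servers.get(next).add(item);
        next = (next + 1) % servers.size();
    }

    // A consumer reads from whichever server it is connected to; null if that one is empty.
    String get(int server) {
        return servers.get(server).poll();
    }
}
```

With two servers, a consumer reading server 1 first receives “b” before “a”, even though “a” was enqueued first; ordering holds only per server.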

SLIDE 58

The rest

  • Commons: Java libraries augmenting Google Guava
  • Unified configuration & service launcher
  • Closures, codecs, memcache client, Thrift server and client, load balancer, ZooKeeper serversets
  • Stats collection and publishing (non-JMX)
  • Twitter-text included

SLIDE 59

The rest

  • Kiji: Twitter’s fork of REE (Ruby Enterprise Edition)
  • Used for all Ruby runtimes at Twitter
  • Has its memory management and garbage collector radically rewritten
  • Before, Twitter front-end REE runtimes spent 33% of CPU time in GC
  • Now they spend 5%

SLIDE 60

Attila Szegedi, Software Engineer @asz
