
ZooKeeper: Wait-free coordination for Internet-scale systems (PowerPoint presentation)



  1. ZooKeeper: Wait-free coordination for Internet-scale systems
     Patrick Hunt and Mahadev Konar (Yahoo! Grid); Flavio Junqueira and Benjamin Reed (Yahoo! Research)

  2. Internet-scale Challenges
     • Lots of servers, users, and data
     • FLP impossibility, CAP theorem
     • Mere mortal programmers

  3. Classic Distributed System
     [diagram: a single Master coordinating six Slaves]

  4. Fault Tolerant Distributed System
     [diagram: primary and backup Masters connected through a Coordination Service, coordinating six Slaves]

  5. Fault Tolerant Distributed System
     [same diagram, repeated as an animation step]

  6. Fully Distributed System
     [diagram: six Workers coordinating through a Coordination Service, with no Master]

  7. What is coordination?
     • Group membership
     • Leader election
     • Dynamic configuration
     • Status monitoring
     • Queuing
     • Barriers
     • Critical sections

  8. Goals
     • Been done in the past: ISIS, distributed locks (Chubby, VMS)
     • High performance
       – Multiple outstanding ops
       – Read dominant
     • General (coordination kernel)
     • Reliable
     • Easy to use

  9. Wait-free
     • Pros
       – Slow processes cannot slow down fast ones
       – No deadlocks
       – No blocking in the implementation
     • Cons
       – Some coordination primitives are blocking
       – Need to be able to efficiently wait for conditions

  10. Serializability vs. Linearizability
     • Linearizable writes
     • Serializable reads (may be stale)
     • Client FIFO ordering

  11. Change Events
     • Clients request change notifications
     • Service does timely notifications
     • Do not block write requests
     • Clients get notification of a change before they see the result of that change

  12. Solution: order + wait-free + change events = coordination

  13. ZooKeeper API
     String       create(path, data, acl, flags)
     void         delete(path, expectedVersion)
     Stat         setData(path, data, expectedVersion)
     (data, Stat) getData(path, watch)
     Stat         exists(path, watch)
     String[]     getChildren(path, watch)
     void         sync()
     Stat         setACL(path, acl, expectedVersion)
     (acl, Stat)  getACL(path)
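To make the API's conditional-update semantics concrete, here is a minimal in-memory sketch of the data operations, assuming a dict-backed store with per-znode version counters (the real service is replicated and persistent; names like `MiniZK` are illustrative, not part of ZooKeeper):

```python
# Minimal in-memory sketch of the znode data operations.
# expectedVersion = -1 means "unconditional"; otherwise the op is a
# compare-and-swap on the znode's version, as in ZooKeeper.

class ZNode:
    def __init__(self, data):
        self.data = data
        self.version = 0  # bumped on every setData

class MiniZK:
    def __init__(self):
        self.nodes = {"/": ZNode(b"")}

    def create(self, path, data):
        if path in self.nodes:
            raise KeyError("node exists: " + path)
        self.nodes[path] = ZNode(data)
        return path

    def set_data(self, path, data, expected_version):
        node = self.nodes[path]
        if expected_version != -1 and node.version != expected_version:
            raise ValueError("version mismatch")
        node.data = data
        node.version += 1
        return node.version

    def get_data(self, path):
        node = self.nodes[path]
        return node.data, node.version

    def delete(self, path, expected_version):
        node = self.nodes[path]
        if expected_version != -1 and node.version != expected_version:
            raise ValueError("version mismatch")
        del self.nodes[path]

zk = MiniZK()
zk.create("/config", b"v1")
zk.set_data("/config", b"v2", expected_version=0)
print(zk.get_data("/config"))  # (b'v2', 1)
```

The version check is what lets clients build optimistic-concurrency updates: a stale writer's `setData` fails instead of silently clobbering a newer value.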

  14. Data Model
     • Hierarchical namespace (like a file system)
     • Each znode has data and children
     • Data is read and written in its entirety
     [diagram: namespace tree rooted at /, including /services/YaView/workers/worker1 and worker2, a locks node with child s-1, and /apps and /users]

  15. Create Flags
     • Ephemeral: znode deleted when its creator fails or when explicitly deleted
     • Sequence: append a monotonically increasing counter
     [diagram: same tree; worker1 and worker2 are ephemerals created by Session X, and the s-1 counter under locks is a sequence appended on create]
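The two flags can be sketched in a few lines, assuming a per-parent sequence counter and a per-session set of ephemeral paths (the `SessionStore` class and session-id strings are illustrative; ZooKeeper does pad the sequence counter to ten digits):

```python
# Illustrative sketch of the EPHEMERAL and SEQUENCE create flags.
EPHEMERAL, SEQUENCE = 1, 2

class SessionStore:
    def __init__(self):
        self.nodes = {}       # path -> data
        self.counters = {}    # parent path -> next sequence number
        self.ephemerals = {}  # session id -> set of paths to delete on close

    def create(self, session, path, data, flags=0):
        if flags & SEQUENCE:
            parent = path.rsplit("/", 1)[0]
            n = self.counters.get(parent, 0)
            self.counters[parent] = n + 1
            path = path + "%010d" % n  # zero-padded, monotonically increasing
        self.nodes[path] = data
        if flags & EPHEMERAL:
            self.ephemerals.setdefault(session, set()).add(path)
        return path

    def close_session(self, session):
        # Ephemeral znodes vanish when their creator's session ends.
        for p in self.ephemerals.pop(session, set()):
            del self.nodes[p]

zk = SessionStore()
print(zk.create("sess-x", "/locks/x-", b"", EPHEMERAL | SEQUENCE))
# /locks/x-0000000000
```

Zero-padding matters: it makes sequence names sort correctly as plain strings, which the lock recipes below rely on.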

  16. Configuration
     • Workers get the configuration: getData(“.../config/settings”, true)
     • Administrators change the configuration: setData(“.../config/settings”, newConf, -1)
     • Workers are notified of the change and get the new settings: getData(“.../config/settings”, true)
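The recipe above hinges on one-shot watches: a watch set by `getData` fires once on the next change and must be re-registered by reading again. A single-process sketch of that loop, with an illustrative `ConfigStore` standing in for the service:

```python
# Sketch of the configuration recipe with one-shot watches.
class ConfigStore:
    def __init__(self, initial):
        self.data = initial
        self.watchers = []  # callbacks waiting for the next change

    def get_data(self, path, watch=None):
        if watch:
            self.watchers.append(watch)
        return self.data

    def set_data(self, path, new_data, expected_version=-1):
        self.data = new_data
        pending, self.watchers = self.watchers, []  # one-shot: clear, then fire
        for w in pending:
            w(path)  # notification carries no data; the client re-reads

seen = []
store = ConfigStore(b"conf-v1")

def on_change(path):
    # The worker re-reads (and re-watches) after each notification.
    seen.append(store.get_data(path, watch=on_change))

store.get_data("/config/settings", watch=on_change)  # initial read + watch
store.set_data("/config/settings", b"conf-v2")       # admin update
print(seen)  # [b'conf-v2']
```

Note the watch notification itself carries no payload; clients learn only that something changed and must read again, which is why the recipe ends with another `getData`.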

  17. Group Membership
     • Register serverName in the group: create(“.../workers/workerName”, hostInfo, EPHEMERAL)
     • List group members: listChildren(“.../workers”, true)
     [diagram: workers node with children worker1, worker2]

  18. Leader Election
     1. getData(“.../workers/leader”, true)
     2. If successful, follow the leader described in the data and exit
     3. create(“.../workers/leader”, hostname, EPHEMERAL)
     4. If successful, lead and exit
     5. Goto step 1
     If a watch is triggered for “.../workers/leader”, followers restart the leader election process.
     [diagram: workers node with children worker1, worker2, leader]
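The election loop can be sketched as a race to create the ephemeral leader znode, assuming a single-threaded toy `Election` class (illustrative names; the real recipe runs against the API calls shown on the slide):

```python
# Sketch of the leader-election loop: first successful create wins.
class Election:
    def __init__(self):
        self.leader = None  # hostname stored in the leader znode, if it exists

    def try_lead(self, hostname):
        """One pass of the slide's steps; returns the leader's hostname."""
        while True:
            if self.leader is not None:       # step 1: getData succeeded
                return self.leader            # step 2: follow and exit
            try:
                self._create_leader(hostname)  # step 3: race to create
                return hostname                # step 4: we lead
            except FileExistsError:
                continue                       # lost the race; goto step 1

    def _create_leader(self, hostname):
        if self.leader is not None:
            raise FileExistsError
        self.leader = hostname

    def leader_died(self):
        # Ephemeral znode vanishes; watches fire and followers re-elect.
        self.leader = None

e = Election()
print(e.try_lead("host-a"))  # host-a: wins the empty race
print(e.try_lead("host-b"))  # host-a: host-b finds a leader and follows
e.leader_died()
print(e.try_lead("host-b"))  # host-b: wins the re-election
```

The EPHEMERAL flag is what makes this fault tolerant: a crashed leader's znode disappears with its session, so followers never wait on a dead leader.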

  19. Locks
     1. id = create(“.../locks/x-”, SEQUENCE|EPHEMERAL)
     2. getChildren(“.../locks”, false)
     3. If id is the first child, exit (lock acquired)
     4. exists(name of last child before id, true)
     5. If it does not exist, goto step 2
     6. Wait for event
     7. Goto step 2
     Each znode watches one other: no herd effect.
     [diagram: locks node with children x-11, x-19, x-20]
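The heart of the recipe is step 4's "watch only your predecessor" rule: a release wakes exactly one waiter instead of stampeding all of them. A sketch of that selection, assuming zero-padded sequence names so plain string sort gives creation order (`predecessor_to_watch` is an illustrative helper, not a ZooKeeper call):

```python
# Given the sorted children of the locks node, return which sibling a
# client should watch, or None if it already holds the lock.
def predecessor_to_watch(children, my_id):
    ordered = sorted(children)      # zero-padded sequence names sort correctly
    i = ordered.index(my_id)
    if i == 0:
        return None                 # first child: lock acquired
    return ordered[i - 1]           # watch only the znode just before ours

children = ["x-0000000011", "x-0000000019", "x-0000000020"]
print(predecessor_to_watch(children, "x-0000000011"))  # None (holds the lock)
print(predecessor_to_watch(children, "x-0000000020"))  # x-0000000019
```

When x-0000000011 deletes its znode, only x-0000000019's watch fires; x-0000000020 sleeps on, which is exactly the "no herd effect" claim.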

  20. Shared Locks
     1. id = create(“.../locks/s-”, SEQUENCE|EPHEMERAL)
     2. getChildren(“.../locks”, false)
     3. If there are no children starting with x- before id, exit (read lock acquired)
     4. exists(name of the last x- node before id, true)
     5. If it does not exist, goto step 2
     6. Wait for event
     7. Goto step 2
     [diagram: locks node with children s-11, x-19, s-20, s-21, x-22; readers s-20 and s-21 both watch x-19]
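A reader (s-) conflicts only with writers (x-), so it waits on the last x- node ordered before its own id. Since "s-" and "x-" prefixes differ, the comparison must use the sequence counter rather than the whole name; a sketch (helper names are illustrative):

```python
# Which znode blocks a read-lock request? The last writer before it, if any.
def seq(name):
    return int(name.split("-")[-1])  # compare sequence counters, not raw names

def read_lock_blocker(children, my_id):
    earlier_writers = [c for c in children
                       if c.startswith("x-") and seq(c) < seq(my_id)]
    return max(earlier_writers, key=seq) if earlier_writers else None

children = ["s-0000000011", "x-0000000019",
            "s-0000000020", "s-0000000021", "x-0000000022"]
print(read_lock_blocker(children, "s-0000000011"))  # None: no writer before it
print(read_lock_blocker(children, "s-0000000021"))  # x-0000000019
```

In the diagram, s-20 and s-21 both watch x-19; once it goes away they re-check and, finding no earlier writer, hold the read lock concurrently.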

  21. ZooKeeper Servers
     • All servers have a copy of the state in memory
     • A leader is elected at startup
     • Followers service clients; all updates go through the leader
     • Update responses are sent when a majority of servers have persisted the change
     We need 2f+1 machines to tolerate f failures.
     [diagram: ZooKeeper Service as a group of servers]
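The 2f+1 arithmetic follows from majority quorums: any two majorities of n servers intersect, so a committed update survives any f crashes. A quick check of the numbers:

```python
# Majority-quorum arithmetic behind "2f+1 machines to tolerate f failures".
def majority(n):
    return n // 2 + 1            # smallest set that intersects every other majority

def tolerated_failures(n):
    return (n - 1) // 2          # f such that n >= 2f + 1

for n in (3, 5, 7):
    print(n, majority(n), tolerated_failures(n))
# 3 servers: quorum 2, tolerates 1; 5: quorum 3, tolerates 2; 7: quorum 4, tolerates 3
```

This is also why ensembles use odd sizes: 4 servers tolerate the same single failure as 3, but need a larger quorum for every write.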

  22. ZooKeeper Servers
     [diagram: clients connected to a ZooKeeper Service consisting of one Leader and several follower Servers]

  23. Current Performance
     [performance chart not captured in this transcript]

  24. Summary
     • Easy to use
     • High performance
     • General
     • Reliable
     • Release 3.3 on Apache
       – See http://hadoop.apache.org/zookeeper
       – Committers from Yahoo! and Cloudera

