Completing the Big Data Ecosystem: sqrrl and Accumulo

Adam Fuchs and John Vines. Outline: Our Big-Data Perspective; How Accumulo Works; Implications for Applications; Accumulo in Production; Contacts.


  1. Completing the Big Data Ecosystem: sqrrl and Accumulo
     Adam Fuchs and John Vines, sqrrl data INC., August 3, 2012

  2. Design Drivers
     Analysis of big data is central to our customers’ requirements, in which the strongest drivers are:
     - Scalability: The ability to do twice the work at only (about) twice the cost.
     - Adaptability: The ability to rapidly evolve the analytical tools available in an operational environment, building upon and enhancing existing capabilities.
     - Security: Getting all of the above without giving up secrecy and assurance properties.
     From these directives we can derive the following requirements:
     - Data-Centric Security to reduce coordination needed between application developers and data providers.
     - Simplicity in the overall architecture to encourage participation and ameliorate learning curve.
     - Generic design patterns to store and organize data whose format we don’t control.
     - Generic discovery analytics to retrieve and visualize generic data.

  3. Optimization
     ... is a secondary concern, given:
     - hundreds of evolving applications,
     - hundreds of changing data sources,
     - petabytes/exabytes of data,
     - many complicated interactions.
     Instead, we need a generic platform that is cheap, simple, scalable, secure, and adaptable, with pretty good performance.

  4. Key/Value Structure
     An Accumulo Key is a 5-tuple, including:
     - Row: controls Atomicity
     - Column Family: controls Locality
     - Column Qualifier: controls Uniqueness
     - Visibility: controls Access (unique to Accumulo)
     - Timestamp: controls Versioning
     Sample Entries
     Row : Col. Fam. : Col. Qual. : Visibility : Timestamp ⇒ Value
     Adam : Favorites : Food : (Public) : 20090801 ⇒ Sushi
     Adam : Favorites : Programming Language : (Private) : 20090830 ⇒ Java
     Adam : Favorites : Programming Language : (Private) : 20070725 ⇒ C++
     Adam : Friends : Bob : (Public) : 20110601 ⇒
     Adam : Friends : Joe : (Private) : 20110601 ⇒
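The 5-tuple key and its ordering can be illustrated with a small stand-alone sketch. It does not use the Accumulo client library; the `Key` class and `sort_key` helper are hypothetical names for this example. It assumes the usual Accumulo ordering: ascending on row, column family, column qualifier, and visibility, and descending on timestamp so the newest version of a cell comes first.

```python
from typing import NamedTuple


class Key(NamedTuple):
    """Illustrative stand-in for the 5-tuple key described on the slide."""
    row: str
    family: str
    qualifier: str
    visibility: str
    timestamp: int


def sort_key(k: Key):
    # Ascending on the first four fields, descending on timestamp
    # (assumed ordering), so the newest version of a cell sorts first.
    return (k.row, k.family, k.qualifier, k.visibility, -k.timestamp)


# The sample entries from the slide, deliberately out of order.
entries = {
    Key("Adam", "Favorites", "Programming Language", "Private", 20070725): "C++",
    Key("Adam", "Friends", "Bob", "Public", 20110601): "",
    Key("Adam", "Favorites", "Food", "Public", 20090801): "Sushi",
    Key("Adam", "Favorites", "Programming Language", "Private", 20090830): "Java",
    Key("Adam", "Friends", "Joe", "Private", 20110601): "",
}

ordered = sorted(entries, key=sort_key)
for k in ordered:
    print(" : ".join(map(str, k)), "=>", entries[k])
```

Sorting reproduces the order shown in the sample entries: both "Programming Language" versions sit next to each other, with the 20090830 version ahead of the 20070725 one.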

  5. Visibility Label Syntax and Semantics
     Document Labels
     - Doc 1 : (Federation)
     - Doc 2 : (Klingon|Vulcan)
     - Doc 3 : (Federation & Human & Vulcan)
     - Doc 4 : (Federation & (Human|Vulcan))
     User Authorization Sets
     - CptKirk : { Federation,Human }
     - MrSpock : { Federation,Human,Vulcan }
     Syntax
     - WORD ⇒ [a-zA-Z0-9_]+
     - CLAUSE ⇒ AND
     - CLAUSE ⇒ OR
     - AND ⇒ AND & AND
     - AND ⇒ ( CLAUSE )
     - AND ⇒ WORD
     - OR ⇒ OR | OR
     - OR ⇒ ( CLAUSE )
     - OR ⇒ WORD
     Semantics (premises over conclusion, for a label T and an authorization set A)
     - term: $\dfrac{(T \Rightarrow \tau) \wedge (\tau \in A)}{(T, A) \models \text{true}}$
     - and: $\dfrac{(T \Rightarrow T_1 \,\&\, T_2) \wedge ((T_1, A) \models \text{true}) \wedge ((T_2, A) \models \text{true})}{(T, A) \models \text{true}}$
     - or: $\dfrac{(T \Rightarrow T_1 \,|\, T_2) \wedge \big(((T_1, A) \models \text{true}) \vee ((T_2, A) \models \text{true})\big)}{(T, A) \models \text{true}}$
     - paren: $\dfrac{(T \Rightarrow (T_1)) \wedge ((T_1, A) \models \text{true})}{(T, A) \models \text{true}}$
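The grammar and rules above are small enough to evaluate with a hand-written recursive-descent parser. The following is a minimal sketch, not Accumulo's own implementation; `tokenize` and `is_visible` are hypothetical names. It assumes labels use only word characters, `&`, `|`, and parentheses, that mixing `&` and `|` at one nesting level is rejected (as the separate AND and OR productions suggest), and that an empty label is visible to everyone.

```python
import re

# One token: a word, or one of & | ( )
TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[&|()])")


def tokenize(label):
    tokens, pos = [], 0
    label = label.strip()
    while pos < len(label):
        m = TOKEN.match(label, pos)
        if not m:
            raise ValueError(f"bad character at {pos} in {label!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def is_visible(label, auths):
    """Return True if authorization set `auths` satisfies `label`."""
    tokens = tokenize(label)
    if not tokens:
        return True  # assumption: an empty label is visible to everyone
    pos = 0

    def term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of label")
        tok = tokens[pos]
        pos += 1
        if tok == "(":  # "paren" rule
            value = clause()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return value
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected {tok!r}")
        return tok in auths  # "term" rule

    def clause():
        nonlocal pos
        value = term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is None:
                op = tokens[pos]
            elif tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            pos += 1
            rhs = term()
            value = (value and rhs) if op == "&" else (value or rhs)
        return value

    result = clause()
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos]!r}")
    return result


kirk = {"Federation", "Human"}
spock = {"Federation", "Human", "Vulcan"}
docs = {
    "Doc 1": "(Federation)",
    "Doc 2": "(Klingon|Vulcan)",
    "Doc 3": "(Federation & Human & Vulcan)",
    "Doc 4": "(Federation & (Human|Vulcan))",
}
kirk_sees = sorted(d for d, label in docs.items() if is_visible(label, kirk))
spock_sees = sorted(d for d, label in docs.items() if is_visible(label, spock))
```

With the sample labels, CptKirk's authorizations satisfy Doc 1 and Doc 4, while MrSpock's satisfy all four.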

  6. Tablets
     - Collections of key/value pairs form Tables
     - Tables are partitioned into Tablets
     - Metadata tablets hold info about other tablets, forming a three-level hierarchy
     - A Tablet is a unit of work for a Tablet Server
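Because a table has one global sort order and each tablet is a contiguous row range, finding the tablet for a row reduces to a binary search over the split points. The sketch below is illustrative only: `tablet_index` and the split points are made up for this example, and it assumes each tablet's range excludes the previous split point and includes its own end row.

```python
from bisect import bisect_left


def tablet_index(split_points, row):
    """Return the index of the tablet whose row range contains `row`.

    Tablet i covers (split_points[i-1], split_points[i]]; the first tablet is
    unbounded below and the last tablet is unbounded above.
    """
    return bisect_left(split_points, row)


# Hypothetical split points: three splits give four tablets.
splits = ["g", "n", "t"]

for row in ["adam", "g", "h", "n", "zed"]:
    print(row, "-> tablet", tablet_index(splits, row))
```

The same lookup can be applied at each level of the hierarchy: a metadata tablet is itself a sorted range of entries describing other tablets.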

  7. Distributed Processes

  8. Tablet Server Composition
     Quick and loose definitions:
     - Table: A map of keys to values with one global sort order among keys.
     - Tablet: A row range within a Table.
     - Tablet Server: The mechanism that hosts Tablets, providing the primary functionality of Bigtable or Accumulo.
     Tablet servers have several primary functions:
     1. Hosting RPCs (read, write, etc.)
     2. Managing resources (RAM, CPU, File I/O, etc.)
     3. Scheduling background tasks (compactions, caching, etc.)
     4. Handling key/value pairs
     Category 4 is almost entirely accomplished through the Iterator framework.

  9. Tablet Server Data Flow
     Iterator Uses
     - File Reads
     - Block Caching
     - Merging
     - Deletion
     - Isolation
     - Locality Groups
     - Range Selection
     - Column Selection
     - Cell-level Security
     - Versioning
     - Filtering
     - Aggregation
     - Partitioned Joins
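Three of the iterator uses listed above (merging, cell-level security, and versioning) can be sketched as stacked Python generators, each consuming a sorted stream and yielding a sorted stream. This is a toy model rather than Accumulo's Iterator API: the function names are invented for the example, entries are plain tuples, a visibility label is simplified to a single word, and the stacking order is a choice of this sketch.

```python
import heapq

# Entries are (row, column, timestamp, visibility, value) tuples.


def merged(*sources):
    # Merging: combine sorted sources into one sorted stream,
    # ascending by (row, column) with the newest timestamp first.
    return heapq.merge(*sources, key=lambda e: (e[0], e[1], -e[2]))


def visibility_filter(entries, auths):
    # Cell-level security, simplified to single-word labels.
    for e in entries:
        if e[3] in auths:
            yield e


def versioning(entries, max_versions=1):
    # Versioning: keep only the newest `max_versions` entries per cell.
    current, seen = None, 0
    for e in entries:
        cell = (e[0], e[1])
        if cell != current:
            current, seen = cell, 0
        seen += 1
        if seen <= max_versions:
            yield e


def scan(sources, auths):
    return list(versioning(visibility_filter(merged(*sources), auths)))


# Two hypothetical sorted sources, e.g. an in-memory map and a file.
memory = [("Adam", "Favorites:Language", 20090830, "Private", "Java")]
file_a = [
    ("Adam", "Favorites:Food", 20090801, "Public", "Sushi"),
    ("Adam", "Favorites:Language", 20070725, "Private", "C++"),
]

everything = [e[4] for e in scan([memory, file_a], {"Public", "Private"})]
public_only = [e[4] for e in scan([memory, file_a], {"Public"})]
```

Each stage only ever looks at the entry in front of it, which is what lets the same stack run over arbitrarily large sorted inputs.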

  10. The Perils of Distributed Computing
     - Dealing with failures is hard!
     - Operations like table creation are logically atomic, but consist of multiple operations on distributed systems.
     - Resource locking (via mutex, semaphores, etc.) provides some sanity.
     - Distributed systems have many complicated failure modes: clients, master, tablet servers, and dependent systems can all go offline periodically.
     - Who is responsible for unlocking locks when any component can fail?
     - How do we know it’s safe to unlock a lock?

  11. Accumulo Testing Procedures
     Testing Frameworks
     - Unit: Verify correct functioning of each module separately
     - System: Perform correctness and performance tests on a small running instance
     - Load/Scale: Generate high loads at scale and measure performance and correctness
     - Random Walk: Randomly, repeatedly, and concurrently execute a variety of test modules representative of user activity on an instance at scale
     - Simulation: Evaluate the model to gauge expected performance
     Other Considerations
     - Scoping tests to include server-side code, client-side code, dependent processes, etc.
     - Code coverage vs. path coverage
     - Static vs. dynamic analysis
     - Simulating failures of distributed components
     - Strange failure modes (often hardware/physics-related)

  12. Adampotence
     - Idempotent: $f(f(x)) = f(x)$
     - Adampotent: $f(f'(x)) = f(x)$, where $f'(x)$ denotes partial execution of $f(x)$
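The adampotent property can be checked mechanically on a toy operation. In the sketch below, `create_table` and its three steps are hypothetical and are not taken from Accumulo; the `crash_after` parameter simulates the partial execution $f'(x)$ by stopping after a given number of steps. Every step is a set insertion, so re-running the whole operation over a partial result converges on the same final state.

```python
def create_table(state, name, crash_after=None):
    """Apply the steps of a hypothetical 'create table' operation.

    Returns a new state and leaves the input untouched. `crash_after=k` stops
    after k steps, simulating a partial execution f'(x).
    """
    state = {k: set(v) for k, v in state.items()}  # work on a copy
    steps = [
        lambda: state["names"].add(name),
        lambda: state["dirs"].add("/tables/" + name),
        lambda: state["tablets"].add((name, "default")),
    ]
    for i, step in enumerate(steps):
        if crash_after is not None and i >= crash_after:
            break  # simulated failure part-way through
        step()
    return state


empty = {"names": set(), "dirs": set(), "tablets": set()}
full = create_table(empty, "events")

# f(f'(x)) == f(x) for every possible crash point.
recovered = [
    create_table(create_table(empty, "events", crash_after=k), "events")
    for k in range(4)
]
```

An operation such as "increment a counter, then write a file" would fail this check, because re-running it after a crash between the two steps increments the counter twice.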

  13. Fault-Tolerant Executor
     - If a process dies, previously submitted operations continue to execute on restart.
     - FATE serializes every task in ZooKeeper before execution.
     - The Master process uses FATE to execute table operations and administrative actions.
     - FATE eliminates the single point of failure.
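The persist-before-execute idea can be sketched in a few lines. This is a toy model and not the FATE implementation: the `Fate` class, its methods, and the step names are invented for illustration, and a plain dict stands in for ZooKeeper as the store that outlives the process.

```python
class Fate:
    """Toy fault-tolerant executor; `store` stands in for ZooKeeper."""

    def __init__(self, store):
        self.store = store  # outlives this object, like an external service

    def submit(self, txid, steps):
        # Persist the whole operation before running any of it.
        self.store[txid] = {"steps": list(steps), "next": 0}

    def run_pending(self, actions, crash_after=None):
        """Run unfinished steps; return False if a crash was simulated."""
        executed = 0
        for txid in sorted(self.store):
            tx = self.store[txid]
            while tx["next"] < len(tx["steps"]):
                if crash_after is not None and executed >= crash_after:
                    return False  # simulated process death
                actions[tx["steps"][tx["next"]]]()
                tx["next"] += 1  # record progress in the store
                executed += 1
        return True


store, log = {}, []
actions = {
    "reserve_name": lambda: log.append("reserve_name"),
    "make_dirs": lambda: log.append("make_dirs"),
    "publish": lambda: log.append("publish"),
}

Fate(store).submit("tx-0001", ["reserve_name", "make_dirs", "publish"])
first_run = Fate(store).run_pending(actions, crash_after=1)  # dies after one step
log_after_crash = list(log)
second_run = Fate(store).run_pending(actions)  # a fresh process resumes the work
```

In this sketch a crash between running a step and recording its progress would cause that step to run again on restart, which is why the steps themselves need to tolerate re-execution.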

  14. Verified State Models
     - State models used for many internal functions
     - Explicit-state model checking proves correctness

  15. (No slide text recovered.)

  16. (No slide text recovered.)

  17. Event Table with Inverted Index

  18. Inverted Index Flow

  19. Multidimensional Index
     See also: http://en.wikipedia.org/wiki/Geohash
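The slide's own diagram is not recoverable, but the Geohash reference points at the general technique of interleaving the bits of several dimensions into one sortable value (a Z-order curve), so that a single sorted row key can serve a multidimensional lookup. The sketch below is one way to do that under stated assumptions: `interleave` and `row_key` are hypothetical helpers, coordinates are quantized to 16 bits each, and the result is hex-encoded rather than using Geohash's base-32 alphabet.

```python
def interleave(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into one Z-order value.

    Bits of x land in the odd positions and bits of y in the even positions.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i + 1)
        z |= ((y >> i) & 1) << (2 * i)
    return z


def row_key(lon, lat, bits=16):
    """Quantize a longitude/latitude pair and return a fixed-width hex key."""
    scale = (1 << bits) - 1
    x = int((lon + 180.0) / 360.0 * scale)
    y = int((lat + 90.0) / 180.0 * scale)
    return format(interleave(x, y, bits), "0{}x".format(bits // 2))


print(row_key(-77.04, 38.90))
print(row_key(-77.03, 38.91))
```

Points that are close in both dimensions tend to share a key prefix, so a range scan over a prefix approximates a bounding-box query; points on either side of a cell boundary are the usual exception.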

  20. Graph Table

  21. The “shard” Table

  22. Appstores
     - Reduced barrier to entry
     - Faster app development
     - More innovation!
