Oak, the Architecture of the new Repository Michael Dürig, Adobe - PowerPoint PPT Presentation

Oak, the Architecture of the new Repository Michael Dürig, Adobe Research Switzerland

Design goals
• Scalable
• Big repositories
• Clustering
• Customisable, flexible
• OSGi friendly
5.11.14

Outline
• CRUD
• Changes
• Search

Tree model
[Diagram: a tree with nodes a, d, b and c]

Updating
[Diagram: the same tree after a new node x is added at /d/x]

MVCC
[Diagram: two revisions, r1 and r2, with a HEAD pointer. Node labels: r1: /, r2: /, r1: /d, r2: /d, r1: /a/b, r2: /a/b, r2: /d/x]
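The revision model on this slide can be illustrated with a small persistent tree. This is only a sketch under stated assumptions: `MvccSketch`, `Node`, `add` and `get` are hypothetical names, not Oak classes, and real node stores also carry properties and persistence. It shows one way an update can produce a new revision while readers of the old revision see no change.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: revisions as immutable trees that share unchanged subtrees. */
final class MvccSketch {

    static final class Node {
        private final Map<String, Node> children;

        private Node(Map<String, Node> children) {
            this.children = Collections.unmodifiableMap(children);
        }

        static Node empty() {
            return new Node(new TreeMap<>());
        }

        /** Follows the path down from this node; returns null if a segment is missing. */
        Node get(String... path) {
            Node node = this;
            for (String name : path) {
                node = node.children.get(name);
                if (node == null) {
                    return null;
                }
            }
            return node;
        }

        /** Returns a new root containing the path; this node and its subtrees stay untouched. */
        Node add(String... path) {
            return addAt(path, 0);
        }

        private Node addAt(String[] path, int index) {
            if (index == path.length) {
                return this;
            }
            Node child = children.get(path[index]);
            if (child == null) {
                child = empty();
            }
            // Copy only the nodes along the path; siblings are shared by reference.
            Map<String, Node> copy = new TreeMap<>(children);
            copy.put(path[index], child.addAt(path, index + 1));
            return new Node(copy);
        }
    }

    public static void main(String[] args) {
        Node r1 = Node.empty().add("a", "b").add("d");
        Node r2 = r1.add("d", "x"); // a writer creates revision r2

        System.out.println(r1.get("d", "x") == null);   // true: r1 is unchanged
        System.out.println(r2.get("d", "x") != null);   // true: r2 sees /d/x
        System.out.println(r1.get("a") == r2.get("a")); // true: /a is shared, not copied
    }
}
```

Only the root and /d are copied for r2, which matches the node labels on the slide: both revisions have their own / and /d, while /a/b is reachable from either.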

Refresh and Garbage Collection

Refresh
[Diagram; label: garbage]

Garbage collection
[Diagram; label: garbage]

Concurrency and Conflicts

Concurrent updates
[Diagram: revisions r2a and r2b, both based on r1]

Merging
[Diagram: the updates r2a and r2b, both based on r1, are merged into r3]

Conflict handling: serialisation
• Fully serialised
  – Fail, no concurrent update
• Partially serialised
  – Concurrent conflict free updates

Conflict handling strategies: merging
• Partial merge
  – Conflict markers, deferred resolution
• Full merge
  – Need to choose victim

Replicas and Shards

Replica and caches
[Diagram labels: master copy, full replica, cache]

Sharding strategies
[Diagram labels: by path, by level, by hash, with caching]
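The slide only names the strategies, so the following is a loose illustration rather than how any Oak backend assigns shards. `ShardingSketch` and its three functions are hypothetical, and the reading of "by path" (top-level subtree), "by level" (depth) and "by hash" (hash of the full path) is an assumption. Caching is not shown.

```java
/** Hypothetical sketch of three ways to map a node path to one of N shards. */
final class ShardingSketch {

    /** By hash: spreads paths over all shards and ignores the hierarchy. */
    static int byHash(String path, int shards) {
        return Math.floorMod(path.hashCode(), shards);
    }

    /** By path (assumed meaning): a whole top-level subtree lives on one shard. */
    static int byPath(String path, int shards) {
        String[] segments = path.split("/");
        String top = segments.length > 1 ? segments[1] : "";
        return Math.floorMod(top.hashCode(), shards);
    }

    /** By level (assumed meaning): the depth of the node selects the shard. */
    static int byLevel(String path, int shards) {
        int depth = path.equals("/") ? 0 : path.split("/").length - 1;
        return depth % shards;
    }

    public static void main(String[] args) {
        int shards = 4;
        // Siblings below /a land on the same shard with the by-path strategy.
        System.out.println(byPath("/a/b", shards) == byPath("/a/c", shards)); // true
        System.out.println(byLevel("/", shards));    // 0
        System.out.println(byLevel("/a/b", shards)); // 2
    }
}
```

The trade-off in this reading is locality against balance: keeping a subtree together makes traversal cheap but can overload one shard, while hashing balances load at the cost of touching many shards per traversal.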

Implementations

MicroKernel / NodeStore
• Tree / Revision model implementation
Responsible for      Not responsible for
Clustering           Validation
Sharding             Access control
Caching              Search
Conflict handling    Versioning

Current implementations
                   DocumentMK                                    TarMK (SegmentMK)
Persistence        MongoDB, JDBC                                 Local FS
Conflict handling  Partial serialisation                         Full serialisation
Clustering         MongoDB clustering                            Simple failover
Sharding           MongoDB sharding                              N/A
Node Performance   Moderate                                      High
Key use cases      Large deployments (>1TB), concurrent writes   Small/medium deployments, mostly read

Access Control

Accessible paths
[Diagram: the tree with nodes a, d, b and c]

Existentialism
• All paths traversable
  – Node may not exist
  – Decorator on NodeStore
root.getChildNode("a").exists(); ⟹ false
root.getChildNode("a").getChildNode("b").exists(); ⟹ true
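A minimal sketch of the idea on this slide, assuming a decorator that hides unreadable nodes instead of blocking traversal. `NodeView`, `MemoryNode` and `SecureNode` are hypothetical types, and the set of readable paths is invented for the example; only the `getChildNode(...).exists()` call shape comes from the slide.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of "every path is traversable, but a node may not exist". */
final class ExistsSketch {

    interface NodeView {
        boolean exists();
        NodeView getChildNode(String name);
    }

    /** Plain in-memory node. A missing child is represented by MISSING, never by null. */
    static final class MemoryNode implements NodeView {
        static final MemoryNode MISSING = new MemoryNode(Map.of(), false);

        private final Map<String, MemoryNode> children;
        private final boolean exists;

        MemoryNode(Map<String, MemoryNode> children) {
            this(children, true);
        }

        private MemoryNode(Map<String, MemoryNode> children, boolean exists) {
            this.children = children;
            this.exists = exists;
        }

        @Override public boolean exists() { return exists; }

        @Override public NodeView getChildNode(String name) {
            return children.getOrDefault(name, MISSING);
        }
    }

    /** Decorator: a node the user cannot read looks non-existent but stays traversable. */
    static final class SecureNode implements NodeView {
        private final NodeView delegate;
        private final String path;
        private final Set<String> readable;

        SecureNode(NodeView delegate, String path, Set<String> readable) {
            this.delegate = delegate;
            this.path = path;
            this.readable = readable;
        }

        @Override public boolean exists() {
            return delegate.exists() && readable.contains(path);
        }

        @Override public NodeView getChildNode(String name) {
            String childPath = path.equals("/") ? "/" + name : path + "/" + name;
            return new SecureNode(delegate.getChildNode(name), childPath, readable);
        }
    }

    static NodeView secureRoot() {
        MemoryNode b = new MemoryNode(Map.of());
        MemoryNode a = new MemoryNode(Map.of("b", b));
        MemoryNode root = new MemoryNode(Map.of("a", a));
        // Assumption for the example: the user may read / and /a/b, but not /a.
        return new SecureNode(root, "/", Set.of("/", "/a/b"));
    }

    public static void main(String[] args) {
        NodeView root = secureRoot();
        System.out.println(root.getChildNode("a").exists());                   // false
        System.out.println(root.getChildNode("a").getChildNode("b").exists()); // true
    }
}
```

Because `getChildNode` never returns null, callers can walk through an inaccessible node to reach an accessible descendant without any special casing.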

Comparing Revisions

Content diff
• What changed between trees
• Cornerstone for
  – Validation
  – Indexing
  – Observation
  – …

What changed?
[Diagram: ∆]

Example: merging
[Diagram: r2a and r2b are both based on r1 and are merged into r3]
∆ r1 ➞ r2a: “a” modified, “b” removed
∆ r1 ➞ r2b: “d” modified, “x” added
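The merge on this slide can be sketched as two content diffs against the common base, applied together when they do not collide. This is a simplification under stated assumptions: a revision is flattened to a map from path to a content string, the values (`a1`, `a2`, ...) are made up, and `MergeSketch`, `diff` and `merge` are hypothetical names rather than Oak's API.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical sketch: content diff between flat revisions, and a merge built on it. */
final class MergeSketch {

    /** Changes from base to head: path -> new value, Optional.empty() meaning "removed". */
    static Map<String, Optional<String>> diff(Map<String, String> base, Map<String, String> head) {
        Map<String, Optional<String>> changes = new TreeMap<>();
        for (Map.Entry<String, String> e : head.entrySet()) {
            if (!e.getValue().equals(base.get(e.getKey()))) {
                changes.put(e.getKey(), Optional.of(e.getValue())); // added or modified
            }
        }
        for (String path : base.keySet()) {
            if (!head.containsKey(path)) {
                changes.put(path, Optional.empty()); // removed
            }
        }
        return changes;
    }

    /** Applies both diffs to the base; fails if they change the same path differently. */
    static Map<String, String> merge(Map<String, String> base,
                                     Map<String, String> ours,
                                     Map<String, String> theirs) {
        Map<String, Optional<String>> a = diff(base, ours);
        Map<String, Optional<String>> b = diff(base, theirs);
        for (String path : a.keySet()) {
            if (b.containsKey(path) && !b.get(path).equals(a.get(path))) {
                throw new IllegalStateException("conflict at " + path);
            }
        }
        Map<String, String> result = new TreeMap<>(base);
        apply(result, a);
        apply(result, b);
        return result;
    }

    private static void apply(Map<String, String> target, Map<String, Optional<String>> changes) {
        for (Map.Entry<String, Optional<String>> e : changes.entrySet()) {
            if (e.getValue().isPresent()) {
                target.put(e.getKey(), e.getValue().get());
            } else {
                target.remove(e.getKey());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> r1 = Map.of("/a", "a1", "/a/b", "b1", "/d", "d1");
        Map<String, String> r2a = Map.of("/a", "a2", "/d", "d1");                            // a modified, b removed
        Map<String, String> r2b = Map.of("/a", "a1", "/a/b", "b1", "/d", "d2", "/d/x", "x1"); // d modified, x added
        System.out.println(merge(r1, r2a, r2b)); // {/a=a2, /d=d2, /d/x=x1}
    }
}
```

Throwing on a collision corresponds to failing the commit; the partial and full merge strategies named earlier in the deck would instead record a conflict marker or pick a winner.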

Commit Hooks

Commit hooks
• Key plugin mechanism
  – Higher level functionality
• Validation (node type, access control, …)
• Trigger (auto create, defaults, …)
• Updates (index, …)

Editing a commit
[Diagram: ∆ becomes ∆ + x]

Commit hooks
• Based on content diff
  – pass a commit
  – fail a commit
  – edit a commit
• Applied in sequence

Type of hooks
                     CommitHook  Editor     Validator
Content diff         Optional    Always     Always
Can modify           Yes         Yes        No
Programming model    Simple      Callbacks  Callbacks
Performance impact   High        Medium     Low
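The pass, fail and edit outcomes and the "applied in sequence" rule from the commit hook slides could look roughly like this. It is a sketch with hypothetical names (`HookSketch`, `Hook`, `CommitFailed`, the two example hooks) over flat path-to-value maps; it is not the signature of Oak's own hook interfaces, and the exception is unchecked here only to keep the example short.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: a chain of hooks that can pass, fail or edit a commit. */
final class HookSketch {

    static final class CommitFailed extends RuntimeException {
        CommitFailed(String message) {
            super(message);
        }
    }

    /** Receives the state before the commit and the proposed state; returns the state to use. */
    interface Hook {
        Map<String, String> process(Map<String, String> before, Map<String, String> after);
    }

    /** Validator-style hook: may fail the commit, never changes it. */
    static final Hook NO_EMPTY_VALUES = (before, after) -> {
        for (Map.Entry<String, String> e : after.entrySet()) {
            if (e.getValue().isEmpty()) {
                throw new CommitFailed("empty value at " + e.getKey());
            }
        }
        return after;
    };

    /** Editor-style hook: adds a default child below every newly added path. */
    static final Hook ADD_DEFAULT = (before, after) -> {
        Map<String, String> edited = new TreeMap<>(after);
        for (String path : after.keySet()) {
            if (!before.containsKey(path)) {
                edited.put(path + "/created", "auto");
            }
        }
        return edited;
    };

    /** Applies the hooks in sequence; each hook sees the result of the previous one. */
    static Map<String, String> commit(Map<String, String> before,
                                      Map<String, String> after,
                                      List<Hook> hooks) {
        Map<String, String> current = after;
        for (Hook hook : hooks) {
            current = hook.process(before, current);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, String> before = Map.of("/a", "1");
        Map<String, String> after = Map.of("/a", "1", "/d", "2");
        System.out.println(commit(before, after, List.of(NO_EMPTY_VALUES, ADD_DEFAULT)));
        // {/a=1, /d=2, /d/created=auto}
    }
}
```

Order matters in such a chain: a validator placed after an editor also checks the content the editor added.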

Observers

Observers
• Observe changes
  – After commit
  – Often does a content diff
  – Asynchronous
  – Optionally synchronous
• Local cluster node only
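As an illustration of the points above, the sketch below notifies observers on a background thread after each commit, and the observer computes a small content diff itself. `ObserverSketch`, `Store` and this `Observer` interface are hypothetical and operate on flat path-to-value maps; they are not Oak's observer API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: observers are told about changes after the commit has happened. */
final class ObserverSketch {

    interface Observer {
        void contentChanged(Map<String, String> before, Map<String, String> after);
    }

    static final class Store {
        private Map<String, String> head = Map.of();
        private final List<Observer> observers = new CopyOnWriteArrayList<>();
        // A single background thread delivers notifications in commit order.
        private final ExecutorService background = Executors.newSingleThreadExecutor();

        void addObserver(Observer observer) {
            observers.add(observer);
        }

        synchronized void commit(Map<String, String> newHead) {
            Map<String, String> before = head;
            Map<String, String> after = Map.copyOf(newHead);
            head = after;
            // The commit is already done here; observers cannot fail or edit it.
            for (Observer observer : observers) {
                background.submit(() -> observer.contentChanged(before, after));
            }
        }

        /** Stops accepting notifications and waits for the pending ones to be delivered. */
        void close() {
            background.shutdown();
            try {
                background.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        store.addObserver((before, after) -> {
            for (String path : after.keySet()) {
                if (!before.containsKey(path)) {
                    log.add("added " + path);
                }
            }
        });
        store.commit(Map.of("/a", "1"));
        store.commit(Map.of("/a", "1", "/d", "2"));
        store.close();
        System.out.println(log); // [added /a, added /d]
    }
}
```

A synchronous variant would call `contentChanged` directly inside `commit` instead of submitting it to the executor, at the cost of slowing down the committing thread.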

Examples
• JCR observation
• External index update
• Cache invalidation
• Logging

Search

Query Engine
[Diagram: a query (e.g. SELECT … WHERE x=y, or /a//*) passes through three stages: parse, execute, post process. Box labels: several Parser boxes, several Index boxes, and Traverse]

Index Implementations
• Property (ordered)
• Reference
• Lucene
  – In-content or file system
• Solr
  – Embedded or external

Big Picture

Big picture
[Diagram labels: JCR API, Oak JCR, Plugins, Oak API, Oak Core, NodeStore API, MicroKernel]

Resources
http://jackrabbit.apache.org/oak/

Appendix

Resources
http://jackrabbit.apache.org/oak/
http://jackrabbit.apache.org/oak/docs/
https://svn.apache.org/repos/asf/jackrabbit/oak/trunk/

Session Notes

This presentation is mainly about Oak’s architecture and design. Understanding these concepts gives crucial insight into how to make the most of Oak and why Oak might behave differently from Jackrabbit 2 in some cases.

Jackrabbit Oak started in early 2012, with some initial ideas dating back as far as 2008. It became necessary as many parts of Jackrabbit 2 outgrew their original design. Most of Jackrabbit 2’s features date back to the 1990s and are not well suited for today's requirements. Oak was designed to overcome those challenges and to serve as the foundation of modern web content management systems.
Key design goals:
• Scalable writes. The web is not read-only any more.
• Large amounts of data. There is much more than a few web pages nowadays.
• Built-in clustering, instead of clustering built on top.
• Customisable
• OSGi friendly
Since Oak doesn't need to be the JCR reference implementation, we gained some additional design space by not having to implement all of the optional features (e.g. same-name siblings and support for multiple workspaces).

• CRUD: this presentation first covers the underlying persistence model: the tree model and the basic create, read, update and delete operations.
• Changes: being able to track changes between different revisions of a tree turns out to be crucial for building higher level functionality.
• Search: while not much changed on the outside, search in Oak is completely different from search in Jackrabbit 2.

Let’s consider a simple hierarchy of nodes. Each node (except the root) has a single parent and any number of child nodes. The parent-child relationships are named, i.e. each child has a unique name within its parent. This makes it possible to uniquely identify any node by its path: a user can access all content by path, starting from the root node.
This is a key difference from Jackrabbit 2, where each node was assigned a unique ID that was used to look it up in the persistence store. In Oak, nodes are always addressed by their path from the root. In this sense Oak stores (sub)trees while Jackrabbit 2 stores key/value pairs. In Oak one traverses down from the root following a path, while in Jackrabbit 2 traversal went from a node to its parent, up to the root.
• Tree persistence vs. key/value persistence
• Path vs. UID as primary identifier
• Traversing down vs. traversing up

Let’s consider what happens when another user updates parts of the tree, for example by adding a new node at /d/x. Such in-place changes might confuse other users, whose tree suddenly changes. This is how Jackrabbit 2 works: each update is immediately made visible to all users. Unfortunately, beyond the potential for confusion, this design turns out to be a major concurrency bottleneck, as the synchronisation overhead of keeping everyone aware of all changes as they happen becomes very high.
The existing Jackrabbit architecture was heavily optimised for mostly-read use cases, with only occasional and rarely concurrent content updates. That optimisation no longer works well for increasingly interactive web sites and other content applications where all users are potential content editors.
More generally, the way such state transitions are handled has a major impact on how efficiently a system can scale up to handle many concurrent updates. Many NoSQL systems use the concept of eventual consistency, which leaves the rate (and often the order) at which new updates become visible to users undefined. This solves the concurrency issue, but can lead to even more confusion, as it might not be possible to clearly define the exact state of the repository.
The hierarchical structure of Oak allows us to solve both of these issues by borrowing an idea from version control systems like Git or Subversion.
