Enhancing the Linux Radix Tree MATTHEW WILCOX LINUXCON NORTH - PowerPoint PPT Presentation

Enhancing the Linux Radix Tree
Matthew Wilcox
LinuxCon North America, 2016-08-24

Overview
- What is a Radix Tree?
- What is it used for?
- Large entries in the Radix Tree
- Radix Tree Test Suite
- Other radix trees
- Radix Tree Memory Consumption
- RCU and the Radix Tree

What is a Radix Tree?
- Wikipedia says Radix Trees are all about strings
  - Used for string compression and inverted indices of text documents
- Linux says Radix Trees are all about converting small integers to pointers
- I think of it as a resizable array of pointers
- The Linux Radix Tree appears to be an independent reinvention of the Judy Array

How does it work?
- Each layer of the Radix Tree contains 64 pointers
- The “next” 6 bits of the index determine which pointer to use
  - If this is the last level, the pointer is a user pointer
  - If not the last level, the pointer points to the next layer
- Other tree metadata is also stored at each layer:
  - Tags, height (shift), reference count, parent pointer, offset in parent
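To make the walk concrete, here is a minimal C sketch of a lookup in a 64-way tree. The names `struct node`, `struct tree` and `tree_lookup()` are hypothetical, and the node deliberately leaves out the tags, reference count, parent pointer and offset that the slide lists; it only shows how each level consumes the next 6 bits of the index.

```c
#include <limits.h>
#include <stddef.h>

#define MAP_SHIFT 6                  /* 6 bits of the index per level */
#define MAP_SIZE  (1UL << MAP_SHIFT) /* 64 slots per node */
#define MAP_MASK  (MAP_SIZE - 1)

/* Hypothetical, simplified node: no tags, count, parent or offset. */
struct node {
	unsigned int shift;      /* 0 at the last level */
	void *slots[MAP_SIZE];   /* child nodes, or user pointers when shift == 0 */
};

struct tree {
	struct node *root;       /* NULL for an empty tree */
};

static void *tree_lookup(const struct tree *t, unsigned long index)
{
	struct node *n = t->root;

	if (!n)
		return NULL;
	/* An index with bits above the top node's range cannot be present. */
	if (n->shift + MAP_SHIFT < sizeof(unsigned long) * CHAR_BIT &&
	    (index >> (n->shift + MAP_SHIFT)) != 0)
		return NULL;
	for (;;) {
		/* The next 6 bits of the index pick the slot at this level. */
		void *slot = n->slots[(index >> n->shift) & MAP_MASK];

		if (n->shift == 0)
			return slot;     /* last level: this is a user pointer */
		if (!slot)
			return NULL;     /* nothing below this slot */
		n = slot;                /* not the last level: descend */
	}
}
```

With a top node at shift 6, index 67 selects slot 1 (67 >> 6) in the top node and slot 3 (67 & 63) in the leaf node.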

RCU and the Radix Tree
- With care, some radix tree functions can be used with only rcu_read_lock() protection
  - Depending on kernel config options, that may amount to no protection (and no code) at all
- Many CPUs may be walking the tree at the same time as another CPU is inserting or deleting an entry
- The user may get back a stale pointer from the tree walk, but it is guaranteed to be a pointer that was in the tree for that index at some point
- The Radix Tree frees tree nodes using RCU, so any CPU holding the read lock is guaranteed not to reference freed memory

[Figure: Height 2 Radix Tree. The root points to a node at shift S=6, whose slots point to nodes at shift S=0 holding user pointers; empty slots are NULL.]

How is it different from other trees?
- Tree points to objects
  - RB trees embed an rb_node in data structures
- All data at leaves; no data in intermediate nodes
- Never needs to be rebalanced
- A tree of height N can contain any index between 0 and 64^N - 1
  - If the new index is larger than the current max index, insert new nodes above the current top node to create a deeper tree
  - If deleting an element results in a top node with only one child at offset 0, replace the top node with its only child, creating a shallower tree
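The height rule can be checked with a little arithmetic. This sketch assumes 64-way nodes (6 index bits per level) and treats height 0 as a tree that holds only index 0; `max_index()` and `height_for_index()` are illustrative names, not kernel functions.

```c
#include <limits.h>

#define MAP_SHIFT 6   /* 64 slots per node, as on the slides */

/* Largest index a tree of the given height can hold: 64^height - 1. */
static unsigned long max_index(unsigned int height)
{
	unsigned int bits = height * MAP_SHIFT;

	if (bits >= sizeof(unsigned long) * CHAR_BIT)
		return ULONG_MAX;        /* every representable index fits */
	return (1UL << bits) - 1;
}

/* Smallest height able to hold the index; each extra level is one new top node. */
static unsigned int height_for_index(unsigned long index)
{
	unsigned int height = 0;

	while (max_index(height) < index)
		height++;
	return height;
}
```

For example, storing index 4096 in a tree of height 2 (maximum index 4095) needs height 3, i.e. one new node inserted above the current top node.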

[Figure, three steps: Removing an entry. The tree starts with a root, a node at shift S=6 and nodes at shift S=0 holding user pointers; after the entry is removed, the tree shrinks until only a node at shift S=0 remains below the root.]

What is it used for?
- Most important user is the page cache
  - Every time we look up a page in a file, we consult the radix tree to see if the page is already in the cache
- Also used by dozens of places in the kernel that want a resizable array
  - Drivers, filesystems, interrupt controllers
- More places should use it
  - E.g. the nvme driver

Tagged entries in the Radix Tree
- Primary user is the page cache
  - Pages are tagged as dirty, under writeback, or to be written
- The radix tree can be searched for entries with any of the three tag bits set
- Tags are replicated all the way up to the root
  - Setting a tag sets it on all parents
  - Clearing a tag clears it on a parent only if all of that parent's other entries are also clear
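The propagation rules can be sketched with one 64-bit bitmap per node. This is a hypothetical simplification: `struct tnode`, `tag_set()`, `tag_clear()` and `tag_get()` are illustrative names, and the sketch tracks a single tag, whereas the page cache uses three.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node: bit i of tags means slot i, or something below it, is tagged. */
struct tnode {
	struct tnode *parent;    /* NULL at the top node */
	unsigned int offset;     /* this node's slot number in its parent */
	uint64_t tags;
};

/* Setting a tag sets it on every parent; stop once a bit is already set,
 * because the ancestors above it must already be tagged too. */
static void tag_set(struct tnode *n, unsigned int offset)
{
	while (n) {
		uint64_t bit = UINT64_C(1) << offset;

		if (n->tags & bit)
			return;
		n->tags |= bit;
		offset = n->offset;
		n = n->parent;
	}
}

/* Clearing a tag clears the parent's bit only when no other slot here is tagged. */
static void tag_clear(struct tnode *n, unsigned int offset)
{
	while (n) {
		n->tags &= ~(UINT64_C(1) << offset);
		if (n->tags)
			return;          /* another entry in this node is still tagged */
		offset = n->offset;
		n = n->parent;
	}
}

static bool tag_get(const struct tnode *n, unsigned int offset)
{
	return (n->tags >> offset) & 1;
}
```

Because every ancestor summarises its children, a search for tagged entries can skip any subtree whose bit is clear.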

Large pages in the page cache
- Multiple indices return the same pointer
  - E.g. indices 512-1023 all refer to the same huge page (one 2MB page covers 512 4kB indices)
- Support aligned power-of-two size entries
  - No need for entries that are not a power of two in size
  - No need for entries that are not aligned to a multiple of their size
- Coalesce multiple small entries into a large entry
- Split a large entry into multiple small entries
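Restricting entries to aligned powers of two means the range an entry covers follows from its order alone. The helpers below (`entry_first()`, `entry_last()`, `entry_aligned()`) are hypothetical names used only to show the masking; order 9 corresponds to the 2MB example above.

```c
#include <stdbool.h>

/* First index covered by the aligned order-`order` entry containing `index`. */
static unsigned long entry_first(unsigned long index, unsigned int order)
{
	return index & ~((1UL << order) - 1);   /* round down to a multiple of 2^order */
}

/* Last index covered by that entry. */
static unsigned long entry_last(unsigned long index, unsigned int order)
{
	return entry_first(index, order) + (1UL << order) - 1;
}

/* An order-`order` entry may only start at a multiple of its size. */
static bool entry_aligned(unsigned long index, unsigned int order)
{
	return (index & ((1UL << order) - 1)) == 0;
}
```

So any lookup between 512 and 1023 can be mapped back to the single order-9 entry that starts at 512.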

Three solutions
1. Insert 512 4kB entries for each 2MB page
2. Search the tree once for 2MB pages, then again for 4kB pages
3. Modify the radix tree to support entries with an order > 0

Multi-order support
- Mark entries as being user pointers or internal nodes
  - Concept already existed, just needed to be broadened
- If the fan-out of the radix tree happens to match the order of the entry, simply insert the entry at the right place in the tree
- Otherwise, sibling slots need to refer to the canonical slot
- Need to ensure tags are set/cleared only on the canonical slot
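The slot arithmetic behind sibling entries can be sketched as follows, assuming an entry of order k is stored at the level whose shift is the largest multiple of 6 not exceeding k. That placement rule and the names `entry_shift()`, `entry_slots()` and `canonical_slot()` are assumptions for illustration, not the kernel's code.

```c
#define MAP_SHIFT 6
#define MAP_SIZE  (1U << MAP_SHIFT)

/* Assumed placement: the deepest level whose slots are no larger than the entry. */
static unsigned int entry_shift(unsigned int order)
{
	return order - order % MAP_SHIFT;
}

/* Slots the entry occupies at that level: one canonical slot plus siblings. */
static unsigned int entry_slots(unsigned int order)
{
	return 1U << (order - entry_shift(order));
}

/* Canonical slot for any index the entry covers; tags live only here. */
static unsigned int canonical_slot(unsigned long index, unsigned int order)
{
	unsigned int offset = (index >> entry_shift(order)) & (MAP_SIZE - 1);

	return offset & ~(entry_slots(order) - 1);
}
```

For a 2MB page (order 9) this gives one canonical slot plus seven siblings in an S=6 node, while orders 6 and 12 match the fan-out and need no siblings.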

[Figure: Large entry. A tree with nodes at shifts S=12, S=6 and S=0; a large page occupies one canonical slot, with sibling ("Sblg") slots referring back to it.]

[Figure: Splitting a large page. Nodes at shifts S=12, S=6 and S=0; the large page's slots are shown being replaced by new nodes and "Retry" entries.]

Radix Tree Test Suite
- Originally written by Nick Piggin (we believe)
- Curated by Andrew Morton out of tree for many years
- Merged into Linux 4.6
- Many tests added since
- More tests needed

Other radix trees in the Linux kernel
- assoc_array
  - Maps large binary blobs to pointers, e.g. NFS file handles
  - Has a neat trick for handling very sparse areas, which we should steal
- IDR
  - A less efficient implementation of the radix tree

What’s wrong with the IDR?
- Two implementations of the same data structure is bad
- No test suite that can be easily found
- IDR root is larger than the Radix Tree root
- Uses 256 pointers per level instead of 64
  - Wastes memory on trees of almost all sizes
- Interface can be re-implemented on the radix tree core, saving over a kilobyte of code

Radix Tree Memory Consumption
- Fundamental unit of memory consumption in Linux is the page
  - The SLAB allocator is used for allocations smaller than a page
- On 64-bit x86, with 64 pointers per node, we can allocate 7 nodes per page
  - 64 pointers × 8 bytes per pointer = 512 bytes
  - Plus ~64 bytes of overhead per node, we need a 576-byte allocation
- With 128 pointers per node, we allocate 3 nodes per page
- With 256 pointers per node, we allocate 3 nodes per two pages
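The node counts above follow from integer division. This sketch reproduces them using the slide's figures (4096-byte pages, 8-byte pointers, about 64 bytes of per-node overhead); it ignores slab metadata and alignment, so real packing may differ slightly, and the function names are illustrative.

```c
#define PAGE_BYTES    4096u  /* x86-64 base page */
#define PTR_BYTES     8u     /* 64-bit pointers */
#define NODE_OVERHEAD 64u    /* approximate per-node metadata, from the slide */

static unsigned int node_bytes(unsigned int slots)
{
	return slots * PTR_BYTES + NODE_OVERHEAD;
}

/* Whole nodes that fit in `pages` pages; any remainder is wasted. */
static unsigned int nodes_per_pages(unsigned int slots, unsigned int pages)
{
	return pages * PAGE_BYTES / node_bytes(slots);
}
```

Counting usable slots per page makes the comparison clearer: 7 × 64 = 448 for 64-pointer nodes, versus 3 × 128 = 384 and (3 × 256) / 2 = 384 for the larger node sizes.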

IDR API
- Alloc (find an empty slot)
- Alloc_Cyclic
- For_Each
- Destroy
- Preload
- Get_Next
- Replace
- Remove
- Init
- Is_Empty

Re-implementing the IDR
- Tag mechanism repurposed to track empty entries
- Alloc_Cyclic is two calls to Alloc
- Radix Tree iteration interface implements the For_Each interface
- Destroy implemented by freeing each entry
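Here is a minimal sketch of the first two points, assuming a hypothetical single-node IDR (`struct mini_idr` and its functions are illustrative, not the kernel API). A "free" bitmap plays the role of the repurposed tag, and Alloc_Cyclic tries from a cursor to the end, then falls back to a second Alloc from the start. Unlike the kernel's `idr_alloc()`, failure here is reported as -1.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical one-node IDR: bit i of free_map set means slot i is empty. */
struct mini_idr {
	uint64_t free_map;
	void *slots[64];
	unsigned int next;       /* cursor for cyclic allocation */
};

static void mini_idr_init(struct mini_idr *idr)
{
	idr->free_map = ~UINT64_C(0);   /* all 64 slots start empty */
	idr->next = 0;
	for (int i = 0; i < 64; i++)
		idr->slots[i] = NULL;
}

/* Alloc: lowest empty slot in [start, end); a real tree would use find-first-bit. */
static int mini_idr_alloc(struct mini_idr *idr, void *ptr,
			  unsigned int start, unsigned int end)
{
	if (end > 64)
		end = 64;
	for (unsigned int id = start; id < end; id++) {
		if (idr->free_map & (UINT64_C(1) << id)) {
			idr->free_map &= ~(UINT64_C(1) << id);
			idr->slots[id] = ptr;
			return (int)id;
		}
	}
	return -1;
}

/* Alloc_Cyclic: [cursor, end) first, then wrap around to [start, cursor). */
static int mini_idr_alloc_cyclic(struct mini_idr *idr, void *ptr,
				 unsigned int start, unsigned int end)
{
	unsigned int from = idr->next;

	if (from < start || from >= end)
		from = start;
	int id = mini_idr_alloc(idr, ptr, from, end);
	if (id < 0 && from > start)
		id = mini_idr_alloc(idr, ptr, start, from);
	if (id >= 0)
		idr->next = (unsigned int)id + 1;
	return id;
}

static void mini_idr_remove(struct mini_idr *idr, unsigned int id)
{
	idr->slots[id] = NULL;
	idr->free_map |= UINT64_C(1) << id;   /* mark the slot empty again */
}
```

The cursor is what makes allocation cyclic: a freed ID is not handed out again until the allocator wraps around.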

Microsoft ❤ Linux

Preloading
- Some Radix Tree users cannot sleep when they want to insert an entry
- The Radix Tree keeps a per-CPU list of pre-allocated nodes
- The IDR keeps a per-tree list of pre-allocated nodes
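The preloading pattern can be sketched in user space, with a `_Thread_local` pool standing in for the kernel's per-CPU list. `struct pnode`, `preload()` and `preload_take()` are hypothetical names; in the kernel, `radix_tree_preload()` and `radix_tree_preload_end()` bracket the no-sleep section, and that preemption handling is not modelled here.

```c
#include <stdlib.h>

/* Hypothetical node; a real node would also carry tags and metadata. */
struct pnode {
	struct pnode *next;      /* links nodes in the preload pool */
	void *slots[64];
};

/* One pool per thread, standing in for the per-CPU list. */
static _Thread_local struct pnode *preload_list;
static _Thread_local unsigned int preload_count;

/* Call where blocking allocation is allowed: top the pool up to `want` nodes. */
static int preload(unsigned int want)
{
	while (preload_count < want) {
		struct pnode *n = malloc(sizeof(*n));

		if (!n)
			return -1;
		n->next = preload_list;
		preload_list = n;
		preload_count++;
	}
	return 0;
}

/* Call from the insert path that cannot sleep: take a node without allocating. */
static struct pnode *preload_take(void)
{
	struct pnode *n = preload_list;

	if (n) {
		preload_list = n->next;
		preload_count--;
	}
	return n;
}
```

The design choice is to move the only step that might sleep (allocation) ahead of the critical section, so the insert itself can always complete once enough nodes are reserved.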
