Introduction to the HAMT: Opportunity for Tcl (2017 Tcl Conference)

SLIDE 1

Introduction to the HAMT: Opportunity for Tcl

Don Porter, Tcl/Tk Release Manager
2017 Tcl Conference

SLIDE 2

Hash Maps in Tcl

  • Dictionaries
  • Array variables
  • Name lookups (commands, vars, etc.)
  • Much much more…

– Most make use of Tcl_HashTable.

  • Customizable
SLIDE 3

Hash Map – Giant Bucket Array

Search bucket [ Hash(key) ] for key

[Figure: bucket array with 2^64 entries]

  • Define Hash: Key → index

– Efficient
– Range evenly distributed over indices

SLIDE 4

Hash Map – Tcl_HashTable

Search bucket [ Hash(key) & mask ] for key

[Figure: bucket array with 2^3 entries]
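The masking step on this slide can be sketched in a few lines of Python. This is an illustration only, not Tcl's C implementation: the class name `MaskedTable`, the choice of FNV-1a as the hash function, and the fixed size of 2^3 buckets are all assumptions made for the example.

```python
def fnv1a_64(key: str) -> int:
    """64-bit FNV-1a; a stand-in for whatever hash function the table uses."""
    h = 0xCBF29CE484222325
    for byte in key.encode("utf-8"):
        h ^= byte
        h = (h * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
    return h


class MaskedTable:
    """Hypothetical fixed-size table: bucket index = hash & mask."""

    def __init__(self, bits=3):
        self.mask = (1 << bits) - 1                 # bits=3 gives mask 0b111
        self.buckets = [[] for _ in range(1 << bits)]

    def _bucket(self, key):
        # Keep only the low bits of the 64-bit hash to pick a bucket.
        return self.buckets[fnv1a_64(key) & self.mask]

    def set(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                            # overwrite an existing key
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for k, v in self._bucket(key):              # linear search in the bucket
            if k == key:
                return v
        return default
```

With only 2^3 buckets, many keys share a bucket, which is why the search within the bucket remains part of every lookup.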

SLIDE 5

Hash Map – Hash Trie

Follow Hash(key) path to leaf storing key

SLIDE 6

Hash Map – Hash Trie

Eliminate empty buckets and paths

SLIDE 7

Hash Map – Hash Trie

Store hashes – shorten paths w/o branches

SLIDE 8

Hash Map – Hash Trie

Store node IDs – shorten paths w/o branches
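The progression on slides 5 through 8 can be sketched as a small binary hash trie in Python. The sketch is hypothetical and not taken from the talk's code: it consumes one hash bit per level starting from the low bit (an assumption), creates no node for an empty path, and stores the full hash in each leaf so that a leaf can sit at the shallowest depth where it is unambiguous. Compressing chains of interior nodes (the node-ID idea on slide 8) is left out.

```python
class Leaf:
    """Bucket leaf: remembers the full hash and the keys that share it."""

    def __init__(self, h, key, value):
        self.hash = h
        self.items = {key: value}


class Branch:
    def __init__(self):
        self.children = [None, None]    # None marks an eliminated empty path


def bit(h, depth):
    return (h >> depth) & 1


def insert(node, h, key, value, depth=0):
    if node is None:
        return Leaf(h, key, value)      # stop at the first free position
    if isinstance(node, Leaf):
        if node.hash == h:
            node.items[key] = value     # same hash: same bucket
            return node
        # Two different hashes meet here: push the old leaf one level down.
        branch = Branch()
        branch.children[bit(node.hash, depth)] = node
        return insert(branch, h, key, value, depth)
    b = bit(h, depth)
    node.children[b] = insert(node.children[b], h, key, value, depth + 1)
    return node


def lookup(node, h, key, depth=0):
    while isinstance(node, Branch):     # follow the hash bits downward
        node = node.children[bit(h, depth)]
        depth += 1
    if node is not None and node.hash == h:
        return node.items.get(key)
    return None
```

Because the leaf keeps the whole hash, a lookup that reaches a leaf early can still reject a key whose remaining hash bits differ.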

SLIDE 9

Hash Array-Map Trie (HAMT)

Structure nodes as array maps

[Figure: trie nodes labeled with bitmaps 0011, 1100, 1100, 0110]

SLIDE 10

Array Map Encoding

  • Two bits encoding bucket leaf children

– Bit n is set → child n is a bucket

  • Hash and leaf pointer are stored in array
  • Two bits encoding subnode children

– Bit n is set → child n is a subnode

  • Pointer to subnode is stored in array
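The two-bitmap encoding above can be sketched as follows. This is a hypothetical Python illustration, not the branch's C code: the class name `ArrayMapNode` and the slot layout (all leaf entries first, then all subnode entries, each group in ascending bit order) are assumptions. The point it shows is that counting the set bits below position n gives the child's position in a dense array, so empty children cost no storage.

```python
def popcount(x):
    """Number of set bits in x."""
    return bin(x).count("1")


class ArrayMapNode:
    def __init__(self, leaf_map=0, sub_map=0, slots=()):
        self.leaf_map = leaf_map    # bit n set: child n is a bucket leaf
        self.sub_map = sub_map      # bit n set: child n is a subnode
        self.slots = tuple(slots)   # assumed layout: leaves, then subnodes

    def child(self, n):
        """Return (kind, entry) for child n without storing empty children."""
        bit = 1 << n
        below = bit - 1             # mask selecting the bits lower than n
        if self.leaf_map & bit:
            return "leaf", self.slots[popcount(self.leaf_map & below)]
        if self.sub_map & bit:
            offset = popcount(self.leaf_map)    # skip past all leaf entries
            return "subnode", self.slots[offset + popcount(self.sub_map & below)]
        return "empty", None
```

For example, a node with `leaf_map=0b0101` and `sub_map=0b1000` stores three slots for its four possible children; child 1 is absent and occupies nothing.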
SLIDE 11

Removal Operation

SLIDE 12

Removal Operation – Tcl_HashTable (Destructive)

[Figure: two bucket arrays, each with 2^3 entries]

SLIDE 13

Removal Operation – HAMT (non-destructive)

[Figure: OLD and NEW tries; node bitmaps 0011, 1100, 1100, 0110, 0110]
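Non-destructive removal can be sketched with path copying on a simplified binary hash trie. This Python sketch is an assumption-laden illustration rather than the HAMT code from the talk: nodes are plain tuples, one hash bit is consumed per level from the low end, and full hash collisions are out of scope. Only the nodes on the path to the removed key are rebuilt; every other subtree is shared between the OLD and NEW roots.

```python
def bit(h, depth):
    return (h >> depth) & 1


def insert(node, h, key, value, depth=0):
    """Return a new root containing key; the old root is left untouched."""
    if node is None:
        return ("leaf", h, key, value)
    if node[0] == "leaf":
        if node[1] == h:
            if node[2] != key:
                raise NotImplementedError("full hash collisions not handled")
            return ("leaf", h, key, value)
        pair = [None, None]
        pair[bit(node[1], depth)] = node        # push the old leaf down
        node = ("branch", pair[0], pair[1])
    children = [node[1], node[2]]
    b = bit(h, depth)
    children[b] = insert(children[b], h, key, value, depth + 1)
    return ("branch", children[0], children[1])


def remove(node, h, key, depth=0):
    """Return a root without key, sharing every untouched subtree."""
    if node is None:
        return None
    if node[0] == "leaf":
        return None if (node[1] == h and node[2] == key) else node
    children = [node[1], node[2]]
    b = bit(h, depth)
    new_child = remove(children[b], h, key, depth + 1)
    if new_child is children[b]:
        return node                             # key absent: nothing copied
    other = children[1 - b]
    if new_child is None and (other is None or other[0] == "leaf"):
        return other                            # collapse a one-leaf branch
    if other is None and new_child[0] == "leaf":
        return new_child                        # pull a lone leaf upward
    children[b] = new_child
    return ("branch", children[0], children[1])


def lookup(node, h, key, depth=0):
    while node is not None and node[0] == "branch":
        node = node[1 + bit(h, depth)]
        depth += 1
    if node is not None and node[1] == h and node[2] == key:
        return node[3]
    return None
```

After `new = remove(old, h, key)`, both `old` and `new` remain valid values, which is the property the OLD/NEW figure illustrates.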

SLIDE 14

IMMUTABILITY

  • Values as Read-only structures
  • Matches value semantics of Tcl
  • Alternative to Copy on Write

– CoW is a discipline to implement immutable values out of mutable foundations

SLIDE 15

...on Steroids

  • Presented as binary tree

– Two two-bit encoding maps per node
– Easy to draw and explain
– Inessential

  • Implemented as 64-ary tree

– Two 64-bit encoding maps per node
– Shallow, wide trees → few hops in lookup
– Depth of 11 covers entire 16 exbibyte capacity
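The depth figure follows from simple arithmetic, sketched here in Python. The constant and function names are hypothetical, and the choice to consume hash bits from the low end is an assumption. A 64-way node consumes 6 hash bits per level, so a 64-bit hash needs ceil(64 / 6) = 11 levels, and 2^64 is 16 × 2^60, which is the 16 exbibyte figure quoted above.

```python
import math

HASH_BITS = 64                              # width of the hash value
FANOUT = 64                                 # children per node

BITS_PER_LEVEL = FANOUT.bit_length() - 1    # log2(64) = 6 bits per level
DEPTH = math.ceil(HASH_BITS / BITS_PER_LEVEL)

# 2^64 expressed in units of 2^60 (one exbi).
EXBI_UNITS = 2 ** HASH_BITS // 2 ** 60


def child_index(h, level):
    """Which of the 64 children to follow at a given level (assumed bit order)."""
    return (h >> (BITS_PER_LEVEL * level)) & (FANOUT - 1)
```

The last level only has 4 hash bits left (64 = 10 × 6 + 4), so at most 16 of its 64 child positions can ever be used.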

SLIDE 16

Demo: dict vs hamt

% set data [lmap _ [lrepeat 20000 {}] tcl::mathfunc::rand]
% set d [dict create {*}$data]
% time {foreach {k v} $data {set d [dict remove $d $k]}}

  • > 23839420 microseconds per iteration

% set h [hamt create {*}$data]
% time {foreach {k v} $data {set h [hamt remove $h $k]}}

  • > 77113 microseconds per iteration

% set d [dict create {*}$data]
% time {foreach {k v} $data {dict unset d $k}}

  • > 28610 microseconds per iteration
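One plausible reading of the first timing, offered here as an assumption rather than something the slides state, is that each `dict remove $d $k` operates on a value still held by the variable `d`, so the whole dictionary is duplicated before one key is dropped. The Python sketch below (hypothetical function names) counts how many entries such a copy-then-delete loop touches.

```python
def remove_with_copy(table, key):
    """Value-semantics removal on a shared mutable table: duplicate, then delete."""
    copy = dict(table)              # touches every remaining entry
    del copy[key]
    return copy, len(table)         # second item: entries copied this step


def total_entries_copied(n):
    """Entries copied while removing n keys one at a time, copying each time."""
    table = {i: None for i in range(n)}
    copied = 0
    for key in range(n):
        table, cost = remove_with_copy(table, key)
        copied += cost
    return copied
```

The total is n(n+1)/2. For the 10000 key/value pairs built from the 20000-element list in the demo, that would be about 50 million entry copies, against 10000 single removals for the in-place `dict unset` loop.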
SLIDE 17

The Enemy

SLIDE 18

Merge Demo

% time {set d [dict merge $d1 $d2]}
→ 681783 microseconds per iteration
% time {dict merge $d $d}
→ 1032838 microseconds per iteration
% time {dict merge $d $d1}
→ 927085 microseconds per iteration
% time {set h [hamt merge $h1 $h2]}
→ 294936 microseconds per iteration
% time {hamt merge $h $h}
→ 65 microseconds per iteration
% time {hamt merge $h $h1}
→ 218641 microseconds per iteration

SLIDE 19

More dict vs hamt

  • For one hashmap, hamt uses more memory.
  • For a set of related hashmaps, it will use less.
  • Operation speeds are competitive (within an order of magnitude).
  • Avoids copy catastrophe by design
  • Still prototype quality

– Known improvement avenues

  • Immutability benefits...
SLIDE 20

Immutable Hashmap Benefits

  • Read-only values share easily

– Think “threads”

  • Keep useful checkpoints

– Think built-in command set of an interp.

  • Controlled teardowns

– Think namespace delete

  • Caching and Epochs

– No epoch for something that does not change

  • Scaling?
SLIDE 21

How can I try it?

  • Branch dgp-refactor in the Tcl fossil repository.

– https://core.tcl.tk/tcl

  • [hamt info] reports interesting details.
  • Comments welcome.
SLIDE 22

Relaxed Radix Balance (RRB) Tree

  • HAMT : Hashmap :: RRB : Sequence

– Think “list”
– Think “string” (list of characters)

  • Foundation of the Clojure Vector
  • Stay Tuned!
SLIDE 23

Conclusions

  • Prototype HAMT implementation underway

– Basic functions complete.

  • Initial testing shows promise

– Not yet a clear failure.

  • Immutable structures are useful tools.
  • Other immutable structure opportunities.
  • Further work is needed.