Clojure Hash Maps: plenty of room at the bottom (@spinningtopsofdoom)

SLIDE 1

Clojure Hash Maps:

plenty of room at the bottom

@spinningtopsofdoom @2kliph @bendyworks

SLIDE 2

Building an alien space ship

  • Avoiding the gray goo scenario when making nano machines
  • What cup of tea is best to power your Infinite Improbability Drive (Earl Grey, hot)
  • How to make the spaceship bigger on the inside than on the outside

SLIDE 3

Talk about real alien technology

SLIDE 4
SLIDE 5

Immutability: a cornerstone of functional programming

SLIDE 6

It's used in

  • Scala
  • Elixir
  • Haskell
  • Clojure
SLIDE 7

Why immutable?

  • Deeply nested heterogeneous data
  • Send data off to another part of the code: fire and forget :)
  • Fast delta diffing
    – E.g. React shouldComponentUpdate
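Plain Clojure maps already show why immutability makes delta diffing cheap. An update copies only the path to the change, so every untouched branch remains the very same object, and a reference check is enough to skip it. Below is a minimal sketch. The state map and the `changed?` helper are hypothetical, written in the spirit of shouldComponentUpdate, and are not part of any library.

```clojure
;; Hypothetical app state; any nested map behaves the same way.
(def before {:user {:name "Ada"} :settings {:theme :dark}})

;; assoc-in rebuilds only the maps along the path [:user :name];
;; the :settings map is carried over by reference.
(def after (assoc-in before [:user :name] "Grace"))

;; Hypothetical helper: an O(1) reference check instead of a deep comparison.
(defn changed? [old new]
  (not (identical? old new)))

(changed? (:settings before) (:settings after)) ; => false, branch is shared
(changed? (:user before) (:user after))         ; => true, branch was rebuilt
```

A deep `=` would also give the right answer. The point is that the reference check costs the same no matter how large the untouched branch is.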

SLIDE 8

There's always a catch

  • Orders of magnitude slower
  • Efficient implementations have constraints, such as sortable keys or storing deltas in the data structure itself
    – Increasing cognitive overhead for developers

SLIDE 9

Hash Array Mapped Tries provide performance improvements

  • 2 to 3 times slower for common operations
    – That's a lot better than an order of magnitude slower
  • No constraints
    – Only need a hashable key
  • Reduced cognitive overhead

SLIDE 10

Optimizing Hash-Array Mapped Tries for Fast and Lean Immutable JVM Collections

by Michael J. Steindorfer and Jurgen J. Vinju

SLIDE 11

Compressed Hash-Array Mapped Prefix-tree (CHAMP)

SLIDE 12

ClojureScript Implementation

https://github.com/bendyworks/lean-map

SLIDE 13

CHAMP gives you guaranteed Hash Map performance gains

  • Iteration: 2x faster
  • Equality checking: 10x to 100x faster

SLIDE 14

CHAMP trims your Hash Maps

SLIDE 15

SLIDE 16

CHAMP makes Hash Maps more wieldy, making them both simpler and easier.

The code is two thirds the size of the original implementation.

SLIDE 17

Overview of Clojure Hash Maps

SLIDE 18

Clojure Hash Maps are a tree of nodes with a 32-way branching factor

SLIDE 19

Node internals

[Figure: a node's array holding :foo, 3, 5, :bar and nil, plus node metadata; callouts label Key :foo and Key 3]

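SLIDE 20

How a key finds a node

[Figure: Key :foo, Hash 1268894036, split into 5-bit chunks (lowest bits first) 20, 10, 18, 3, 26, 5, 1; the lookup descends the tree through slots 20, 10 and 18]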
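The numbers on this slide come from cutting the hash into 5-bit pieces, since 2^5 = 32 slots per node. Below is a minimal sketch of that split. It assumes the `(hash >>> shift) & 0x1f` masking that Clojure's HAMT uses. `hash-chunks` is a hypothetical helper, and 1268894036 is simply the hash shown on the slide.

```clojure
(defn hash-chunks
  "Splits a 32-bit hash into the 5-bit slot indices used at each level of a
  32-way trie, lowest bits first (level 0 uses bits 0-4, level 1 bits 5-9, ...)."
  [h]
  (let [h32 (bit-and h 0xffffffff)]            ; treat the hash as unsigned 32-bit
    (mapv (fn [shift]
            (bit-and (unsigned-bit-shift-right h32 shift) 0x1f)) ; 0x1f = 31
          (range 0 32 5))))                    ; shifts 0, 5, ..., 30: seven levels

(hash-chunks 1268894036) ; => [20 10 18 3 26 5 1]
```

A lookup only consumes chunks until it reaches a key value pair. In the slide, :foo is found after following 20, 10 and 18.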

SLIDE 21

First major improvement

Removes problems with sub node references

SLIDE 22

Sub node reference is a pseudo Key Value pair with nil as the "key"

[Figure: node array :foo 3, 5 :bar, then a nil "key" whose value slot holds the sub node reference]
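The following is a hedged sketch of a lookup in a node that uses this nil-key convention. It is modeled loosely on Clojure's bitmap-indexed node, not copied from it. The map keys `:bitmap` and `:array`, the helper names, and the hand-picked hashes (0 and 35) are all hypothetical. Hash collision nodes and the separate nil-key flag are left out.

```clojure
;; Hypothetical original-style node:
;;   :bitmap  one bit per occupied slot (out of 32)
;;   :array   [key value key value ...]; a nil key means the following
;;            value slot holds a sub node, not a value
(defn bitpos [h shift]
  (bit-shift-left 1 (bit-and (unsigned-bit-shift-right (bit-and h 0xffffffff) shift) 0x1f)))

(defn slot-index
  "Occupied slots below this one = index of the pair in :array."
  [bitmap bit]
  (Long/bitCount (bit-and bitmap (dec bit))))

(defn lookup [node k h shift]
  (let [bit (bitpos h shift)]
    (when-not (zero? (bit-and (:bitmap node) bit))
      (let [i        (slot-index (:bitmap node) bit)
            k-or-nil (nth (:array node) (* 2 i))
            v-or-sub (nth (:array node) (inc (* 2 i)))]
        ;; every access must first ask: pair or sub node reference?
        (cond
          (nil? k-or-nil) (lookup v-or-sub k h (+ shift 5))
          (= k k-or-nil)  v-or-sub
          :else           nil)))))

;; Hand-built tree, hypothetical hashes: :a -> 0, :c -> 35 (slot 3, then slot 1)
(def leaf {:bitmap 2r10 :array [:c 3]})
(def root {:bitmap 2r1001 :array [:a 1 nil leaf]})

(lookup root :c 35 0) ; => 3
```

Because nil is reserved as the marker, a real nil key cannot live in the array, which is why the map needs the extra flag and field mentioned on slide 24.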

SLIDE 23

Doubles overhead for each sub node reference

SLIDE 24

Adds incidental complexity

  • Needs a flag for the nil key and a field for its value
  • Optimized node (Array Node) containing just sub node references
    – Happens when a normal node's array has 32 elements
  • Further complications from the second problem

SLIDE 25

Sub node references are scattered throughout a node's array

[Figure: a node array where the entries 6 6, :foo :bar and 3 3 are interleaved with nil-keyed sub node references]

SLIDE 26

Combined with the nil marker value, this means you have to ask "Is it a Key Value pair or a sub node reference?" for every operation

SLIDE 27

Makes iteration a wiki walk

SLIDE 28

The Roman Empire was the post-Roman Republic period

SLIDE 29

The Roman Republic was the period of ancient Roman civilization beginning with the

SLIDE 30

Lots more link clicking...

SLIDE 31

Awareness is the ability to perceive, to feel, or to be conscious of events, objects, thoughts, emotions, or sensory patterns

SLIDE 32

What was the next word after Roman Republic?

SLIDE 33

Wiki Walk Iteration

  • Bad locality
    – Blows the stack
    – CPU caches are never hot

SLIDE 34

CHAMP node improvements

SLIDE 35

Key Value Pairs in front, Sub Node references in back

[Figure: the same node entries (6 6, :foo :bar, 3 3) rearranged so key value pairs come first and the sub node reference comes last]

SLIDE 36

Decomplect metadata

[Figure: the node array 6 6, :foo :bar, 3 3 with its metadata split into KV metadata and node metadata]
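Splitting the metadata means a single bit test tells you what a slot holds, with no nil marker involved. Below is a hedged sketch of a CHAMP-style lookup. It models the front of the node array (pairs) and the back (sub node references) as two vectors. The keys `:datamap`, `:nodemap`, `:kvs` and `:nodes`, the helpers, and the hand-picked hashes are hypothetical, not lean-map's actual layout.

```clojure
;; Hypothetical CHAMP node:
;;   :datamap  one bit per slot holding a key value pair ("KV metadata")
;;   :nodemap  one bit per slot holding a sub node       ("node metadata")
;;   :kvs      the pairs, at the front                   [k v k v ...]
;;   :nodes    the sub node references, at the back
(defn bitpos [h shift]
  (bit-shift-left 1 (bit-and (unsigned-bit-shift-right (bit-and h 0xffffffff) shift) 0x1f)))

(defn slot-index
  "Set bits below bit: position among entries of the same kind."
  [bitmap bit]
  (Long/bitCount (bit-and bitmap (dec bit))))

(defn champ-lookup [node k h shift]
  (let [bit (bitpos h shift)]
    (cond
      ;; slot holds a pair: compare keys directly
      (not (zero? (bit-and (:datamap node) bit)))
      (let [i (* 2 (slot-index (:datamap node) bit))]
        (when (= k (nth (:kvs node) i))
          (nth (:kvs node) (inc i))))

      ;; slot holds a sub node: descend one level
      (not (zero? (bit-and (:nodemap node) bit)))
      (recur (nth (:nodes node) (slot-index (:nodemap node) bit)) k h (+ shift 5))

      :else nil)))

;; Hand-built tree, hypothetical hashes: :a -> 0, :c -> 35 (slot 3, then slot 1)
(def leaf {:datamap 2r10 :nodemap 0 :kvs [:c 3] :nodes []})
(def root {:datamap 2r1 :nodemap 2r1000 :kvs [:a 1] :nodes [leaf]})

(champ-lookup root :c 35 0) ; => 3
```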

SLIDE 37

Lower memory overhead by removing nil marker values

SLIDE 38

Removes all sub node incidental complexity

  • nil key flag
  • nil value field
  • Array Node
  • Check for Key Value or Sub node reference
SLIDE 39

2x speedup by changing iteration from a wiki walk to a linear scan

SLIDE 40

Original Hash Map iteration algorithm (pseudocode)

  • If the nil flag is true, return [nil, <nil value>]
  • For normal nodes
    – If the key is not nil, return the Key Value pair
    – Otherwise go to the sub node and repeat
  • For Array Nodes
    – If the element is nil, continue
    – Otherwise go to the sub node and repeat
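The pseudocode above translates roughly to the following sketch. It uses a made-up data model: `:bitmap-node`, `:array-node`, `:has-nil?` and `:nil-val` are hypothetical names, not ClojureScript's internals. Note how a nil key sends the scan off into a sub node halfway through the current array.

```clojure
;; Hypothetical original-style layout:
;;   {:type :bitmap-node :array [k v nil sub-node ...]}  nil key => sub node
;;   {:type :array-node  :children [sub-node-or-nil ...]}
;;   the map keeps the nil key outside the tree: :has-nil? / :nil-val
(defn node-entries [node]
  (lazy-seq
    (case (:type node)
      ;; Array Node: skip empty slots, descend into every sub node
      :array-node  (mapcat node-entries (remove nil? (:children node)))
      ;; normal node: each slot is either a pair or a jump into a sub node
      :bitmap-node (mapcat (fn [[k v]]
                             (if (nil? k)
                               (node-entries v)   ; leave this array mid-scan
                               [[k v]]))
                           (partition 2 (:array node))))))

(defn original-entries [{:keys [has-nil? nil-val root]}]
  (concat (when has-nil? [[nil nil-val]])
          (when root (node-entries root))))

(def leaf {:type :bitmap-node :array [:c 3]})
(def inner {:type :bitmap-node :array [:a 1 nil leaf :b 2]})
(def m {:has-nil? true :nil-val 0 :root {:type :array-node :children [nil inner nil]}})

(original-entries m) ; => ([nil 0] [:a 1] [:c 3] [:b 2])
```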

SLIDE 41

CHAMP iteration algorithm

1. Iterate through Key Value pairs
2. Iterate through sub node(s), repeating step one
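Here are the same two steps as a sketch. It assumes a hypothetical CHAMP node shape with the pairs in `:kvs` (front) and the sub node references in `:nodes` (back). There is no nil flag, no nil check and no node-type dispatch.

```clojure
;; Hypothetical CHAMP node: {:kvs [k v ...] :nodes [sub-node ...]}
(defn champ-entries [node]
  (lazy-seq
    (concat
      ;; 1. linear scan over this node's key value pairs
      (map vec (partition 2 (:kvs node)))
      ;; 2. then each sub node, repeating step 1
      (mapcat champ-entries (:nodes node)))))

(def tree {:kvs [:a 1 :b 2] :nodes [{:kvs [:c 3] :nodes []}]})

(champ-entries tree) ; => ([:a 1] [:b 2] [:c 3])
```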

SLIDE 42

Comparison

  • Seven lines vs two lines
  • Three conditionals vs none
  • Polymorphism vs no polymorphism
SLIDE 43

CHAMP Equality Check improvements

SLIDE 44

Clojure Puzzler: Sloppy Cleaning

SLIDE 45

    (def base-map (hash-map))
    (def one-million 1000000)
    (def full-map (reduce (fn [m i] (assoc m i 0)) base-map (range one-million)))
    (def same-map (reduce (fn [m i] (dissoc m i)) full-map (range one-million)))

    (= base-map same-map) ;; true

    (time (into {} base-map)) ;; 140 microseconds
    (time (into {} same-map)) ;; ??? microseconds

SLIDE 46

A) 140 microseconds
B) 280 microseconds
C) 1400 microseconds
D) 14000 microseconds
E) 31000 microseconds

SLIDE 47

E) 31000 microseconds

SLIDE 48

Original Delete Algorithm

[Figure: two node layouts, one holding 6 6 and :foo :bar, the other holding 6 6, :foo :bar and 3 3]

SLIDE 49

This leads to

SLIDE 50

[Figure: node arrays holding the pairs 1 1 through 6 6 among nil sub node references]

SLIDE 51

[Figure: nil sub node references left pointing at empty nodes]

SLIDE 52

CHAMP Delete Algorithm

SLIDE 53

[Figure: CHAMP delete example with the pairs 1 1, 2 2 and 3 3]

SLIDE 54

[Figure: CHAMP delete example, continued]
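Below is a minimal sketch of the compaction idea behind the CHAMP delete. It assumes, as the CHAMP paper describes, that a sub node left with a single key value pair is inlined back into its parent, so no near-empty nodes linger. The node shape and all function names are hypothetical. Hash paths, bitmaps and compaction across several levels are omitted, and the inlined pair is simply appended rather than placed at its bitmap position.

```clojure
;; Hypothetical CHAMP node: {:kvs [k v ...] :nodes [sub-node ...]}
(defn remove-kv
  "Drops the pair whose key is k from a node's :kvs."
  [node k]
  (update node :kvs
          (fn [kvs]
            (into [] (comp (partition-all 2)
                           (remove #(= k (first %)))
                           cat)
                  kvs))))

(defn compact-child
  "If the sub node at index i has at most one pair and no sub nodes left,
  move its pair into the parent and drop the sub node."
  [parent i]
  (let [child (nth (:nodes parent) i)]
    (if (and (<= (count (:kvs child)) 2) (empty? (:nodes child)))
      (-> parent
          (update :kvs into (:kvs child))   ; simplification: appended, not bitmap-ordered
          (update :nodes #(into (subvec % 0 i) (subvec % (inc i)))))
      parent)))

(defn delete-in-child [parent i k]
  (-> parent
      (update-in [:nodes i] remove-kv k)
      (compact-child i)))

(def parent {:kvs [:a 1] :nodes [{:kvs [:b 2 :c 3] :nodes []}]})

(delete-in-child parent 0 :c) ; => {:kvs [:a 1 :b 2], :nodes []}
```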

SLIDE 55

Lowers memory overhead that occurs from deletion

SLIDE 56

So what? This only really matters in pathological cases.

Equal CHAMP maps have the exact same layout in memory.

We don't have to compare all Key Values; we can compare nodes (pointer equality).
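Since equal maps share one canonical layout, equality can short-circuit on any shared node and otherwise compare node arrays slot by slot. Below is a hedged sketch of that check. It uses a hypothetical `{:kvs [...] :nodes [...]}` node shape and assumes both trees are already in canonical form. `node=` is an illustrative name, not lean-map's API.

```clojure
;; Hypothetical CHAMP node: {:kvs [k v ...] :nodes [sub-node ...]},
;; assumed canonical, so equal maps have identical layouts.
(defn node= [a b]
  (or (identical? a b)                               ; shared node: skip the whole subtree
      (and (= (:kvs a) (:kvs b))                     ; same pairs in the same positions
           (= (count (:nodes a)) (count (:nodes b)))
           (every? true? (map node= (:nodes a) (:nodes b))))))

(def shared {:kvs [:x 1] :nodes []})
(def m1 {:kvs [:a 1] :nodes [shared]})
(def m2 {:kvs [:a 1] :nodes [shared]})
(def m3 {:kvs [:a 2] :nodes [shared]})

(node= m1 m2) ; => true, the sub node is compared by reference only
(node= m1 m3) ; => false
```

When two maps differ along a single path and share everything else, only the nodes on that path are visited, which is where the O(log n) figure comes from.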

SLIDE 57

Equality check is now O(log n) vs O(n), leading to a 100x performance improvement

Assuming the maps share structure

SLIDE 58

Structural Sharing

SLIDE 59

We still get a 10x performance boost for maps that don't share any structure

  • Original comparison has overhead due to Clojure abstractions (sequences and lookup)
  • CHAMP comparison only compares two arrays

SLIDE 60

Caveats

  • JavaScript version: addition 8% slower; deletion 10 to 20% slower
    – Compared to the current ClojureScript version
  • JVM version: comparable speed to HAMT
    – Used in Rascal (Steindorfer & Vinju)
    – Christophe Grand has ported CHAMP to Java using Clojure's hashing functions

SLIDE 61

CHAMP's improvements pave the way for future improvements

CHAMP internals are much easier to work with and reason about

SLIDE 62

Two future possibilities

  • Merge and Diff operations could have greatly increased performance
  • Similar to RRB Vectors for Vectors

SLIDE 63

Interesting work on merging

  • Christophe Grand is investigating using CHAMP as a basis for confluent hash maps
    – Uses node metadata to mark transient / persistent nodes
    – Removes marker objects needed for addition and deletion
    – Makes CHAMP able to merge hash maps in O(log n) time

SLIDE 64

CHAMP is not as cool as working nanobots

SLIDE 65

SLIDE 66

CHAMP shows Hash Maps have plenty of room at the bottom compared to the original ClojureScript HAMT implementation

  • 2x performance for iteration
  • 10 - 100x performance for equality checking
  • Lower memory overhead
SLIDE 67

For Peter, the biggest win is making Hash Maps much easier to understand and implement

SLIDE 68

Clojure Hash Maps are one of Clojure's best exports

  • Scala (base hash map)
  • Elixir (base hash map)
  • Haskell (unordered-containers)
  • Ruby (hamster)
  • JavaScript (immutable.js)
SLIDE 69

Thanks

  • Bendyworks for supporting my work on this
  • Michael J. Steindorfer and Jurgen J. Vinju for the CHAMP paper
  • Zach Tellman for writing Collection Check
  • Martin Klepsch for porting Collection Check to ClojureScript
  • Nicolás Berger for helping me set up the test harness
  • David Nolen for performance and profiling suggestions

SLIDE 70

Fin

SLIDE 71

Questions?