Transducing for fun and profit



SLIDE 1

Transducing for fun and profit

simon@metabase.com @sbelak

SLIDE 2

Clojure at a glance

  • (lisp (running-on :JVM))
  • Functional, dynamic, immutable
  • Excellent concurrency and state-management primitives
  • Unparalleled data manipulation
SLIDE 3

Anatomy of a transducer

  • Transducers decomplect recursion mechanism, transformation, building the output, and access mechanism
  • 3 user-facing “protocols”: transducer, reducing fn, CollReduce
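A minimal sketch of that separation, using only clojure.core (the names xf, total, as-vec and lazy are illustrative, not from the slides): the transformation is defined once and then reused with different ways of building the output.

```clojure
;; The transformation knows nothing about where the data comes from
;; or what shape the result will take.
(def xf (comp (filter odd?) (map inc)))

;; Same xf, three different ways of building the output.
(def total  (transduce xf + 0 (range 10)))  ; fold into a number
(def as-vec (into [] xf (range 10)))        ; pour into a vector
(def lazy   (sequence xf (range 10)))       ; lazy sequence
```

Here total is 30 and both as-vec and lazy contain 2 4 6 8 10.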
SLIDE 4

transducer and reducing function
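The slide's code did not survive extraction, so the following is only a sketch of the two concepts. A reducing function has three arities (init, completion, step); a transducer takes a reducing function and returns a new one. The names mapping and sum-rf are hypothetical.

```clojure
;; A transducer: takes a reducing function rf, returns a wrapped one.
(defn mapping
  "Hypothetical re-implementation of the (map f) transducer."
  [f]
  (fn [rf]
    (fn
      ([] (rf))                      ; init: delegate to the wrapped rf
      ([acc] (rf acc))               ; completion: delegate
      ([acc x] (rf acc (f x))))))    ; step: transform x, then delegate

;; A reducing function with all three arities.
(defn sum-rf
  ([] 0)                             ; init
  ([acc] acc)                        ; completion
  ([acc x] (+ acc x)))               ; step

(def result (transduce (mapping inc) sum-rf [1 2 3]))
```

With the inputs 1 2 3 incremented to 2 3 4, result is 9.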

SLIDE 5

Using a transducer to wrap/keep state

transducer and reducing function
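One way a transducer can keep state, sketched under the assumption that the slide showed something along these lines: the state lives in a volatile created when the transducer is applied to a reducing function, so every transduction starts fresh. The name running-total is hypothetical.

```clojure
(defn running-total
  "Hypothetical stateful transducer: emits the cumulative sum of its inputs."
  []
  (fn [rf]
    (let [total (volatile! 0)]       ; fresh state per transduction
      (fn
        ([] (rf))
        ([acc] (rf acc))
        ;; vswap! updates the state and returns the new value
        ([acc x] (rf acc (vswap! total + x)))))))

(def totals (into [] (running-total) [1 2 3 4]))
```

For the input 1 2 3 4, totals is [1 3 6 10]. Reusing (running-total) in a second transduction starts again from zero.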

SLIDE 6

Wrap Java
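The original example is not preserved. As one possible reading, a reducing function can wrap a mutable Java object and only expose an immutable value on completion. The sketch below assumes java.lang.StringBuilder as the wrapped object; string-builder-rf is a hypothetical name.

```clojure
(defn string-builder-rf
  "Hypothetical reducing function accumulating into a StringBuilder."
  ([] (StringBuilder.))                           ; init: fresh mutable builder
  ([^StringBuilder sb] (.toString sb))            ; completion: immutable String
  ([^StringBuilder sb x] (.append sb (str x))))   ; step: append returns the builder

(def s (transduce (comp (filter even?) (map #(* % %)))
                  string-builder-rf
                  (range 6)))
```

The even numbers 0 2 4 are squared to 0 4 16, so s is the string "0416".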

SLIDE 7

CollReduce protocol

  • Get the next element
  • Makes transducing data structure-agnostic, allowing us to (re)use transducers for things such as core.async channels
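A sketch of what extending clojure.core.protocols/CollReduce to a custom source might look like. Countdown is a hypothetical type invented for illustration, and the two-argument arity is simplified (it seeds with (f) rather than with the first element).

```clojure
(require '[clojure.core.protocols :as p])

;; Hypothetical source type: yields n, n-1, ..., 1.
(deftype Countdown [n])

(extend-protocol p/CollReduce
  Countdown
  (coll-reduce
    ([this f] (p/coll-reduce this f (f)))    ; simplified: seed with (f)
    ([this f init]
     (loop [acc init, i (.-n this)]
       (cond
         (reduced? acc) @acc                 ; honour early termination
         (not (pos? i)) acc                  ; source exhausted
         :else          (recur (f acc i) (dec i)))))))

;; Existing transducers now work on the new type unchanged.
(def firsts (into [] (take 3) (Countdown. 5)))
(def total  (transduce (map inc) + 0 (Countdown. 3)))
```

Here firsts is [5 4 3] and total is 9 (4 + 3 + 2).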

SLIDE 8

Transducing an async channel
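A sketch of both common options, assuming org.clojure/core.async is on the classpath (the slide's own code is not preserved). chan-of is a hypothetical helper that pre-fills and closes a channel.

```clojure
(require '[clojure.core.async :as a])

(def xf (comp (filter odd?) (map inc)))

(defn chan-of
  "Hypothetical helper: a closed channel pre-filled with xs."
  [xs]
  (let [ch (a/chan (max 1 (count xs)))]
    (doseq [x xs] (a/>!! ch x))
    (a/close! ch)
    ch))

;; Option 1: attach the transducer to the channel itself.
(def xf-ch (a/chan 10 xf))
(doseq [x (range 10)] (a/>!! xf-ch x))
(a/close! xf-ch)
(def items (a/<!! (a/into [] xf-ch)))

;; Option 2: transduce a plain channel; the result arrives on a channel.
(def total (a/<!! (a/transduce xf + 0 (chan-of (range 10)))))
```

The same xf that worked on collections is reused as is: items is [2 4 6 8 10] and total is 30.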

SLIDE 9

Composing transducers

  • 1. comp transducers
  • 2. Reducing function and transducer
  • 3. github.com/henrygarner/redux (post-complete, fuse)

Data structure that can be manipulated like any other
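The three levels of composition can be sketched without any dependency. post-complete* and fuse* below are hypothetical stand-ins written for illustration; they are not redux's implementation and, unlike a production version, they ignore early termination via reduced.

```clojure
;; 1. comp: transducers apply left to right, in data-flow order.
(def xf (comp (filter odd?) (map inc) (take 2)))
(def composed (into [] xf (range 10)))

;; 2./3. Combinators over reducing functions (illustrative stand-ins).
(defn post-complete*
  "Applies f to the result of rf's completion step."
  [rf f]
  (fn
    ([] (rf))
    ([acc] (f (rf acc)))
    ([acc x] (rf acc x))))

(defn fuse*
  "Takes a map of key -> reducing fn; runs them all in a single pass."
  [rfs]
  (fn
    ([] (into {} (map (fn [[k rf]] [k (rf)])) rfs))
    ([acc] (into {} (map (fn [[k v]] [k ((rfs k) v)])) acc))
    ([acc x] (into {} (map (fn [[k v]] [k ((rfs k) v x)])) acc))))

(def count-rf (fn ([] 0) ([n] n) ([n _] (inc n))))

;; The spec passed to fuse* is a plain map, so it can be built,
;; merged, or filtered like any other data structure.
(def mean-rf
  (post-complete* (fuse* {:sum + :n count-rf})
                  (fn [{:keys [sum n]}] (/ sum n))))

(def mean (transduce (map inc) mean-rf [1 2 3]))
```

Here composed is [2 4], and mean is 3 (the inputs become 2 3 4, giving a sum of 9 over a count of 3).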

SLIDE 10

On-line/streaming analysis

SLIDE 11

Metabase ❤


github.com/metabase/metabase

  • Open source analytics tool (runs on-premises)
  • Building a “data scientist in a box”
  • Hundreds to billions of rows
  • Some DBs optimised for analytics, some not
SLIDE 12

Many batch algorithms can be turned into online ones

  • Parallelize independent computations
  • Find a recursive relation
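As an illustration of finding a recursive relation, the arithmetic mean can be updated one element at a time: mean_n = mean_(n-1) + (x_n - mean_(n-1)) / n. The reducing function below is a sketch of that relation; online-mean is a hypothetical name.

```clojure
(defn online-mean
  "Hypothetical reducing function: incremental arithmetic mean."
  ([] [0 0.0])                              ; state: [count mean]
  ([[n mean]] (when (pos? n) mean))         ; completion: nil for empty input
  ([[n mean] x]
   (let [n' (inc n)]
     ;; mean_n = mean_(n-1) + (x_n - mean_(n-1)) / n
     [n' (+ mean (/ (- x mean) n'))])))

(def m (transduce identity online-mean [1 2 3 4]))
```

For the input 1 2 3 4 the intermediate means are 1.0, 1.5, 2.0 and 2.5, so m is 2.5. Only the count and the current mean are kept in memory, regardless of how many rows flow through.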

SLIDE 13

github.com/MastodonC/kixi.stats

  • Count
  • (Arithmetic) mean
  • Geometric mean
  • Harmonic mean
  • Median
  • Variance
  • Interquartile range
  • Standard deviation
  • Standard error
  • Skewness
  • Kurtosis
  • Covariance
  • Covariance matrix
  • Correlation
  • Correlation matrix
  • Simple linear regression
  • Standard error of the mean
  • Standard error of the estimate
  • Standard error of the prediction
SLIDE 14

Single-pass analysis
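A sketch of what a single pass can yield, using Welford's online algorithm to get count, mean and sample variance together. This is a hand-written illustration, not kixi.stats code; summary-rf is a hypothetical name.

```clojure
(defn summary-rf
  "Hypothetical reducing function: count, mean and sample variance
  in one pass (Welford's algorithm)."
  ([] {:n 0 :mean 0.0 :m2 0.0})             ; m2 = sum of squared deviations
  ([{:keys [n mean m2]}]
   {:n        n
    :mean     (when (pos? n) mean)
    :variance (when (> n 1) (/ m2 (dec n)))})
  ([{:keys [n mean m2]} x]
   (let [n'    (inc n)
         delta (- x mean)
         mean' (+ mean (/ delta n'))]
     {:n n' :mean mean' :m2 (+ m2 (* delta (- x mean')))})))

(def summary (transduce identity summary-rf [2 4 4 4 5 5 7 9]))
```

For this input the count is 8, the mean is approximately 5 and the sample variance approximately 32/7, up to floating-point rounding.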

SLIDE 15

Data = code

SLIDE 16

Using transducers is worth it for the composition alone

SLIDE 17

Annoyances

  • Can only transduce one coll at a time
  • Always have to pass in an xf (especially annoying when using redux)
  • Having functions that return a transducer or not is error-prone

  • Inconsistent support for transducers in core library
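A few of these points can be seen with clojure.core alone. The snippet is only an illustration; the workaround shown (zipping with map vector) is one option among several.

```clojure
;; An xf is mandatory even when no transformation is wanted.
(def total (transduce identity + [1 2 3]))

;; transduce accepts a single coll, while sequence accepts several.
(def sums (sequence (map +) [1 2 3] [10 20 30]))

;; Possible workaround for transduce: zip the colls first.
(def dot (transduce (map (fn [[a b]] (* a b)))
                    +
                    (map vector [1 2 3] [10 20 30])))

;; Same function, different return type depending on arity:
;; (map inc) is a transducer, (map inc coll) is a lazy seq.
(def xf?  (fn? (map inc)))
(def seq? (seq? (map inc [1 2 3])))
```

Here total is 6, sums is (11 22 33), dot is 140, and both predicates return true.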
SLIDE 18

Questions

simon@metabase.com @sbelak