
Causal Commutative Arrows Revisited

Jeremy Yallop, University of Cambridge

Hai (Paul) Liu, Intel Labs

September 21, 2016

Normalization as an optimization technique?

◮ Plausible, because it preserves semantics.
◮ Effective, when conditions are met:
  ◮ It has to terminate;
  ◮ It gives a simpler program as a result;
  ◮ It enables other optimizations.
◮ ...with a few catches:
  ◮ Strong normalization can be too restrictive;
  ◮ Sharing is hard to preserve;
  ◮ Static or dynamic implementation?


Arrows

Arrows are a generalization of monads (Hughes 2000).

class Arrow (arr :: ∗ → ∗ → ∗) where
  arr   :: (a → b) → arr a b
  (≫)   :: arr a b → arr b c → arr a c
  first :: arr a b → arr (a, c) (b, c)

class Arrow arr ⇒ ArrowLoop arr where
  loop :: arr (a, c) (b, c) → arr a b

(Diagrams: (a) arr f, (b) f ≫ g, (c) first f, (d) second f, (e) f ⋆⋆⋆ g, (f) loop f)
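As a concrete illustration (ours, not from the slides), plain functions form the simplest arrow. The sketch below mirrors the class above using standalone ASCII names (arrF, composeF, firstF, loopF) in place of arr, (≫), first and loop:

```haskell
-- The function arrow, written out with standalone names so it is
-- self-contained; composeF plays the role of the slides' (≫).
newtype Fun a b = Fun { runFun :: a -> b }

arrF :: (a -> b) -> Fun a b
arrF = Fun

composeF :: Fun a b -> Fun b c -> Fun a c
composeF (Fun f) (Fun g) = Fun (g . f)

firstF :: Fun a b -> Fun (a, c) (b, c)
firstF (Fun f) = Fun (\(a, c) -> (f a, c))

-- loop ties a value-level knot, relying on laziness
loopF :: Fun (a, c) (b, c) -> Fun a b
loopF (Fun f) = Fun (\a -> let (b, c) = f (a, c) in b)

-- runFun (firstF (arrF (+ 1))) (3, "x") == (4, "x")
```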


Arrow and ArrowLoop laws

arr id ≫ f ≡ f
f ≫ arr id ≡ f
(f ≫ g) ≫ h ≡ f ≫ (g ≫ h)
arr (g . f) ≡ arr f ≫ arr g
first (arr f) ≡ arr (f × id)
first (f ≫ g) ≡ first f ≫ first g
first f ≫ arr (id × g) ≡ arr (id × g) ≫ first f
first f ≫ arr fst ≡ arr fst ≫ f
first (first f) ≫ arr assoc ≡ arr assoc ≫ first f
loop (first h ≫ f) ≡ h ≫ loop f
loop (f ≫ first h) ≡ loop f ≫ h
loop (f ≫ arr (id × k)) ≡ loop (arr (id × k) ≫ f)
loop (loop f) ≡ loop (arr assoc⁻¹ . f . arr assoc)
second (loop f) ≡ loop (arr assoc . second f . arr assoc⁻¹)
loop (arr f) ≡ arr (trace f)
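The last law (loop (arr f) ≡ arr (trace f)) can be seen concretely at the function arrow, where loop is the value-level trace operator. A small sketch of ours, relying on Haskell's laziness to tie the knot:

```haskell
-- For plain functions, loop is value-level trace: the second
-- component of the output is fed back as the second input, and a
-- lazy let binding ties the knot.
loopFun :: ((a, c) -> (b, c)) -> (a -> b)
loopFun f a = let (b, c) = f (a, c) in b

-- Example: the output value is the fed-back c, and c is set to
-- twice the input, so loopFun step behaves like (* 2).
step :: (Int, Int) -> (Int, Int)
step (a, c) = (c, a * 2)

-- loopFun step 5 == 10
```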


Normalizing arrows (a dataflow example)

(Diagrams: (a) the original dataflow network, (b) its normalized form)


Causal Commutative Arrows

CCA is a more restricted arrow with an additional init combinator:

class ArrowLoop arr ⇒ ArrowInit arr where
  init :: a → arr a a

and two additional arrow laws:

first f ≫ second g ≡ second g ≫ first f    (commutativity)
init i ⋆⋆⋆ init j ≡ init (i, j)             (product)

Causal Commutative Normal Form (CCNF) is either a pure arrow, or a single loop containing a pure arrow and an initial state:

loopD :: ArrowInit arr ⇒ c → ((a, c) → (b, c)) → arr a b
loopD i f = loop (arr f ≫ second (init i))

Proved by algebraic arrow laws (Liu et al., ICFP 2009; JFP 2010).


Application: stream transformers as arrows

newtype SF a b = SF { unSF :: a → (b, SF a b) }

instance Arrow SF where
  arr f   = g where g = SF (λx → (f x, g))
  f ≫ g   = ...
  first f = ...

instance ArrowLoop SF where ...
instance ArrowInit SF where ...

We can run a stream transformer over an input stream:

runSF :: SF a b → [a] → [b]
runSF (SF f) (x : xs) = let (y, f′) = f x in y : runSF f′ xs

nthSF :: Int → SF () a → a
nthSF n sf = runSF sf (repeat ()) !! n
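The elided instance methods can be completed along the following lines. This is our runnable ASCII sketch (standalone names composeSF, secondSF, loopSF, initSF), not necessarily the authors' exact library code:

```haskell
-- A runnable sketch of the SF stream transformer, with the methods
-- the slides elide filled in; initSF is the slides' init (a unit delay).
newtype SF a b = SF { unSF :: a -> (b, SF a b) }

arrSF :: (a -> b) -> SF a b
arrSF f = g where g = SF (\x -> (f x, g))

composeSF :: SF a b -> SF b c -> SF a c        -- the slides' (≫)
composeSF (SF f) (SF g) = SF $ \x ->
  let (y, f') = f x
      (z, g') = g y
  in (z, composeSF f' g')

secondSF :: SF a b -> SF (c, a) (c, b)
secondSF (SF f) = SF $ \(c, a) ->
  let (b, f') = f a in ((c, b), secondSF f')

-- loop feeds the second output component back lazily
loopSF :: SF (a, c) (b, c) -> SF a b
loopSF (SF f) = SF $ \a ->
  let ((b, c), f') = f (a, c) in (b, loopSF f')

initSF :: a -> SF a a                          -- unit delay
initSF i = SF (\x -> (i, initSF x))

runSF :: SF a b -> [a] -> [b]
runSF (SF f) (x : xs) = let (y, f') = f x in y : runSF f' xs
runSF _      []       = []

nthSF :: Int -> SF () a -> a
nthSF n sf = runSF sf (repeat ()) !! n

-- A counter in the loopD shape, loop (arr f ≫ second (init 0)):
counter :: SF () Int
counter = loopSF (arrSF (\((), n) -> (n + 1, n + 1))
                 `composeSF` secondSF (initSF 0))
-- take 5 (runSF counter (repeat ())) == [1, 2, 3, 4, 5]
```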


Performance Comparison

Orders of magnitude speedup (JFP 2010):

Name         SF    CCNF sf   CCNF tuple
exp          1.0   30.84     672.79
sine         1.0   18.89     442.48
oscSine      1.0   14.28     29.53
50's sci-fi  1.0   18.72     21.37
robotSim     1.0   24.67     34.93

Table: Performance ratio relative to the SF baseline (greater is better)

Normalization of CCA programs seems very effective! But why is everyone not using it?? It is not even used by Euterpea, the music and sound synthesis framework from the same research group!


Pitfalls of the CCA implementation

The initial CCA library was implemented using Template Haskell, because:

◮ Normalization is a syntactic transformation;
◮ A meta-level implementation guarantees normal form at compile time;
◮ TH is less work than a full-blown pre-processor.

However, TH-based static normalization is:

◮ restricted to first-order programs, with no reactivity, etc.;
◮ hard to program with:

f x = ...[| ...x... |] ...
... $(norm g) ...

◮ perhaps not as effective as we had thought for "real" applications?

How about run-time normalization?

from: Paul Liu
to: Jeremy Yallop
cc: Paul Hudak, Eric Cheng
date: 18 June 2009

  I wonder if there is any way to optimize GHC's output based on your code, since the CCNF is actually running slower

". . . that the actual construction of CCNF is now at run-time rather than compile-time. Therefore, we cannot rely on GHC to take the pure function and state captured in a CCNF and produce optimized code. . . " (Liu 2011)


Normalization by construction

1. Define normal form as a data type:

data CCNF a b where
  Arr   :: (a → b) → CCNF a b
  LoopD :: c → ((a, c) → (b, c)) → CCNF a b

2. Define an observation function:

observe :: ArrowInit arr ⇒ CCNF a b → arr a b
observe (Arr f)     = arr f
observe (LoopD i f) = loop (arr f ≫ second (init i))

3. Define instances for the data type:

instance Arrow CCNF where ...
instance ArrowLoop CCNF where ...
instance ArrowInit CCNF where ...
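To make step 3 concrete, here is our sketch of how composition on CCNF can be defined so that every composite is again a normal form: states are paired and transitions are fused. ASCII names; the library's actual instances may differ.

```haskell
{-# LANGUAGE GADTs #-}
-- Normalization by construction: composing two normal forms yields
-- a normal form directly, with no separate rewriting pass.
data CCNF a b where
  Arr   :: (a -> b) -> CCNF a b
  LoopD :: c -> ((a, c) -> (b, c)) -> CCNF a b

compC :: CCNF a b -> CCNF b c -> CCNF a c      -- the slides' (≫)
compC (Arr f)     (Arr g)     = Arr (g . f)
compC (Arr f)     (LoopD j g) = LoopD j (\(a, t) -> g (f a, t))
compC (LoopD i f) (Arr g)     =
  LoopD i (\(a, s) -> let (b, s') = f (a, s) in (g b, s'))
compC (LoopD i f) (LoopD j g) =
  LoopD (i, j) (\(a, (s, t)) ->
    let (b, s') = f (a, s)
        (c, t') = g (b, t)
    in (c, (s', t')))

-- a small interpreter to observe the result as a stream function
runCCNF :: CCNF a b -> [a] -> [b]
runCCNF (Arr f)     xs = map f xs
runCCNF (LoopD i f) xs = go i xs
  where go _ []       = []
        go s (x : ys) = let (y, s') = f (x, s) in y : go s' ys

-- unit delay as a one-state machine, composed after (+ 1)
delay0 :: CCNF Int Int
delay0 = LoopD 0 (\(x, s) -> (s, x))
-- runCCNF (compC (Arr (+ 1)) delay0) [1, 2, 3] == [0, 2, 3]
```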


Optimize the observe function

1. Specialize observe to a concrete instance.

observe   :: ArrowInit arr ⇒ CCNF a b → arr a b

observeSF :: CCNF a b → SF a b
observeSF (Arr f)     = arrSF f
observeSF (LoopD i f) = loopSF (arrSF f ≫SF secondSF (initSF i))

2. Derive an optimized definition.

observeSF (LoopD i f) = loopD i f
  where
    loopD :: c → ((a, c) → (b, c)) → SF a b
    loopD i f = SF (λx → let (y, i′) = f (x, i) in (y, loopD i′ f))

3. Fuse observe with the context in which it is used.

nthCCNF :: Int → CCNF () a → a
nthCCNF n = nthSF n . observeSF
          = ...
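The derived loopD of step 2 can be checked on its own: it builds the SF stream directly from an initial state and a transition function, with none of the loop/second/init plumbing. A self-contained ASCII sketch of ours:

```haskell
-- The derived loopD: unfold an SF stream directly from an initial
-- state and a pure transition function.
newtype SF a b = SF { unSF :: a -> (b, SF a b) }

loopD :: c -> ((a, c) -> (b, c)) -> SF a b
loopD i f = SF (\x -> let (y, i') = f (x, i) in (y, loopD i' f))

runSF :: SF a b -> [a] -> [b]
runSF (SF f) (x : xs) = let (y, f') = f x in y : runSF f' xs
runSF _      []       = []

-- counter as a loopD: state n, output n + 1
-- runSF (loopD 0 (\((), n) -> (n + 1, n + 1))) [(), (), ()] == [1, 2, 3]
```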


Performance comparison

◮ Compute 44100 × 5 = 2,205,000 samples (≈ 5 seconds of audio)
◮ GHC 7.10.3 using the flags -O2 -funfolding-use-limit=512
◮ 64-bit Linux, Intel Xeon CPU E5-2680 2.70GHz

                        Unnormalized   Normalized
Name     States  Loops      SF        CCNF     TH
fib         2      1        1.0       2.29     2.30
exp         1      2        1.0       242      242
sine        2      1        1.0       124      146
oscSine     1      1        1.0       60.6     60.6
sci-fi      3      3        1.0       27.7     27.4
robot       5      4        1.0       104      96.7
flute      16      7        1.0       5.10     16.2
shepard    80     30        1.0       7.47     12.9

(SF is the baseline; the CCNF and TH columns are speedup ratios.)


Why it works

◮ Unlike SF, CCNF is not recursively defined.

data SF a b where
  SF :: (a → (b, SF a b)) → SF a b

data CCNF a b where
  Arr   :: (a → b) → CCNF a b
  LoopD :: c → ((a, c) → (b, c)) → CCNF a b

◮ The hand-optimized observe function is the key to getting performance.

nthCCNF :: Int → CCNF () a → a
nthCCNF n = nthSF n . observeSF = ...

◮ GHC has improved! GHC 6.10 failed to optimize our program.

Compilers help those who help compilers!


Levels of abstraction

Axiomatic    ...  Type class (Arrow laws)
     ↓                 ↓
Denotational ...  Data type (Interpretation)
     ↓                 ↓
Operational  ...  Mealy machine (state and transition)

nthCCNF n (LoopD i f) = next n i
  where next n i = if n ≡ 0 then x else next (n − 1) i′
          where (x, i′) = f ((), i)
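The operational reading is easy to run directly: a LoopD is a Mealy machine, and nthCCNF is a tight first-order loop over its state. Our self-contained ASCII sketch, with a hypothetical counter as the example machine:

```haskell
{-# LANGUAGE GADTs #-}
-- Running a CCNF at the operational level: iterate the transition
-- function n times from the initial state and return the last output.
data CCNF a b where
  Arr   :: (a -> b) -> CCNF a b
  LoopD :: c -> ((a, c) -> (b, c)) -> CCNF a b

nthCCNF :: Int -> CCNF () a -> a
nthCCNF _ (Arr f)     = f ()
nthCCNF n (LoopD i f) = next n i
  where next k s = let (x, s') = f ((), s)
                   in if k == 0 then x else next (k - 1) s'

-- hypothetical example: a counter with state n, emitting n + 1
-- nthCCNF 4 (LoopD 0 (\((), n) -> (n + 1, n + 1))) == 5
```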

slide-40
SLIDE 40

16/22

That is not all (performance we could have)

◮ CCA normalization clusters all states as one nested tuple.

LoopD ((0, ((0, 0), 0)), (((((buf100), 0), 0), ((0), (((buf50), 0), 0))), (((0, i), (0, ((0, 0), 0))), ((0, ((0, 0), 0)), (0, ((0, 0), 0)))))) (λ(((((a, f ), e), d), c), ...) → ...)

slide-41
SLIDE 41

16/22

That is not all (performance we could have)

◮ CCA normalization clusters all states as one nested tuple.

LoopD ((0, ((0, 0), 0)), (((((buf100), 0), 0), ((0), (((buf50), 0), 0))), (((0, i), (0, ((0, 0), 0))), ((0, ((0, 0), 0)), (0, ((0, 0), 0)))))) (λ(((((a, f ), e), d), c), ...) → ...)

◮ Transition function destructs/constructs tuples at every iteration!

next n i = if n ≡ 0 then x else next (n − 1) i′ where (x, i′) = f ((), i)

slide-42
SLIDE 42

16/22

That is not all (performance we could have)

◮ CCA normalization clusters all states as one nested tuple.

LoopD ((0, ((0, 0), 0)), (((((buf100), 0), 0), ((0), (((buf50), 0), 0))), (((0, i), (0, ((0, 0), 0))), ((0, ((0, 0), 0)), (0, ((0, 0), 0)))))) (λ(((((a, f ), e), d), c), ...) → ...)

◮ Transition function destructs/constructs tuples at every iteration!

next n i = if n ≡ 0 then x else next (n − 1) i′ where (x, i′) = f ((), i)

◮ GHC can only help us so far.

slide-43
SLIDE 43

16/22

That is not all (performance we could have)

◮ CCA normalization clusters all states into one nested tuple.

LoopD ((0, ((0, 0), 0)), (((((buf100), 0), 0), ((0), (((buf50), 0), 0))),
      (((0, i), (0, ((0, 0), 0))), ((0, ((0, 0), 0)), (0, ((0, 0), 0))))))
      (λ(((((a, f), e), d), c), ...) → ...)

◮ The transition function destructs and constructs these tuples at every iteration!

next n i = if n ≡ 0 then x else next (n − 1) i′
  where (x, i′) = f ((), i)

◮ GHC can only help us so far.
◮ Real applications demand mutable state (for arrays and so on).


Local mutable state via the ST monad

The ST monad in Haskell:

data ST s a = ...
runST :: (forall s . ST s a) → a
fixST :: (a → ST s a) → ST s a

Use the ST type as our state:

data CCNF_ST s a b where
  ArrST   :: (a → b) → CCNF_ST s a b
  LoopDST :: ST s c → (c → a → ST s b) → CCNF_ST s a b

The fused observe function:

nth′ST :: Int → CCNF_ST s () a → ST s a
nth′ST n (LoopDST i f) = do
  g ← fmap f i
  let next n = do
        x ← g ()
        if n ≡ 0 then return x else next (n − 1)
  next n
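A runnable ASCII sketch of this ST-backed normal form (our names nthCCNFST, nthST, counterST are illustrative, not the paper's): the first field of LoopDST allocates the mutable state once, and the transition then runs imperatively.

```haskell
{-# LANGUAGE GADTs, RankNTypes #-}
-- ST-backed normal form: state lives in an STRef (or array), so the
-- transition mutates in place instead of rebuilding nested tuples.
import Control.Monad.ST
import Data.STRef

data CCNFST s a b where
  ArrST   :: (a -> b) -> CCNFST s a b
  LoopDST :: ST s c -> (c -> a -> ST s b) -> CCNFST s a b

nthCCNFST :: Int -> CCNFST s () a -> ST s a
nthCCNFST _ (ArrST f)     = return (f ())
nthCCNFST n (LoopDST i f) = do
  g <- fmap f i                 -- allocate state once, fix the transition
  let next k = do x <- g ()
                  if k == 0 then return x else next (k - 1)
  next n

nthST :: Int -> (forall s. CCNFST s () a) -> a
nthST n ccnf = runST (nthCCNFST n ccnf)

-- hypothetical counter whose state is a mutable STRef
counterST :: CCNFST s () Int
counterST = LoopDST (newSTRef 0) step
  where step r () = do n <- readSTRef r
                       writeSTRef r (n + 1)
                       return (n + 1)
-- nthST 4 counterST == 5
```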

slide-47
SLIDE 47

18/22

A (recursively defined) sound synthesis example

shepard :: BufferedCircuit a ⇒ Time → a () Double
shepard seconds =
  if seconds ≤ 0.0 then arr (const 0.0)
  else proc () → do
    f ← envLineSeg [800, 100, 100] [4.0, seconds] −≺ ()
    e ← envLineSeg [0, 1, 0, 0] [2.0, 2.0, seconds] −≺ ()
    s ← osc sineTable 0 −≺ f
    r ← delayLine 0.5 ≪ shepard (seconds − 0.5) −≺ ()
    returnA −≺ (e ∗ s ∗ 0.1) + r

Challenges of optimizing a recursively defined arrow:

◮ Static normalization blows up the code size.
◮ Nested state builds up quickly and deeply.

slide-48
SLIDE 48

19/22

Shepard performance (higher is better)

(Chart: output rate in samples/second, 0K–180K, against input size, 5–15, comparing CCNF ST, CCNF, and Template Haskell.)

slide-49
SLIDE 49

20/22

That is still not all (performance we would like to have)

◮ The definition of loop requires a recursive monad:

instance ArrowLoop (CCNF_ST s) where
  loop (LoopDST i f) = LoopDST i h
    where h i x = do rec (y, j) ← f i (x, j)
                     return y

◮ Although in the end all loops are de-coupled, the overhead of the ST type remains in the compiled code:

fixST :: (a → ST s a) → ST s a
fixST k = ST $ λs →
  let ans       = liftST (k r) s
      STret _ r = ans
  in case ans of STret s′ x → (# s′, x #)

slide-50
SLIDE 50

21/22

Related work

◮ Representing arrow computations as data (Hughes 2005, Nilsson 2005, Yallop 2010)
◮ Generalized arrows (Joseph 2014)
◮ Deriving implementations by equational reasoning (Bird 1988, Hinze 2000)
◮ Free representations used in optimization (Voigtländer 2008, Kiselyov and Ishii 2015)

slide-51
SLIDE 51

22/22

More in the paper

◮ Normalization by construction in steps. ◮ Equational derivation of observe function. ◮ Embedding mutable states with ST monad. ◮ Proving CCNF ST is an instance of CCA. ◮ Detailed performance analysis.

slide-52
SLIDE 52

22/22

More in the paper

◮ Normalization by construction, in steps.
◮ Equational derivation of the observe function.
◮ Embedding mutable state with the ST monad.
◮ Proving CCNF_ST is an instance of CCA.
◮ Detailed performance analysis.

https://github.com/yallop/causal-commutative-arrows-revisited

Thank you!