Pointless fusion for pointwise application
Malcolm Wallace
University of York (soon @ Standard-Chartered Bank)
1 Monday, 15 February
Pointless fusion for pointwise application Malcolm Wallace - - PowerPoint PPT Presentation
Pointless fusion for pointwise application Malcolm Wallace University of York (soon @ Standard-Chartered Bank) Monday, 15 February 1 1.000E+003 7.216E+001 7.599E-001 6.937E-005 2.400E-001 3.135E-054 4.936E-079 2.600E-012 1.455E-010 7.263E-015
Malcolm Wallace
University of York (soon @ Standard-Chartered Bank)
1 Monday, 15 February
1.000E+003 7.216E+001 7.599E-001 6.937E-005 2.400E-001 3.135E-054 4.936E-079 2.600E-012 1.455E-010 7.263E-015 1.000E+003 7.216E+001 7.599E-001 6.937E-005 2.400E-001 2.410E-054 4.936E-079 2.601E-012 1.456E-010 7.264E-015 1.000E+003 7.216E+001 7.599E-001 6.937E-005 2.400E-001 1.853E-054 4.936E-079 2.601E-012 1.457E-010 7.265E-015 1.000E+003 7.216E+001 7.599E-001 6.937E-005 2.400E-001 1.425E-054 4.936E-079 2.602E-012 1.458E-010 7.266E-015 1.000E+003 7.216E+001 7.599E-001 6.937E-005 2.400E-001 1.095E-054 4.936E-079 2.602E-012 1.459E-010 7.267E-015 1.000E+003 7.216E+001 7.599E-001 6.937E-005 2.400E-001 8.421E-055 4.936E-079 2.602E-012 1.460E-010 7.268E-015 1.000E+003 7.216E+001 7.599E-001 6.937E-005 2.400E-001 6.474E-055 4.936E-079 2.603E-012 1.461E-010 7.269E-015 1.000E+003 7.216E+001 7.599E-001 6.937E-005 2.400E-001 4.977E-055 4.936E-079 2.603E-012 1.462E-010 7.270E-015 1.000E+003 7.216E+001 7.599E-001 6.937E-005 2.400E-001 3.827E-055 4.936E-079 2.604E-012 1.463E-010 7.271E-015 1.000E+003 7.216E+001 7.599E-001 6.937E-005 2.400E-001 2.942E-055 4.936E-079 2.604E-012 1.464E-010 7.272E-015 8.563E+002 2.051E+004 4.180E-004 7.596E-001 5.260E-005 6.898E-002 1.710E-001 8.053E-011 2.686E-013 8.650E-012 8.571E+002 2.050E+004 4.205E-004 7.596E-001 5.546E-005 7.249E-002 1.675E-001 8.104E-011 2.729E-013 8.706E-012 8.581E+002 2.049E+004 4.231E-004 7.596E-001 5.855E-005 7.627E-002 1.637E-001 8.158E-011 2.775E-013 8.764E-012 8.592E+002 2.047E+004 4.258E-004 7.596E-001 6.190E-005 8.037E-002 1.596E-001 8.214E-011 2.823E-013 8.824E-012 8.604E+002 2.046E+004 4.286E-004 7.596E-001 6.555E-005 8.479E-002 1.551E-001 8.274E-011 2.874E-013 8.887E-012 8.619E+002 2.044E+004 4.315E-004 7.596E-001 6.952E-005 8.958E-002 1.504E-001 8.338E-011 2.929E-013 8.952E-012 8.635E+002 2.041E+004 4.346E-004 7.596E-001 7.383E-005 9.474E-002 1.452E-001 8.405E-011 2.988E-013 9.020E-012 8.653E+002 2.039E+004 4.379E-004 7.596E-001 7.851E-005 1.003E-001 1.396E-001 8.477E-011 3.052E-013 9.090E-012 8.674E+002 2.036E+004 4.413E-004 7.596E-001 8.358E-005 1.063E-001 1.336E-001 8.555E-011 3.120E-013 9.164E-012 8.697E+002 2.032E+004 4.450E-004 7.596E-001 8.905E-005 1.127E-001 1.272E-001 8.639E-011 3.194E-013 9.241E-012 8.723E+002 2.028E+004 4.489E-004 7.596E-001 9.495E-005 1.195E-001 1.204E-001 8.729E-011 3.275E-013 9.323E-012 8.752E+002 2.024E+004 4.531E-004 7.595E-001 1.013E-004 1.267E-001 1.132E-001 8.827E-011 3.364E-013 9.408E-012 8.784E+002 2.019E+004 4.576E-004 7.595E-001 1.080E-004 1.343E-001 1.056E-001 8.934E-011 3.461E-013 9.498E-012 8.819E+002 2.013E+004 4.624E-004 7.595E-001 1.151E-004 1.421E-001 9.775E-002 9.049E-011 3.567E-013 9.593E-012 8.857E+002 2.007E+004 4.675E-004 7.595E-001 1.225E-004 1.502E-001 8.966E-002 9.174E-011 3.684E-013 9.692E-012 8.899E+002 2.000E+004 4.730E-004 7.595E-001 1.303E-004 1.584E-001 8.145E-002 9.312E-011 3.813E-013 9.797E-012 8.945E+002 1.992E+004 4.791E-004 7.595E-001 1.382E-004 1.666E-001 7.324E-002 9.465E-011 3.959E-013 9.908E-012 8.994E+002 1.984E+004 4.856E-004 7.595E-001 1.462E-004 1.747E-001 6.518E-002 9.627E-011 4.116E-013 1.002E-011 9.043E+002 1.976E+004 4.923E-004 7.595E-001 1.542E-004 1.824E-001 5.744E-002 9.796E-011 4.283E-013 1.014E-011 9.092E+002 1.967E+004 4.992E-004 7.595E-001 1.619E-004 1.897E-001 5.012E-002 9.972E-011 4.459E-013 1.026E-011 9.140E+002 1.959E+004 5.063E-004 7.595E-001 1.693E-004 1.965E-001 4.335E-002 1.015E-010 4.644E-013 1.038E-011 9.191E+002 1.950E+004 5.138E-004 7.595E-001 1.765E-004 2.027E-001 3.716E-002 1.035E-010 4.846E-013 1.051E-011 9.240E+002 1.942E+004 5.215E-004 7.595E-001 1.831E-004 2.082E-001 3.164E-002 1.054E-010 5.054E-013 1.064E-011 9.286E+002 1.934E+004 5.290E-004 7.595E-001 1.893E-004 2.130E-001 2.678E-002 1.073E-010 5.264E-013 1.076E-011 9.329E+002 1.927E+004 5.365E-004 7.595E-001 1.948E-004 2.172E-001 2.257E-002 1.092E-010 5.477E-013 1.089E-011 9.368E+002 1.920E+004 5.439E-004 7.595E-001 1.998E-004 2.208E-001 1.895E-002 1.111E-010 5.691E-013 1.101E-011 9.406E+002 1.914E+004 5.514E-004 7.594E-001 2.044E-004 2.239E-001 1.588E-002 1.129E-010 5.911E-013 1.113E-011 9.442E+002 1.908E+004 5.590E-004 7.594E-001 2.086E-004 2.265E-001 1.328E-002 1.148E-010 6.137E-013 1.126E-011 9.475E+002 1.903E+004 5.665E-004 7.594E-001 2.124E-004 2.287E-001 1.110E-002 1.166E-010 6.363E-013 1.138E-011 9.506E+002 1.898E+004 5.738E-004 7.594E-001 2.158E-004 2.305E-001 9.281E-003 1.184E-010 6.590E-013 1.151E-011 9.534E+002 1.893E+004 5.811E-004 7.594E-001 2.189E-004 2.320E-001 7.767E-003 1.202E-010 6.818E-013 1.163E-011
2 Monday, 15 February
200 x 248 x 248 x 600 x 13
timesteps Z Y X fields fields: pressure temperature density gas motion (vector) H+ ion concentration H2 concentration ...
95,946,240,000 =
3 Monday, 15 February
Flat representation: “Virtual” dimensions: x=[0..27] y=[0..21] z=[0..4] i=[0..3079]
4 Monday, 15 February
5 Monday, 15 February
6 Monday, 15 February
7 Monday, 15 February
8 Monday, 15 February
9 Monday, 15 February
10 Monday, 15 February
11 Monday, 15 February
12 Monday, 15 February
13 Monday, 15 February
14 Monday, 15 February
200 x 248 x 248 x 600 x 13
timesteps Z Y X fields
95,946,240,000 =
15 Monday, 15 February
Array operations fall into three categories:
16 Monday, 15 February
17 Monday, 15 February
picture th dataset = isosurface th (mkCell8 (size dataset) (values dataset)) (mkCell8 (size dataset) (locations dataset)) isosurface th samples geom = zipWith (surf_cell th) (stream samples) (stream geom) surf_cell th sample geom = map (surf_geom th sample geom) $ mc_case $ fmap (>th) sample surf_geom th sample geom (v0,v1) = interp (inv_interp th samp_0 samp_1) geom_0 geom_1 where samp_0 = select v0 sample samp_1 = select v1 sample geom_0 = select v0 geom geom_1 = select v1 geom mkCell8 sz v = zipWith8 Cell8 v (drop 1 v) (drop line v) (drop (1+line) v) (drop plane v) (drop (1+plane) v) (drop (plane+line) v) (drop (1+plane+line) v) where line = thd3 size plane = snd3 size
18 Monday, 15 February
picture th dataset = isosurface th (mkCell8 (size dataset) (values dataset)) (mkCell8 (size dataset) (locations dataset)) isosurface th samples geom = zipWith (surf_cell th) (stream samples) (stream geom) surf_cell th sample geom = map (surf_geom th sample geom) $ mc_case $ fmap (>th) sample surf_geom th sample geom (v0,v1) = interp (inv_interp th samp_0 samp_1) geom_0 geom_1 where samp_0 = select v0 sample samp_1 = select v1 sample geom_0 = select v0 geom geom_1 = select v1 geom mkCell8 sz v = zipWith8 Cell8 v (drop 1 v) (drop line v) (drop (1+line) v) (drop plane v) (drop (1+plane) v) (drop (plane+line) v) (drop (1+plane+line) v) where line = thd3 size plane = snd3 size
19 Monday, 15 February
200 x 248 x 248 x 600 x 13
timesteps Z Y X fields [160,162..180] [124] [0,4..247] [0,4..599] [2,7] slice sub-range downsample slices
20 Monday, 15 February
slice i (transpose (subrange j k arr))
intermediate arrays result array
21 Monday, 15 February
200 x 248 x 248 x 600 x 13
timesteps Z Y X fields
for (t=160; t<=180; t+=2) { z = 124; for (y=0; y<=247; y+=4) { for (x=0; x<=599; x+=4) { for (field `elem` [2,7]) { arr [ t*248*248*600*13 + z*248*600*13 + y*600*13 + x*13 + field ]; } } } }
[160,162..180] [124] [0,4..247] [0,4..599] [2,7] slice sub-range downsample slices varies faster
22 Monday, 15 February
slice i (transpose (subrange j k arr)) 200 x 248 x [0,4..248] x [0,4..600] x [2,7]
timesteps Z Y X fields
for (t=0; t<=199; t+=1) { for (z=0; z<=247; z+=1) { for (y=0; y<=247; y+=4) { for (x=0; x<=599; x+=4) { for (field `elem` [2,7]) { arr [ t*248*248*600*13 + z*248*600*13 + y*600*13 + x*13 + field ]; } } } } }
23 Monday, 15 February
for (t=j; t<=k; t+=1) { for (z=0; z<=247; z+=1) { for (y=0; y<=247; y+=4) { for (x=0; x<=599; x+=4) { for (field `elem` [2,7]) { arr [ t*248*248*600*13 + z*248*600*13 + y*600*13 + x*13 + field ]; } } } } }
slice i (transpose (subrange j k arr)) [j..k] x 248 x [0,4..248] x [0,4..600] x [2,7]
timesteps Z Y X fields
24 Monday, 15 February
for (z=0; z<=247; z+=1) { for (t=j; t<=k; t+=1) { for (y=0; y<=247; y+=4) { for (x=0; x<=599; x+=4) { for (field `elem` [2,7]) { arr [ t*248*248*600*13 + z*248*600*13 + y*600*13 + x*13 + field ]; } } } } }
slice i (transpose (subrange j k arr)) 248 x [j..k] x [0,4..248] x [0,4..600] x [2,7]
Z timesteps Y X fields
25 Monday, 15 February
{ z=i; for (t=j; t<=k; t+=1) { for (y=0; y<=247; y+=4) { for (x=0; x<=599; x+=4) { for (field `elem` [2,7]) { arr [ t*248*248*600*13 + z*248*600*13 + y*600*13 + x*13 + field ]; } } } } }
slice i (transpose (subrange j k arr)) [i] x [j..k] x [0,4..248] x [0,4..600] x [2,7]
Z timesteps Y X fields
26 Monday, 15 February
type Iterator = [ForLoop] data ForLoop = FL { fl_jump :: !Int , fl_count :: Maybe Int } data Block a = Block { block_data :: !(UArr a) , block_start :: !Int , block_steps :: Iterator }
27 Monday, 15 February
Array operations fall into three categories:
28 Monday, 15 February
29 Monday, 15 February
linear :: Double -> Double -> Double
linear thresh val0 val1 pos0 pos1 = let interpolant = thresh - val0 / val1 - val0 in pos0 + interpolant * (pos1-pos0)
position: 3.1 5.2 value: 10.5 -4.7 threshold: 6.0 interpolated position?
30 Monday, 15 February
linear :: Double -> Double -> Double
linear thresh val0 val1 pos0 pos1 = let interpolant = thresh - val0 / val1 - val0 in pos0 + interpolant * (pos1-pos0)
position: (3.1,0.2) (5.2,2.8) value: 10.5 -4.7 threshold: 6.0 interpolated position?
31 Monday, 15 February
linear :: Double -> Array Double -> Array Double
linear thresh val0 val1 pos0 pos1 = let interpolant = pure thresh - val0 / val1 - val0 in pos0 + interpolant * (pos1-pos0)
position: (3.1,0.2) (5.2,2.8) (7.0,2.5) value: 10.5 -4.7 7.2 interpolated positions
32 Monday, 15 February
(+) :: (Num a) => a -> a -> a (+) :: (Num a, Applicative f) => f a -> f a -> f a (+) :: (Num a) => Array a -> Array a -> Array a
33 Monday, 15 February
class Applicative f where pure :: a -> f a (<*>) :: f (a->b) -> f a -> f b (<$>) :: (a->b) -> f a -> f b instance Applicative Array where pure x = newArray x f <$> a = arrayMap f a
The functor f can be: a container, sequence, stream, a parser, pretty-printer ...
34 Monday, 15 February
instance (Num a, Applicative f) => Num (f a) where a + b = pure (+) <*> a <*> b a - b = pure (-) <*> a <*> b a * b = pure (*) <*> a <*> b instance (Fractional a, Applicative f) => Fractional (f a) where a / b = pure (/) <*> a <*> b
35 Monday, 15 February
instance (Num a, Applicative f) => Num (f a) where a + b = (+) <$> a <*> b a - b = (-) <$> a <*> b a * b = (*) <$> a <*> b instance (Fractional a, Applicative f) => Fractional (f a) where a / b = (/) <$> a <*> b
36 Monday, 15 February
linear thresh val0 val1 pos0 pos1 = let interpolant = pure thresh - val0 / val1 - val0 in pos0 + interpolant * (pos1-pos0)
thr v0 Ap Ap
(/) Ap Ap (-) v1 v0 Ap Ap (-) Intermediate arrays everywhere!
37 Monday, 15 February
linear thresh val0 val1 pos0 pos1 = let interpolant = pure thresh - val0 / val1 - val0 in pos0 + interpolant * (pos1-pos0)
\thr v0 v1 -> thr-v0/v1-v0 thr v0 v1 Ap Ap Ap We can calculate this. :: Double -> Double -> Double -> Double
38 Monday, 15 February
data Expr c a = Val (c a) | Fun a | forall b . Ap (Expr c (b->a)) (Expr c b) instance (Applicative c) => Applicative (Expr c) where pure a = Val (pure a) f <*> a = Ap f a f <$> a = Ap (Fun f) a
Val Fun Ap
39 Monday, 15 February
Ap Ap
f g a
Ap Ap
f g a
Ap (.) f (g a) => (.) f g a Ap f (Ap g a) => Ap (Ap (Ap (Fun (.)) f) g) a Rule 1: flatten the tree to a single spine
40 Monday, 15 February
Ap
f a
Ap Ap
f a
Ap flip f a g => flip f g a Ap (Ap f a) (Fun g) => Ap (Ap (Ap (Fun flip) f) (Fun g)) a Rule 2: move all functions towards the left Ap g g Warning: not possible for all applicative functors
41 Monday, 15 February
Ap Ap (Fun f) (Fun g) => Fun (f g) Rule 3: reduce applications
f g f g
42 Monday, 15 February
a g f
thr v0 Ap Ap
(/) Ap Ap (-) v1 v0 Ap Ap (-)
43 Monday, 15 February
a g f
thr v0 Ap Ap
(/) Ap (-) v1 v0 Ap (-) Ap (.) Ap Ap
44 Monday, 15 February
a g f
thr v0 Ap Ap
(/) Ap (-) v1 v0 Ap (-) Ap (.) Ap Ap
45 Monday, 15 February
f a g
thr v0 Ap Ap
(/) Ap (-) v1 v0 (-) Ap (.) Ap Ap Ap Ap (.)
46 Monday, 15 February
f a g
thr v0 Ap Ap
(/) Ap (-) v1 v0 (-) Ap (.) Ap Ap Ap Ap (.)
47 Monday, 15 February
Several steps omitted....
48 Monday, 15 February
thr v0
(/) (-) v1 v0 (-) (.) Ap Ap Ap (.) Ap Ap Ap Ap Ap Ap Ap Ap Ap (.) (.) (.) (.)
49 Monday, 15 February
thr v0
v1 v0 (-) Ap Ap Ap Ap Ap (.) ((.) ((.) ((.) (.) (.)) (/))) (-)
50 Monday, 15 February
f a g
thr v0
(/) (-) v1 v0 (-) (.) Ap Ap Ap (.) Ap Ap Ap Ap Ap Ap Ap Ap Ap (.) (.) (.) (.)
51 Monday, 15 February
f a g
thr v0
(/) (-) v1 v0 (-) (.) Ap (.) Ap Ap Ap Ap Ap Ap Ap Ap (.) (.) (.) (.) Ap flip Ap Ap Ap
52 Monday, 15 February
f a g
thr v0
(/) (-) v1 v0 (-) (.) Ap (.) Ap Ap Ap Ap Ap Ap Ap Ap (.) (.) (.) (.) Ap flip Ap Ap Ap
53 Monday, 15 February
Several steps omitted....
54 Monday, 15 February
thr v0
v1 v0 Ap Ap Ap Ap ((((((flip ((.) (flip) ((.) ((.) (.)) ((.) ((.) (.)) ((.) ((.) (/)) (-)))))) (-))) \thr v0 v1v0 -> thr - v0 / v1 - v0
55 Monday, 15 February
Rule 1: flatten the tree to a single spine Rule 2: move all functions towards the left Rule 3: reduce applications
compose flip apply
56 Monday, 15 February
normalise :: Expr c a -> Expr c a normalise (Val x s) = Val x s normalise (Fun f s) = Fun f s normalise t | allFunctional t = uncurry Fun (reduce t) normalise (Ap f a) = case normalise a of Val b s -> Ap (normalise f) (Val b s) Ap g b -> normalise $ Ap (Ap (Ap (Fun (.) "(.)") f) g) b Fun g s -> case normalise f of Ap f' v -> normalise (Ap (Ap (Ap (Fun flip "flip") f') (Fun g s)) v) f' -> error "normalised function not an Ap"
reduce :: Expr c a -> (a,String) reduce (Val _ _) = (error "reduce of ground value","bad val") reduce (Fun f s) = (f,s) reduce (Ap f a) = let (f',s) = reduce f (a',t) = reduce a in (f' a', "("++s++" "++t++")")
allFunctional :: Expr c a -> Bool allFunctional (Val _ _) = False allFunctional (Fun _ _) = True allFunctional (Ap f a) = allFunctional f && allFunctional a
57 Monday, 15 February
Array operations fall into three categories:
58 Monday, 15 February
function.
“kernel” one can write (by hand) in CUDA for a GPGPU.
59 Monday, 15 February