Haskell in the datacentre!
Simon Marlow
Facebook (Copenhagen, April 2019)
Haskell powers Sigma, a platform for detection, used by many different teams, mainly for anti-abuse, e.g. spam and malicious URLs. Requests arrive from clients and other services, and the system must serve them while maintaining high performance.

[Diagram: Clients / Other Services feed the Sigma Engine; inside it, user code runs on Haxl over the data sources; Haskell on top, C++ / Haskell underneath]

Haskell gives concise expression of functionality and abstracts away from performance (more later), though not completely...
Photo: Scott Schiller, CC by 2.0
Photo: Greg Lobinski, CC BY 2.0
Two ways to compute the number of common friends:

  -- Version 1: fetch the friends of every friend of a
  numCommonFriends a b = do
    fa  <- friendsOf a
    faf <- mapM friendsOf fa
    return (length (filter (b `elem`) faf))

  -- Version 2: two fetches, then a pure intersection
  numCommonFriends a b = do
    fa <- friendsOf a
    fb <- friendsOf b
    return (length (intersect fa fb))
Seemingly innocuous code changes can regress performance or introduce a bottleneck.
[Graph: latency over time, with a spike labelled "Oops" at 2pm yesterday]
Photo: usehung, CC BY 2.0
To catch regressions before they reach production, we run benchmarks against production data and show the differences in the code review tool (measuring performance for the whole system).

Keeping the server healthy…
We could use better profiling tools.

Concurrency from data dependencies: programmers shouldn't have to explicitly ask for concurrency, or think about the details; hand-written concurrency is likely to get the concurrency wrong. Instead, the framework extracts concurrency automatically, as far as data dependencies allow. The abstraction is enforced by the type system and module system.
  numCommonFriends a b = do
    fa <- friendsOf a
    fb <- friendsOf b
    return (length (intersect fa fb))

[Dataflow diagram: friendsOf a and friendsOf b are independent; both feed into length (intersect ...)]
friendsOf :: Id -> Haxl [Id]

Why friendsOf :: Id -> Haxl [Id]? Because friendsOf x always has the same value for a given x, the framework is free to cache and batch the fetches.
(>>=) :: Monad m => m a -> (a -> m b) -> m b

The second argument of >>= depends on the result of the first, so the two must run in sequence. The arguments of <*> are independent of each other:

(<*>) :: Applicative f => f (a -> b) -> f a -> f b

which allows both arguments to be performed concurrently. Operations defined in terms of <*> are also concurrent, e.g. mapM:
  friendsOfFriends :: Id -> Haxl [Id]
  friendsOfFriends x = do
    fs <- friendsOf x
    concat <$> mapM friendsOf fs
Sequential (monadic) version:

  numCommonFriends a b = do
    fa <- friendsOf a
    fb <- friendsOf b
    return (length (intersect fa fb))

Concurrent (applicative) version:

  numCommonFriends a b =
    length <$> (intersect <$> friendsOf a <*> friendsOf b)
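The difference between the two versions can be sketched with a toy batching monad. This is my own simplified stand-in for Haxl, not its real implementation: Blocked marks one round of data fetching, <*> overlaps the rounds of its two arguments, and >>= cannot.

```haskell
import Data.List (intersect)

-- A toy stand-in for Haxl: 'Blocked' marks one round of data fetching.
data Fetch a = Done a | Blocked (Fetch a)

instance Functor Fetch where
  fmap f (Done a)    = Done (f a)
  fmap f (Blocked c) = Blocked (fmap f c)

instance Applicative Fetch where
  pure = Done
  Done f    <*> Done x    = Done (f x)
  Done f    <*> Blocked x = Blocked (Done f <*> x)
  Blocked f <*> Done x    = Blocked (f <*> Done x)
  Blocked f <*> Blocked x = Blocked (f <*> x)   -- the rounds overlap!

instance Monad Fetch where
  Done a    >>= k = k a
  Blocked c >>= k = Blocked (c >>= k)           -- the rounds stay sequential

-- Hypothetical data: Ids are plain Ints here.
friendsOf :: Int -> Fetch [Int]
friendsOf 1 = Blocked (Done [2,3,4])
friendsOf 2 = Blocked (Done [1,3])
friendsOf _ = Blocked (Done [])

-- Count the fetching rounds a computation needs.
rounds :: Fetch a -> Int
rounds (Done _)    = 0
rounds (Blocked c) = 1 + rounds c

numCommonSeq, numCommonPar :: Int -> Int -> Fetch Int
numCommonSeq a b = do
  fa <- friendsOf a
  fb <- friendsOf b
  return (length (intersect fa fb))
numCommonPar a b =
  length <$> (intersect <$> friendsOf a <*> friendsOf b)

main :: IO ()
main = do
  print (rounds (numCommonSeq 1 2))  -- 2 rounds
  print (rounds (numCommonPar 1 2))  -- 1 round
```

The monadic version blocks twice; the applicative version does both fetches in a single round, which is the behaviour Haxl exploits to batch requests.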
Writing the applicative style by hand is hard to do in more complex cases. The ApplicativeDo language extension derives the applicative structure automatically from data dependencies.
  do x1 <- a
     x2 <- b x1
     x3 <- c
     x4 <- d x3
     x5 <- e x1 x4
     return (x2,x4,x5)

becomes

  do ((x1,x2),x4) <-
       (,) <$> (do x1 <- a; x2 <- b x1; return (x1,x2))
           <*> (do x3 <- c; d x3)
     x5 <- e x1 x4
     return (x2,x4,x5)
Fetches run concurrently wherever the data dependencies allow it.
How should we translate this?
  do x1 <- a
     x2 <- b
     x3 <- c x1
     x4 <- d x2
     return (x3,x4)
[Dependency graph: a, b, c, d; c depends on a's result, d depends on b's result]
There are two plausible translations:

  1. ((,) <$> A <*> B) >>= \(x1,x2) -> (,) <$> C[x1] <*> D[x2]
     that is, (A | B) ; (C | D)

  2. (,) <$> (A >>= \x1 -> C[x1]) <*> (B >>= \x2 -> D[x2])
     that is, (A ; C) | (B ; D), which gives more concurrency

Translation 1 is valid for any law-abiding Monad. Translation 2 reorders the effects of B and C, so it is only valid for commutative Monads.
For a commutative Monad, reordering the statements gives the same result:

  do x1 <- a
     x3 <- c x1
     x2 <- b
     x4 <- d x2
     return (x3,x4)
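Whether that reordering is observable can be checked directly. The sketch below (all names are mine) logs effect order with an IORef: the two schedules return the same values but perform b and c in different orders, which is only harmless in a commutative monad.

```haskell
import Data.IORef

-- Log each action's name in order, then return its value.
step :: IORef [String] -> String -> a -> IO a
step ref name v = modifyIORef ref (++ [name]) >> return v

-- Run an action with a fresh log; return (result, effect order).
runWithLog :: (IORef [String] -> IO a) -> IO (a, [String])
runWithLog f = do
  ref <- newIORef []
  v   <- f ref
  l   <- readIORef ref
  return (v, l)

-- Translation 1, (A | B) ; (C | D), run sequentially: a, b, c, d
original :: IORef [String] -> IO (Int, Int)
original ref = do
  x1 <- step ref "a" 1
  x2 <- step ref "b" 2
  x3 <- step ref "c" (x1 * 10)
  x4 <- step ref "d" (x2 * 10)
  return (x3, x4)

-- Translation 2, (A ; C) | (B ; D), run sequentially: a, c, b, d
reordered :: IORef [String] -> IO (Int, Int)
reordered ref = do
  x1 <- step ref "a" 1
  x3 <- step ref "c" (x1 * 10)
  x2 <- step ref "b" 2
  x4 <- step ref "d" (x2 * 10)
  return (x3, x4)

main :: IO ()
main = do
  (v1, l1) <- runWithLog original
  (v2, l2) <- runWithLog reordered
  print (v1 == v2)  -- True: same values either way
  print l1          -- ["a","b","c","d"]
  print l2          -- ["a","c","b","d"]
```

In IO the difference is visible in the log; in a Haxl computation whose fetches are side-effect-free reads, the reordering cannot be observed, which is why the more concurrent translation is safe there.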
For Haxl, more concurrency is a better default. Where explicit sequencing of effects is needed, it is possible to turn off ApplicativeDo or use >>= directly:
  myFunction = writeSomeData >>= \_ -> readSomeData
  …
Computations can be memoised within a given request:

  memo :: Key -> Haxl a -> Haxl a
imperfection causes GC to slow down
We run with +RTS -N48 -qn16 and easily max out the CPU, provided we have enough worker threads to counteract the slowdown when N > #cores.
state which would be harder to manage
some form is the way forwards
GHC uses a copying collector, with a nursery per capability.

[Diagram: per-capability nurseries, each partly free and partly used]

Problem: capabilities allocate at different rates, so we GC before we have filled all the memory.

Solution: allocate the nursery in chunks, handed out to capabilities on demand.

[Diagram: full chunks and empty chunks]
[Diagram: main memory, processor cores, bus]

[Diagram: NUMA: Processor #1 and Processor #2, each with its own cores and local main memory, connected by a bus]

Remote memory access is slower than local access (up to ~2x). By default memory is placed randomly, so we'll get ~50% remote access.
[Diagram: nursery chunks, full and empty, split between Node 0 and Node 1, with capabilities pinned to nodes]

When a node runs out of local chunks, should it allocate from the other node, or run the GC?
Problem: long-lived data is copied again by every (major) collection.
  compact    :: a -> IO (Compact a)
  getCompact :: Compact a -> a

compact takes an arbitrary value and copies it into a contiguous region of memory; getCompact returns a reference to the compacted value. The compacted data lives in the heap with zero GC overhead, because the GC never looks inside a compact region.
Suppose we want to wait for a response from a C++ API like this:

  void sendRequest(
      Request &req,
      std::function<void(Response&)> callback);
  type HaskellCallback = Ptr Response -> IO ()

  foreign import ccall "wrapper"
    mkCallback :: HaskellCallback -> IO (FunPtr HaskellCallback)
  sendRequest :: Request -> IO (MVar Response)
  sendRequest req = do
    mvar <- newEmptyMVar
    callback <- mkCallback $ \responsePtr -> do
      r <- unmarshal responsePtr
      putMVar mvar r
    …
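The shape of this pattern can be exercised in pure Haskell by simulating the C++ side with forkIO. Everything here is illustrative: Request and Response are plain Ints, and the forked thread stands in for the foreign thread pool.

```haskell
import Control.Concurrent

-- Simulated foreign request: the forked thread plays the C++ thread
-- pool, and putMVar plays the callback firing with the response.
sendRequest :: Int -> IO (MVar Int)
sendRequest req = do
  mvar <- newEmptyMVar
  _ <- forkIO $ do
    let response = req * 2        -- pretend remote work
    putMVar mvar response         -- the "callback" delivers the response
  return mvar

main :: IO ()
main = do
  m <- sendRequest 21
  r <- takeMVar m                 -- caller blocks until the response arrives
  print r                         -- 42
```

The caller gets back an MVar immediately and blocks only when it actually needs the response, which is exactly how the real wrapper decouples the Haskell thread from the foreign callback.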
  void hs_try_putmvar(int capability, HsStablePtr sp);

sp is a StablePtr (MVar ()); capability is a hint telling the RTS which capability should wake up the waiting thread. The call behaves like tryPutMVar m () on that MVar, and afterwards the StablePtr is freed and the MVar can be GC'd; no need to free anything from the C side.
  receive :: MVar () -> Ptr Response -> IO Response
  receive m p = do
    takeMVar m
    peek p
The usual alternative, calling back into Haskell through a foreign export, is far more expensive: it sends a message to the RTS and runs the callback in a Haskell thread. hs_try_putmvar() avoids all that.
improvements from this
the memory
and O(time since last GC)
[Diagram: clients feed a queue into the service; we care about both latency and throughput]
Be a throughput-optimised service when possible. Measure latency, and understand it. Compact regions take long-lived data out of the GC.