Exploration of declarative languages applicability to development of - - PowerPoint PPT Presentation


Exploration of declarative languages applicability to development of large-scale data processing systems. December 2016. Slavik Derevyanko, Anil Pacaci. Declarative languages for distributed systems: a research group at UC Berkeley led by Prof.


slide-1
SLIDE 1

Exploration of declarative languages applicability to development of large-scale data processing systems

Slavik Derevyanko, Anil Pacaci

December 2016

slide-2
SLIDE 2
  • A research group at UC Berkeley led by Prof. Hellerstein:

○ claims that the problems with distributed software come from using imperative, sequential programming languages to describe systems that are inherently non-sequential
○ reports that the resulting declarative systems tend to be much smaller: about 20 KLOC of imperative code versus 1 KLOC of declarative code for HDFS

  • Related PhD theses we’ve studied in this class:

○ Peter Alvaro: Data-centric Programming for Distributed Systems, 2015
○ Peter Bailis: Coordination Avoidance in Distributed Databases, 2015 (I-Confluence)

Declarative languages for distributed systems

Overview

2 / 17

slide-3
SLIDE 3

Project goals

  • Decided to verify claims on the applicability of declarative logic programming to the development of distributed software systems
  • Decided to build one of the distributed data processing models presented in class
  • Decided to implement Google’s Pregel, a simple synchronous model for parallel computation based on Valiant’s Bulk Synchronous Parallel (BSP) model
  • To test the correctness of our Pregel model, implemented PageRank on top of it
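
The synchronous superstep structure that Pregel takes from BSP can be sketched in a few lines of plain Python. This is a single-process illustration, not the Bud implementation from the project: the function name `run_supersteps`, the dictionary-based graph layout, and the maximum-value propagation task are all assumptions chosen for brevity, and every vertex is assumed to appear as a key in `graph`.

```python
from collections import defaultdict

def run_supersteps(graph, values, max_supersteps=30):
    """Propagate the maximum value through the graph, Pregel style.

    graph:  dict mapping each vertex to a list of out-neighbours
    values: dict mapping each vertex to its initial value
    """
    values = dict(values)
    inbox = defaultdict(list)
    active = set(graph)                  # every vertex starts active
    superstep = 0
    while active and superstep < max_supersteps:
        outbox = defaultdict(list)
        for v in active:
            new_value = max([values[v]] + inbox[v])
            changed = new_value != values[v]
            values[v] = new_value
            # Send in superstep 0 or when the value changed; otherwise
            # the vertex votes to halt by staying silent.
            if superstep == 0 or changed:
                for n in graph[v]:
                    outbox[n].append(new_value)
        # Barrier: messages sent in this superstep become visible only
        # in the next one, and a message reactivates its recipient.
        inbox = outbox
        active = set(outbox)
        superstep += 1
    return values, superstep

values, steps = run_supersteps(
    {"a": ["b"], "b": ["c"], "c": ["a"]},
    {"a": 3, "b": 6, "c": 2},
)
```

The computation stops on its own once no vertex sends a message, which is the halting condition the BSP barrier makes easy to detect.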

Overview

3 / 17

slide-4
SLIDE 4

Bloom Bud declarative framework

  • All data is represented as collections of facts (or tables containing records)
  • New facts can be derived by declaring transformational rules
  • No shared state: nodes exchange data as network messages (Overlog)
  • Introduces a notion of time: data collections evolve over time (Dedalus)
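
The idea of deriving new facts from rules can be illustrated without Bud itself. The sketch below is plain Python, not Bloom syntax: it treats a collection as a set of tuples and applies two rules repeatedly until no new fact appears. The name `derive_paths` and the `link`/`path` collections are hypothetical, chosen only for this example.

```python
def derive_paths(links):
    """Derive all reachable (src, dst) pairs from link facts.

    Rule 1: path(a, b) <- link(a, b)
    Rule 2: path(a, c) <- link(a, b), path(b, c)
    """
    paths = set(links)                   # rule 1
    while True:
        # Rule 2, evaluated as a join between the two collections.
        new = {(a, c)
               for (a, b) in links
               for (b2, c) in paths
               if b == b2}
        if new <= paths:                 # fixpoint: nothing new derived
            return paths
        paths |= new

links = {("a", "b"), ("b", "c"), ("c", "d")}
paths = derive_paths(links)
```

The program states what a path is, not the order in which to compute it; the evaluation loop is generic and could be reused for any set of rules of this shape.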

Overview

4 / 17

slide-5
SLIDE 5

Building Pregel using Bud Bloom declarative framework

slide-6
SLIDE 6

Pregel distributed graph processing model

6 / 17

Pregel implementation

slide-7
SLIDE 7

Master node superstep coordination
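
The slide's diagram is not preserved in this transcript. As a rough illustration of what superstep coordination typically involves, here is a minimal Python sketch of a master that acts as a barrier. The `Master` class, its `on_worker_done` method, and the `("start", n)` / `("halt", n)` commands are assumptions made for this example and are not taken from the project's Bud code.

```python
class Master:
    """Tracks worker completion reports and decides when to advance."""

    def __init__(self, workers):
        self.workers = set(workers)
        self.superstep = 0
        self.reports = {}                # worker -> messages sent this superstep
        self.halted = False

    def on_worker_done(self, worker, superstep, messages_sent):
        """Record a report; return the command to broadcast once every
        worker has reported for the current superstep, else None."""
        if (self.halted or superstep != self.superstep
                or worker not in self.workers):
            return None                  # stale or unknown report
        self.reports[worker] = messages_sent
        if set(self.reports) != self.workers:
            return None                  # barrier not reached yet
        total = sum(self.reports.values())
        self.reports = {}
        if total == 0:                   # nobody sent anything: computation is over
            self.halted = True
            return ("halt", self.superstep)
        self.superstep += 1
        return ("start", self.superstep)

master = Master(["w1", "w2"])
first = master.on_worker_done("w1", 0, 3)
second = master.on_worker_done("w2", 0, 0)
```

Tagging each report with its superstep number lets the master ignore late or duplicated messages, which matters when the transport does not guarantee delivery order.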

Pregel implementation

7 / 17

slide-8
SLIDE 8

Worker node superstep processing

Pregel implementation

8 / 17

slide-9
SLIDE 9

PageRank implementation
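
The slide's code is not preserved in this transcript. For orientation, a vertex-centric PageRank in the Pregel style can be sketched as follows. This is plain Python rather than Bud; the function name `pagerank_pregel`, the damping factor 0.85, and the fixed count of 30 supersteps are assumptions, and the sketch assumes every vertex has at least one outgoing edge.

```python
def pagerank_pregel(graph, supersteps=30, damping=0.85):
    """graph: dict mapping each vertex to a non-empty list of out-neighbours."""
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    inbox = {v: [] for v in graph}
    for step in range(supersteps):
        outbox = {v: [] for v in graph}
        for v, neighbours in graph.items():
            if step > 0:
                # Combine the contributions received in the previous superstep.
                rank[v] = (1 - damping) / n + damping * sum(inbox[v])
            # Split the current rank evenly across outgoing edges.
            for u in neighbours:
                outbox[u].append(rank[v] / len(neighbours))
        inbox = outbox                   # barrier between supersteps
    return rank

ranks = pagerank_pregel({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```

Each vertex only reads its own inbox and writes to its neighbours' inboxes, so the per-vertex work could be distributed across workers without shared state.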

Pregel implementation

9 / 17

slide-10
SLIDE 10

Comparing declarative and imperative programming

slide-11
SLIDE 11

Advantages - less code

Bud Experience

11 / 17

slide-12
SLIDE 12

Troubles, limitations

Bud Experience

12 / 17

slide-13
SLIDE 13

Demo

slide-14
SLIDE 14

PageRank by matrix multiplication
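
The matrix formulation iterates r ← (1 − d)/n + d·M·r, where M is the column-stochastic link matrix and d the damping factor. A minimal pure-Python sketch follows; the function name `pagerank_matrix` and the parameter defaults are assumptions, pages are numbered 0..n-1, and each page is assumed to have at least one outgoing link with no duplicate targets.

```python
def pagerank_matrix(links, iterations=50, damping=0.85):
    """links[j] lists the pages that page j links to."""
    n = len(links)
    # Column-stochastic matrix: M[i][j] = 1 / outdegree(j) if j links to i.
    M = [[0.0] * n for _ in range(n)]
    for j, targets in enumerate(links):
        for i in targets:
            M[i][j] = 1.0 / len(targets)
    r = [1.0 / n] * n
    for _ in range(iterations):
        # One power-iteration step: a matrix-vector product plus teleport term.
        r = [(1 - damping) / n
             + damping * sum(M[i][j] * r[j] for j in range(n))
             for i in range(n)]
    return r

r = pagerank_matrix([[1, 2], [2], [0]])
```

One matrix-vector product here corresponds to one superstep of the vertex-centric version: row i of the product sums the contributions that vertex i would receive as messages.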

14 / 17

slide-15
SLIDE 15
slide-16
SLIDE 16

TCP network communication (instead of UDP)

16 / 17