Foundations of Distributed Computing in the 2020s, Jukka Suomela

SLIDE 1

Foundations of Distributed Computing in the 2020s

Jukka Suomela Aalto University, Finland

SLIDE 2

What are the theoretical foundations of the modern society?

  • Modern world ≈ large-scale communication networks
  • Physical side:
  • practice: computers, network equipment, laser, fiber optics, radio …
  • solid theoretical foundations: electromagnetism, quantum mechanics …

  • Logical side:
  • practice: communication protocols, networked applications …
  • solid theoretical foundations: ???

SLIDE 3

Logical foundations of large communication networks

  • Computers:
  • theory of computation, computability, computational complexity …
  • Communication between computers:
  • information theory, communication complexity theory …
  • Computation in a network as a whole:
  • theory of distributed computing (our focus today)

SLIDE 4

Logical foundations of computers vs. computer networks

  • Theory of computation:

Which tasks can be solved efficiently with a computer?

  • Theory of distributed computing:

Which tasks can be solved efficiently in a large computer network?

SLIDE 5

Logical foundations of computers vs. computer networks

  • Example: solving graph problems
  • Theory of computation:
  • “Here is a graph that is given as a string on a Turing machine tape”
  • How many steps does a Turing machine need to solve this problem?
  • Theory of distributed computing:
  • “I am a node in the middle of a very large graph”
  • How far do I need to see to pick my own part of the solution?
  • How much of the graph do I need to see?
  • How many communication rounds are needed to solve the problem?

SLIDE 6

  • Local (O(1) distance): am I part of a triangle?
  • Global (Θ(n) distance): how far am I from the nearest triangle?
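
The difference between the two questions can be sketched in Python. This is only an illustration, assuming the graph is given as an adjacency dictionary; the names `in_triangle` and `distance_to_triangle` are made up for this sketch. The first function only inspects a node's neighbours and their adjacencies, while the second may have to search across the whole graph.

```python
from collections import deque

def in_triangle(adj, v):
    """Local question: is v part of a triangle? Only v's neighbours and their edges are inspected."""
    nbrs = list(adj[v])
    return any(b in adj[a] for i, a in enumerate(nbrs) for b in nbrs[i + 1:])

def distance_to_triangle(adj, v):
    """Global question: BFS distance from v to the nearest node in a triangle (None if there is none)."""
    seen = {v}
    queue = deque([(v, 0)])
    while queue:
        u, d = queue.popleft()
        if in_triangle(adj, u):
            return d
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append((w, d + 1))
    return None

# Illustrative input: a triangle {0, 1, 2} with a path 2 - 3 - 4 - 5 attached.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
```

On a longer path the second search grows with the path length, which is the Θ(n) behaviour of the global question.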

SLIDE 7

Logical foundations of computers vs. computer networks

  • Theory of computation:
  • e.g. hugely influential framework of NP-completeness (1970s)
  • Theory of distributed computing:
  • studied actively already since the 1980s
  • but we have only very recently started to really understand e.g. locality
  • solid theoretical foundations still largely missing
  • lots of progress in the 2010s, tons of work left for the 2020s

SLIDE 8

Distributed computing before the 2010s

SLIDE 9

Standard models of computing

  • LOCAL model
  • input graph = computer network
  • initially: each node has a unique ID + its own part of input
  • communication round: each node sends a message to each neighbor
  • finally: each node stops and outputs its own part of the solution
  • CONGEST model
  • bounded-size messages
  • Port-numbering model
  • no unique IDs


Number of rounds = time = distance
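
A minimal simulation makes "rounds = distance" concrete. It assumes a synchronous network given as an adjacency dictionary and unbounded message sizes as in the LOCAL model; the function name `run_local_rounds` and the example IDs are illustrative only. After T rounds of flooding, a node knows exactly the IDs within distance T.

```python
def run_local_rounds(adj, ids, rounds):
    """Simulate synchronous flooding: in each round every node sends all it knows to each neighbour."""
    # Initially each node knows only its own unique ID.
    known = {v: {ids[v]} for v in adj}
    for _ in range(rounds):
        # Messages are computed from the state at the start of the round.
        outgoing = {v: frozenset(known[v]) for v in adj}
        for v in adj:
            for u in adj[v]:
                known[v] |= outgoing[u]
    return known

# Illustrative input: path 0 - 1 - 2 - 3 - 4 with arbitrary unique IDs.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
ids = {0: 10, 1: 2, 2: 6, 3: 14, 4: 11}
```

With this input, the endpoint 0 needs four rounds before it has seen the ID of node 4, which matches their distance on the path.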

SLIDE 10

Some important ideas and concepts

  • Solving vs. checking
  • finding a solution vs. verifying a solution
  • cf. deterministic vs. nondeterministic Turing machines, P vs. NP
  • Problem family of “locally checkable labelings” (LCLs)
  • O(1) input labels, O(1) output labels, max degree O(1)
  • verification: check each radius-O(1) neighborhood
  • Naor & Stockmeyer (1993, 1995)
  • Proof labeling schemes
  • Korman, Kutten, Peleg (2005)


Example: vertex coloring with 3 colors
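
Local checkability of this example can be sketched as follows, assuming an adjacency dictionary and a colour per node; the helper names `node_accepts` and `is_valid_labeling` are hypothetical. Each node looks only at its radius-1 neighbourhood, and the labeling is valid exactly when every node accepts.

```python
def node_accepts(adj, color, v, palette=(1, 2, 3)):
    """Node v's local check: its colour is in the palette and differs from every neighbour's colour."""
    return color[v] in palette and all(color[u] != color[v] for u in adj[v])

def is_valid_labeling(adj, color):
    """The labeling is valid exactly when every node accepts its own neighbourhood."""
    return all(node_accepts(adj, color, v) for v in adj)

# Illustrative input: a 6-cycle 0 - 1 - ... - 5 - 0.
adj = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
good = {0: 1, 1: 2, 2: 3, 3: 1, 4: 2, 5: 3}
bad = dict(good)
bad[1] = 1  # nodes 0 and 1 now share a colour
```

In the `bad` labeling only the nodes next to the conflict reject; nodes further away still accept, which is what makes the check local.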

SLIDE 11

Some well-understood questions

  • What can be computed with deterministic algorithms in anonymous networks?

  • e.g. Angluin (1980), Yamashita & Kameda (1996)
  • key technique: covering maps
  • Which LCL problems can be solved in constant time?
  • e.g. Naor & Stockmeyer (1993, 1995)
  • key technique: Ramsey theory

SLIDE 12

Four key problems

  • Key primitives for symmetry breaking
  • e.g. input is a symmetric cycle → output has to break symmetry
  • Trivial linear-time centralized algorithms
  • e.g. maximal matching: pick non-adjacent edges until stuck
  • Can we solve these efficiently in a distributed setting?
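
The centralized baseline for maximal matching mentioned above is short enough to sketch, assuming the graph is given as an edge list; the function name is illustrative. It is inherently sequential, which is why the distributed question is interesting.

```python
def greedy_maximal_matching(edges):
    """Scan the edges once; keep an edge when neither endpoint is matched yet."""
    matched = set()
    matching = []
    for u, v in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

# Illustrative input: path 0 - 1 - 2 - 3 - 4.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
```

The result is maximal (no further edge can be added), though not necessarily a maximum matching for every edge order.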

maximal independent set, maximal matching, (Δ+1)-vertex coloring, (2Δ−1)-edge coloring

SLIDE 13

Four key problems

  • Pioneering work on upper bounds:
  • Cole & Vishkin (1986), Luby (1985, 1986), Alon, Babai, Itai (1986), Israeli & Itai (1986), Panconesi & Srinivasan (1996), Hanckowiak, Karonski, Panconesi (1998, 2001), Panconesi & Rizzi (2001) …

  • Pioneering work on lower bounds:
  • Linial (1987, 1992), Naor (1991), Kuhn, Moscibroda, Wattenhofer (2004)

SLIDE 14

Four key problems

  • Still wide gaps between upper and lower bounds
  • Role of randomness poorly understood

SLIDE 15

It seems that before the 2010s…

  • Lots of work focused on specific problems
  • proving upper & lower bounds for problem X
  • connecting complexity of problem X through reductions to problem Y
  • Not so much effort in understanding the overall landscape of distributed computational complexity
  • what are the meaningful classes of problems?
  • what can we prove about entire classes of problems?
  • We were lacking general-purpose techniques for studying distributed computing

SLIDE 16

With hindsight…

  • Naor & Stockmeyer (1993, 1995) introduced a very useful problem class (LCLs) and initiated the study of decidability of distributed complexity
  • but there was little follow-up work on these ideas until around 2016
  • Linial (1987, 1992) already had the key idea behind “round elimination”
  • but it was not really recognized as a general-purpose proof technique until around 2018

SLIDE 17

Some highlights of distributed computing in the 2010s

SLIDE 18

From the 2010s: Classification of LCLs

SLIDE 19

LCL problems

  • Examples of LCL problems (in graphs of max degree Δ = O(1)):
  • (Δ+1)-coloring, Δ-coloring, 3-coloring …
  • maximal independent set, maximal matching …
  • sinkless orientation
  • orient all edges
  • all nodes of degree ≥ 3 have outdegree ≥ 1
  • locally optimal cut
  • label nodes black/white
  • at least half of the neighbors have opposite color
  • SAT (when interpreted as a graph problem)
  • many other constraint satisfaction problems
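
As one concrete instance, the two sinkless-orientation constraints listed above can be checked node by node. The sketch below assumes an adjacency dictionary and represents an orientation as a set of directed pairs `(u, v)` meaning u → v; this representation and the name `is_sinkless` are choices made for the illustration.

```python
def is_sinkless(adj, oriented):
    """Check a candidate orientation, given as a set of directed edges (u, v) meaning u -> v."""
    # Constraint 1: every edge is oriented in exactly one direction.
    for u in adj:
        for v in adj[u]:
            if ((u, v) in oriented) == ((v, u) in oriented):
                return False
    # Constraint 2: every node of degree >= 3 has at least one outgoing edge.
    return all(
        len(adj[u]) < 3 or any((u, v) in oriented for v in adj[u])
        for u in adj
    )

# Illustrative input: the complete graph on 4 nodes (every node has degree 3).
adj = {u: [v for v in range(4) if v != u] for u in range(4)}
good = {(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (1, 3)}
bad = {(0, 1), (1, 2), (0, 2), (0, 3), (1, 3), (2, 3)}  # node 3 is a sink
```

Both checks only look at a node and its incident edges, which is the sense in which the problem is locally checkable.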

Can we say something about all of these?

SLIDE 20

Landscape of LCL problems

[Figure: randomized time complexity plotted against deterministic time complexity; both axes are marked n, log n, log log n, log* n, log log* n, 1]

SLIDE 21

Landscape of LCL problems

[Figure: deterministic vs. randomized time complexity, both axes marked n, log n, log log n, log* n, log log* n, 1; highlighted point: Θ(log n) deterministic, Θ(log log n) randomized]

SLIDE 22

Landscape of LCL problems

[Figure: deterministic vs. randomized time complexity, both axes marked n, log n, log log n, log* n, log log* n, 1; two points labelled “Trivial”]

SLIDE 23

Landscape of LCL problems

[Figure: deterministic vs. randomized time complexity, both axes marked n, log n, log log n, log* n, log log* n, 1; highlighted: maximal independent set (Cole & Vishkin 1986; Linial 1987, 1992; Naor 1991)]

SLIDE 24

State of the art 1992

[Figure: deterministic vs. randomized time complexity, both axes marked n, log n, log log n, log* n, log log* n, 1]

SLIDE 25

State of the art 2015

[Figure: deterministic vs. randomized time complexity, both axes marked n, log n, log log n, log* n, log log* n, 1; annotated with “???”]

SLIDE 26

State of the art 2019

[Figure: deterministic vs. randomized time complexity, both axes marked n, log n, log log n, log* n, log log* n, 1; annotated with the sources of the results: Brandt et al. 2016; Chang et al. 2016; Ghaffari & Su 2017; Chang & Pettie 2017; Naor & Stockmeyer 1995; Cole & Vishkin 1986; Linial 1992; Naor 1991; Balliu et al. 2018a; Fischer & Ghaffari 2017; Balliu et al. 2018b; Ghaffari et al. 2018; Balliu et al. 2019; Rozhon & Ghaffari 2019]

SLIDE 27

State of the art 2019

[Figure: deterministic vs. randomized time complexity, both axes marked n, log n, log log n, log* n, log log* n, 1]

SLIDE 28

Four classes of graph problems

[Figure: deterministic vs. randomized time complexity, both axes marked n, log n, log log n, log* n, log log* n, 1]

SLIDE 29

Gaps

[Figure: deterministic vs. randomized time complexity, both axes marked n, log n, log log n, log* n, log log* n, 1]

SLIDE 30

Gaps have direct algorithmic implications

If you can solve an LCL problem

  • in o(log n) rounds with a deterministic algorithm or
  • in o(log log n) rounds with a randomized algorithm

then you can also solve it

  • in O(log* n) rounds with a deterministic algorithm

SLIDE 31

Gaps have direct complexity-theoretic implications

If you can show that there is no O(log* n)-time deterministic algorithm then:

  • deterministic complexity is at least Ω(log n)
  • randomized complexity is at least Ω(log log n)

SLIDE 32

From the 2010s: Complexity of maximal independent set & maximal matching

SLIDE 33

2 of 4 key problems well understood

  • Maximal independent set & matching:
  • deterministic O(Δ + log* n)
  • deterministic poly(log n)
  • randomized O(log Δ) + poly(log log n)
  • cannot improve any of these much
  • Upper bound: Rozhon & Ghaffari (2019) + many others
  • a new algorithm for deterministic network decomposition
  • Lower bound: Balliu et al. (2019)
  • based on the “round elimination” technique

SLIDE 34

From the 2010s: Round elimination technique

SLIDE 35

Round elimination technique

  • Given:
  • algorithm A0 solves problem P0 in T rounds
  • We construct:
  • algorithm A1 solves problem P1 in T − 1 rounds
  • algorithm A2 solves problem P2 in T − 2 rounds
  • algorithm A3 solves problem P3 in T − 3 rounds
  • …
  • algorithm AT solves problem PT in 0 rounds
  • But PT is nontrivial, so A0 cannot exist

SLIDE 36

Linial (1987, 1992): coloring cycles

  • Given:
  • algorithm A0 solves 3-coloring in T = o(log* n) rounds
  • We construct:
  • algorithm A1 solves 2^3-coloring in T − 1 rounds
  • algorithm A2 solves 2^(2^3)-coloring in T − 2 rounds
  • algorithm A3 solves 2^(2^(2^3))-coloring in T − 3 rounds
  • …
  • algorithm AT solves o(n)-coloring in 0 rounds
  • But o(n)-coloring is nontrivial, so A0 cannot exist
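
The link to log* n can be made tangible with a small calculation. Each eliminated round replaces a c-coloring by a 2^c-coloring, so the number of colours stays below n only for roughly log* n steps. The sketch below just counts those steps; it is an illustration of the growth rate, not a reproduction of the constants in Linial's argument, and the function name is made up.

```python
def steps_until_trivial(n, colors=3):
    """Count how often c -> 2**c must be applied, starting from `colors`, until the value is at least n."""
    steps = 0
    while colors < n:
        steps += 1
        if colors >= n.bit_length():
            # 2**colors already exceeds n, so stop without building the huge number.
            break
        colors = 2 ** colors
    return steps
```

Even for n = 10**80 the count is only 4 (3 → 8 → 256 → 2^256 → beyond n), which is why an algorithm with T = o(log* n) rounds would end up solving a coloring with far fewer than n colours in 0 rounds.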

SLIDE 37

Brandt et al. (2016): sinkless orientation

  • Given:
  • algorithm A0 solves sinkless orientation in T = o(log n) rounds
  • We construct:
  • algorithm A1 solves sinkless coloring in T − 1 rounds
  • algorithm A2 solves sinkless orientation in T − 2 rounds
  • algorithm A3 solves sinkless coloring in T − 3 rounds
  • …
  • algorithm AT solves sinkless orientation in 0 rounds
  • But sinkless orientation is nontrivial, so A0 cannot exist

SLIDE 38

Round elimination can be automated

  • Always possible for any graph problem P0 that is locally checkable
  • If problem P0 has complexity T, we can always find in a mechanical manner problem P1 that has complexity T − 1

  • Holds for tree-like neighborhoods (e.g. high-girth graphs)
  • Can be used to derive lower bounds and to design algorithms


Brandt 2019

SLIDE 39

From the 2010s: Using computers to study distributed computing

SLIDE 40

Using computers to study distributed computing

  • Many questions related to distributed computational complexity have turned out to be decidable or semi-decidable
  • at least in principle, and often also in practice
  • we can start to automate our own work and outsource algorithm design & lower bound construction to computers
  • Automatic round elimination implemented, available online: github.com/olidennis/round-eliminator (Olivetti 2019)

  • in 2016 a lower bound for “sinkless orientation” was a STOC paper
  • in 2019 you can reproduce it in your web browser

SLIDE 41

Distributed computing in the 2020s

SLIDE 42

Distributed complexity theory beyond LCLs

  • We can nowadays say a lot about LCL problems:
  • near-complete classification of distributed complexity
  • systematic studies, powerful proof techniques, automatic tools
  • How could we extend all this to non-LCLs?
  • Small first steps for the coming years:
  • locally checkable problems with unbounded degrees?
  • locally checkable problems with countably many labels?
  • locally checkable problems with real numbers and linear constraints?
  • optimization problems with locally checkable constraints?

SLIDE 43

Four key problems

  • Independent sets & matchings: now well understood
  • Coloring: distributed complexity still wide open
  • “Small” first step for the coming years:
  • show that (Δ+1)-vertex coloring cannot be solved in o(log Δ) + O(log* n) rounds

SLIDE 44

Two perspectives of distributed computing

Network algorithms

  • Solving problems related to the network structure

  • Example: network protocols
  • Key limitation: long distances
  • No centralized control
  • Local perspective

Big data

  • Solving large computational tasks with many computers

  • Example: MapReduce
  • Key limitation: bandwidth
  • Fully centralized control
  • Global perspective

SLIDE 45

Two perspectives of distributed computing

Network algorithms

  • LOCAL
  • CONGEST

Big data

  • PRAM
  • MPC = Massively Parallel Computation
  • BSP = Bulk-Synchronous Parallel
  • Congested clique

Unifying models?

SLIDE 46

Two perspectives of distributed computing

Network algorithms

  • tight unconditional lower bounds for many problems

Big data

  • typically at best conditional lower bounds

Technology transfer?

SLIDE 47

Two perspectives of distributed computing

[Diagram: LOCAL on one side and MPC, BSP, PRAM on the other; two links, each labelled “cannot simulate efficiently”]

SLIDE 48

Two perspectives of distributed computing

[Diagram: LOCAL, VOLUME, and MPC, BSP, PRAM, with VOLUME in the middle; two links, each labelled “efficient simulation”]

SLIDE 49

Volume model

  • Time T in LOCAL model:
  • each node can explore a subgraph of radius T around it and then choose its output
  • Time T in VOLUME model:
  • each node can adaptively explore a subgraph of size T around it and then choose its output
  • Closely related model: LCA (local computation algorithms), a.k.a. centralized LOCAL algorithms or CentLOCAL
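
The two notions of "time T" can be contrasted in a short sketch. It assumes an adjacency dictionary, counts the starting node towards the size budget, and models adaptivity with a caller-supplied `choose` function; all three are assumptions of this illustration rather than part of the formal model definitions.

```python
from collections import deque

def local_view(adj, v, radius):
    """LOCAL, time T: every node within distance `radius` of v."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)

def volume_view(adj, v, budget, choose):
    """VOLUME, time T: adaptively grow a connected set of at most `budget` nodes around v."""
    seen = {v}
    while len(seen) < budget:
        frontier = sorted({w for u in seen for w in adj[u]} - seen)
        if not frontier:
            break
        # The next node to explore may depend on everything seen so far.
        seen.add(choose(seen, frontier))
    return seen

# Illustrative input: node 0 with four neighbours, and a path 4 - 5 - 6 hanging off one of them.
adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0, 5], 5: [4, 6], 6: [5]}
```

Here radius 1 around node 0 already contains five nodes, whereas a budget of three nodes with a strategy such as `lambda seen, frontier: max(frontier)` reaches node 5 at distance 2 while skipping most of node 0's neighbours.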

SLIDE 50

Volume model

  • Bridge between two flavors of distributed computing
  • Close enough to LOCAL so that it is possible to prove unconditional lower bounds
  • Yet poorly understood: typically exponential gaps between upper and lower bounds

  • Not-so-small first steps:
  • charting the landscape of LCL problems in the volume model
  • tight bounds for e.g. sinkless orientation, maximal matching …
  • volume analogue of round elimination

SLIDE 51

Summary

  • 2010s:
  • systematic study of LCL problems in the LOCAL model
  • new techniques and automatic tools
  • 2020s:
  • extending theory beyond LCLs
  • technology transfer LOCAL → VOLUME → MPC, PRAM, …
  • Small puzzles to solve:
  • show that O(Δ) volume is not enough for bipartite maximal matching
  • construct an LCL problem with deterministic volume ω(log* n) … o(n)
