Scalable and Reliable Data Broadcast with Kascade



SLIDE 1

Context Related work Kascade Experimental validation Conclusion and future work

Scalable and Reliable Data Broadcast with Kascade

Stéphane Martin, Tomasz Buchert, Pierric Willemet, Olivier Richard (2), Emmanuel Jeanvoine, Lucas Nussbaum

Algorille Team (Inria-Loria, F-54500, France)

(2) MESCAL Team (LIG, F-38000, France)

  • S. Martin, T. Buchert, P. Willemet, O. Richard, E. Jeanvoine, L. Nussbaum

Scalable and Reliable Data Broadcast with Kascade 1 / 20

SLIDE 2

Context

Big Data

Broadcast:
  • Large amount of data
  • From one storage
  • To a large number of nodes
  • Fault tolerant

Useful to:
  • Distribute big data before analysis
  • Deploy system images on HPC clusters

SLIDE 3

Challenges

Efficient use of fat tree network
  • Present in most clusters
  • Cost-efficient

Fault tolerance
  • One computer can have a problem → many computers will have problems

Stream capability

[Figure: fat tree network with a core switch, top-of-the-rack switches, and nodes]

SLIDE 4

Related work

  • Network Layer multicast
  • Binomial tree
  • BitTorrent
  • Pipelined Broadcast

SLIDE 5

Network Layer multicast

IP multicast
  • Support is usually disabled in network equipment
  • Need to find a group address, or a method to share it
  • High-throughput protocol UDP is unfair to other protocols
  • Message delivery is not guaranteed! → UDPCast provides:
      • Acknowledgment → generates too many packets to the master, so it is not scalable
      • Forward Error Correction (FEC) → configuration depends on the amount of packet loss

InfiniBand multicast
  • Not installed on all nodes (expensive)
  • Same problems

Conclusions
  • Best in theory
  • Not possible or efficient in practice

SLIDE 6

Protocols

Binomial tree
  • Transfers random parts of the file (the entire file is in memory or on the hard drive)

BitTorrent
  • Verbose protocol
  • Transfers random parts of the file

Conclusions
  • Not suitable

SLIDE 7

Pipelined Broadcast

  • Each network link is used once in each direction
  • Topology aware
  • Already existing projects:

Not fault tolerant:
  • Ka
  • Dolly
  • Part of the MPI Bcast primitive (Open MPI)

Fault tolerant:
  • Dolly+ → unmaintained, and fault tolerance (FT) is not mentioned in their publication

[Figure: pipeline from sending node 1 through receiving nodes 2 to 10]
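To see why a pipeline is attractive, it helps to count transfer steps under an idealized model. The sketch below assumes that every link moves exactly one chunk per step and that all links run in parallel. The function names and the example numbers are illustrative assumptions, not measurements from the talk.

```python
def pipeline_steps(n_chunks, n_receivers):
    """Steps until the last node of a chain holds the last chunk.

    The first chunk needs n_receivers steps to reach the end of the chain;
    each later chunk arrives one step after the previous one.
    """
    return n_chunks + n_receivers - 1


def sequential_steps(n_chunks, n_receivers):
    """Steps if the sender serves the receivers one after another."""
    return n_chunks * n_receivers


# Illustrative numbers: 1000 chunks sent to 9 receiving nodes.
chained = pipeline_steps(1000, 9)       # 1008
one_by_one = sequential_steps(1000, 9)  # 9000
```

In this model the cost of adding a node to the chain is a single extra step, which is why the completion time stays close to that of one point-to-point transfer.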

SLIDE 8

Our contributions: Kascade

How does it work?
  • Overview of pipeline establishment
  • Fault detection
  • Recovery
  • Collecting information
  • Protocol is needed

SLIDE 9

Overview of data transfer pipeline (Kascade)

1. Order the nodes
2. Deploy itself → done efficiently with the help of TakTuk (parallel launcher)
3. Establish the pipeline (open TCP/IP connections)
4. Transfer the data
5. Send report to the master

[Figure: pipeline from sending node 1 through receiving nodes 2 to 10]
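The data transfer step can be pictured as a chain in which every node keeps each chunk and forwards it to its successor. The following is a minimal in-process sketch, not Kascade's implementation: threads and queues stand in for nodes and TCP connections, and the names `relay` and `run_pipeline` and the chunk size are assumptions made for illustration.

```python
import queue
import threading

CHUNK_SIZE = 4  # bytes per chunk; deliberately tiny (assumed value)
END = None      # sentinel marking the end of the stream


def relay(inbox, outbox, store):
    """Keep every chunk locally and forward it to the next node, if any."""
    while True:
        chunk = inbox.get()
        if outbox is not None:
            outbox.put(chunk)  # forward first, so END also travels down the chain
        if chunk is END:
            return
        store.append(chunk)


def run_pipeline(data, n_receivers):
    """Send `data` from one sender through a chain of `n_receivers` nodes."""
    links = [queue.Queue() for _ in range(n_receivers)]
    stores = [[] for _ in range(n_receivers)]
    threads = []
    for i in range(n_receivers):
        outbox = links[i + 1] if i + 1 < n_receivers else None
        t = threading.Thread(target=relay, args=(links[i], outbox, stores[i]))
        t.start()
        threads.append(t)
    # The sender only talks to the first receiver.
    for offset in range(0, len(data), CHUNK_SIZE):
        links[0].put(data[offset:offset + CHUNK_SIZE])
    links[0].put(END)
    for t in threads:
        t.join()
    return [b"".join(s) for s in stores]


copies = run_pipeline(b"system image payload", n_receivers=9)
```

The sender never needs to know the total size in advance: it pushes chunks until it decides to send the end marker, which matches the stream capability listed among the challenges.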

SLIDE 12

Fault detection

Two cases of faults:
  • Stream closed (error packet received)
  • Black hole (nothing comes back)

The sender handles the error
  • When the next node stops reading the stream → ping the next node?
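The two fault cases can be told apart with a timeout on the connection to the next node. The sketch below is only an illustration of that idea: the one-byte `PING` message, the timeout value, and the function name `probe_next_node` are assumptions, and the slides do not show how Kascade actually implements its ping.

```python
import socket

PING = b"?"  # hypothetical one-byte probe; the real message format is not shown


def probe_next_node(sock, timeout=0.2):
    """Ping the next node and classify what comes back."""
    sock.settimeout(timeout)
    try:
        sock.sendall(PING)
        reply = sock.recv(1)
    except socket.timeout:
        return "black hole"     # nothing comes back
    except OSError:
        return "stream closed"  # e.g. broken pipe or connection reset
    # An empty read means the peer closed its side of the stream.
    return "alive" if reply else "stream closed"


results = {}

# Case 1: the peer answers (its reply is queued in advance to stay single-threaded).
a, b = socket.socketpair()
b.sendall(b"!")
results["answering"] = probe_next_node(a)
a.close()
b.close()

# Case 2: the peer is connected but never reads or writes.
a, b = socket.socketpair()
results["silent"] = probe_next_node(a)
a.close()
b.close()

# Case 3: the peer has closed the stream.
a, b = socket.socketpair()
b.close()
results["closed"] = probe_next_node(a)
a.close()
```

A closed stream is reported immediately by the operating system, while a black hole can only be recognized after waiting, so the timeout bounds how long a silent node can stall the pipeline.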

SLIDE 13

Recovery

  • Add the error to the report
  • Try to connect to the next node
  • Replay the lost messages
      • Use a buffer to resend lost messages
      • Ask the master for the missing part in the worst case

[Figure: pipeline with the failed node marked X]
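The replay step can be sketched with a bounded buffer of recently forwarded chunks. This is an assumed design for illustration only: the class name `ReplayBuffer`, the use of byte offsets as resume points, and the buffer size are not taken from the slides.

```python
from collections import deque


class ReplayBuffer:
    """Keep the most recent chunks so they can be resent after a failure."""

    def __init__(self, max_chunks):
        self.chunks = deque(maxlen=max_chunks)  # (start_offset, bytes) pairs
        self.next_offset = 0                    # offset of the next byte to send

    def append(self, chunk):
        self.chunks.append((self.next_offset, chunk))
        self.next_offset += len(chunk)

    def replay_from(self, offset):
        """Return the data from `offset` onward, or None if it was evicted."""
        if offset >= self.next_offset:
            return []  # nothing was lost
        if not self.chunks or offset < self.chunks[0][0]:
            return None  # worst case: the missing part must come from the master
        out = []
        for start, chunk in self.chunks:
            if start + len(chunk) > offset:
                out.append(chunk[max(0, offset - start):])
        return out


buf = ReplayBuffer(max_chunks=3)
for piece in (b"aaaa", b"bbbb", b"cccc", b"dddd"):
    buf.append(piece)

recent = buf.replay_from(10)   # still buffered
too_old = buf.replay_from(2)   # evicted, so the master has to help
up_to_date = buf.replay_from(16)
```

The buffer size is a trade-off: a larger buffer covers longer outages without involving the master, at the cost of memory on every node.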

SLIDE 14

Collecting status at end of transfer

  • Connecting to the master directly is neither scalable nor fault tolerant
  • Using the pipeline to transmit a report is scalable and fault tolerant
  • The last node forwards the report to the master
  • Receiving the report implies the end of the transfer

[Figure: pipeline from sending node 1 through receiving nodes 2 to 10, with the report returned to the sender]
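A small model makes the scalability argument concrete: if the report travels down the chain, the master receives a single message however many nodes take part. The function below is a hypothetical sketch; the report layout and the names are assumptions.

```python
def report_via_pipeline(nodes, failed):
    """Accumulate one report along the chain and count master connections."""
    report = {}
    last_alive = None
    for node in nodes:
        if node in failed:
            # A failed node cannot speak; the node that detected it records the error.
            report[node] = "failed"
        else:
            report[node] = "ok"
            last_alive = node
    # Only the last live node contacts the master.
    master_connections = 1 if last_alive is not None else 0
    return report, last_alive, master_connections


report, last_alive, master_connections = report_via_pipeline(
    ["n2", "n3", "n4", "n5"], failed={"n3"}
)
```

With direct reporting the master would instead accept one connection per node, which grows linearly with the size of the cluster.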

SLIDE 15

Protocol is needed

The protocol:
  • avoids needing to know the data size (stream capability)
  • permits a premature end requested by the user
  • distinguishes the report from the data
  • improves fault tolerance (correct ending despite failures)

[Figure: message sequence between n1, n2 (failed, ×) and n3: TCP connection, GET 0, DATA x, ...; after the failure, a new TCP connection, GET a, DATA x, ..., DATA y, END, REPORT z, PASSED]
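The message names on this slide (GET, DATA, END, REPORT, PASSED) suggest a framed protocol. The sketch below shows one way such framing could work. The wire format is invented for illustration: the slides give the message names only, so the header layout, the meaning of the numeric arguments, and the function names are all assumptions.

```python
import io


def encode(kind, payload=b"", offset=0):
    """Encode one message in a made-up, line-based wire format."""
    if kind == "GET":
        return b"GET %d\n" % offset  # assumed: ask for data from this offset
    if kind in ("DATA", "REPORT"):
        # Assumed: the header carries the payload length, so no total size is needed.
        return b"%s %d\n" % (kind.encode(), len(payload)) + payload
    if kind in ("END", "PASSED"):
        return kind.encode() + b"\n"
    raise ValueError(kind)


def decode_stream(stream):
    """Yield (kind, value) pairs until the stream is exhausted."""
    while True:
        header = stream.readline()
        if not header:
            return
        parts = header.split()
        kind = parts[0].decode()
        if kind == "GET":
            yield kind, int(parts[1])
        elif kind in ("DATA", "REPORT"):
            yield kind, stream.read(int(parts[1]))
        else:
            yield kind, None


# Messages from both directions are concatenated here only to exercise the parser.
wire = b"".join([
    encode("GET", offset=0),
    encode("DATA", b"hello\n"),
    encode("DATA", b"world"),
    encode("END"),
    encode("REPORT", b"n2 failed"),
    encode("PASSED"),
])
messages = list(decode_stream(io.BytesIO(wire)))
```

Because every DATA message announces its own length and END closes the stream explicitly, a receiver can tell a finished transfer from a broken connection, and the report never gets mixed up with file content.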

SLIDE 16

Validation questions

  • How do the various solutions perform and scale up to a large number of nodes?
  • How does Kascade perform on high-performance networks (10 Gbps Ethernet, IP over InfiniBand)?
  • What is the impact of network topology and communication structure on performance?
  • What is the impact of I/O performance on the overall performance?
  • How does Kascade perform on large-scale (Internet-like) setups?
  • How does Kascade perform on smaller files?
  • How well does Kascade’s fault tolerance mechanism perform?

SLIDE 17

How do the various solutions perform and scale up to a large number of nodes?

[Figure: throughput (MB/s) versus number of clients for Kascade, TakTuk/chain, TakTuk/tree, UDPCast and MPI BCast; 2 GB file transfer over 1 Gbps Ethernet]

It scales
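For reading the throughput axis, it helps to know the ceiling that a 1 Gbps link imposes. The arithmetic below is a back-of-the-envelope bound that ignores Ethernet, IP and TCP header overhead and assumes decimal units (1 GB = 10^9 bytes). The variable names are illustrative.

```python
LINK_BITS_PER_S = 1e9  # 1 Gbps Ethernet, as stated on the slide
FILE_BYTES = 2e9       # 2 GB file, assuming decimal gigabytes

link_bytes_per_s = LINK_BITS_PER_S / 8

# Upper bound on the throughput any single node can receive.
max_throughput_mb_s = link_bytes_per_s / 1e6  # 125.0 MB/s

# Shortest possible time to push the whole file through one link.
min_transfer_s = FILE_BYTES / link_bytes_per_s  # 16.0 s
```

Any curve in the plot is therefore bounded by 125 MB/s, and a method that scales well keeps its throughput close to that bound as the number of clients grows.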

SLIDE 18

What is the impact of network topology and communication structure on performance?

[Figure: throughput (MB/s) versus number of clients for Kascade, TakTuk/chain, TakTuk/tree, MPI BCast and Kascade/ordered, with the order of nodes shuffled; 2 GB file transfer over 1 Gbps Ethernet]

Topology awareness is important
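One way to picture topology awareness is to count how many links of the chain connect nodes in different racks, since those are the links that cross the upper switch levels. The sketch below uses a hypothetical cluster of 4 racks with 5 nodes each. The node names, the rack mapping, and the helper `inter_rack_hops` are assumptions for illustration.

```python
import random


def inter_rack_hops(order, rack_of):
    """Count consecutive pipeline links whose endpoints sit in different racks."""
    return sum(1 for a, b in zip(order, order[1:]) if rack_of[a] != rack_of[b])


# Hypothetical cluster: 4 racks of 5 nodes each.
rack_of = {f"node{r}-{i}": r for r in range(4) for i in range(5)}

# Topology-aware order: all nodes of a rack are adjacent in the chain.
ordered = sorted(rack_of, key=lambda n: (rack_of[n], n))

# Shuffled order, with a fixed seed so the run is repeatable.
shuffled = list(rack_of)
random.Random(0).shuffle(shuffled)

ordered_hops = inter_rack_hops(ordered, rack_of)
shuffled_hops = inter_rack_hops(shuffled, rack_of)
```

With the rack-sorted order the chain leaves each rack exactly once, so only 3 of its 19 links cross racks. A shuffled order can never do better than that and usually does much worse, which concentrates traffic on the shared uplinks.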

SLIDE 19

How well does Kascade’s fault tolerance mechanism perform?

Simultaneous

[Figure, animated over three slides: three nodes fail at the same time (! ! !); they are pinged without an answer (ping ?), marked as failed (X X X), and new connections are opened (connect)]

SLIDE 22

How well does Kascade’s fault tolerance mechanism perform?

Sequential

[Figure, animated over nine slides: nodes fail one after another (!); after each failure, ping/pong probes are exchanged, the silent node is marked as failed (X), and a new connection is opened (connect), up to three failures]

SLIDE 31

How well does Kascade’s fault tolerance mechanism perform?

[Figure: throughput (MB/s) for: no failure; 2%, 5% and 1% sim. failures; 2%, 5% and 1% seq. failures]

  • 100 virtual nodes, faults injected with Distem
  • 5 GB file transfer
  • 1 Gbps Ethernet

Fault tolerant

SLIDE 32

Conclusions

Efficient use of fat tree network
  • Saturates the 1 Gbps network

Fault tolerance
  • Mechanisms work
  • Performance is acceptable in a hostile environment

Stream capability

Future work:
  • Slow node elimination → when a node in the pipeline slows down the transfer rate because of a malfunction