

SLIDE 1

Towards In-network Acceleration of Erasure Coding

Yi Qiao, Xiao Kong, Menghao Zhang, Yu Zhou, Mingwei Xu, Jun Bi Tsinghua University

SLIDE 2

Erasure Coding (EC)

  • In data centers, machine failures happen very frequently. Facebook reports up to 50 machine failures per day in their data warehouses.
  • EC provides data fault tolerance with much lower storage overhead (~1.4x) than replication (3x), with a similar degree of availability.
  • EC reconstructs missing data from the remaining data and pre-calculated parities.

  • For example:
  • XOR (RAID 5)
  • Reed-Solomon Codes

SLIDE 3

EC Examples

XOR (RAID 5)

  • Data symbols $a$, $b$, $c$; parity $p = a \oplus b \oplus c$
  • Reconstruct $b$ with $b = a \oplus c \oplus p$

Reed-Solomon Code (Conceptual)

  • Data symbols $a$, $b$, $c$; parities $p_1 = a + b + c$ and $p_2 = a + 2b + 2c$
  • Reconstruct $a$ with $a = 2p_1 - p_2$
  • Reconstruct $c$ with $c = p_2 - p_1 - b$
  • These are Galois Field arithmetic operations. For simplicity, just read them as integer arithmetic.

  • Conclusion: EC reconstruction can be modelled with linear combinations

$$m = \sum_{i=1}^{k} a_i x_i$$

  • $m$: reconstructed symbol
  • $x_i$: symbols from remaining machines
  • $a_i$: pre-computed coefficients
  • Addition refers to XOR
  • Multiplication is on Galois Field
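
The two toy codes on this slide can be checked numerically. The sketch below is illustrative only: following the slide's simplification, it uses plain Python integers for the conceptual Reed-Solomon part instead of real Galois Field arithmetic, and the sample values and the helper name `linear_combination` are assumptions introduced for the example.

```python
# Toy data symbols; the values are arbitrary and chosen only for illustration.
a, b, c = 0x12, 0x34, 0x56

# XOR (RAID 5): a single parity lets any one lost symbol be rebuilt.
p = a ^ b ^ c
rebuilt_b = a ^ c ^ p

# Conceptual Reed-Solomon with integer arithmetic (not real GF arithmetic).
p1 = a + b + c
p2 = a + 2 * b + 2 * c
rebuilt_a = 2 * p1 - p2
rebuilt_c = p2 - p1 - b


def linear_combination(coeffs, symbols):
    """Integer stand-in for m = sum(a_i * x_i) over the remaining symbols."""
    return sum(coef * sym for coef, sym in zip(coeffs, symbols))


# Rebuilding a is the linear combination 2 * p1 + (-1) * p2.
rebuilt_a_lc = linear_combination([2, -1], [p1, p2])
```

In both cases the lost symbol is a fixed linear combination of surviving symbols, which is the form the conclusion on this slide relies on.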

SLIDE 4

EC Problems

  • Low reconstruction rate
  • Several hours to reconstruct a disk
  • Several seconds for degraded reads
  • EC is mostly used for storing “cold” data in data warehouses.
  • Why so slow?
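
SLIDE 5

Motivation

[Figure: reconstruction traffic through a ToR switch, without and with NetEC. Machine $A$ downloads from machines $B_1$, $B_2$ and $B_3$; each machine is drawn with its disk, CPU and NIC, and line width represents throughput.]

  • Without NetEC: the ToR forwards every stream, so the NIC of $A$ is multiplexed and the disk reconstruction rate is 1/3 of the available NIC capacity.
  • With NetEC: no NIC sharing/multiplexing, and the rate is near 100% of the available NIC capacity.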

SLIDE 6

NetEC

  • We present NetEC, which offloads EC reconstruction to programmable switches.
  • It improves reconstruction rates by k times, where k is the number of machines to download from.
  • It also entirely removes CPU usage.
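
The k-times claim follows from simple arithmetic on NIC sharing. The snippet below is a back-of-the-envelope model, not a measurement: the function name `reconstruction_rate`, the capacity value of 10 (arbitrary units), and the idealisation that NetEC reaches exactly 100% of NIC capacity are assumptions made for illustration.

```python
from fractions import Fraction


def reconstruction_rate(nic_capacity, k, in_network):
    """Rate at which the rebuilding machine obtains reconstructed data."""
    if in_network:
        # The switch delivers already-combined data, so the NIC is not shared.
        return Fraction(nic_capacity)
    # k source streams share one NIC, so only 1/k of it yields new data.
    return Fraction(nic_capacity, k)


k = 3
baseline = reconstruction_rate(10, k, in_network=False)
netec = reconstruction_rate(10, k, in_network=True)
speedup = netec / baseline
```

With k = 3 the idealised speedup equals k, matching the 1/3 versus near-100% comparison on the Motivation slide.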

SLIDE 7

Brief Overview of NetEC Data Plane

[Figure: NetEC data plane, with steps numbered ① to ⑧. Packets P1, P2, P3 from machines $B_1$, $B_2$, $B_3$ carry symbols $(x_1, y_1, \dots)$, $(x_2, y_2, \dots)$, $(x_3, y_3, \dots)$, which are extracted into PHVs. A GF multiplication stage produces $(a_1x_1, a_1y_1, \dots)$, $(a_2x_2, a_2y_2, \dots)$, $(a_3x_3, a_3y_3, \dots)$. Stateful registers hold the on-switch decoding buffer, made of a partial XOR sum buffer and a progress tracker.]

  • Progress tracker states: 000 (empty); 100 after P1 arrives, buffer holds $a_1x_1$; 110 after P2 arrives, buffer holds $a_1x_1 + a_2x_2$; 111 after P3 arrives, buffer holds $a_1x_1 + a_2x_2 + a_3x_3$.
  • Two of the three packets are marked Drop; the completed sums $a_1x_1 + a_2x_2 + a_3x_3$, $a_1y_1 + a_2y_2 + a_3y_3$, ... are sent to $A$.
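
The progress-tracker behaviour in the figure (000, then 100, 110, 111, then forward and reset) can be mimicked in software. The following is a rough Python model, not the switch program: the class name `PartialSumSlot`, the use of small integers as already-multiplied payloads, and the policy of ignoring duplicate packets are all assumptions made for illustration.

```python
class PartialSumSlot:
    """Software model of one buffer slot: a partial XOR sum plus a bitmap."""

    def __init__(self, k):
        self.k = k          # number of source machines
        self.partial = 0    # running XOR of the products a_i * x_i
        self.seen = 0       # bit i is set once source i has contributed

    def on_packet(self, source, product):
        """Absorb one packet; return the full sum once all k have arrived."""
        bit = 1 << source
        if self.seen & bit:
            return None     # assumed policy: ignore duplicate packets
        self.partial ^= product
        self.seen |= bit
        if self.seen == (1 << self.k) - 1:
            result = self.partial
            self.partial, self.seen = 0, 0   # reset the slot for reuse
            return result   # the last packet carries the sum onwards
        return None         # earlier packets are absorbed, then dropped


slot = PartialSumSlot(k=3)
outputs = [slot.on_packet(i, v) for i, v in enumerate([0x0F, 0x33, 0x55])]
```

Only the third call returns a value, which mirrors the figure: two packets are dropped after updating the buffer and one packet leaves the switch with the complete sum.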

SLIDE 8

Challenges and Design

  • Galois Field Multiplication Offloading
  • Rate Synchronization
  • Deep Payload Inspection/assembly

SLIDE 9

Challenges and Design (1)

  • Galois Field Multiplication Offloading
  • We convert it to addition, logarithms and exponents
  • To calculate $a_1 x_1$,
  • Look up $\log(x_1)$ in the logarithm table
  • Add a pre-known $\log(a_1)$: $\log(a_1 x_1) = \log(a_1) + \log(x_1)$
  • Look up $a_1 x_1$ in the exponent table: $a_1 x_1 = e^{\log(a_1 x_1)}$
  • Note that the logarithms and exponents are also on the Galois Field, where this method is valid.

  • Rate Synchronization
  • Deep Payload Inspection/assembly
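
The table-lookup multiplication described on this slide can be sketched in Python. The slides do not state the field size or polynomial, so GF(2^8) with the primitive polynomial 0x11D and generator 2 is an assumption here, as are the names `gf_mul_tables` and `gf_mul_reference`; the bitwise version exists only to cross-check the tables.

```python
PRIM_POLY = 0x11D  # assumed primitive polynomial x^8 + x^4 + x^3 + x^2 + 1

# Build exponent and logarithm tables for generator 2.
EXP = [0] * 510    # doubled so that sums of two logs need no modulo
LOG = [0] * 256
value = 1
for power in range(255):
    EXP[power] = value
    LOG[value] = power
    value <<= 1
    if value & 0x100:
        value ^= PRIM_POLY
for power in range(255, 510):
    EXP[power] = EXP[power - 255]


def gf_mul_tables(a, x):
    """Multiply via log(a) + log(x) followed by one exponent-table lookup."""
    if a == 0 or x == 0:
        return 0   # log(0) is undefined, so zero is handled separately
    return EXP[LOG[a] + LOG[x]]


def gf_mul_reference(a, x):
    """Bitwise shift-and-reduce multiply, used only as a cross-check."""
    result = 0
    while x:
        if x & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM_POLY
        x >>= 1
    return result
```

Because the coefficient is known in advance, its logarithm can be stored directly, leaving two table lookups and one integer addition per symbol at run time.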

SLIDE 10

Challenges and Design (2)

  • Computation Offloading
  • Rate Synchronization
  • The switch has to temporarily buffer partial XOR sums from the time the first packet arrives until the last packet leaves.
  • One-to-many TCP
  • The switch only needs to buffer partial XOR sums whose size equals that of the in-flight packets, bounded by the BDP (bandwidth-delay product)
  • SSD peak write speed: 1 GB/s
  • DC RTT: 250 us
  • BDP = 1 GB/s × 250 us = 250 KB
  • Deep Payload Inspection/assembly
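
The buffer bound on this slide is a direct bandwidth-delay product. A minimal check, assuming 1 GB/s means 10^9 bytes per second and using a hypothetical helper name `bdp_bytes`:

```python
def bdp_bytes(rate_bytes_per_s, rtt_us):
    """Bandwidth-delay product: bytes in flight during one round trip."""
    # Integer arithmetic avoids floating-point rounding on the microseconds.
    return rate_bytes_per_s * rtt_us // 1_000_000


ssd_write_rate = 1_000_000_000   # 1 GB/s, taken as 10**9 bytes per second
dc_rtt_us = 250                  # data center round-trip time in microseconds
buffer_needed = bdp_bytes(ssd_write_rate, dc_rtt_us)
```

Under these assumptions `buffer_needed` is 250,000 bytes, the 250 KB quoted on the slide.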

SLIDE 11

Challenges and Design (3)

  • Computation Offloading
  • Rate Synchronization
  • Deep Payload Inspection/assembly
  • Many switch constraints limit the number of bytes that can be processed, while small packets reduce throughput.
  • Use recirculation inspired by PPS (SOSR 19)
  • Redesign L4 checksum updates.

SLIDE 12

Discussions and Limitations

  • Will NetEC cause incast?
  • NetEC actually prevents incast.
  • Most incoming packets are dropped in the ingress pipeline.
  • Outbound PPS ≈ Inbound PPS
  • Is NetEC scalable?
  • The number of machines to download from: 3, 6, 10
  • The number of concurrent tasks
  • Problem:
  • Currently, a table or register can only be accessed once per packet, so we need multiple logarithm/exponent tables.

  • Limited number of registers per stage.

SLIDE 13

Implementation and Evaluation

  • We implement a prototype of NetEC on commodity switches, and integrate it with HDFS-EC.

SLIDE 14

Conclusion

  • EC's low reconstruction rate is due to multiplexed NIC capacity.
  • In-network computation resolves this problem, leading to great performance improvement.
  • We design and implement NetEC, addressing three challenges, and conduct preliminary evaluations to show its effectiveness.

SLIDE 15

Thank you!