SLIDE 1

High Throughput Kafka for Science

Testing Kafka’s limits for science

J Wyngaard, PhD wyngaard@jpl.nasa.gov

SLIDE 2

OUTLINE

  • Streaming Science Data
  • Benchmark Context
  • Tests and Results
  • Conclusions
SLIDE 3
  • Streaming Science Data
  • Benchmark Context
  • Tests and Results
  • Conclusions

Streaming Science Data

SOODT, Kafka, Science data streams

SLIDE 4

DIA GROUP

  • Using open source tools extensively to enable JPL scientists to handle their big data:
    – Apache OODT
    – Apache Tika
    – Apache Hadoop
    – Apache Kafka
    – Apache Mesos
    – Apache Spark
    – ...so many more...

SLIDE 5

SCIENCE DATA

  • Earth Science
    – Satellite data ~5 GB/day
  • Radio Astronomy
    – Antenna arrays ~4 Tbps, >>1K 10 Gbps links
  • Airborne missions
    – ~5 GB files, 0.5 TB per flight
  • Bioinformatics
SLIDE 6

STREAMING SOODT

SLIDE 7

APACHE KAFKA

[Diagram: producer nodes P0-P5 publishing to Topic0-Topic2, a broker cluster holding topic partitions T00-T03, and consumer groups G0-G3 on consumer nodes]
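The diagram is only sketched on the slide; as a rough illustration of the same topology in the Kafka Java client API (the broker addresses are hypothetical, and the topic and group names just follow the diagram's labels), a producer publishing to Topic0 and a consumer reading it as a member of group G0 could look like this:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TopologySketch {
    public static void main(String[] args) {
        // Producer node P0 publishing to Topic0 on the broker cluster.
        Properties prodProps = new Properties();
        prodProps.put("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092"); // hypothetical brokers
        prodProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        prodProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(prodProps)) {
            // The brokers spread Topic0 across partitions (T00..T03 in the diagram).
            producer.send(new ProducerRecord<>("Topic0", "key", "science-record"));
        }

        // A consumer node in group G0; Topic0's partitions are divided among G0's members.
        Properties consProps = new Properties();
        consProps.put("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092");
        consProps.put("group.id", "G0");
        consProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consProps)) {
            consumer.subscribe(List.of("Topic0"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> r : records) {
                System.out.printf("partition %d offset %d: %s%n", r.partition(), r.offset(), r.value());
            }
        }
    }
}
```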

SLIDE 8

10G X 1024 - ?

  • Low Frequency Aperture Array:
  • 0.25M antennas
    – 1024 stations
  • 16 processing modules
  • = 4 Tbps from 1024 stations at 10 Gbps each

Artist's impression of LFAA, SKA image:
https://www.skatelescope.org/multimedia/image/low-frequency-array-ska-wide-field/
SLIDE 9
  • Streaming Science Data
  • Benchmark Context
  • Tests and Results
  • Conclusions

Benchmark Context

Reality check – Kafka was not designed for this

SLIDE 10
TACC WRANGLER

  • Primary system
    – 96 nodes
      • 24-core Haswells
      • 128 GB RAM
    – InfiniBand FDR and 40 Gb/s Ethernet connectivity
    – 0.5 PB NAND flash
      • 1 Tbps
      • >200 million IOPS
  • A 24-node replica cluster resides at Indiana University, connected by a 100 Gb/s link

SLIDE 11

“LAZY” BENCHMARKING

https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines

  • “Lazy” being:
    – Off-the-shelf cheap hardware
    – Untuned default configuration

SLIDE 12

6 CHEAP MACHINES

  • OTS benchmark
    – 6-core 2.5 GHz Xeons
    – ~100 IOPS hard drives
    – 1 Gb Ethernet
  • Wrangler nodes
    – 2x 12-core 2.5 GHz Xeons
    – >200 IOPS flash
    – 128 GB RAM
    – 40 Gb Ethernet

SLIDE 13

“LAZY” CONFIGURATION

  • Kafka trunk 0.8.1
  • New producer
  • Default configurations
  • Small messages
  • Setup
    – 3 broker nodes
    – 3 ZooKeeper, consumer, and producer nodes
  • Kafka built-in performance tools
SLIDE 14

STRAIGHTLINE “LAZY” SPEED TEST

  • 1 producer
  • 0 consumers
  • 1 topic
  • 6 partitions
  • Replication factor 1 (i.e. no replication)
  • 50M 100 B messages (small, for worst case; see the sketch below)

[Diagram: a single producer node P0 publishing Topic0 to the broker cluster holding partitions T00-T05; no consumer nodes]
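The deck uses Kafka's built-in performance tools for this test and does not show the exact invocation. As a minimal sketch of what the straightline producer test measures, assuming the Java "new producer" API with default ("lazy") settings and the parameters listed above (the broker address and topic name are placeholders):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class StraightlineLazyTest {
    public static void main(String[] args) {
        Properties props = new Properties();
        // "Lazy" configuration: only required settings, everything else left at defaults.
        props.put("bootstrap.servers", "broker1:9092");  // hypothetical broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

        long numRecords = 50_000_000L;   // 50M messages
        byte[] payload = new byte[100];  // 100 B each (small, worst case)

        long start = System.nanoTime();
        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
            for (long i = 0; i < numRecords; i++) {
                producer.send(new ProducerRecord<>("Topic0", payload)); // async send, no consumer running
            }
            producer.flush();
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        double bytes = (double) numRecords * payload.length;
        System.out.printf("%.1f MB/s (%.2f Gbps)%n",
                bytes / 1e6 / seconds, bytes * 8 / 1e9 / seconds);
    }
}
```

Only payload bytes are counted here, consistent with the asterisked caveat on the next slide that network overhead is not accounted for.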

SLIDE 15

STRAIGHTLINE “LAZY” SPEED TEST

  • 1 producer
  • 0 consumers
  • 1 topic
  • 6 partitions
  • Replication factor 1 (i.e. no replication)
  • 50M 100 B messages (small, for worst case)

Results (*network overhead not accounted for):
  • 6 cheap machines: 78.3 MB/s* (0.6 Gbps)
  • Wrangler: 170.27 MB/s* (1.3 Gbps)

SLIDE 16

Δ MESSAGE SIZE

~100 MB/s at 100 KB message size

SLIDE 17

OTHER PARAMETER IMPACTS

  • Replication (see the producer acks sketch below):
    – Single producer thread, 3x replication, 1 partition
      • Asynchronous: 0.59 Gbps
      • Synchronous: 0.31 Gbps
  • Parallelism:
    – Three producers, 3x asynchronous replication, independent machines
      • 1.51 Gbps < 3 x 0.59 = 1.77 Gbps

Reference straightline producer speed: 0.61 Gbps
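The asynchronous vs. synchronous replication figures above come down to how many acknowledgements the producer waits for before counting a message as sent. A minimal sketch of the corresponding Java producer settings (the property values are the standard acks options; mapping them onto the deck's exact test flags is an assumption), with the topic itself created at replication factor 3:

```java
import java.util.Properties;

public class ReplicationAckConfigs {
    // Asynchronous replication: the leader acknowledges as soon as it has the write;
    // followers copy the data in the background.
    static Properties asyncReplication() {
        Properties p = baseProps();
        p.put("acks", "1");
        return p;
    }

    // Synchronous replication: the leader waits for the in-sync replicas before
    // acknowledging, trading throughput (0.59 -> 0.31 Gbps above) for durability.
    static Properties syncReplication() {
        Properties p = baseProps();
        p.put("acks", "all");   // "-1" in older client versions
        return p;
    }

    private static Properties baseProps() {
        Properties p = new Properties();
        p.put("bootstrap.servers", "broker1:9092");  // hypothetical broker
        p.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        return p;
    }
}
```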

SLIDE 18
  • Streaming Science Data
  • Benchmark Setup
  • Wrangler Performance
  • Conclusions

Wrangler Performance

Limits

SLIDE 19

TARGETING 10G

  • 40x network speed
  • 4x core counts
  • 2x IOPS
  • 128x RAM
  • Starting point (see the sketch below):
    – Bigger messages
    – No replication
    – In-node parallelism
    – Big buffers
    – Large batches
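The deck does not list its tuned values; as a sketch of what the starting point above could translate to in new-producer configuration (the specific numbers are illustrative assumptions, not the settings used on Wrangler):

```java
import java.util.Properties;

public class TunedProducerProps {
    static Properties targeting10G() {
        Properties p = new Properties();
        p.put("bootstrap.servers", "broker1:9092");  // hypothetical broker
        p.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

        p.put("acks", "1");                                     // leader-only acknowledgement
        p.put("batch.size", Integer.toString(1 << 20));         // large batches (1 MB, illustrative)
        p.put("linger.ms", "50");                               // let batches fill before sending
        p.put("buffer.memory", Long.toString(1L << 30));        // big producer buffer (1 GB, illustrative)
        p.put("max.request.size", Integer.toString(16 << 20));  // allow bigger messages (16 MB, illustrative)
        p.put("compression.type", "none");                      // raw throughput, no CPU spent compressing
        return p;
    }
}
```

"No replication" is a topic-level choice (replication factor 1 at topic creation), and in-node parallelism means running several producer threads per node rather than any single producer setting.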

SLIDE 20

Δ MESSAGE SIZE

[Plot: averaged throughput over changing message size (B, KB, MB), with regions annotated "?" and "Sustainable"]

SLIDE 21

PARTITIONS

  • 3 producers, 1 topic, asynchronous, 3 consumer threads (consumer side sketched below)
    – Average 6.49 Gbps (8000 messages)
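The consumer side of these partition tests relies on Kafka assigning a topic's partitions across the members of one consumer group, so N consumer threads can drain the topic in parallel. A minimal sketch assuming the Java consumer API (broker, group, and topic names are placeholders):

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerThreads {
    public static void main(String[] args) {
        // Three consumers in the same group: the brokers assign each a share
        // of the topic's partitions, so the threads read in parallel.
        for (int i = 0; i < 3; i++) {
            final int id = i;
            new Thread(() -> {
                Properties p = new Properties();
                p.put("bootstrap.servers", "broker1:9092");   // hypothetical broker
                p.put("group.id", "wrangler-bench");          // one shared group
                p.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
                p.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
                try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(p)) {
                    consumer.subscribe(List.of("Topic0"));
                    long count = 0;
                    for (int polls = 0; polls < 1000; polls++) {  // bounded run for the sketch
                        ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(100));
                        count += records.count();
                    }
                    System.out.printf("consumer %d read %d records%n", id, count);
                }
            }, "consumer-" + i).start();
        }
    }
}
```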

SLIDE 22

PARTITIONS

  • 6 producers, 1 topic, asynchronous, 6 consumer threads
    – Average 2.6 Gbps (8000 messages)

SLIDE 23

PARTITIONS

  • 6 producers, 1 topic, asynchronous, 6 consumer threads, and 6 brokers
    – Average 1.2 Gbps (8000 messages)

SLIDE 24
  • Context
  • TACC Wrangler Data Analysis System
  • Benchmark Setup
  • Tests and Results
  • Conclusions

Conclusions

And where to from here

SLIDE 25

TARGETING 10G

  • Apparent optimum for a single-node producer on this hardware:
    – ~10 MB messages
    – 3 producers matching 3 consumers / consumer threads
  • More brokers, producers, or consumers are detrimental
  • 6.49 Gbps < 10 Gbps
SLIDE 26

ALTERNATIVE AVENUES

  • Parallelism - multiple topics, if this is tolerable
    – (Potential ordering and chunking overheads)
  • In a shared file system environment, perhaps file pointers rather than files should be moved
    – (Not suitable in many applications)
  • Nothing to be gained from better hardware

SLIDE 27

HPC PRODUCTION CLUSTER ENVIRONMENT

  • Pros:
    – Shared file system
    – tmpfs
    – Scale
  • Cons:
    – User-space installs only
    – SLURM
      • idev
      • Jobs time out, losing configurations and leaving a mess
    – Queuing for time
    – Loading cost and impermanence of data
    – Stability of Kafka / other users interfering - ?

SLIDE 28

HPC PRODUCTION CLUSTER ENVIRONMENT

  • Lessons learned:
    – Develop in your destination environment
    – Flash storage makes life easy
      • Caveat - it is wiped when your reservation runs out
    – Lustre…
  • No battle scars - credit to the XSEDE Wrangler management team and the Kafka builders

SLIDE 29

REFERENCES

  • Benchmarking Apache Kafka: 2 Million Writes Per Second (On Three Cheap Machines)
    https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines

SLIDE 30

ACKNOWLEDGEMENTS

  • NASA Jet Propulsion Laboratory
    – Research & Technology Development: “Archiving, Processing and Dissemination for the Big Data Era”
  • XSEDE
    – This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1053575.
    – "XSEDE: Accelerating Scientific Discovery": John Towns, Timothy Cockerill, Maytal Dahan, Ian Foster, Kelly Gaither, Andrew Grimshaw, Victor Hazlewood, Scott Lathrop, Dave Lifka, Gregory D. Peterson, Ralph Roskies, J. Ray Scott, Nancy Wilkins-Diehr, Computing in Science & Engineering, vol. 16, no. 5, pp. 62-74, Sept.-Oct. 2014, doi:10.1109/MCSE.2014.80