  1. Better Stream Processing with Python: Taking the Hipster out of Streaming
     Andreas Heider, Robert Wall
     EuroPython, 12.07.2017

  2. Who are we?
     • Developers at Winton
     • Winton is a global investment management and data science company, founded in 1997
     • We believe the scientific method can be profitably applied to the field of investing

  3. What do we mean by stream processing?
     [Diagram: batch processing vs. stream processing]
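A toy illustration of the distinction (not from the slides; the function names are hypothetical): a batch job sees the whole dataset at once and computes one answer, while a stream processor updates its answer as each event arrives and never needs the full dataset in memory.

```python
from typing import Iterable, Iterator, List


def batch_mean(prices: List[float]) -> float:
    """Batch: the whole dataset is available up front and processed in one go."""
    return sum(prices) / len(prices)


def streaming_mean(prices: Iterable[float]) -> Iterator[float]:
    """Stream: events arrive one at a time; emit an updated result after each."""
    count = 0
    total = 0.0
    for price in prices:
        count += 1
        total += price
        # Only the running count and total are kept, not the past events.
        yield total / count
```

The streaming version works on an unbounded input (for example a generator fed by a message queue), which the batch version cannot.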

  4. Example: real-time financial market data
     [Diagram: the exchange publishes trades, e.g. AAPL 10 @ $144 at 10:15:01 and GOOG 5 @ $940 at 10:15:02]

     Trades:
     Time      Symbol  Price  Qty
     10:15:01  AAPL    $144   10
     10:15:02  GOOG    $940   5
     10:15:03  AAPL    $145   11
     …

  5. Stream processing: binning
     Input trades:
     Time      Symbol  Price  Qty
     10:15:01  AAPL    $144   10
     10:15:02  GOOG    $940   5
     10:15:03  AAPL    $145   11
     …

     Binning process output (one-minute bins per symbol):
     Time   Symbol  Avg. Price  Volume
     10:15  AAPL    $144.5      1300
     10:15  GOOG    $943        1250
     10:16  AAPL    $145.3      1450
     …
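A minimal sketch of the binning step as event-at-a-time processing, assuming trades arrive in time order and that "Avg. Price" is a simple (unweighted) mean; the slide does not say whether a volume-weighted average is used. `Trade`, `Bin` and `bin_trades` are hypothetical names, and the slide's output figures come from trades elided by "…", so they cannot be reproduced from the three rows shown.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Trade:
    time: str    # "HH:MM:SS"
    symbol: str
    price: float
    qty: int


@dataclass(frozen=True)
class Bin:
    minute: str  # "HH:MM"
    symbol: str
    avg_price: float
    volume: int


# Open bin state per symbol: (minute, price_sum, trade_count, volume)
State = Tuple[str, float, int, int]


def _close(symbol: str, state: State) -> Bin:
    minute, price_sum, count, volume = state
    # Assumption: simple mean of trade prices, not volume-weighted.
    return Bin(minute, symbol, price_sum / count, volume)


def bin_trades(trades: Iterable[Trade]) -> Iterator[Bin]:
    """Group trades into one-minute bins per symbol, one event at a time.

    A symbol's bin is emitted as soon as that symbol trades in a later
    minute; bins still open when the input ends are flushed at the end.
    """
    open_bins: Dict[str, State] = {}
    for trade in trades:
        minute = trade.time[:5]  # "10:15:01" -> "10:15"
        current = open_bins.get(trade.symbol)
        if current is not None and current[0] != minute:
            yield _close(trade.symbol, current)
            current = None
        if current is None:
            current = (minute, 0.0, 0, 0)
        m, price_sum, count, volume = current
        open_bins[trade.symbol] = (m, price_sum + trade.price, count + 1, volume + trade.qty)
    for symbol, state in open_bins.items():
        yield _close(symbol, state)
```

Closing a bin when the next minute's trade arrives keeps memory bounded to one open bin per symbol; a production system would also need to handle late or out-of-order trades, which this sketch ignores.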

  6. Streaming data at Winton
     [Diagram: event streams connecting market data, alternative data, internal/business events, transformations, investment management, risk management, monitoring, research databases and analytics]

  7. Apache Kafka
     [Diagram: a producer writes to a topic split into partitions 1, 2 and 3; a consumer reads from the topic]
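To make the topic/partition picture concrete, here is a toy in-memory model (not Kafka's API; `ToyTopic` and its methods are hypothetical). Each partition is an append-only log addressed by offset, and a record's key decides its partition, so all records with the same key stay in order within one partition. Kafka's default Java partitioner hashes keys with murmur2; `zlib.crc32` is substituted here only to keep the sketch dependency-free.

```python
import zlib
from typing import List, Tuple

Record = Tuple[bytes, bytes]  # (key, value)


class ToyTopic:
    """In-memory stand-in for a topic: one append-only log per partition."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[List[Record]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: bytes) -> int:
        # Same key -> same partition, which preserves per-key ordering.
        return zlib.crc32(key) % len(self.partitions)

    def produce(self, key: bytes, value: bytes) -> Tuple[int, int]:
        """Append a record and return (partition, offset) where it was stored."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1

    def consume(self, partition: int, offset: int) -> List[Record]:
        """Read every record in one partition from `offset` onwards."""
        return self.partitions[partition][offset:]
```

Because ordering is only guaranteed within a partition, a consumer that tracks one offset per partition can resume exactly where it left off.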

  8. Sprawl of stream processing systems

  9. Kafka Streams
     • Simple library, not a framework
     • Event-at-a-time stream processing
     • Stateful processing, joins and aggregations
     • Distributed processing and fault tolerance
     • Part of the main Apache Kafka project
     • Java only so far :(
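A minimal sketch of what "event at a time" plus "stateful" means, loosely modelled on the processor idea in Kafka Streams but not its actual API (`VolumeAggregator`, `process` and `restore` are hypothetical). Each event updates a local key-value store, and every update is also appended to a changelog; Kafka Streams backs its state stores with changelog topics in a similar spirit, which is what lets state be rebuilt on another machine after a failure.

```python
from typing import Dict, List, Tuple


class VolumeAggregator:
    """Running traded volume per symbol, updated one event at a time."""

    def __init__(self) -> None:
        self.store: Dict[str, int] = {}             # local key-value state
        self.changelog: List[Tuple[str, int]] = []  # record of every state update

    def process(self, symbol: str, qty: int) -> Tuple[str, int]:
        """Handle one event and return the updated aggregate to forward downstream."""
        total = self.store.get(symbol, 0) + qty
        self.store[symbol] = total
        self.changelog.append((symbol, total))
        return symbol, total

    @classmethod
    def restore(cls, changelog: List[Tuple[str, int]]) -> "VolumeAggregator":
        """Rebuild state by replaying the changelog; the last value per key wins."""
        agg = cls()
        for symbol, total in changelog:
            agg.store[symbol] = total
        agg.changelog = list(changelog)
        return agg
```

Storing the latest total per key (rather than the raw events) keeps the changelog compactable: only the newest entry per symbol is needed to restore.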

  10. Python at Winton
     Many users, with different skillsets:
     • Developers
     • Researchers
     • Operations
     • …

  11. Hipster stream processing: talking to Kafka using kafka-python

  12. Python Kafka clients
     • https://github.com/dpkp/kafka-python
        • Pure Python implementation
        • Friendly, pythonic interface
     • https://github.com/confluentinc/confluent-kafka-python
        • Wrapper around the librdkafka C library
        • Amazingly high performance and robustness
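The "hipster" approach typically starts as a short consume-transform-produce loop. A sketch using kafka-python's `KafkaConsumer` and `KafkaProducer`, assuming a broker at `localhost:9092` and JSON-encoded trades; the topic names, consumer group and the `transform` rule are hypothetical. The Kafka import sits inside `run()` so the business logic can be loaded and tested without a broker or the library installed.

```python
import json
from typing import Optional


def transform(trade: dict) -> Optional[dict]:
    """Hypothetical business logic: forward only trades of 10 or more shares."""
    if trade["qty"] >= 10:
        return {"symbol": trade["symbol"], "notional": trade["price"] * trade["qty"]}
    return None


def run(bootstrap_servers: str = "localhost:9092") -> None:
    """Consume trades, transform them and produce results; needs a live broker."""
    from kafka import KafkaConsumer, KafkaProducer  # kafka-python

    consumer = KafkaConsumer(
        "trades",                         # hypothetical input topic
        bootstrap_servers=bootstrap_servers,
        group_id="large-trades-filter",   # hypothetical consumer group
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )
    producer = KafkaProducer(
        bootstrap_servers=bootstrap_servers,
        value_serializer=lambda d: json.dumps(d).encode("utf-8"),
    )
    for record in consumer:  # blocks, yielding one record at a time
        result = transform(record.value)
        if result is not None:
            producer.send("large-trades", key=record.key, value=result)
```

Note that kafka-python enables offset auto-commit by default, so offsets can be committed before `send` has actually delivered the result; a loop this short gives no at-least-once guarantee on its own.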

  13. Experiences using a low-level client
     • What starts out as a 10-line script ends up as yet another homegrown streaming framework
     • The devil is in the details:
        • Guaranteeing at-least-once (or even exactly-once) processing
        • Handling stateful processing
        • Distributing load over various machines
        • Microbatching
        • Handling rebalances nicely
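Two of these details, at-least-once processing and microbatching, come down to ordering: handle a batch first, commit its offsets afterwards. A minimal sketch for a single partition, with the commit passed in as a callback so it is independent of any particular client (`process_in_microbatches` is a hypothetical helper). It follows Kafka's convention of committing the offset of the *next* record to read.

```python
from typing import Callable, Iterable, List, Tuple

Record = Tuple[int, str]  # (offset, payload) from one partition


def process_in_microbatches(
    records: Iterable[Record],
    handle_batch: Callable[[List[str]], None],
    commit: Callable[[int], None],
    batch_size: int = 100,
) -> None:
    """Handle records in batches and commit only after each batch succeeds.

    If the process dies between handling and committing, the batch is
    re-read and handled again on restart: duplicates are possible,
    lost records are not (at-least-once).
    """
    batch: List[str] = []
    last_offset = -1
    for offset, payload in records:
        batch.append(payload)
        last_offset = offset
        if len(batch) >= batch_size:
            handle_batch(batch)
            commit(last_offset + 1)  # next offset to read
            batch = []
    if batch:  # flush the final partial batch
        handle_batch(batch)
        commit(last_offset + 1)
```

Real microbatching usually also flushes on a timer so a quiet partition does not hold records indefinitely, and a rebalance must flush or discard the open batch before the partition is handed to another consumer; both are left out here.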

  14. Kafka Streams for Python: https://github.com/wintoncode/winton-kafka-streams

  15. Demo

  16. Goals / roadmap
     1. Clean implementation of Kafka’s core streams API in Python
     2. Experiment with a more pythonic API/DSL
     3. Optimise performance via batching/numpy/Arrow
     4. Implement more advanced features of Kafka’s streams API (exactly-once, …)

  17. Get in touch!
     • Project on GitHub: https://github.com/wintoncode/winton-kafka-streams
     • Roadmap: https://github.com/wintoncode/winton-kafka-streams/blob/master/ROADMAP.md
     • Announcement on the kafka-dev mailing list
     • Come to our stand and talk to us
     • Thanks to Confluent

  18. Questions?

  19. Backup

  20. Some words of experience
     • Not everything fits the streaming model
     • Manually changing data is tricky
        • Be careful what you put in; have a recovery method
     • Stable deployment can be challenging
        • Especially ZooKeeper and buggy clients
     • Set up monitoring from the start
        • We use Prometheus and Grafana
        • https://github.com/yahoo/kafka-manager
