A Whirlwind Overview of Apache Beam Eugene Kirpichov - PowerPoint PPT Presentation

Aug 17, 2023 •475 likes •669 views

A Whirlwind Overview of Apache Beam Eugene Kirpichov <kirpichov@google.com> Staff Software Engineer
(2004) MapReduce: SELECT + GROUPBY
(2008) FlumeJava: High-level API
(2013) MillWheel: Deterministic streaming
(2014) Dataflow: Unified batch/streaming, portable
(2016) Apache Beam: Open ecosystem, community-driven, vendor-independent
Google Cloud Platform
Pipeline p = Pipeline.create(options);

// Read text files
PCollection<String> lines = p.apply(TextIO.read().from("gs://.../*"));

// Split into words, then count occurrences of each word
PCollection<KV<String, Long>> wordCounts = lines
    .apply(FlatMapElements.into(TypeDescriptors.strings())
        .via((String line) -> Arrays.asList(line.split("\\W+"))))
    .apply(Count.perElement());

// Format each count as "word: n", then write text files
wordCounts
    .apply(MapElements.into(TypeDescriptors.strings())
        .via((KV<String, Long> count) -> count.getKey() + ": " + count.getValue()))
    .apply(TextIO.write().to("gs://.../..."));

p.run();
Beam PTransforms: ParDo with a DoFn ("map"), GroupByKey ("reduce"), Composite
Pillars of Beam: Unified model, Portability, Ecosystem
Unified Model: Batch doesn't exist
E / T / L: grows / evolves / computes updates (always expect new data). Growing data is temporal ⇒ all data has timestamps (event time: t_happened)
Dealing with new data: ParDo ⇒ apply to new data; GroupByKey ⇒ ?
Continuous aggregation. Idea: per-key buffering. GroupByKey turns (K, V) into (K, V[]) by keeping a separate group buffer for each key K_i: (K_i, V) ⇒ (K_i, V[])
[Figure: for key K_i along the event-time axis t, values V arrive at t_in and the buffered V[] is emitted at t_out.] See: Streams and Tables https://www.infoq.com/presentations/beam-model-stream-table=theory
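The per-key buffering idea can be sketched in plain Java. This is only an illustration under stated assumptions, not Beam's API or how any runner implements GroupByKey: the class name `PerKeyBuffer` and its methods `add` and `group` are hypothetical, and the buffer lives in memory with no timestamps or triggers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy per-key buffer (hypothetical, NOT the Beam API): (K, V) in, (K, V[]) out. */
final class PerKeyBuffer<K, V> {
    // One growing buffer per key K_i.
    private final Map<K, List<V>> buffers = new LinkedHashMap<>();

    /** Buffers a newly arrived (key, value) pair. */
    void add(K key, V value) {
        buffers.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    /** Returns a copy of the values buffered so far for the key (empty if none). */
    List<V> group(K key) {
        return new ArrayList<>(buffers.getOrDefault(key, List.of()));
    }

    public static void main(String[] args) {
        PerKeyBuffer<String, Integer> buffer = new PerKeyBuffer<>();
        buffer.add("a", 1);
        buffer.add("b", 2);
        buffer.add("a", 3); // new data for an existing key extends its group
        System.out.println("a -> " + buffer.group("a"));
        System.out.println("b -> " + buffer.group("b"));
    }
}
```

The sketch leaves open the question the slides raise next: when should a group be emitted if new data can always arrive?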
Continuous aggregation. Idea: temporal windowing. Example: element (k, v) for key K_i with event time 14:03. Each element counts toward 1 or more windows. Watermark T ⇒ closes old windows. Apply a (user-specified) trigger: drop / add to buffer / emit buffer
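A minimal plain-Java sketch of temporal windowing for a single key follows. It is not Beam's API, and it makes simplifying assumptions that the slide does not: windows are fixed-size (so each element falls into exactly one window rather than "1 or more"), timestamps are plain `long` values (here, minutes since midnight), and the only trigger is "emit the buffer once the watermark passes the end of the window". The names `FixedWindowBuffer`, `windowStart`, `add`, and `advanceWatermark` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy event-time windowing for one key (hypothetical, NOT the Beam API). */
final class FixedWindowBuffer {
    private final long windowSize; // window length in event-time units
    // Open windows: window start -> values buffered for that window.
    private final TreeMap<Long, List<String>> open = new TreeMap<>();
    private long watermark = Long.MIN_VALUE;

    FixedWindowBuffer(long windowSize) {
        this.windowSize = windowSize;
    }

    /** Start of the fixed window that contains the given event time. */
    long windowStart(long eventTime) {
        return Math.floorDiv(eventTime, windowSize) * windowSize;
    }

    /** Adds the value to its window's buffer, or drops it (returns false) if that window is closed. */
    boolean add(long eventTime, String value) {
        long start = windowStart(eventTime);
        if (start + windowSize <= watermark) {
            return false; // late data: the watermark already closed this window
        }
        open.computeIfAbsent(start, s -> new ArrayList<>()).add(value);
        return true;
    }

    /** Advances the watermark, then emits and closes every window ending at or before it. */
    Map<Long, List<String>> advanceWatermark(long newWatermark) {
        watermark = Math.max(watermark, newWatermark);
        Map<Long, List<String>> emitted = new TreeMap<>();
        while (!open.isEmpty() && open.firstKey() + windowSize <= watermark) {
            Map.Entry<Long, List<String>> closed = open.pollFirstEntry();
            emitted.put(closed.getKey(), closed.getValue());
        }
        return emitted;
    }
}
```

With 60-minute windows, an element at 14:03 (minute 843) lands in the window starting at 14:00 (minute 840); advancing the watermark to 15:00 emits that window, and a later element stamped 14:10 is dropped. Other triggers (for example, emitting early or accepting late data) would replace the fixed rule inside `add` and `advanceWatermark`.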
There is no batch / streaming. Only different ways to control aggregation.
Portability (vision for 2018)
Code in any supported language (or a mix) ⇒ portable pipeline representation ⇒ run on any supported runner
No vendor lock-in: run any language on any runner
No language lock-in. Users: use all transforms from all languages. Library authors: will be usable by all languages
Accelerated ecosystem growth: new runner / new SDK ⇒ access all Beam libraries
Ecosystem
[Diagram: Beam ecosystem layers: community; user code (powered by Beam); third-party IO, SQL, and other libs; language SDKs; portable unified model; runners.]
250 contributors, 31 committers (11 orgs), ~5000 PRs, ~12,500 commits, 25+ IO connectors, 5 stable releases, 9 runners
Thank you!
