Better TV & Broadband with Kafka & Spark Phill Radley - PowerPoint PPT Presentation

Nov 02, 2023


Better TV & Broadband with Kafka & Spark Phill Radley Chief Data Architect British Telecommunications plc
In the beginning ( 2012 )
Hadoop HaaS Hadoop - Admin as a Service Admin Group
Early adoption
“Spark will replace map/reduce as the standard execution for Hadoop” Doug Cutting – Sep 2015
HaaS 2.0: denser nodes (doubled #cores, trebled RAM), same node count
Cluster migration
TV Set Top Box; Broadband Home Hub
TV & BB Data Pipeline Overview
[Diagram. Labels recoverable from the slide: Firewall, Gateway, Kafka Broker (raw topic, big XML payload), Spark consumer and producer stages (Enrich, Atomic, Aggregate) on the YARN cluster, Flume, HDFS, Hive and Impala tables, and enrichment data from ESB, CRM and HaaS.]
Data Ingest Kafka - Raw topic
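The stages named on the pipeline slide (raw XML payload, atomic records, enrichment, aggregation) can be illustrated with a minimal plain-Python sketch. Everything specific here is an assumption: the XML layout, the field names (`device_id`, `metric`, `value`), the `CRM_LOOKUP` table, and the order of the stages are hypothetical, and ordinary functions stand in for the Kafka consumer and the Spark jobs, which the slides do not describe in detail.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical raw payload, standing in for one message on the raw topic.
RAW_XML = """
<report device_id="hub-001">
  <metric name="wifi_drops" value="3"/>
  <metric name="reboots" value="1"/>
</report>
"""

# Hypothetical enrichment data (the slides only mention CRM enrichment data).
CRM_LOOKUP = {"hub-001": {"region": "north"}}


def to_atomic(raw_xml):
    """Split one XML payload into flat records, one per metric."""
    root = ET.fromstring(raw_xml)
    device_id = root.get("device_id")
    return [
        {
            "device_id": device_id,
            "metric": m.get("name"),
            "value": int(m.get("value")),
        }
        for m in root.findall("metric")
    ]


def enrich(record, lookup):
    """Attach enrichment attributes; unknown devices get region None."""
    extra = lookup.get(record["device_id"], {"region": None})
    return {**record, **extra}


def aggregate(records):
    """Sum metric values per (region, metric) pair."""
    totals = defaultdict(int)
    for r in records:
        totals[(r["region"], r["metric"])] += r["value"]
    return dict(totals)


atomic = [enrich(r, CRM_LOOKUP) for r in to_atomic(RAW_XML)]
summary = aggregate(atomic)
```

In a real deployment each function would presumably run inside a streaming job reading from and writing to Kafka topics; the sketch only shows the shape of the data at each step.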
Data Serving – Impala Concurrency
Schema Design … on read … DEVOPS approach
- Flat (de-normalised) tables, one table per query
- Queried with SELECT * FROM … WHERE …
- Table dimensions (rows & columns)
- Table file formats optimised for the table's query pattern (up to 10x difference):
  1. Avro for tables queried with row-oriented queries
  2. Parquet: default, time series
  3. Parquet with Snappy compression for deep time queries
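The format-per-query-pattern rule on this slide amounts to a small lookup. The sketch below is only an illustration: the pattern keys (`row_oriented`, `time_series`, `deep_time`) and the function name are hypothetical labels, while the three format choices are the ones listed on the slide.

```python
# Formats as listed on the slide; the dictionary keys are hypothetical
# labels for the three query patterns it names.
FORMAT_BY_PATTERN = {
    "row_oriented": "Avro",
    "time_series": "Parquet",
    "deep_time": "Parquet with Snappy compression",
}


def choose_format(query_pattern):
    """Return the file format the slide pairs with a query pattern."""
    try:
        return FORMAT_BY_PATTERN[query_pattern]
    except KeyError:
        raise ValueError(f"unknown query pattern: {query_pattern!r}") from None
```

Because each table serves a single query, the format can be decided per table when it is designed rather than as a cluster-wide default.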
Impala Tuning…
- There are lots of options; the defaults will not be good enough (it's not as mature as an Oracle DB ;-)
- Isolate operational tenant loads with their own dedicated Impala resource pool
  - "Dedicated SQL Queue" added to platform service portfolio
  - Chargeable platform feature (as it's a dedicated resource)
- Tuning Impala daemons
  - Query executor & scanner threads for max concurrency, shortest queue
- HDFS caching
  - Currently in test, expecting a 2-5x speed-up; more importantly, it eliminates unnecessary physical I/O (these are hot tables, keep them in memory)
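Two of the tuning items above map onto Impala SQL statements: the `REQUEST_POOL` query option routes a session's queries to a named resource pool, and `ALTER TABLE … SET CACHED IN` places a table in an HDFS cache pool. The helper below is a hedged sketch that only builds those statement strings. The pool and table names are hypothetical, and the slides do not say which exact mechanism or settings were used in production.

```python
import re

# Conservative identifier check (letters, digits, underscore, dot) so the
# helpers never interpolate arbitrary text into a statement.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_.]*$")


def _check(name):
    if not _IDENT.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def request_pool_statement(pool):
    """Statement that routes the session's queries to a resource pool."""
    return f"SET REQUEST_POOL={_check(pool)};"


def cache_table_statement(table, cache_pool):
    """Statement that asks Impala to keep a table in an HDFS cache pool."""
    return f"ALTER TABLE {_check(table)} SET CACHED IN '{_check(cache_pool)}';"


# Hypothetical names, for illustration only.
statements = [
    request_pool_statement("root.tv_ops"),
    cache_table_statement("tv.hot_metrics", "hot_tables"),
]
```

The statements would be issued through whatever Impala client the tenant already uses; the HDFS cache pool itself has to exist before a table can be cached in it.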
Conclusions after months in production…
- Spark 1.6 very stable
- Impala requires a lot of tuning & table design to get working
- High demand to use the data for other customer experience work
- This solution runs on a multi-tenant cluster running hundreds of batch loads and dozens of ad-hoc self-service analytics and data science users, i.e. the isolation using cgroups seems to work (mostly)
- Next steps
  - Another similar data pipeline from the internal network
  - Multi-tenant Kafka (Topic as a Service) to service more clients
  - Second data centre site with dual ingest for high availability
Thank you
