
Roadmap: Operating Pentaho at Scale
Jens Bleuel, Senior Product Manager, Pentaho

Agenda – Worker Nodes
Hear about new upcoming capabilities for scaling out the Pentaho platform in large enterprise operations. This will cover 8.0 and roadmap topics.
• Worker Nodes: Overview and Business Benefits
• How Is This Different from AEL / Hadoop MapReduce
• Typical Customer Scenarios
• Architecture & Capabilities including Monitoring & Logging
• Improvements in Related Areas
• Demonstration
• Availability & Roadmap

Worker Nodes – Overview
• Worker Nodes can scale work items across multiple nodes (containers), such as:
– PDI jobs and transformations (in 8.0)
– Report executions (not in 8.0)
– […]
• It operates easily and securely across an elastic architecture, which adds machine resources as they are required for processing
• Worker Nodes can operate on premises or in the cloud
• Uses popular technologies under the hood, such as Docker (container platform), Chronos (scheduler), and Mesos/Marathon (container orchestration)
[Diagram: Distribute and Scale across Worker Node (a), Worker Node (b), Worker Node (c…)]

Worker Nodes – Business Benefits
Large enterprises need the ability to seamlessly and efficiently spin up resources to handle hundreds (or more) of work items at different times, with different dependencies and processing requirements. Worker Nodes addresses these needs and delivers:
• Faster time to value and reduced TCO, because it enables customers to deploy their own scale-out processes without required services
• More efficient management of changing workloads, by spinning resources up and down as needed
• Increased business agility thanks to containerization, which enables portability of processes across on-prem and cloud environments without the need to re-engineer them
– Even in pure on-prem deployments, Worker Nodes provides elasticity and resource optimization

How Is This Different from AEL / Hadoop MapReduce?
AEL / Hadoop MapReduce (simplified): SCALE OUT ON DATA
• Data is distributed across nodes
• The processing takes place at the node level
• Helps to scale out data volume
Worker Nodes (simplified): SCALE OUT ON PROCESSES (WORK ITEMS)
• Work items such as PDI jobs and PDI transformations get distributed across nodes; this is about the processing and orchestration (in contrast to distributing data)
• Helps to scale out Pentaho processes
These two architectures can also be combined: within a Worker Node, a PDI transformation can also scale out with AEL or MapReduce.
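The contrast above can be sketched in a few lines of Python. This is only an illustration, not Pentaho code: threads stand in for nodes, a plain sum stands in for the processing, and the function names and the work item file names in the usage note are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_out_on_data(rows, node_count):
    """AEL / MapReduce style: ONE job whose data is split across nodes."""
    # Each "node" receives a slice of the data and processes it locally.
    partitions = [rows[i::node_count] for i in range(node_count)]
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partial_sums = list(pool.map(sum, partitions))
    # The partial results are combined into one answer.
    return sum(partial_sums)


def run_work_item(item):
    """Run one whole work item (stand-in for a PDI job or transformation)."""
    name, task = item
    return name, task()


def scale_out_on_processes(work_items, node_count):
    """Worker Nodes style: MANY independent work items spread across nodes."""
    # Each work item runs in full on whichever "node" picks it up.
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        return dict(pool.map(run_work_item, work_items))
```

For example, `scale_out_on_data(list(range(1, 101)), 4)` splits one data set four ways, while `scale_out_on_processes([("load_customers.ktr", lambda: "done"), ("nightly_batch.kjb", lambda: "done")], 2)` runs two separate work items side by side.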

Typical Customer Scenarios
Customer Type                         | Typical Number of Work Items | Scale-Out Need
Small                                 | Up to 10                     | No
Medium                                | 10 through 100               | Sometimes
Enterprise with one department        | +/- 100                      | Yes
Enterprise with multiple departments  | Hundreds or thousands        | Yes

Typical Customer Examples – SLAs and Time Windows
• Need to meet customer SLAs
– Data from hundreds of sources needs to be collected and aggregated
– This is done by hundreds of PDI jobs and transformations
– All these jobs and transformations need to finish within a defined time window (for example, between 5am and 7am) so that the data is available and accurate for the target audience
• Worker Nodes provides the technology to run processes in parallel and scale out when needed, for example at peak times (end of month)
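To make the time-window constraint concrete, here is a rough back-of-the-envelope sizing sketch. It assumes the jobs are independent, roughly equal in duration, and can be packed evenly onto nodes, so the result is a lower bound rather than a sizing rule. The function name and all numbers are hypothetical and not taken from the presentation.

```python
import math


def nodes_needed(job_count, avg_minutes_per_job, window_minutes, jobs_per_node=1):
    """Lower bound on nodes required to finish all jobs inside the window."""
    # Total work to be done, expressed in job-minutes.
    total_minutes = job_count * avg_minutes_per_job
    # Work one node can absorb during the window.
    capacity_per_node = window_minutes * jobs_per_node
    return math.ceil(total_minutes / capacity_per_node)
```

With 300 jobs of about 4 minutes each and a 120-minute window (5am to 7am), `nodes_needed(300, 4, 120)` gives 10 nodes when each node runs one job at a time, and `nodes_needed(300, 4, 120, jobs_per_node=2)` gives 5.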

Typical Customer Examples – Shared Services
Example of one project:
• 800 daily batches from different departments in an enterprise
• One server with 120GB memory and many CPUs
• This machine hosts many VMs in parallel
Issue: When there is too much workload, one machine is not enough
• Worker Nodes solves this by scaling out on a cluster

Typical Customer Examples – Scalable on Demand
• Need to support growing data volumes and customer requirements
• Worker Nodes provides a flexible and scalable architecture on-premise or in the cloud for growing demand
• This is seamless and does not require changes to the underlying architecture
[Diagram: Distribute and Scale. BASE TIMES: Worker Nodes (1) to (3). PEAK TIMES: Worker Nodes (1) to (5).]
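The base-versus-peak picture can be expressed as a simple scaling rule. The sketch below is an assumption about how such a rule might look, not the logic Worker Nodes uses: it derives a node count from the number of queued work items and clamps it between a base and a peak size. The defaults of 3 and 5 mirror the diagram; the per-node capacity in the usage note is hypothetical.

```python
import math


def desired_worker_nodes(queued_items, items_per_node, min_nodes=3, max_nodes=5):
    """Pick a node count for the current backlog, clamped to [min_nodes, max_nodes]."""
    # Nodes required if every node takes items_per_node work items.
    needed = math.ceil(queued_items / items_per_node)
    # Never drop below the base size, never exceed the peak size.
    return max(min_nodes, min(max_nodes, needed))
```

Assuming 20 work items per node, `desired_worker_nodes(40, 20)` stays at the base size of 3, `desired_worker_nodes(95, 20)` grows to 5, and a larger backlog such as `desired_worker_nodes(500, 20)` is capped at 5.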

Worker Nodes – New in 8.0
• Containerized scale-out Orchestration Framework
• Pentaho PDI “work items”
[Diagram labels: WORKER NODES; Pentaho Clients; Pentaho Server; Orchestration (Scheduler, monitoring, security, etc.); Controller; Master (Working), Master (Standby), Master (Standby); Container Framework with WN 1, WN 2, WN …n, each an “Executor” (e.g. KJB, e.g. KTR); Pentaho Repository]

Worker Nodes Capabilities
• Deploy consistently in physical, virtual, and cloud environments
– Adapts to customer needs (bare metal vs. virtualization vs. cloud), with no need to modify the product when the strategy changes.
• Scale and load balance services
– This helps to deal with peaks and limited time windows by allocating the resources that are needed.
• Hybrid deployments can be used to distribute load
– When the on-premise resources are not sufficient, scaling out into the cloud can provide more resources.

Monitoring and Logging

Monitoring – Overview

Monitoring – Worker Node Example

Improvements in Related Areas: Open and Save Dialogs

Pain Point: Save a New Job/Transformation
• Whenever you save a new transformation/job into the repository, the default folder is set to the user’s home folder.
In previous versions: The user needs to change the folder every time they save a new transformation or job.

New Save Dialog in 8.0 – Overview
• Remembers the last opened folder!
• Just enter the filename! (and/or change the folder)
• Similar to the Open Dialog with additional functionality (see next slide).

New Open Dialog in 8.0 – Overview
[Screenshot callouts: Search, Recents]
Open shows the last opened folder. This is a big time saver!

Improvements in Related Areas: Run Configurations

Pain Point: Remote Pentaho Server Execution before 8.0
To execute on the Pentaho Server before 8.0, you needed to define a Slave server and provide the credentials, then execute on the selected server.

Execute on the Pentaho Server
• By selecting the Pentaho server option, you no longer need to define a Slave server when you want to execute remotely.
• Behind the scenes, this option executes the transformation or job via the Scheduler. This is the same as doing a “Schedule Now.”
This new functionality improves ease of use, including for Worker Nodes.

Run Configurations within Job Entries
• Run Configurations can be used in the Run dialog and also in the job entries that can execute jobs or transformations remotely and on Worker Nodes
[Screenshot labels: 7.1, Example, 8.0]

Demonstration

Availability and Roadmap

Availability
• Worker Nodes is EE only
• Initially, 8.0 Worker Nodes will be Limited Availability
– Fully supported, production deployment
– Distribution to a limited number of customers
• Requires additional download and implementation services

Roadmap
• Pentaho Server & Repository as a Service including High Availability
• Improved Monitoring and Logging
• Extend to other Pentaho work items such as Reports
• Integrated with other Hitachi Vantara Services and Products
[Diagram labels: Container Framework with Pentaho Server, WN 1, WN 2, WN …n, each an “Executor” (e.g. KJB, e.g. KTR); Pentaho Repository]

Summary
What we covered today:
• The upcoming capabilities for scaling out the Pentaho platform and when to use them
• How to use the new way of scaling out work items (Pentaho processes such as PDI jobs and transformations) across multiple nodes

Next Steps
Want to learn more?
• Meet-the-Expert:
– Pedro Teixera
• Other recommended breakout sessions:
– Matt Howard: Pentaho 8.0 and Roadmap
– Rakesh Saha and Jens Bleuel: Roadmap: Processing Big Data
– Matt Casters: PDI Best Architecture Practices
– Steve Szabo: PDI Sizing Overview and Case Study
– Jonathan Jarvis: Understanding Parallelism with PDI and Adaptive Execution with Spark
– Mark Burnett: Understanding the Big Data Technology Ecosystem
