Pentaho 8.0 and Beyond Matt Howard Pentaho Sr. Director of Product - - PowerPoint PPT Presentation

SLIDE 1

Pentaho 8.0 and Beyond

Matt Howard, Pentaho Sr. Director of Product Management, Hitachi Vantara

SLIDE 2

Safe Harbor Statement

The forward-looking statements contained in this document represent an outline of our current intended product direction. It is provided for information purposes only and is not a commitment to deliver any new or enhanced product or functionality, or that we will pursue the product direction described. Facts and circumstances may occur which may impact current plans, resulting in changes to the information in this presentation. This information is current only as of the date it is made and should not be relied upon in making purchasing decisions. The development, release (if at all), and timing of any features or functionality described for the Pentaho products remain at the sole discretion of Pentaho.

SLIDE 3

Pentaho 8.0 and Beyond

1. Product Vision
2. Pentaho 8.0
3. Product Roadmap

SLIDE 4

Product Vision

SLIDE 5

The Power of Three

  • Hitachi Data Systems: Content platform, Storage solutions
  • Pentaho: Data Integration, Business Analytics
  • Hitachi Insight Group: Lumada IoT

SLIDE 6

Data sources: Operational Data, Big Data, Data Stream, Public/Private Clouds

Users: Consumer, Business Analyst, Data Analyst / Data Scientist, Data Engineer

Pentaho Business Analytics Platform

  • Custom and Self-Service Dashboards
  • Interactive Query and Analysis
  • Production Reporting

Pentaho Data Integration

  • Data Preparation | Integrated Machine Learning

Open and Embeddable

SLIDE 7

Future Vision: A Single Consistent Experience

Data Prep | Data Engineering | Analytics

Ingestion, Processing, Blending, Data Delivery, Data Discovery / Analysis, Analysis & Dashboards

Administration, Security, Lifecycle Management, Data Provenance, Dynamic Data Pipeline, Monitoring, Automation

SLIDE 8

Pentaho 8.0

SLIDE 9

Introducing Pentaho 8.0

Challenge #1: Data volumes and velocity are growing exponentially.

Pentaho 8.0 broadens connectivity to streaming data sources:

  • Connect to Kafka streams
  • Stream processing with Spark
  • Big data security with Knox

Challenge #2: Processing and storage resources are constrained.

Pentaho 8.0 optimizes processing resources:

  • Enhanced Adaptive Execution Layer (AEL)
  • Native Avro and Parquet handling
  • Worker nodes for scale-out

Challenge #3: There is a shortage of big data talent and a lack of productivity.

Pentaho 8.0 boosts team productivity across the pipeline:

  • Data Explorer filters
  • Improved repository UX
  • Extended operations mart

SLIDE 10

Streaming for Time-Sensitive Insight

Enable use cases that require real-time processing, monitoring, and aggregation:

  • Real-time device monitoring
  • Log-file aggregation
  • Notifications
  • And more…

NEW in Pentaho 8.0

  • Kafka Producer step
  • Kafka Consumer step
  • Get records from stream step
  • Spark Streaming via AEL
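The micro-batching pattern behind stream processing of this kind can be sketched in plain Python. This is a generic illustration under stated assumptions, not Kafka client code and not the PDI API: the helper names `micro_batches` and `count_log_levels` and their parameters are hypothetical. Records from a stream are grouped into batches that close when either a record-count limit or a time limit is reached, and each batch is then aggregated. Here, counting log levels stands in for log-file aggregation.

```python
import time
from collections import Counter
from typing import Callable, Iterable, Iterator, List


def micro_batches(records: Iterable[str], max_records: int, max_seconds: float,
                  clock: Callable[[], float] = time.monotonic) -> Iterator[List[str]]:
    """Group a record stream into batches (hypothetical helper, not a PDI API).

    A batch is emitted once it holds max_records items or once max_seconds
    have passed since its first record. For simplicity the time limit is only
    checked when a new record arrives; a production consumer would use a timer.
    """
    batch: List[str] = []
    started = clock()
    for record in records:
        if not batch:
            started = clock()          # the time window opens with the first record
        batch.append(record)
        if len(batch) >= max_records or clock() - started >= max_seconds:
            yield batch
            batch = []
    if batch:                          # flush the final, partial batch
        yield batch


def count_log_levels(batch: List[str]) -> Counter:
    """Aggregate one batch: count lines per log level (the first word of each line)."""
    return Counter(line.split(" ", 1)[0] for line in batch)


if __name__ == "__main__":
    logs = ["ERROR disk full", "INFO started", "ERROR timeout", "WARN slow", "INFO done"]
    for batch in micro_batches(logs, max_records=2, max_seconds=60.0):
        print(dict(count_log_levels(batch)))
```

With a record limit of 2, the five sample lines form three batches, printed as `{'ERROR': 1, 'INFO': 1}`, `{'ERROR': 1, 'WARN': 1}` and `{'INFO': 1}`. Passing a custom `clock` makes the time-based cutoff easy to test without waiting.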

SLIDE 11

Pentaho 7.1 – Adaptive Execution for Spark

  • No coding
  • Build once
  • Execute on any* engine

Diagram labels: PDI, Pentaho Kettle

*Currently available engines

SLIDE 12

Enhanced Adaptive Execution

Simplified setup

  • Eliminated the ZooKeeper component
  • Reduced the number of setup steps

Hardened deployment

  • Failover at the edge
  • Kerberos impersonation for clients

More flexible

  • Support for multiple run configurations
  • Customizable cluster settings per job type

Architecture diagram: PDI Client → AEL-Spark Daemon on edge nodes → Hadoop cluster, where the AEL-Spark Engine runs as the Spark driver and Spark executors run on the Spark/Hadoop processing nodes → Hadoop/Spark-compatible storage cluster (HDFS, Azure Storage, Amazon S3, etc.)

SLIDE 13

Worker Nodes for Scaling Out

Scale work items across multiple nodes (containers):

  • Easily add and remove resources as required
  • Monitor and balance changing workloads
  • Deploy on-premises, in the cloud, or in hybrid environments

Diagram: work is distributed and scaled across Worker Node (a), Worker Node (b), Worker Node (c…)

NEW in Pentaho 8.0

  • Container framework
  • Orchestration framework
  • Node monitoring
  • Enhanced HA implementation

SLIDE 14

Worker Nodes Architecture

Diagram components:

  • Pentaho Server, with the Pentaho Repository and Pentaho Clients
  • Orchestration framework: “Executor” orchestration (scheduler, monitoring, security, etc.) and an HA controller with one working master and two standby masters
  • Container framework: worker nodes WN 1 (e.g., a job, KJB), WN 2 (e.g., a transformation, KTR), … WN n
  • Powered by …

SLIDE 15

Pentaho 7.0 – Data Explorer

Access visualizations during data prep for inspection and prototyping

SLIDE 16

Data Explorer Filters

Enhanced data inspection in PDI:

  • Identify data to be cleaned or removed
  • Deliver data to the business more quickly

ENHANCED in Pentaho 8.0

  • Numeric filters
  • String filters
  • Include/exclude data points
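Numeric filters, string filters, and include/exclude choices compose naturally as row predicates. A minimal Python sketch of that idea follows; it is not PDI's Data Explorer implementation, and the helpers `numeric_filter`, `string_filter`, and `apply_filter` are hypothetical names chosen for illustration.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]


def numeric_filter(field: str, low: Optional[float] = None,
                   high: Optional[float] = None) -> Predicate:
    """Match rows whose numeric field lies in [low, high]; a None bound is open."""
    def pred(row: Row) -> bool:
        value = row.get(field)
        if not isinstance(value, (int, float)):
            return False               # missing or non-numeric values never match
        return (low is None or value >= low) and (high is None or value <= high)
    return pred


def string_filter(field: str, substring: str, case_sensitive: bool = False) -> Predicate:
    """Match rows whose string field contains the substring."""
    def pred(row: Row) -> bool:
        value = row.get(field)
        if not isinstance(value, str):
            return False
        if case_sensitive:
            return substring in value
        return substring.lower() in value.lower()
    return pred


def apply_filter(rows: Iterable[Row], pred: Predicate, include: bool = True) -> List[Row]:
    """Keep matching rows (include=True) or keep everything else (include=False)."""
    return [row for row in rows if pred(row) == include]


if __name__ == "__main__":
    rows = [
        {"city": "Austin", "sales": 120},
        {"city": "Boston", "sales": 45},
        {"city": "austin", "sales": 80},
        {"city": "Denver", "sales": None},
    ]
    print(apply_filter(rows, numeric_filter("sales", low=50)))               # Austin/120, austin/80
    print(apply_filter(rows, string_filter("city", "AUS"), include=False))  # Boston, Denver
```

In this sketch, excluding with a numeric filter keeps rows whose value is missing, because such rows never match; how a real tool should treat missing values is a design choice the sketch does not settle.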

SLIDE 17

Pentaho 8.0 – Complete

Data Integration

  • Filters in Data Explorer for enhanced data inspection during prep
  • New PDI Repository dialogs for better usability
  • Run Configurations for Jobs for a seamless user experience

Big Data

  • Stream Data Processing to simplify near-real-time integration with Kafka
  • Enhanced AEL for reliability, performance, and security
  • Big Data File Formats to support crucial Hadoop use cases
  • Big Data Security with HDP Knox Gateway
  • VFS Improvements for named Hadoop clusters

Enterprise Platform

  • Worker Nodes Scale-Out to drive superior agility and lower TCO for enterprises
  • Ruby Theme – new platform branding

Additional Items

  • Ops Mart for Oracle, MySQL, SQL Server
  • Big Data Sandbox VM updates
  • Platform password security improvements
  • PDI Mavenization for infrastructure alignment
  • Documentation improvements on help.pentaho.com

SLIDE 18

Product Roadmap

SLIDE 19

Roadmap Initiatives

Pentaho foundational investment areas: Enterprise Platform, Big Data Processing, Visual Data Experience

Emerging trends and technology: Advanced Analytics | Real-time

Initiatives: Scale-out Deployment, Metadata Management, Operations Management, Cloud Deployment, Adaptive Execution, Spark Execution, Stream Processing, Machine Learning, Data Exploration, Visual Data Prep, Embedded Analytics, Data Catalog

SLIDE 20

Strengthening the Bridge Between Data and Insight

Diagram: Source 1 to Source 5, Data Explorer, Catalog

  • Visual data inspection
  • Intuitive data prep
  • Advanced visualization
  • Governed access
  • Searchable metadata
  • Collaboration

SLIDE 21

Inline Data Prep – Vision

Intuitive, Excel-like transformation design

Mockup callouts: Integrated Profiling, Inline Model, Merge Fields, Inline Transformation

Field Statistics (sample values from the mockup):

  • Field type: Integer
  • Records: 10,000
  • Cardinality: 273
  • Min <count>: 1
  • Max <count>: 23
  • Bin size (%): Quintile
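The Field Statistics panel summarizes a single column. The Python sketch below computes comparable figures for a list of integers. It is an illustration only, not Pentaho's profiling code. Reading "Min <count>" and "Max <count>" as the smallest and largest number of occurrences of any one value is an assumption, and the function name `field_statistics` is hypothetical. Quintile binning is represented by the four cut points returned by `statistics.quantiles(values, n=5)`.

```python
from collections import Counter
from statistics import quantiles
from typing import Dict, List


def field_statistics(values: List[int]) -> Dict[str, object]:
    """Profile one integer field (hypothetical helper, illustration only)."""
    if not all(isinstance(v, int) for v in values):
        raise TypeError("this sketch only profiles integer fields")
    if len(values) < 2:
        raise ValueError("need at least two records to compute quintiles")
    counts = Counter(values)                      # occurrences of each distinct value
    return {
        "field_type": "Integer",
        "records": len(values),
        "cardinality": len(counts),               # number of distinct values
        "min_count": min(counts.values()),        # assumed meaning of "Min <count>"
        "max_count": max(counts.values()),        # assumed meaning of "Max <count>"
        "quintile_cuts": quantiles(values, n=5),  # 4 cut points -> 5 bins of ~20% each
    }


if __name__ == "__main__":
    stats = field_statistics([1, 1, 2, 3, 3, 3, 4, 5, 6, 7])
    print(stats["records"], stats["cardinality"], stats["min_count"], stats["max_count"])  # 10 7 1 3
```

For the ten sample records there are seven distinct values; the value 3 occurs most often (three times), and several values occur only once.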

SLIDE 22

Pentaho Machine Learning Orchestration

Roadmap projects that serve the emerging needs of data scientists:

  • Data Explorer
  • Notebook Integrations
  • Native Algorithms
  • Catalog
  • Adaptive Execution

SLIDE 23

Pentaho Roadmap

Features and dates are subject to change.

Timeline: Nov 2017 | 1H18 (8.1) | Future

VISUAL DATA EXPERIENCE

  • Data Explorer Filters
  • Catalog I
  • Visual Profiling
  • Catalog Search
  • Data Prep from DET
  • Layout Manager
  • New User Console
  • Data Science Viz
  • Real-time Viz

(BIG) DATA PROCESSING

  • Kafka Interface
  • Spark Streaming
  • Parquet and Avro
  • Enhanced AEL
  • Streaming II
  • Enhanced JSON/XML/ORC
  • AEL - extend distros
  • Advanced Profiling
  • Rules Validator
  • Native ML algorithms
  • AEL – Flink
  • Thin Kettle (Composer)
  • Web Designer
  • Data Operations Mgr.
  • AEL – Next

ENTERPRISE PLATFORM

  • Scale-out Framework
  • Foundry Integration
  • Unified Monitoring
  • Harden Metadata Bridges
  • Vantara Integrations
  • Enhanced Upgrade
  • Enhanced Security
  • New Content Lifecycle
  • Vantara Integrations
  • Metadata Manager
  • Business Glossary
  • Multi-tenancy
  • Vantara Integrations

ECOSYSTEM

  • AEL HDP, MapR
  • Google Cloud Platform
  • Cassandra/NoSQL Update
  • Multi-cloud Orchestration
  • Cloud App Connectors
  • Mainframe
  • Enhanced SAP and SFDC

SLIDE 24

Hitachi Vantara Portfolio

Foundry Service Platform: Workflow, Scheduling, Security, Clustering, Monitoring, Repository, Search

Application Studio: Dashboards, Visualization, Notifications, App Development

Storage: Converged Infrastructure, Automated Management, Data Protection, Flash Storage

Data Integration | Asset Management | Analytics | Edge Processing

  • Asset registry
  • Data catalog
  • Metadata management
  • Modeling and lineage
  • Governance
  • Data connectors
  • Transformation engines
  • Profiling and quality
  • Data blending
  • Data preparation
  • Business analytics
  • Content analytics
  • Artificial intelligence
  • Batch and stream

Diagram labels: Software Platform, Application Framework, Storage, Edge Processing, Asset Management, Analytics, Data Integration

SLIDE 25

IoT Solutions – from Edge to Outcomes

Diagram flow: Sensors, Things, People → Edge (fog layer) → Core → Insights → Outcomes

Components: Edge Filtering, Telemetry, Asset Registry, Stream Queues, Lumada IoT Data Pipeline, IoT Analytic Processor

Stages: Ingest, Process, Visualize, Model, Predict, Notify

Outcomes: Smart City, Smart Business, Smart Data Center, Smart Industry

SLIDE 26

Unlock the Business Value in YOUR Data

YOUR DATA: Video, Image and Audio; Email and Documents; Transactional Data; IT, Sensor and Machine Logs; Social Media

Hitachi Content Platform

YOUR STRATEGY: Need for Better Insights to Achieve Better Outcomes

Big Data Analytics (Pentaho) | Content Exploration (Hitachi Content Intelligence)

YOUR INSIGHTS

SLIDE 27

The Power of Three

  • Hitachi Data Systems: Content platform, Storage solutions
  • Pentaho: Data Integration, Business Analytics
  • Hitachi Insight Group: Lumada IoT

SLIDE 28

Summary

SLIDE 29

Summary

What we covered today:

  • Product Vision
  • Pentaho 8.0 Release
  • Product Roadmap

SLIDE 30

Next Steps

Want to learn more about Pentaho 8.0 and the product roadmap?

  • Other recommended breakout sessions:
    – Processing Big Data with Pentaho: Rakesh Saha
    – Operating Pentaho at Scale: Jens Bleul

  • Solution Expo:
    – Pentaho 8.0 and Beyond
    – Lumada IoT Platform
    – Hitachi Content Platform
    – Spark Processing
    – And more…

SLIDE 31

SLIDE 32

Pentaho 8.1 – Preview

Some Candidate Projects

  • Enhanced Streaming
  • Enhanced Profiling
  • Google Cloud Platform
  • Unified Monitoring and Logging
  • Enhanced Metadata Handling

Pentaho 8.1 Expected Availability Q2 2018