Pentaho 8.0 and Beyond
Matt Howard, Pentaho Sr. Director of Product Management, Hitachi Vantara
Safe Harbor Statement
The forward-looking statements contained in this document represent an outline of our current intended product direction. It is provided for information purposes only and is not a commitment to deliver any new or enhanced product or functionality, or that we will pursue the product direction described. Facts and circumstances may occur which may impact current plans, resulting in changes to the information in this presentation. This information is current only as of the date it is made and should not be relied upon in making purchasing decisions. The development, release (if at all), and timing of any features or functionality described for the Pentaho products remains at the sole discretion of Pentaho.
Pentaho 8.0 and Beyond
1. Product Vision
2. Pentaho 8.0
3. Product Roadmap
Product Vision
The Power of Three
- HITACHI DATA SYSTEMS: Content platform, Storage solutions
- PENTAHO: Data Integration, Business Analytics
- HITACHI INSIGHT GROUP: Lumada IoT
Data sources: Operational Data, Big Data, Data Stream, Public/Private Clouds
Users: Consumer, Business Analyst, Data Analyst / Data Scientist, Data Engineer
Pentaho Business Analytics Platform
- Custom and Self-Service Dashboards
- Interactive Query and Analysis
- Production Reporting
Pentaho Data Integration
- Data Preparation | Integrated Machine Learning
OPEN AND EMBEDDABLE
Future Vision: A Single Consistent Experience
- Data Prep, Data Engineering, Analytics
- Ingestion, Processing, Blending, Data Delivery, Data Discovery / Analysis, Analysis & Dashboards
- Administration, Security, Lifecycle Management, Data Provenance, Dynamic Data Pipeline, Monitoring, Automation
Pentaho 8.0
Introducing Pentaho 8.0
Challenge #1: Data volumes and velocity are growing exponentially
Challenge #2: Processing and storage resources are constrained
Challenge #3: Shortage of Big Data talent and lack of productivity
Pentaho 8.0 Broadens connectivity to streaming data sources
- Connect to Kafka streams
- Stream processing with Spark
- Big data security with Knox
Pentaho 8.0 Optimizes processing resources
- Enhanced Adaptive Execution (AEL)
- Native Avro and Parquet handling
- Worker nodes for “Scale-out”
Pentaho 8.0 Boosts team productivity across the pipeline
- Data explorer filters
- Improved repository UX
- Extended operations mart
Streaming for Time Sensitive Insight
Enable use cases that require real-time processing, monitoring and aggregation
- Real-time device monitoring
- Log-file aggregation
- Notifications
- And more…
NEW in Pentaho 8.0
- Kafka Producer Step
- Kafka Consumer Step
- Get records from stream Step
- Spark streaming via AEL
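To make the log-file aggregation use case above more concrete, here is a minimal Python sketch of a tumbling-window count over a stream of events. It is an illustration only and not how the PDI Kafka steps or Spark streaming via AEL are implemented; the function name `tumbling_window_counts`, the 10-second window, and the sample events are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count keys per fixed-size (tumbling) time window.

    `events` is an iterable of (timestamp_in_seconds, key) pairs.
    This is a hypothetical helper, not a Pentaho or Kafka API.
    """
    windows = defaultdict(Counter)
    for timestamp, key in events:
        # Integer division maps each timestamp to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        windows[window_start][key] += 1
    # Return plain dicts, ordered by window start time.
    return {start: dict(counts) for start, counts in sorted(windows.items())}


# Made-up log events: (seconds since start, log level).
sample_events = [(1, "INFO"), (4, "ERROR"), (9, "INFO"), (12, "ERROR"), (14, "ERROR")]
window_counts = tumbling_window_counts(sample_events, window_seconds=10)
print(window_counts)
```

In a real pipeline the events would arrive continuously from a stream consumer rather than from a list, and each window would be emitted as soon as it closes.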
Pentaho 7.1 – Adaptive Execution for Spark
- No Coding
- Build Once
- Execute on Any* Engine
PDI Pentaho Kettle
*Currently Available Engines
Enhanced Adaptive Execution
Simplified setup
- Eliminated “Zookeeper” component
- Reduced number of setup steps
Hardened deployment
- Fail-over at the edge
- Kerberos impersonation for client
More flexible
- Support multiple run configurations
- Customize cluster settings per job type
Diagram components:
- PDI Client
- AEL-Spark Daemon on Edge Nodes
- HADOOP CLUSTER: Spark/Hadoop Processing Nodes, AEL-Spark Engine (Spark Driver), Spark Executors
- Hadoop/Spark Compatible Storage Cluster: HDFS, Azure Storage, Amazon S3, Etc…
Worker Nodes for Scaling Out
Scale work items across multiple nodes (containers)
- Easily add and remove resources as required
- Monitor and balance changing workloads
- Deploy on premise, cloud and hybrid
Distribute and Scale: Worker Node (a), Worker Node (b), Worker Node (c…)
NEW in Pentaho 8.0
- Container framework
- Orchestration framework
- Node monitoring
- Enhanced HA implementation
Worker Nodes Architecture
WORKER NODES
- Orchestration Framework
- Container Framework
- Pentaho Server
- “Executor” worker nodes: WN 1 (e.g. KJB), WN 2 (e.g. KTR), WN …n
- Orchestration (Scheduler, monitoring, security, etc.)
- Controller (HA): Master (Working), Master (Standby), Master (Standby)
- Pentaho Repository
- Pentaho Clients
Powered by …
Pentaho 7.0 – Data Explorer
Access visualizations during data prep for inspection and prototyping
Data Explorer Filters
Enhanced data inspection in PDI
- Identify data to be cleaned or removed
- Deliver data to the business more quickly
ENHANCED in Pentaho 8.0
- Numeric filters
- String filters
- Include/Exclude data points
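The three filter types listed above can be pictured with a small Python sketch. This is a hedged illustration of the idea only, not Data Explorer code; the helper names (`numeric_filter`, `string_filter`, `exclude_values`) and the sample rows are hypothetical.

```python
def numeric_filter(rows, field, minimum=None, maximum=None):
    """Keep rows whose numeric field lies within the optional bounds."""
    kept = []
    for row in rows:
        value = row[field]
        if minimum is not None and value < minimum:
            continue
        if maximum is not None and value > maximum:
            continue
        kept.append(row)
    return kept


def string_filter(rows, field, substring):
    """Keep rows whose string field contains the substring (case-insensitive)."""
    needle = substring.lower()
    return [row for row in rows if needle in row[field].lower()]


def exclude_values(rows, field, excluded):
    """Drop rows whose field value is one of the excluded data points."""
    return [row for row in rows if row[field] not in excluded]


# Made-up sample data with one negative value and one placeholder city.
rows = [
    {"city": "Lisbon", "sales": 120},
    {"city": "Orlando", "sales": -5},
    {"city": "London", "sales": 300},
    {"city": "unknown", "sales": 80},
]

# Identify data to be cleaned or removed: negative sales and placeholder cities.
cleaned = exclude_values(numeric_filter(rows, "sales", minimum=0), "city", {"unknown"})
matches = string_filter(cleaned, "city", "lon")
print([row["city"] for row in cleaned])
print([row["city"] for row in matches])
```

Chaining the filters mirrors the inspection workflow: narrow the data step by step, then decide what to clean or remove.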
Pentaho 8.0 – Complete
Data Integration
- Filters in Data Explorer for enhanced data inspection during prep
- New PDI Repository Dialogs for better usability
- Run Configurations for Jobs for seamless user experience
Big Data
- Stream Data Processing to simplify near real time integration with Kafka
- Enhanced AEL for reliability, performance, and security
- Big Data File Formats to support crucial Hadoop use cases
- Big Data Security with HDP Knox Gateway
- VFS Improvements for named Hadoop clusters
Enterprise Platform
- Worker Nodes Scale-Out to drive superior agility and TCO for enterprises
- Ruby Theme – new platform branding
Additional Items
- Ops Mart for Oracle, MySQL, SQL Server
- Big Data Sandbox VM updates
- Platform password security improvements
- PDI Mavenization for infra alignment
- Documentation improvements on help.pentaho.com
Product Roadmap
PENTAHO FOUNDATIONAL INVESTMENT AREAS
- Enterprise Platform: Scale-out Deployment, Metadata Management, Operations Management, Cloud Deployment
- Big Data Processing: Adaptive Execution, Spark Execution, Stream Processing, Machine Learning
- Visual Data Experience: Data Exploration, Visual Data Prep, Embedded Analytics, Data Catalog
EMERGING TRENDS AND TECHNOLOGY: Advanced Analytics | Real-time
Roadmap Initiatives
Strengthening the Bridge Between Data and Insight
DATA EXPLORER
Source 1, Source 2, Source 3, Source 4, Source 5
- Visual data inspection
- Intuitive data prep
- Advanced visualization
- Governed access
- Searchable metadata
- Collaboration
CATALOG
Inline Data Prep – Vision
Intuitive, Excel-like transformation design
Field Statistics
- Field Type: Integer
- Records: 10,000
- Cardinality: 273
- Min <count>: 1
- Max <count>: 23
- Bin Size (%): Quintile
Integrated Profiling Inline Model
Merge Fields
Inline Transformation
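As a rough illustration of the kind of numbers shown in the Field Statistics panel on this slide, the Python sketch below profiles a single integer field. It is not Pentaho's profiling logic. It assumes that "Min <count>" and "Max <count>" mean the lowest and highest number of occurrences of any distinct value, and the function name `field_statistics` and the sample values are hypothetical.

```python
from collections import Counter
from statistics import quantiles


def field_statistics(values):
    """Profile one field: record count, cardinality, value counts, quintile cuts.

    Assumption: min_count and max_count are the occurrence counts of the
    least and most frequent distinct values.
    """
    counts = Counter(values)
    return {
        "records": len(values),
        "cardinality": len(counts),
        "min_count": min(counts.values()),
        "max_count": max(counts.values()),
        # Four cut points that split the values into five equal-probability bins.
        "quintile_cuts": quantiles(values, n=5),
    }


# Made-up sample field values.
sample_values = [1, 1, 2, 3, 3, 3, 4, 5, 8, 9]
stats = field_statistics(sample_values)
print(stats)
```

Computing such statistics alongside the data is what allows profiling to sit inline with transformation design rather than in a separate tool.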
Pentaho Machine Learning Orchestration
Data Explorer, Notebook Integrations, Native Algorithms, Catalog, Adaptive Execution
Roadmap projects that serve emerging needs of data scientists.
Pentaho Roadmap
Features and dates are subject to change.
Nov 2017 1H18 (8.1) Future
VISUAL DATA EXPERIENCE
- Data Explorer Filters
- Catalog I
- Visual Profiling
- Catalog Search
- Data Prep from DET
- Layout Manager
- New User Console
- Data Science Viz
- Real-time Viz
(BIG) DATA PROCESSING
- Kafka Interface
- Spark Streaming
- Parquet and Avro
- Enhanced AEL
- Streaming II
- Enhanced JSON/XML/ORC
- AEL - extend distros
- Advanced Profiling
- Rules Validator
- Native ML algorithms
- AEL – Flink
- Thin Kettle (Composer)
- Web Designer
- Data Operations Mgr.
- AEL – Next
ENTERPRISE PLATFORM
- Scale-out Framework
- Foundry Integration
- Unified Monitoring
- Harden Metadata Bridges
- Vantara Integrations
- Enhanced Upgrade
- Enhanced Security
- New Content Lifecycle
- Vantara Integrations
- Metadata Manager
- Business Glossary
- Multi-tenancy
- Vantara Integrations
ECOSYSTEM
- AEL HDP, MapR
- Google Cloud Platform
- Cassandra/NoSQL Update
- Multi-cloud Orchestration
- Cloud App Connectors
- Mainframe
- Enhanced SAP and SFDC
Hitachi Vantara Portfolio
Foundry Service Platform
Workflow, Scheduling, Security, Clustering, Monitoring, Repository, Search
Application Studio
Dashboards, Visualization, Notifications, App Development
Storage
Converged Infrastructure, Automated Management, Data Protection, Flash Storage
Data Integration, Asset Management, Analytics, Edge Processing
- Asset registry
- Data catalog
- Metadata management
- Modeling and lineage
- Governance
- Data connectors
- Transformation engines
- Profiling and quality
- Data blending
- Data preparation
- Business analytics
- Content analytics
- Artificial intelligence
- Batch and stream
Software Platform, Application Framework, Storage, Edge Processing, Asset Management, Analytics, Data Integration
IoT Solutions – from Edge to Outcomes
Sensors, Things, People
Layers: Edge, Fog Layer, Core
Lumada IoT Data Pipeline: Telemetry, Edge Filtering, Asset Registry, Stream Queues
IoT Analytic Processor: Ingest, Process, Visualize, Model, Predict, Notify
Insights, Outcomes
SMART CITY, SMART BUSINESS, SMART DATA CENTER, SMART INDUSTRY
Unlock the Business Value in YOUR Data
YOUR DATA
- Video, Image and Audio
- Email and Documents
- Transactional Data
- IT, Sensor and Machine Logs
- Social Media
Hitachi Content Platform
YOUR STRATEGY
Need for Better Insights To Achieve Better Outcomes
- Big Data Analytics: Pentaho
- Content Exploration: Hitachi Content Intelligence
YOUR INSIGHTS
Summary
What we covered today:
- Product Vision
- Pentaho 8.0 Release
- Product Roadmap
Next Steps
Want to learn more about Pentaho 8.0 and the product roadmap?
- Other recommended breakout sessions:
  – Processing Big Data with Pentaho: Rakesh Saha
  – Operating Pentaho at Scale: Jens Bleul
- Solution Expo
  – Pentaho 8.0 and Beyond
  – Lumada IoT Platform
  – Hitachi Content Platform
  – Spark Processing
  – And more…
Pentaho 8.1 – Preview
Some Candidate Projects
- Enhanced Streaming
- Enhanced Profiling
- Google Cloud Platform
- Unified Monitoring and Logging
- Enhanced Metadata Handling