Data Explorer: Inspect, Visualize, and Collaborate from Any PDI Step

slide-1
SLIDE 1

Data Explorer

Inspect, Visualize, and Collaborate from Any PDI Step

Ben Hopkins Pentaho Senior Product Manager, Hitachi Vantara

slide-2
SLIDE 2

Agenda

Today, we will cover:

  • Background on Data Explorer (DE) and its main use cases
  • Deeper dive on specific DE features and how to use them
  • Demonstration of DE in action
slide-3
SLIDE 3

Data Explorer Background

slide-4
SLIDE 4

What is Data Explorer?

PDI module to visually inspect data at virtually any transformation step:

  • Access analytic views and charts without switching in and out of tools
  • Rapidly publish data sources to share with business colleagues
  • Reduce iterative work needed while building data pipelines
slide-5
SLIDE 5

Use Case – Accelerating Data Prep

Scenario:

  • Designing data pipelines to cleanse data as it is onboarded to databases and applications

DE Benefits:

  • Identify duplicates, misspellings, missing data, and other discrepancies
  • Confirm trends and outliers
  • Inform the PDI user how to adjust transformations to deliver clean data

slide-6
SLIDE 6

Use Case – BI Prototyping

Scenario:

  • Responding to new business requests for data to visualize and report on

DE Benefits:

  • Publish preliminary data sources and schemas to Pentaho BA for business to validate
  • No staging required – boosts agility
  • Faster time to insight due to fewer iterations between IT and business

slide-7
SLIDE 7

Deeper Dive on Data Explorer Features

slide-8
SLIDE 8

Accessing Data Explorer

Just click to inspect data from the selected step in your transformation. You have two choices from the canvas:

  • Run and Inspect Data: Runs the transformation up to the selected step and then launches DE
  • Inspect Data: Launches DE directly, but this only works after the transformation has run

If you’ve run the transformation since making your changes, just hit “Inspect” to use the cached data for DE.

slide-9
SLIDE 9

Default Visual – Flat Table

  • Table – simple row and column view of data
  • Fields can be sorted, moved, or removed
  • Great when you want to see particular rows or the max/min for a field
  • Choice of 14 other visualizations

slide-10
SLIDE 10

Visualizations – Commonly Used

  • Bar: Good for quick comparisons and finding missing data
  • Scatter: Good for correlations, outliers, numeric relationships
  • Geo Map: Good for validating geolocation data

slide-11
SLIDE 11

Stream and Model Views

A decision to make before you explore your data further:

Stream View (for data prep)

  • No modeling layer used, just SQL
  • Uses PDI data types and masks
  • Required for flat table

Model View (for BI publishing)

  • Uses Measures and Attributes specified in BA model layer
  • Required for pivot table, geo map, and sunburst charts

slide-12
SLIDE 12

Modeling and Annotations

  • DE generates an ‘auto model’ which guesses at basic measures, dimensions, and geo hierarchies
  • Use Annotate Stream step to complete the model with more hierarchies, formatting, aggregation
  • This is necessary for building and validating prototype data sources to publish to BA
    – Provides full business context

slide-13
SLIDE 13

Drill-Down Capabilities

  • Can click to drill down into hierarchies, e.g. territory to country to city
  • Drill-down (and hierarchies) only available in model view
  • Annotate Stream is normally required to create useful hierarchies for drill-down
  • DE helps to validate these hierarchies before publishing data to PUC
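
Drilling down a hierarchy amounts to fixing a member at one level and re-aggregating at the next level. The sketch below illustrates that idea in plain Python. It is not how DE or the BA model layer implements drill-down, and the sample rows, the `sales` measure, and the `drill_down` helper are all hypothetical.

```python
from collections import defaultdict

# Hierarchy levels from the slide's example, coarsest first.
LEVELS = ["territory", "country", "city"]

# Hypothetical sample rows standing in for a PDI step's output.
rows = [
    {"territory": "EMEA", "country": "France", "city": "Paris", "sales": 100},
    {"territory": "EMEA", "country": "France", "city": "Lyon", "sales": 40},
    {"territory": "EMEA", "country": "Spain", "city": "Madrid", "sales": 70},
    {"territory": "NA", "country": "USA", "city": "Boston", "sales": 90},
]

def drill_down(data, path):
    """Sum sales at the level just below the members selected in `path`.

    `path` lists the members chosen so far, e.g. ["EMEA", "France"].
    """
    if len(path) >= len(LEVELS):
        raise ValueError("already at the lowest level of the hierarchy")
    next_level = LEVELS[len(path)]
    totals = defaultdict(int)
    for row in data:
        # Keep only rows under the members already drilled into.
        if all(row[LEVELS[i]] == member for i, member in enumerate(path)):
            totals[row[next_level]] += row["sales"]
    return dict(totals)

by_territory = drill_down(rows, [])
by_country = drill_down(rows, ["EMEA"])
by_city = drill_down(rows, ["EMEA", "France"])
print(by_territory, by_country, by_city)
```

Checking that each level's totals add up to the parent's total is the kind of validation the last bullet refers to.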

slide-14
SLIDE 14

Filtering – New in 8.0

  • Apply restrictions to include/exclude data when using charts
  • Filters can be applied to numeric and non-numeric fields
  • Examples: ‘Greater than,’ ‘Contains (string),’ ‘is not null’
  • Create filters by dragging fields to the Filters panel, then configure
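
The filter types named above can be read as simple row predicates. The sketch below illustrates their semantics in plain Python. It is not DE's implementation, and the sample rows, field names, and helper functions are hypothetical.

```python
# Hypothetical sample rows standing in for a PDI step's output.
rows = [
    {"customer": "Acme Corp", "country": "USA", "sales": 1200.0},
    {"customer": "Globex", "country": "Germany", "sales": 300.0},
    {"customer": "Acme Ltd", "country": None, "sales": 4500.0},
]

# Each helper builds a predicate for one of the filter types on the slide.
def greater_than(field, threshold):
    return lambda row: row[field] is not None and row[field] > threshold

def contains(field, text):
    return lambda row: row[field] is not None and text in row[field]

def is_not_null(field):
    return lambda row: row[field] is not None

def apply_filters(data, predicates):
    """Keep only the rows that satisfy every predicate."""
    return [row for row in data if all(p(row) for p in predicates)]

# A numeric filter combined with a string filter.
big_acme = apply_filters(
    rows, [greater_than("sales", 1000), contains("customer", "Acme")]
)
# A null check on a non-numeric field.
has_country = apply_filters(rows, [is_not_null("country")])

print([r["customer"] for r in big_acme])
print([r["customer"] for r in has_country])
```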

slide-15
SLIDE 15

Filtering – New in 8.0

  • Filters and other actions can be invoked directly from charts as well

  • Exclude certain data
  • Keep only certain data
  • Drill down (if hierarchy available)
slide-16
SLIDE 16

Publishing

  • From any step, you can publish a data source to BA/PUC for business users
  • PDI must be connected to repository
  • Published data source includes:
    – JDBC to data service (virtual table)
    – Model (business context layer)
  • Can then create reports with Pentaho BA tools like Analyzer

slide-17
SLIDE 17

Role of Pentaho Data Services

  • For DE, the data service enables rapid prototyping and publishing of BA data sources without staging the data
  • However, data services are broadly applicable beyond DE:
    – Data service can be created off any step (separately from DE)
    – Transformations can be queried by BA as if data were in a physical table
    – Good for when you want to blend and visualize data sets on the fly
    – Can also be queried by other JDBC-compliant tools like RStudio, DBVisualizer, or SQuirreL
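
From a client's point of view, a data service behaves like a table that answers ordinary SQL, even though the rows come from a transformation step. The sketch below imitates that experience with Python's built-in `sqlite3` module as a stand-in. It does not connect to Pentaho, and unlike a real data service it does load the rows into a table first. The table name `sales_service`, the columns, and the rows are hypothetical.

```python
import sqlite3

# Hypothetical rows standing in for the output of a transformation step.
step_output = [
    ("EMEA", "France", 140.0),
    ("EMEA", "Spain", 70.0),
    ("NA", "USA", 90.0),
]

# An in-memory SQLite table plays the role of the virtual table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_service (territory TEXT, country TEXT, sales REAL)"
)
conn.executemany("INSERT INTO sales_service VALUES (?, ?, ?)", step_output)

# A client tool would send ordinary SQL like this.
query = """
    SELECT territory, SUM(sales) AS total_sales
    FROM sales_service
    GROUP BY territory
    ORDER BY territory
"""
result = conn.execute(query).fetchall()
conn.close()
print(result)
```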

slide-18
SLIDE 18

A Word on Row Limits

  • By default, DE will only show the first 50,000 rows from the PDI step
  • Can be changed in Kettle properties:
    – det.dataservice.dynamic.limit
  • Keep in mind performance and resource constraints if you scale it up
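
Kettle properties are ordinary `key=value` lines. The sketch below shows such a line for the row limit, plus a small Python helper that reads it back. The helper name `read_row_limit`, the sample value of 100000, and the simplified parser are illustrative assumptions. The fallback of 50,000 mirrors the default stated above.

```python
DEFAULT_LIMIT = 50_000
PROPERTY = "det.dataservice.dynamic.limit"

def read_row_limit(properties_text):
    """Return the DE row limit from kettle.properties-style text.

    Simplified parser: handles only `key=value` lines and `#` comments.
    If the key appears more than once, the last value wins.
    """
    limit = DEFAULT_LIMIT
    for line in properties_text.splitlines():
        line = line.strip()
        # Skip blanks and comment lines.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == PROPERTY:
            limit = int(value.strip())
    return limit

# Example properties content (the value is illustrative).
sample = """
# Raise the Data Explorer row limit
det.dataservice.dynamic.limit=100000
"""

print(read_row_limit(sample))
print(read_row_limit(""))
```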

slide-19
SLIDE 19

Demonstration

slide-20
SLIDE 20

Summary

What we covered today:

  • Background on Data Explorer (DE) and its main use cases – Accelerating Data Prep and BI Prototyping
  • Deeper dive on specific DE features and how to use them – visualizing, modeling, filtering, publishing, and more

  • Demonstration of DE in action
slide-21
SLIDE 21

Next Steps

Want to learn more?

  • For documentation on DE, search “Inspect Data” on help.pentaho.com
  • To see where DE is headed, check out the session “Pentaho 8.0 and Roadmap”