Saiku – taking OLAP databases into 21st century



SLIDE 1

Saiku – taking OLAP databases into 21st century

Tomasz Nurkiewicz | nurkiewicz.com | @tnurkiewicz

Slides: bit.ly/33degree

SLIDE 2

What is Saiku?

DEMO

SLIDE 3

Core concepts

  • OLAP
  • Fact
  • Dimension
  • Hierarchy

SLIDE 4

Example facts

  • Sold product
  • Tweet/forum post/shared photo
  • Website hit
  • Incoming text message
  • ...you name it

SLIDE 5

Dimension

"Properties of facts"

  • When?
  • What?
  • Where?
  • Who?
  • How?

SLIDE 6

Example dimensions

Access log

  • Timestamp
  • IP
  • URL resource
  • HTTP response code
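
A minimal sketch of how one access-log line (a fact) might be split into these dimensions. The slide does not say which log layout is used, so the Common Log Format pattern, the `parse_hit` helper and the sample line below are all assumptions made for illustration.

```python
import re
from datetime import datetime

# Assumed layout: Common Log Format. The slide does not specify one.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

def parse_hit(line):
    """Turn one log line (a fact) into its dimension values."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # malformed line, nothing to extract
    return {
        "timestamp": datetime.strptime(match["ts"], "%d/%b/%Y:%H:%M:%S %z"),
        "ip": match["ip"],
        "url": match["url"],
        "status": int(match["status"]),
    }

# Made-up sample line (documentation IP range).
sample = '203.0.113.7 - - [10/Jun/2014:13:55:36 +0200] "GET /talks/saiku HTTP/1.1" 200 2326'
hit = parse_hit(sample)
```

Each key of the returned dictionary would become a foreign key into its own dimension table once the data is loaded into a cube.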

SLIDE 7

Hierarchy

Multi-level aggregation

Example: location hierarchy

  • (All)
  • Continent
  • Country
  • State
  • City
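
To make multi-level aggregation concrete, here is a small sketch that counts facts at different levels of the location hierarchy. The sample facts and the `roll_up` helper are hypothetical; an OLAP engine does this with SQL against the database rather than in memory.

```python
from collections import Counter

# Levels below (All), from coarsest to finest, as listed on the slide.
LEVELS = ("continent", "country", "state", "city")

# Made-up facts: each one carries its full location path.
facts = [
    {"continent": "Europe", "country": "Poland", "state": "Mazowieckie", "city": "Warsaw"},
    {"continent": "Europe", "country": "Poland", "state": "Malopolskie", "city": "Krakow"},
    {"continent": "Europe", "country": "Norway", "state": "Oslo", "city": "Oslo"},
    {"continent": "North America", "country": "USA", "state": "California", "city": "San Francisco"},
]

def roll_up(facts, level):
    """Count facts per member at the given level; '(All)' collapses everything."""
    if level == "(All)":
        return Counter({("(All)",): len(facts)})
    depth = LEVELS.index(level) + 1
    # A member is identified by its path from the top, e.g. ("Europe", "Poland").
    return Counter(tuple(f[name] for name in LEVELS[:depth]) for f in facts)

by_continent = roll_up(facts, "continent")
by_country = roll_up(facts, "country")
```

Moving from `by_continent` to `by_country` is what a drill down does: the same facts, grouped one level deeper.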

SLIDE 8

SLIDE 9

Measures

  • Quantitative properties
  • Aggregate matching facts over them
  • Count/Sum/Average/Min/Max
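
A sketch of what "aggregate matching facts" means in practice: pick the facts that match a dimension filter, then apply one aggregator to a measure. The `hits` data, the `load_ms` measure name and the `aggregate` helper are assumptions for illustration only.

```python
from statistics import mean

# Hypothetical page-hit facts; load_ms is the measure, url is a dimension.
hits = [
    {"url": "/home", "load_ms": 120},
    {"url": "/home", "load_ms": 80},
    {"url": "/blog", "load_ms": 300},
]

# The aggregators named on the slide.
AGGREGATORS = {"count": len, "sum": sum, "avg": mean, "min": min, "max": max}

def aggregate(facts, measure, aggregator, **dimension_filter):
    """Aggregate one measure over the facts matching the dimension filter."""
    values = [
        f[measure] for f in facts
        if all(f[dim] == wanted for dim, wanted in dimension_filter.items())
    ]
    return AGGREGATORS[aggregator](values)

home_avg = aggregate(hits, "load_ms", "avg", url="/home")
total = aggregate(hits, "load_ms", "sum")
```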

SLIDE 10

SLIDE 11

Example measures

  • Load time (page hit fact)
  • Total price (sale fact)
  • Age of customer

SLIDE 12

Charting - DEMO

SLIDE 13

Exporting - DEMO

SLIDE 14

Drill down - DEMO

SLIDE 15

Ignored concepts

  • Hypercube
  • Mondrian
  • MDX

SLIDE 16

Your own cube

SLIDE 17

Star schema
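
A minimal star schema sketch in in-memory SQLite: one fact table in the centre with a foreign key to each dimension table. The `tweet` and `location` table names and the `time_id`/`location_id` keys follow the schema file shown later in the talk; the `time_dim` table, its columns and all sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: one row per member, the points of the star.
    CREATE TABLE location (
        location_id INTEGER PRIMARY KEY,
        continent TEXT, country TEXT, city TEXT
    );
    CREATE TABLE time_dim (          -- assumed name and columns
        time_id INTEGER PRIMARY KEY,
        year INTEGER, month INTEGER, day INTEGER
    );
    -- Fact table: the centre of the star, pointing at each dimension.
    CREATE TABLE tweet (
        id INTEGER PRIMARY KEY,
        time_id INTEGER REFERENCES time_dim(time_id),
        location_id INTEGER REFERENCES location(location_id)
    );
    INSERT INTO location VALUES (1, 'Europe', 'Poland', 'Krakow'),
                                (2, 'Europe', 'Norway', 'Oslo');
    INSERT INTO time_dim VALUES (1, 2014, 6, 10);
    INSERT INTO tweet VALUES (1, 1, 1), (2, 1, 1), (3, 1, 2);
""")

# Typical OLAP-style query: join the fact to a dimension and group by a level.
rows = conn.execute("""
    SELECT l.country, COUNT(t.id)
    FROM tweet t JOIN location l ON l.location_id = t.location_id
    GROUP BY l.country
    ORDER BY l.country
""").fetchall()
```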

SLIDE 18

ETL

SLIDE 19

ETL - challenges

  • Missing or incomplete data
  • Heuristics
  • Incremental, periodic updates
  • Various data sources
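
Two of these challenges can be sketched in a few lines: missing values are mapped to a placeholder member, and a watermark makes repeated periodic runs load only new rows. The `tweet_fact` table, the `load_batch` helper and the "Unknown" placeholder are hypothetical; a real ETL job would usually be built with a dedicated tool.

```python
import sqlite3

UNKNOWN = "Unknown"  # placeholder member for missing dimension values

def load_batch(conn, source_rows, last_loaded_id):
    """Load only rows newer than the watermark; return the new watermark."""
    newest = last_loaded_id
    for row in source_rows:
        if row["id"] <= last_loaded_id:
            continue  # already loaded by an earlier run
        country = row.get("country") or UNKNOWN  # missing or empty value
        conn.execute(
            "INSERT INTO tweet_fact (id, country) VALUES (?, ?)",
            (row["id"], country),
        )
        newest = max(newest, row["id"])
    conn.commit()
    return newest

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweet_fact (id INTEGER PRIMARY KEY, country TEXT)")

# First run: one complete row, one with a missing country.
source = [{"id": 1, "country": "Poland"}, {"id": 2, "country": None}]
watermark = load_batch(conn, source, last_loaded_id=0)

# A later periodic run sees the old rows again plus one new row.
source.append({"id": 3})
watermark = load_batch(conn, source, watermark)

loaded = conn.execute("SELECT id, country FROM tweet_fact ORDER BY id").fetchall()
```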

SLIDE 20

Schema file

<Schema name="Twitter">
  <Cube name="Tweets" defaultMeasure="Count">
    <Table name="tweet"/>
    <DimensionUsage name="Time" source="Time" foreignKey="time_id"/>
    <Dimension name="Location" foreignKey="location_id">
      <Hierarchy hasAll="true" allMemberName="All locations">
        <Table name="location"/>
        <Level name="Continent" column="continent"/>
        <Level name="Country" column="country"/>
        <Level name="City" column="city"/>
      </Hierarchy>
    </Dimension>
    <!-- ... -->
  </Cube>
</Schema>

SLIDE 21

Schema Workbench

Source: www.stratebi.com/cursos/olap-mdx

SLIDE 22

Security - users

  • Standard user/password
  • Roles
  • Spring Security - customizable

SLIDE 23

Security - data

  • By role
  • Restrict what can be seen
  • Top/bottom limit

SLIDE 24

Performance

  • Big data, before it was cool
  • Indexes on foreign keys
  • Aggregate tables

SLIDE 25

Without aggregate table

SELECT locations.continent, COUNT(id)
FROM tweet NATURAL JOIN locations
GROUP BY locations.continent

SLIDE 26

With aggregate table

INSERT INTO agg (cnt, city, country, continent)
SELECT COUNT(t.id) AS cnt, l.city, l.country, l.continent
FROM tweet t NATURAL JOIN locations l
GROUP BY l.city, l.country, l.continent

Usage:

SELECT continent, SUM(cnt)
FROM agg
GROUP BY continent
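
A runnable sketch of the same idea in in-memory SQLite, checking that the per-continent totals read from the aggregate table match those computed from the raw facts. Table names follow the slides (`tweet`, `locations`, `agg`); the column types, the sample rows and the explicit `JOIN ... ON` (used here instead of `NATURAL JOIN`) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE locations (location_id INTEGER PRIMARY KEY,
                            city TEXT, country TEXT, continent TEXT);
    CREATE TABLE tweet (id INTEGER PRIMARY KEY, location_id INTEGER);
    CREATE TABLE agg (cnt INTEGER, city TEXT, country TEXT, continent TEXT);

    INSERT INTO locations VALUES (1, 'Krakow', 'Poland', 'Europe'),
                                 (2, 'Oslo', 'Norway', 'Europe'),
                                 (3, 'Boston', 'USA', 'North America');
    INSERT INTO tweet VALUES (1, 1), (2, 1), (3, 2), (4, 3);

    -- Pre-aggregate once, at city granularity.
    INSERT INTO agg (cnt, city, country, continent)
    SELECT COUNT(t.id), l.city, l.country, l.continent
    FROM tweet t JOIN locations l ON l.location_id = t.location_id
    GROUP BY l.city, l.country, l.continent;
""")

# Without the aggregate table: scan and join every fact row.
from_facts = conn.execute("""
    SELECT l.continent, COUNT(t.id)
    FROM tweet t JOIN locations l ON l.location_id = t.location_id
    GROUP BY l.continent ORDER BY l.continent
""").fetchall()

# With the aggregate table: sum the pre-computed counts, no join needed.
from_agg = conn.execute("""
    SELECT continent, SUM(cnt) FROM agg
    GROUP BY continent ORDER BY continent
""").fetchall()
```

The aggregate table has one row per city rather than one per tweet, so the second query touches far fewer rows. The cost is that `agg` has to be refreshed whenever new facts are loaded.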

SLIDE 27

Pentaho Aggregation Designer

Source: infocenter.pentaho.com/help/index.jsp

SLIDE 28

Deployment

  • mondrian.jar - engine
  • saiku.war - RESTful web services
  • ui.war - JS front-end

SLIDE 29

Disadvantages

  • Horizontal scalability?
  • Stuck with SQL databases
  • Complex schema definition (XML)
  • Aggregate tables are hard

SLIDE 30

Thank you!

Slides: nurkiewicz.github.io/talks/2014/33degree