How am I going to skim through these data …?
Trends
– Computers keep getting faster
– But data grows faster yet! Remember? BIG DATA!
– Queries are becoming more complex. Remember? ANALYTICS!
Analytic Queries
─ find the total sales of each region
SELECT S.region, SUM(S.sales) FROM SALES S GROUP BY S.region
─ find the average total quantity supplied per supplier for a particular part
SELECT AVG(quantity) FROM (SELECT supp, part, SUM(quantity) AS quantity FROM lineitem WHERE part = 10 GROUP BY supp, part) AS per_supplier;
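The nested aggregate above can be tried on a toy table. This is a minimal sketch using Python's built-in sqlite3 module; the rows are made up for illustration and the alias `per_supplier` is an assumption added so the derived table is accepted by engines that require one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lineitem (supp INTEGER, part INTEGER, quantity INTEGER)")
conn.executemany(
    "INSERT INTO lineitem VALUES (?, ?, ?)",
    [(1, 10, 5), (1, 10, 15), (2, 10, 10), (3, 11, 100)],  # made-up rows
)

# Inner query: total quantity per supplier for part 10.
# Outer query: average of those per-supplier totals.
(avg_quantity,) = conn.execute(
    """
    SELECT AVG(quantity)
    FROM (SELECT supp, part, SUM(quantity) AS quantity
          FROM lineitem
          WHERE part = 10
          GROUP BY supp, part) AS per_supplier
    """
).fetchone()
print(avg_quantity)
conn.close()
```

With these rows, supplier 1 supplies 20 units of part 10 and supplier 2 supplies 10, so the average of the per-supplier totals is 15.0.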
─ data cube operation: allows users to drill down or roll up between multiple nodes of the data cube
SELECT SUM(S.sales) FROM SALES S GROUP BY CUBE(pid, locid, timeid)
{pid, locid, timeid} {pid, locid} {pid, timeid} {locid, timeid} {pid} {locid} {timeid} { }
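The eight groupings listed above are the power set of the three CUBE columns. A minimal Python sketch that enumerates them (the helper name `cube_groupings` is hypothetical, not part of any DBMS):

```python
from itertools import combinations

def cube_groupings(columns):
    """Return every subset of `columns`, largest first."""
    groupings = []
    for size in range(len(columns), -1, -1):
        groupings.extend(combinations(columns, size))
    return groupings

groupings = cube_groupings(["pid", "locid", "timeid"])
for g in groupings:
    print("{" + ", ".join(g) + "}")
```

For n grouping columns this yields 2^n groupings, which is why a cube over many dimensions quickly becomes expensive to compute in full.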
Time     System 1   System 2   System 3
1.0000   3.01325    4.32445     7.5654
2.0000   4.54673    6.56784     8.6562
3.0000   5.46571    6.87658    10.3343
VS
– A losing proposition as data volume grows
– Hardware improvements not sufficient
– E.g., spreadsheet programs (1M-row, 16K-column limit)
– No user feedback or control in big DBMS queries (“back to the 60’s”)
– Long processing time
– Fundamental mismatch with preferred modes of HCI
– Precompute (store answers of queries beforehand), e.g. OLAP
– Don’t handle ad hoc queries or data sets well
[Figure: Online vs. Traditional processing, plotted against Time, with a 100% mark]
Sampling Progress: 5%, Avg Stock Price: $2031 ± $523 (90%)
Sampling Progress: 15%, Avg Stock Price: $1890 ± $420 (95%)
Sampling Progress: 40%, Avg Stock Price: $1150 ± $210 (97%)
Sampling Progress: 95%, Avg Stock Price: $1040 ± $70 (99%)
Speed up Slow down Terminate
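A running estimate of this kind can be sketched as follows. This is a minimal illustration, not the system shown on the slides: it assumes rows are scanned in random order and uses a normal-approximation interval (z = 1.96, roughly 95%); the function name `online_average` and the price data are made up.

```python
import math
import random

def online_average(values, report_every, z=1.96):
    """Yield (fraction_seen, running_mean, half_width) while scanning in random order.

    half_width is a normal-approximation interval: z * sample_std / sqrt(n).
    """
    order = list(values)
    random.shuffle(order)          # random order makes every prefix a random sample
    n, mean, m2 = 0, 0.0, 0.0      # Welford's running mean and sum of squared deviations
    for x in order:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n % report_every == 0 or n == len(order):
            std = math.sqrt(m2 / (n - 1)) if n > 1 else 0.0
            yield n / len(order), mean, z * std / math.sqrt(n)

random.seed(0)
prices = [random.gauss(1000, 300) for _ in range(10_000)]  # made-up data
snapshots = list(online_average(prices, report_every=1000))
for frac, est, hw in snapshots:
    print(f"{frac:4.0%}  avg = {est:8.2f} +/- {hw:6.2f}")
```

Because the function is a generator, a caller that consumes it directly can stop iterating as soon as the interval is narrow enough, which corresponds to the Terminate control.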
Sampling Progress: 40%, Avg Stock Price: $1150 ± $210 (97%)
Stop early
QphH = Query‐per‐Hour Performance
Sampling Progress   Average Sales
15%                 $22,131 ± $523 (85%)
20%                 $21,255 ± $286 (90%)
35%                 $21,795 ± $105 (95%)
40%                 $21,712 ± $47 (98%)
For a retailer, an approximate result such as $21,712 ± $47 can provide a good estimate of its daily sales statistics, and it is more cost-effective.
aggregated attributes, e.g., name is not correlated to salary
[Figure: B+ tree. Root: 17. Index entries: 5, 13, 27, 30. Leaf entries: 2* 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39*]
[Figure: Pages fetched (%) versus Sampling Rate (%); sampling rate axis from 0.2 to 2, pages fetched axis from 20 to 100]
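One way to see why even a small row-level sampling rate can touch a large share of pages: if each row is sampled independently with probability p and a page holds k rows, the page must be fetched unless all k of its rows are skipped, so the expected fraction of pages fetched is 1 - (1 - p)^k. The sketch below assumes independent row sampling and a hypothetical k = 100 rows per page; neither assumption comes from the slides, so the numbers will not match the plotted curve exactly.

```python
def pages_fetched_fraction(sampling_rate, rows_per_page):
    """Expected fraction of pages touched under independent row-level sampling."""
    return 1.0 - (1.0 - sampling_rate) ** rows_per_page

ROWS_PER_PAGE = 100  # assumption for illustration only
for rate_pct in (0.2, 0.5, 1.0, 2.0):
    frac = pages_fetched_fraction(rate_pct / 100.0, ROWS_PER_PAGE)
    print(f"sampling rate {rate_pct:.1f}% -> about {frac:.0%} of pages fetched")
```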