Analy&c Window Fu Func&ons A prac'cal look at using analy'c - - PowerPoint PPT Presentation

analy c window fu func ons
SMART_READER_LITE
LIVE PREVIEW

Analy&c Window Fu Func&ons A prac'cal look at using analy'c - - PowerPoint PPT Presentation

Analy&c Window Fu Func&ons A prac'cal look at using analy'c func'ons Olympia Area SqlServer User Group By Gary Melhaff March 15, 2017 About me me Over 30 years in IT Cer'fica'ons & experience in Oracle, Teradata and Sql


slide-1
SLIDE 1

Analy&c Window Fu Func&ons

A prac'cal look at using analy'c func'ons Olympia Area SqlServer User Group

By Gary Melhaff March 15, 2017

slide-2
SLIDE 2

About me me

  • Over 30 years in IT
  • Cer'fica'ons & experience in Oracle, Teradata and Sql Server
  • Can be blamed for designing around dozen datamarts and data warehouses since late 1990s
  • Worked for DSHS and DOT in the 1980s, then DNR un'l 1996, then Weyerhaeuser,

Freeinternet.com, consul'ng gigs, then Washington Mutual Bank for around 7 years

  • Last 7 years as Data Architect at World Vision
  • Product reviews at h]ps://www.itcentralsta'on.com for SSIS, Wherescape RED, MDS and

Melissa Data Quality

slide-3
SLIDE 3

Environme ment

  • Data warehouse environment - 100% Microsod
  • Sources: Microsod Dynamics CRM Online, Oracle Enterprise Business Suite, Blackbaud CRM

(hosted), flat files from mul'ple sources, Master Data Services (MDS), Adobe Campaign, Oracle NetSuite (new)

  • MelissaData Matchup for SSIS, A]unity OLEDB for Oracle, Kingswaysod CRM Adapter for SSIS,

Visual Studio Online (TFS), BiXpress (for SSIS monitoring&no'fica'on), SSIS Mul'pleHash

  • Master Data Services
  • SSAS tabular (2016 SP1)
  • BI tools: SSRS, PowerBI and Excel
  • VMWare and SSD San
slide-4
SLIDE 4

Wh Why y y you sho u should c uld car are e

Analy'c Window Func'ons are extremely powerful Require very li]le coding and are easy to use once you learn the basic concepts Workarounds would be rela'vely complex and less efficient

slide-5
SLIDE 5

Wha What ar t are w e we t e talking abo alking about? ut?

Any calcula.on that is defined by an “OVER()” clause…GROUP BY is not required They give you access to rows outside of the immediate row you are on In other words you can do things like…

  • access aggregate values along with detail rows
  • access row values from other rows besides the current row

Examples of prac:cal use

  • Record De-Duplica'on (picking rows that share same key values)
  • Assigning effec've date ranges
slide-6
SLIDE 6

Analy&c and Ranking Window Fu Func&ons

  • Lead/Lag
  • First_Value/Last_Value
  • Row_Number
  • Rank
  • Dense_Rank
  • N'le
  • Cume_Dist
  • Percent_Rank
  • Percen'le_Disc
  • Percen'le_Cont

Func'ons most oden used in ETL opera'ons Aggregate window func:ons

  • MIN, MAX, AVG, SUM, COUNT, COUNT_BIG
  • CHECKSUM_AGG
  • STDEV, STDEVP, VAR, VARP
slide-7
SLIDE 7

Concepts – – The Window

It is set with the “par::on by” clause

  • It’s the set of rows that share the same column(s)
  • Par''on clause is op'onal – if you don’t use it then

all rows are the window size

slide-8
SLIDE 8

Concepts – – The Window Fr Frame me

Frame set by keywords shown below

  • Defines the subset of rows within the window that the

func'on will u'lize

  • Careful - Defaults to start of window up to current row!
  • You set it using ROW or RANGE syntax
slide-9
SLIDE 9

Rows in perspec&ve of the window frame me

Preceding Following ROWS 1 2 3 4 5 6 7 8 9 10 Current Row “Unbounded preceding” sets the frame to the start of the window. “Unbounded following” sets the frame to the end

  • f the window.

You can also specify an offset of rows preceding or following.

slide-10
SLIDE 10

Ca Cavea eats or

  • r R

Res estric& c&on

  • ns
  • Cannot use one of these within a “where” or “having” clause!
  • Certain func'ons will go to the beginning or end of a set of rows but not in-

between (eg. first/last)

  • Some op'ons such as window frame size are not available for all func'ons.

For example row_number() only supports the default window frame size.

slide-11
SLIDE 11

Examp mple Use Case

Rolling11Months_Amt = SUM(ISNULL(Total_Amt,0)) OVER (PARTITION BY Customer_Dim_Id ORDER BY Calendar_Year_Month_Nbr ROWS BETWEEN 11 PRECEDING AND CURRENT ROW)

Window here is the set of rows for each customer Id Ordering within window frame Window frame boundaries

* SSIS executes this in a dataflow compu.ng rolling total by customer by month for 150 million records across over 8 million customers in just over 16 minutes on 12 vproc server

slide-12
SLIDE 12

Demo mos