Working with Hash Tables Daniel Petrolito (ANZ Bank) Working With - - PowerPoint PPT Presentation

working with hash tables
SMART_READER_LITE
LIVE PREVIEW

Working with Hash Tables Daniel Petrolito (ANZ Bank) Working With - - PowerPoint PPT Presentation

Working with Hash Tables Daniel Petrolito (ANZ Bank) Working With Hash Tables Daniel SAS 3+ years Programming 6 years ANZ Data Scientist Combining Data Combining tables Base table Lookup table(s) Want


slide-1
SLIDE 1

Working with Hash Tables

Daniel Petrolito (ANZ Bank)

slide-2
SLIDE 2

Working With Hash Tables

  • Daniel
  • ANZ – Data Scientist
  • SAS 3+ years
  • Programming 6 years
slide-3
SLIDE 3

Combining Data

  • Combining tables
  • Base table
  • Lookup table(s)
  • Want flexibility without too much complexity
  • Options
  • Data Step

✓ Flexible ❖ Must pre-sort data

  • SQL

✓ In-database ❖ Cartesian joins

  • Others

✓ Macros ✓ Arrays ✓ User-defined formats ✓ Hash tables

slide-4
SLIDE 4

Combining Data – techniques

You may have tried the following:

  • Index the fields you are joining on
  • Reduce number of rows and columns as early in your process as

possible

  • Do joins where the data originates
  • SAS dataset compression (compress=yes)
  • Column lengths
  • Hash Tables
slide-5
SLIDE 5

Hash Tables - Introduction

  • Optimized for speed through

complex hashing algorithms

  • Think of it as a SAS indexed

data table loaded in memory

slide-6
SLIDE 6

Benefits & Consequences of Hash Tables

Pros

  • In-memory, faster lookups than

any other technique

  • Can join multiple tables, using

different keys, in a single data step

  • No sorting required

Cons

  • Object-oriented syntax - not

intuitive to code

  • Large memory requirement
  • Data step only
slide-7
SLIDE 7

Simple Example

People (Base Table) name age John 41 Sally 33 Rebecca 22 Reference (Lookup Table) Name age state John 50VIC Sally 35NSW

Hashtest (Results) name age state rc John 50VIC Sally 35NSW Rebecca 22 160038

Once only, define & populate the hash table For each record in the base table, look up the hash table

Lookup failed – program accordingly

slide-8
SLIDE 8

Hash Table Methods

  • Find()
  • Add()
  • Replace()
  • Iterate()
  • Output()
slide-9
SLIDE 9

When Not To Use Hash Tables

  • Speed is not required
  • Team not technical savvy
  • Joins are more optimized to be done from external

databases/sources

  • Not used to replace every left join
  • Look up table is too big (memory constraint)
slide-10
SLIDE 10

Real World Example

  • Build a fast SAS solution working with millions
  • f rows of data that would join on multiple

tables and calculate complicated metrics.

  • Run thousands of times (Monte Carlo

simulation) Issues

  • First SQL solution was too slow
  • Dynamic & complex joins
slide-11
SLIDE 11

Real World Example – Solution with Hash Tables

key range value X123 60 0.1 X123 70 0.2 X123 80 0.3

  • Hash tables flexible to only fetch data when

needed.

  • Dynamic look up keys:
  • hh.find(key:some_variable)
  • Ranged look ups:
  • hh.find_next()
slide-12
SLIDE 12

Further Reading

  • http://support.sas.com/resources/papers/proceedings17/0821-2017.pdf
  • http://www.lexjansen.com/nesug/nesug07/bb/bb16.pdf
  • https://support.sas.com/resources/papers/sgf2008/better-hashing-

sas92.pdf