Leveraging Customer Behavioral Data to Drive Revenue the GPU way - - PowerPoint PPT Presentation



SLIDE 1

@arnon86 S7456

Leveraging Customer Behavioral Data to Drive Revenue the GPU way

SLIDE 2

Hi! Arnon Shimoni

  • Senior Solutions Architect
  • I like hardware & parallel / concurrent stuff
  • In my 4th year at SQream Technologies
  • Send gifs to @arnon86 or arnon@sqream.com

SLIDE 3

tl;dr

  • GPUs are good number crunchers, which makes them good for data processing
  • SQream DB with GPUs is fast
  • Rethink current solutions, the GPU can help
  • Simple hardware is good enough. Let’s avoid throwing lots of hardware at issues. Don’t need to shovel money at the problem!

SLIDE 4

SQream DB – an SQL database powered by GPUs

Fast

  • Columnar storage
  • Always-on compression
  • 2 TB / hour / GPU ingest speed

Scalable

  • 10 TB to 1 PB with ease

SQL Database

  • Familiar ANSI SQL
  • Standard connectors (ODBC, JDBC)

Extensible for AI

  • Python, Jupyter, etc
  • Data science

Powered by GPUs

  • Massively parallel engine
  • Relies on GPUs for power, not RAM

SLIDE 5

This story starts at MWC last year

That’s my ear!

SLIDE 6

SQream knows telecoms

We’ve helped operators with

  • Better analysis of network events
  • Speeding up CDR preparations
  • More history with security management (SIEM)
  • And now – customer behaviour

SLIDE 7

There is a lot of data about customers in telecoms

  • Where and when they wake up and where they spend their days (daily grinders)
  • When and where they were Instagramming (when and where data was used)
  • How frustrated they got (what the network experience was in each location)
  • What modes of transport they use
  • How close they are to competitor locations

But are they actually using this data? Are they getting anything actionable? Are they looking at the entire customer base, and not just a single customer?

SLIDE 8

“You know, Telefonica has this multi-million dollar product based on Hadoop for selling this customer behaviour data to 3rd party companies. Have you thought about maybe getting the same solution for your company, but much simpler?”

SLIDE 9

“Oh, and we’ll do it for you with a single machine”

SLIDE 10

Why their current setup wasn’t good enough for this

  • Data scientists and BI professionals have only short windows of time to run queries, because of overloaded systems

  • Windows cut even shorter due to long overnight loading
  • Queries take hours, and iterations become painful

Long queries → Coffee breaks → Bathroom breaks → Unhappy managers → Unhappy everyone

SLIDE 11

Databases that displease data scientists

  • When data scientists or BI professionals want to ask questions that no one has asked before, these systems tend to ‘break’ and not deliver what’s expected
  • They’re just not designed for ad-hoc querying
  • Legacy databases require indexing and a lot of manual tuning
  • Newer databases like Vertica also require creating projections, which is time-consuming and inflexible
  • Distributed databases don’t perform well when JOIN operations are necessary
  • In-memory databases are very painful on the wallet if you need more than a couple of terabytes

SLIDE 12

Picking the wrong databases will cause pain!

Just some of what we saw

  • Cloudera – for the BI team
  • Teradata – for the marketing team
  • Oracle Exadata (transactional) – for CDR collection and customer records
  • Vertica, Netezza – for financial
  • Lots of Greenplum – to collect from many sources, for marketing and BI

SLIDE 13

Chanel says racks are fashionable. Our customers think otherwise

SLIDE 14

SQream DB software in a standard 2U server

Configured with 96GB RAM and a single Tesla K80 for a $4,000 total investment. Designed to handle ~40 TB of telecom data

SLIDE 15

Sample dashboards generated

Dashboard showing 3G/4G data throughput throughout the day (Morning, Lunch, Evening, Night, …). Larger circles represent more data throughput. Colour becomes darker as the day progresses. Dark-outline circles mean more night-time traffic. Dashboard aggregates directly off SQream DB, with no intermediate steps. Represents a 3-table join (3.3B rows ⋈ 40M rows ⋈ 300K rows).
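
For readers who want to picture the kind of query behind such a dashboard, here is a minimal sketch of a three-table join aggregated by part of day. Everything in it is an assumption made for illustration: the schema (`usage_events`, `subscribers`, `cells`), the column names, the hour boundaries for each part of the day, and the use of Python's built-in `sqlite3` as a stand-in engine. It is not the query or schema used in the deck, and it is not SQream DB syntax guidance.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE cells (cell_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE subscribers (sub_id INTEGER PRIMARY KEY, plan_type TEXT);
CREATE TABLE usage_events (
    sub_id INTEGER, cell_id INTEGER, hour_of_day INTEGER, bytes_used INTEGER
);
"""

# Large fact table joined to two smaller tables, then aggregated per
# region and part of day. The hour boundaries are assumed, not sourced.
QUERY = """
SELECT c.region,
       CASE
           WHEN u.hour_of_day BETWEEN 6 AND 10 THEN 'Morning'
           WHEN u.hour_of_day BETWEEN 11 AND 14 THEN 'Lunch'
           WHEN u.hour_of_day BETWEEN 15 AND 21 THEN 'Evening'
           ELSE 'Night'
       END AS day_part,
       COUNT(DISTINCT s.sub_id) AS subscribers,
       SUM(u.bytes_used) AS total_bytes
FROM usage_events AS u
JOIN subscribers AS s ON s.sub_id = u.sub_id
JOIN cells AS c ON c.cell_id = u.cell_id
GROUP BY c.region, day_part
ORDER BY c.region, day_part
"""


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO cells VALUES (?, ?)",
                     [(1, "North"), (2, "South")])
    conn.executemany("INSERT INTO subscribers VALUES (?, ?)",
                     [(100, "prepaid"), (101, "postpaid")])
    conn.executemany("INSERT INTO usage_events VALUES (?, ?, ?, ?)",
                     [(100, 1, 8, 500), (101, 1, 9, 700),
                      (100, 1, 23, 300), (101, 2, 12, 1000)])
    return conn


def throughput_by_day_part(conn):
    """Return (region, day_part, subscribers, total_bytes) rows."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in throughput_by_day_part(build_demo_db()):
        print(row)
```

A BI tool would map each returned row to one circle, with `total_bytes` driving the size and `day_part` driving the colour.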

SLIDE 16

Sample dashboards generated

Dashboard showing 3G/4G data throughput throughout the day (Morning, Lunch, Evening, Night, …). Larger circles represent more data throughput. Colour becomes darker as the day progresses. Dark-outline circles mean more night-time traffic. Dashboard aggregates directly off SQream DB, with no intermediate steps. Represents a 3-table join (3.3B rows ⋈ 40M rows ⋈ 300K rows).

SLIDE 17

Saving hours on reporting with SQream DB

Augmenting legacy MPP with a faster, easier-to-use GPU-powered analytics database

Data sources: CDR 4G, CDR 3G, Non CDR
Output: aggregations, dozens of reports

  • ETL process on the 80-node system: 5 hours
  • Direct loading (2 TB/h ingest rate) with SQream DB: 20 minutes, 15x faster

SLIDE 18

The cost of performance

Legacy MPP: 80 nodes – 5 full racks, 960 CPU cores, 5.12 TB RAM
SQream DB v1.9.6: HP DL380g9 with NVIDIA Tesla K80, 96 GB RAM + 6 TB storage

                  Legacy MPP     SQream DB    Difference
ETL time          300 m          20 m         15x faster
Reporting time    120 m          10 m         12x faster
TCO w/license     $10,000,000    $200,000     50x more cost effective
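
The three ratios on this slide follow directly from the before and after figures. A small sketch of the arithmetic, using only the numbers shown on the slide (the dictionary keys are names chosen here for illustration):

```python
# Figures as shown on the slide; key names are illustrative.
legacy = {"etl_minutes": 300, "reporting_minutes": 120, "tco_usd": 10_000_000}
sqream = {"etl_minutes": 20, "reporting_minutes": 10, "tco_usd": 200_000}


def improvement_ratios(before, after):
    """Divide each 'before' figure by the matching 'after' figure."""
    return {key: before[key] / after[key] for key in before}


ratios = improvement_ratios(legacy, sqream)

if __name__ == "__main__":
    for name, ratio in ratios.items():
        print(f"{name}: {ratio:.0f}x")
```

This gives 15x for ETL time, 12x for reporting time, and 50x for TCO, matching the slide.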

SLIDE 19

That wasn’t an anomaly

We’ve done it against Netezza, Teradata, Oracle, Vertica, and even Hadoop-based systems.

Netezza: 8 full 42U racks, 56 S-Blades, 7 TB RAM
SQream DB v1.9.7: Dell C4130 with 4x NVIDIA Tesla K80, 512 GB RAM + iSCSI JBOD (20 TB)

                                      Netezza        SQream DB
Average query time (seconds)          33.70          31.70
Processing units (S-Blades / GPUs)    56             4
Compression ratio                     4.0            4.7
Cost of ownership                     $12,000,000    $500,000

SLIDE 20

Find out more about SQream’s high performance GPU-driven database software

www.sqream.com

arnon@sqream.com