@arnon86 S7456
Leveraging Customer Behavioral Data to Drive Revenue the GPU way
Hi!
Arnon Shimoni
Senior Solutions Architect
I like hardware & parallel / concurrent stuff
In my 4th year at SQream Technologies
Send gifs to @arnon86 or
them good for data processing
throwing lots of hardware at issues. Don’t need to shovel money at the problem!
Fast
Scalable
SQL Database
Extensible for AI
Powered by GPUs
That’s my ear!
(daily grinders)
(When and where data was used)
(what the network experience was in each location)
But are they actually using this data?
Are they getting anything actionable?
Are they looking at the entire customer base, and not just a single customer?
because of overloaded systems
Long queries
Coffee breaks
Bathroom breaks
Unhappy managers
Unhappy everyone
systems tend to ‘break’ and not deliver what’s expected
Legacy databases require indexing and a lot of manual tuning
Newer databases like Vertica also require creating projections, which is time-consuming and inflexible
Distributed databases don’t perform well when JOIN operations are necessary
In-memory databases are very painful on the wallet if you need more than a couple of terabytes
Just some of what we saw
Dashboard showing 3G/4G data throughput throughout the day (Morning, Lunch, Evening, Night, …). Larger circles represent more data throughput. Colour becomes darker as the day progresses. Dark-outline circles mean more night-time traffic. Dashboard aggregates directly off SQream DB, with no intermediate steps. Represents 3 table join
(3.3B rows ⋈ 40M rows ⋈ 300K rows)
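The shape of a three-table join like the one behind this dashboard can be sketched in Python. This is only an illustration, not the actual SQream DB schema or query: the table names (usage_records, subscribers, cells), the columns, the hour boundaries for each part of the day, and the use of an in-memory SQLite database as a stand-in are all assumptions made for the example.

```python
import sqlite3

# Hypothetical schema: a large fact table of usage records joined to two
# smaller dimension tables, mirroring the large/medium/small shape of the
# join on the slide. SQLite is only a stand-in engine for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE usage_records (subscriber_id INTEGER, cell_id INTEGER,
                            hour INTEGER, network TEXT, bytes INTEGER);
CREATE TABLE subscribers (subscriber_id INTEGER PRIMARY KEY, active INTEGER);
CREATE TABLE cells (cell_id INTEGER PRIMARY KEY, location TEXT);
""")
conn.executemany("INSERT INTO usage_records VALUES (?, ?, ?, ?, ?)", [
    (1, 10, 8, "4G", 500),
    (1, 10, 23, "4G", 300),
    (2, 10, 8, "3G", 200),
    (2, 20, 12, "4G", 700),
    (3, 20, 12, "4G", 100),   # subscriber 3 is inactive and is filtered out
])
conn.executemany("INSERT INTO subscribers VALUES (?, ?)",
                 [(1, 1), (2, 1), (3, 0)])
conn.executemany("INSERT INTO cells VALUES (?, ?)",
                 [(10, "Downtown"), (20, "Suburb")])

# Assumed hour boundaries for the parts of the day shown on the dashboard.
QUERY = """
SELECT c.location,
       CASE WHEN u.hour BETWEEN 6 AND 10 THEN 'Morning'
            WHEN u.hour BETWEEN 11 AND 14 THEN 'Lunch'
            WHEN u.hour BETWEEN 15 AND 21 THEN 'Evening'
            ELSE 'Night' END AS day_part,
       u.network,
       SUM(u.bytes) AS total_bytes
FROM usage_records AS u
JOIN subscribers AS s ON s.subscriber_id = u.subscriber_id
JOIN cells AS c ON c.cell_id = u.cell_id
WHERE s.active = 1
GROUP BY c.location, day_part, u.network
ORDER BY c.location, day_part, u.network
"""


def throughput_by_day_part(connection):
    """Aggregate throughput per location, part of day and network type."""
    return connection.execute(QUERY).fetchall()


rows = throughput_by_day_part(conn)
for row in rows:
    print(row)
```

Each output row would correspond to one circle on the dashboard: its location, the part of the day, the network type, and the total throughput that sets its size.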
Augmenting legacy MPP with a faster, easier to use GPU-powered analytics database
Data Sources: CDR 4G, CDR 3G, Non CDR
ETL Process: 80 node, 5 hours
Direct Loading, 2TB/h ingest rate: 20 minutes with SQream DB (15x faster)
Dozens of Reports, Aggregations
80 nodes – 5 full racks, 960 CPU cores, 5.12 TB RAM
SQream DB v1.9.6: HP DL380g9 with NVIDIA Tesla K80, 96 GB RAM + 6 TB storage
ETL time: 300 m vs. 20 m (15x faster)
Reporting time: 120 m vs. 10 m (12x faster)
TCO w/license: $10,000,000 vs. $200,000 (50x more cost effective)
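The improvement factors quoted on this slide follow from simple ratios, which the short sketch below reproduces. The pairing of each figure with a metric is read from the flattened slide text (the 300 m ETL time matches the 5 hours mentioned earlier), and the dictionary keys and function name are hypothetical.

```python
# Figures as read from the slide; key names are illustrative only.
legacy = {"etl_minutes": 300, "reporting_minutes": 120, "tco_usd": 10_000_000}
sqream = {"etl_minutes": 20, "reporting_minutes": 10, "tco_usd": 200_000}


def improvement_factors(before, after):
    """Return before/after ratios for every metric present in `before`."""
    return {metric: before[metric] / after[metric] for metric in before}


factors = improvement_factors(legacy, sqream)
for metric, factor in factors.items():
    print(f"{metric}: {factor:.0f}x")
```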
We’ve done it against Netezza, Teradata, Oracle, Vertica, and even Hadoop based systems.
Netezza: 8 full 42U racks, 56 S-Blades, 7 TB RAM
SQream DB v1.9.7: Dell C4130 with 4x NVIDIA Tesla K80, 512 GB RAM + iSCSI JBOD (20TB)
Average query time (seconds): 33.70 (Netezza) vs. 31.70 (SQream DB)
Processing Units (S-Blade / GPUs): 56 vs. 4
Compression ratio: 4.0 vs. 4.7
Cost of Ownership: $12,000,000 vs. $500,000