10 BEST PRACTICES FOR SOLUTION ARCHITECTURES THAT WOULD TAME BIG DATA (PowerPoint presentation)



SLIDE 1
SLIDE 2

10 BEST PRACTICES FOR SOLUTION ARCHITECTURES THAT WOULD TAME BIG DATA!!!

SLIDE 3

BIG DATA BEST PRACTICE-1

USE CASE! USE CASE! USE CASE! (FRAME IT TIGHT)

SLIDE 4

THE IDEA IN BRIEF …

• What are the questions at the heart of the problem?

• Formulate the hypotheses/questions at the heart of the issue! Distill them into a clear set of hypotheses to be tested.

• Remember: Hadoop and associated technology components are a means to an end.

• Isolate the $-denting analytical use case.

SLIDE 5

REAL-LIFE EXAMPLE: CURATING THE USE CASE IN TELECOM SECURITY INTELLIGENCE

• Business context: What new signals should we listen to in order to prevent adverse events from happening?

• 4 data pools:
  • Netsweeper logs
  • RADIUS logs
  • Switch CDRs
  • MMS logs

• 2 use cases:
  • Watch list analysis + network link analysis
  • MMS video virality

SLIDE 6

Hold an intensive ½-day cross-functional workshop with the business to boil down the game-changing use case.

Is it a “nice to have” use case or a “$-impacting” use case?

Who is the consumer of the use case?

How does it help them optimize cost, reduce risk, or increase revenue?

Work business backwards, NOT technology forward.

SLIDE 7

BIG DATA BEST PRACTICE-2

IMPACT “AHA” MOMENT IN 60-90 DAYS. START WITH A SKELETAL WORKING SOLUTION (MVP)

SLIDE 8

THE IDEA IN BRIEF: DELIVER THE FIRST BIG DATA “AHA” MOMENT IN 60-90 DAYS

• Skeletal MVP: an end-to-end implementation that links all architectural components together

• Could be the answer to a previously unanswered question

• Propels momentum of the big data project
SLIDE 9

A REAL-LIFE EXAMPLE

Industry = OTA (online travel agency)

Context: it is important to improve the look-to-book ratio

Is there a correlation between the response time of a web page and the look-to-book ratio?

Hadoop cluster + Infobright + Hive jobs ready in 3 weeks

Scaled the data and improved the dashboard experience over another 3 weeks

Business readout in 6 weeks
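The OTA question above boils down to a correlation test. A minimal Python sketch, assuming one observation per day (average page response time plus that day's looks and bookings); the function names and the sample numbers are hypothetical and only illustrate the calculation, whereas the deck's actual stack was Hadoop, Infobright and Hive:

```python
import math


def look_to_book(looks: int, books: int) -> float:
    """Look-to-book ratio: searches ("looks") per completed booking ("books")."""
    if books == 0:
        return math.inf
    return looks / books


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical daily observations: (avg response time in ms, looks, books).
daily = [(850, 12000, 300), (1200, 11800, 260), (1600, 12500, 240), (2300, 12100, 190)]
response_ms = [r for r, _, _ in daily]
ratios = [look_to_book(looks, books) for _, looks, books in daily]
r = pearson(response_ms, ratios)  # r > 0 here: slower days show more looks per booking
```

A correlation on its own does not show that slow pages cause fewer bookings; it only tells the business whether the hypothesis is worth pursuing further.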

SLIDE 10

THEREFORE

Break it into 3 chunks:

• 30-day milestones
• 60-day milestones
• 90-day milestones

In 30 days, plan to cover functional breadth:
  • Hadoop infrastructure + cluster
  • Integrate disparate components: data pipeline, columnar database, machine learning process, Hadoop cluster
  • Have a small file go from start to end through the process chain

In 60 days, plan to cover scalability:
  • Scale for at least 12 months of data
  • Tableau / Pentaho

In 90 days, plan to cover the bells and whistles:
  • Configurators
  • Alerters
  • Additional abstraction

Don’t wait 6-9 months!

SLIDE 11

BIG DATA BEST PRACTICE-3

ACTIONS NOT INSIGHTS

DATA → INSIGHTS → ACTION

SLIDE 12

BEST PRACTICE-3: ACTIONS, NOT INSIGHTS

• Actions are executed in the frontline:
  • Call centre
  • Mobile
  • Store channel
  • Digital channel

• Actions could be:
  • Behaviour-based discounts
  • Helping close a digital transaction
  • Serving a customized webpage
  • Taking proactive actions

• Insights are nice to know
• Actions impact $

SLIDE 13

THEREFORE

• WHAT ACTIONS ARE DRIVEN AS A RESULT OF THESE INSIGHTS?

• HOW ARE WE DISSEMINATING INSIGHTS TO FRONTLINE CHANNELS?

• ASK “SO WHAT?” 5 TIMES!!!

SLIDE 14

BIG DATA BEST PRACTICE-4:

LISTEN TO UNSTRUCTURED INTELLIGENCE FOR STRONG SIGNALS

SLIDE 15

REAL-LIFE EXAMPLE

• Keyword frequency:
  • “Leaks”
  • “Leakage”
  • “Noise”
  • “Sound”
  • “Vibrations”

• Noise/leakage frequency is a better predictor of repeat sales than any other indicator, including marketing spend!!!
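The keyword-frequency signal on this slide needs only basic text processing. A minimal Python sketch, assuming the unstructured text (reviews, complaints, service notes) is already available as strings; the two signal groups follow the slide's “noise/leakage” wording, while the singular forms and the sample reviews are hypothetical additions. Relating these counts to repeat sales would be a separate modelling step not shown here.

```python
import re
from collections import Counter

# Keyword groups from the slide; the singular forms "leak" and
# "vibration" are an added assumption.
SIGNAL_KEYWORDS = {
    "leakage": {"leak", "leaks", "leakage"},
    "noise": {"noise", "sound", "vibration", "vibrations"},
}


def signal_frequencies(texts):
    """Count how often each signal's keywords occur across free-text records."""
    counts = Counter()
    for text in texts:
        # Lowercase and split into alphabetic tokens, dropping punctuation.
        for token in re.findall(r"[a-z]+", text.lower()):
            for signal, variants in SIGNAL_KEYWORDS.items():
                if token in variants:
                    counts[signal] += 1
    return counts


# Hypothetical sample records.
reviews = [
    "The tap leaks and there is a loud noise",
    "Leakage again! Strange sound and vibrations.",
]
freq = signal_frequencies(reviews)
```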

SLIDE 16

A REAL-LIFE EXAMPLE

XYZ Online Buzz Analysis

Business question: How can we create a strategy to respond to what we are hearing about XYZ's buzz online?

Raw data: www.yelp.com, www.twitter.com

Statistical techniques:
  • Text mining
  • Visual data exploration
  • Hypothesis testing
  • Affinity analysis

Insights derived:
  • Sentiment trends (+/-)
  • Sentiment benchmark with McDonald's
  • Top keywords for XYZ
  • Top keywords for McDonald's
  • Keyword affinities

Business actions:
  • Theme-specific campaigns
  • NPD (new product development) process
  • In-store experience
  • Reverse the impact of negative buzz

SLIDE 17

WHERE DO CUSTOMERS EXPRESS THEMSELVES?


Universe of XYZ sentiment data = 5 sources, 5556 posts, 3 years of data. Phase-1 analysis = www.yelp.com, 136 posts, 2 years of data.

• Yelp.com: 136 posts
• Epinions.com: 552 posts
• planetfeedback.com: 2854 posts
• Twitter.com: 1500 posts
• Facebook.com: 500 posts

SLIDE 18

SOURCE = TWITTER.COM


SLIDE 19

SOURCE = YELP.COM


SLIDE 20

SOURCE = FACEBOOK.COM


SLIDE 21

STEP-BY-STEP SENTIMENT TEXT MINING PROCESS


Input:
  • Blogs
  • Customer review sites
  • Online consumer forums
  • Customer/vendor emails
  • Unstructured data from applications

Process: sentiment text mining

Output:
  • Inferences
  • Customers’ sentiments
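The Input → Process → Output flow above can be illustrated with the simplest form of sentiment text mining, a lexicon-based score. This is a Python stand-in, not the R/RHadoop algorithm the deck refers to; the tiny word lists and sample posts are hypothetical, and a production system would use a proper sentiment lexicon or a trained model.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny lexicons for illustration only.
POSITIVE = {"great", "tasty", "fresh", "friendly", "love"}
NEGATIVE = {"cold", "slow", "rude", "dirty", "stale"}


def sentiment_score(text: str) -> int:
    """Net sentiment: number of positive words minus number of negative words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def label(score: int) -> str:
    """Map a net score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def summarize(posts) -> Counter:
    """Aggregate post-level labels, e.g. to feed an overall sentiment dashboard."""
    return Counter(label(sentiment_score(p)) for p in posts)


posts = [
    "Great burgers, friendly staff",
    "Fries were cold and the service was slow",
    "Ordered a coffee",
]
dashboard = summarize(posts)
```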

SLIDE 22

OVERALL SENTIMENTS DASHBOARD


SLIDE 23

THEREFORE

• R text mining algorithm
• RHadoop

SLIDE 24

BIG DATA BEST PRACTICE-5:

COLUMNAR & IN-MEMORY ARCHITECTURES TO SPEED UP CHAIN OF THOUGHT

Which devices are infected by a malicious attack?

SLIDE 25

HOW TO HANDLE “NEEDLE IN A HAYSTACK” WORKLOADS?

• What happened on firewall-3 between 3:17 and 3:21 am?

• How many payment gateway drops happened between 9:47 am and 9:52 am on 15-Nov-2012?

• Data forensic queries supporting chain-of-thought analysis
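A “needle in a haystack” query like the ones above is essentially a time-range lookup followed by a filter. A minimal in-process Python sketch, assuming events arrive as (timestamp, source, event_type) tuples; the class name, source name "payment-gw" and event types are hypothetical. Columnar and in-memory engines apply similar range-pruning ideas at far larger scale.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime


class EventIndex:
    """Keeps events sorted by time so a window lookup costs O(log n + k)."""

    def __init__(self, events):
        # events: iterable of (timestamp, source, event_type) tuples
        self.events = sorted(events, key=lambda e: e[0])
        self.times = [e[0] for e in self.events]

    def window(self, start, end, source=None, event_type=None):
        """Events with start <= timestamp <= end, optionally filtered."""
        lo = bisect_left(self.times, start)   # first event at or after start
        hi = bisect_right(self.times, end)    # one past the last event at or before end
        return [
            e for e in self.events[lo:hi]
            if (source is None or e[1] == source)
            and (event_type is None or e[2] == event_type)
        ]


# Hypothetical payment gateway events on 15-Nov-2012.
day = datetime(2012, 11, 15)
events = [
    (day.replace(hour=9, minute=45), "payment-gw", "drop"),
    (day.replace(hour=9, minute=48), "payment-gw", "drop"),
    (day.replace(hour=9, minute=50), "payment-gw", "ok"),
    (day.replace(hour=9, minute=51), "payment-gw", "drop"),
    (day.replace(hour=9, minute=55), "payment-gw", "drop"),
]
idx = EventIndex(events)
drops = idx.window(day.replace(hour=9, minute=47), day.replace(hour=9, minute=52),
                   source="payment-gw", event_type="drop")
```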

SLIDE 26


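Columnar DB – Concept in Brief

Id   Name      Designation          Tenure
S1   Prem      Founder              8
S2   Simon     Security Architect   5
S3   Bhavana   Sales Head           6
S4   Ram       CEO                  3
S5   Shyam     Developer            1

Row-oriented storage (one record after another):
S1, Prem, Founder, 8 | S2, Simon, Security Architect, 5 | S3, Bhavana, Sales Head, 6 | S4, Ram, CEO, 3 | S5, Shyam, Developer, 1

Column-oriented storage (one column after another):
S1, S2, S3, S4, S5 | Prem, Simon, Bhavana, Ram, Shyam | Founder, Security Architect, Sales Head, CEO, Developer | 8, 5, 6, 3, 1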
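The same five-row table can be held in either layout. A Python sketch for illustration only: Python lists do not reproduce the disk-level I/O savings of a real columnar database, but the code shows why an aggregate over one column needs only that column's values in a column store. The function names are hypothetical.

```python
from statistics import mean

# Employee table from the slide, stored row by row.
ROWS = [
    ("S1", "Prem", "Founder", 8),
    ("S2", "Simon", "Security Architect", 5),
    ("S3", "Bhavana", "Sales Head", 6),
    ("S4", "Ram", "CEO", 3),
    ("S5", "Shyam", "Developer", 1),
]
COLUMNS = ("id", "name", "designation", "tenure")


def to_columnar(rows):
    """Pivot row-oriented records into one list per column."""
    return {col: [row[i] for row in rows] for i, col in enumerate(COLUMNS)}


def avg_tenure_row_store(rows):
    # A row store walks every full record even though only 'tenure' is needed.
    return mean(row[3] for row in rows)


def avg_tenure_column_store(table):
    # A column store reads just the contiguous 'tenure' column.
    return mean(table["tenure"])


table = to_columnar(ROWS)
```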

SLIDE 27
• Interactive or real-time query on large datasets = key to analyst productivity (supports chain-of-thought analysis).

• Chain-of-thought analysis = exploring the data torrent by quickly running a series of iterative queries, each informed by the last.

• Most solutions aren't fast enough, and analytical effectiveness drops when the user's chain of thought is interrupted.

In-memory DB tools:
  • Dremel at Google
  • Druid at Metamarkets
  • Sting at Netflix
  • Cloudera's Impala
  • UC Berkeley AMPLab's Spark
  • SAP HANA
  • Platfora

IN-MEMORY DATABASES!

SLIDE 28

THEREFORE

• Examine columnar databases and in-memory databases to speed up important query workloads.

• Download evaluation versions of Actian and Infobright and do a POC.

SLIDE 29

BEST PRACTICE-6: HOW TO PLAN FOR 100X SCALABILITY?

BIG DATA BEST PRACTICE-6:

THINK 100X SCALABILITY!!!

SLIDE 30

REAL-LIFE EXAMPLE

• Industry = Telecom

• Business context = national content filtering solution

• Events generated per day: 1 billion

• New URLs classified per day: 1 million

• Daily log volume: 400 GB on average
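These daily totals translate into sustained ingestion rates, which is the figure a 100x plan has to be sized against. A small Python sketch of that arithmetic, assuming load is spread evenly over 24 hours (real traffic peaks will be higher than these averages) and decimal units (1 GB = 1000 MB); the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def sustained_rates(events_per_day: float, gb_per_day: float, growth: float = 1.0):
    """Average events/s and MB/s implied by daily totals, times a growth factor.

    Assumes uniform load over the day, so these are averages, not peaks.
    """
    events_per_s = events_per_day * growth / SECONDS_PER_DAY
    mb_per_s = gb_per_day * 1000 * growth / SECONDS_PER_DAY  # 1 GB = 1000 MB
    return events_per_s, mb_per_s


today = sustained_rates(1_000_000_000, 400)                 # about 11,574 events/s, 4.6 MB/s
at_100x = sustained_rates(1_000_000_000, 400, growth=100)   # about 1.16M events/s, 463 MB/s
bytes_per_event = 400e9 / 1_000_000_000                     # 400 bytes per event on average
```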

SLIDE 31

• Price-sensitive search
• Store search
• Ratings-based ordering
• Comparator events
• Basket-add events
• Payment gateway events

The data torrent → The organisation

BIG DATA BEST PRACTICE-7:

DETECT DATA PATTERNS IN REAL TIME!!!

Real-time sense-making

SLIDE 32

THE CONTEXT

• Velocity is high
• Decision-making window is short
• Cost of not intervening is high

SLIDE 33

REAL-TIME EXAMPLE

• Decision window = 8 mins
• If a high-value customer (decile = 1 on the last 36 months' revenue) and intra-book interval > threshold and recency of search < 70, then route to the call center channel
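The routing rule above can be expressed as a predicate evaluated on each event while the 8-minute window is still open. A minimal Python sketch: the field names, the units of the intra-book interval and search recency (the slide gives none), and the threshold value in the usage are all hypothetical; in the architecture described here this logic would run inside a stream processor such as S4 rather than as a standalone function.

```python
from dataclasses import dataclass
from typing import Optional

DECISION_WINDOW_S = 8 * 60  # 8-minute decision window from the slide


@dataclass
class CustomerEvent:
    revenue_decile_36m: int      # 1 = top decile by last 36 months' revenue
    intra_book_interval: float   # hypothetical units; the slide does not say
    search_recency: float        # hypothetical units; compared against 70
    event_age_s: float           # seconds since the triggering event


def route(ev: CustomerEvent, interval_threshold: float,
          recency_limit: float = 70) -> Optional[str]:
    """Return "call_center" when the slide's rule fires inside the window."""
    if ev.event_age_s > DECISION_WINDOW_S:
        return None  # window missed: intervening now no longer helps
    if (ev.revenue_decile_36m == 1
            and ev.intra_book_interval > interval_threshold
            and ev.search_recency < recency_limit):
        return "call_center"
    return None


decision = route(CustomerEvent(1, 90, 30, 120), interval_threshold=60)
```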

SLIDE 34

THEREFORE

• Include S4 and other real-time analytics in your big data reference architecture

SLIDE 35

BIG DATA BEST PRACTICE-8

CAPTOLOGY = PERSUASION THROUGH TECHNOLOGY

SLIDE 36

THE BASICS

• Captology = persuasion through technology
• DESIGN FOR BEHAVIOURAL CHANGE
• Persuasion examples:
  • Persuade users to change channel behaviour (move from the desktop to the mobile channel)
  • Persuade users to advocate to their friends

SLIDE 37

CAPTOLOGY IN ACTION

Captology in insurance: reduce rates each time a person reports his or her exercise behaviour to a group of peers online

Captology in social

SLIDE 38

THERE ARE TOO MANY GOOD PRODUCTS HIDDEN BEHIND BAD USER INTERFACES

PRODUCT = INTERFACE

FOR THE BUSINESS USER, WHAT LIES UNDER THE HOOD DOES NOT MATTER

SLIDE 39

BIG DATA BEST PRACTICE-9

STRETCH KEY BIG DATA COMPONENTS TO SEE WHAT BREAKS!

SLIDE 40

BEST PRACTICE-9: THE INTERSECTIONS OF MOVING PARTS ARE THE WEAK LINKS

Big data moving parts:
  • Columnar databases
  • Hadoop clusters
  • Advanced visualisation layer
  • Real-time components
  • Data pipelines
  • APIs and scrapers to syndicate information
  • Bridge to the existing DW

The intersections can give way as data/user volumes increase.

A real-life big data architecture:
  • Event loggers
  • HBase/Cassandra for high-velocity event absorption
  • Sqoop/Flume for data ingestion
  • Hadoop cluster for massive data crunching
  • R for extracting patterns
  • Columnar database for 10x faster retrieval
  • Tableau for advanced visualisation
  • S4 for real-time analytics
  • Channel integration components

Diagram components: Hadoop cluster, R predictor ranking, Infobright columnar DB

SLIDE 41

THEREFORE … WATCH THE FOLLOWING 4 WEAK LINKS

1. Link between operational event streams and the Hadoop cluster
2. Link between the Hadoop cluster and the columnar database
3. Link between the columnar database and the visualisation tool
4. Time it takes for the machine learning algorithm to run

SLIDE 42

HIGH-VELOCITY DATA PIPELINE

WHAT'S THE INGESTION RATE OF THE DATA PIPELINE, IN EVENTS PER SECOND?
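One way to answer this question is to measure it from event arrival times. A minimal Python sketch, assuming arrival timestamps in epoch seconds can be pulled from the pipeline's logs; it reports both the average rate and the busiest one-second bucket, since capacity has to cover the peak and not just the average. The function name is hypothetical.

```python
import math
from collections import Counter


def ingestion_rates(timestamps):
    """Return (average events/s, peak events in any 1-second bucket)."""
    if not timestamps:
        return 0.0, 0
    # Bucket each arrival into the whole second it falls in.
    buckets = Counter(math.floor(t) for t in timestamps)
    first, last = min(buckets), max(buckets)
    seconds_covered = last - first + 1  # inclusive range of 1-second buckets
    average = len(timestamps) / seconds_covered
    peak = max(buckets.values())
    return average, peak


avg, peak = ingestion_rates([0.1, 0.5, 0.9, 1.2, 3.7])
```

The measured figures can then be compared against the rate the business requires, for example the daily volumes in the telecom example.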

SLIDE 43

BIG DATA BEST PRACTICE-10

EMBED MACHINE LEARNING PROCESSES TO DETECT PATTERNS

SLIDE 44

BEST PRACTICE-10: 7 CORE MACHINE LEARNING BUILDING BLOCKS FOR ORCHESTRATING ANALYTICAL PROCESSES

Collaborative filtering

Apriori

Text mining

A/B testing

Clustering

Scoring models

Optimization
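Of the building blocks above, Apriori is compact enough to sketch. A minimal, unoptimized Python version of the level-wise search for frequent itemsets, assuming transactions (for example basket contents) are given as sets of item names; the sample baskets are hypothetical. Generating association rules (confidence, lift) would be a further step on top of these itemsets.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Frequent itemsets by Apriori's level-wise search.

    transactions: list of sets of items (e.g. basket contents).
    min_support: minimum fraction of transactions, in (0, 1].
    Returns {frozenset(itemset): support}.
    """
    if not transactions:
        return {}
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in itemset.
        return sum(itemset <= t for t in transactions) / n

    current = {frozenset([item]) for t in transactions for item in t}
    result = {}
    k = 1
    while current:
        frequent = {}
        for candidate in current:
            s = support(candidate)
            if s >= min_support:
                frequent[candidate] = s
        result.update(frequent)
        k += 1
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        current = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                current.add(union)
    return result


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
itemsets = apriori(baskets, min_support=0.5)
```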

SLIDE 45

TO SUMMARIZE THE 10 CORE BIG DATA BEST PRACTICES!

SLIDE 46

THANK YOU!

QUESTIONS? COMMENTS? THOUGHTS?

WWW.FLUTURASOLUTIONS.COM