10 Best Practices for Solution Architectures That Would Tame Big Data (PowerPoint presentation)




  1. 10 Best Practices for Solution Architectures That Would Tame Big Data!!!

  2. Big Data Best Practice 1: Use Case! Use Case! Use Case! (Frame It Tight)

  3. The Idea in Brief
     - What are the questions at the heart of the problem?
     - Formulate the hypotheses/questions at the heart of the issue, and distill them into a clear set of hypotheses to be tested.
     - Remember that Hadoop and its associated technology components are a means, not an end.
     - Isolate the $-denting analytical use case.

  4. Real-Life Example: Curating a Use Case in Telecom Security Intelligence
     - Business context: what new signals should we listen to in order to prevent adverse events from happening?
     - 4 data pools: Netsweeper logs, RADIUS logs, switch CDRs, MMS logs
     - 2 use cases: watch-list analysis + network link analysis; MMS video virality

  5. Hold an intensive half-day cross-functional workshop with the business to boil down the game-changing use case.
     - Is it a "nice-to-have" use case or a "$-impacting" use case?
     - Who is the consumer of the use case?
     - How does it help them optimize cost, reduce risk, or increase revenue?
     - Work business backwards, NOT technology forwards.

  6. Big Data Best Practice 2: Create an "Aha" Moment in 60-90 Days. Start with a Skeletal Working Solution (MVP)

  7. The Idea in Brief: Deliver the First Big Data "Aha" Moment in 60-90 Days
     - Skeletal MVP: an end-to-end implementation that links all architectural components together
     - Could be the answer to a previously unanswered question
     - Propels the momentum of the big data project

  8. A Real-Life Example
     - Industry: OTA (online travel agency)
     - Context: it is important to improve the look-to-book ratio. Is there a correlation between the response time of a web page and the look-to-book ratio?
     - Hadoop cluster + Infobright + Hive jobs ready in 3 weeks
     - Scaled the data and improved the dashboard experience over another 3 weeks
     - Business readout in 6 weeks
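
The OTA question boils down to measuring how strongly two daily series move together. A minimal sketch, assuming daily aggregates of average page response time (ms) and looks per booking are already available; the numbers below are hypothetical, and Pearson correlation is one reasonable first test rather than the method the slide reports:

```python
from math import sqrt
from statistics import mean

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical daily aggregates (not from the presentation).
response_ms = [800, 950, 1200, 1500, 2100]   # average page response time
looks_per_booking = [32, 34, 41, 50, 71]      # searches per completed booking

print(round(pearson(response_ms, looks_per_booking), 3))
```

A coefficient close to +1 would suggest slower pages go with more looks per booking; correlation alone does not establish that response time causes the drop in conversion.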

  9. Therefore: break it into 3 chunks (30-day, 60-day, and 90-day milestones).
     - In 30 days, plan to cover functional breadth:
       - Hadoop infrastructure + cluster
       - Integrate disparate components: data pipeline, columnar database, machine learning process, Hadoop cluster
       - Have a small file go from start to end through the process chain
     - In 60 days, plan to cover scalability:
       - Scale for at least 12 months of data
       - Tableau / Pentaho
     - In 90 days, plan to cover the bells and whistles:
       - Configurators
       - Alerters
       - Additional abstraction
     - Don't wait for 6-9 months!
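
The 30-day goal of pushing "a small file from start to end" is essentially an end-to-end smoke test. The sketch below is a minimal illustration: the stage functions `parse`, `to_columns`, and `score` and the `response_ms` column are hypothetical stand-ins for the real ingestion tool, columnar loader, and ML job, kept in plain Python so the shape of the check is visible:

```python
import csv
import io

# Hypothetical stand-ins for the real components; in an actual MVP each
# function would call the ingestion tool, columnar DB loader, or ML job.
def parse(raw: str) -> list[dict]:
    """Data-pipeline stage: parse a small CSV sample into row records."""
    return list(csv.DictReader(io.StringIO(raw)))

def to_columns(rows: list[dict]) -> dict[str, list]:
    """Columnar-load stage: pivot row records into one list per column."""
    cols: dict[str, list] = {}
    for row in rows:
        for key, value in row.items():
            cols.setdefault(key, []).append(value)
    return cols

def score(cols: dict[str, list]) -> list[float]:
    """ML-stage placeholder: converts response time (ms) to seconds."""
    return [int(ms) / 1000 for ms in cols["response_ms"]]

def smoke_test(raw: str) -> list[float]:
    """Push one small file through every stage and fail loudly at a broken link."""
    rows = parse(raw)
    if not rows:
        raise RuntimeError("parse stage produced no records")
    cols = to_columns(rows)
    if len(cols["response_ms"]) != len(rows):
        raise RuntimeError("columnar stage lost rows")
    scores = score(cols)
    if len(scores) != len(rows):
        raise RuntimeError("scoring stage lost rows")
    return scores

SAMPLE = "session_id,response_ms\ns1,800\ns2,1500\n"
print(smoke_test(SAMPLE))
```

Checking row counts between stages is a cheap way to confirm every link carries data before any scaling work starts.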

  10. Big Data Best Practice 3: Actions, Not Insights (diagram: Data → Insights → Action)

  11. Best Practice 3: Actions, Not Insights
     - Actions are executed in the frontline: call centre, mobile, store channel, digital channel
     - Actions could be: behaviour-based discounts, helping close a digital transaction, serving a customized web page, taking proactive actions
     - Insights are nice to know; actions impact $

  12. Therefore
     - What actions are driven as a result of these insights?
     - How are we disseminating insights to frontline channels?
     - Ask "So what?" 5 times!!!

  13. Big Data Best Practice 4: Listen to Unstructured Intelligence for Strong Signals

  14. Real-Life Example
     - Keyword frequency: "leaks", "leakage", "noise", "sound", "vibrations"
     - Noise/leakage frequency is a better predictor of repeat sales than any other indicator, including marketing spend!!!
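
Counting the watched keywords is the first step of this example. A minimal sketch, assuming the customer posts are already collected as plain strings (the sample posts are invented); it only shows frequency counting, not how the presentation linked those counts to repeat sales:

```python
import re
from collections import Counter

# Keywords named on the slide; matching is case-insensitive on whole words.
KEYWORDS = {"leaks", "leakage", "noise", "sound", "vibrations"}

def keyword_frequency(posts: list[str]) -> Counter:
    """Count how often each watched keyword appears across the posts."""
    counts: Counter = Counter()
    for post in posts:
        for token in re.findall(r"[a-z]+", post.lower()):
            if token in KEYWORDS:
                counts[token] += 1
    return counts

# Invented sample posts, for illustration only.
posts = [
    "Unit leaks after two weeks, and the noise is unbearable",
    "Loud noise and vibrations when it runs",
    "Great product, no complaints",
]
print(keyword_frequency(posts))
```

Tracked per period (for example, per month), these counts become a time series that can be tested against repeat-sales figures.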

  15. A Real-Life Example
     - Business question: How can we create a strategy to respond to what we are hearing about XYZ's buzz online?
     - Raw data: www.yelp.com, www.twitter.com
     - Statistical techniques: text mining, visual data exploration, hypothesis testing, affinity analysis
     - Insights derived: sentiment trends (+/-), sentiment benchmark against McDonald's, XYZ online buzz analysis, top keywords for XYZ, top keywords for McDonald's, keyword affinities
     - Business actions: theme-specific campaigns, NPD (new product development) process, in-store experience, reversing the impact of negative buzz
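
"Keyword affinities" can be read as pairs of words that tend to appear in the same post. The sketch below takes that reading; it counts co-occurring word pairs per post with a small, hypothetical stop-word list and invented posts. The slide does not say which affinity method was used:

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "is", "was", "at", "to", "of", "but"}

def keyword_affinities(posts: list[str]) -> Counter:
    """Count unordered pairs of distinct words that co-occur in the same post."""
    pairs: Counter = Counter()
    for post in posts:
        words = sorted({w for w in re.findall(r"[a-z']+", post.lower())
                        if w not in STOPWORDS})
        pairs.update(combinations(words, 2))  # each pair counted once per post
    return pairs

# Invented sample posts.
posts = [
    "Fries were cold and service slow",
    "Slow service but great fries",
    "Great burger",
]
for pair, n in keyword_affinities(posts).most_common(3):
    print(pair, n)
```

Sorting each post's word set makes every pair a stable tuple such as `("fries", "service")`, so the same pair from different posts lands on the same counter key.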

  16. Where Do Customers Express Themselves?
     - Sources: Yelp.com, Epinions.com, planetfeedback.com, Twitter.com, Facebook.com
     - Post counts shown on the slide: 2,854; 136; 552; 1,500; 500
     - Universe of XYZ sentiment data = 5 sources, 5,556 posts, 3 years of data
     - Phase-1 analysis = www.yelp.com, 136 posts, 2 years of data

  17. Source = Twitter.com

  18. Source = Yelp.com

  19. Source = Facebook.com

  20. Step-by-Step Sentiment Text-Mining Process
     - Input: blogs, customer review sites, customers' forums, customer/vendor emails, unstructured data from applications
     - Process: text mining
     - Output: inferences, online consumer sentiments
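
The input → process → output flow can be illustrated with the simplest form of sentiment mining: a word lexicon. This is a minimal sketch; the `POSITIVE`/`NEGATIVE` word sets and the month-keyed input are hypothetical, and a production pipeline (such as the R-based one the deck mentions) would use a curated lexicon or a trained model:

```python
import re

# Tiny illustrative lexicon (hypothetical), not a real sentiment dictionary.
POSITIVE = {"great", "love", "fresh", "friendly", "fast"}
NEGATIVE = {"cold", "slow", "rude", "dirty", "bad"}

def sentiment(text: str) -> int:
    """Return +1, -1 or 0 from the balance of positive vs negative words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return (score > 0) - (score < 0)

def sentiment_trend(posts_by_month: dict[str, list[str]]) -> dict[str, float]:
    """Net sentiment per month: mean of the +1/0/-1 labels, in month order."""
    trend: dict[str, float] = {}
    for month, posts in sorted(posts_by_month.items()):
        labels = [sentiment(p) for p in posts]
        trend[month] = sum(labels) / len(labels) if labels else 0.0
    return trend

print(sentiment_trend({
    "2012-01": ["Love the fresh salads"],
    "2012-02": ["Great burger", "Cold fries and rude staff"],
}))
```

The monthly net score is one way to produce the "sentiment trends (+/-)" output; benchmarking against a competitor means running the same function over the competitor's posts.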

  21. Overall Sentiments Dashboard

  22. Therefore
     - R text-mining algorithms
     - RHadoop

  23. Big Data Best Practice 5: Columnar & In-Memory Architectures to Speed Up Chain of Thought. Which devices are infected by a malicious attack?

  24. How to Handle "Needle in a Haystack" Workloads?
     - What happened on firewall-3 between 3:17 and 3:21 am?
     - How many payment gateway drops happened between 9:47 am and 9:52 am on 15-Nov-2012?
     - Data forensic queries supporting chains of thought
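
Queries like "what happened on firewall-3 between 3:17 and 3:21 am" are time-window lookups. If events are kept sorted by timestamp, a binary search finds the window without scanning the whole log. A minimal in-memory sketch; the `TimeIndex` class, the date, and the log messages are hypothetical:

```python
from bisect import bisect_left, bisect_right
from datetime import datetime

class TimeIndex:
    """Events for one device kept sorted by timestamp for fast window lookups."""

    def __init__(self, events):
        # events: iterable of (timestamp, message) pairs, in any order
        self._events = sorted(events, key=lambda e: e[0])
        self._keys = [ts for ts, _ in self._events]

    def between(self, start, end):
        """Return all events with start <= timestamp <= end in O(log n + k)."""
        lo = bisect_left(self._keys, start)
        hi = bisect_right(self._keys, end)
        return self._events[lo:hi]

# Hypothetical firewall-3 log; the date and messages are made up for illustration.
day = datetime(2012, 11, 15)
firewall_3 = TimeIndex([
    (day.replace(hour=3, minute=16, second=5), "ALLOW 10.0.0.5"),
    (day.replace(hour=3, minute=17, second=30), "DENY tcp/445 from 10.0.0.9"),
    (day.replace(hour=3, minute=20, second=59), "DENY tcp/445 from 10.0.0.9"),
    (day.replace(hour=3, minute=21, second=1), "ALLOW 10.0.0.5"),
])
hits = firewall_3.between(day.replace(hour=3, minute=17), day.replace(hour=3, minute=21))
for ts, msg in hits:
    print(ts.time(), msg)
```

Columnar and in-memory engines apply the same idea (sorted or partitioned data and touching only the relevant slice) at a scale where each follow-up query in a chain of thought still returns quickly.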

  25. Columnar DB: Concept in Brief

     Id   Name      Designation          Tenure
     S1   Prem      Founder              8
     S2   Simon     Security Architect   5
     S3   Bhavana   Sales Head           6
     S4   Ram       CEO                  3
     S5   Shyam     Developer            1

     - Row-oriented storage: S1PremFounder8 S2SimonSecurityArchitect5 S3BhavanaSalesHead6 S4RamCEO3 S5ShyamDeveloper1
     - Column-oriented storage: S1S2S3S4S5 PremSimonBhavanaRamShyam FounderSecurityArchitectSalesHeadCEODeveloper 85631
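
The two storage orders on the slide can be reproduced directly. In this sketch, lists stand in for disk blocks, and the helper names `row_layout`, `column_layout`, and `scan_column` are illustrative. It shows why a query that needs only `Tenure` reads one contiguous slice in a column store, while a row store would have to step over every name and designation:

```python
# The five employee records from the slide: (Id, Name, Designation, Tenure).
COLUMNS = ("Id", "Name", "Designation", "Tenure")
ROWS = [
    ("S1", "Prem", "Founder", 8),
    ("S2", "Simon", "Security Architect", 5),
    ("S3", "Bhavana", "Sales Head", 6),
    ("S4", "Ram", "CEO", 3),
    ("S5", "Shyam", "Developer", 1),
]

def row_layout(rows):
    """Row store: each record's fields sit next to each other."""
    return [field for row in rows for field in row]

def column_layout(rows):
    """Column store: all values of one column sit next to each other."""
    return [value for column in zip(*rows) for value in column]

def scan_column(layout, n_rows, name):
    """Read one column from a column-store layout as a single contiguous slice."""
    i = COLUMNS.index(name)
    return layout[i * n_rows:(i + 1) * n_rows]

col_store = column_layout(ROWS)
tenures = scan_column(col_store, len(ROWS), "Tenure")
print(tenures, sum(tenures) / len(tenures))
```

Real columnar databases such as Infobright also compress each column, which works well because values within one column tend to be similar.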

  26. In-Memory Databases!
     - Interactive or real-time queries on large datasets are key to analyst productivity (they support chain-of-thought analysis).
     - Chain-of-thought analysis = exploring the data torrent by quickly running a series of iterative queries, each informed by the last.
     - Most solutions aren't fast enough, and analytical effectiveness drops when the user's chain-of-thought process is interrupted.
     - In-memory DB tools: Dremel at Google, Druid at Metamarkets, Sting at Netflix, Cloudera's Impala, UC Berkeley AMPLab's Spark, SAP HANA, Platfora.

  27. Therefore
     - Examine columnar databases and in-memory databases to speed up important query workloads
     - Download evaluation versions of Actian and Infobright and do a POC

  28. Big Data Best Practice 6: Think 100x Scalability!!! How do you plan for 100x scalability?

  29. Real-Life Example
     - Industry: telecom
     - Business context: national content-filtering solution
     - Events generated per day: 1 billion
     - New URLs classified per day: 1 million
     - Daily log volume: 400 GB on average
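
Planning for 100x is easier once the daily figures are turned into per-second rates. A minimal back-of-the-envelope sketch; it assumes "400 GB" means 400 × 10^9 bytes and uses daily averages, so real peak rates will be higher than these numbers:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def sizing(events_per_day: float, bytes_per_day: float, factor: float = 1.0):
    """Average throughput and daily volume at `factor` times today's load."""
    events_per_sec = events_per_day * factor / SECONDS_PER_DAY
    mb_per_sec = bytes_per_day * factor / SECONDS_PER_DAY / 1e6
    tb_per_day = bytes_per_day * factor / 1e12
    return events_per_sec, mb_per_sec, tb_per_day

# Figures from the slide; "400 GB" is read as 400 x 10^9 bytes (an assumption).
for factor in (1, 100):
    eps, mbps, tbpd = sizing(1e9, 400e9, factor)
    print(f"{factor:>3}x: {eps:,.0f} events/s, {mbps:,.1f} MB/s, {tbpd:,.1f} TB/day")
```

Today's load averages about 11,574 events/s and 4.6 MB/s. At 100x, that becomes roughly 1.16 million events/s, 463 MB/s, and 40 TB of new logs per day, before replication and before sizing for peaks.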

  30. Big Data Best Practice 7: Detect Data Patterns in Real Time!!!
     - Diagram: the organisation's data torrent feeds real-time sense-making
     - Event streams: price-sensitive search, ratings-based ordering, store search, basket add, comparator events, payment gateway events

  31. The Context
     - Velocity is high
     - The decision-making window is short
     - The cost of not intervening is high

  32. Real-Time Example
     - Decision window = 8 minutes
     - If a customer is high value (decile 1 on revenue over the last 36 months) and the intra-book interval > threshold and the recency of search < 70, then route the customer to the call-center channel
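
The routing rule translates directly into code. A minimal sketch: the `Customer` fields, the `interval_threshold` value, and the decile calculation are assumptions, because the slide does not give the threshold or the units of the interval and recency values. Evaluating the rule within the 8-minute decision window is left to the real-time layer that calls it:

```python
from dataclasses import dataclass

@dataclass
class Customer:
    revenue_36m: float          # revenue over the last 36 months
    intra_book_interval: float  # units not stated on the slide
    search_recency: float       # units not stated on the slide

def decile_of(value: float, population: list[float]) -> int:
    """Decile 1 = top 10% of the population by value (rank by strictly higher values)."""
    higher = sum(v > value for v in population)
    return min(10, higher * 10 // len(population) + 1)

def route(c: Customer, population_revenue: list[float],
          interval_threshold: float, recency_limit: float = 70) -> str:
    """Apply the slide's rule; `interval_threshold` is a hypothetical parameter."""
    high_value = decile_of(c.revenue_36m, population_revenue) == 1
    if (high_value and c.intra_book_interval > interval_threshold
            and c.search_recency < recency_limit):
        return "call_center"
    return "default"

population = [float(r) for r in range(1, 101)]  # hypothetical revenue distribution
print(route(Customer(95, 10, 30), population, interval_threshold=7))
```

In practice, the decile boundaries would be precomputed in batch (for example, in Hadoop), so the real-time path only compares a customer's revenue against a stored cutoff.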

  33. Therefore: include S4 and other real-time analytics engines in your big data reference architecture
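
The core of real-time pattern detection is keeping a short sliding window per key and firing when a condition holds inside it. The sketch below is plain Python, not S4 code. It uses the slide's 8-minute window (480 s) and a hypothetical threshold and event key, and it assumes timestamps arrive in non-decreasing order:

```python
from collections import deque

class SlidingWindowCounter:
    """Count events per key within the last `window_s` seconds and flag bursts."""

    def __init__(self, window_s: float, threshold: int):
        self.window_s = window_s
        self.threshold = threshold
        self._events: dict[str, deque] = {}

    def observe(self, key: str, ts: float) -> bool:
        """Record an event at time `ts` (seconds, non-decreasing).

        Returns True when the key's count inside the window reaches the threshold.
        """
        q = self._events.setdefault(key, deque())
        q.append(ts)
        while q and q[0] <= ts - self.window_s:  # evict events older than the window
            q.popleft()
        return len(q) >= self.threshold

# Hypothetical: alert on 3 payment-gateway drops within an 8-minute window.
detector = SlidingWindowCounter(window_s=480, threshold=3)
for t in (0, 100, 200):
    fired = detector.observe("payment_gateway_drop", t)
print(fired)
```

A stream-processing engine distributes this kind of per-key state across nodes, so the check keeps up with high event velocity.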

  34. Big Data Best Practice 8: Captology = Persuasion Through Technology

  35. The Basics
     - Captology = persuasion through technology
     - Design for behavioural change
     - Persuasion examples: getting users to change their channel behaviour (move from the desktop to the mobile channel); persuading users to advocate to friends

  36. Captology in Action
     - Captology in insurance: reduce rates each time a person reports his or her exercise behaviour to a group of peers online
     - Captology in social

  37. There Are Too Many Good Products Hidden Behind Bad User Interfaces. Product = Interface: for the business user, what lies under the hood does not matter.

  38. Big Data Best Practice 9: Stretch Key Big Data Components to See What Breaks!

  39. Best Practice 9: The Intersections of Moving Parts Are the Weak Links
     - Big data moving parts: Hadoop cluster, columnar databases, advanced visualisation layer, real-time components, data pipelines, APIs/scrapers to syndicate info, bridge to the existing DW
     - The intersections can give way as data/user volumes increase
     - A real-life big data architecture:
       - Event loggers
       - HBase/Cassandra for high-velocity event absorption
       - Sqoop/Flume for data ingestion
       - Hadoop cluster for massive data crunching
       - R for extracting patterns
       - Columnar database for 10x faster retrieval
       - Tableau for advanced visualisation
       - S4 for real-time analytics
       - Channel integration components
     - Diagram labels: Infobright columnar DB, R predictor, ranking

  40. Therefore... Watch the Following 4 Weak Links
     1. Link between operational event streams and the Hadoop cluster
     2. Link between the Hadoop cluster and the columnar database
     3. Link between the columnar database and the visualisation tool
     4. Time it takes for the machine learning algorithm to run
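
One way to "stretch" the components is to time each weak link on every test run and flag any link that exceeds its budget as volumes grow. A minimal sketch; the `LinkTimer` class, the link names, and the budgets are hypothetical, and the timed bodies would call the real transfer or training jobs:

```python
import time
from contextlib import contextmanager

class LinkTimer:
    """Record wall-clock time per pipeline link and flag links over budget."""

    def __init__(self, budgets_s: dict[str, float]):
        self.budgets_s = budgets_s
        self.elapsed_s: dict[str, float] = {}

    @contextmanager
    def measure(self, link: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Recorded even if the timed step raises, so failures still show their cost.
            self.elapsed_s[link] = time.perf_counter() - start

    def over_budget(self) -> list[str]:
        """Links whose last measured time exceeded their budget (unbudgeted links never do)."""
        return [link for link, t in self.elapsed_s.items()
                if t > self.budgets_s.get(link, float("inf"))]

# Hypothetical budgets (seconds) for the four weak links.
timer = LinkTimer({
    "events_to_hadoop": 600, "hadoop_to_columnar": 900,
    "columnar_to_viz": 5, "ml_run": 1800,
})
with timer.measure("columnar_to_viz"):
    time.sleep(0.01)  # placeholder for a real dashboard query
print(timer.elapsed_s, timer.over_budget())
```

Rerunning the same harness at 1x, 10x, and 100x data volumes shows which link degrades first.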
