BIG DATA IN HYBRID WORLDS: The Story of M



slide-1
SLIDE 1

BIG DATA
 IN HYBRID WORLDS

The Story of M

slide-2
SLIDE 2

Hi! I’m Florian, CEO of Dataiku,
maker of Data Science Studio, the « Photoshop for Data Science ».

COMMUNITY EDITION (it’s FREE)

http://www.dataiku.com/dss/trynow/

React on Twitter: @fdouetteau #BigDataParis

slide-3
SLIDE 3

Big or Small

Startup Big Firm

slide-4
SLIDE 4

HOW DO PEOPLE MAKE DECISIONS?

slide-5
SLIDE 5

BUYING DECISIONS

Should I buy it?

slide-6
SLIDE 6

SOCIAL DECISIONS

Should I talk to him?

slide-7
SLIDE 7

M LIKE MEETING

Business Decisions

slide-8
SLIDE 8

Business Intelligence

slide-9
SLIDE 9

Business Intelligence

slide-10
SLIDE 10

IN 2001, man (actually META Group, since acquired by Gartner) invented big data

Volume, Variety, Velocity

slide-11
SLIDE 11

WHAT IF THE META GROUP HAD CHOSEN ANOTHER LETTER?

Capacity, Complexity, Celerity
Size, Serendipity, Speed
Big, Blur, Blazing

slide-12
SLIDE 12

Or Combine

Com….. Bu.. Sh..

slide-13
SLIDE 13

BIG DATA RELIGION?

slide-14
SLIDE 14

M LIKE METRICS

slide-15
SLIDE 15

M LIKE METRICS

How much does it cost to produce and maintain a metric?
How many metrics do I need?
Do I follow the right metrics?
Do I have enough data?

slide-16
SLIDE 16

  • Self-Service: build your own metrics
  • Analytical Capabilities: find your patterns
  • Large Volume: store it all

More Metrics Means More Means

slide-17
SLIDE 17

More Metrics Means More Application

[Quadrant chart. Axes: Mission Critical vs. Sheer Curiosity; Small, Structured vs. Large, Diverse. Additional label: Optimization]

Quadrants: CLASSIC BI, LARGE PRODUCTION PLATFORM, DATA EXPLORATION, DATA MINING

Example use cases on the chart:
  • Reporting for Finance in Any Industry
  • Analyze Each Tweet
  • Web Navigation for E-Merchant
  • Ticket Data for Discounts in Retail
  • Phone Call Logs for Security
  • RTB Data for Advertising
  • Customer Consumption for Anti-Churn in Utilities
  • Filings for Fraud in Insurance

slide-18
SLIDE 18

TODAY, EACH HAS ITS OWN STORE

[Quadrant chart. Axes: Mission Critical vs. Sheer Curiosity; Small, Structured vs. Large, Diverse. Additional label: Optimization]

Quadrants: CLASSIC BI, LARGE PRODUCTION PLATFORM, DATA EXPLORATION, DATA MINING

Stores: DATA WAREHOUSING, DATA MINING REPOSITORIES, DATA LAKE, GOOGLE-LIKE PLATFORM

slide-19
SLIDE 19

It’s not just about the metrics

slide-20
SLIDE 20

DATA-DRIVEN BUSINESS

slide-21
SLIDE 21

The problem is the human

Cannot make decisions in seconds
Limited sight (100 rows)
Limited short-term memory (10k rows)?

slide-22
SLIDE 22

M LIKE MACHINE

slide-23
SLIDE 23

Rise of AI

1974 - 1993 AI Winters
1997 Deep Blue
2005 Autonomous Vehicle
2011 Watson on Jeopardy!
2012 Google Cat

slide-24
SLIDE 24

www.dataiku.com

APPLICATIONS OF MACHINE LEARNING TO BUSINESS PROBLEMS

Churn, Volume Forecast, Recommender, Segmentation, Lifetime Value, Risk Score, Hot Location, Pricing, Ranking, Fraud, Event Paths

slide-25
SLIDE 25

PREDICTIVE MAIN COMFORT ZONE

[Quadrant chart. Axes: Mission Critical vs. Sheer Curiosity; Small, Structured vs. Large, Diverse. Additional label: Optimization]

Use cases on the chart: Reporting for Finance in Any Industry; Analyze Each Tweet; Web Navigation for E-Merchant; Ticket Data for Discounts in Retail; Phone Call Logs for Security; RTB Data for Advertising; Customer Consumption for Anti-Churn in Utilities; Filings for Fraud in Insurance

Outside the comfort zone:
Not enough data to learn from?
Not enough “hard” examples to learn from?

slide-26
SLIDE 26
slide-27
SLIDE 27

Dataiku - Pig, Hive and Cascading

Welcome to Technoslavia

Regions of the map: Machine Learning Mystery Land, Scalability Central, SQL Columnar Republic, Visualization County, Data Clean Wasteland, Statistician Old House, Real-time Island, NoSQL Nihiland

Tools on the map:
Hadoop, Ceph, Sphere, Cassandra, Kafka, Flume, Spark
scikit-learn, GraphLab, prediction.io, Jubatus, Mahout, WEKA, MLbase, LibSVM
RapidMiner, Pandas, Kibana, InfiniDB, Drill, Spark SQL, Hive, Impala, …, Elasticsearch, Solr, MongoDB, Riak, Membase, Pig, Cascading, Talend
R
Storm

slide-28
SLIDE 28

Embrace Many Skill-Sets

Data Plumber, BI Manager, Data Scientist, Data Waiter, Data Cleaner, Business Analyst

REAL JOB vs. DREAM JOB

slide-29
SLIDE 29

HOW CAN WE IMPROVE THE RELEVANCE OF OUR ANSWERS BY ANALYZING USER BEHAVIOR?

Behavior signals:
  • Search reformulation
  • No answer
  • Click on a professional
  • Top search
  • Navigation click or filter

Figures on the slide:
  • >200M searches
  • 20M
  • 1.4M queries with >10 occurrences
  • 0.5M prioritized queries (✗ / ✓)

Outcomes: analysis & corrections, automation
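
The slide does not say how queries were prioritized. As a rough illustration only, the sketch below keeps queries seen more than 10 times (the one threshold the slide does give) and ranks them by the share of "bad" outcomes. The signal names, the `prioritize` function and the ranking rule are assumptions made for this example, not the method used at Pages Jaunes.

```python
from collections import Counter

# Hypothetical signal names, modeled on the behaviors listed on the slide.
BAD_SIGNALS = {"reformulation", "no_answer"}

def prioritize(events, min_occurrences=10):
    """Return queries seen more than `min_occurrences` times, worst first.

    `events` is an iterable of (query, signal) pairs. "Worst" means the
    highest share of events carrying a bad signal (an assumed ranking rule).
    """
    totals = Counter()
    bad = Counter()
    for query, signal in events:
        totals[query] += 1
        if signal in BAD_SIGNALS:
            bad[query] += 1
    frequent = [q for q, n in totals.items() if n > min_occurrences]
    # Sort by descending bad-signal ratio, then alphabetically for stability.
    return sorted(frequent, key=lambda q: (-bad[q] / totals[q], q))

# Toy log: two frequent queries and one rare query that gets filtered out.
events = (
    [("plombier paris", "click_on_pro")] * 12
    + [("resto pas cher", "no_answer")] * 8
    + [("resto pas cher", "reformulation")] * 4
    + [("coiffeur", "no_answer")] * 3
)
ranked = prioritize(events)
```

Here `ranked` puts the query that mostly fails ahead of the one that mostly ends in a click, and the rare query never reaches the prioritized list.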

slide-30
SLIDE 30

"PREDICTIVE CONTENT MANAGEMENT” FROM PAGES JAUNES

Machine, Management, Exploration

[Architecture diagram. Components: pagesjaunes.fr, directory, Hadoop, Pig + Hive, export, indexing, interpretation engine, crawl, other reference data, scikit-learn]

slide-31
SLIDE 31

Optimizing Last Mile with Data Science Studio

  • Historical delivery and retrieval data
  • Modeling of a score for each delivery
  • Cleaning and temporal enrichment of data
  • Data aggregation by geographic location
  • Incorporation of new deliveries into the existing model
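
The steps on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the model built in Data Science Studio: the record layout, the `enrich` helper, the `DeliveryScorer` class and the smoothed success rate used as a score are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical history: (ISO timestamp, postal zone, delivered on first attempt?)
HISTORY = [
    ("2014-03-03T09:15:00", "75011", True),
    ("2014-03-03T18:40:00", "75011", False),
    ("2014-03-04T10:05:00", "75011", True),
    ("2014-03-04T11:30:00", "92100", False),
    ("2014-03-05T17:20:00", None, True),  # missing zone: dropped by cleaning
]

def enrich(timestamp):
    """Temporal enrichment: derive a coarse time slot from the timestamp."""
    hour = datetime.fromisoformat(timestamp).hour
    return "morning" if hour < 12 else "afternoon"

class DeliveryScorer:
    """Aggregates outcomes by (zone, time slot) and scores new deliveries."""

    def __init__(self, prior_success=0.5, prior_weight=2.0):
        self.prior_success = prior_success
        self.prior_weight = prior_weight
        # (zone, slot) -> [successes, total]
        self.counts = defaultdict(lambda: [0, 0])

    def add(self, timestamp, zone, success):
        """Clean and incorporate one delivery; return False if it was dropped."""
        if timestamp is None or zone is None:
            return False
        key = (zone, enrich(timestamp))
        self.counts[key][0] += int(success)
        self.counts[key][1] += 1
        return True

    def score(self, timestamp, zone):
        """Smoothed first-attempt success rate for this zone and time slot."""
        successes, total = self.counts.get((zone, enrich(timestamp)), (0, 0))
        prior = self.prior_success * self.prior_weight
        return (successes + prior) / (total + self.prior_weight)

scorer = DeliveryScorer()
kept = sum(scorer.add(*record) for record in HISTORY)
```

Calling `scorer.add(...)` with each new delivery updates the aggregates in place, which is one simple way to read "incorporation of new deliveries into the existing model". The prior keeps scores for unseen zones at a neutral value instead of failing.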

slide-32
SLIDE 32

EXPLORE NEW WORLDS

[Quadrant chart. Axes: Mission Critical vs. Sheer Curiosity; Small, Structured vs. Large, Diverse. Additional label: Optimization]

Zones on the chart:
  • Optimize Existing BI Capabilities
  • Build Mandatory Large Volume Capabilities
  • EXPLORE POTENTIAL
  • NOT BEING RELEVANT: DANGER ZONE

Labels: Analytics, Predictive, Self Service, Cluster

slide-33
SLIDE 33

www.dataiku.com