BIG DATA IN HYBRID WORLDS
The Story of M
Hi! I'm Florian, CEO of Dataiku,
maker of Data Science Studio, the "Photoshop for Data Science"
COMMUNITY EDITION (it's FREE): http://www.dataiku.com/dss/trynow/
React on Twitter: @fdouetteau #BigDataParis
Big or Small
Startup Big Firm
HOW DO PEOPLE MAKE DECISIONS
BUYING DECISIONS
Should I buy it?
SOCIAL DECISIONS
Should I talk to him?
Business Decisions
Business Intelligence
In 2001, man (actually Gartner) invented Big Data
Volume, Variety, Velocity
Capacity, Complexity, Celerity
Size, Serendipity, Speed
Big, Blur, Blazing
M LIKE METRICS
How much does it cost to produce and maintain a metric?
How many metrics do I need?
Do I follow the right metrics?
Do I have enough data?
Build your own metrics
Find your patterns
Store it all
More Metrics Means More Means
DATA MINING
More Metrics Means More Applications
[Chart axes: Mission Critical / Sheer Curiosity; Small, Structured / Large, Diverse]
Zones: CLASSIC BI, LARGE PRODUCTION PLATFORM, DATA EXPLORATION, DATA MINING, Optimization
Use cases:
Reporting for Finance in Any Industry
Analyze Each Tweet
Web Navigation for E-Merchant
Ticket Data for Discounts in Retail
Phone Call Logs for Security
RTB Data for Advertising
Customer Consumption for Anti-Churn in Utilities
Filings for Fraud in Insurance
TODAY EACH HAS ITS OWN STORE
[Chart axes: Mission Critical / Sheer Curiosity; Small, Structured / Large, Diverse]
Zones: CLASSIC BI, LARGE PRODUCTION PLATFORM, DATA EXPLORATION, Optimization
Stores: DATA WAREHOUSING, DATA MINING REPOSITORIES, DATA LAKE, GOOGLE LIKE PLATFORM
It's not just about the metrics
DATA DRIVEN BUSINESS
Problem is the human
Cannot make decisions in seconds
Limited sight (100 rows)
Limited short-term memory (10k rows)?
Rise of AI
1974 - 1993: AI Winters
1997: Deep Blue
2005: Autonomous Vehicle
2011: Watson on Jeopardy!
2012: Google Cat
www.dataiku.com
APPLICATIONS OF MACHINE LEARNING TO BUSINESS PROBLEMS
Churn, Volume Forecast, Recommender, Segmentation, Lifetime Value, Risk Score, Hot Location, Pricing, Ranking, Fraud, Event Paths
PREDICTIVE: MAIN COMFORT ZONE
[Chart axes: Mission Critical / Sheer Curiosity; Small, Structured / Large, Diverse]
[Chart: the use cases listed earlier (Reporting for Finance through Filings for Fraud in Insurance) and Optimization, plotted on the same axes]
Not enough data to learn from?
Not enough "hard" examples to learn from?
Welcome to Technoslavia
[Map of the data technology landscape]
Regions: Machine Learning Mystery Land, Scalability Central, SQL Columnar Republic, Visualization County, Data Clean Wasteland, Statistician Old House, Real-time Island, NoSQL Nihiland
Technologies: Hadoop, Ceph, Sphere, Cassandra, Kafka, Flume, Spark, Scikit-Learn, GraphLab, prediction.io, jubatus, Mahout, WEKA, MLBase, LibSVM, RapidMiner, Pandas, Kibana, InfiniDB, Drill, Spark SQL, Hive, Impala, …, Elasticsearch, Solr, MongoDB, Riak, Membase, Pig, Cascading, Talend, R, Storm
Embrace Many Skills Many-Sets
Data Plumber, BI Manager, Data Scientist, Data Waiter, Data Cleaner, Business Analyst
REAL JOB / DREAM JOB
Search
HOW CAN WE IMPROVE THE RELEVANCE OF OUR ANSWERS BY ANALYZING USER BEHAVIOR?
Analysis & corrections
Automation
>10
1.4M queries
>200M searches
0.5M prioritized queries
Machine, Management, Exploration
[Architecture diagram: pagesjaunes.fr, Directory, Hadoop, Pig + Hive, Export, Indexing, Interpretation engine, Crawl, Other reference data, Scikit-learn]
Optimizing the Last Mile with Data Science Studio
Historical delivery and retrieval data
Modeling of a score for each delivery
Cleaning and temporal enrichment of data
Data aggregation by geographic location
Incorporation of new deliveries to the existing model
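The steps listed for the last-mile use case can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the model built in Data Science Studio: the `Delivery` record layout, the `(zone, weekday, hour)` aggregation key, the smoothed success rate used as a score, and the names `enrich`, `ZoneScorer` and `build_scorer` are all hypothetical. The ordering of the steps (clean, enrich, aggregate, score, update) is also an assumption, since the slide does not state one.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional, Tuple


@dataclass
class Delivery:
    # Hypothetical record layout; the slides do not describe the real schema.
    zone: str                  # geographic bucket, e.g. a postal code
    timestamp: str             # ISO 8601 date and time of the attempt
    succeeded: Optional[bool]  # None when the outcome was not recorded


def enrich(d: Delivery) -> Tuple[str, int, int]:
    """Temporal enrichment: derive the key (zone, weekday, hour)."""
    ts = datetime.fromisoformat(d.timestamp)
    return (d.zone, ts.weekday(), ts.hour)


class ZoneScorer:
    """Aggregates outcomes per (zone, weekday, hour) and scores deliveries."""

    def __init__(self, prior_success: float = 0.5, prior_weight: float = 2.0):
        # Assumed smoothing: unseen keys fall back to prior_success.
        self.prior_success = prior_success
        self.prior_weight = prior_weight
        self.counts = defaultdict(lambda: [0, 0])  # key -> [successes, total]

    def add(self, d: Delivery) -> None:
        """Incorporate one delivery; records without an outcome are skipped."""
        if d.succeeded is None:  # cleaning step
            return
        entry = self.counts[enrich(d)]
        entry[0] += int(d.succeeded)
        entry[1] += 1

    def score(self, d: Delivery) -> float:
        """Smoothed historical success rate for the delivery's key."""
        successes, total = self.counts.get(enrich(d), (0, 0))
        return (successes + self.prior_success * self.prior_weight) / (
            total + self.prior_weight
        )


def build_scorer(history: Iterable[Delivery]) -> ZoneScorer:
    """Build the aggregates from historical deliveries."""
    scorer = ZoneScorer()
    for d in history:
        scorer.add(d)
    return scorer
```

Scoring a planned delivery is then `build_scorer(history).score(planned)`, and calling `add` with each completed delivery keeps the aggregates current without rebuilding them from scratch. A real model would likely use richer features than a single success rate.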
EXPLORE NEW WORLDS
[Chart axes: Mission Critical / Sheer Curiosity; Small, Structured / Large, Diverse]
Optimization
Optimize Existing BI Capabilities
Build Mandatory Large Volume Capabilities
EXPLORE POTENTIAL
DANGER ZONE: NOT BEING RELEVANT
Analytics, Predictive, Self Service, Cluster