A Big Data Arsenal for the 21st Century Matt Asay VP, Marketing & Business Development, MongoDB MongoDB Inc. Proprietary and Confidential
7 million downloads • 150,000 online education registrations • 1,000 active subscribers • 20,000 MongoDB Days attendees • 30,000 user group members • World’s fastest-growing database
What We Don’t Do “The relational database market is a $9 billion a year market. I want to shrink it to $3 billion and take a third of the market.” - Marten Mickos 3
What We Do Enable a Generation of Innovative, Modern Applications Previously Impossible Or Too Difficult to Achieve 4
The Big Data Unknown 5
Top Big Data Challenges? Translation? Most struggle to know what Big Data is, how to manage it and who can manage it Source: Gartner 6
Big Data Is Sort of a Matter of Volume • More than 90% of today’s data was created in the last 2 years • Moore’s Law for data: Doubles at regular intervals [Chart: data volume by year, 2000 to 2011; plotted values 1, 4, 10, 24, 55, 120, 250, 500, 1,000, 2,150, 4,400, 9,000; units not given] 7
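The two bullets on this slide can be tied together with a little arithmetic. As a rough sketch, assuming data volume grows as a pure exponential (an assumption of this example, not something the slide states), the share of all data created in the last w years is 1 - 2^(-w/T) for a doubling time T. A 90% share over 2 years then implies a doubling time of roughly 0.6 years. The function name is hypothetical.

```python
import math

def doubling_time(recent_share: float, window_years: float) -> float:
    """Doubling time T implied by `recent_share` of all data having been
    created in the last `window_years`, assuming pure exponential growth.

    Under that assumption: recent_share = 1 - 2 ** (-window_years / T).
    """
    if not 0.0 < recent_share < 1.0:
        raise ValueError("recent_share must be strictly between 0 and 1")
    # Solve 1 - recent_share = 2 ** (-window_years / T) for T.
    return window_years * math.log(2) / -math.log(1.0 - recent_share)

t = doubling_time(0.90, 2.0)
print(f"Implied doubling time: {t:.2f} years (about {t * 12:.0f} months)")
```

Real data growth is unlikely to be this regular, so the figure is only an order-of-magnitude reading of the slide's claim.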
Big(ger) Is the New Normal [Figure: yearly panels labeled 2006 to 2011, 2013, and 2014] 8
Understanding Big Data – It’s Not Very “Big” • 64%: Ingest diverse, new data in real-time • 15%: More than 100TB of data • 20%: Less than 100TB (average of all? <20TB) • Source: Big Data Executive Summary, 50+ top executives from Government and F500 firms 9
Modern, Big Data Is Messy 10
Data Now Looks Like This 11
And This 12
And This 13
Doesn’t Fit Neatly into a “Spreadsheet” • 90% of the world’s data was created in the last two years • 80% of enterprise data is unstructured • Unstructured data growing 2X faster than structured 14
Back in 1970 … Cars Were Great! 15
So Were Computers! 16
Lots of Great Innovations Since 1970 17
New Tools for New Data 18
Innovation As Iteration
“I have not failed. I've just found 10,000 ways that won't work.” ― Thomas A. Edison
Must Be Open Source 21
Must Not Require Big Upfront Payment 22
Must Not Penalize Success “Clients can also opt to run zEC12 without a raised datacenter floor -- a first for high-end IBM mainframes.” IBM Press Release 28 Aug, 2012 23
Must Not Impede Iteration [Diagram: a table with Name, Pet, Phone, and Email columns; changes labeled “New Table” and “New Column”; caption: 3 months later …] 24
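One way to picture the iteration point: in a document model, a new field can appear on new records without first adding a table or column. A minimal sketch in plain Python, using dicts as stand-ins for documents (no database or driver is involved; the field names come from the slide, while the sample values and the helper name are made up for illustration):

```python
# Documents written at launch: only the fields known at the time.
contacts = [
    {"name": "Ada", "phone": "555-0100"},
    {"name": "Grace", "phone": "555-0101", "email": "grace@example.com"},
]

# "3 months later": a new requirement (pets) arrives. New documents simply
# carry the extra field; older documents are left untouched.
contacts.append({"name": "Linus", "email": "linus@example.com", "pet": "penguin"})

def field_or_default(doc: dict, field: str, default=None):
    """Read a field that may be absent on older documents."""
    return doc.get(field, default)

pets = [field_or_default(c, "pet", "none recorded") for c in contacts]
print(pets)
```

The cost moves to read time: code that consumes the documents has to tolerate missing fields, which is what the hypothetical `field_or_default` helper stands in for.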
Must Be Based on Industry Standards
DB-Engines.com Database Ranking
Rank  Database          Type          Score     Change
1     Oracle            Relational    1491.8    -8.43
2     MySQL             Relational    1290.21   1.83
3     Microsoft SQL     Relational    1205.28   -8.99
4     PostgreSQL        Relational    235.06    4.61
5     MongoDB           Document      199.99    4.81
6     DB2               Relational    187.32    -1.14
7     Microsoft Access  Relational    146.48    -6.4
8     SQLite            Relational    92.98     -0.03
9     Sybase            Relational    81.55     -6.33
10    Cassandra         Wide Column   78.09     -2.23
25
Must Be Easy to Find Skills 26
Must Be Easy to Learn/Use “Organizations already have people who know their own data better than mystical data scientists … Learning Hadoop [or MongoDB] is easier than learning the company’s business.” (Gartner, 2012) 27
When To Use Hadoop, Modern Databases
Enterprise Big Data Stack • Management & Monitoring • Applications: CRM, ERP, Collaboration, Mobile, BI • Security & Auditing • Data Management: Online Data and Offline Data (RDBMS, Hadoop, EDW) • Infrastructure: OS & Virtualization, Compute, Storage, Network 29
Consideration – Online vs. Offline • Online: Real-time, Low-latency, High availability • Offline: Long-running, High-latency, Availability is lower priority 30
Hadoop Is Good for … • Recommendation Engine • Risk Modeling • Churn Analysis • Transaction Analysis • Trade Surveillance • Ad Targeting • Network Failure Prediction • Search Quality • Data Lake 31
MongoDB/NoSQL Is Good for … • 360° View of the Customer • Mobile & Social Apps • Fraud Detection • Content Management & Delivery • User Data Management • Reference Data • Machine to Machine Apps • Product Catalogs • Data Hub 32
How To Use The Two Together?
Finding Waldo 34
Predictive Analytics: Government • Predictive analytics system for crime, health issues • Diverse, unstructured (incl. geospatial) data from 30+ agencies • Correlate data in real-time • Algorithms (MongoDB + Hadoop): Long-form trend analysis; MongoDB data dumped into Hadoop, analyzed, re-inserted into MongoDB for better real-time response 35
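The dump, analyze, re-insert loop on this slide can be sketched in plain Python. This is only an illustration: lists and dicts stand in for MongoDB collections, an ordinary function stands in for the Hadoop job, and every name and sample record is hypothetical.

```python
from collections import Counter

# Stand-in for the online store: incident documents from different agencies.
online_incidents = [
    {"agency": "police", "district": "north", "type": "theft"},
    {"agency": "health", "district": "north", "type": "flu"},
    {"agency": "police", "district": "south", "type": "theft"},
    {"agency": "police", "district": "north", "type": "theft"},
]

def batch_trend_job(docs):
    """Stand-in for the offline batch job: count incidents per (district, type)."""
    counts = Counter((d["district"], d["type"]) for d in docs)
    return [
        {"district": district, "type": kind, "count": n}
        for (district, kind), n in sorted(counts.items())
    ]

# 1. Dump a snapshot of the online data for offline analysis.
snapshot = list(online_incidents)
# 2. Analyze offline (long-running and high-latency in a real deployment).
trends = batch_trend_job(snapshot)
# 3. Re-insert the results into an online lookup keyed for low-latency reads.
online_trends = {(t["district"], t["type"]): t["count"] for t in trends}

print(online_trends[("north", "theft")])
```

The point of the split is that the expensive aggregation runs away from the serving path, while the precomputed result is a cheap key lookup at request time.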
Machine Learning: Ad-Serving • Catalogs and products • User profiles • Clicks • Views • Transactions • Algorithms (MongoDB Connector for Hadoop): User segmentation, Recommendation engine, Prediction engine 36
Remember … • Modern data is messy • Your data infrastructure must support iteration • Modern data infrastructure market is crowded – But clear winners are distinguishing themselves – Bet on general purpose over niche, popular over obscure, open source over proprietary • Use MongoDB + Hadoop together 37
@mjasay Don’t believe me? MongoDB booth on Floor 3