A Big Data Arsenal for the 21st Century


  1. A Big Data Arsenal for the 21st Century. Matt Asay, VP, Marketing & Business Development, MongoDB. MongoDB Inc. Proprietary and Confidential

  2. World’s fastest-growing database
     • 7 million downloads
     • 150,000 online education registrations
     • 1,000 active subscribers
     • 20,000 MongoDB Days attendees
     • 30,000 user group members

  3. What We Don’t Do: “The relational database market is a $9 billion a year market. I want to shrink it to $3 billion and take a third of the market.” - Marten Mickos

  4. What We Do: Enable a Generation of Innovative, Modern Applications Previously Impossible or Too Difficult to Achieve

  5. The Big Data Unknown

  6. Top Big Data Challenges? Translation? Most struggle to know what Big Data is, how to manage it, and who can manage it. Source: Gartner

  7. Big Data Is Sort of a Matter of Volume
     • More than 90% of today’s data was created in the last 2 years
     • Moore’s Law for data: Doubles at regular intervals
     [Chart: data volume by year, 2000 to 2011, units not given: 1, 4, 10, 24, 55, 120, 250, 500, 1,000, 2,150, 4,400, 9,000]
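
As a rough check on the "doubles at regular intervals" claim, the chart figures can be turned into an average doubling time. This is only a sketch: it assumes the extracted series starts at 1 in 2000 and ends at 9,000 in 2011, and the variable names are illustrative.

```python
import math

# Values read from the extracted chart data (units are not given in the source).
start_year, start_value = 2000, 1
end_year, end_value = 2011, 9000

# Number of doublings needed to grow from start_value to end_value.
doublings = math.log2(end_value / start_value)

# Average time per doubling over the period, in years and in months.
years_per_doubling = (end_year - start_year) / doublings
months_per_doubling = 12 * years_per_doubling

print(f"{doublings:.1f} doublings, one every {months_per_doubling:.0f} months")
```

Under those assumptions the series works out to roughly 13 doublings in 11 years, or about one every 10 months.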

  8. Big(ger) Is the New Normal
     [Figure: panels labeled by year: 2006, 2007, 2008, 2009, 2010, 2011, 2013, 2014]

  9. Understanding Big Data – It’s Not Very “Big”
     • 64% - Ingest diverse, new data in real-time
     • 15% - More than 100TB of data
     • 20% - Less than 100TB (average of all? <20TB)
     Source: Big Data Executive Summary – 50+ top executives from Government and F500 firms

  10. Modern, Big Data Is Messy

  11. Data Now Looks Like This

  12. And This

  13. And This

  14. Doesn’t Fit Neatly into a “Spreadsheet”
      • 90% of the world’s data was created in the last two years
      • 80% of enterprise data is unstructured
      • Unstructured data growing 2X faster than structured

  15. Back in 1970 … Cars Were Great!

  16. So Were Computers!

  17. Lots of Great Innovations Since 1970

  18. New Tools for New Data

  19. Innovation As Iteration

  20. “I have not failed. I've just found 10,000 ways that won't work.” ― Thomas A. Edison

  21. Must Be Open Source

  22. Must Not Require Big Upfront Payment

  23. Must Not Penalize Success: “Clients can also opt to run zEC12 without a raised datacenter floor -- a first for high-end IBM mainframes.” - IBM Press Release, 28 Aug, 2012

  24. Must Not Impede Iteration
      [Diagram: schema with fields Name, Pet, Phone, Email, annotated “New Table” and “New Column”; caption: “3 months later …”]
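
The point of this slide can be illustrated with a small sketch. It is not taken from the deck: the table and field names are hypothetical, SQLite stands in for a relational database, and plain Python dictionaries stand in for documents (no MongoDB API is used).

```python
import sqlite3

# Relational side: the schema is declared up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, phone TEXT)")
conn.execute("INSERT INTO contacts VALUES ('Ada', '555-0100')")

# Three months later a new attribute appears: the schema must change first.
conn.execute("ALTER TABLE contacts ADD COLUMN email TEXT")
conn.execute(
    "INSERT INTO contacts VALUES ('Grace', '555-0101', 'grace@example.com')"
)
rows = conn.execute("SELECT name, email FROM contacts ORDER BY name").fetchall()

# Document side: each record carries its own fields, so a new field
# needs no migration and older records are simply left without it.
contacts = [{"name": "Ada", "phone": "555-0100"}]
contacts.append(
    {"name": "Grace", "phone": "555-0101", "email": "grace@example.com", "pet": "Rex"}
)
emails = [c.get("email") for c in contacts]
```

In the relational half, every new attribute is a schema change that has to land before the data does; in the document half, the new record is just written with the extra fields.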

  25. Must Be Based on Industry Standards
      DB-Engines.com Database Ranking

      Ranking  Database          Type         Score    Changes
      1        Oracle            Relational   1491.8   -8.43
      2        MySQL             Relational   1290.21  1.83
      3        Microsoft SQL     Relational   1205.28  -8.99
      4        PostgreSQL        Relational   235.06   4.61
      5        MongoDB           Document     199.99   4.81
      6        DB2               Relational   187.32   -1.14
      7        Microsoft Access  Relational   146.48   -6.4
      8        SQLite            Relational   92.98    -0.03
      9        Sybase            Relational   81.55    -6.33
      10       Cassandra         Wide Column  78.09    -2.23

  26. Must Be Easy to Find Skills

  27. Must Be Easy to Learn/Use: “Organizations already have people who know their own data better than mystical data scientists … Learning Hadoop [or MongoDB] is easier than learning the company’s business.” (Gartner, 2012)

  28. When To Use Hadoop, Modern Databases

  29. Enterprise Big Data Stack
      • Applications: CRM, ERP, Collaboration, Mobile, BI
      • Data Management: Online Data and Offline Data (components shown: RDBMS, Hadoop, EDW, RDBMS)
      • Infrastructure: OS & Virtualization, Compute, Storage, Network
      • Across all layers: Management & Monitoring, Security & Auditing

  30. Consideration – Online vs. Offline

      Online               Offline
      • Real-time          • Long-running
      • Low-latency        • High-latency
      • High availability  • Availability is lower priority

  31. Hadoop Is Good for …
      • Recommendation Engine
      • Risk Modeling
      • Churn Analysis
      • Transaction Analysis
      • Trade Surveillance
      • Ad Targeting
      • Network Failure Prediction
      • Search Quality
      • Data Lake

  32. MongoDB/NoSQL Is Good for …
      • 360° View of the Customer
      • Mobile & Social Apps
      • Fraud Detection
      • Content Management & Delivery
      • User Data Management
      • Reference Data
      • Machine to Machine Apps
      • Product Catalogs
      • Data Hub

  33. How To Use The Two Together?

  34. Finding Waldo

  35. Predictive Analytics: MongoDB + Hadoop
      Government
      • Predictive analytics system for crime, health issues
      • Diverse, unstructured (incl. geospatial) data from 30+ agencies
      • Correlate data in real-time
      Algorithms
      • Long-form trend analysis
      • MongoDB data dumped into Hadoop, analyzed, re-inserted into MongoDB for better real-time response
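
The dump, analyze, re-insert loop described on this slide can be sketched in a few lines. This is an in-memory illustration only: it makes no MongoDB or Hadoop calls, and the event fields, function names, and summary key format are all hypothetical.

```python
from collections import Counter

# Stand-in for the online store: a list of event documents (hypothetical fields).
events = [
    {"agency": "police", "district": "north", "type": "theft"},
    {"agency": "health", "district": "north", "type": "flu"},
    {"agency": "police", "district": "south", "type": "theft"},
    {"agency": "police", "district": "north", "type": "theft"},
]

def export_batch(store):
    """Step 1: dump a snapshot of the online data for offline processing."""
    return [dict(doc) for doc in store]

def analyze(batch):
    """Step 2: offline batch job, here a map/reduce-style count per (district, type)."""
    mapped = ((doc["district"], doc["type"]) for doc in batch)  # map phase
    return Counter(mapped)                                      # reduce phase

def reinsert(counts):
    """Step 3: write results back as small summary documents keyed for fast lookup."""
    return {
        f"{district}:{kind}": {"district": district, "type": kind, "count": n}
        for (district, kind), n in counts.items()
    }

summaries = reinsert(analyze(export_batch(events)))

# Online, low-latency read against the precomputed summary.
north_thefts = summaries["north:theft"]["count"]
```

The design idea is that the expensive aggregation runs offline, while the online side only performs a single keyed lookup against the precomputed result.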

  36. Machine Learning: MongoDB Connector for Hadoop
      Ad-Serving
      • Catalogs and products
      • User profiles
      • Clicks
      • Views
      • Transactions
      Algorithms
      • User segmentation
      • Recommendation engine
      • Prediction engine

  37. Remember …
      • Modern data is messy
      • Your data infrastructure must support iteration
      • Modern data infrastructure market is crowded
        – But clear winners are distinguishing themselves
        – Bet on general purpose over niche, popular over obscure, open source over proprietary
      • Use MongoDB + Hadoop together

  38. @mjasay Don’t believe me? MongoDB booth on Floor 3
