Google Cloud for Data Crunchers
Patrick Chanezon, Developer Advocate, Cloud @chanezon, chanezon@google.com Ryan Boyd, Developer Advocate, Apps @ryguyrg, rboyd@google.com Kirrily Robert, Data Engineer, Freebase.com @skud, skud@google.com
Developer Day Google
2010
Agenda
Google App Engine
What is cloud computing?
Cloud Computing Defined
IaaS PaaS SaaS
Source: Gartner AADI Summit Dec 2009
Google's Cloud Offerings
IaaS PaaS SaaS
Google Storage, Prediction API, BigQuery
Google App Engine
Google Apps Marketplace, Your Apps
Google App Engine
Cloud development in a box
App Engine Services
Blobstore, Images, Mail, XMPP, Task Queue, Memcache, Datastore, URL Fetch, User Service
Always free to get started
~5M pageviews/month
Purchase additional resources *
* free monthly quota of ~5 million page views still in full effect
Google App Engine for Business
Same scalable cloud hosting platform. Designed for the enterprise.
– Centralized domain console
– 99.9% Service Level Agreement
– Premium Developer Support
– Managed relational SQL database in the cloud
– Including "naked" domain support
– Integrated Single Sign On (SSO)
– Pay only for what you use
* Hosted SQL and SSL on your domain available later this year
App Engine for Data Crunchers
Mapper API
engine/
Channel API
io-2010.html
browse_thread/thread/6fa09953ffae2cd3/c1db7de5fdb82b65?pli=1#
Matcher API
stream of documents
AppEngineMatcherService
matcher-sample
Google Storage for Developers
Store your data in Google's cloud
What Is Google Storage?
Google Storage Technical Details
RESTful API
http://commondatastorage.googleapis.com/bucket/object
Buckets
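The bucket/object addressing scheme above can be sketched in Python. This is a minimal illustration under one assumption: object names are percent-encoded with `urllib.parse.quote`, and `/` separators inside the object name are kept. The bucket name `appdata` comes from the gsutil example later in this deck; the helper name `object_url` is hypothetical.

```python
from urllib.parse import quote

BASE = "http://commondatastorage.googleapis.com"

def object_url(bucket: str, obj: str) -> str:
    """Build http://commondatastorage.googleapis.com/bucket/object for one object."""
    # Encode the bucket name fully. In the object name, keep "/" so that
    # names like "logs/2010/a.txt" stay readable (an encoding assumption).
    return f"{BASE}/{quote(bucket, safe='')}/{quote(obj, safe='/')}"

if __name__ == "__main__":
    print(object_url("appdata", "installs"))
```

Under this assumption, a name with a space such as `a b/c.txt` becomes `a%20b/c.txt`.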
Performance and Scalability
Object types and size
Replication
Consistency
Security and Privacy Features
Authenticated downloads from a web browser
Permissions set on Buckets or Objects
Tools
Google Storage Manager, gsutil
Google Storage Benefits
High Performance and Scalability: backed by Google infrastructure
Strong Security and Privacy: control access to your data
Easy to Use: get started fast with Google & 3rd party tools
Some Early Google Storage Adopters
Google Storage usage within Google
Haiti Relief Imagery, USPTO data, Partner Reporting, Google BigQuery, Google Prediction API
Google Storage - Availability
Limited preview in US* currently
* Non-US preview available on case-by-case basis
Google Prediction API
Google's prediction engine in the cloud
Introducing the Google Prediction API
A virtually endless number of applications...
Customer Sentiment, Transaction Risk, Species Identification, Message Routing, Legal Docket Classification, Suspicious Activity, Work Roster Assignment, Recommend Products, Political Bias, Uplift Marketing, Email Filtering, Diagnostics, Inappropriate Content, Career Counseling, Churn Prediction ... and many more ...
How does it work?
Training data (label, text):
"english" The quick brown fox jumped over the lazy dog.
"english" To err is human, but to really foul things up you need a computer.
"spanish" No hay mal que por bien no venga.
"spanish" La tercera es la vencida.
New data (label unknown):
? To be or not to be, that is the question.
? La fe mueve montañas.
The Prediction API finds relevant features in the sample data during training.
The Prediction API later searches for those features during prediction.
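To make the training/prediction split concrete, here is a toy word-overlap classifier trained on the four labeled sentences above. It is not the Prediction API's algorithm; the slides do not say how that works internally. It only illustrates "find features during training, look for them during prediction", and it assumes lowercase words as the features.

```python
import re
from collections import Counter, defaultdict

# The labeled training sentences from the slide.
EXAMPLES = [
    ("english", "The quick brown fox jumped over the lazy dog."),
    ("english", "To err is human, but to really foul things up you need a computer."),
    ("spanish", "No hay mal que por bien no venga."),
    ("spanish", "La tercera es la vencida."),
]

def words(text):
    """Features: lowercase words (an illustrative choice, not the API's)."""
    return re.findall(r"[a-zñáéíóúü]+", text.lower())

def train(examples):
    """'Training': record which words occur under each label."""
    model = defaultdict(Counter)
    for label, text in examples:
        model[label].update(words(text))
    return model

def predict(model, text):
    """'Prediction': choose the label sharing the most distinct words with text."""
    feats = set(words(text))
    scores = {label: sum(1 for w in feats if counts[w] > 0)
              for label, counts in model.items()}
    return max(scores, key=scores.get)

if __name__ == "__main__":
    model = train(EXAMPLES)
    print(predict(model, "To be or not to be, that is the question."))  # english
    print(predict(model, "La fe mueve montañas."))  # spanish
```

The first query shares "to", "is" and "the" with the English examples. The second shares only "la" with the Spanish ones, which is enough here. A real model needs far more training data than this.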
Introducing the Google Prediction API
A Prediction API Example
Automatically determine application recommendations appropriate for a new customer visiting the site, based on the apps installed by customers using Google Apps around the world
Using the Prediction API
A simple three-step process:
1. Upload your training data to Google Storage
2. Build a model from your data
3. Make new predictions
Step 1: Upload
Upload your training data to Google Storage. Each line is one example: the first column is the label to predict (here, the installed app) and the remaining columns are features.
"SlideRocket","EDUCATION","us","en","10","5"
"MailChimp","BUSINESS","us","en","7","0"
"MailChimp","STANDARD","se","sv","1","0"
"Smartsheet","BUSINESS","us","en","13","4"
Upload to Google Storage:
gsutil cp installs gs://appdata/
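A sketch of producing a training file in the format shown above with Python's standard `csv` module. Every field is quoted, the label comes first, and the rows are the sample rows from the slide. The helper name `to_training_csv` is hypothetical. Saving the output as `installs` would match the gsutil command.

```python
import csv
import io

# Sample rows from the slide: label (installed app) first, then features.
ROWS = [
    ["SlideRocket", "EDUCATION", "us", "en", "10", "5"],
    ["MailChimp", "BUSINESS", "us", "en", "7", "0"],
    ["MailChimp", "STANDARD", "se", "sv", "1", "0"],
    ["Smartsheet", "BUSINESS", "us", "en", "13", "4"],
]

def to_training_csv(rows):
    """Render rows as CSV text with every field quoted, one example per line."""
    buf = io.StringIO()
    # QUOTE_ALL reproduces the fully quoted style of the sample data.
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()

if __name__ == "__main__":
    print(to_training_csv(ROWS), end="")
```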
Step 2: Train
Create a new model by training on data.
To train a model:
POST prediction/v1.1/training?data=appdata%2Finstalls
Training runs asynchronously. To see if it has finished:
GET prediction/v1.1/training/appdata%2Finstalls
{"data":{"data":"appdata/installs","modelinfo":"estimated accuracy: 0.xx"}}
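The `appdata%2Finstalls` segment above is the Google Storage path `appdata/installs` with its slash percent-encoded. This sketch builds both training paths that way and reads `modelinfo` out of a status reply. The host name is not shown on the slide, so only the relative paths are built. The helper names are hypothetical.

```python
import json
from urllib.parse import quote

def training_paths(data_id):
    """Return the (POST, GET) paths for training on the Storage object data_id."""
    encoded = quote(data_id, safe="")  # "appdata/installs" -> "appdata%2Finstalls"
    return (f"prediction/v1.1/training?data={encoded}",
            f"prediction/v1.1/training/{encoded}")

def model_info(reply_body):
    """Extract the modelinfo string from a training-status reply."""
    return json.loads(reply_body)["data"]["modelinfo"]

if __name__ == "__main__":
    post_path, get_path = training_paths("appdata/installs")
    print(post_path)
    print(get_path)
    print(model_info('{"data":{"data":"appdata/installs",'
                     '"modelinfo":"estimated accuracy: 0.xx"}}'))
```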
Step 3: Predict
Apply the trained model to make predictions on new data:
POST prediction/v1.1/query/appdata%2Finstalls/predict
{"data":{"input":{"mixture":["EDUCATION","us","en","10","0"]}}}
Response:
{"data":{"kind":"prediction#output","outputLabel":"Manymoon","outputMulti":[{"label":"OffiSync","score":x.xx},{"label":"Zoho CRM","score":x.xx},{"label":"MailChimp","score":x.xx}]}}
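This sketch builds the prediction request shown above and ranks the `outputMulti` entries of a reply by score. The slide elides the real scores as `x.xx`, so the scores in `EXAMPLE_REPLY` are made up purely to exercise the code. The helper names are also hypothetical.

```python
import json
from urllib.parse import quote

def predict_request(data_id, features):
    """Return (path, JSON body) for a v1.1 "mixture" query, as on the slide."""
    path = f"prediction/v1.1/query/{quote(data_id, safe='')}/predict"
    body = json.dumps({"data": {"input": {"mixture": features}}})
    return path, body

def ranked_labels(reply):
    """Return (label, score) pairs from outputMulti, highest score first."""
    pairs = [(e["label"], float(e["score"])) for e in reply["data"]["outputMulti"]]
    return sorted(pairs, key=lambda p: p[1], reverse=True)

# Hypothetical reply: the scores are invented, since the slide shows x.xx.
EXAMPLE_REPLY = {
    "data": {
        "kind": "prediction#output",
        "outputMulti": [
            {"label": "OffiSync", "score": 0.2},
            {"label": "Zoho CRM", "score": 0.5},
            {"label": "MailChimp", "score": 0.3},
        ],
    }
}

if __name__ == "__main__":
    path, body = predict_request("appdata/installs", ["EDUCATION", "us", "en", "10", "0"])
    print(path)
    print(body)
    print(ranked_labels(EXAMPLE_REPLY))
```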
Demo Screenshots
Predicting apps for a 501-1,000 seat educational institution
Demo Screenshots
Predicting apps for a small business
Prediction API Capabilities
Data
continuous values
Training
Access from many platforms:
Prediction API - Pricing
Free Quota in trial/development
Paid Usage
Google BigQuery
Interactive analysis of large datasets in Google's cloud
Introducing Google BigQuery
– Google's large-data ad hoc analysis technology
– Simple SQL-like query language
– Flexible access
Why BigQuery?
Working with large data is a challenge
Many Use Cases ...
Spam, Trends Detection, Web Dashboards, Network Optimization, Interactive Tools
Key Capabilities of BigQuery
Using BigQuery
Another simple three-step process:
1. Upload your raw data to Google Storage
2. Import raw data into a BigQuery table
3. Perform SQL queries
Writing Queries
Compact subset of SQL
WHERE ... GROUP BY ... ORDER BY ... LIMIT ...;
Common functions
Additional statistical approximations
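The clause order above can be captured by a small helper that assembles a query string. This is only a sketch. It assumes the usual `SELECT ... FROM ...` prefix, which the slide does not show. The table and column names in the usage are hypothetical. The helper does no escaping, so it should only receive trusted fragments.

```python
from typing import Optional

def build_query(select: str, table: str, where: Optional[str] = None,
                group_by: Optional[str] = None, order_by: Optional[str] = None,
                limit: Optional[int] = None) -> str:
    """Join clauses in the order listed on the slide, skipping omitted ones."""
    parts = [f"SELECT {select}", f"FROM {table}"]
    if where:
        parts.append(f"WHERE {where}")
    if group_by:
        parts.append(f"GROUP BY {group_by}")
    if order_by:
        parts.append(f"ORDER BY {order_by}")
    if limit is not None:
        parts.append(f"LIMIT {limit}")
    return " ".join(parts) + ";"

if __name__ == "__main__":
    # Hypothetical table "revisions" with a "title" column.
    print(build_query("title, COUNT(*)", "revisions", group_by="title",
                      order_by="COUNT(*) DESC", limit=10))
```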
BigQuery via REST
GET /bigquery/v1/tables/{table name}
GET /bigquery/v1/query?q={query}
Sample JSON Reply:
{ "results": { "fields": { [ {"id":"COUNT(*)","type":"uint64"}, ... ] }, "rows": [ {"f":[{"v":"2949"}, ...]}, {"f":[{"v":"5387"}, ...]}, ... ] } }
Also supports JSON-RPC
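A sketch of the client side of the REST call above: URL-encode the query into `q`, then flatten the `rows`/`f`/`v` structure of the reply. As printed on the slide, the sample's `fields` entry is not well-formed JSON. This sketch therefore assumes `fields` is a plain list of `{"id", "type"}` objects. Only the relative path is built, because the host is not given. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode

def query_path(query):
    """Build the GET path for /bigquery/v1/query with the query encoded in q."""
    return "/bigquery/v1/query?" + urlencode({"q": query})

def parse_rows(reply_body):
    """Return (column ids, row values) from a reply shaped like the sample.

    Assumes "fields" is a JSON list (see the lead-in).
    """
    results = json.loads(reply_body)["results"]
    columns = [field["id"] for field in results["fields"]]
    rows = [[cell["v"] for cell in row["f"]] for row in results["rows"]]
    return columns, rows

if __name__ == "__main__":
    print(query_path("SELECT COUNT(*) FROM t"))
    sample = ('{"results": {"fields": [{"id": "COUNT(*)", "type": "uint64"}],'
              ' "rows": [{"f": [{"v": "2949"}]}, {"f": [{"v": "5387"}]}]}}')
    print(parse_rows(sample))
```

Values come back as strings, as in the sample. Converting them using each field's `type` would be the caller's job.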
Security and Privacy
Standard Google Authentication
HTTPS support
Relies on Google Storage to manage access
Large Data Analysis Example
Wikimedia Revision History
Wikimedia revision history data from: http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-meta-history.xml.7z
Using BigQuery Shell
Python DB API 2.0 + B. Clapper's sqlcmd: http://www.clapper.org/software/python/sqlcmd/
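The shell relies on the Python DB API 2.0 interface. The BigQuery driver itself is not shown here, so this sketch uses the standard-library `sqlite3` module as a stand-in. `sqlite3` implements the same connect/cursor/execute/fetchall pattern. The table and its rows are made up for illustration; only the calling pattern is meant to carry over.

```python
import sqlite3

def top_titles(conn, limit):
    """Run a GROUP BY query through the DB-API cursor interface."""
    cur = conn.cursor()
    cur.execute(
        "SELECT title, COUNT(*) AS edits FROM revisions "
        "GROUP BY title ORDER BY edits DESC, title LIMIT ?",
        (limit,),
    )
    rows = cur.fetchall()
    cur.close()
    return rows

def make_demo_db():
    """In-memory stand-in table; the rows are invented for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE revisions (title TEXT, contributor TEXT)")
    conn.executemany(
        "INSERT INTO revisions VALUES (?, ?)",
        [("Alpha", "u1"), ("Alpha", "u2"), ("Beta", "u1")],
    )
    conn.commit()
    return conn

if __name__ == "__main__":
    print(top_titles(make_demo_db(), 2))  # [('Alpha', 2), ('Beta', 1)]
```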
BigQuery from a Spreadsheet
Google Fusion Tables
Google Visualization API
Example: Weather data
Google Refine
Recap
More information
http://code.google.com/apis/
http://code.google.com/more/table/