Big Data Processing Techniques
Chentao Wu, Associate Professor
Dept. of Computer Science and Engineering
wuct@cs.sjtu.edu.cn
Schedule
lec1: Introduction on big data and cloud computing
lec2: Introduction on data storage
lec3: Data
Contents
Unstructured Quasi-Structured Semi-Structured Structured
different types of files.
effort and software tools
analysis
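As a rough illustration of the four structure categories listed above, the sketch below handles one invented sample of each in Python. The sample records and the function name `describe_samples` are hypothetical and not taken from the slides; the only point is that the effort needed to recover fields grows as structure decreases.

```python
import csv
import io
import json
from urllib.parse import urlparse, parse_qs


def describe_samples():
    """Parse one made-up sample per category and return what was recovered."""
    # Structured: fixed schema, each row maps directly onto named columns.
    structured = "id,name,amount\n1,alice,9.50\n"
    rows = list(csv.DictReader(io.StringIO(structured)))

    # Semi-structured: self-describing (JSON here), fields may vary per record.
    semi = '{"id": 2, "tags": ["a", "b"]}'
    doc = json.loads(semi)

    # Quasi-structured: textual data with an erratic format, such as a
    # clickstream URL, that needs a dedicated tool to pull fields out.
    quasi = "https://example.com/search?q=big+data&page=2"
    query = parse_qs(urlparse(quasi).query)

    # Unstructured: free text with no inherent fields; only crude
    # tokenization is possible without further analysis.
    text = "The delivery was late but the support team was helpful."
    words = text.lower().rstrip(".").split()

    return {
        "structured": rows[0]["amount"],
        "semi": doc["tags"],
        "quasi": query["q"][0],
        "unstructured": len(words),
    }
```

Each branch uses only the Python standard library; a real pipeline would replace the inline strings with files or streams.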
Increasing Growth
Exponential increase in collected/generated data
To extract knowledge, all these types of data need to be linked together
you like send promotions right now for store next to you
any abnormal measurements require immediate reaction
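The remark about abnormal measurements needing an immediate reaction can be sketched as a check applied to each reading as it arrives, instead of in a later batch job. The threshold values, the sample readings, and the names `monitor` and `on_alert` are assumptions made for illustration only.

```python
from typing import Callable, Iterable, List, Tuple


def monitor(
    readings: Iterable[float],
    low: float,
    high: float,
    on_alert: Callable[[int, float], None],
) -> int:
    """Inspect readings one at a time and react as soon as one is out of range.

    Returns the number of alerts raised.
    """
    alert_count = 0
    for index, value in enumerate(readings):
        if value < low or value > high:
            # React immediately; do not wait for the stream to finish.
            on_alert(index, value)
            alert_count += 1
    return alert_count


# Hypothetical sensor values; the bounds are placeholders, not medical advice.
alerts: List[Tuple[int, float]] = []
raised = monitor(
    iter([72.0, 75.0, 190.0, 74.0, 38.0]),
    low=40.0,
    high=180.0,
    on_alert=lambda i, v: alerts.append((i, v)),
)
```

Because `monitor` consumes an iterator, it works the same way on an unbounded stream as on the short list used here.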
Volume
storage and analysis
Velocity
data
real-time analysis
Variety
from numerous sources
integration, and analysis
Variability
changing meaning of data
gathering and interpretation
Veracity
and reliability of data
transforming and trusting data
Value
effectiveness and business value
Social media and networks (all of us are generating data)
Scientific instruments (collecting all sorts of data)
Mobile devices (tracking all objects all the time)
Sensor technology and networks (measuring all kinds of data)
knowledge from the collected data in a timely manner and in a scalable fashion
Old Model: Few companies are generating data, all others are consuming data
New Model: All of us are generating data, and all of us are consuming data
than traditional DW applications
Exadata, Teradata) are not well-suited for big data apps
processing, scale out architectures are well-suited for big data apps
data
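The scale-out idea mentioned above can be sketched as splitting the data into partitions, processing each partition independently, and merging the partial results. This is a minimal sketch that uses threads on one machine as a stand-in for separate nodes; that substitution, the worker count, and all function names are assumptions, not something the slides specify.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(items, n):
    """Split items into n roughly equal partitions (round-robin)."""
    return [items[i::n] for i in range(n)]


def count_words(lines):
    """Work done independently on one partition."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def scale_out_count(lines, workers=3):
    """Process partitions in parallel, then merge the partial counts."""
    parts = partition(lines, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, parts))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


result = scale_out_count(["big data", "big cloud", "data data"])
```

Adding capacity in this model means adding workers and partitions, not buying a larger single machine.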
Contents
A model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., servers, storage, networks, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.
– U.S. National Institute of Standards and Technology, Special Publication 800-145
Cloud Computing
[Figure: cloud infrastructure (applications, platform software, network, compute, storage) accessed over LAN/WAN from laptop, tablet and mobile, and desktop clients]
1) On-demand self-service
2) Broad Network Access
3) Resource Pooling
4) Rapid Elasticity
5) Measured Service
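Three of these characteristics (resource pooling, rapid elasticity, measured service) can be made concrete with a toy model. The class below is a hypothetical sketch, not any provider's API: consumers draw units from one shared pool, grow or shrink their share on demand, and are metered by unit-hours held.

```python
class ResourcePool:
    """Toy shared pool with on-demand provisioning and usage metering."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.allocated = {}  # consumer -> units currently held
        self.usage = {}      # consumer -> unit-hours consumed so far

    def free(self):
        """Units not currently held by any consumer."""
        return self.capacity - sum(self.allocated.values())

    def provision(self, consumer, units):
        """Self-service request: succeeds if the shared pool has room."""
        if units > self.free():
            raise RuntimeError("pool exhausted")
        self.allocated[consumer] = self.allocated.get(consumer, 0) + units

    def release(self, consumer, units):
        """Return units to the pool so other consumers can use them."""
        held = self.allocated.get(consumer, 0)
        if units > held:
            raise ValueError("cannot release more than is held")
        self.allocated[consumer] = held - units

    def tick(self, hours=1):
        """Measured service: charge each consumer for what it holds."""
        for consumer, units in self.allocated.items():
            self.usage[consumer] = self.usage.get(consumer, 0) + units * hours


pool = ResourcePool(capacity=10)
pool.provision("P", 4)
pool.provision("Q", 2)
pool.tick()
pool.provision("P", 2)   # scale out under load
pool.tick()
pool.release("P", 5)     # scale back in
pool.tick()
```

In this run consumer P is billed only for what it held during each hour, which is the pay-per-use behavior that measured service implies.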
1) Infrastructure as a Service (IaaS)
2) Platform as a Service (PaaS)
3) Software as a Service (SaaS)
The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, and deployed applications; and possibly limited control of select networking components (e.g., host firewalls).
– U.S. National Institute of Standards and Technology, Special Publication 800-145
Infrastructure as a Service
[Figure: cloud infrastructure divided into provider’s resources and consumer’s resources]
The capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages, libraries, services, and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, or storage, but has control over the deployed applications and possibly configuration settings for the application-hosting environment.
– U.S. National Institute of Standards and Technology, Special Publication 800-145
Platform as a Service
[Figure: cloud infrastructure labeled as provider’s resources]
The capability provided to the consumer is to use the provider’s applications running on a cloud infrastructure. The applications are accessible from various client devices through either a thin client interface, such as a web browser (e.g., web-based email), or a program interface. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.
– U.S. National Institute of Standards and Technology, Special Publication 800-145
Software as a Service
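The three NIST definitions differ mainly in which layers the consumer controls. The lookup below is one hedged way to encode that split; the layer names follow the wording of the definitions, while the dictionary, the function name `who_controls`, and the simplification of ignoring the "possibly limited" exceptions (select networking components, hosting-environment settings, user-specific configuration) are assumptions of this sketch.

```python
# Layers named in the NIST SP 800-145 service model definitions.
LAYERS = ("network", "servers", "operating systems", "storage", "applications")

# Layers the consumer controls under each model; everything else is the
# provider's. The "possibly limited" exceptions in the definitions are
# deliberately left out to keep the table simple.
CONSUMER_CONTROLS = {
    "IaaS": {"operating systems", "storage", "applications"},
    "PaaS": {"applications"},
    "SaaS": set(),
}


def who_controls(model, layer):
    """Return 'consumer' or 'provider' for a layer under a service model."""
    if model not in CONSUMER_CONTROLS:
        raise KeyError(f"unknown service model: {model}")
    if layer not in LAYERS:
        raise KeyError(f"unknown layer: {layer}")
    return "consumer" if layer in CONSUMER_CONTROLS[model] else "provider"


def responsibility_table():
    """Build a model -> {layer: owner} mapping for all three models."""
    return {
        model: {layer: who_controls(model, layer) for layer in LAYERS}
        for model in CONSUMER_CONTROLS
    }
```

For example, `who_controls("PaaS", "operating systems")` returns `"provider"`, matching the PaaS definition's statement that the consumer does not manage operating systems.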
1) Public Cloud
2) Private Cloud
3) Community Cloud
4) Hybrid Cloud
[Figure, public cloud: Enterprise P, Enterprise Q, and Individual R; cloud provider’s resources]
[Figure, private cloud: 1) On-premise private cloud: Enterprise P, resources of Enterprise P. 2) Externally-hosted private cloud: Enterprise P, cloud provider’s resources dedicated for Enterprise P]
[Figure, community cloud: Enterprise P, Enterprise Q, and Enterprise R as community users; resources of Enterprise P and Enterprise Q; cloud provider’s resources dedicated for the community]
[Figure, hybrid cloud: Enterprise P with resources of Enterprise P; Enterprise Q and Individual R; cloud provider’s resources]
Contents
2003 2004 2006