Introduction to Data Stream Mining, Albert Bifet, March 2012

SLIDE 1

Introduction to Data Stream Mining

Albert Bifet March 2012

SLIDE 2

Motivation

Source: IDC’s Digital Universe Study (EMC), June 2011

Data is growing

SLIDE 3

Motivation

Memory unit        Size     Binary size
kilobyte (kB/KB)   10^3     2^10
megabyte (MB)      10^6     2^20
gigabyte (GB)      10^9     2^30
terabyte (TB)      10^12    2^40
petabyte (PB)      10^15    2^50
exabyte (EB)       10^18    2^60
zettabyte (ZB)     10^21    2^70
yottabyte (YB)     10^24    2^80

Data is growing

SLIDE 7

Streaming Data

Big Data & Real Time

SLIDE 8

Big Data

McKinsey Global Institute (MGI) Report on Big Data, 2011.

Big data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.

SLIDE 10

Methodology

Sampling and distributed systems
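
One common way to sample from a stream that is too large to store is reservoir sampling. The slides do not name a specific sampling method, so the sketch below is an illustrative assumption rather than the author's approach; the function name reservoir_sample and its parameters are hypothetical. It keeps a uniform random sample of k items in a single pass, using memory proportional to k regardless of stream length.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return a uniform random sample of up to k items from a stream.

    Hypothetical helper for illustration (classic reservoir sampling,
    often called Algorithm R). The stream is read once and never stored.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new item survives with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Sample 5 readings from a simulated stream of one million values.
    readings = (x * x % 997 for x in range(1_000_000))
    print(reservoir_sample(readings, k=5, rng=random.Random(42)))
```

Passing a seeded random.Random makes a run reproducible; if the stream has fewer than k items, the function simply returns all of them.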

SLIDE 11

Methodology

Paolo Boldi: "Big Data does not need big machines, it needs big intelligence."

SLIDE 12

Real time analytics

We want to analyze what is happening now.

SLIDE 14

Time and Memory

Number 8 Wire Mentality

Time and memory are the resource dimensions of the process.

SLIDE 16

Algorithms

Classification, Regression, Clustering, Frequent Pattern Mining.

SLIDE 17

Applications

◮ sensor data: industry, cities
◮ telecom data
◮ social networks: Twitter, Facebook, Yahoo
◮ marketing: sales business

Data may come from: humans, sensors, or machines.

SLIDE 18

Data Streams

Big Data & Real Time