Introduction to Data Stream Mining
Albert Bifet March 2012
Motivation
Data is growing
Source: IDC’s Digital Universe Study (EMC), June 2011
Memory unit        Size     Binary size
kilobyte (kB/KB)   10^3     2^10
megabyte (MB)      10^6     2^20
gigabyte (GB)      10^9     2^30
terabyte (TB)      10^12    2^40
petabyte (PB)      10^15    2^50
exabyte (EB)       10^18    2^60
zettabyte (ZB)     10^21    2^70
yottabyte (YB)     10^24    2^80
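The two size columns in the unit table above drift apart as the prefixes grow. A minimal sketch of that arithmetic follows; the names PREFIXES and unit_sizes are illustrative and do not come from the slides.

```python
# Decimal (SI) versus binary reading of each prefix in the unit table.
# PREFIXES and unit_sizes are hypothetical names used only for this sketch.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]


def unit_sizes():
    """Return (prefix, decimal_bytes, binary_bytes) for each prefix."""
    return [
        (prefix, 10 ** (3 * k), 2 ** (10 * k))
        for k, prefix in enumerate(PREFIXES, start=1)
    ]


for prefix, decimal_bytes, binary_bytes in unit_sizes():
    # Relative gap between the binary and the decimal size, in percent.
    gap = (binary_bytes - decimal_bytes) / decimal_bytes * 100
    print(f"{prefix}byte: decimal={decimal_bytes:.0e} "
          f"binary={binary_bytes:.3e} gap={gap:.1f}%")
```

The gap is roughly 2.4% for the kilobyte and grows to roughly 21% for the yottabyte, so the choice of convention matters more at the scales discussed here.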
McKinsey Global Institute (MGI) Report on Big Data, 2011.
Big data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.
Sampling and distributed systems
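The slide does not say which sampling scheme is meant. As one possible illustration, reservoir sampling keeps a uniform random sample of fixed size k from a stream of unknown length using only O(k) memory. The sketch below assumes that technique; the function name reservoir_sample and its parameters are hypothetical.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of up to k items from an iterable.

    Assumption: this is classic reservoir sampling (Algorithm R), chosen
    here for illustration; the slides do not name a specific method.
    """
    rng = rng or random.Random()
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # both endpoints inclusive
            if j < k:
                sample[j] = item
    return sample


# Usage: sample 10 readings from a simulated stream of 1000 items.
print(reservoir_sample(range(1000), 10, random.Random(0)))
```

Each item is seen once and then discarded unless it lands in the reservoir, which is what makes the approach suitable when the full data set cannot be stored.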
Paolo Boldi: "Big Data does not need big machines, it needs big intelligence."
We want to analyze what is happening now.
Number 8 Wire Mentality
Time and memory are the resource dimensions of the process.
Classification, Regression, Clustering, Frequent Pattern Mining.
◮ sensor data: industry, cities
◮ telecom data
◮ social networks: Twitter, Facebook, Yahoo
◮ marketing: sales business
Data may come from: humans, sensors, or machines.