Matrix Sketching over Sliding Windows
Zhewei Wei 1, Xuancheng Liu 1, Feifei Li 2, Shuo Shang 1, Xiaoyong Du 1, Ji-Rong Wen 1
1 School of Information, Renmin University of China 2 School of Computing, The University of Utah
Matrix data
Data               Rows            Columns          d               n
Textual            Documents       Words            10^5 – 10^7     > 10^10
Actions            Users           Types            10^1 – 10^4     > 10^7
Visual             Images          Pixels, SIFT     10^5 – 10^6     > 10^8
Audio              Songs, tracks   Frequencies      10^5 – 10^6     > 10^8
Machine Learning   Examples        Features         10^2 – 10^4     > 10^6
Financial          Prices          Items, Stocks    10^3 – 10^5     > 10^6
Singular value decomposition: $A = U \Sigma V^T$, where $A$ is the $n \times d$ data matrix and $\Sigma$ is the diagonal matrix of singular values.
Covariance matrix: $A^T A = V \Sigma^2 V^T$, where $\Sigma^2$ holds the squared singular values on its diagonal.
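The relation between the SVD and the covariance matrix can be checked numerically. The following is a minimal sketch using NumPy; the matrix sizes and the random data are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 4))            # n = 50 rows, d = 4 columns (illustrative sizes)

# Thin SVD: A = U @ diag(s) @ Vt, with s holding the singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

cov = A.T @ A                               # d x d covariance matrix A^T A
cov_from_svd = Vt.T @ np.diag(s**2) @ Vt    # V Sigma^2 V^T

# Both constructions should agree up to floating-point error.
matches = np.allclose(cov, cov_from_svd)
print(matches)
```

Because $A^T A$ depends on $A$ only through $V$ and $\Sigma$, a much smaller matrix with a similar $V \Sigma^2 V^T$ can stand in for $A$ in covariance-based analyses.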
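Matrix sketching: approximate a large matrix $A \in \mathbb{R}^{n \times d}$ with $B \in \mathbb{R}^{\ell \times d}$, $\ell \ll n$, in an online fashion.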
Covariance error [… Woodruff2016]: $\|A^T A - B^T B\| / \|A\|_F^2 \le \varepsilon$.
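The covariance error can be computed directly when both $A$ and $B$ fit in memory. The sketch below assumes the numerator is the spectral norm, which is the usual choice in the Frequent Directions literature but is not spelled out in the slide text; the function name `covariance_error` is hypothetical.

```python
import numpy as np

def covariance_error(A, B):
    """Relative covariance error ||A^T A - B^T B||_2 / ||A||_F^2.

    Assumption: the numerator uses the spectral (operator 2-) norm.
    """
    diff = A.T @ A - B.T @ B
    return np.linalg.norm(diff, 2) / np.linalg.norm(A, "fro") ** 2

rng = np.random.default_rng(1)
A = rng.standard_normal((100, 5))

err_exact = covariance_error(A, A)                  # a perfect "sketch" has zero error
err_empty = covariance_error(A, np.zeros((1, 5)))   # an all-zero sketch has error at most 1
print(err_exact, err_empty)
```

The all-zero sketch is bounded by 1 because the spectral norm of $A^T A$ never exceeds $\|A\|_F^2$, so a useful sketch needs an error well below that.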
projection [Papadimitriou2011], …
… $1/\varepsilon$, s.t. covariance error $\le \varepsilon$.
[Figure: $A$ ($n \times d$) with rows $a_i$; sketch $B$ ($\ell \times d$).]
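A sketch with about $1/\varepsilon$ rows and covariance error at most $\varepsilon$ is what Frequent Directions provides in the unbounded streaming model. The code below is a minimal illustration of the basic variant that runs one SVD per inserted row; it assumes $\ell \le d$, and the function name `frequent_directions` is hypothetical. It is not the sliding-window algorithm of this talk.

```python
import numpy as np

def frequent_directions(A, ell):
    """Basic Frequent Directions sketch with ell rows (assumes 1 <= ell <= d)."""
    n, d = A.shape
    if not 1 <= ell <= d:
        raise ValueError("this sketch assumes 1 <= ell <= d")
    B = np.zeros((ell, d))
    for row in A:
        # Invariant: the last row of B is zero, so the new row can be placed there.
        B[-1] = row
        _, s, Vt = np.linalg.svd(B, full_matrices=False)
        # Shrink every squared singular value by the smallest one.
        delta = s[-1] ** 2
        s_shrunk = np.sqrt(np.maximum(s**2 - delta, 0.0))
        # Rebuild B; its last row is zero again because s_shrunk[-1] == 0.
        B = s_shrunk[:, None] * Vt
    return B

rng = np.random.default_rng(2)
A = rng.standard_normal((300, 30))
ell = 10
B = frequent_directions(A, ell)

err = np.linalg.norm(A.T @ A - B.T @ B, 2) / np.linalg.norm(A, "fro") ** 2
print(err, 1 / ell)
```

Each shrink step removes $\ell \cdot \delta$ from the squared Frobenius norm of the sketch while adding at most $\delta$ to the spectral error, which is why the final error is at most $\|A\|_F^2 / \ell$.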
Covariance error: $\|A_W^T A_W - B_W^T B_W\| / \|A_W\|_F^2 \le \varepsilon$
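Sequence-based window: $A_W$ consists of the most recent $N$ rows.
Time-based window: $A_W$ consists of the rows that arrived in the most recent $\Delta$ time units.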
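The two window models differ only in how the active rows $A_W$ are selected. The following sketch shows both selections on a small in-memory stream; the helper names, the timestamps, and the half-open boundary convention for the time window are assumptions made for illustration.

```python
import numpy as np

def sequence_window(rows, N):
    """Sequence-based window: the most recent N rows (N >= 1)."""
    if N < 1:
        raise ValueError("N must be at least 1")
    return rows[-N:]

def time_window(rows, timestamps, now, delta):
    """Time-based window: rows with now - delta < timestamp <= now (assumed convention)."""
    timestamps = np.asarray(timestamps)
    mask = (timestamps > now - delta) & (timestamps <= now)
    return rows[mask]

rows = np.arange(12.0).reshape(6, 2)        # six rows, d = 2
timestamps = np.array([0, 1, 2, 5, 6, 9])   # arrival time of each row

A_seq = sequence_window(rows, N=3)                          # always exactly 3 rows
A_time = time_window(rows, timestamps, now=9, delta=4)      # only rows with time in (5, 9]
print(A_seq.shape, A_time.shape)
```

A sequence-based window always holds exactly $N$ rows once the stream is long enough, while the number of rows in a time-based window varies with the arrival rate.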
real-world applications.
sketching techniques are widely used.
anomalies [Papadimitriou2006, Qahtan2015].
Theorem 4.1 An algorithm that returns $A^T A$ for any sequence-based sliding window must use $\Omega(Nd)$ bits of space.
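The lower bound matches what the obvious exact approach already costs. The sketch below is an illustration, not part of the proof: it maintains $A_W^T A_W$ exactly for a sequence-based window by buffering the whole window, so it stores $N \cdot d$ numbers. The class name `ExactWindowCovariance` is hypothetical.

```python
from collections import deque
import numpy as np

class ExactWindowCovariance:
    """Exact A_W^T A_W over the last N rows; buffers the entire window."""

    def __init__(self, N, d):
        self.N = N
        self.window = deque()          # holds up to N rows of length d
        self.cov = np.zeros((d, d))

    def update(self, row):
        row = np.asarray(row, dtype=float)
        self.window.append(row)
        self.cov += np.outer(row, row)         # add the arriving row's contribution
        if len(self.window) > self.N:
            old = self.window.popleft()
            self.cov -= np.outer(old, old)     # remove the expired row's contribution

rng = np.random.default_rng(3)
stream = rng.standard_normal((20, 3))
tracker = ExactWindowCovariance(N=5, d=3)
for r in stream:
    tracker.update(r)

expected = stream[-5:].T @ stream[-5:]
print(np.allclose(tracker.cov, expected))
```

The expired row has to be available when it leaves the window, which is why the buffer cannot simply be dropped; the sliding-window sketches trade this exactness for smaller space.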
dimension $d$ is small.
techniques.
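Sketches   Update                                     Space                                                  Window            Interpretable?
Sampling   $(d/\varepsilon^2) \log\log NR$            $(d/\varepsilon^2) \log NR$                            Sequence & time   Yes
LM-FD      $d \log \varepsilon NR$                    $(1/\varepsilon^2) \log \varepsilon NR$                Sequence & time   No
DI-FD      $(d/\varepsilon) \log (R/\varepsilon)$     $(R/\varepsilon) \log (R/\varepsilon)$                 Sequence          No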
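To get a feel for how the space bounds in the table compare, they can be evaluated with all hidden constants dropped. This is only a rough illustration: the parameter values for $d$, $\varepsilon$ and $N$ are assumptions, the logarithm base is assumed to be $e$, and asymptotic bounds do not predict actual memory use.

```python
import math

def space_bounds(d, eps, N, R):
    """Space bounds from the table with constants dropped (illustrative only)."""
    return {
        "Sampling": d / eps**2 * math.log(N * R),
        "LM-FD": 1 / eps**2 * math.log(eps * N * R),
        "DI-FD": R / eps * math.log(R / eps),
    }

# Assumed parameters: d = 100, eps = 0.05, N = 10_000; R is varied.
small_R = space_bounds(d=100, eps=0.05, N=10_000, R=1)
large_R = space_bounds(d=100, eps=0.05, N=10_000, R=90089)

best_small = min(small_R, key=small_R.get)
best_large = min(large_R, key=large_R.get)
print(best_small, best_large)
```

Under these assumed parameters the DI-FD expression is the smallest when $R = 1$, while its linear dependence on $R$ makes the LM-FD expression the smallest when $R$ is large.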
[Datar2002] + Frequent Directions.
Directions.
Sketches   Update   Space                  Window            Interpretable?
Sampling   Slow     Large                  Sequence & time   Yes
LM-FD      Fast     Small                  Sequence & time   No
DI-FD      Slow     Best for small $R$     Sequence          No
[Figure panels: $R = 8.35$, $R = 1$, $R = 90089$]
problem.
different from the unbounded streaming model for the matrix sketching problem.
based windows with theoretical guarantees and experimental evaluation.