
Hidden Relations in Data: A Signal Processing on Graphs Approach



  1. Carnegie Mellon. Hidden Relations in Data: A Signal Processing on Graphs Approach. José M. F. Moura, Philip L. and Marsha Dowd University Professor, moura@cmu.edu, www.ece.cmu.edu/~moura. Acknowledgements: AFOSR grant FA95501210087. Work with: Kar, Sandryhaila, Deri, Mei

  2. Data. Big Data: Variety, Volume, Velocity, Veracity, Variability, Value, Visualization; Unstructured, Distributed, … By 2020, all digital data created, replicated, and consumed in a year (IDC, Dec 2012): 44 ZB ≈ 170 M US LoC = 44,000 EB = 44,000,000 PB = 44,000,000,000 TB = 44,000,000,000,000 GB

  3. Data: Traditional Signals. Time signals: speech, radar signal, time series. Images, video. Figure captions: Forbes, 03/05/2013; KU Band SAR Image; Mendrelic: Time Lapse Nice; Sandia Nat Lab; http://vimeo.com/18554749

  4. Data: Variety (Social, Web, Companies, …). Social networks; wireless service providers; Web: hyperlinked blogs (Adamic, Glance); LinkedIn contacts; sensor networks; Internet; friendship networks (http://vidi.cs.ucdavis.edu/projects/AggressionNetworks/)

  5. Data: The Old and the New. Time series; company data

  6. Analytics of Data Science: DSP on Graphs. (Linear) DSP for social, biological, and physical graph data: graph signal model, filters, filtering and convolution, impulse response, z- and Fourier transforms, spectrum, frequency response, … (Sandryhaila & Moura, “DSP on graphs,” IEEE Tr-SP, 2013)

  7. Data Science: Graph Supports Associations. Graphical models: Markov random fields; machine learning approaches (Jordan, Willsky, …). Data transforms: diffusion wavelets (Coifman, 2004); regression analysis, wavelets on irregular sensor networks (Baraniuk, 2005); filterbanks (Vandergheynst, 2011); separable wavelet filterbanks (Ortega, 2012). Graph Laplacian (assumed undirected, non-negative weights) (Vandergheynst, Barbarossa, Ortega, …). Algebraic Signal Processing (ASP): Pueschel and Moura (SIAM 2003; T-SP 2008, May, August)

  8. Data: The Old and the New. Time series; company data

  9. DSP: Time Signals. Time signal:

  10. DSP: Time Signals: Shift. Time signal: Shift operator: $z^{-1}$. Shift matrix:

  11. DSP: Time Signals: Graph. Shift matrix: Graph: nodes 0, 1, 2, …, n-1
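
The shift matrix and its graph did not survive in this transcript. A minimal sketch, assuming the usual periodic (cyclic) time model in which the shift matrix is the adjacency matrix of a directed cycle on nodes 0, …, n-1; the function name `cyclic_shift_matrix` and the sample values are illustrative only:

```python
import numpy as np

def cyclic_shift_matrix(n: int) -> np.ndarray:
    """Adjacency matrix of the directed cycle 0 -> 1 -> ... -> n-1 -> 0."""
    # Row i has a single 1 in column (i - 1) mod n,
    # so (C @ s)[i] = s[(i - 1) mod n]: a one-sample delay with wrap-around.
    return np.roll(np.eye(n), 1, axis=0)

s = np.array([10.0, 20.0, 30.0, 40.0])  # illustrative time signal
C = cyclic_shift_matrix(4)
shifted = C @ s                          # delayed copy of s
```

Applying `C` n times returns the original signal, which is the periodicity assumed in this sketch.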

  12. DSP_G: Graph Signals. Roadmap: graph signal, shift, filters, filtering/convolution, impulse response, A- and Fourier transforms, spectrum, frequency response. Time signals: Cosine signal, k = 0, …, 5. Average temperature in US cities. Website topics in hyperlinked blogs. Average # tweets

  13. DSP_G: Graph Shift. Shift: adjacency matrix A. Graph shift: a local operation that replaces the signal value at a node by a weighted linear combination of the values at its neighbors. 1st-order interpolation, weighted averaging, regression on graphs
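
A minimal sketch of the graph shift as a matrix-vector product, assuming the convention that `A[n, m]` is the weight of the edge from node m to node n. The 4-node graph and signal values are hypothetical:

```python
import numpy as np

# Hypothetical weighted directed graph: A[n, m] is the weight of edge m -> n.
A = np.array([
    [0.0, 0.5, 0.0, 0.5],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
])
s = np.array([1.0, 2.0, 3.0, 4.0])  # one signal value per node

def graph_shift(A: np.ndarray, s: np.ndarray) -> np.ndarray:
    """Replace each node's value by the weighted sum of its in-neighbors' values."""
    return A @ s

shifted = graph_shift(A, s)
```

Node 0, for example, receives 0.5 * s[1] + 0.5 * s[3], a weighted average of its two in-neighbors.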

  14. DSP_G: Graph Filtering. Graph signal: Graph filtering:

  15. DSP_G: Graph Filtering. Graph signal: Graph filtering: Shift invariance: H and A commute; H and A have the same eigenvectors; a graph filter is a polynomial in the shift A
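
The statement that a shift-invariant graph filter is a polynomial in A can be checked numerically. This is a minimal sketch; the helper name `graph_filter`, the 3-node path graph, and the tap values are illustrative assumptions:

```python
import numpy as np

def graph_filter(A: np.ndarray, taps) -> np.ndarray:
    """Return H = h_0 I + h_1 A + ... + h_L A^L for taps = [h_0, ..., h_L]."""
    n = A.shape[0]
    H = np.zeros((n, n))
    power = np.eye(n)          # A^0
    for h in taps:
        H += h * power
        power = power @ A      # next power of the shift
    return H

# Hypothetical undirected path graph on 3 nodes.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
H = graph_filter(A, [0.5, 0.25, 0.25])

s = np.array([1.0, 0.0, 0.0])            # impulse at node 0
out = H @ s                              # filtered signal
commutes = np.allclose(H @ A, A @ H)     # shift invariance: H A = A H
```

Because H is built only from powers of A, filtering a shifted signal gives the same result as shifting the filtered signal.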

  16. DSP_G: Fourier Transform. Discrete Fourier Transform (DFT): Fourier Tr.:

  17. DSP_G: Fourier Transform. Discrete Fourier Transform (DFT): Fourier Tr.:

  18. DSP_G: Graph Shift & FT. Diagonalization of the shift: DFT and the shift:
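
The relation between the DFT and the shift is not preserved in this transcript. A minimal numerical sketch, assuming the cyclic shift matrix and the unitary DFT matrix (both standard, but the variable names and the size n = 8 are illustrative): conjugating the shift by the DFT yields a diagonal matrix.

```python
import numpy as np

n = 8
C = np.roll(np.eye(n), 1, axis=0)        # cyclic shift: (C s)[i] = s[(i - 1) mod n]

k = np.arange(n)
F = np.exp(-2j * np.pi * np.outer(k, k) / n) / np.sqrt(n)   # unitary DFT matrix

D = F @ C @ F.conj().T                   # shift expressed in the Fourier basis
eigvals = np.exp(-2j * np.pi * k / n)    # n-th roots of unity
is_diagonal = np.allclose(D, np.diag(eigvals))
```

The diagonal entries are the eigenvalues of the shift, and the DFT basis vectors are its eigenvectors.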

  19. DSP_G: Graph Fourier Tr. For simplicity: A is diagonalizable. Graph Fourier Tr.: Inverse Graph Fourier Tr.:
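
The transform formulas are missing from the transcript. A minimal sketch, assuming the eigendecomposition A = V Λ V⁻¹ with the graph Fourier transform taken as V⁻¹ and its inverse as V, which is the definition used in the DSP on graphs literature. The 3-node directed graph and the function names are hypothetical:

```python
import numpy as np

def gft(V: np.ndarray, s: np.ndarray) -> np.ndarray:
    """Graph Fourier transform: s_hat = V^{-1} s (solved without forming the inverse)."""
    return np.linalg.solve(V, s)

def igft(V: np.ndarray, s_hat: np.ndarray) -> np.ndarray:
    """Inverse graph Fourier transform: s = V s_hat."""
    return V @ s_hat

# Hypothetical directed graph with distinct eigenvalues, hence diagonalizable.
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
eigvals, V = np.linalg.eig(A)       # graph frequencies and Fourier basis (complex in general)

s = np.array([1.0, -2.0, 0.5])
s_hat = gft(V, s)                   # spectrum of s
s_back = igft(V, s_hat)             # reconstructs s up to rounding
```

For a directed graph the eigenvalues and the spectrum are complex in general, which is why no symmetry of A is assumed here.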

  20. Frequency. Time frequencies:

  21. DSP_G: Frequency Response. Graph filter: Graph filter frequency response:

  22. DSP_G: Convolution Thm. Filtering/convolution theorem:
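
A minimal sketch of the filtering/convolution theorem, assuming a polynomial filter H = h(A) and the graph Fourier transform V⁻¹ from the eigendecomposition of A: the spectrum of the filtered signal equals the spectrum of the input multiplied entrywise by the frequency response h(λ). The graph, taps, and signal are hypothetical:

```python
import numpy as np

# Hypothetical diagonalizable directed graph and filter taps [h_0, h_1, h_2].
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
taps = [1.0, 0.5, 0.25]

eigvals, V = np.linalg.eig(A)

# Vertex-domain filter: H = h_0 I + h_1 A + h_2 A^2.
H = sum(h * np.linalg.matrix_power(A, k) for k, h in enumerate(taps))

# Frequency response: the same polynomial evaluated at each eigenvalue.
# np.polyval expects the highest-degree coefficient first.
freq_response = np.polyval(taps[::-1], eigvals)

s = np.array([1.0, -2.0, 0.5])
lhs = np.linalg.solve(V, H @ s)                   # spectrum of the filtered signal
rhs = freq_response * np.linalg.solve(V, s)       # response times input spectrum
```

The two sides agree up to rounding, so filtering in the vertex domain is multiplication in the graph frequency domain.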

  23. DSP_G: Graph Low and High Pass Signals. Ordering frequencies: total variation. For s a graph frequency component:
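
The total-variation formula is missing from the transcript. A minimal sketch, assuming the definition used in the DSP on graphs literature, TV_G(s) = ||s − A s / |λ_max|||₁, applied to eigenvectors of a hypothetical undirected path graph; the function name and the unit ℓ₁ normalization are choices made for this example:

```python
import numpy as np

def total_variation(A: np.ndarray, s: np.ndarray) -> float:
    """TV_G(s) = || s - A s / |lambda_max| ||_1 (assumed definition)."""
    lam_max = np.max(np.abs(np.linalg.eigvals(A)))
    return float(np.sum(np.abs(s - (A @ s) / lam_max)))

# Hypothetical undirected path graph on 4 nodes.
A = np.array([[0.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])

eigvals, V = np.linalg.eigh(A)            # symmetric A: real eigenvalues, ascending
V = V / np.sum(np.abs(V), axis=0)         # scale each eigenvector to unit l1 norm

tv = np.array([total_variation(A, V[:, i]) for i in range(A.shape[0])])
order = np.argsort(tv)                    # indices from lowest to highest graph frequency
```

Under this definition the eigenvector of the largest eigenvalue varies least across edges and is the lowest frequency, while the most negative eigenvalue gives the highest frequency.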

  24. DSP_G: Graph Low and High Pass Signals

  25. DSP_G: Graph Low and High Pass Signals

  26. DSP_G: Political Blogs. Data: 1224 conservative & liberal political blogs (Adamic, Glance). Graph: 1224 nodes (blogs), edges (hyperlinks). Graph signal: blue 1, red -1. (Sandryhaila, Moura, DSP on Graphs, IEEE Tr-SP, 2013)

  27. DSP_G: Blogosphere Low- vs High Pass. 1224 political blogs, hyperlinked: conservative & liberal (Adamic, Glance). Adjacency matrix A given by hyperlinks. Graph signal: Fourier transform: Frequency representation:

  28. DSP_G: Blogosphere Low- vs High Pass. 1224 political blogs, hyperlinked: conservative & liberal (Adamic, Glance). Adjacency matrix A given by hyperlinks. Graph signal: Frequency representation:

  29. DSP_G: Classification: Political Blogosphere. 1224 political blogs, hyperlinked: conservative & liberal (Adamic, Glance). Adjacency matrix A given by hyperlinks. Semisupervised classifier (filter + threshold): Filter design:
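
The slide's filter design is not preserved, so the following is only an illustration of the filter + threshold idea, not the authors' design. It assumes known labels are encoded as +1/-1 with 0 for unlabeled nodes, fits polynomial filter taps by least squares on the labeled nodes, and thresholds the filtered signal at zero. The toy graph, the number of taps, and the helper `krylov_matrix` are all hypothetical:

```python
import numpy as np

def krylov_matrix(A: np.ndarray, s0: np.ndarray, num_taps: int) -> np.ndarray:
    """Columns are s0, A s0, ..., A^(num_taps-1) s0."""
    cols = [s0]
    for _ in range(num_taps - 1):
        cols.append(A @ cols[-1])
    return np.column_stack(cols)

# Hypothetical toy graph: triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

labeled = np.array([0, 5])        # nodes with known labels
s0 = np.zeros(6)
s0[labeled] = [1.0, -1.0]         # +1 / -1 on labeled nodes, 0 elsewhere

# Assumed design rule: taps that best reproduce the known labels (least squares).
K = krylov_matrix(A, s0, num_taps=4)
taps, *_ = np.linalg.lstsq(K[labeled], s0[labeled], rcond=None)

predicted = np.sign(K @ taps)     # filter, then threshold at zero
```

In this toy case the filter spreads each known label through its own triangle, so each cluster takes the sign of its labeled node.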

  30. DSP_G: Classification: Political Blogosphere. Political blogs (Adamic, Glance); figure panels: Most connected, Random. Classifier, P=10. AGF: Chen, Sandryhaila, Moura, Kovacevic, GlobalSIP, 2013. DF: Diffusion Functions: Szlam, Coifman, Maggioni, J. Mach. Learn. Res., 2008

  31. DSP_G: Service Provider: Predict Customer Behavior. 3.7 million customers: 3.6 M non-churners, 100K churners. Adjacency matrix: 10-month log: learn from a few churners in month A who will churn in month A+1

  32. DSP_G: Service Provider: Predict Customer Behavior. Classifier (Deri & Moura, ICASSP, May 2014)

  33. DSP_G: Classification. Classification by regularization. News articles dataset (Belkin, Matveeva, Niyogi, 2004): 18,000 news articles, 20 topics. Graph: randomly select 500 articles from each class; each article is a vector of the 6000 most common keywords; use the cosine distance between keyword vectors. Regularization: (Sandryhaila, Moura, GlobalSIP, 2013)

  34. DSP_G: News Article Dataset: Classification. News articles dataset (Belkin, Matveeva, Niyogi, 2004): 18,000 news articles, 20 topics. Graph: randomly select 500 articles from each class; each article is a vector of the 6000 most common keywords; use the cosine distance between keyword vectors. (Sandryhaila, Moura, GlobalSIP, 2013)

  35. DSP_G: NIST Digits Database Classification. NIST database: 70000 grayscale images of handwritten digits from 0 to 9. Randomly select 3000 images of each digit to construct a testing dataset of 30000 images. The representation graph for this dataset is constructed by viewing each image as a point in a 28^2 = 784-dimensional vector space, computing Euclidean distances between all images, and connecting each image with its six nearest neighbors.
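
The graph construction described above can be sketched as follows. This is a minimal version that assumes unweighted directed edges (the slide does not state the edge weights) and uses random vectors as a stand-in for the flattened 28 x 28 images; `knn_graph` is a hypothetical helper name:

```python
import numpy as np

def knn_graph(X: np.ndarray, k: int) -> np.ndarray:
    """A[i, j] = 1 if point j is among the k nearest neighbors of point i."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances between all pairs of rows of X.
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)               # a point is not its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :k]       # k closest points per row
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return A

rng = np.random.default_rng(0)
X = rng.random((50, 28 * 28))    # stand-in data: 50 points in 784 dimensions
A = knn_graph(X, k=6)
```

The dense distance matrix is fine for a small example; for 30000 images it would hold 9 x 10^8 entries, so a blockwise or tree-based neighbor search would be the practical choice.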

  36. Big Data. Product graphs: Kronecker graphs: Cartesian products: Strong product:
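
The product formulas are missing from the transcript. A minimal sketch, assuming the standard adjacency-matrix definitions of the three graph products (Kronecker: A1 ⊗ A2; Cartesian: A1 ⊗ I + I ⊗ A2; strong: the sum of the two). The two factor graphs are hypothetical single-edge graphs:

```python
import numpy as np

def kronecker_product(A1: np.ndarray, A2: np.ndarray) -> np.ndarray:
    return np.kron(A1, A2)

def cartesian_product(A1: np.ndarray, A2: np.ndarray) -> np.ndarray:
    I1, I2 = np.eye(A1.shape[0]), np.eye(A2.shape[0])
    return np.kron(A1, I2) + np.kron(I1, A2)

def strong_product(A1: np.ndarray, A2: np.ndarray) -> np.ndarray:
    return kronecker_product(A1, A2) + cartesian_product(A1, A2)

# Hypothetical factor graphs: two nodes joined by one undirected edge.
A1 = np.array([[0.0, 1.0], [1.0, 0.0]])
A2 = np.array([[0.0, 1.0], [1.0, 0.0]])

A_kron = kronecker_product(A1, A2)   # 4 nodes, each with 1 neighbor
A_cart = cartesian_product(A1, A2)   # 4-cycle, each node with 2 neighbors
A_strong = strong_product(A1, A2)    # complete graph on 4 nodes
```

A product graph on n1 * n2 nodes is described entirely by its two small factors, which is what makes these constructions attractive for large data sets.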

  37. Big Data

  38. Filtering. Filtering: Cartesian graphs. Filtering: Kronecker graphs. Filtering: strong graphs

  39. Big Data. Parallel constructs: Vectorizable constructs:

  40. Big Data: Fourier Transform. Cartesian graphs: Kronecker graphs and strong graphs:
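
The product-graph Fourier transform formulas are not preserved here. A minimal sketch, assuming the standard Kronecker-product identities: if V1 and V2 hold the eigenvectors of the factor graphs, then V1 ⊗ V2 holds eigenvectors of the Cartesian, Kronecker, and strong products, with eigenvalues λ1 + λ2, λ1 λ2, and λ1 λ2 + λ1 + λ2 respectively. The factor graphs are hypothetical undirected paths:

```python
import numpy as np

# Hypothetical factor graphs: undirected paths on 3 and 2 nodes.
A1 = np.array([[0.0, 1.0, 0.0],
               [1.0, 0.0, 1.0],
               [0.0, 1.0, 0.0]])
A2 = np.array([[0.0, 1.0],
               [1.0, 0.0]])

lam1, V1 = np.linalg.eigh(A1)     # small eigendecompositions of the factors
lam2, V2 = np.linalg.eigh(A2)

V = np.kron(V1, V2)               # shared Fourier basis of the product graphs
lam_cart = np.add.outer(lam1, lam2).ravel()        # lam1[i] + lam2[j]
lam_kron = np.multiply.outer(lam1, lam2).ravel()   # lam1[i] * lam2[j]
lam_strong = lam_kron + lam_cart

A_kron = np.kron(A1, A2)
A_cart = np.kron(A1, np.eye(2)) + np.kron(np.eye(3), A2)
A_strong = A_kron + A_cart

ok_cart = np.allclose(A_cart @ V, V * lam_cart)       # V * lam scales column c by lam[c]
ok_kron = np.allclose(A_kron @ V, V * lam_kron)
ok_strong = np.allclose(A_strong @ V, V * lam_strong)
```

Only the two small factor matrices are ever decomposed, so the Fourier basis of the large product graph never has to be computed directly.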
