Large Scale Graph Analysis


  1. Large Scale Graph Analysis
     Erik Saule
     HPC Lab, Biomedical Informatics, The Ohio State University
     March 11, 2013, UMass Boston
     Ohio State University, Biomedical Informatics Large Scale Graph Analysis Erik Saule :: 1 / 43 HPC Lab http://bmi.osu.edu/hpc

  2. Outline
     1. Introduction
     2. the advisor
        - Citation Analysis for Document Recommendation
        - A High Performance Computing Problem
        - Result Diversification
     3. Centrality
        - Compression and Shattering
        - Storage format for GPU acceleration
        - Incremental Algorithms
     4. Data Management
        - Middleware for Data Analysis
        - Out-of-Core Computing
     5. Conclusion

  7. Data in the Modern Days
     Facebook:
     - 1B active users a month
     - Each day: 2.5B content items shared, 2.7B Likes, 300M photos, 500TB data
     Twitter:
     - 500M users
     - 340M tweets/day (2,200/sec)
     - 24.1M Super Bowl tweets
     Academic networks:
     - 1.5M papers/year (4,000/day)
     - 100,000 papers/year in CS
     Transportation:
     - 10M trips/day in Paris public transportation
     - 2.5M registered vehicles in LA
     - 1.2M used for commuting/day
     Compositing:
     - Problems can also come from multiple sources, e.g., identify coauthors in Facebook.

  10. Are these problems new?
      “CERN report 1959”, about a 1H experiment on the synchrocyclotron:
      "The use of the computer in this sort of measurement is important, not only because of the large amounts of data which must be handled, but because with a modern high speed computer one can search quickly for various systematic errors."
      But also...
      - Intrusion detection in computer security
      - Search engines
      - Stock market predictions
      - Weather forecast
      Not so new!

  12. So why is it important now?
      Ubiquitous:
      - Scientists (LHC, Metagenomics)
      - Big companies (Data companies, Operational marketing)
      - Small companies (Website logs, who buys what? where?)
      - People (Personal analytics)
      In brief, everybody has Big Data problems now!
      None of these data can be manually analyzed. Automatic analysis is mandatory.

  15. The Three Attributes of Big Data
      - Volume (in high volume): millions, billions, trillions of vertices and edges
      - Variety (unstructured data): graphs, hypergraphs, conceptual data
      - Velocity (flowing in the system): streaming data, temporal data, flow of queries
      Problems:
      - Storing and transporting such data
      - Extracting the important data and building a graph (or something else)
      - Analyzing the graph:
        - static analysis
        - recurrent analysis
        - temporal analysis

  21. My Goal
      Study Big Data problems and design solutions for them.
      - Applications (Source): Facebook, the advisor, Twitter, CiteULike, traffic camera, transportation systems
      - Algorithms (Analysis): PageRank, Random Walk, Traversals, Centrality, Community Detection, Outlier Detection, Visualization
      - Middleware: MPI, Hadoop, Pegasus, GraphLab, DOoC+LAF, DataCutter, SQL, SPARQL
      - Hardware: Clusters, Cray XMT, Intel Xeon Phi, FPGAs, SSD drives, NVRAM, Infiniband, Cloud Computing, GPU
      What to use? When to use them? What is missing?
