
CS535 Big Data, 1/22/2020, Sangmi Lee Pallickara



CS535 Big Data | Computer Science Department | Colorado State University
Spring 2020

CS535 BIG DATA
PART A. BIG DATA TECHNOLOGY
1. INTRODUCTION TO BIG DATA
What is Big Data?
Sangmi Lee Pallickara
Computer Science, Colorado State University
http://www.cs.colostate.edu/~cs535

Big Data
• Things one can do at a large scale that cannot be done at a smaller one
  • To extract new insights
  • To create new forms of value
• Big Data is about analytics of huge quantities of data in order to infer probabilities
• Big Data is NOT about trying to “teach” a computer to “think” like humans
• Providing a quantitative dimension it never had before

The three (or four) Vs in Big Data
• Volume
  • Voluminous
  • It does not have to be a certain number of petabytes or a fixed quantity.
• Velocity
  • How fast is the data coming in?
  • How fast do you need to be able to analyze and utilize it?
• Variety
  • Number of sources or incoming vectors
• Veracity
  • Can you trust the data itself, the source of the data, or the process?
  • User entry errors, redundancy, corruption of the values
  • Data cleaning

Who is using Big Data?

Photo credit: https://datafloq.com/read/car-manufacturers-are-using-big-data/1204

Connected cars
• A single hybrid plug-in car generates up to 25 gigabytes per hour
• Connected cars
  • $130 billion
• Traffic problems: re-routing based on the volume of traffic
• Alerts the driver when road conditions are hazardous by automatically activating the anti-lock brakes
  • This information is shared with the vehicles that are nearby

The Artemis project: Saving “preemies” using Big Data
• The Artemis project
  • Dr. Carolyn McGregor
  • Toronto’s Hospital for Sick Children, University of Ontario Institute of Technology, and IBM
• Captures and processes the patients’ data in real time
  • 16 different data streams
  • Heart rate, respiration rate, temperature, blood pressure, and blood oxygen level
  • Around 1,260 data points per second
• The system detects subtle changes that may signal the onset of infection 24 hours before overt symptoms appear
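To make the velocity figures quoted above concrete, the following is a minimal back-of-the-envelope sketch. The rates (1,260 data points per second, up to 25 gigabytes per hour) come from the slides; the function names and the `hours_driven` parameter are hypothetical and chosen only for illustration.

```python
# Back-of-the-envelope arithmetic using the rates quoted on the slides.
# Function names and the hours_driven parameter are illustrative assumptions.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def artemis_points_per_day(points_per_second: int = 1_260) -> int:
    """Data points accumulated in one day at a constant per-second rate."""
    return points_per_second * SECONDS_PER_DAY


def car_gigabytes(hours_driven: float, gb_per_hour: float = 25.0) -> float:
    """Upper bound on data generated by one car, given hours of driving."""
    return hours_driven * gb_per_hour


if __name__ == "__main__":
    print(artemis_points_per_day())  # 1,260 * 86,400 = 108864000
    print(car_gigabytes(2.0))        # 2 hours at 25 GB/hour = 50.0
```

At a sustained 1,260 points per second, a single day already yields roughly 109 million data points, which is why such streams are processed in real time rather than stored and analyzed later.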

Look Who’s Peeking at Your Paycheck
• Experian’s Income Insight
  • Estimates people’s income level
  • Based on their credit history
  • Trains the estimation model using selected credit history and tax information from the IRS
Karen Blumenthal, “Look Who’s Peeking at Your Paycheck”, The Wall Street Journal, Jan. 13, 2010, http://www.wsj.com/articles/SB10001424052748703672104574654211904801106

Related research areas
• Storage systems
  • How can we efficiently resolve queries on massive amounts of input data?
  • The input dataset may be presented in the form of a distributed data stream
• Machine learning
  • How can we efficiently solve large-scale machine learning problems?
  • The input data may be massive and stored in a distributed cluster of machines
• Distributed computing
  • How can we efficiently solve large-scale optimization problems in distributed computing environments?
  • For example, how can we efficiently solve large-scale combinatorial problems, e.g. processing of large-scale graphs?

CS535 BIG DATA
PART A. BIG DATA TECHNOLOGY
2. COURSE INTRODUCTION
Sangmi Lee Pallickara
Computer Science, Colorado State University
http://www.cs.colostate.edu/~cs535

Big Data Lab at Colorado State University
• Director: Sangmi Pallickara
• Algorithmic and systems design
• Scalable analytics over voluminous datasets on complex distributed architectures
• Research has been deployed in the following domains
  • Precision agriculture, atmospheric science, environmental biology, ecology, civil engineering, bioinformatics, and public health

Big Data Lab at Colorado State University
• Awards
  • Cochran Family Professorship 2018-2021
  • IEEE TCSC Award for Excellence in Scalable Computing (Mid-Career Researcher) 2018
  • National Science Foundation CAREER Award 2016
• Funded by
  • The National Science Foundation
  • The Advanced Research Projects Agency-Energy (Department of Energy)
  • Department of Homeland Security
  • The Environmental Defense Fund
  • Google, Amazon, and Hewlett Packard

People
• Sangmi Pallickara, Saptashwa Mitra, Walid Budgaga, Dan Rammer, Sam Armstrong, Laksheen Mendis, Ryan Becwar, Kevin Brewwiler, Caleb Carlson, Kartik Khurana, Aaron Pereira, Paahuni Khandelwal
• Undergraduate researchers at CURC

Goal of this course
• Understanding fundamental concepts in Big Data Analytics
  • Computing Systems + Scalable Algorithms and Models
• Learn about existing technologies and how to apply them
[Figure: computing systems (specialized modeling tools, computing frameworks, storage systems and middleware) alongside algorithms and models (graph models, predictive models, analytics)]

Communications [1/2]
• Course Website
  • http://www.cs.colostate.edu/~cs535
  • Announcements: Check the course website at least twice a week.
  • Schedule (course materials, readings, assignments)
  • Policies
• Canvas
  • Assignment submission
  • Grades
• Piazza
  • Discussion board

Communications [2/2]
• Contact Me
  • sangmi@colostate.edu
  • Office hours: Friday 10:00AM ~ 11:00AM and by appointment
  • Office: CSB456
  • URL: http://www.cs.colostate.edu/~sangmi
• Research Group Meeting
  • When: 1:30-2:30pm Fridays
  • Where: CSB305
• Contact GTAs
  • Paahuni Khandelwal
  • Mohamed Chaabane
  • Office hours (in CSB120 and online office hours TBA)

Course Structure
• GEAR V: Algorithmic Techniques for Big Data: Week 13, 14
• GEAR IV: Large Scale Recommendation Systems and Social Media: Week 11, 12
• GEAR III: Big Graph Analysis: Week 9, 10
• GEAR II: Machine Learning for Big Data: Week 7, 8
• GEAR I: Peta-scale Storage Systems: Week 5, 6
• Big Data Technology: Week 1 ~ Week 4

Course Structure | Part A: Big Data Technology
• Week 1 ~ Week 4
• Big Data Technology
• Goals
  • Understand concepts of the Big Data computing environment
  • Hands-on experience
• Topics
  • Introduction to Big Data
  • Lambda Model
  • Quick view of MapReduce
  • Introduction to Apache Spark
  • Analytics with Apache Storm

Course Structure | Part B: GEAR Sessions
• What is the GEAR Session?
  • Guided Exploration for Big Data Analytics Research
• Purposes
  • Guided learning environment for advanced research topics in Big Data
  • Understanding different aspects of Big Data research with lectures and discussions
• Sessions
  • Session I. Peta-scale Storage Systems
  • Session II. Machine Learning for Big Data
  • Session III. Big Graph Analysis
  • Session IV. Large Scale Recommendation Systems and Social Media
  • Session V. Algorithmic Techniques for Big Data
• Duration: 2 weeks/session
  • Up to 3 lectures
  • 1 student-led research discussion
