Leveraging Artificial Intelligence and Big Data to Create Value

Dr. Sudha Ram
Director, INSITE Center for Business Intelligence and Analytics
Anheuser-Busch Professor of MIS, Entrepreneurship & Innovation
Professor of Computer Science


SLIDE 1

Leveraging Artificial Intelligence and Big Data to Create Value

Dr. Sudha Ram
Director, INSITE Center for Business Intelligence and Analytics
Anheuser-Busch Professor of MIS, Entrepreneurship & Innovation
Professor of Computer Science
Eller College of Management
Email: ram@eller.arizona.edu

August 19, 2020 EROSS-2020

SLIDE 2

BIG DATA: From Petabytes to Zettabytes

SLIDE 3

Meaning of “BIG”

SLIDE 4

Meaning of “BIG”

SLIDE 5

Big Data – Traditionally Defined

VOLUME, VARIETY, VELOCITY, VERACITY, VALUE

SLIDE 6

Diverse Sources of Data

Many Different Sources generating Data

SLIDE 7

An Internet Minute

SLIDE 8

PARADIGM SHIFT! “Datafication” of the world

  • Sensors embedded in physical objects
  • IP-based communication

SLIDE 9
SLIDE 10

Health Internet of Things

SLIDE 11

Paradigm Shift

  • Temporal and spatial dimensions
  • Billions of users and objects
  • Leaving massive traces of activity
  • “Laboratory” for understanding the pulse of humanity
SLIDE 12

QUEST for the HOLY GRAIL

Predicting the Future

SLIDE 13

INSITE Center for Business Intelligence and Analytics

  • Interdisciplinary Research Center at the University of Arizona
  • www.insiteua.org
SLIDE 14

Creating a Smarter/Better World

  • Data Science and Network Science
  • Visualizations Using Time and Space
  • Scalable techniques for network analysis and graph mining
  • Predictive Modeling
  • Train students in data science
  • Work on interesting research projects with industry partners to solve real-world problems

SLIDE 15

RESEARCH PROJECTS

  • Health Care
  • Education
  • News Media/Journalism
  • Crowdfunding
  • Crowdsourcing
  • Internet of Things and Wearable devices
  • Social Media

SOCIAL IMPLICATIONS

SLIDE 16

Leveraging Data Science

  • Define a problem/challenge
  • Identify signals
  • Use data science methods
  • Solve the problem

Repurposing Data is Key

SLIDE 17

PREDICTION MODELS

  • Predict emergency department visits in near real time using big data
  • Freshman retention prediction
  • COVID-19 research

SLIDE 18

Leverage Big Data

Big Data is not just about volume

  • Social media
  • Internet search
  • Environmental sensors
  • Wearable sensors
  • Spatial and Temporal Dimensions
  • Fine Grained - Spatial/Temporal
SLIDE 19

Focus on Asthma

  • 25 million people affected in the United States
  • 2 million emergency department (ED) visits
  • 0.5 million hospitalizations
  • 3,500 deaths
  • 50 billion dollars in medical costs annually
  • 11 million missed school days every year
  • 14 million missed work days every year

Source: CDC Reports (2011, 2012)

SLIDE 20

Pediatric asthma ER Visits, USA, 2011

SLIDE 21

Our Research Objective

Develop robust models to predict asthma-related emergency department visits in near real time using big data.

Partner: Parkland Center for Clinical Innovation

Joint work with Wenli Zhang, Dr. Yolande Pengetenze, and Max Williams; funded in part by the Parkland Center for Clinical Innovation.

SLIDE 22

Leverage Big Data

Big Data is not just about volume

  • Social media
  • Internet search
  • Environmental sensors
  • Wearable sensors
  • Spatial and Temporal Dimensions
  • Fine Grained - Spatial/Temporal
SLIDE 23

EXTRACTING SIGNAL from Noisy Data

  • True asthma-related tweets
  • Tweets not actually related to asthma

SLIDE 24

Asthma Related Tweets

SLIDE 25

Asthma Related Tweets

SLIDE 26

Asthma Keywords

Asthma, Inhaler, Sneezing, Runny Nose, Wheezing
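
A minimal sketch of how a keyword list like this one could be used to pull candidate tweets out of a stream. The keywords come from the slide; the sample tweets, the function names (`matched_keywords`, `filter_stream`), and the word-boundary matching rule are assumptions made for illustration, not the project's actual pipeline.

```python
import re

# Keywords from the slide. The sample tweets further down are invented.
ASTHMA_KEYWORDS = ["asthma", "inhaler", "sneezing", "runny nose", "wheezing"]

# One case-insensitive pattern with word boundaries (an assumed matching rule):
# only the exact keywords match, not variants such as "asthmatic".
KEYWORD_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in ASTHMA_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def matched_keywords(text):
    """Return the sorted, lower-cased keywords found in one tweet."""
    return sorted({m.group(1).lower() for m in KEYWORD_PATTERN.finditer(text)})


def filter_stream(tweets):
    """Keep only tweets that mention at least one keyword."""
    return [t for t in tweets if matched_keywords(t)]


sample_tweets = [
    "Forgot my inhaler at home and now I'm wheezing",
    "Great game last night!",
    "Runny nose all day, probably allergies",
]
asthma_stream = filter_stream(sample_tweets)
```

A filter of this kind only narrows the stream; keyword hits still include tweets that are not really about asthma, so a relevance step is needed afterwards.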

SLIDE 27

Asthma Keywords

Asthma, Inhaler, Sneezing, Runny Nose, Wheezing

SLIDE 28

Asthma-Related Stream

Twitter Asthma Stream - United States

Asthma-related tweets, United States (asthma stream, 11 Oct 2013 – 31 Dec 2013)

SLIDE 29

Extracting Signals

  • 1. Tweets indicating awareness of disease, e.g., “Hope I don’t get an asthma attack again today..”
  • 2. Using disease as rhetoric, e.g., “He is so cute I think I got asthma”

Distinguish tweets that are relevant to asthma from tweets that mention asthma in an irrelevant context.
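
The slides do not say which classifier was used to separate relevant from rhetorical mentions. As one possible illustration, the sketch below trains a tiny multinomial naive Bayes model with add-one smoothing. The two quoted tweets are from the slide; every other training example, the label names, and the class `NaiveBayes` are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case a tweet and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.labels = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.labels}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.labels for w in self.word_counts[c]}
        return self

    def log_score(self, text, label):
        total_words = sum(self.word_counts[label].values())
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word in self.vocab:  # words never seen in training are ignored
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(self.vocab)))
        return score

    def predict(self, text):
        return max(self.labels, key=lambda c: self.log_score(text, c))


# Hypothetical hand-labeled examples (first item of each group is from the slide).
train_texts = [
    "Hope I don't get an asthma attack again today",
    "My asthma is acting up, need my inhaler",
    "Wheezing all night, asthma attack again",
    "He is so cute I think I got asthma",
    "That movie was so funny I got asthma",
    "This song is so good it gave me asthma",
]
train_labels = ["relevant"] * 3 + ["irrelevant"] * 3

model = NaiveBayes().fit(train_texts, train_labels)
prediction_a = model.predict("Had an asthma attack and used my inhaler")
prediction_b = model.predict("She is so pretty I got asthma")
```

With only six training tweets this is a toy; a usable model would need a much larger labeled sample and a held-out evaluation.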

SLIDE 30

Emergency Room Visits and Tweets

SLIDE 31

Air Quality Sensor Data

  • Identify and include AQI data from a specific geographic region.
  • Collected pollution data from 27 air quality sites around the Dallas area.
  • Selected sites closest to the zip codes of the ED asthma patients in our ED visits dataset.

Using this data, we calculated daily average AQI for our model.
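
The site-selection and averaging steps could be sketched as follows. This assumes each zip code is represented by a centroid coordinate and that "closest" means great-circle distance; the slides do not state either. All site names, coordinates, and AQI readings below are invented.

```python
import math
from collections import defaultdict


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    radius_km = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_site(zip_centroid, sites):
    """Return the name of the monitoring site closest to a zip-code centroid."""
    lat, lon = zip_centroid
    return min(sites, key=lambda s: haversine_km(lat, lon, sites[s][0], sites[s][1]))


def daily_average_aqi(readings, selected_sites):
    """Average AQI per day over the selected sites only."""
    by_day = defaultdict(list)
    for site, day, aqi in readings:
        if site in selected_sites:
            by_day[day].append(aqi)
    return {day: sum(values) / len(values) for day, values in by_day.items()}


# Hypothetical inputs: (latitude, longitude) per site and per zip centroid.
sites = {"site_A": (32.78, -96.80), "site_B": (32.90, -97.04), "site_C": (33.02, -96.70)}
zip_centroids = {"zip_1": (32.79, -96.79), "zip_2": (33.00, -96.72)}
readings = [
    ("site_A", "2013-10-11", 40),
    ("site_C", "2013-10-11", 60),
    ("site_B", "2013-10-11", 100),  # ignored: not nearest to any patient zip code
    ("site_A", "2013-10-12", 55),
]

selected = {nearest_site(c, sites) for c in zip_centroids.values()}
daily_aqi = daily_average_aqi(readings, selected)
```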

SLIDE 32

Pollutants

  • CO: Carbon monoxide
  • NO2: Nitrogen dioxide
  • O3: Ozone
  • Pb: Lead
  • PM2.5: Atmospheric particulate matter, diameter of 2.5 micrometres or less
  • PM10: Atmospheric particulate matter, diameter of 10 micrometres or less
  • SO2: Sulfur dioxide
SLIDE 33

EPA Pollution Sensor Data and Emergency Visits

SLIDE 34

Prediction Models Using Streaming Data

  • Air Quality Sensor data streams
  • Tweets
  • Google Trends search data
  • Machine learning techniques to predict the number of ED visits per day with high accuracy
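
The slides name the input streams but not the learning algorithm. As a stand-in, the sketch below fits an ordinary least-squares model that maps two daily features to a daily ED visit count. The choice of linear regression, the two features (asthma tweet count and PM2.5 level), and all numbers are assumptions; the data is synthetic and exactly linear so the fit is easy to check.

```python
def solve_linear_system(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        w[i] = (m[i][n] - sum(m[i][j] * w[j] for j in range(i + 1, n))) / m[i][i]
    return w


def fit_least_squares(features, targets):
    """Return [intercept, w1, w2, ...] from the normal equations."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(ata, aty)


def predict(weights, feature_row):
    """Predicted daily ED visits for one day's features."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], feature_row))


# Synthetic days: (asthma tweet count, PM2.5 level) -> ED visits,
# generated from visits = 5 + 0.4 * tweets + 1.5 * pm25.
daily_features = [(10, 4.0), (20, 3.0), (15, 6.0), (30, 5.0), (25, 2.0)]
daily_visits = [15.0, 17.5, 20.0, 24.5, 18.0]

weights = fit_least_squares(daily_features, daily_visits)
forecast = predict(weights, (12, 5.0))
```

Real daily counts are noisy and the relationship need not be linear, so this only illustrates the shape of the problem: per-day features from several streams in, one predicted count out.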

SLIDE 35

Best Predictors

Successfully predicted with 80% accuracy

  • # of asthma tweets
  • CO
  • NO2
  • PM2.5
SLIDE 36

USEFUL for Public Health NOTIFICATION

  • I. Epidemiologic surveillance of asthma disease activity in the community, e.g., the Department of Health and Human Services (DHHS)
  • II. Stakeholder notifications of community-level asthma disease activity and risk factors

SLIDE 37

Hospital/ED Preparedness

Predicting asthma ED visits and staffing the ED accordingly

SLIDE 38

Targeted Patient Interventions

Targeted patient interventions using patient address and geo-localization data for tweets, e.g., patient alerts about asthma risks and counseling on preventive methods.

SLIDE 39

Contributions

Promising Results

  • Demonstrate the utility and value of linking big data from diverse sources in developing predictive models for non-communicable diseases
  • Specific focus on asthma
  • Relevant for other chronic conditions: diabetes, cardiac problems, obesity

SLIDE 40

Internet of Things and Big Data

Big Data for Improving Education

Internet of Things: Smart Cards, Wi-Fi Logs, Mobile Apps

SLIDE 41

BUILDING A SMARTER CAMPUS

SLIDE 42

Combining Network Science and Machine Learning

Societal Challenge: Student Retention

Proactive prediction is very important.

Social science theories indicate:

  • Social interactions
  • Regularity of routine
SLIDE 43

Objective

  • Predict freshman retention at the individual level
  • Make proactive predictions before knowing first-term GPA
  • Learn students’ behavioral patterns from their CatCard transactions
  • Provide actionable suggestions for retention management

SLIDE 44

BIG DATA

Institutional Student Dataset

  • ~7,000 full-time registered freshmen; 6,500 remain after removing international students for whom SAT scores or high school GPAs were not available
  • 479 (7.37%) dropped out after Fall and 843 (12.98%) dropped out at the end of Spring

SmartCard Transaction Dataset

  • 1.8 million transactions made by freshmen from Aug 2012 through May 2013
  • 271 different locations, including restaurants, vending machines, printers, parking, and labs

SLIDE 45

Behavior and Interactions

SLIDE 46

Patterns and Differences

SLIDE 47

Movement and Behavior

SLIDE 48

COMPUTATIONAL and NETWORK SCIENCE APPROACH

  • Fills gaps in behavioral and extant data-driven approaches
  • New prediction approach: CatCard transactions → implicit social networks and spatial sequences
  • Proactive prediction: predicting retention before the end of the 1st semester with 90% recall
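
One way to read "implicit social networks and spatial sequences" is sketched below: two students are linked when they swipe at the same location within a short time window, and a student's spatial sequence is the ordered list of locations they visit. The one-minute window, the tuple layout `(student, location, minute)`, and all sample rows are assumptions, not the study's actual definitions.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence_network(transactions, window_minutes=1):
    """Count, per student pair, co-located transactions within the window."""
    by_location = defaultdict(list)
    for student, location, minute in transactions:
        by_location[location].append((minute, student))

    edges = Counter()
    for events in by_location.values():
        events.sort()  # chronological order, so t2 >= t1 in every pair below
        for (t1, s1), (t2, s2) in combinations(events, 2):
            if s1 != s2 and t2 - t1 <= window_minutes:
                edges[tuple(sorted((s1, s2)))] += 1
    return edges


def spatial_sequence(transactions, student):
    """Locations visited by one student, in time order."""
    visits = sorted((minute, loc) for s, loc, minute in transactions if s == student)
    return [loc for _, loc in visits]


# Hypothetical card swipes: (student id, location, minutes since midnight).
transactions = [
    ("s1", "cafe", 600),
    ("s2", "cafe", 601),
    ("s3", "cafe", 640),
    ("s1", "library_printer", 700),
    ("s2", "library_printer", 700),
    ("s3", "gym", 705),
    ("s1", "gym", 900),
]

network = build_cooccurrence_network(transactions)
path_s1 = spatial_sequence(transactions, "s1")
```

Edge weights from such a network (for example, how many repeated co-occurrences a student has) and regularity measures over the sequences could then serve as model features.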

SLIDE 49

COVID-19 Related Research Projects

SLIDE 50

  • What is contact tracing?
  • Digital vs. manual methods
  • Three different methods:
  • a. Manual contact tracing
  • b. Manual with digital assistance from a Prompted Mobility Pathway (aka Memory Jogger)
  • c. Digital: Bluetooth app for exposure notification

SLIDE 51

SLIDE 52

Memory Jogger Using Wi-Fi Logs

SLIDE 53

  • Working with Jeremy Frumkin, Research and Discovery Technologies
  • Using Wi-Fi network logs with CatCard data to support strategic efforts related to congestion tracking on campus and managing campus foot traffic
  • Understanding movement patterns among campus spaces
  • Complementing app-based and manual contact tracing efforts with the additional insights that can be gained through the Wi-Fi logs
  • Design a Memory Jogger, a prompted mobility pathway tool, to enhance manual contact tracing
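
A prompted mobility pathway could, for example, be built by collapsing one device's consecutive Wi-Fi log rows into building-level stays that a person can review during a contact tracing interview. The sketch below assumes log rows of the form `(device, building, minute)` and a five-minute minimum stay; both are hypothetical, as is the sample data.

```python
def mobility_pathway(log_rows, device_id, min_stay_minutes=5):
    """Collapse one device's log rows into (building, start, end) stays."""
    rows = sorted((minute, building) for d, building, minute in log_rows if d == device_id)

    stays = []
    for minute, building in rows:
        if stays and stays[-1][0] == building:
            stays[-1][2] = minute  # same building as the previous row: extend the stay
        else:
            stays.append([building, minute, minute])  # new building: start a new stay

    # Drop brief pass-throughs so the prompt lists only meaningful stops.
    return [(b, start, end) for b, start, end in stays if end - start >= min_stay_minutes]


# Hypothetical Wi-Fi association log: (device id, building, minutes since midnight).
wifi_log = [
    ("dev1", "Library", 540),
    ("dev1", "Library", 555),
    ("dev1", "Student Union", 600),
    ("dev1", "Student Union", 602),
    ("dev1", "Library", 660),
    ("dev1", "Library", 700),
    ("dev2", "Gym", 540),
]

pathway = mobility_pathway(wifi_log, "dev1")
full_pathway = mobility_pathway(wifi_log, "dev1", min_stay_minutes=0)
```

The output is meant as a memory aid shown to the person concerned, not as an automatic list of contacts.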

SLIDE 54

Traffic/Crowd Analysis

[Dashboard screenshot: filters for date (Feb 3, 2020), time (8 am-9 am), building, and user types]

  • Traffic on campus between 8 am and 9 am
  • Top ten traffic spots visualized and compared with the selected building (in red)
  • Comparison of hourly traffic in the selected building
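
A ranking like the "top ten traffic spots" view could be computed by counting distinct devices per building inside the selected hour. This is a sketch under assumed conventions: rows are `(device, building, minute)`, the window is half-open, and ties are broken alphabetically. None of this is taken from the actual dashboard.

```python
from collections import defaultdict


def top_traffic_spots(log_rows, start_minute, end_minute, top_n=10):
    """Rank buildings by number of distinct devices seen in [start, end)."""
    devices = defaultdict(set)
    for device, building, minute in log_rows:
        if start_minute <= minute < end_minute:
            devices[building].add(device)
    # Most devices first; ties broken by building name for a stable order.
    ranked = sorted(devices.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(building, len(ids)) for building, ids in ranked[:top_n]]


# Hypothetical log rows: (device id, building, minutes since midnight).
traffic_log = [
    ("d1", "Library", 485),
    ("d2", "Library", 500),
    ("d1", "Library", 510),   # same device again: counted once
    ("d3", "Student Union", 495),
    ("d2", "Gym", 530),
    ("d4", "Gym", 550),       # outside the 8 am-9 am window
]

# 8 am to 9 am expressed as minutes since midnight.
morning_ranking = top_traffic_spots(traffic_log, 8 * 60, 9 * 60)
```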

SLIDE 55

  • To compare the three methods for contact tracing and exposure notification.
  • How do the three contact tracing approaches differ in their outcomes, such as timeliness and coverage of contacts, and other metrics?
  • How do these methods complement each other, and what are their relative strengths and weaknesses?
  • How do these methods perform overall in preserving privacy while allowing for comprehensive contact tracing? What are the tradeoffs?
  • How acceptable are these three strategies to the community, and what is an effective path to deploying comprehensive contact tracing?

SLIDE 56

Some General Lessons

  • Need for complex techniques?
  • Is causality really necessary for prediction?
  • What level of accuracy is good?
  • Working with your stakeholders is important
  • Research is very important in training next-generation scientists, end users, students, and others
SLIDE 57

Some General Lessons

  • Focus on defining the problem carefully
  • Out-of-the-box thinking
  • Big Data: Don’t think of it as a single very large dataset
  • Repurpose and combine different types of data
  • Exploit the granularity of data, especially the spatial and temporal features: machine learning and network science

  • Extracting Signal from Noise
SLIDE 58

Good News

  • McKinsey in 2015 predicted that by 2020 the number of data science jobs in the United States alone would exceed 500,000, but that there would be fewer than 200,000 data scientists available to fill these positions. Globally, demand for data scientists was projected to exceed supply by more than 50 percent by 2020.
  • IBM today: annual demand for the fast-growing new roles of data scientist, data developer, and data engineer will reach nearly 700,000 openings by 2020.

SLIDE 59

CONCLUSION

  • PARADIGM SHIFT
  • BIG DATA HAS A LOT OF HIDDEN VALUE
  • LET’S LEVERAGE IT USING AI TO CREATE A BETTER WORLD!

SLIDE 60

QUESTIONS??

TEDx Talk: http://tedxtucson.com/portfolio/sudha-ram/

www.insiteua.org

Email: ram@eller.arizona.edu

Twitter: @sudharam