SLIDE 1

New Computing In 2019 and Beyond - Opportunities, Challenges, and Threats

Fromm Institute Fall 2019 - Lecture 3 Bebo White - bebo.white@gmail.com

SLIDE 2

calendar

SLIDE 3

how big is a billion?

  • how do we describe it?
  • 10^9 = 1,000,000,000 (to a scientist)?
  • is it really a big number?
  • how do we imagine/visualize it in order to make it real?
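One common way to make 10^9 concrete is to convert it into time. A quick back-of-the-envelope sketch (assuming 365.25-day years):

```python
# How long is a billion seconds? A quick way to visualize 10^9.
billion = 10**9
seconds_per_year = 365.25 * 24 * 3600   # ~31.6 million seconds per year
years = billion / seconds_per_year
print(f"{billion:,} seconds is about {years:.1f} years")  # about 31.7 years
```

A billion seconds is roughly a third of a human lifetime, which makes the number feel rather larger than it looks on a slide.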

SLIDE 4

SLIDE 5

what can be said about data? (1/2)

  • a cosmic view(?)
  • a fundamental component of the universe - the quantum no-hiding theorem
  • nothing disappears from the Internet
  • perhaps our most important asset
  • the new oil, a new currency
  • is a billion pieces of data a lot? do you have/own a billion pieces of data? how would you count the data you own? how do you manage/use the data you own?

SLIDE 6

what can be said about data? (2/2)

  • we
    • generate it
    • collect it
    • depend on it
    • share it
    • analyze it
    • plan with it
    • protect it
    • (maybe) sell it
    • etc., etc.

SLIDE 7

a datum

SLIDE 8

two data

SLIDE 9

relationships between data

SLIDE 10

more data means more complexity

SLIDE 11

patterns emerge


Patterns yield information and insight

SLIDE 12

slac depends on data patterns


Linac Coherent Light Source (LCLS)

SLIDE 13

SLIDE 14
  • Albert Einstein in Out of My Later Years

“When the number of factors coming into play in a phenomenological complex is too large, [the] scientific method in most cases fails.”

SLIDE 15

data extremes at lcls

  • one LCLS experiment generates (on average) 2.5 million images per day
  • the LCLS data team manages 10 petabytes of data - 3 times more than the total data library for Netflix
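The slide's average figure of 2.5 million images per day can be restated as a per-second rate, which gives a better feel for the sustained load on the data team (a sketch using only the slide's own number):

```python
# Restating LCLS's average image volume as a per-second rate.
images_per_day = 2.5e6
seconds_per_day = 24 * 3600                 # 86,400 seconds in a day
rate = images_per_day / seconds_per_day
print(f"~{rate:.0f} images/second, sustained around the clock")  # ~29
```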

SLIDE 16

SLIDE 17

what’s a petabyte (PB)?

  • 10^15 bytes = 1 quadrillion bytes
  • it is estimated that the human brain has the storage capacity of 2.5 PB
  • 223,101 DVDs
  • is that a lot of data? (Big Data?)
  • how can it be managed?
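The DVD count on the slide depends on the capacity assumed per disc; a sketch using the common 4.7 GB single-layer figure gives numbers of the same order:

```python
# A petabyte in everyday units. The exact DVD count depends on the
# assumed capacity per disc; 4.7 GB (single-layer) is used here.
PB = 10**15
dvd = 4.7e9
print(f"1 PB ≈ {PB / dvd:,.0f} single-layer DVDs")                 # ≈ 212,766
print(f"2.5 PB (the brain estimate) ≈ {2.5 * PB / dvd:,.0f} DVDs") # ≈ 531,915
```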

SLIDE 18

the data deluge…

  • from the beginning of recorded time until 2003, mankind generated 5 exabytes of data
  • by 2011 we generated that much every two days; by 2013, every 10 minutes
  • such numbers become almost meaningless

SLIDE 19

SLIDE 20

where is this data coming from? (1/2)

  • EVERYWHERE!
  • any communication over a network involves transfer of data that is meaningful to someone or something
  • every e-mail, every tweet, every transaction, every social media interaction, etc., etc.
  • sensors - IoT

SLIDE 21

where is this data coming from? (2/2)

SLIDE 22

consider the new forms of data

  • that maybe did not exist 20+ years ago
  • Internet data, derived from social media and other online interactions (including data gathered by connected people and devices)
  • tracking data, monitoring the movement of people and objects
  • satellite and aerial imagery
  • etc., etc.
  • much of the value of ‘new forms of data’ lies in the potential for it to be analyzed in near real-time

SLIDE 23

and this doesn’t include science, business, etc. etc.

SLIDE 24

how is this data being used (consumed)?

  • the “poster children”/“large data generators” for datasets are:
  • personal/consumer use
  • scientific use
  • finance/business use
  • government use
  • etc., etc.
  • now, we are the experiments creating these datasets
  • Facebook knows what food and music we like and how we are likely to vote
  • advertisers use cookies and intelligent algorithms to create personalization
  • Amazon even claims to know what we want to (or will) buy next

SLIDE 25

characteristics of this data eco-system - the 4 v’s (1/2)

  • volume
  • size of datasets or aggregated datasets
  • velocity
  • data rate, pipeline, bandwidth

SLIDE 26

characteristics of this data eco-system - the 4 v’s (2/2)

  • variety
  • any type of data, both structured and unstructured (?) or meaningful and meaningless (?)
  • veracity
  • trust, source/provenance
  • e.g., in Facebook what does “like” really mean? are emojis interpretable data?

SLIDE 27

“big data” - a possible definition - just volume?

  • refers to datasets whose size is beyond the ability of
  • single storage devices
  • typical database software tools to capture, store, manage, and analyze (McKinsey Global Institute)
  • this definition is not based upon data size (which will increase)
  • it can vary by sector/usage
  • usually unstructured
  • this is not a new issue

SLIDE 28

beyond capability

  • 1956
  • 5 MB storage
  • LCLS would require over 1 trillion of these per month
  • 1960s
  • 10 MB storage

SLIDE 29


= 200,000 x

SLIDE 30

SLIDE 31

SLIDE 32

data storage is not really a problem

  • E. coli has a storage density of ~1.125 exabytes/cm^3
  • at that density, all the world’s current storage needs for a year could fit in a 1 m^3 cube of DNA
  • DNA can be sequenced (read), synthesized (written to), and accurately copied
  • DNA is stable; genome sequencing of DNA 500,000 years old
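Scaling the slide's density figure up to that cubic-metre cube is simple arithmetic (a back-of-the-envelope sketch using only the slide's number):

```python
# Scaling the slide's E. coli DNA density figure to a 1 m^3 cube.
EB = 10**18                          # 1 exabyte, in bytes
bytes_per_cm3 = 1.125 * EB           # the slide's density figure
cm3_per_m3 = 100**3                  # 1,000,000 cm^3 in a cubic metre
total = bytes_per_cm3 * cm3_per_m3   # bytes in the full cube
print(f"~{total / 10**21:,.0f} zettabytes per cubic metre")  # ~1,125 ZB
```

Around a thousand zettabytes in a box you could walk around - which is why the slide argues storage itself is not the bottleneck.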

SLIDE 33

what is data science?

  • the addition of meaning to multivariate arrays of data
  • creative visualization of complex datasets
  • the collection of insights from dataset analytics (knowledge?)
  • the ability to substantiate decisions based on datasets

SLIDE 34

a popular introduction to data science

  • 2003
  • detailed a strategy used by the Oakland A’s to use data to make pragmatic decisions that went against the traditional wisdom of baseball teams
  • the A’s were able to outcompete their rivals on a shoestring budget
  • what happens when you mix lots of data and smart people

SLIDE 35

data science components

  • domain/subject matter experts
  • data engineering/information architecture
  • statistics
  • visualization
  • advanced computing

SLIDE 36

SLIDE 37

SLIDE 38
  • one of the fun parts of data science is visualization

SLIDE 39

SLIDE 40

SLIDE 41

SLIDE 42

SLIDE 43

SLIDE 44

visualization in >3 dimensions is a challenge

  • our brains are “wired” for a 3D world
  • multivariate (>3 variables) data is typically more rich, informative, and interesting
  • historical efforts
  • can new technologies help?

SLIDE 45


Minard mixed data science, statistics, and art

SLIDE 46

SLIDE 47

visualization is fun

  • it can show relationships
  • it really isn’t analysis
  • does it support decision-making?
  • does it support prediction?

SLIDE 48

data science and data analytics are often used interchangeably

  • data science isn’t concerned with answering specific queries, instead parsing through massive datasets in sometimes unstructured ways to expose insights
  • data analytics works better when it is focused, having questions in mind that need answers based on existing data
  • data science produces broader insights that concentrate on which questions should be asked
  • data analytics emphasizes discovering answers to questions being asked

SLIDE 49

crossover - data science/data analytics and ai - “sentiment analysis”

  • goal - gauging mood on social network data
  • huge data streams coming in very fast
  • social sites operate 24/7
  • timeliness - not subject to time lags
  • too much and too subjective for human analysis
  • useful to marketers, IT, customers, law enforcement/security agencies, political influencers, etc.

SLIDE 50

remember volume and velocity?

SLIDE 51

difficult comment analysis (1/2)

  • false negatives - “crying” and “crap” (negative) vs. “crying with joy” and “holy crap!” (positive)
  • relative sentiment - “I bought a Honda Accord” - great for Honda, bad for Toyota
  • compound sentiment - “I love the phone but hate the network”
  • conditional sentiment - “If someone doesn’t call me back, I’m never doing business with them again!”
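A toy lexicon-based scorer makes it concrete why these cases are hard. The word lists and scoring rule below are purely illustrative, not a real sentiment library:

```python
# A deliberately naive keyword scorer: +1 per positive word, -1 per
# negative word. It misreads exactly the cases listed above.
POSITIVE = {"joy", "love", "great"}
NEGATIVE = {"crying", "crap", "hate"}

def naive_score(text: str) -> int:
    words = text.lower().replace("!", "").split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(naive_score("crying with joy"))                        # 0 - positive idiom cancelled out
print(naive_score("I love the phone but hate the network"))  # 0 - compound sentiment lost
```

Both examples score a flat 0 because word-level counting ignores idiom and clause structure, which is the slide's point.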

SLIDE 52

difficult comment analysis (2/2)

  • scoring sentiment - “I like it” vs. “I really like it” vs. “I love it”
  • sentiment modifiers - “I bought an iPhone today :-)” vs. “Gotta love the telephone company ;-<“
  • international, cultural, and other context-specific sentiments

SLIDE 53

SLIDE 54

SLIDE 55

remember the course goals?

  • in particular, to help you to:
  • appreciate why some of these new computing technologies are unique, revolutionary, and disruptive
  • have the vocabulary and understanding to evaluate stories that you read/hear
  • participate knowingly with friends, relatives, colleagues in discussions on these topics

SLIDE 56

SLIDE 57

SLIDE 58

SLIDE 59

SLIDE 60

analyzing significant correlations between social media measures and sales

SLIDE 61

watson claims to be able to do this

SLIDE 62

sentiment analysis can work in the opposite direction - a threat?

  • results of analysis can feed into social media
  • IoT + AI become participants in social networks in almost realtime
  • how would these actions influence privacy, security, veracity of data?

SLIDE 63

SLIDE 64

comparisons between data science and ai (1/2)

  • meaning
  • DS is about curating large datasets for analytics and visualization
  • AI is about implementing this data in a machine
  • skills
  • DS is about statistical technique design and development
  • AI is about algorithm technique design and development

SLIDE 65

comparisons between data science and ai (2/2)

  • technique
  • DS uses analytics techniques
  • AI uses ML
  • observation
  • DS identifies patterns in data for decision-making
  • AI looks for intelligence in data for decision-making

SLIDE 66

SLIDE 67

SLIDE 68

remember mechanical turk?

  • “A hybrid machine/human computing arrangement which advantageously involves humans to assist a computer to solve particular tasks, allowing the computer to solve the tasks more efficiently.” (from Amazon patent application)
  • crowdsources questions and research (data) from people
  • good way for Amazon to build database/learning models? maybe for systems like Alexa?

SLIDE 69

ai and data filtering

  • refine datasets into the basics that a user needs, without including data that is
  • repetitive
  • irrelevant
  • out of date
  • sensitive
  • etc., etc.

SLIDE 70

consider the large hadron collider (lhc)

  • 25 GB/second (raw data from detectors)
  • how much of this is practical to save/analyze?
  • noise/background?
  • known/well understood events?
  • rare event(s) that will win a Nobel Prize?
  • filtered down to ~1 GB/second = 96% reduction
  • can AI do this?
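The 96% figure follows directly from the two rates on the slide (a sketch of the arithmetic, using only the slide's numbers):

```python
# The slide's LHC filtering numbers as arithmetic.
raw = 25.0    # GB/second off the detectors
kept = 1.0    # GB/second actually saved for analysis
reduction = (1 - kept / raw) * 100
print(f"{reduction:.0f}% of the raw stream is discarded")  # 96%
```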

SLIDE 71

SLIDE 72
