SLIDE 1

New Computing In 2019 and Beyond - Opportunities, Challenges, and Threats

Fromm Institute Fall 2019 - Lecture 3 Bebo White - bebo.white@gmail.com

SLIDE 2

calendar

SLIDE 3

how big is a billion?

  • how do we describe it?
  • 10^9 = 1,000,000,000 (to a scientist)?
  • is it really a big number?
  • how do we imagine/visualize it in order to make it real?
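One common way to make 10^9 concrete is to convert it into time. A quick back-of-the-envelope sketch (assuming 365.25-day years):

```python
# How long is a billion seconds? A quick way to visualize 10^9.
billion = 10**9
seconds_per_year = 365.25 * 24 * 3600   # ~31.6 million seconds per year
years = billion / seconds_per_year
print(f"{billion:,} seconds is about {years:.1f} years")  # about 31.7 years
```

A billion seconds is roughly a third of a human lifetime, which makes the number feel rather larger than it looks on a slide.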

SLIDE 4

SLIDE 5

what can be said about data? (1/2)

  • a cosmic view(?)
  • a fundamental component of the universe - the quantum no-hiding theorem
  • nothing disappears from the Internet
  • perhaps our most important asset
  • the new oil, a new currency
  • is a billion pieces of data a lot? do you have/own a billion pieces of data? how would you count the data you own? how do you manage/use the data you own?

SLIDE 6

what can be said about data? (2/2)

  • we
    • generate it
    • collect it
    • depend on it
    • share it
    • analyze it
    • plan with it
    • protect it
    • (maybe) sell it
    • etc., etc.

SLIDE 7

a datum

SLIDE 8

two data

SLIDE 9

relationships between data

SLIDE 10

more data means more complexity

SLIDE 11

patterns emerge


Patterns yield information and insight

SLIDE 12

slac depends on data patterns


Linac Coherent Light Source (LCLS)

SLIDE 13

SLIDE 14
  • Albert Einstein in Out of My Later Years

“When the number of factors coming into play in a phenomenological complex is too large, [the] scientific method in most cases fails.”

SLIDE 15

data extremes at lcls

  • one LCLS experiment generates (on average) 2.5 million images per day
  • the LCLS data team manages 10 petabytes of data - 3 times more than the total data library for Netflix
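The slide's average figure of 2.5 million images per day can be restated as a per-second rate, which gives a better feel for the sustained load on the data team (a sketch using only the slide's own number):

```python
# Restating LCLS's average image volume as a per-second rate.
images_per_day = 2.5e6
seconds_per_day = 24 * 3600                 # 86,400 seconds in a day
rate = images_per_day / seconds_per_day
print(f"~{rate:.0f} images/second, sustained around the clock")  # ~29
```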

SLIDE 16

SLIDE 17

what’s a petabyte (PB)?

  • 10^15 bytes = 1 quadrillion bytes
  • it is estimated that the human brain has the storage capacity of 2.5 PB
  • 223,101 DVDs
  • is that a lot of data? (Big Data?)
  • how can it be managed?
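The DVD count on the slide depends on the capacity assumed per disc; a sketch using the common 4.7 GB single-layer figure gives numbers of the same order:

```python
# A petabyte in everyday units. The exact DVD count depends on the
# assumed capacity per disc; 4.7 GB (single-layer) is used here.
PB = 10**15
dvd = 4.7e9
print(f"1 PB ≈ {PB / dvd:,.0f} single-layer DVDs")                 # ≈ 212,766
print(f"2.5 PB (the brain estimate) ≈ {2.5 * PB / dvd:,.0f} DVDs") # ≈ 531,915
```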

SLIDE 18

the data deluge…

  • from the beginning of recorded time until 2003, mankind generated 5 exabytes of data
  • by 2011 we generated that much every two days; by 2013, every 10 minutes
  • such numbers become almost meaningless

SLIDE 19

SLIDE 20

where is this data coming from? (1/2)

  • EVERYWHERE!
  • any communication over a network involves transfer of data that is meaningful to someone or something
  • every e-mail, every tweet, every transaction, every social media interaction, etc., etc.
  • sensors - IoT

SLIDE 21

where is this data coming from? (2/2)

SLIDE 22

consider the new forms of data

  • that maybe did not exist 20+ years ago
  • Internet data, derived from social media and other online interactions (including data gathered by connected people and devices)
  • tracking data, monitoring the movement of people and objects
  • satellite and aerial imagery
  • etc., etc.
  • much of the value of ‘new forms of data’ lies in the potential for it to be analyzed in near real-time

SLIDE 23

and this doesn’t include science, business, etc. etc.

SLIDE 24

how is this data being used (consumed)?

  • the “poster children”/“large data generators” for datasets are:
  • personal/consumer use
  • scientific use
  • finance/business use
  • government use
  • etc., etc.
  • now, we are the experiments creating these datasets
  • Facebook knows what food and music we like and how we are likely to vote
  • advertisers use cookies and intelligent algorithms to create personalization
  • Amazon even claims to know what we want to (or will) buy next

SLIDE 25

characteristics of this data eco-system - the 4 v’s (1/2)

  • volume
  • size of datasets or aggregated datasets
  • velocity
  • data rate, pipeline, bandwidth

SLIDE 26

characteristics of this data eco-system - the 4 v’s (2/2)

  • variety
  • any type of data, both structured and unstructured (?) or meaningful and meaningless (?)
  • veracity
  • trust, source/provenance
  • e.g., in Facebook what does “like” really mean? are emojis interpretable data?

SLIDE 27

“big data” - a possible definition - just volume?

  • refers to datasets whose size is beyond the ability of
  • single storage devices
  • typical database software tools to capture, store, manage, and analyze (McKinsey Global Institute)
  • this definition is not based upon data size (which will increase)
  • it can vary by sector/usage
  • usually unstructured
  • this is not a new issue

SLIDE 28

beyond capability

  • 1956
  • 5 MB storage
  • LCLS would require over 1 trillion of these per month
  • 1960s
  • 10 MB storage

SLIDE 29


= 200,000 x

SLIDE 30

SLIDE 31

SLIDE 32

data storage is not really a problem

  • E. coli has a storage density of ~1.125 exabytes/cm^3
  • at that density, all the world’s current storage needs for a year could fit in a 1 m^3 cube of DNA
  • DNA can be sequenced (read), synthesized (written to), and accurately copied
  • DNA is stable; genome sequencing of DNA 500,000 years old
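Scaling the slide's density figure up to that cubic-metre cube is simple arithmetic (a back-of-the-envelope sketch using only the slide's number):

```python
# Scaling the slide's E. coli DNA density figure to a 1 m^3 cube.
EB = 10**18                          # 1 exabyte, in bytes
bytes_per_cm3 = 1.125 * EB           # the slide's density figure
cm3_per_m3 = 100**3                  # 1,000,000 cm^3 in a cubic metre
total = bytes_per_cm3 * cm3_per_m3   # bytes in the full cube
print(f"~{total / 10**21:,.0f} zettabytes per cubic metre")  # ~1,125 ZB
```

Around a thousand zettabytes in a box you could walk around - which is why the slide argues storage itself is not the bottleneck.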

SLIDE 33

what is data science?

  • the addition of meaning to multivariate arrays of data
  • creative visualization of complex datasets
  • the collection of insights from dataset analytics (knowledge?)
  • the ability to substantiate decisions based on datasets

SLIDE 34

a popular introduction to data science

  • 2003
  • detailed a strategy used by the Oakland A’s to use data to make pragmatic decisions that went against the traditional wisdom of baseball teams
  • the A’s were able to outcompete their rivals on a shoestring budget
  • what happens when you mix lots of data and smart people

SLIDE 35

data science components

  • domain/subject matter experts
  • data engineering/information architecture
  • statistics
  • visualization
  • advanced computing

SLIDE 36

SLIDE 37

SLIDE 38
  • one of the fun parts of data science is visualization

SLIDE 39

SLIDE 40

SLIDE 41

SLIDE 42

SLIDE 43

SLIDE 44

visualization in >3 dimensions is a challenge

  • our brains are “wired” for a 3D world
  • multivariate (>3 variables) data is typically more rich, informative, and interesting
  • historical efforts
  • can new technologies help?

SLIDE 45


Minard mixed data science, statistics, and art

SLIDE 46

SLIDE 47

visualization is fun

  • it can show relationships
  • it really isn’t analysis
  • does it support decision-making?
  • does it support prediction?

SLIDE 48

data science and data analytics are often used interchangeably

  • data science isn’t concerned with answering specific queries, instead parsing through massive datasets in sometimes unstructured ways to expose insights
  • data analytics works better when it is focused, having questions in mind that need answers based on existing data
  • data science produces broader insights that concentrate on which questions should be asked
  • data analytics emphasizes discovering answers to questions being asked

SLIDE 49

crossover - data science/data analytics and ai - “sentiment analysis”

  • goal - gauging mood on social network data
  • huge data streams coming in very fast
  • social sites operate 24/7
  • timeliness - not subject to time lags
  • too much and too subjective for human analysis
  • useful to marketers, IT, customers, law enforcement/security agencies, political influencers, etc.

SLIDE 50

remember volume and velocity?

SLIDE 51

difficult comment analysis (1/2)

  • false negatives - “crying” and “crap” (negative) vs. “crying with joy” and “holy crap!” (positive)
  • relative sentiment - “I bought a Honda Accord” - great for Honda, bad for Toyota
  • compound sentiment - “I love the phone but hate the network”
  • conditional sentiment - “If someone doesn’t call me back, I’m never doing business with them again!”
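A toy lexicon-based scorer makes it concrete why these cases are hard. The word lists and scoring rule below are purely illustrative, not a real sentiment library:

```python
# A deliberately naive keyword scorer: +1 per positive word, -1 per
# negative word. It misreads exactly the cases listed above.
POSITIVE = {"joy", "love", "great"}
NEGATIVE = {"crying", "crap", "hate"}

def naive_score(text: str) -> int:
    words = text.lower().replace("!", "").split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(naive_score("crying with joy"))                        # 0 - positive idiom cancelled out
print(naive_score("I love the phone but hate the network"))  # 0 - compound sentiment lost
```

Both examples score a flat 0 because word-level counting ignores idiom and clause structure, which is the slide's point.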

SLIDE 52

difficult comment analysis (2/2)

  • scoring sentiment - “I like it” vs. “I really like it” vs. “I love it”
  • sentiment modifiers - “I bought an iPhone today :-)” vs. “Gotta love the telephone company ;-<“
  • international, cultural, and other context-specific sentiments

SLIDE 53

SLIDE 54

SLIDE 55

remember the course goals?

  • in particular, to help you to:
  • appreciate why some of these new computing technologies are unique, revolutionary, and disruptive
  • have the vocabulary and understanding to evaluate stories that you read/hear
  • participate knowingly with friends, relatives, colleagues in discussions on these topics

SLIDE 56

SLIDE 57

SLIDE 58

SLIDE 59

SLIDE 60

analyzing significant correlations between social media measures and sales

SLIDE 61

watson claims to be able to do this

SLIDE 62

sentiment analysis can work in the opposite direction - a threat?

  • results of analysis can feed into social media
  • IoT + AI become participants in social networks in almost realtime
  • how would these actions influence privacy, security, veracity of data?

SLIDE 63

SLIDE 64

comparisons between data science and ai (1/2)

  • meaning
  • DS is about curating large datasets for analytics and visualization
  • AI is about implementing this data in a machine
  • skills
  • DS is about statistical technique design and development
  • AI is about algorithm technique design and development

SLIDE 65

comparisons between data science and ai (2/2)

  • technique
  • DS uses analytics techniques
  • AI uses ML
  • observation
  • DS identifies patterns in data for decision-making
  • AI looks for intelligence in data for decision-making

SLIDE 66

SLIDE 67

SLIDE 68

remember mechanical turk?

  • “A hybrid machine/human computing arrangement which advantageously involves humans to assist a computer to solve particular tasks, allowing the computer to solve the tasks more efficiently.” (from Amazon patent application)
  • crowdsources questions and research (data) from people
  • good way for Amazon to build database/learning models? maybe for systems like Alexa?

SLIDE 69

ai and data filtering

  • refine datasets into the basics that a user needs, without including data that is
  • repetitive
  • irrelevant
  • out of date
  • sensitive
  • etc., etc.

SLIDE 70

consider the large hadron collider (lhc)

  • 25 GB/second (raw data from detectors)
  • how much of this is practical to save/analyze?
  • noise/background?
  • known/well understood events?
  • rare event(s) that will win a Nobel Prize?
  • filtered down to ~1 GB/second = 96% reduction
  • can AI do this?
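The 96% figure follows directly from the two rates on the slide (a sketch of the arithmetic, using only the slide's numbers):

```python
# The slide's LHC filtering numbers as arithmetic.
raw = 25.0    # GB/second off the detectors
kept = 1.0    # GB/second actually saved for analysis
reduction = (1 - kept / raw) * 100
print(f"{reduction:.0f}% of the raw stream is discarded")  # 96%
```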

SLIDE 71

SLIDE 72
