The analytics landscape: A personal view Charles Elkan el - - PowerPoint PPT Presentation



slide-1
SLIDE 1

The analytics landscape: A personal view

Charles Elkan el December 20, 2011

slide-2
SLIDE 2

What is analytics?

JARGON

  • Big data, business intelligence (BI), decision support (DSS), data warehousing, unstructured data, knowledge discovery in databases (KDD), information visualization, map-reduce.
  • analytics = convert data into intelligence + capture value = statistics + optimization
  • statistics = machine learning = data mining
  • optimization = microeconomics + operations research

slide-3
SLIDE 3

Outline

1. Structured data (predictive, visual)
2. Unstructured data
3. The business of analytics
4. A research and business opportunity

slide-4
SLIDE 4

A basic distinction

  • I. Structured data

Tables in databases
Nodes and links in networks

  • II. Unstructured data

Text
Videos
Tables in web pages
XML

slide-5
SLIDE 5
  • I. Structured data
  • A data warehouse is a cost center, not a profit center.
  • How can structured data be a profit center?

1. Predictive analytics
2. Visual analytics

slide-6
SLIDE 6
  • 1. Predictive analytics
  • So, what can we do with structured data?
  • Answer: Make predictions, then take actions.
  • Example:
  • But, what are the costs and benefits of alternative actions?
  • And, who pays which costs?
slide-7
SLIDE 7

Cost-sensitive learning

  • Cross-domain theory of making optimal decisions given predictions:
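
One common way to make this concrete is to pick the action with the lowest expected cost under the predicted probability. The sketch below is only an illustration under stated assumptions: the loan-style scenario, the action names, and every dollar amount in the cost table are hypothetical and do not come from the slides.

```python
# Hypothetical cost table: COSTS[action][true_outcome], in dollars.
# A negative cost is a benefit. All numbers are illustrative assumptions.
COSTS = {
    "approve": {"good": -20.0, "bad": 500.0},
    "decline": {"good": 20.0, "bad": 0.0},
}


def expected_cost(action_costs, p_bad):
    """Expected cost of one action when the outcome is 'bad' with probability p_bad."""
    return (1.0 - p_bad) * action_costs["good"] + p_bad * action_costs["bad"]


def best_action(p_bad, costs=COSTS):
    """Return the action whose expected cost is lowest for this prediction."""
    return min(costs, key=lambda action: expected_cost(costs[action], p_bad))


def threshold(costs=COSTS):
    """Probability of 'bad' at which approving and declining cost the same."""
    a, d = costs["approve"], costs["decline"]
    # Solve (1-p)*a_good + p*a_bad = (1-p)*d_good + p*d_bad for p.
    gain_if_good = d["good"] - a["good"]
    loss_if_bad = a["bad"] - d["bad"]
    return gain_if_good / (gain_if_good + loss_if_bad)


if __name__ == "__main__":
    for p in (0.05, 0.10):
        print(p, best_action(p))
    print("break-even probability:", threshold())
```

With these assumed costs the break-even probability is 40/540, about 0.074, far below 0.5: the decision threshold follows from the costs, not from the classifier alone.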
slide-8
SLIDE 8
  • 2. Visual analytics
  • So, what can we do with structured data?
  • Answer: Find and display patterns; prompt human insight.
slide-9
SLIDE 9

Patterns of human metabolism

slide-10
SLIDE 10

Information visualization

  • “state of the art analytic tools to identify biomarkers”
slide-11
SLIDE 11
  • II. Unstructured data
slide-12
SLIDE 12

A case study

slide-13
SLIDE 13

A general need: Task-oriented semantic search

LaVerne Council, CIO of Johnson & Johnson: “... allow anyone to ask a question ... folks that have given us access to their email ... data mining for answers to that question ... help us solve a very hairy issue for one of our products ... one of the associates had completed his thesis in college on that very topic ... they weren’t in the same company ... we were able to really come back with answers.”

slide-14
SLIDE 14

A grand vision

  • “Open source intelligence (OSI)”

slide-15
SLIDE 15

A less grand vision

slide-16
SLIDE 16
  • III. The business of analytics
  • Analytics applications are valuable.
slide-17
SLIDE 17

Analytics companies are valuable

slide-18
SLIDE 18

Are valuations bubble-icious?

  • HP compared to Autonomy:

Sales: $128B versus $963M
Income: $12B versus $343M
Value: $50B versus $11B

  • Forrester: “The Autonomy IP is stagnant. There hasn’t been a major release in five years.”

  • Zero recent patents for the core analytics.
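
The comparison becomes sharper when the quoted figures are turned into multiples. The short sketch below uses only the numbers on this slide; reading “Value” as a market valuation or purchase price is an assumption, and the names `COMPANIES` and `multiples` are made up for illustration.

```python
# Figures as quoted on the slide, in US dollars.
COMPANIES = {
    "HP": {"sales": 128e9, "income": 12e9, "value": 50e9},
    "Autonomy": {"sales": 963e6, "income": 343e6, "value": 11e9},
}


def multiples(figures):
    """Return value expressed as a multiple of sales and of income."""
    return {
        "value_to_sales": figures["value"] / figures["sales"],
        "value_to_income": figures["value"] / figures["income"],
    }


if __name__ == "__main__":
    for name, figures in COMPANIES.items():
        m = multiples(figures)
        print(f"{name}: {m['value_to_sales']:.1f}x sales, "
              f"{m['value_to_income']:.1f}x income")
```

On these figures HP is valued at roughly 0.4 times sales and 4 times income, while Autonomy is valued at roughly 11 times sales and 32 times income.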
slide-19
SLIDE 19
  • IV. A research and market opportunity
slide-20
SLIDE 20
  • New platform for diverse data

Cloud-based
Multiply the user base 10x:

  • Easy to use
  • Fun to use
  • Opportunity: Add “secret sauce” to open-source software

Newer artificial intelligence
Patented artificial intelligence

Disruption from below

slide-21
SLIDE 21
  • A role model for cloud-based ease of use: Box.net
  • $650M valuation, but no intelligence.
slide-22
SLIDE 22
  • Cloud-based software as a service (SaaS)
  • Easy to use, fun to use
  • Newer AI, patented AI
  • Open-source foundation:

Lucene and Solr as backend
Tika for importing unstructured data

Disruption from below

slide-23
SLIDE 23

Newer artificial intelligence

  • Sentiment analysis
  • Topic models for organizing content
  • Recursive neural nets for deep understanding

www.socher.org/index.php/Main/ParsingNaturalScenesAndNaturalLanguageWithRecursiveNeuralNetworks

slide-24
SLIDE 24

Newer AI: Fewer topics, better fit

slide-25
SLIDE 25

Patented AI: Sentiment analysis

  • ... labels designate level of quality, such as interestingness, appropriateness, timeliness, humor, style of language, obscenity, sentiment
  • ... a classifier means effective to automatically associate a quality value to items of data, wherein said quality value is indicative of the qualitative nature of said items of data

slide-26
SLIDE 26

Today in the New York Times

slide-27
SLIDE 27

SQUID

  • Sentiment analysis
  • Question answering
  • Unstructured data organization
  • Interactive insight
  • Diverse entity extraction
  • But what will be most beneficial and profitable?
  • Historical answer: Specific vertical applications.
slide-28
SLIDE 28

Profit lies in verticals, I

slide-29
SLIDE 29

Profit lies in verticals, II

slide-30
SLIDE 30

Discussion

  • Acknowledgement: Most images are due to other authors.