Wolfram|Alpha - Answering Questions with the World’s Factual Data

Slide 1 of 25

Wolfram|Alpha

  • Answering Questions with the World’s Factual Data

Joshua Martell - Feb 2nd @ 5pm, Mission City B5

Slide 2 of 25

What is Wolfram|Alpha?

  • Computational Knowledge Engine
  • the world’s systematic knowledge computable and accessible to everyone
  • to compute whatever can be computed about anything
  • Integrate the world’s facts, data, and algorithms
  • 5 years of R&D
  • launched in May 2009

2 StrataConf-WolframAlpha.nb

Slide 3 of 25

Wolfram|Alpha by example

  • when is valentines day?
  • how long was trex?
  • how many calories is in a burger?

Slide 4 of 25

Examples: Math

  • what is 22?
  • plot sinx^2y^2
  • int e^t sin(5t) dt
  • product 11n^4, n=2 to infinity

Slide 5 of 25

Examples: Data lookups

  • population of china
  • GDP of the EU
  • flight time from seattle to tokyo
  • AAPL MSFT

Slide 6 of 25

Examples: Visualization

  • earthquakes dec 2004
  • qr code: http://strataconf.com
  • caffeine

Slide 7 of 25

Examples: Formulas

  • mortgage 5 20 yr
  • RLC circuit

Slide 8 of 25

Examples: Fun

  • scrabble quixotic
  • what's the meaning of life?
  • airspeed velocity of an unladen swallow

Slide 9 of 25

Elements of the output

Described in the tour

  • input field
  • assumptions (sometimes)
  • input interpretation
  • results pod
  • other pods
  • buttons and pull-down menus
  • output extras under the pod
  • source information

Slide 10 of 25

More about the input interpretation

  • getting from the free-form input to here is our secret sauce
  • combination of heuristics, algorithms, and developer curation
  • W|A has formed an exact expression representing your input

Slide 11 of 25

More about pods

  • W|A looks for components that can report about the input
  • results pod is the “answer”
  • pods load asynchronously
  • related cross-domain information in other pods
  • entity-only inputs
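
The asynchronous pod loading described above can be pictured with a small sketch. This is not W|A's implementation: the component names (`result_pod`, `related_pod`), the delays, and the pod contents are all hypothetical, and Python's `asyncio` merely stands in for whatever the real system uses. Each component reports on the same input, and pods are collected in the order they finish.

```python
import asyncio

async def result_pod(query):
    # Hypothetical component: produces the primary "answer" pod.
    await asyncio.sleep(0.02)
    return ("Result", f"answer for {query!r}")

async def related_pod(query):
    # Hypothetical component: produces related cross-domain information.
    await asyncio.sleep(0.01)
    return ("Related", f"related facts for {query!r}")

async def load_pods(query, components):
    # Start every component at once and collect pods in completion order.
    tasks = [asyncio.create_task(component(query)) for component in components]
    pods = []
    for finished in asyncio.as_completed(tasks):
        pods.append(await finished)
    return pods

pods = asyncio.run(load_pods("caffeine", [result_pod, related_pod]))
for title, body in pods:
    print(f"{title}: {body}")
```

Because the components run concurrently, the slowest pod does not hold up the others; a page could render each pod as soon as it arrives.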

Slide 12 of 25

Finding data

  • want to have the best information available
  • quality, breadth, and technical assessment of each source
  • web searches, evaluation of source materials
  • prefer primary sources, prefer digital sources

Slide 13 of 25

Finding data

  • technical considerations: Print >> PDF >> HTML >> DB >> CSV
  • discuss with world experts to understand the data
  • public domain (PD) and US government sources have simple licensing terms
  • corporate deals are more complicated, but bring better-quality data, documentation, and assistance

Slide 14 of 25

Aggregating data

  • rare that one source has it all
  • fill in with secondary sources
  • alignment is difficult, troublesome, and error-prone
  • use common identifiers, verify data across sources
  • hand checking: does this value make sense?
  • automating updates
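
As a rough illustration of aligning sources on common identifiers and cross-checking their values, here is a minimal Python sketch. The function name `merge_sources`, the 1% relative tolerance, the identifiers, and all of the numbers are made up for the example; the talk does not describe the actual tooling.

```python
def merge_sources(primary, secondary, tolerance=0.01):
    """Merge two {identifier: value} maps, preferring the primary source.

    Returns (merged, conflicts), where conflicts lists identifiers whose
    values differ by more than `tolerance` (relative) and need hand checking.
    """
    merged = dict(secondary)
    merged.update(primary)  # primary values win where both sources have one
    conflicts = []
    for key in primary.keys() & secondary.keys():
        a, b = primary[key], secondary[key]
        scale = max(abs(a), abs(b))
        if scale and abs(a - b) / scale > tolerance:
            conflicts.append(key)
    return merged, sorted(conflicts)

# Made-up values keyed by a shared (hypothetical) identifier.
primary = {"AA": 100.0, "BB": 250.0}
secondary = {"BB": 300.0, "CC": 75.0, "AA": 100.5}

merged, conflicts = merge_sources(primary, secondary)
print(merged)     # secondary fills the gap for "CC"
print(conflicts)  # identifiers to check by hand
```

Here the secondary source fills in the identifier the primary source lacks, and only the pair of values that disagree beyond the tolerance is flagged for a human to review.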

Slide 15 of 25

Data cleaning and curation

  • we want the best data, and we’re willing to work for it
  • automate as much as possible, do the rest by hand
  • use Mathematica to find outliers and oddballs, explore data, verify quality
  • takes time and attention to detail
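
The talk credits Mathematica for finding outliers; as a stand-in, the sketch below shows one common approach in Python, the modified z-score based on the median absolute deviation. The function name `find_outliers`, the threshold of 3.5, and the sample values are assumptions chosen for illustration, not the method the team actually used.

```python
from statistics import median

def find_outliers(values, threshold=3.5):
    """Return values whose modified z-score exceeds `threshold`."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        return []  # no spread to measure against
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

# Made-up measurements; 98.0 mimics a misplaced decimal point.
values = [10.1, 9.8, 10.3, 10.0, 9.9, 98.0]
outliers = find_outliers(values)
print(outliers)
```

A median-based test is used here because a single extreme value inflates the mean and standard deviation enough to hide itself; flagged values would still go to a person for the final call.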

Slide 16 of 25

Making it computable

  • teach W|A about the domain and its relationships to existing domains
  • natural language parsing for entities and properties
  • data becomes a building block for the inspiration of users
  • GDP Greece population of Italy
  • questions thus far?

Slide 17 of 25

Data storage and retrieval

  • read-heavy system
  • writes from feeds and developers
  • some results involve only computation, but usually some data is used
  • elaborate tracking of data changes

Slide 18 of 25

Versioning and deployment

  • significant development effort
  • different tools for programmatic changes versus hand curation
  • versioning is closely tied to deployment; deploy only new or updated values
  • weekly deployment of a new revision of W|A with data updates and code changes
  • content distribution system updates colocations
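
The idea of deploying only new or updated values can be sketched as a diff between the deployed snapshot and a candidate revision. The function name `compute_delta` and the sample data are hypothetical, and the sketch deliberately ignores deletions, which a real system would also have to track.

```python
def compute_delta(deployed, candidate):
    """Return only the entries that are new or changed in `candidate`."""
    return {
        key: value
        for key, value in candidate.items()
        if key not in deployed or deployed[key] != value
    }

# Made-up snapshots: "b" was updated and "d" is new.
deployed = {"a": 1, "b": 2, "c": 3}
candidate = {"a": 1, "b": 20, "c": 3, "d": 4}

delta = compute_delta(deployed, candidate)
print(delta)
```

Shipping only the delta keeps each weekly update small relative to the full data set, which is one plausible reason to tie versioning so closely to deployment.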

Slide 19 of 25

Purely computational data

  • formulas, encodings, etc.
  • internal APIs are very flexible
  • data comes from a wide variety of computational and unconventional sources
  • many, many built-in algorithms in Mathematica

Slide 20 of 25

Computation and visualization

  • Mathematica as a development platform
  • our (not so) secret weapon
  • functional, very high level, symbolic programming language
  • built-in everything
  • statistics, numerics, advanced plots, charts, file formats
  • large collection of algorithms
  • database / Java / .NET integration, C interface
  • used by:
      • all 15 major US Federal government departments
      • all Fortune 50 companies
      • all 50 largest universities worldwide

Slide 21 of 25

Web components

  • webMathematica = Mathematica powered web pages
  • Mathematica integrates into the servlet engine
  • majority of code is Mathematica

Slide 22 of 25

Other Technologies: API

  • REST API, various language bindings
  • returns XML-encoded HTML, plaintext, images, etc.
  • used for the W|A iPhone / iPad / Android apps
  • used by Bing to display W|A results in search results
  • free to try out
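
A minimal sketch of calling a REST API of this kind from Python follows. The endpoint URL, the `input` and `appid` parameter names, and the `pod` / `subpod` / `plaintext` element names are assumptions that should be checked against the current API documentation. The XML sample is hand-written to show the general shape, reusing the "2GB over a T1" query from this slide; it was not captured from the service, and no network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed endpoint and parameter names; verify against the API docs.
BASE_URL = "http://api.wolframalpha.com/v2/query"

def build_query_url(query, app_id):
    """Build a GET URL for a free-form query."""
    return BASE_URL + "?" + urlencode({"input": query, "appid": app_id})

def extract_plaintext(xml_text):
    """Map each pod title to the list of plaintext strings found in it."""
    root = ET.fromstring(xml_text)
    pods = {}
    for pod in root.iter("pod"):
        texts = [node.text for node in pod.iter("plaintext") if node.text]
        pods[pod.get("title", "")] = texts
    return pods

# Hand-written sample shaped like an XML response; not real API output.
SAMPLE = """<queryresult success="true">
  <pod title="Input interpretation">
    <subpod><plaintext>2 GB over a T1 line</plaintext></subpod>
  </pod>
  <pod title="Result">
    <subpod><plaintext>10363 seconds</plaintext></subpod>
  </pod>
</queryresult>"""

url = build_query_url("2GB over a T1", "YOUR-APP-ID")
pods = extract_plaintext(SAMPLE)
print(url)
print(pods["Result"])
```

Keeping URL construction separate from response parsing makes the parsing step easy to exercise against saved responses without an application ID.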

2GB over a T1 » Result: 10363 seconds
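
The figure above can be checked by hand. The sketch assumes a T1 line rate of 1.544 Mbit/s and decimal gigabytes (10^9 bytes), and ignores protocol overhead; the function and constant names are only for illustration.

```python
T1_BITS_PER_SECOND = 1_544_000  # T1 line rate, 1.544 Mbit/s
BYTES_PER_GB = 10**9            # assumes decimal gigabytes

def transfer_seconds(gigabytes, bits_per_second=T1_BITS_PER_SECOND):
    """Ideal transfer time in seconds, ignoring protocol overhead."""
    return gigabytes * BYTES_PER_GB * 8 / bits_per_second

seconds = transfer_seconds(2)
print(round(seconds))  # 10363
```

That is 16,000,000,000 bits divided by 1,544,000 bits per second, or a little under three hours.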

Slide 23 of 25

Other Technologies: Wolfram|Alpha appliance

  • W|A in your data center
  • consulting to integrate corporate data into W|A system

Slide 24 of 25

Data Summit 2011

  • Wolfram Data Summit - Washington DC, Sep 7, 8, 9
  • not W|A specific
  • many corporate, non-profit, government attendees
  • meeting place to discuss common issues and solutions

Slide 25 of 25
