What is Data Science? Business efficiency: Wal-Mart - - PowerPoint PPT Presentation



slide-1
SLIDE 1

What is Data Science?

slide-2
SLIDE 2

Business efficiency: Wal-Mart

http://www.nytimes.com/2004/11/14/business/yourmoney/14wal.html

slide-3
SLIDE 3

Business Marketing: Target

http://tinyurl.com/7jbntx3

slide-4
SLIDE 4
Recommendations:

  • In October 2006 Netflix held a competition for the best algorithm to predict user ratings of movies.
  • The winner had to improve on Netflix's own algorithm (Cinematch) by at least 10%.
  • The award was given in September 2009.
  • Based on collaborative filtering
  • Movies that were difficult to predict: “Napoleon Dynamite”, “Lost in Translation”, “Fahrenheit 9/11”, “Kill Bill: Volume 1”

http://www2.research.att.com/~volinsky/netflix/bpc.html
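
To make the idea of collaborative filtering concrete, here is a minimal sketch of user-based filtering: a missing rating is predicted as a similarity-weighted average of the ratings given by other users. The users and scores are invented for illustration (they are not Netflix data), and this is only the basic idea, not Cinematch or the winning entry.

```python
from math import sqrt

# Toy ratings on a 1-5 scale (hypothetical users and scores, not Netflix data).
ratings = {
    "ann":  {"Kill Bill: Volume 1": 5, "Lost in Translation": 2, "Napoleon Dynamite": 4},
    "bob":  {"Kill Bill: Volume 1": 4, "Lost in Translation": 1, "Napoleon Dynamite": 5},
    "cara": {"Kill Bill: Volume 1": 1, "Lost in Translation": 5, "Napoleon Dynamite": 2},
    "dan":  {"Kill Bill: Volume 1": 5, "Lost in Translation": 1},
}

def cosine_similarity(a, b):
    """Cosine similarity computed over the movies both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = sqrt(sum(a[m] ** 2 for m in shared))
    norm_b = sqrt(sum(b[m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)

def predict_rating(user, movie):
    """Similarity-weighted average of other users' ratings for `movie`."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or movie not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += sim * other_ratings[movie]
        weight_total += sim
    # None means nobody comparable has rated the movie.
    return weighted_sum / weight_total if weight_total else None

prediction = predict_rating("dan", "Napoleon Dynamite")
print(round(prediction, 2))
```

Because "dan" rates the two shared movies much like "ann" and "bob" do, their opinions dominate the prediction, while "cara" contributes little.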

slide-5
SLIDE 5

Sports Analytics

slide-6
SLIDE 6

Beyond Moneyball: The defensive shift

http://www.sporttechie.com/2014/11/11/sports/mlb/beyond-moneyball-how-big-data-is-changing-baseball/

slide-7
SLIDE 7

Lessons for Data Scientists:

  • Question your assumptions (be especially skeptical when predicting a rare event with limited history using human behavior).
  • Examine data quality: in this election, polls were not reaching all likely voters.
  • Beware of your own biases: many pollsters were likely Clinton supporters and did not want to question the results that favored their candidate.

slide-8
SLIDE 8
Cholera outbreak in London 1854

  • Physician John Snow links the outbreak to a contaminated well by plotting the number of cases on a map
  • Started the science of epidemiology

slide-9
SLIDE 9

The Book of Winchester (1086), a.k.a. Domesday Book

  • Commissioned in 1085 by William the Conqueror
  • Record of the Great Survey of England
  • Last used to settle a dispute in court in the 1960s!

http://www.domesdaybook.co.uk/

slide-10
SLIDE 10

Data in the 20th century

What problems were solved?

  • Engineering: design of machines
  • Sciences: formulation of theories

How were problems solved?

  • Empirically
  • Theories
  • Computation

slide-11
SLIDE 11

Data in the 21st Century

How is today different?

  • More data is available
  • More data is digital
  • More data is observed, rather than

generated by a designed experiment

slide-12
SLIDE 12

Data in the 21st Century

What problems are solved today?

  • Spell checking
  • Face recognition
  • Sentiment analysis
  • Optimal routing
  • High-frequency trading algorithms
  • just to name a few …
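
As an illustration of how one of these problems can be approached with data, here is a minimal sketch of a frequency-based spell checker: generate every string one edit away from the input and pick the candidate seen most often in a corpus. The tiny corpus is a stand-in chosen for this example; a real checker would count words in a large text collection.

```python
from collections import Counter
import string

# Tiny stand-in corpus (assumption); real systems use large text collections.
corpus = "the data is in the data set and the model fits the data well".split()
word_counts = Counter(corpus)

def edits1(word):
    """All strings one edit (delete, swap, replace, insert) away from `word`."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    swaps = [left + right[1] + right[0] + right[2:]
             for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + swaps + replaces + inserts)

def correct(word):
    """Return the most frequent known word within one edit, else the word itself."""
    if word in word_counts:
        return word
    candidates = [w for w in edits1(word) if w in word_counts]
    return max(candidates, key=word_counts.get) if candidates else word

print(correct("dat"))
print(correct("modle"))
```

No spelling rules are written by hand here; the behaviour comes entirely from which words appear in the corpus and how often.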
slide-13
SLIDE 13

Data in the 21st Century

How are problems solved today?

  • Empirically
  • Theories
  • Computation
  • Data exploration

http://research.microsoft.com/en-us/collaboration/fourthparadigm/

slide-14
SLIDE 14

For Example

Network security:

  • 20th century: based on rules and signatures
  • 21st century: data mining traffic logs

http://www.bro.org/
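
The contrast between the two centuries can be sketched in a few lines. The first function below applies a fixed signature, and the second derives what "normal" looks like from the log itself and flags sources that deviate from it. The log records, the signature, and the threshold factor are all hypothetical; this is not Bro output or its API.

```python
from collections import Counter
from statistics import median

# Hypothetical log records (source IP, requested path); not real traffic.
log = (
    [("10.0.0.1", "/index.html")] * 12
    + [("10.0.0.2", "/index.html")] * 9
    + [("10.0.0.3", "/about.html")] * 11
    + [("10.0.0.4", "/login")] * 240
    + [("10.0.0.5", "/etc/passwd")]
)

# 20th-century style: a hand-written signature flags known-bad requests.
SIGNATURES = ("/etc/passwd",)

def signature_alerts(records):
    """Return the IPs whose requests match any known signature."""
    return {ip for ip, path in records if any(sig in path for sig in SIGNATURES)}

# 21st-century style: mine the log for sources that behave unusually.
def volume_outliers(records, factor=5):
    """Return the IPs sending more than `factor` times the median request count."""
    counts = Counter(ip for ip, _ in records)
    typical = median(counts.values())
    return {ip for ip, n in counts.items() if n > factor * typical}

print("signature alerts:", signature_alerts(log))
print("volume outliers:", volume_outliers(log))
```

The signature catches only what someone anticipated, while the volume check catches the unusually busy source without any rule naming it. In practice the two approaches complement each other.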

Artificial Intelligence: VS.

slide-15
SLIDE 15

IBM Watson: The Jeopardy Challenge

Not everything is perfect!

ITS LARGEST AIRPORT IS NAMED FOR A WORLD WAR II HERO: ITS SECOND LARGEST, FOR A WORLD WAR II BATTLE. Category: U.S. Cities

slide-16
SLIDE 16

A good question

So, what is data science?

slide-17
SLIDE 17

Who are the Data Scientists?

https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/

Skills:

  • Make discoveries while swimming in data
  • Don’t allow technical limitations to bog down solutions
  • Often fashion their own tools
  • Skilled in storytelling with data

Some data-driven companies: Google, Wal-Mart, Twitter, LinkedIn, Amazon

slide-18
SLIDE 18

What data scientists do

  • Ask a question
  • Get relevant data
  • Prepare data for analysis
      • outliers, missing values, incorrect values
  • Explore data
      • understand the world as it is (was)
  • Statistical model
      • estimate/train and validate model
      • predict what will (likely) happen
  • Communicate results
      • tell a story
      • recommend
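
The steps above can be walked through on a toy problem. The sketch below uses only the Python standard library and invented numbers (ad spend against sales); the outlier rule and the choice of a straight-line model are assumptions made for illustration, not a prescribed method.

```python
from statistics import mean, median

# Hypothetical observations (ad spend, sales); None marks a missing value.
raw = [(1, 12), (2, 15), (3, None), (4, 18), (5, 19),
       (6, 23), (7, 240), (8, 26), (9, 27), (10, 31)]

# Prepare: drop missing values, then drop outliers far above the median.
present = [(x, y) for x, y in raw if y is not None]
typical = median(y for _, y in present)
clean = [(x, y) for x, y in present if y <= 3 * typical]

# Explore: summarize the world as it is.
print("rows kept:", len(clean), "mean sales:", mean(y for _, y in clean))

# Model: hold out every third row for validation, fit a line on the rest.
train = [p for i, p in enumerate(clean) if i % 3 != 1]
valid = [p for i, p in enumerate(clean) if i % 3 == 1]

def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar = mean(x for x, _ in points)
    y_bar = mean(y for _, y in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

intercept, slope = fit_line(train)
# Validate: mean absolute error on rows the model has not seen.
mae = mean(abs(y - (intercept + slope * x)) for x, y in valid)

# Predict and communicate: turn the numbers into a short story.
forecast = intercept + slope * 11
print(f"Each extra unit of spend adds about {slope:.1f} sales "
      f"(validation error about {mae:.1f}); forecast at spend 11: {forecast:.0f}")
```

The final print statement stands in for the "communicate results" step: the model matters only once its finding is stated in terms the audience can act on.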
slide-19
SLIDE 19

The Data Science Process

  • Data Extraction
  • Data Cleaning
  • Exploratory Data Analysis
  • Machine Learning, Statistical Models
  • Communicate and Report Findings
  • Build Data Product

slide-20
SLIDE 20

Data Scientist skills

  • Computer science
      • programming, hacking skills
  • Statistics
      • probability, distributions, modelling
  • Mathematics
      • linear algebra, calculus, optimization
  • Domain expertise
      • storytelling, pose question, interpret result
  • Communication
      • presentation, data visualization
slide-21
SLIDE 21

Drew Conway’s Venn diagram

  • Real world motivating questions
  • Hypothesis Testing
  • Extract insight
  • Familiarity with statistical tools

  • Understand algorithms
  • Interpret results

http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

  • Acquire and clean data
  • Text file manipulation
  • Think algorithmically
slide-22
SLIDE 22

IBM Predictive Analytics for Asset Management

https://www.youtube.com/watch?v=b9LrXxG5SjY