The bottom line: We are the data science people but the world needs to know about it



SLIDE 1

The bottom line

We are the data science people but the world needs to know about it

SLIDE 2

Wrangling vs Analytics

[Diagram labels: wrangling, analytics]

  • Wrangling: data processing that allows meaningful analysis to begin (extraction, integration, cleaning, querying, etc.; basically the SIGMOD/PODS CFP)
  • Requires more effort (usually 50-80%)

SLIDE 3

This is what we do

  • But the world sees the end result
  • The 80-20 rule: 20% of effort gets 80% of PR
  • But we need to be better at it
  • Some ammunition...
SLIDE 4

Data analysts’ favorite tools

[Bar chart: share of respondents using each tool (axis from 0% to 70%). Tools are grouped as languages, data platforms, and analytics. Per-tool values are not recoverable from the extracted text.]

Tools shown, in the order extracted from the chart: Teradata; SPSS; Perl; Amazon Elastic MapReduce (EMR); Hbase; Weka; Amazon RedShift; Pig; C; SQLite; Scala; PowerPivot; C++; SAS; Apache Hadoop; MongoDB; Visual Basic/VBA; Cloudera; Spark; Hive; Homegrown analysis tools; D3; Oracle; PostgreSQL; Java; Matplotlib (Python); JavaScript; Tableau; Microsoft SQL Server; ggplot; Python: numpy, scipy, scikit-learn; MySQL; R; Python; Excel; SQL

SLIDE 5

Future data analysts’ favorite tools

SLIDE 6

The world needs to know

  • ... but it’s much more fun doing research than talking to the “real world”
  • Still, we are not a small community, and we have people with different skills
  • One example: we convinced our funders (EPSRC) that data management is an essential part of “big data”
  • The more people get the message, the healthier our field is