The Emerging Role of Data Scientists on Software Development Teams




SLIDE 1

The Emerging Role of Data Scientists on Software Development Teams

Shruthi Nagaraj

Carleton University

SLIDE 2

Who is a Data Scientist?

“The people who do collection and analysis are called data scientists!!”

- DJ Patil and Jeff Hammerbacher
SLIDE 3
SLIDE 4
SLIDE 5

Methodology

  • Interviews with 16 participants (P1 to P16)

– 5 women and 11 men from eight different organizations at Microsoft

  • Snowball sampling

– data-driven engineering meet-ups and technical community meetings
– word of mouth

  • Clustering of participants
SLIDE 6

DATA SCIENTISTS IN SOFTWARE DEVELOPMENT TEAMS

  • Data science is not a new field, but the prevalence of interest in it has grown rapidly.

  • Observed an evolution of data science at Microsoft, both in terms of technology and people

SLIDE 7

Why are Data Scientists Needed in Software Development Teams?

  • Demand for Experimentation

– need for designing experiments with real user data

  • Demand for Statistical Rigor

– conduct formal hypothesis testing, report confidence intervals, and determine baselines through normalization

  • Demand for Data Collection Rigor

– data scientists discuss how much data quality matters and how many data cleaning issues they have to manage
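
To make "statistical rigor" concrete, here is a minimal sketch of reporting an experiment result with a confidence interval and a hypothesis test. It is not taken from the presentation: the function name `diff_in_means_ci`, the metric (page-load time), and all numbers are made up for illustration, and it uses a normal approximation rather than any specific method from the study.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def diff_in_means_ci(control, treatment, confidence=0.95):
    """Return (difference, low, high, p_value) for treatment minus control.

    Uses a normal approximation, which is only reasonable for large
    samples; small samples would call for a t-distribution instead.
    """
    diff = mean(treatment) - mean(control)
    # Standard error of the difference, allowing unequal variances.
    se = sqrt(variance(control) / len(control)
              + variance(treatment) / len(treatment))
    # Critical value for a two-sided interval at the given confidence.
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Two-sided p-value for the null hypothesis of no difference.
    p_value = 2 * (1 - NormalDist().cdf(abs(diff / se)))
    return diff, diff - z_crit * se, diff + z_crit * se, p_value


# Hypothetical page-load times in milliseconds (made-up numbers,
# kept tiny for readability; a real experiment would use far more data).
control = [120, 135, 128, 140, 132, 125, 138, 130]
treatment = [118, 122, 115, 127, 121, 119, 124, 120]

diff, low, high, p = diff_in_means_ci(control, treatment)
print(f"difference = {diff:.1f} ms, 95% CI = [{low:.1f}, {high:.1f}], p = {p:.4f}")
```

Reporting the interval alongside the point estimate shows how uncertain the measured effect is. That is the kind of rigor the interviewees asked for, as opposed to comparing two averages by eye.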

SLIDE 8

Background of Data Scientists

  • Most CS, many interdisciplinary backgrounds
  • Many have higher education degrees
  • Strong passion for data
  • PhD training contributes to working style
SLIDE 9

Activities of Data Scientists

  • Collection

– Data engineering platform, Experimentation platform

  • Analysis

– Data merging and cleaning, Data shaping including selecting and creating features

  • Use and Dissemination

– Defining actions and triggers, Translating insights and models to business values

SLIDE 10

Problems that Data Scientists Work On

  • Performance Regression
  • Requirements Identification
  • Fault Localization and Root Cause Analysis
  • Bug Prioritization
  • Customer Understanding
  • etc.
SLIDE 11

Organization of Data Science Teams

  • The “Triangle” model
  • The “Hub and Spoke” model
  • The “Consulting” model
  • The “Individual Contributor” model
  • The “Virtual Team” model
SLIDE 12

Working Styles of Data Scientists

  • Insight Provider
  • Modelling Specialist
  • Platform Builder
  • Polymath
  • Team Leader

SLIDE 13
SLIDE 14

Insight Providers

  • Play an interstitial role between managers and engineers within a product group

  • Generate insights to support and guide their managers in decision making

  • Analyze product and customer data collected by the teams’ engineers

  • Strong background in statistics
  • Communication and coordination skills are key
SLIDE 15
SLIDE 16

Modelling Specialists

  • Act as expert consultants
  • Build predictive models that can be instantiated as new software features and support other teams’ data-driven decision making

  • Strong background in machine learning
  • Other forms of expertise, such as survey design or statistics, would fit as well
SLIDE 17

Modelling Specialists

  • Modelling Specialists sometimes partner with Insight Providers to define ground truths to assess the quality of their predictive models

  • They believe that building new software features based on the predictive models is extremely important for demonstrating the value of their work

SLIDE 18

Platform Builders

SLIDE 19

Platform Builders

  • Build data engineering platforms that are reusable in many contexts

  • Strong background in big data systems
  • Make trade-offs between engineering and scientific concerns

SLIDE 20

Platform Builders

  • They think data collection software must be reliable, performant, low-impact, and widely deployable.

  • On the other hand, the software should provide data that are sufficiently precise, accurate, well-sampled, and meaningful enough to support statistical analysis.

  • Their expertise in both software engineering and data analysis enables them to make trade-offs between these concerns.

SLIDE 21

Polymaths

SLIDE 22

Polymaths

  • Data scientists who “do it all”:

– Forming a business goal
– Instrumenting a system to collect data
– Doing necessary analyses or experiments
– Communicating the results to managers

SLIDE 23

Team Leaders

SLIDE 24

Team Leaders

  • Senior data scientists who typically run their own data science teams
  • Act as data science “evangelists”, pushing for the adoption of data-driven decision making

  • Work with senior company leaders to inform broad business decisions

SLIDE 25

IMPLICATIONS

  • Research

– for researchers this new team composition changes the context in which problems are pursued.

  • Practice

– how to improve the impact and actionability of data science work from the strategies shared by other data scientists.

  • Education

– combine a deep understanding of software engineering problems,

SLIDE 26

Conclusion

  • Demand for designing experiments with real user data and reporting results with statistical rigor.

  • Shared activities, several success stories, and five distinct styles of data scientists.

  • Reported strategies that data scientists use to ensure that their results are relevant to the company

SLIDE 27

Discussions

  • Why are data scientists needed in software development teams?

  • What kinds of problems and activities do data scientists need to work on in software development teams?

  • Should big companies start using this idea?
SLIDE 28

Thank you