SLIDE 1 Data Science in the Cloud
Stefan Krawczyk
@stefkrawczyk linkedin.com/in/skrawczyk
November 2016
SLIDE 2
Who are Data Scientists?
SLIDE 3
SLIDE 4
SLIDE 5
SLIDE 6
Means: skills vary wildly
SLIDE 7
But they’re in demand and expensive
SLIDE 8 “The Sexiest Job of the 21st Century”
https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
SLIDE 9
How many Data Scientists do you have?
SLIDE 10
At Stitch Fix we have ~80
SLIDE 11
~85% have no formal CS background
SLIDE 12
But what do they do?
SLIDE 13
What is Stitch Fix?
SLIDE 14
SLIDE 15
SLIDE 16
SLIDE 17
SLIDE 18
SLIDE 19
SLIDE 20
SLIDE 21 Two Data Scientist facts:
- 1. They have AWS console access*.
- 2. They’re responsible end to end.
SLIDE 22
How do we enable this without ?
SLIDE 23
Make doing the right thing the easy thing.
SLIDE 24 Fellow Collaborators
Horizontal team focused on Data Scientist Enablement
SLIDE 25
- 1. Eng. Skills
- 2. Important
- 3. What they work on
SLIDE 26
Let’s Start
SLIDE 27 Will Only Cover
- 1. Source of truth: S3 & Hive Metastore
- 2. Docker Enabled DS @ Stitch Fix
- 3. Scaling DS doing ML in the Cloud
SLIDE 28
Source of truth: S3 & Hive Metastore
SLIDE 29 Want Everyone to Have Same View
SLIDE 30 This is Usually Nothing to Worry About
- OS handles correct access
- DB has ACID properties
SLIDE 31 This is Usually Nothing to Worry About
- OS handles correct access
- DB has ACID properties
- But it’s easy to outgrow these assumptions with big data / a big team.
SLIDE 32
- Amazon’s Simple Storage Service
- Infinite* storage
- Can write, read, delete, BUT NOT append.
- Looks like a file system*:
○ URIs: my.bucket/path/to/files/file.txt
S3
* For all intents and purposes
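The asterisk matters: S3 is a flat key → object store, and “directories” are just shared key prefixes. A toy sketch of these semantics (not the real S3 API; boto3 is the usual client):

```python
# Toy model of S3 semantics -- NOT the real API (use boto3 for that).
# The namespace is flat: a key -> bytes map where "directories" are just
# shared key prefixes, and objects can be overwritten but never appended to.
class ToyObjectStore:
    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        # write or overwrite a whole object; there is no append
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

    def delete(self, key):
        del self._objects[key]

    def list_prefix(self, prefix):
        # how "listing a directory" really works: a prefix scan over keys
        return sorted(k for k in self._objects if k.startswith(prefix))
```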
SLIDE 33
- Hadoop service that stores:
○ Schema
○ Partition information, e.g. date
○ Data location for a partition
Hive Metastore
SLIDE 34
- Hadoop service that stores:
○ Schema
○ Partition information, e.g. date
○ Data location for a partition
Hive Metastore
sold_items:
Partition | Location
20161001 | s3://bucket/sold_items/20161001
... | ...
20161031 | s3://bucket/sold_items/20161031
SLIDE 35
Hive Metastore
SLIDE 36
- Replacing data in a partition
But if we’re not careful
SLIDE 37
- Replacing data in a partition
But if we’re not careful
SLIDE 38 But if we’re not careful
SLIDE 39 But if we’re not careful
- A and B can see inconsistent data, which is hard to track down
SLIDE 40
- Use Hive Metastore to control partition source of truth
- Principles:
○ Never delete
○ Always write to a new place each time a partition changes
○ Use an inner directory → called Batch ID
Hive Metastore to the Rescue
SLIDE 41
Batch ID Pattern
SLIDE 42 Batch ID Pattern
sold_items:
Date | Location
20161001 | s3://bucket/sold_items/20161001/20161002002334/
... | ...
20161031 | s3://bucket/sold_items/20161031/20161101002256/
SLIDE 43
- Overwriting a partition is just a matter of updating the location
Batch ID Pattern
sold_items:
Date | Location
20161001 | s3://bucket/sold_items/20161001/20161002002334/
... | ...
20161031 | s3://bucket/sold_items/20161031/20161101002256/ → s3://bucket/sold_items/20161031/20161102234252
SLIDE 44
- Overwriting a partition is just a matter of updating the location
- To the user this is a hidden inner directory
Batch ID Pattern
sold_items:
Date | Location
20161001 | s3://bucket/sold_items/20161001/20161002002334/
... | ...
20161031 | s3://bucket/sold_items/20161031/20161101002256/ → s3://bucket/sold_items/20161031/20161102234252
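The pattern above can be sketched in a few lines. This is a toy illustration, not the real Stitch Fix API: a dict stands in for the Hive Metastore, and all names are illustrative. Every write lands in a fresh batch-ID directory, so “overwriting” and rollback are both just pointer updates.

```python
from datetime import datetime, timezone

partition_locations = {}  # stand-in for the Hive Metastore's partition table

def write_partition(table, date, batch_id=None):
    """Write a partition into a brand-new batch-ID directory; never delete old data."""
    if batch_id is None:
        batch_id = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
    location = f"s3://bucket/{table}/{date}/{batch_id}/"
    # ... data files would be written under `location` here ...
    previous = partition_locations.get((table, date))
    partition_locations[(table, date)] = location  # the "overwrite" is a pointer swap
    return location, previous

def rollback_partition(table, date, previous_location):
    # because nothing is ever deleted, rollback is just restoring the old pointer
    partition_locations[(table, date)] = previous_location
```

Readers always see whichever location the metastore currently points at, so writes never clobber data out from under them.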
SLIDE 45
Enforce via API
SLIDE 46
Enforce via API
SLIDE 47
Python:
store_dataframe(df, dest_db, dest_table, partitions=['2016'])
df = load_dataframe(src_db, src_table, partitions=['2016'])
R:
sf_writer(data = result, namespace = dest_db, resource = dest_table,
          partitions = c(as.integer(opt$ETL_DATE)))
sf_reader(namespace = src_db, resource = src_table,
          partitions = c(as.integer(opt$ETL_DATE)))
API for Data Scientists
SLIDE 48
○ Can rollback
■ Data Scientists are less afraid of mistakes
○ Can create audit trails more easily
■ What data changed and when
○ Can anchor downstream consumers to a particular batch ID
Batch ID Pattern Benefits
SLIDE 49
Docker Enabled DS @ Stitch Fix
SLIDE 50 Workstation
[Diagram: infra options ranked by scalability, Low → High]
Ad hoc Infra: In the Beginning...
SLIDE 51 Workstation
[Diagram: infra options ranked by scalability, Low → High]
Ad hoc Infra: Evolution I
SLIDE 52 Workstation
[Diagram: infra options ranked by scalability, Low → High]
Ad hoc Infra: Evolution II
SLIDE 53 Workstation
[Diagram: infra options ranked by scalability]
Ad hoc Infra: Evolution III
SLIDE 54
○ Data Scientists don’t need to worry about the environment.
○ Can host many Docker containers on a single machine.
○ Allows central control of machine types.
Why Does Docker Lower Overhead?
SLIDE 55
Flotilla UI
SLIDE 56
○ Our internal API libraries
○ Jupyter Notebook:
■ PySpark
■ IPython
○ Python libs:
■ scikit, numpy, scipy, pandas, etc.
○ RStudio
○ R libs:
■ dplyr, magrittr, ggplot2, lme4, boot, etc.
- Mounts User NFS
- User has terminal access to file system via Jupyter for git, pip, etc.
Our Docker Image
SLIDE 57
Docker Deployment
SLIDE 58
Docker Deployment
SLIDE 59
Docker Deployment
SLIDE 60
- Docker tightly integrates with the Linux kernel.
○ Hypothesis:
■ Anything that makes uninterruptible calls to the kernel can:
- Break the ECS agent, because the container doesn’t respond.
- Break isolation between containers.
■ E.g. mounting NFS
○ Switched to Artifactory
Our Docker Problems So Far
SLIDE 61
Scaling DS doing ML in the Cloud
SLIDE 62
- 1. Data Latency
- 2. To Batch or Not To Batch
- 3. What’s in a Model?
SLIDE 63
Data Latency
How much time do you spend waiting for data?
SLIDE 64 *This could be a laptop, a shared system, a batch process, etc.
SLIDE 65 Use Compression
*This could be a laptop, a shared system, a batch process, etc.
SLIDE 66 Use Compression - The Components
- Dense float vector: [ 1.3234543 0.23443434 … ]
- Binary matrix: [ 1 0 0 1 0 0 … 0 1 0 0 0 1 0 1 ... … 1 0 1 1 ]
- Binary vector: [ 1 0 0 1 0 0 … 0 1 0 0 ]
- Dense float vector: [ 1.3234543 0.23443434 … ]
- Sparse index → value map: { 100: 0.56, … , 110: 0.65, … , … , 999: 0.43 }
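One of the components above is a sparse index → value map. A small sketch (helper names are illustrative) of converting between the dense and sparse forms, which is itself a kind of compression for mostly-zero data:

```python
def to_sparse(dense):
    # keep only non-zero entries, keyed by their position
    return {i: v for i, v in enumerate(dense) if v != 0}

def to_dense(sparse, length):
    # reconstruct the dense vector, filling missing indices with 0
    return [sparse.get(i, 0) for i in range(length)]
```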
SLIDE 67 Use Compression - Python Comparison
Measured on the example structures from the previous slide:
Pickle: 60MB    Zlib+Pickle: 129KB   JSON: 15MB    Zlib+JSON: 55KB
Pickle: 3.1KB   Zlib+Pickle: 921B    JSON: 2.8KB   Zlib+JSON: 681B
Pickle: 2.6MB   Zlib+Pickle: 600KB   JSON: 769KB   Zlib+JSON: 139KB
SLIDE 68
- Naïve scheme of JSON + Zlib works well:
Observations
import json
import zlib
...
# compress
compressed = zlib.compress(json.dumps(value))
# decompress
original = json.loads(zlib.decompress(compressed))
SLIDE 69
- Naïve scheme of JSON + Zlib works well:
- Double vs Float: do you really need to store that much precision?
Observations
import json
import zlib
...
# compress
compressed = zlib.compress(json.dumps(value))
# decompress
original = json.loads(zlib.decompress(compressed))
SLIDE 70
- Naïve scheme of JSON + Zlib works well:
- Double vs Float: do you really need to store that much precision?
- For more inspiration look to columnar DBs and how they compress columns
Observations
import json
import zlib
...
# compress
compressed = zlib.compress(json.dumps(value))
# decompress
original = json.loads(zlib.decompress(compressed))
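To make the scheme concrete: the snippet above assumes Python 2 strings, so on Python 3 the JSON must be encoded to bytes before zlib sees it. On a repetitive structure like the binary matrix, the naïve JSON + zlib scheme shrinks the payload by orders of magnitude:

```python
import json
import zlib

# a 100 x 1000 binary matrix, like the example structures above
value = [[1, 0, 0, 1, 0, 0, 1, 0] * 125] * 100

raw = json.dumps(value).encode("utf-8")   # Python 3: zlib wants bytes
compressed = zlib.compress(raw)
restored = json.loads(zlib.decompress(compressed))

print(len(raw), len(compressed))  # compressed form is far smaller
```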
SLIDE 71
To Batch or Not To Batch:
When is batch inefficient?
SLIDE 72
○ Computation occurs synchronously when needed.
○ Computation is triggered by events.
Online & Streamed Computation
SLIDE 73 Online & Streamed Computation
Very likely you start with a batch system
SLIDE 74 Online & Streamed Computation
- Do you need to recompute:
○ features for all users?
○ predicted results for all users?
Very likely you start with a batch system
SLIDE 75 Online & Streamed Computation
- Do you need to recompute:
○ features for all users?
○ predicted results for all users?
- Are you heavily dependent on your ETL running every night?
Very likely you start with a batch system
SLIDE 76 Online & Streamed Computation
- Do you need to recompute:
○ features for all users?
○ predicted results for all users?
- Are you heavily dependent on your ETL running every night?
- Online vs Streamed depends on in-house factors:
○ Number of models
○ How often they change
○ Cadence of output required
○ In-house eng. expertise
○ etc.
Very likely you start with a batch system
SLIDE 77 Online & Streamed Computation
- Do you need to recompute:
○ features for all users?
○ predicted results for all users?
- Are you heavily dependent on your ETL running every night?
- Online vs Streamed depends on in-house factors:
○ Number of models
○ How often they change
○ Cadence of output required
○ In-house eng. expertise
○ etc.
Very likely you start with a batch system
We use an online system for recommendations
SLIDE 78
Streamed Example
SLIDE 79
Streamed Example
SLIDE 80
Streamed Example
SLIDE 81
Streamed Example
SLIDE 82
- Dedicated infrastructure → More room on batch infrastructure
○ Hopefully $$$ savings
○ Hopefully less stressed Data Scientists
Online/Streaming Thoughts
SLIDE 83
- Dedicated infrastructure → More room on batch infrastructure
○ Hopefully $$$ savings
○ Hopefully less stressed Data Scientists
- Requires better software engineering practices
○ Code portability/reuse
○ Designing APIs/Tools Data Scientists will use
Online/Streaming Thoughts
SLIDE 84
- Dedicated infrastructure → More room on batch infrastructure
○ Hopefully $$$ savings
○ Hopefully less stressed Data Scientists
- Requires better software engineering practices
○ Code portability/reuse
○ Designing APIs/Tools Data Scientists will use
- Prototyping on AWS Lambda & Kinesis was surprisingly quick
○ Need to compile C libs on an Amazon Linux instance
Online/Streaming Thoughts
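The Lambda & Kinesis prototype mentioned above can be sketched as a handler like the following. The `Records` / `kinesis` / `data` field names follow AWS’s Kinesis event format for Lambda; the scoring step is hypothetical:

```python
import base64
import json

def handler(event, context):
    """Process each record of a Kinesis-triggered Lambda invocation."""
    results = []
    for record in event["Records"]:
        # Kinesis delivers record payloads base64-encoded
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        results.append(payload)  # a real handler would apply a model here
    return results
```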
SLIDE 85
What’s in a Model?
Scaling model knowledge
SLIDE 86 Ever:
- Had someone leave and then nobody understands how they trained their models?
SLIDE 87 Ever:
- Had someone leave and then nobody understands how they trained their models?
○ Or you didn’t remember yourself?
SLIDE 88 Ever:
- Had someone leave and then nobody understands how they trained their models?
○ Or you didn’t remember yourself?
- Had a performance dip in models and had trouble figuring out why?
SLIDE 89 Ever:
- Had someone leave and then nobody understands how they trained their models?
○ Or you didn’t remember yourself?
- Had a performance dip in models and had trouble figuring out why?
○ Or not known what’s changed between model deployments?
SLIDE 90 Ever:
- Had someone leave and then nobody understands how they trained their models?
○ Or you didn’t remember yourself?
- Had a performance dip in models and had trouble figuring out why?
○ Or not known what’s changed between model deployments?
- Wanted to compare model performance over time?
SLIDE 91 Ever:
- Had someone leave and then nobody understands how they trained their models?
○ Or you didn’t remember yourself?
- Had a performance dip in models and had trouble figuring out why?
○ Or not known what’s changed between model deployments?
- Wanted to compare model performance over time?
- Wanted to train a model in R/Python/Spark and then deploy it to a webserver?
SLIDE 92
Produce Model Artifacts
SLIDE 93
- Isn’t that just saving the coefficients/model values?
Produce Model Artifacts
SLIDE 94
- Isn’t that just saving the coefficients/model values?
○ NO!
Produce Model Artifacts
SLIDE 95
- Isn’t that just saving the coefficients/model values?
○ NO!
Produce Model Artifacts
SLIDE 96
- Isn’t that just saving the coefficients/model values?
○ NO!
Produce Model Artifacts
SLIDE 97
- Isn’t that just saving the coefficients/model values?
○ NO!
How do you deal with
Produce Model Artifacts
SLIDE 98
- Isn’t that just saving the coefficients/model values?
○ NO!
How do you deal with
Produce Model Artifacts
Makes it easy to keep an archive and track changes over time
SLIDE 99
- Isn’t that just saving the coefficients/model values?
○ NO!
How do you deal with
Produce Model Artifacts
Helps a lot with model debugging & diagnosis!
Makes it easy to keep an archive and track changes over time
SLIDE 100
- Isn’t that just saving the coefficients/model values?
○ NO!
How do you deal with
Produce Model Artifacts
Helps a lot with model debugging & diagnosis!
Makes it easy to keep an archive and track changes over time
Can more easily use in downstream processes
SLIDE 101
- Analogous to software libraries
- Packaging:
○ Zip/Jar file
Produce Model Artifacts
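A minimal sketch of such an artifact (all names illustrative): the model values are bundled with the metadata needed to reproduce and debug the model, and packaged like a library release in a single zip file.

```python
import json
import pickle
import zipfile

def save_model_artifact(path, model, metadata):
    # bundle model values plus training metadata into one zip artifact
    with zipfile.ZipFile(path, "w") as zf:
        zf.writestr("model.pkl", pickle.dumps(model))
        zf.writestr("metadata.json", json.dumps(metadata))

def load_model_artifact(path):
    with zipfile.ZipFile(path) as zf:
        model = pickle.loads(zf.read("model.pkl"))
        metadata = json.loads(zf.read("metadata.json"))
    return model, metadata
```

Because the artifact is a single file, it can be archived to S3 with the same Batch ID pattern as data, giving a full history of deployed models.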
SLIDE 102
But all the above seems complex?
SLIDE 103
We’re building APIs.
SLIDE 104
Fin; Questions?
@stefkrawczyk