SLIDE 1 A field guide to the machine learning zoo
Theodore Vasiloudis, SICS/KTH
SLIDE 2
From idea to objective function
SLIDE 3
Formulating an ML problem
SLIDE 4 Formulating an ML problem
Source: Xing (2015)
SLIDE 5 Formulating an ML problem
○ Model (θ)
Source: Xing (2015)
SLIDE 6 Formulating an ML problem
○ Model (θ)
○ Data (D)
Source: Xing (2015)
SLIDE 7 Formulating an ML problem
○ Model (θ)
○ Data (D)
- Objective function: L(θ, D)
Source: Xing (2015)
SLIDE 8 Formulating an ML problem
○ Model (θ)
○ Data (D)
- Objective function: L(θ, D)
- Prior knowledge: r(θ)
Source: Xing (2015)
SLIDE 9 Formulating an ML problem
○ Model (θ)
○ Data (D)
- Objective function: L(θ, D)
- Prior knowledge: r(θ)
- ML program: f(θ, D) = L(θ, D) + r(θ)
Source: Xing (2015)
SLIDE 10 Formulating an ML problem
○ Model (θ)
○ Data (D)
- Objective function: L(θ, D)
- Prior knowledge: r(θ)
- ML program: f(θ, D) = L(θ, D) + r(θ)
- ML Algorithm: How to optimize f(θ, D)
Source: Xing (2015)
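A minimal sketch of this decomposition in code. Ridge regression is used here as a simple stand-in (it is not the example from the talk), and the data and λ value are invented for illustration:

```python
import numpy as np

# Toy illustration of the ML program f(θ, D) = L(θ, D) + r(θ),
# using ridge regression: squared loss plus an L2 prior.
# The data D and the value of lam are made up for the example.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # features
y = np.array([1.0, 2.0, 3.0])                        # targets
lam = 0.1                                            # strength of the prior

def L(theta):
    # Objective function L(θ, D): squared error on the data
    return np.sum((X @ theta - y) ** 2)

def r(theta):
    # Prior knowledge r(θ): prefer small parameter values
    return lam * theta @ theta

def f(theta):
    # The ML program: what the ML algorithm optimizes
    return L(theta) + r(theta)

# ML algorithm: for this particular f, a closed-form solution exists
theta_hat = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
```

The same four-way split (data, model, objective, regularizer) carries over unchanged when the loss or the prior is swapped for something else; only the "ML algorithm" step needs to change.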
SLIDE 11 Example: Improve retention at Twitter
- Goal: Reduce the churn of users on Twitter
- Assumption: Users churn because they don’t engage with the platform
- Idea: Increase retweets by promoting tweets likely to be retweeted
SLIDE 12 Example: Improve retention at Twitter
- Goal: Reduce the churn of users on Twitter
- Assumption: Users churn because they don’t engage with the platform
- Idea: Increase retweets by promoting tweets likely to be retweeted
- Data (D):
- Model (θ):
- Objective function - L(D, θ):
- Prior knowledge (Regularization):
- Algorithm:
SLIDE 13 Example: Improve retention at Twitter
- Goal: Reduce the churn of users on Twitter
- Assumption: Users churn because they don’t engage with the platform
- Idea: Increase retweets by promoting tweets likely to be retweeted
- Data (D): Features and labels (xᵢ, yᵢ)
- Model (θ):
- Objective function - L(D, θ):
- Prior knowledge (Regularization):
- Algorithm:
SLIDE 14 Example: Improve retention at Twitter
- Goal: Reduce the churn of users on Twitter
- Assumption: Users churn because they don’t engage with the platform
- Idea: Increase retweets by promoting tweets likely to be retweeted
- Data (D): Features and labels (xᵢ, yᵢ)
- Model (θ): Logistic regression, parameters w
○ p(y | x, w) = Bernoulli(y | sigm(wᵀx))
- Objective function - L(D, θ):
- Prior knowledge (Regularization):
- Algorithm:
SLIDE 15 Example: Improve retention at Twitter
- Goal: Reduce the churn of users on Twitter
- Assumption: Users churn because they don’t engage with the platform
- Idea: Increase retweets by promoting tweets likely to be retweeted
- Data (D): Features and labels (xᵢ, yᵢ)
- Model (θ): Logistic regression, parameters w
○ p(y | x, w) = Bernoulli(y | sigm(wᵀx))
- Objective function - L(D, θ): NLL(w) = Σᵢ log(1 + exp(-yᵢ wᵀxᵢ))
- Prior knowledge (Regularization): r(w) = λ wᵀw
- Algorithm:
Warning: Notation abuse
SLIDE 16 Example: Improve retention at Twitter
- Goal: Reduce the churn of users on Twitter
- Assumption: Users churn because they don’t engage with the platform
- Idea: Increase retweets by promoting tweets likely to be retweeted
- Data (D): Features and labels (xᵢ, yᵢ)
- Model (θ): Logistic regression, parameters w
○ p(y | x, w) = Bernoulli(y | sigm(wᵀx))
- Objective function - L(D, θ): NLL(w) = Σᵢ log(1 + exp(-yᵢ wᵀxᵢ))
- Prior knowledge (Regularization): r(w) = λ wᵀw
- Algorithm: Gradient Descent
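The pieces on this slide can be put together in a short end-to-end sketch. The data is synthetic and the values of λ, the step size, and the iteration count are invented for illustration, not taken from the talk:

```python
import numpy as np

# Synthetic data D: feature vectors x_i and labels y_i ∈ {-1, +1}
# (whether a tweet was retweeted); all values here are made up.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                 # 200 tweets, 3 features
true_w = np.array([1.5, -2.0, 0.5])
y = np.where(X @ true_w + rng.normal(size=200) > 0, 1, -1)

lam = 0.1     # regularization strength λ
eta = 0.01    # gradient descent step size

def f(w):
    # ML program: f(w, D) = NLL(w) + r(w)
    #           = Σ_i log(1 + exp(-y_i wᵀx_i)) + λ wᵀw
    margins = y * (X @ w)
    return np.sum(np.logaddexp(0.0, -margins)) + lam * w @ w

def grad_f(w):
    # ∇f(w) = -Σ_i σ(-y_i wᵀx_i) y_i x_i + 2λw, with σ the sigmoid
    margins = y * (X @ w)
    sig = 1.0 / (1.0 + np.exp(margins))
    return -(sig * y) @ X + 2.0 * lam * w

# ML algorithm: plain (batch) gradient descent on f
w = np.zeros(3)
for _ in range(500):
    w = w - eta * grad_f(w)
```

Each step moves w downhill on the regularized negative log-likelihood; in practice one would reach for a library implementation (and likely stochastic gradients) rather than this loop, but the mapping from slide to code is the same.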
SLIDE 17
Data problems
SLIDE 18 Data problems
- GIGO: Garbage In - Garbage Out
SLIDE 19 Data readiness
Source: Lawrence (2017)
SLIDE 20 Data readiness
- Problem: “Data” as a concept is hard to reason about.
- Goal: Make the stakeholders aware of the state of the data at all stages
Source: Lawrence (2017)
SLIDE 21 Data readiness
Source: Lawrence (2017)
SLIDE 22 Data readiness
○ Accessibility
Source: Lawrence (2017)
SLIDE 23 Data readiness
○ Accessibility
○ Representation and faithfulness
Source: Lawrence (2017)
SLIDE 24 Data readiness
○ Accessibility
○ Representation and faithfulness
○ Data in context
Source: Lawrence (2017)
SLIDE 25 Data readiness
○ “How long will it take to bring our user data to C1 level?”
○ “Until we know the collection process we can’t move the data to B1.”
○ “We realized that we would need location data in order to have an A1 dataset.”
Source: Lawrence (2017)
SLIDE 26 Data readiness
○ “How long will it take to bring our user data to C1 level?”
○ “Until we know the collection process we can’t move the data to B1.”
○ “We realized that we would need location data in order to have an A1 dataset.”
SLIDE 27
Selecting algorithm & software: “Easy” choices
SLIDE 28
Selecting algorithms
SLIDE 29 An ML algorithm “farm”
Source: scikit-learn.org
SLIDE 30 The neural network zoo
Source: Asimov Institute (2016)
SLIDE 31 Selecting algorithms
- Always go for the simplest model you can afford
SLIDE 32 Selecting algorithms
- Always go for the simplest model you can afford
○ Your first model is more about getting the infrastructure right
Source: Zinkevich (2017)
SLIDE 33 Selecting algorithms
- Always go for the simplest model you can afford
○ Your first model is more about getting the infrastructure right
○ Simple models are usually interpretable. Interpretable models are easier to debug.
Source: Zinkevich (2017)
SLIDE 34 Selecting algorithms
- Always go for the simplest model you can afford
○ Your first model is more about getting the infrastructure right
○ Simple models are usually interpretable. Interpretable models are easier to debug.
○ Complex models erode boundaries
Source: Sculley et al. (2015)
SLIDE 35 Selecting algorithms
- Always go for the simplest model you can afford
○ Your first model is more about getting the infrastructure right
○ Simple models are usually interpretable. Interpretable models are easier to debug.
○ Complex models erode boundaries
■ CACE principle: Changing Anything Changes Everything
Source: Sculley et al. (2015)
SLIDE 36
Selecting software
SLIDE 37 The ML software zoo
[Logo collage of ML software frameworks, including Leaf]
SLIDE 38
Your model vs. the world
SLIDE 39 What are the problems with ML systems?
[Diagram: Data, ML Code, Model (Expectation)]
SLIDE 40 What are the problems with ML systems?
[Diagram: Data, ML Code, Model (Reality)]
Sculley et al. (2015)
SLIDE 41
Things to watch out for
SLIDE 42
Things to watch out for
Sculley et al. (2015) & Zinkevich (2017)
SLIDE 43 Things to watch out for
○ Unstable dependencies
Sculley et al. (2015) & Zinkevich (2017)
SLIDE 44
○ Unstable dependencies
Things to watch out for
Sculley et al. (2015) & Zinkevich (2017)
SLIDE 45 Things to watch out for
○ Unstable dependencies
○ Direct
Sculley et al. (2015) & Zinkevich (2017)
SLIDE 46 Things to watch out for
○ Unstable dependencies
○ Direct
○ Indirect
Sculley et al. (2015) & Zinkevich (2017)
SLIDE 47
Bringing it all together
SLIDE 48 Bringing it all together
- Define your problem as optimizing your objective function using data
- Determine (and monitor) the readiness of your data
- Don't spend too much time at first choosing an ML framework/algorithm
- Worry much more about what happens when your model meets the world.
SLIDE 49
Thank you.
@thvasilo tvas@sics.se
SLIDE 50 Sources
- Google auto-replies: shared photos and text
- Silver et al. (2016): Mastering the game of Go
- Xing (2015): A new look at the system, algorithm and theory foundations of Distributed ML
- Lawrence (2017): Data readiness levels
- Asimov Institute (2016): The Neural Network Zoo
- Zinkevich (2017): Rules of Machine Learning - Best Practices for ML Engineering
- Sculley et al. (2015): Hidden Technical Debt in Machine Learning Systems