How have Data Science Skills Evolved? A case study using embeddings

Maryam Jahanshahi Ph.D., Research Scientist, TapRecruit.co
http://bit.ly/dataengconf2018

TapRecruit uses NLP to understand career content
Converting unstructured documents into structured data
Smart Editor for JDs: Data-driven suggestions on both the content and language use in job descriptions.
Pipeline Health Monitoring: Analytics dashboards to help diagnose quality and diversity issues in talent pipelines.
Salary Estimation: Data-driven salary estimates based on a job's requirements rather than just title and location.

Language matters in job descriptions
Same title, different job: Finance Manager (Kraft Foods) | Finance Manager (Roche)
Required Experience: Senior (6-8 Years) | Junior (3 Years)
Required Responsibility: No Managerial Experience | Division Level Controller
Preferred Skill: Strategic Finance Role
Required Education: MBA / CPA
Different title, same job: Performance Marketing Manager (PocketGems) | Senior Analyst, Customer Strategy (The Gap)
Required Experience: Mid-Level | Mid-Level
Required Skills: Quantitative Focus | Quantitative Focus
Required Experience: iBanking Expertise | Finance Expertise
Required Skills: Data Analysis Tools (SQL) | Relational Database Experience
Preferred Experience: Consulting Experience Preferred | External Consulting Experience Preferred
Preferred Education: MBA Preferred | BA in Accounting, Finance, MBA Preferred

How have data science skills changed over time?

Strategies to identify changes within datasets
Manual Feature Extraction (e.g. MBA, SQL, PhD, Tableau, Python, PowerBI): Requires a priori selection of key attributes, and therefore makes it difficult to discover new attributes.
Dynamic Topic Models: Use a bag-of-words approach and require experimentation with the number of topics.
[Figure: top words of a topic at 1880, 1920, 1960 and 2000 (e.g. force, energy, motion, atom, electron, quantum). Adapted from Blei and Lafferty, ICML 2006.]
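As a rough illustration of why manual feature extraction cannot surface new attributes, the sketch below counts a fixed keyword list per year. The postings, the keyword list and the function name are hypothetical and are not taken from the talk; any skill missing from the list is never counted.

```python
from collections import Counter, defaultdict
import re

# Hypothetical toy postings: (year, job description text).
postings = [
    (2015, "PhD preferred. Experience with SQL and Tableau."),
    (2015, "MBA required. Strong SQL skills."),
    (2018, "Experience with Python and SQL. PowerBI a plus."),
    (2018, "Python, Tableau and SQL required."),
]

# The a priori attribute list: anything not listed here is never counted.
KEYWORDS = ["mba", "sql", "phd", "tableau", "python", "powerbi"]

def keyword_counts_by_year(docs, keywords):
    """Count, per year, how many documents mention each keyword."""
    counts = defaultdict(Counter)
    for year, text in docs:
        # Lowercase and tokenize once per document.
        tokens = set(re.findall(r"[a-z0-9+#]+", text.lower()))
        for kw in keywords:
            if kw in tokens:
                counts[year][kw] += 1
    return counts

counts = keyword_counts_by_year(postings, KEYWORDS)
for year in sorted(counts):
    print(year, dict(counts[year]))
```

A skill that only appears in later postings, and that nobody thought to add to the keyword list, stays invisible to this kind of analysis.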

Word embeddings capture semantic similarities
Example sentences, with the word "Python" highlighted and its surrounding words marked as context:
Statistical modeling through software (e.g. SPSS) or programming language (e.g. Python)
Experience in Python, Java or other object-oriented programming languages
Proficiency programming in Python, Java or C++.
[Figure: embedding-space plot with the labels Python, Java, C++, Object-orientated, Programming Language, Esperanto, French, German and Japanese]
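The word and context labelling on this slide can be sketched as the extraction of (word, context) pairs within a fixed window, which is the kind of training signal word2vec-style models consume. This is a minimal sketch; the function name, the whitespace tokenization and the window of 2 are assumptions made for illustration.

```python
def context_pairs(tokens, window=2):
    """Return (word, context) pairs for every token within `window` positions."""
    pairs = []
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((word, tokens[j]))
    return pairs

# Naive whitespace tokenization of one of the example sentences.
sentence = "proficiency programming in python java or c++".split()

# Contexts observed for the target word "python".
python_contexts = [c for w, c in context_pairs(sentence, window=2) if w == "python"]
print(python_contexts)
```

Words that share many contexts across a corpus (here "python" and "java" both occur near "programming") end up with similar vectors.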

Embeddings capture entity relationships
Dimensionality enables comparison between word pairs along many axes.
Hierarchies (company and CEO): Exxon and Tillerson, Wal-Mart and McMillon, Viacom and Dauman, Verizon and McAdam, Vodafone and Colao.
Comparatives and superlatives: slow, slower, slowest; short, shorter, shortest; strong, stronger, strongest.
Analogies: Woman :: Queen as Man :: ? (figure labels: Man, Woman, King, Queen)
Adapted from Stanford NLP GloVe Project
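The analogy on this slide is usually solved with vector arithmetic followed by a nearest-neighbour search. The sketch below uses hand-made 2-d toy vectors (one axis for gender, one for royalty) purely for illustration; they are not real embeddings, and the helper names are hypothetical.

```python
import math

# Hand-made 2-d toy vectors (axis 0: gender, axis 1: royalty); not real embeddings.
vectors = {
    "man":   (1.0, 0.0),
    "woman": (-1.0, 0.0),
    "king":  (1.0, 1.0),
    "queen": (-1.0, 1.0),
    "apple": (0.0, -1.0),
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def analogy(a, b, c, vecs):
    """Solve a :: b as c :: ? by finding the nearest neighbour of b - a + c."""
    target = tuple(vb - va + vc for va, vb, vc in zip(vecs[a], vecs[b], vecs[c]))
    # Exclude the three query words, as is conventional for analogy tasks.
    candidates = [w for w in vecs if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vecs[w]))

answer = analogy("woman", "queen", "man", vectors)
print(answer)
```

With real pretrained vectors the same arithmetic applies, but the nearest neighbour is not guaranteed to be the expected word.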

Pretrained embeddings facilitate fast prototyping
Pipeline: Corpus Generation, Corpus Processing, Language Model Generation, Language Model Tuning, Final Application

Corpus           Twitter    Common Crawl   GoogleNews   Wikipedia
Tokens           27 B       42-840 B       100 B        6 B
Vocabulary Size  1.2 M      1.9-2.2 M      3 M          400 k
Algorithm        GloVe      GloVe          word2vec     GloVe
Vector Length    25-200 d   300 d          300 d        50-300 d

Problems with pretrained embedding models
Casing: Abbreviations vs words, e.g. IT vs it
Out of Vocabulary Words: Domain-specific words and acronyms
Polysemy: Words with multiple meanings, e.g. drive (a car) vs drive (results), or Chef (the job) vs Chef (the language)
Multi-word Expressions: Phrases that have new meanings, e.g. Front-end vs front + end
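Two of these problems, casing and out-of-vocabulary words, are easy to reproduce with a toy lookup. The vocabulary and the helper below are hypothetical stand-ins for a lowercased pretrained model; no real model is loaded.

```python
# Hypothetical vocabulary of a lowercased pretrained model.
pretrained_vocab = {"it", "drive", "chef", "front", "end", "python", "sql"}

def lookup_report(tokens, vocab, lowercase=True):
    """Map each token to the vocabulary key it resolves to, or None if missing."""
    report = {}
    for tok in tokens:
        key = tok.lower() if lowercase else tok
        report[tok] = key if key in vocab else None
    return report

report = lookup_report(["IT", "it", "PowerBI", "Front-end"], pretrained_vocab)
for token, key in report.items():
    print(f"{token!r} -> {key!r}")
```

Here the abbreviation "IT" collapses onto the pronoun "it", while "PowerBI" and "Front-end" have no vector at all, even though "front" and "end" exist separately.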

Tools for developing custom language models
Modularized for different data and modeling requirements
Tools shown: SyntaxNet, CoreNLP
Corpus Processing: Tokenization, POS tagging, Sentence Segmentation, Dependency Parsing
Language Modeling: Different word embedding models (GloVe, word2vec, fastText)

Hyperparameter tuning on final model outputs
Window sizes capture semantic similarity vs semantic relatedness
Small Window Size: Captures semantic similarity, substitutes and word-level differences.
Large Window Size: Captures semantic relatedness, alternatives and domain-level differences.
[Figure: two embedding-space plots, one per window size, with the labels Python, Java, C++, Object-orientated, Programming Language, Statistical modeling, SPSS, Software, Esperanto, French, German and Japanese]

Career language embedding model
Identified equal opportunity and perks language
Identified 'soft' skills and language around experience

I’ve got 300 dimensions… but time ain’t one

Two approaches to connect embeddings
[Figure: yearly time slices, 2015 to 2018, shown for each approach]
Static embeddings stitched together
Data hungry: Needs sufficient data in each time slice for a quality embedding.
Requires alignment: Each time slice is trained independently, therefore dimensions are not comparable across slices.
References: Kim, Chiu, Hanaki, Hegde and Petrov, arXiv:1405.3515. Kulkarni, Al-Rfou, Perozzi and Skiena, arXiv:1411.3315.
Dynamic embeddings trained together
Data efficient: Treats each time slice as a sequential latent variable, enabling time slices with sparse data.
Does not require alignment: Treating the time slice as a variable ensures embeddings are connected across slices.
References: Bamler and Mandt, arXiv:1702.08359. Yao, Sun, Ding, Rao and Xiong, arXiv:1703.00607. Rudolph and Blei, arXiv:1703.08052.
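To make the alignment requirement concrete, the sketch below rotates one time slice onto another with orthogonal Procrustes. This is one commonly used alignment option, chosen here as an assumption; the slides do not say which method is meant. The data is synthetic: the "2018" slice is the "2017" slice under a random rotation, standing in for two independently trained models, and the variable and function names are hypothetical.

```python
import numpy as np

def procrustes_rotation(X, Y):
    """Orthogonal matrix R minimizing ||X @ R - Y||_F (rows are shared words)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)

# Synthetic embeddings for 50 shared words in 8 dimensions.
emb_2017 = rng.normal(size=(50, 8))

# Simulate an independently trained slice: same geometry, arbitrary orthogonal
# change of basis, so raw dimensions are not comparable.
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))
emb_2018 = emb_2017 @ Q

# Align 2017 into the 2018 coordinate system.
R = procrustes_rotation(emb_2017, emb_2018)
aligned_2017 = emb_2017 @ R

max_error = float(np.abs(aligned_2017 - emb_2018).max())
print(f"max abs difference after alignment: {max_error:.2e}")
```

On real slices the residual after alignment is not zero, and words with a large residual are candidates for genuine change in usage. Dynamic embeddings avoid this step by training all slices jointly.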
