Feature engineering - Abhishek Trehan, People Analytics Practitioner - PowerPoint PPT Presentation



SLIDE 1

DataCamp Human Resources Analytics: Predicting Employee Churn in R

Feature engineering

HUMAN RESOURCES ANALYTICS: PREDICTING EMPLOYEE CHURN IN R

Abhishek Trehan

People Analytics Practitioner

SLIDE 2

Feature engineering

Basic variables: the set of variables available directly in a dataset
Derived variables: the set of variables derived by transforming basic variables

SLIDE 3

Creating new features

Age difference between an employee and their manager
Job-hop index
Employee tenure

SLIDE 4

Age difference

Views
Handling pressure
Expectations
Work ethics
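A minimal sketch of deriving this feature in R, assuming the data holds one row per employee with both the employee's and the manager's age. The data frame emp and the columns emp_age and mgr_age are hypothetical names, not taken from the course dataset.

```r
library(dplyr)

# Toy data: emp, emp_age and mgr_age are illustrative assumptions
emp <- data.frame(
  emp_id  = c("E1", "E2", "E3"),
  emp_age = c(28, 45, 33),
  mgr_age = c(41, 38, 33)
)

# Positive values mean the manager is older than the employee
emp_age_diff <- emp %>%
  mutate(age_diff = mgr_age - emp_age)

emp_age_diff
```

Keeping the sign (rather than taking the absolute value) preserves whether the manager is older or younger, which may matter for the factors listed on this slide.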

SLIDE 5

Job-hopping

$$\text{Job-hop index} = \frac{\text{Total experience}}{\text{Number of companies worked}}$$
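A small sketch of the calculation, assuming the index is total experience divided by the number of companies worked, so that a low value suggests more frequent job changes. The data frame emp and the columns total_experience and no_companies_worked are hypothetical names used only for illustration.

```r
library(dplyr)

# Toy data: column names are illustrative assumptions
emp <- data.frame(
  emp_id              = c("E1", "E2", "E3"),
  total_experience    = c(10, 6, 12),  # years of work experience
  no_companies_worked = c(2, 3, 1)
)

# Average number of years spent per company
emp_jhi <- emp %>%
  mutate(job_hop_index = total_experience / no_companies_worked)

emp_jhi
```

If the company count can be zero in real data, the division returns Inf, so such rows would need separate handling.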

SLIDE 6

Employee tenure

Tenure: duration of employment
Inactive employees' tenure: from date_joining to last_working_date
Active employees' tenure: from date_joining to cutoff_date

SLIDE 7

Deriving employee tenure

# Coercing date variables from dd/mm/yyyy format
library(dplyr)      # provides %>% and mutate()
library(lubridate)  # provides dmy()

org_final %>%
  mutate(date_of_joining = dmy(date_of_joining),
         cutoff_date = dmy(cutoff_date),
         last_working_date = dmy(last_working_date))

SLIDE 8

Calculating timespan

# Computing time span in years
library(lubridate)

date_1 <- ymd("2000-01-01")
date_2 <- ymd("2014-08-09")
time_length(interval(date_1, date_2), "years")
[1] 14.60274
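Putting the two steps together, tenure could be derived roughly as follows. This is a sketch under assumptions: the toy data frame emp and the status column (with values "Active" and "Inactive") are hypothetical, and the end date is taken as cutoff_date for active employees and last_working_date for inactive ones.

```r
library(dplyr)
library(lubridate)

# Toy data in dd/mm/yyyy format: emp and status are illustrative assumptions
emp <- data.frame(
  emp_id            = c("E1", "E2"),
  status            = c("Active", "Inactive"),
  date_of_joining   = c("01/01/2010", "15/06/2012"),
  last_working_date = c(NA, "15/06/2014"),
  cutoff_date       = c("01/01/2015", "01/01/2015"),
  stringsAsFactors  = FALSE
)

emp_tenure <- emp %>%
  mutate(
    # Coerce the text columns to Date
    date_of_joining   = dmy(date_of_joining),
    last_working_date = dmy(last_working_date),
    cutoff_date       = dmy(cutoff_date),
    # Active employees are measured up to the cutoff date,
    # inactive employees up to their last working date
    end_date = if_else(status == "Active", cutoff_date, last_working_date),
    tenure   = time_length(interval(date_of_joining, end_date), "years")
  )

emp_tenure %>% select(emp_id, status, tenure)
```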

SLIDE 9

Let's practice!

SLIDE 10

Compensation

Abhishek Trehan

People Analytics Practitioner

SLIDE 11

Compensation matters

Compensation is one of the top drivers of employee turnover
Pay matters for employee retention

SLIDE 12

Exploring compensation variable

# Plot the distribution of compensation
library(ggplot2)

ggplot(emp_tenure, aes(x = compensation)) +
  geom_histogram()

SLIDE 13

Exploring compensation variable

# Plot the distribution of compensation across levels
library(ggplot2)

ggplot(emp_tenure, aes(x = level, y = compensation)) +
  geom_boxplot()

SLIDE 14

Deriving Compa-ratio

$$\text{Compa-ratio} = \frac{\text{Actual compensation}}{\text{Median compensation}}$$

SLIDE 15

Deriving Compa-ratio

Compa-ratio of 1.2 or 120% means that the employee is paid 20% above the median pay
Compa-ratio of 1 or 100% means that the employee is paid exactly the median pay
Compa-ratio of 0.8 or 80% means that the employee is paid 20% below the median pay
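The three cases can be checked with a quick calculation. The figures below are illustrative assumptions: a median of 51840 and three hypothetical salaries chosen to land on the ratios above.

```r
# Illustrative figures: the median and salaries are assumptions
median_compensation <- 51840
compensation <- c(62208, 51840, 41472)

# Actual pay divided by median pay
compa_ratio <- compensation / median_compensation
compa_ratio
# [1] 1.2 1.0 0.8
```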

SLIDE 16

Deriving median compensation & compa-ratio

# Derive Compa-ratio
emp_compa_ratio <- emp_tenure %>%
  group_by(level) %>%
  mutate(median_compensation = median(compensation),
         compa_ratio = (compensation / median_compensation))

# Look at the median compensation for each level
emp_compa_ratio %>%
  distinct(level, median_compensation)

# A tibble: 2 x 2
# Groups:   level [2]
  level      median_compensation
  <fct>                    <dbl>
1 Analyst                  51840
2 Specialist               83496

SLIDE 17

Deriving Compa-level

Compa-ratio > 1: Above
Otherwise: Below
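One way this rule might be coded is shown below. It is a sketch on toy data: the data frame emp, its salaries, and the column name compa_level are assumptions made for illustration.

```r
library(dplyr)

# Toy data: levels and salaries are illustrative assumptions
emp <- data.frame(
  level        = c("Analyst", "Analyst", "Analyst",
                   "Specialist", "Specialist", "Specialist"),
  compensation = c(40000, 50000, 60000, 70000, 80000, 95000)
)

emp_compa <- emp %>%
  group_by(level) %>%
  mutate(median_compensation = median(compensation),
         compa_ratio = compensation / median_compensation,
         # A ratio of exactly 1 falls under "Otherwise", so it is labelled "Below"
         compa_level = if_else(compa_ratio > 1, "Above", "Below")) %>%
  ungroup()

emp_compa
```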

SLIDE 18

Let's practice!

SLIDE 19

Information value

Abhishek Trehan

People Analytics Practitioner

SLIDE 20

Understanding Information value

Measures how well an independent variable predicts the dependent variable
Used to rank independent variables by their predictive power

SLIDE 21

Calculating Information value

$$\text{IV} = \sum \left(\%\text{ of non-events} - \%\text{ of events}\right) \times \log\left(\frac{\%\text{ of non-events}}{\%\text{ of events}}\right)$$
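To make the sum concrete, here is a hand calculation for a single variable split into three bins. The counts are invented for illustration, the natural logarithm is assumed, and the percentages are taken as each bin's share of all events and of all non-events. Binning and edge-case handling inside the Information package may differ from this sketch.

```r
# Toy counts per bin: all numbers are illustrative assumptions
bins <- data.frame(
  bin        = c("low", "mid", "high"),
  events     = c(30, 15, 5),    # e.g. employees who left
  non_events = c(20, 30, 50)    # e.g. employees who stayed
)

# Share of all events and of all non-events falling in each bin
pct_events     <- bins$events / sum(bins$events)
pct_non_events <- bins$non_events / sum(bins$non_events)

# Per-bin contribution, then the total information value
iv_by_bin <- (pct_non_events - pct_events) * log(pct_non_events / pct_events)
iv <- sum(iv_by_bin)

iv_by_bin
iv
```

The middle bin holds the same share of events and non-events, so it contributes nothing; the total comes to roughly 1.08.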

SLIDE 22

Calculating Information value

# Load Information package
library(Information)

# Compute Information Value
IV <- create_infotables(data = emp_final, y = "turnover")

# Print Information Value
IV$Summary

            Variable           IV
12      percent_hike 1.144784e+00
17  total_dependents 1.088645e+00
21   no_leaves_taken 9.404533e-01
31            tenure 9.332570e-01
27 mgr_effectiveness 6.830020e-01
11      compensation 6.074885e-01

SLIDE 23

Information value (IV) table

Information value       Predictive power
< 0.15                  Poor
Between 0.15 and 0.4    Moderate
> 0.4                   Strong

percent_hike: 1.14 (Strong)
compa_ratio: 0.29 (Moderate)
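The bands can be applied to a vector of information values as sketched below. The percent_hike and compa_ratio values come from this slide and compensation is rounded from the earlier summary output; weak_feature is a hypothetical entry added to show the lowest band. Treating both 0.15 and 0.4 as "Moderate" is an assumption about the boundaries.

```r
# Information values to classify: weak_feature is a hypothetical example
iv_values <- c(percent_hike = 1.14, compensation = 0.61,
               compa_ratio = 0.29, weak_feature = 0.05)

# < 0.15 Poor, 0.15 to 0.4 Moderate, > 0.4 Strong
predictive_power <- ifelse(iv_values < 0.15, "Poor",
                    ifelse(iv_values <= 0.4, "Moderate", "Strong"))

predictive_power
```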

SLIDE 24

Let's practice!
