SCMA HR Analytics Challenge

Team 8

Team Leader: Jonathan Fremin

Team Members: Thomas Gorman, Peter Bowling, and Riley Trent.

Introduction

Employee retention is a significant issue that all companies must come to terms with. The ideal plan is to gather and employ brilliant and hard-working individuals and find a way to keep them for long periods of time. For most organizations, this ideal is rarely met. It is inevitable that some employees will become complacent, will be poached by other companies, will transition into other industries, or even come to loathe their current employment situation. There are obvious and quantifiable costs associated with employee turnover such as hiring, onboarding, and training. There are also notable hidden costs as well, like the loss of productivity or time spent fixing new employee mistakes. A study from the Center for American Progress estimated the costs for replacing employees (not executives or physicians) at about 21% of that employee’s annual salary. Being able to identify issues that have a clear effect on employee turnover is the first step to improving retention and minimizing the total costs required for replacing employees.

The goal of this analytics challenge is to identify which employees from a nondescript firm will be most likely leave the company based on certain statistics. We were provided with a Microsoft Excel workbook titled “Analytics_Challenge_Data” that simulates employee data. Statistics such as time spent with the company, numbers of projects worked, department, and salary are among the data provided. This data is to be dissected using various data mining tools to produce a prediction on which employees are the most liable to separate from the company.

Executive Summary

Why are we losing our best employees and how can we predict who is leaving next? This was the problem our team was presented with by this firm. We used several analytical procedures to help form a hypothesis and test our hypothesis for this problem. In our analysis of the firm that we were presented we found some key indicators that help the firm analyze their employees figure out why some of their best employees are leaving and who is going to leave next. We predicted that the employee turnover had a relation to the employee satisfaction level and the time spent at the company. Our analysis showed that the employees in the sales department were more likely to leave because of number of employees this department has in the company and how many projects the workers are given. More specifically the employees in sales between year three and year four are more likely to leave because of the exponential increase in projects they are expected to complete every year. We also saw a relation from this exponential increase in projects to the average satisfaction level dropping between all departments. We used this average satisfaction number to estimate whether an employee between years three and four will stay or leave the company. Our employee satisfaction model we created showed that:

a). a less than .5 satisfaction in Sales is the most likely to leave

b.) a less than .45 satisfaction in HR is the next most likely to leave

c.) a .1 or less in satisfaction in any department WILL leave the company.

Below is the procedure that we followed and our analytical findings that support our hypothesis. Finally, our report closes with our suggestions on how to manage the employee turnover and what can be done to keep the best employees.

Problem

Recently, a firm was noticing that many of their best and most experienced employees were departing the company. Unfortunately, the firm has collected little information as to why this would be the case. However, they have managed to work with their human resources department to piece together some common data on their employees (those who have left as well as those who have remained). Given the data at hand, the firm wishes to better understand the reasons why these specific employees have left while also potentially finding a way to predict what factors would most likely result in a company defection. The data are provided in a separate file with one tab for the actual information and the adjoining tab providing a description of each column of information.

Methodology

For the challenge at hand, we decided to analyze the data provided using the CRISP-DM (Cross-Industry Process for Data Mining) methodology. It is broken up into 6 main phases:

  1. Business Understanding – The Business Understanding phase focuses on understanding the objectives and requirements from a business perspective and helps formulate a data mining approach.
  2. Data Understanding - Initial data collection, getting familiar with the data (such as data visualization), and identifying data quality problems
  3. Data Preparation - Constructing the final data set from the initial raw data.
  4. Modeling - Using specified data modeling techniques
  5. Evaluation - Assessing the data mining results and modeling techniques
  6. Deployment - Determining an appropriate resolution/implementation strategy

Business Understanding

The firm is having a hard time handling losing many of its best employees. The data provided is not very comprehensive, but unless the data is inaccurate, there can be a lot of understanding to draw from it. By combining and analyzing the data from different perspectives, we hope to be able to provide some insight on potential warning signs of employee defection and offer ways to improve on employee morale so they are less likely to want to leave. Does the department an employee is in or the number of projects per employee have any effect on the change of employee turnover? We found the best way to approach the data was to start with the most obvious statistics such as current salary and tenure with the company and look for patterns that immediately jump out.

Data Understanding

From the perspective of trying to understand the data, the data is very straightforward. It was provided in a single table within a single Excel document. The columns and rows of data were labeled clearly and accurately, all of the data conforming to the same specifications. Similarly, the descriptions are thorough and leave little-to-no room for confusion. In a real-world scenario it is highly unlikely data will be provided in such pristine condition.

Data Preparation

The data contained within the Excel document is free from errors or blanks, which makes this phase our analysis extremely quick and essentially leaves us no room to do any usable data preparation.

Modeling Choice

We decided on a combination of XLMiner, SSPS Modeler, and Excel PivotTables for our modeling.

Evaluation

It was clear after some modeling and matching that there were some outstanding conclusions. We have made these explicit in the upcoming section “Quantitative Findings”.

Quantitative Findings

Employees Leave between 3 and 4 Years

While employees left at various points in their careers, there is an obvious spike in defectors across all departments between 3 and 4 years. This may be due to the project workload decreasing, or due to satisfaction level. In any case, looking at employees with low satisfaction before their 4 year mark would be a good indication of them considering leaving. After the 4th year, the factor that sticks out is now a high satisfaction rate that may also cause them to leave. This may be due to them searching for higher pay elsewhere in their respective fields.

Total of past leavers by Department

Row Labels / Count of
department / Sum of
leavers / Leavers by dept (%)
accounting / 767 / 204 / 27
hr / 739 / 215 / 29
IT / 1227 / 273 / 22
management / 630 / 91 / 14
marketing / 858 / 203 / 24
product_mng / 902 / 198 / 22
RandD / 787 / 121 / 15
sales / 4140 / 1014 / 24
support / 2229 / 555 / 25
technical / 2720 / 697 / 26
Grand Total / 14999 / 3571 / 24

By department, Sales overall had the most leavers, with Support and technical following close behind. That’s looking at the company as a whole. When we looked into the rate of leaving within respective departments, HR has the highest percent leave rate with respect to their department. We then decided to take a look at salary to see if this had an impact on leaving.

Past Leavers by Department and Salary

Count of department / Sum of left
Row Labels / high / low / medium / high / low / medium
accounting / 74 / 358 / 335 / 5 / 99 / 100
hr / 45 / 335 / 359 / 6 / 92 / 117
IT / 83 / 609 / 535 / 4 / 172 / 97
management / 225 / 180 / 225 / 1 / 59 / 31
marketing / 80 / 402 / 376 / 9 / 126 / 68
product_mng / 68 / 451 / 383 / 6 / 105 / 87
RandD / 51 / 364 / 372 / 4 / 55 / 62
sales / 269 / 2099 / 1772 / 14 / 697 / 303
support / 141 / 1146 / 942 / 8 / 389 / 158
technical / 201 / 1372 / 1147 / 25 / 378 / 294
Grand Total / 1237 / 7316 / 6446 / 82 / 2172 / 1317

Doing this heatmap helped show significant losses, but we needed to see which department lost the most respective to their department, so we calculated leave rates by salary and department.

Past Leaver Rate by Department for All Salaries

Department / High pay leave rate / low pay leave rate / medium pay leave rate
accounting / 0.067567568 / 0.276536313 / 0.298507463
hr / 0.133333333 / 0.274626866 / 0.325905292
IT / 0.048192771 / 0.282430213 / 0.181308411
management / 0.004444444 / 0.327777778 / 0.137777778
marketing / 0.1125 / 0.313432836 / 0.180851064
product_mng / 0.088235294 / 0.232815965 / 0.227154047
RandD / 0.078431373 / 0.151098901 / 0.166666667
sales / 0.05204461 / 0.332062887 / 0.170993228
support / 0.056737589 / 0.339441536 / 0.167728238
technical / 0.124378109 / 0.275510204 / 0.256320837

Here we see several things. In the high pay spectrum, we see HR had the highest leave-rate with respect to their department, with Marketing and Technical vying for second. In the lowest pay bracket, Support had the most quitters respective to their department, with with sales, marketing, and management following after. When we look to High salary leavers, HR had more with respect to their department while sales had more overall leavers. It may be the Sales department can handle it better due to the large amount of employees, and the ones listed above have to fill in the duties for missing coworkers and it may possibly affect satisfaction level and in turn cause more leavers. To see these side by side in chart, check below.

It is also worth noting that employee workload nearly doubles after year 2. Across the board, there is an average increase of about 80% in number of projects across each department during year 3 of employment, which could be an enormous stressor and may account for the significant decline in workplace satisfaction during that time period.

Predictions

The real question is “Where will employees leave next?”, and using predictive analytics through SPSS Modeler and XLMiner we hope to answer that.

The best predictor of leaving is satisfaction level, with second most important predictor being time spent with company.

That is overall. Given an employee has a satisfaction higher than 4.6, then time “spent in the company” is the next. However if they have lower than that, looking at number of projects can help decide.

Interestingly enough if one stayed with the company less than 4.5 years (matching our between 3 to 4 estimate) and meeting the satisfaction level of over .465, that employee is 59% likely to leave. However if we are just going off satisfaction level, 71.78 % of employees with a satisfaction higher than .465 will leave. The additional variables get clouded compared to satisfaction and time with the company, although the model can predict with accuracy who lines up with all other variables.

A prime example to look further down the tree is to see that those with a .1 satisfaction level will leave (no if, ands, or buts), however departments (which was even further down) did not do much to increase the leave rate compared to the other variables.

Deployment (Suggestions)

Since year 3 is the biggest culprit for a time in which an employee is most likely to leave, it should be targeted directly. Instead of dropping a significant portion of projects onto employees in their third year, we suggest that the workload is evened out over a three year period. Employees will notice the increase in pace of work, but to a lesser degree than just doubling the work at once. Such drastic change makes employees uneasy, especially when it influences them directly. It may also be wise to spread the work more equally between employees of all tenures. Newer employees are front-loaded with lots of projects while employees that have been around longer have significantly less projects during the year. Shifting a larger portion of the work (respective to current levels) onto the more tenured employees will make the workload disparity less obvious while also giving newer employees the opportunity to recognize that working within the company for a longer time has benefits other than regularly scheduled raises.

Since the amount of data allotted by the firm is fairly small, we also suggest more robust data cataloging. Just before employees have their annual reviews, we would like them to complete an anonymous survey that asks them for the same types of information provided to us in the Excel document: Number of years worked with the company, department, satisfaction level, number of projects completed that year, salary, and average monthly hours worked. On that same survey, employees could be asked their current views on their position within the company and any/all suggestions they have to give. Making the survey regular and anonymous should provide a reasonable starting place for pinpointing any issues that directly relate to employee turnover.

Bibliography

  1. “What is the CRISP-DM methodology?”Smart Vision - Europe. Accessed February 18, 2017. .
  2. Marban, Oscar, Gonzalo Mariscal, and Javier Segovia. “A Data Mining &Amp; Knowledge Discovery Process Model.”Data Mining and Knowledge Discovery in Real Life Applications, 2009.
  3. Boushey, Heather, and Sarah Jane Glynn. “There Are Significant Business Costs to Replacing Employees.”Center for American Progress, November 16, 2012.
  4. Pixabay, Analytics Image,