Here Are the Most in Demand Skills for Data Scientists

I scoured job listing websites to find which skills are most in demand for data scientists. I looked at general data science skills and at specific languages and tools separately. I searched job listings on LinkedIn, Indeed, SimplyHired, Monster, and AngelList on October 10, 2018.

I read through many job listings and surveys to find the most common skills. I did not compare terms like “management” because they are used in so many different contexts in job listings.

All searches were performed for the United States with “data scientist” “[keyword]”. Using exact match search reduced the number of results. However, this method ensured the results were relevant for data scientist positions and affected all search terms similarly.
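
As a rough sketch, the exact-match query pattern described above could be assembled like this. The function name `build_query` is hypothetical and only illustrates the quoting; it is not code from the original analysis.

```python
def build_query(keyword, role="data scientist"):
    """Return an exact-match search string: the role and the keyword, each in quotes."""
    # Quoting each phrase asks the search engine for an exact match of that phrase.
    return f'"{role}" "{keyword}"'


# Example: the query for TensorFlow.
tensorflow_query = build_query("TensorFlow")
print(tensorflow_query)
```

The same string would be pasted into each site's search box with the location set to the United States.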

AngelList provides the number of companies with data scientist listings rather than the number of positions. I excluded AngelList from both analyses because its search algorithm seems to operate as an OR type of logical search, without the ability to change it to an AND. AngelList works fine if you are looking for “data scientist” “TensorFlow”, because TensorFlow is only going to be found in data scientist positions. But if your keywords are “data scientist” “react.js”, it returns far too many companies whose listings are not for data scientists.
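
The difference between the two kinds of search can be sketched with a toy example. The listings below are invented purely for illustration, and `match_and` and `match_or` are hypothetical helpers, not AngelList's actual search logic.

```python
# Invented listings: each set holds the terms found in one posting.
listings = [
    {"data scientist", "tensorflow"},
    {"data scientist", "python"},
    {"front-end developer", "react.js"},
    {"full-stack developer", "react.js"},
]


def match_and(postings, terms):
    """Keep postings that contain every search term."""
    return [p for p in postings if all(t in p for t in terms)]


def match_or(postings, terms):
    """Keep postings that contain at least one search term."""
    return [p for p in postings if any(t in p for t in terms)]


terms = ["data scientist", "react.js"]
react_and = match_and(listings, terms)
react_or = match_or(listings, terms)

# Count OR results that are not data scientist postings at all.
irrelevant = [p for p in react_or if "data scientist" not in p]
print(len(react_and), len(react_or), len(irrelevant))
```

In this toy data the AND search finds no posting that mentions both terms, while the OR search returns every posting, including the two developer roles that have nothing to do with data science.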

Glassdoor was also excluded from my analyses. The site stated that it had 26,263 “data scientist” jobs in the US, but it would show me no more than 900 jobs. Additionally, it seems highly unlikely that it would have more than three times as many data scientist job listings as any other major platform.

Terms with over 400 listings on LinkedIn for general skills and over 200 listings for specific technologies were included in the final analyses. There was certainly some cross-posting, with the same job appearing on more than one site. The results are recorded in this Google Sheet.
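
The inclusion rule above amounts to a simple filter on the LinkedIn counts. Here is a minimal sketch; the term names and counts are placeholders invented for illustration, and `terms_over_threshold` is a hypothetical helper.

```python
GENERAL_SKILL_MIN = 400  # general skills need more than 400 LinkedIn listings
TECHNOLOGY_MIN = 200     # specific technologies need more than 200


def terms_over_threshold(linkedin_counts, minimum):
    """Return only the terms whose LinkedIn listing count exceeds the minimum."""
    return {term: n for term, n in linkedin_counts.items() if n > minimum}


# Placeholder counts, not the article's data.
general_counts = {"skill_a": 900, "skill_b": 400, "skill_c": 120}
technology_counts = {"tool_a": 650, "tool_b": 201, "tool_c": 200}

kept_general = terms_over_threshold(general_counts, GENERAL_SKILL_MIN)
kept_technology = terms_over_threshold(technology_counts, TECHNOLOGY_MIN)
print(kept_general, kept_technology)
```

The comparison is strict, so a term with exactly 400 (or 200) listings is dropped, matching the "over" wording.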

I downloaded .csv files and imported them into JupyterLab. I then computed the percentage occurrences and averaged them across the job listing websites.
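
A minimal sketch of that calculation follows, using only the Python standard library. It assumes each percentage is a keyword's listing count divided by that site's total “data scientist” count, which is my reading of the method rather than something spelled out above. The CSV layout, the placeholder numbers, and the function name `average_percentages` are all invented for illustration.

```python
import csv
import io
from statistics import mean

# Placeholder counts, invented for illustration only (not the article's data).
# The "data scientist" row is assumed to hold each site's total listing count.
SAMPLE_CSV = """keyword,LinkedIn,Indeed,SimplyHired,Monster
data scientist,1000,800,500,400
tool_a,700,600,350,300
tool_b,500,440,250,180
"""


def average_percentages(csv_text, total_row="data scientist"):
    """Return each keyword's share of listings per site, averaged across sites."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    sites = [col for col in rows[0] if col != "keyword"]
    totals = next(r for r in rows if r["keyword"] == total_row)

    averages = {}
    for row in rows:
        if row["keyword"] == total_row:
            continue
        # Percentage of that site's data scientist listings mentioning the keyword.
        percentages = [100 * int(row[s]) / int(totals[s]) for s in sites]
        averages[row["keyword"]] = round(mean(percentages), 1)
    return averages


averages = average_percentages(SAMPLE_CSV)
print(averages)
```

Averaging percentages rather than raw counts keeps a large site from dominating the result, since each site contributes equally regardless of how many listings it has.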

I also compared the software results to a Glassdoor study of its data scientist job listings from the first half of 2017. Combined with information from KDNuggets’ usage survey, it appears some skills are becoming more important and others are losing importance. We’ll get to those in a bit.

See my Kaggle Kernel for interactive charts and additional analyses. I used Plotly for the visualizations. Using Plotly with JupyterLab takes a little wrangling as of this writing; instructions are at the end of my Kaggle Kernel and in Plotly’s docs.

Here’s the chart of the most frequent general data scientist skills sought by employers.

The results show that analysis and machine learning are at the heart of data scientist jobs. Gleaning insights from data is a primary function of data science. Machine learning is all about creating systems to predict performance, and it is very much in demand.

Data science requires statistics and computer science skills, which is no surprise. Statistics, computer science, and mathematics are also college majors, which probably boosts how often they appear in listings.

It is interesting that communication is mentioned in nearly half of job listings. Data scientists need to be able to communicate insights and work with others.

AI and deep learning don’t show up as frequently as some other terms. However, the terms overlap heavily: deep learning is a subset of machine learning, which is itself a subset of AI. Deep learning is being used for more and more of the machine learning tasks that other algorithms were used for previously. For example, the best machine learning algorithms for most natural language processing problems are now deep learning algorithms. I expect deep learning skills will be sought more explicitly in the future and that machine learning will become more synonymous with deep learning.

Which specific software tools for data scientists are employers looking for? Let’s tackle that question next.

Below are the top 20 specific languages, libraries, and tech tools employers are looking for data scientists to have experience with.

Let’s briefly look at the most common tech skills.

Read the source article at Towards Data Science.