Image Source: analyticsforinsights.wordpress.com

How to Win in the Data Science World

Noemi Ramiro
Published in The Startup
6 min read · Sep 12, 2020

In 2019, LinkedIn ranked “Data Scientist” as the most promising profession in the US, with a 56% increase in job demand, and the role has topped Glassdoor’s list of the best jobs in America for three years straight.

Sure, the COVID pandemic might have heavily affected the job landscape, but in the midst of businesses suffering enormous cuts lies a more pressing need for a data-driven culture. Having a strong data capability can reinforce better decision-making, with business goals and targets monitored and optimized.

For me and thousands of other aspiring data scientists, knowing what it takes to break into the field matters even more now, given the growing competition. So I ask: what do we need to do to stand out?

To answer this, I looked at the 2019 Kaggle Machine Learning and Data Science Survey results. Each year the “world’s largest data science community” conducts a survey among its users to obtain insights about the state of the Data Science and Machine Learning Industry. Looking at how Kagglers’ practices and characteristics influence salary should be a good starting point to answer my question.

I focused on the 2,013 data science professionals in the USA earning more than $30K annually, knowing that a lot of variation in salary can occur across geographies. Also, I excluded the professionals earning less than $30K to capture only those likely working full-time.
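The filter described above can be sketched in a few lines of Python. This is only an illustration, not the code behind this article: the field names `country` and `salary`, the bracket label formats, and the sample rows are all assumptions made up for the example.

```python
import re

def bracket_floor(label):
    """Return the lower bound, in USD, of a salary bracket label such as '100,000-124,999'."""
    digits = re.search(r"\d[\d,]*", label)
    if digits is None:
        raise ValueError(f"no number found in {label!r}")
    return int(digits.group().replace(",", ""))

def keep_us_full_time(rows, min_salary=30_000):
    """Keep US respondents whose salary bracket starts at or above min_salary."""
    return [
        row for row in rows
        if row["country"] == "United States of America"
        and bracket_floor(row["salary"]) >= min_salary
    ]

# Made-up rows for illustration only; these are not survey data.
sample = [
    {"country": "United States of America", "salary": "100,000-124,999"},
    {"country": "United States of America", "salary": "$0-999"},
    {"country": "India", "salary": "150,000-199,999"},
    {"country": "United States of America", "salary": "> $500,000"},
]

filtered = keep_us_full_time(sample)
print(len(filtered))
```

Filtering on the lower bound of each bracket is one way to work with salaries that are reported as ranges rather than exact figures.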

Most of the data science professionals have salaries within the $100K-200K range. This is way above the US median salary of $40K for 2019.

Salary Distribution of US Data Science Professionals in Kaggle (<$30K removed)

Are there major differences in salary among the different data science roles?

The sample is a mix of various data science professionals. The “Data Scientist” role tops the list (34%), followed by “Software Engineers” (13%) and “Data Analysts” (12%).

What are the job roles that have higher pay potential? I intend to answer this visually using heatmaps (and I will do this for the most part of this article). I figured that it’s a better way to show the distribution by role given that salary data was presented in ranges (rather than actual numbers).

From the heatmap we can see an obvious gap between Data Scientists and Data Analysts: the former are concentrated in the $100K-200K range, the latter somewhere within $60K-125K. It seems that data scientists are paid much more than analysts.

Other professions such as Statisticians and Database Engineers tend to have more variation in pay, while Data Engineers are more concentrated in the $120K-125K range.

Note: Heatmaps used row percentages to account for differences in sizes among data roles.
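To make the row-percentage idea concrete, here is a small sketch in plain Python. The role names, bracket labels, and responses are invented for illustration, and `row_percentages` is a hypothetical helper, not the function used for the heatmaps in this article.

```python
from collections import Counter, defaultdict

def row_percentages(pairs):
    """Turn (role, salary_bracket) pairs into per-role percentage distributions."""
    counts = defaultdict(Counter)
    for role, bracket in pairs:
        counts[role][bracket] += 1

    table = {}
    for role, bracket_counts in counts.items():
        total = sum(bracket_counts.values())
        # Each row is divided by its own total, so every role sums to 100%.
        table[role] = {
            bracket: 100 * n / total for bracket, n in bracket_counts.items()
        }
    return table

# Made-up responses for illustration only; these are not survey data.
responses = [
    ("Data Scientist", "100-125K"),
    ("Data Scientist", "100-125K"),
    ("Data Scientist", "150-200K"),
    ("Data Scientist", "150-200K"),
    ("Data Analyst", "60-70K"),
    ("Data Analyst", "100-125K"),
]

table = row_percentages(responses)
for role, shares in table.items():
    print(role, shares)
```

Because each role is normalized by its own total, a role with four respondents and a role with two can be compared on the same color scale. With pandas, `pandas.crosstab(roles, brackets, normalize="index")` produces the same kind of table as row fractions.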

What are the essential technical skills to do well in data science?

Do great data scientists need to be good at coding? A Glassdoor study suggests that it’s worthwhile because 9 out of 10 data scientist positions require at least one of Python, R, or SQL as a skill.

In this particular survey, almost all Kagglers use at least one programming language for data science, and only 0.6% do not code at all. This isn’t surprising given the nature of Kaggle, where notebooks are the main way to share content.

But which particular programming languages are the most important to learn? The survey says that Python is the most popular with 30% using it on a regular basis. It is then followed by SQL (22%) and R (15%).

Looking at the salary heatmap, we can see that while all the programming languages tend to bunch up in the $100K-200K range, software engineering-oriented languages such as Java, C++, and C are more densely represented in the $150K-200K range. Other noteworthy languages that relate to higher pay are MATLAB, TypeScript, and Bash.

On average, Kagglers use 2–3 programming languages on a regular basis. Does the number of languages used matter? Plotting the number of languages used against salary range, we see that the number of languages tends to increase as pay increases, up to the $125K-150K bracket. So yes, it may be worth learning more than one.
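A comparison like this boils down to counting the languages each respondent lists and averaging that count within each salary bracket. The sketch below shows the arithmetic on invented records; the field names `salary` and `languages` and the helper `mean_languages_by_bracket` are assumptions for illustration, not the survey's actual schema or this article's code.

```python
from collections import defaultdict
from statistics import mean

def mean_languages_by_bracket(respondents):
    """Average the number of languages used within each salary bracket."""
    counts = defaultdict(list)
    for person in respondents:
        counts[person["salary"]].append(len(person["languages"]))
    return {bracket: mean(values) for bracket, values in counts.items()}

# Made-up respondents for illustration only; these are not survey data.
respondents = [
    {"salary": "70-80K", "languages": ["Python"]},
    {"salary": "70-80K", "languages": ["Python", "SQL"]},
    {"salary": "125-150K", "languages": ["Python", "SQL", "Bash"]},
    {"salary": "125-150K", "languages": ["Python", "R", "SQL"]},
]

averages = mean_languages_by_bracket(respondents)
for bracket, avg in averages.items():
    print(bracket, avg)
```

An average per bracket shows the trend but hides the spread, so a box plot or the full distribution per bracket is worth checking before reading too much into small differences.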

Apart from programming, what other skills matter? From the salary heatmap we see a strong case for learning cloud-based data software and APIs. Those who use them appear to have higher earning potential: they are most heavily concentrated at $150K-200K, with a notable cluster of professionals earning more than $300K.

Does educational background play a huge part?

Data science professionals tend to be a highly educated group, with 72% having either a Master’s Degree or a PhD. The heatmaps do not really show anything remarkable, except that Professional Degrees have a high concentration in the $150K-250K bracket. This group only constitutes 1.3% of the sample, hence I would say this is inconclusive.

How much does continuous learning on online platforms help?

Aside from formal education, upskilling can be done through tons of online content, such as Massive Open Online Courses (MOOCs) and online bootcamps. The majority (83%) of Kagglers use these platforms to learn data science. Coursera is by far the most popular, followed by DataCamp, Udemy, and Kaggle Courses.

Interestingly, Fast.ai users skew heavily toward the higher income levels, particularly $125K-150K. DataQuest users, on the other hand, are spread much more across the lower and middle income levels, which suggests that beginners tend to use this site more.

Apart from MOOCs and courses, data science media can also be good sources of skills and industry knowledge. Blogs such as Medium and Analytics Vidhya are the most popular, followed by Kaggle.

No strong pattern can be observed in the salary heatmap: most sources are bunched within the $100K-200K range. Curiously, Hacker News appears to have more followers on the higher end, with $150K-200K salaries.

Key takeaways:

To win in the data science field (AND if you define winning as having a high pay):

  1. Code! Learning more languages will probably help. Apart from Python and R, consider adding non-data science languages such as C++, Java, and TypeScript to your toolkit.
  2. Cloud-based technologies are worth learning. Get ready to explore those AWS, GCP, and Azure platforms for big data.
  3. Continuously upskill and update through MOOCs and online courses, and through media such as blogs and technology news.

A few disclaimers:

  • While technical skills are required to succeed in data science, soft skills are definitely indispensable. Unfortunately we cannot get a measure of that using just this dataset.
  • The Kaggle community is not representative of the entire data science industry, and might be geared towards a more specialized sample of the population (i.e. machine learning enthusiasts). Also, this is US-focused, and it will be interesting to see what the results look like across various countries.
  • While these are primarily visualizations driving storytelling, it will be interesting to go deeper and build a predictive model to quantify the drivers of salary.

Thanks for reading! All the code for the analysis and visualizations can be accessed through my GitHub.
