Machine Learning Engineer vs. Data Scientist—Who Does What?


The machine learning engineer and data scientist roles are both relatively new, and the line between them can seem blurry. Look closely at what each title actually means, though, and the distinctions become clear.

At a high level, we’re talking about scientists and engineers. While a scientist needs to fully understand the, well, science behind their work, an engineer is tasked with building something.

But before we go any further, let’s address the difference between machine learning and data science.

It starts with a solid definition of artificial intelligence. John McCarthy coined the term in the 1955 proposal for the 1956 Dartmouth workshop to discuss and develop the concept of “thinking machines,” which included the following:

  • Automata theory
  • Complex information processing
  • Cybernetics

Roughly six decades later, artificial intelligence is regarded as a subfield of computer science concerned with building systems that perform tasks that would typically require human intelligence. These include:

  • Decision-making
  • Speech recognition
  • Translation between languages
  • Visual perception

Machine learning is a branch of artificial intelligence in which data-driven algorithms let software improve its predictions by learning from examples rather than by being explicitly programmed with rules.

The basic premise is to build algorithms that take input data, use statistical models to predict an output, and update those predictions as new data becomes available.
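To make that premise concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any particular product works: the class name `OnlineLinearModel`, the learning rate, and the data stream are all hypothetical. The model predicts an output from an input and adjusts itself each time a new example arrives.

```python
class OnlineLinearModel:
    """Tiny one-feature linear model, y ~ w * x + b, trained one example at a time."""

    def __init__(self, learning_rate=0.05):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def update(self, x, y):
        # One stochastic gradient descent step on the squared prediction error.
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error


model = OnlineLinearModel()

# Hypothetical data stream that follows y = 2x + 1 (made up for illustration).
stream = [(i / 10, 2 * (i / 10) + 1) for i in range(11)]

# Replay the stream many times to mimic examples arriving over time.
for _ in range(2000):
    for x, y in stream:
        model.update(x, y)

print(round(model.w, 2), round(model.b, 2), round(model.predict(0.5), 2))
```

No rule such as "multiply by two and add one" appears anywhere in the code. The model recovers that relationship from the examples alone, which is the sense in which machine learning avoids explicit programming.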

The processes involved have a lot in common with predictive modeling and data mining: all of them require searching through data to identify patterns and adjusting the program accordingly.

Most of us have encountered machine learning in one form or another. If you have shopped on Amazon or watched something on Netflix, the personalized product or movie recommendations you saw were machine learning in action.

Data science can be described as the practice of describing, predicting, and drawing causal inferences from both structured and unstructured data. This discipline helps individuals and enterprises make better business decisions.

It’s also a study of where data originates, what it represents, and how it could be transformed into a valuable resource. To achieve the latter, a massive amount of data has to be mined to identify patterns to help businesses:

  • Gain a competitive advantage
  • Identify new market opportunities
  • Increase efficiencies
  • Rein in costs

The field of data science draws on disciplines such as mathematics, statistics, and computer science, and incorporates techniques like data mining, cluster analysis, visualization, and, yes, machine learning.

Having said all of that, this post aims to answer the following questions:

  • Machine learning engineer vs. data scientist: what degree do they need?
  • Machine learning engineer vs. data scientist: what do they actually do?
  • Machine learning engineer vs. data scientist: what’s the average salary?

Machine Learning Engineer vs. Data Scientist: What They Do

As mentioned above, there are some similarities when it comes to the roles of machine learning engineers and data scientists.

However, if you look at the two roles as members of the same team, a data scientist does the statistical analysis required to determine which machine learning approach to use, then models the algorithm and prototypes it for testing. At that point, a machine learning engineer takes the prototyped model and makes it work in a production environment at scale.

Going back to the scientist vs. engineer split, a machine learning engineer isn’t necessarily expected to understand the predictive models and their underlying mathematics the way a data scientist is. A machine learning engineer is, however, expected to master the software tools that make these models usable.

What Does a Machine Learning Engineer Do?

Machine learning engineers sit at the intersection of software engineering and data science. They use big data tools and programming frameworks to turn the raw data coming out of data pipelines into inputs for data science models that can scale as needed.

Machine learning engineers feed data into models defined by data scientists. They’re also responsible for taking theoretical data science models and helping scale them out to production-level models that can handle terabytes of real-time data.
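As a rough illustration of what that hand-off can involve, the Python sketch below wraps a prototype model in a function that parses a request, validates it, and returns a response instead of crashing on bad input. Everything here is an assumption made for the example: `prototype_model`, `handle_request`, the feature names, and the weights are hypothetical and do not come from any real system. A real deployment would add an actual web framework, logging, and monitoring.

```python
import json

REQUIRED_FIELDS = ("visits", "purchases")  # hypothetical feature names


def prototype_model(features):
    """Stand-in for a data scientist's prototype: a hand-set linear score."""
    weights = {"visits": 0.3, "purchases": 0.7}  # made-up weights, not a trained model
    return sum(weights[name] * features[name] for name in REQUIRED_FIELDS)


def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def handle_request(body):
    """Parse a JSON request, validate it, and return a JSON response string."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return json.dumps({"error": "request body is not valid JSON"})

    if not isinstance(payload, dict):
        return json.dumps({"error": "request body must be a JSON object"})

    invalid = [name for name in REQUIRED_FIELDS if not is_number(payload.get(name))]
    if invalid:
        return json.dumps({"error": "missing or non-numeric fields: " + ", ".join(invalid)})

    return json.dumps({"score": round(prototype_model(payload), 3)})


print(handle_request('{"visits": 10, "purchases": 2}'))
print(handle_request('{"visits": 10}'))
print(handle_request("not json"))
```

The scoring logic is only a few lines. Most of the code handles malformed or incomplete requests, which is typical of the gap between a prototype and a production service.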

Machine learning engineers also build programs that control computers and robots. The algorithms they develop enable a machine to identify patterns in the data it is given and to get better at a task, such as interpreting commands, without being explicitly programmed for each case.

What Does a Data Scientist Do?

When a business needs to answer a question or solve a problem, it turns to a data scientist to gather and process the data and derive valuable insights from it. Once hired by an organization, data scientists explore all aspects of the business and develop programs, using programming languages like Python, R, or Java, to perform robust analytics.

They will also use online experiments along with other methods to help businesses achieve sustainable growth. Additionally, they can develop personalized data products to help companies better understand themselves and their customers to make better business decisions.
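One common form of online experiment is an A/B test, in which two versions of a page are shown to different visitors and their conversion rates are compared. The Python sketch below shows one standard way to analyze such a test, a two-proportion z-test. The function name and the visitor counts are hypothetical, invented purely for illustration, and a real analysis would also consider sample-size planning and repeated looks at the data.

```python
import math


def two_proportion_z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Return (z, two-sided p-value) for the difference between two conversion rates."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_error
    # Standard normal CDF expressed through the error function.
    normal_cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return z, p_value


# Hypothetical experiment: 1,000 visitors per variant (made-up numbers).
z, p = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value suggests the difference between the variants is unlikely to be due to chance alone, which is the kind of evidence a data scientist would bring to a business decision.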

As previously mentioned, data scientists focus on the statistical analysis and research needed to determine which machine learning approach to use, then model the algorithm and prototype it for testing.

What Do the Experts Say?

Springboard recently asked two working professionals for their definitions of machine learning engineer vs. data scientist.

Mansha Mahtani, a data scientist at Instagram, said:

“Given both professions are relatively new, there tends to be a little bit of fluidity on how you define what a machine learning engineer is and what a data scientist is. My experience has been that machine learning engineers tend to write production-level code. For example, if you were a machine learning engineer creating a product to give recommendations to the user, you’d be actually writing live code that would eventually reach your user. The data scientist would be probably part of that process—maybe helping the machine learning engineer determine what are the features that go into that model—but usually data scientists tend to be a little bit more ad hoc to drive a business decision as opposed to writing production-level code.”

Shubhankar Jain, a machine learning engineer at SurveyMonkey, said:

“A data scientist today would primarily be responsible for translating this business problem of, for example, we want to figure out what product we should sell next to our customers if they’ve already bought a product from us. And translating that business problem into more of a technical model and being able to then output a model that can take in a certain set of attributes about a customer and then spit out some sort of result. An ML engineer would probably then take that model that this data scientist developed and integrate it in with the rest of the company’s platform—and that could involve building, say, an API around this model so that it can be served and consumed, and then being able to maintain the integrity and quality of this model so that it continues to serve really accurate predictions.”

Read the source post on the Springboard Blog.