What New Data Scientists Need to Know for Their First Job

By J.T. Wolohan, Data Scientist, Booz Allen Hamilton

I’m a lead data scientist at the consulting firm and federal contracting giant Booz Allen Hamilton, a leader in the analytics services space. I oversee a small team of data scientists, some of them fresh out of master’s and undergraduate programs, on a variety of analytics platform and algorithm development projects: text mining, document similarity and retrieval, and operations, program, and business analytics.

As an alumnus of Syracuse University, I’m happy to be able to share some advice with soon-to-be graduates looking to enter a data science role in industry. Having had the opportunity to watch new data scientists grow, I want to share the three things that separate the great data scientists from the good data scientists I work with—customer thinking, data munging, and DevOps—as well as some tips about what I look for when interviewing data scientists.

Customer Thinking

The first and most important skill that separates the best data scientists I work with from everyone else is customer thinking. Data science is a highly complex, quickly evolving field that demands a range of technical knowledge, from coding to mathematics. That makes it hard for customers to ask for what they need. Many data science tasks take the form of “I’m having this problem and I’ve got some data that should be able to fix it; what can you do?” The great data scientists always keep their focus on what will have the most impact for their customers. Does the customer need better accuracy? Better recall? Faster runtime? Lots of options, or just one? Customers’ needs must be incorporated into every step of the data science process.
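
To make the accuracy-versus-recall question concrete, here is a small sketch on made-up data; the labels and both “models” are hypothetical. On an imbalanced problem, a model that never flags anything can post higher accuracy than one that actually catches the cases the customer cares about.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 marks the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Share of all predictions that are correct."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def recall(y_true, y_pred):
    """Share of actual positives that the model catches."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical labels: 95 negatives followed by 5 positives (say, 5 problem records in 100).
y_true = [0] * 95 + [1] * 5

# Model A never flags anything.
model_a = [0] * 100
# Model B flags 10 records: 6 false alarms, then 4 of the 5 real positives, missing the last one.
model_b = [0] * 89 + [1] * 6 + [1] * 4 + [0]

for name, preds in [("A", model_a), ("B", model_b)]:
    print(f"Model {name}: accuracy={accuracy(y_true, preds):.2f}, recall={recall(y_true, preds):.2f}")
# Model A: accuracy=0.95, recall=0.00
# Model B: accuracy=0.93, recall=0.80
```

Which number matters depends on what a missed positive costs the customer, and that is a question only the customer can answer.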

Data Munging

The second skill that separates the best data scientists I know is the ability to work with a wide variety of data formats and types. Projects that involve well-formatted data, never mind ones where you get programmatic access through an API, are few and far between. Far more common is the situation where the customer has a handful of Excel workbooks, knows about a database somewhere that they don’t have access to, and still relies on a piece of software from 2003 that no one maintains.

To succeed in this environment, data scientists must be able to bring together data from many formats. You should be able to rip data from PDFs, parse the XML in a convoluted document, unpack deeply nested JSON from an obscure API, scrape data from idiosyncratic web pages, and combine all of that information into a single format that hides the messiness.
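
As a minimal sketch of that last step, the snippet below pulls records from a nested JSON payload and from an XML document into one flat list of rows, using only Python's standard library (`json` and `xml.etree.ElementTree`). The payloads, the `flatten` and `xml_records` helpers, and the shared `id`/`name`/`city` schema are all hypothetical; PDFs, Excel workbooks, and scraping would need additional libraries not shown here.

```python
import json
import xml.etree.ElementTree as ET


def flatten(obj, prefix=""):
    """Recursively flatten nested dicts/lists into one dict with dotted keys."""
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            flat.update(flatten(value, f"{prefix}{index}."))
    else:
        flat[prefix[:-1]] = obj  # drop the trailing dot from the accumulated key
    return flat


def xml_records(xml_text, record_tag):
    """Turn each <record_tag> element into a dict mapping child tag -> stripped text."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "").strip() for child in element}
        for element in root.iter(record_tag)
    ]


# Hypothetical inputs: a nested API response and an XML export describing the same kind of record.
json_text = '{"results": [{"id": 1, "owner": {"name": "Ada", "office": {"city": "Syracuse"}}}]}'
xml_text = """<contracts>
  <contract><id>2</id><name>Grace</name><city>Arlington</city></contract>
</contracts>"""

# Map the JSON source's dotted field names onto the shared (hypothetical) schema.
JSON_FIELDS = {"id": "id", "owner.name": "name", "owner.office.city": "city"}

rows = []
for record in json.loads(json_text)["results"]:
    flat = flatten(record)
    rows.append({target: flat.get(source) for source, target in JSON_FIELDS.items()})
for record in xml_records(xml_text, "contract"):
    # XML text is always a string, so coerce the id to match the JSON source.
    rows.append({"id": int(record["id"]), "name": record["name"], "city": record["city"]})

print(rows)
# [{'id': 1, 'name': 'Ada', 'city': 'Syracuse'}, {'id': 2, 'name': 'Grace', 'city': 'Arlington'}]
```

Real sources need more care (missing fields, type coercion, repeated elements), but flattening each source and mapping it onto one shared schema is a reasonable place to start.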

DevOps

The final skill that separates great data scientists is an awareness of DevOps. Now, data scientists by no means need to be DevOps or continuous delivery experts, but analytics have to be deployed to the customer somehow, and DevOps is how. Being able to wrap your models up as APIs in a deployable Docker container goes a long way.
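
A minimal sketch of a model wrapped up as an API, assuming Python's standard-library `http.server` and a stand-in linear scoring function in place of a real trained model. The `/predict` route, port 8080, the `--serve` flag, and the feature names are illustrative choices, and the Dockerfile that would package this script is not shown. Binding to `0.0.0.0` lets the server accept connections from outside a container.

```python
import json
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Stand-in for a trained model (hypothetical): a fixed linear score."""
    weights = {"x1": 0.4, "x2": -0.2}
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


class PredictHandler(BaseHTTPRequestHandler):
    """Serves POST /predict: JSON feature dict in, JSON score out."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404, "Unknown route")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            features = json.loads(self.rfile.read(length))
            body = json.dumps({"score": predict(features)}).encode("utf-8")
            status = 200
        except (ValueError, AttributeError, TypeError) as exc:
            # Malformed JSON, a non-object payload, or non-numeric feature values.
            body = json.dumps({"error": str(exc)}).encode("utf-8")
            status = 400
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="0.0.0.0", port=8080):
    """Bind to all interfaces so the API is reachable from outside a container."""
    HTTPServer((host, port), PredictHandler).serve_forever()


if __name__ == "__main__" and "--serve" in sys.argv:
    serve()
```

Started with something like `python app.py --serve` (the file name is arbitrary), a client would POST a JSON object such as `{"x1": 2.0, "x2": 1.0}` to `/predict` and get back `{"score": ...}`. The Python documentation warns that `http.server` is not recommended for production, so a real deployment would typically put the model behind a proper web framework and server instead.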

Of course, this means you’ll need to know how to build APIs too, and it will help to understand microservices. There are a lot of ideas to unpack there, but the key is this: data science models and trained machine learning algorithms get deployed as pieces of a larger system. Each piece needs to be self-contained, so that it can be swapped out easily when it needs to be, and so that the model itself doesn’t need to be redesigned when the parts around it change.

DevOps, containers, APIs, and microservices are the modern way of handling all of that.
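
The “self-contained, swappable piece” idea can be sketched in-process with Python's `typing.Protocol`: the surrounding code depends only on a small contract, so one model can replace another without touching it. `Model`, `LinearModel`, `ThresholdModel`, and `ScoringService` are hypothetical names. In a deployed microservice, the contract would be the API's request and response format rather than a Python class.

```python
from typing import Iterable, List, Mapping, Protocol


class Model(Protocol):
    """Contract every model component must satisfy (hypothetical interface)."""

    def predict(self, features: Mapping[str, float]) -> float: ...


class LinearModel:
    """One interchangeable component: a weighted sum of features."""

    def __init__(self, weights: Mapping[str, float]) -> None:
        self.weights = dict(weights)

    def predict(self, features: Mapping[str, float]) -> float:
        return sum(self.weights.get(name, 0.0) * value for name, value in features.items())


class ThresholdModel:
    """Another interchangeable component: 1.0 if one feature exceeds a cutoff, else 0.0."""

    def __init__(self, feature: str, cutoff: float) -> None:
        self.feature = feature
        self.cutoff = cutoff

    def predict(self, features: Mapping[str, float]) -> float:
        return 1.0 if features.get(self.feature, 0.0) > self.cutoff else 0.0


class ScoringService:
    """The rest of the system: it knows only the Model contract, not any concrete class."""

    def __init__(self, model: Model) -> None:
        self.model = model

    def swap_model(self, model: Model) -> None:
        # Replacing the component requires no change to this class or its callers.
        self.model = model

    def score(self, records: Iterable[Mapping[str, float]]) -> List[float]:
        return [self.model.predict(record) for record in records]


records = [{"x1": 2.0, "x2": 1.0}, {"x1": 0.0, "x2": 3.0}]
service = ScoringService(LinearModel({"x1": 0.5, "x2": 0.25}))
print(service.score(records))  # [1.25, 0.75]
service.swap_model(ThresholdModel("x1", 1.0))
print(service.score(records))  # [1.0, 0.0]
```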

What do Data Science Interviewers Look For?

I’ve covered the three skills that separate great new data scientists from the rest, but even if you have every skill you need to succeed on the job, you will still need to ace an interview (or a few) to land it. Here are three tips from my experience interviewing young data scientists.

First, tech skills matter, so make sure you know the technologies listed in the job description. That said, sometimes you’ll only need knowledge of one type of technology (e.g., Hadoop OR Spark for Big Data analytics). This is especially true with less-technical groups. Also, feel free to say you don’t know the answer to a question. Technology is complex, and you won’t be an expert in everything.

Read the source article in InfoSpace, the blog of the Syracuse University iSchool.