An In-Depth Analysis of the Data Scientist Job Description
No one needs to tell you that big data is the future. When you see headlines suggesting data has surpassed oil in value and encounter once mind-boggling units of data like zettabytes and yottabytes being mentioned casually, the potential value contained in massive amounts of data becomes pretty clear.
You may wonder how you can be a part of that future. As a new data economy emerges, so do numerous data-focused occupations, with data scientist being one of the most eye-catching. So what goes into this role? We analyzed thousands of data scientist job postings and asked the experts to weigh in so you could get a better picture of a data scientist’s job description.
The role of data in business
Using data in a business setting is not a new concept. Even the crudest of ancient merchants could see the value in keeping track of their inventory and taking note of what was sold. Over the years, we’ve learned to keep track of important factors that may influence a business, and that information is used to identify trends and make informed decisions.
What’s changed is that we now have orders of magnitude more data. Every time you swipe your credit card, text your mom, or post on social media, you leave a digital crumb trail that others can follow. Couple that with massively expanded computing power and we’re able to use advanced mathematical analysis to identify trends and connections that would likely otherwise go unnoticed. So how does this most commonly take form for businesses? Let’s take a look at three common applications for data science in business.
The first application is advertising. This focus area is all about understanding the customer and delivering personalized content. Imagine what you could do with data that showed the interests, spending habits and favorite social media platforms of all your customers. If you’ve ever seen a Facebook ad for Starbucks after searching for the nearest location, you’ve seen data-driven advertising in action.
Data science allows for even more far-flung connections to be made. For example, if you sell pizzas and know that single parents from a certain region of the country are more likely to order pepperoni pizza on rainy fall days when they’ve traveled 75 miles or more, you may be able to push a 10 percent discount coupon to previous customers who fit the profile. That example is a stretch, but being able to connect seemingly unrelated things can produce more effective advertising efforts.
The second application, risk management, encompasses a broad range of business activities. Data scientists can be employed to help develop methods for distilling massive amounts of market data to identify risks. Are weather patterns suggesting there’s a good chance your supply chain will be disrupted? What will a dip in iron mining probably mean for the bottom line?
Banking and financial institutions also employ data scientists for risk management work—both for informing their potential investment decisions and for protecting their customers by using algorithms to detect and flag potentially fraudulent purchases.
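To make the idea of flagging unusual purchases concrete, here is a minimal sketch. It assumes a very simple rule: a purchase is suspicious if it sits far from a customer's usual spending, measured in standard deviations. The function name, the threshold and the sample amounts are all hypothetical, and real fraud systems weigh many more signals than the amount alone.

```python
import statistics


def flag_unusual(history, new_amounts, threshold=3.0):
    """Return the new amounts that sit far from the customer's usual spending.

    `threshold` is an assumed cutoff, expressed in standard deviations.
    """
    mean = statistics.mean(history)
    spread = statistics.stdev(history)
    if spread == 0:
        # Every past purchase was identical, so any different amount stands out.
        return [amount for amount in new_amounts if amount != mean]
    return [amount for amount in new_amounts if abs(amount - mean) / spread > threshold]


# Hypothetical purchase history for one customer, in dollars.
past_purchases = [12.5, 9.99, 15.0, 11.25, 13.4, 10.8]
flagged = flag_unusual(past_purchases, [14.0, 950.0])
print(flagged)
```

With these sample numbers, the 14.00 purchase looks ordinary while the 950.00 purchase is flagged for review.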
Process improvement touches every industry. From shrimp farming to fashion week, data scientists are not only diagnosing issues but offering solutions. When managers waste valuable hours on scheduling, a data scientist can step in to provide automated scheduling systems. When quality candidates are scarce, algorithms easily scan hundreds of applicants in the time it would take you to review one. These and hundreds of other innovations prove that data science has already become indispensable to business.
A typical data scientist job description
Whether they are solving a problem or improving a process, data scientists must apply some common methods for manipulating data. One of the best ways to understand what they actually do is to walk through a typical job in data science.
Let’s say you work for a movie streaming service. You are tasked with creating a suggestion system in which users are recommended movies they might like. Sounds simple enough, right?
Collecting and processing
Before you even begin to interpret data, you’ve got to find it. It would be nice if you could magically download everything in a color-coordinated spreadsheet, but that’s just not how raw data works. The movie company will likely already have the information you need, but many data scientists must utilize public APIs, clickstream capture, web scraping, or third-party vendors just to get their data.
Once you have all of your files, they will probably have missing values, misspellings, duplications and other improperly parsed values—which means before you can do anything with the data, it must be cleaned. Movie titles and descriptions, usernames and information must be spelled right, organized in the proper columns or rows and free of error. This will likely be accomplished by some combination of cleaning software and manual sorting.
It might sound a bit tedious, but the process can be enjoyable for some. “It’s kind of magical to be able to take raw data that constitutes a deluge and make it understandable to everyone through mining and cleaning,” says Michal Dominik, manager of data science and analytics at Zety. “I found it surprising that I enjoyed transforming a dataset from an untidy to a tidy state more than anything else.”
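As a rough illustration of that untidy-to-tidy step, the sketch below cleans a handful of invented movie records using only the Python standard library. The field names, the sample rows and the cleaning rules are assumptions made for this example; a real project would likely lean on dedicated data-cleaning tools and some manual review.

```python
# Hypothetical raw rows; the fields and values are invented for illustration.
raw_rows = [
    {"title": "  raiders of the lost ark ", "year": "1981"},
    {"title": "Raiders of the Lost Ark", "year": "1981"},  # duplicate once cleaned
    {"title": "The Matrix", "year": ""},                   # missing year
    {"title": "JAWS", "year": "1975"},
]


def clean_rows(rows):
    """Normalize titles, drop rows with unusable years, and remove duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        # Collapse stray whitespace and unify capitalization.
        # Note: str.title() capitalizes every word, including "of" and "the".
        title = " ".join(row["title"].split()).title()
        year = row["year"].strip()
        if not title or not year.isdigit():
            continue  # skip rows that cannot be repaired automatically
        key = (title.lower(), int(year))
        if key in seen:
            continue  # an equivalent row was already kept
        seen.add(key)
        cleaned.append({"title": title, "year": int(year)})
    return cleaned


tidy = clean_rows(raw_rows)
for row in tidy:
    print(row)
```

Rows that cannot be fixed automatically, such as the one with the missing year, are skipped here; in practice they would usually be set aside for manual sorting rather than silently dropped.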
Algorithms and coding
Once the data is prepared, you can begin to manipulate it. This step is often what people associate most with data science. It involves a lot of critical thinking because there is never just one way to solve a problem. In this instance, you could filter your data using a content-based method, which matches movies by their own attributes, or a collaborative method, which matches users by their shared viewing behavior.
One approach could analyze the descriptions of movies a user has watched and suggest movies with similar descriptions. Again, it sounds simple, but you will have to account for common words like “the,” “a” and “then” so that the results are not skewed. You will also have to ensure that movies with longer descriptions are not given undue weight. All of this involves complex algorithms with parameters that must be translated into code, making proficiency in programming languages like Python®, R or Java® indispensable.
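The description-matching approach can be sketched in a few dozen lines of Python. This version assumes one common technique, TF-IDF weighting with cosine similarity: common words are filtered out, rarer words count for more, and the cosine step keeps long descriptions from dominating. The movie titles, descriptions and stop-word list are invented for illustration, and this is only one of many ways to build such a system.

```python
import math
import re
from collections import Counter

# A tiny, assumed stop-word list; real systems use much longer ones.
STOP_WORDS = {"the", "a", "an", "and", "then", "of", "in", "to", "is"}

# Hypothetical catalog of movie descriptions.
descriptions = {
    "Raiders": "An archaeologist races to find a lost ancient relic in the desert",
    "Tomb Quest": "A daring explorer hunts for an ancient relic in a lost jungle temple",
    "Office Romance": "Two coworkers fall in love during a chaotic product launch",
}


def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def tfidf_vectors(docs):
    """Weight each word by its count in a description and its rarity overall."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokens)
    doc_freq = Counter(word for words in tokens.values() for word in set(words))
    vectors = {}
    for name, words in tokens.items():
        counts = Counter(words)
        vectors[name] = {
            word: count * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1)
            for word, count in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity; dividing by vector length offsets long descriptions."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(title, docs):
    """Return the other movie whose description is closest to the given one."""
    vectors = tfidf_vectors(docs)
    target = vectors[title]
    scores = {other: cosine(target, vec) for other, vec in vectors.items() if other != title}
    return max(scores, key=scores.get)


suggestion = most_similar("Raiders", descriptions)
print(suggestion)
```

In this toy catalog, the two adventure descriptions share words such as "ancient," "relic" and "lost," so a viewer of the first is pointed toward the second rather than toward the office comedy.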
After you’ve tested, re-tested and eliminated all the bugs from your system, you will have successfully implemented a form of machine learning. This allows the system you create to learn from the data you input. So, when you watch Indiana Jones, the machine “learns” that you like action movies. While this is a very simplified example, the more data and sources the system can analyze for connections, the more refined the results will be.
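For contrast with description matching, a collaborative method ignores what a movie is about and looks only at who watched what. The sketch below is a minimal, assumed version: it scores unseen movies by how much the people who watched them overlap with the current user, using Jaccard similarity between viewing histories. The usernames, titles and scoring rule are hypothetical.

```python
# Hypothetical viewing histories: which movies each user has watched.
watched = {
    "ana": {"Raiders", "Tomb Quest", "Jaws"},
    "ben": {"Raiders", "Tomb Quest", "Space Race"},
    "cam": {"Office Romance", "Jaws"},
}


def jaccard(a, b):
    """Share of movies two users have in common, out of all movies either watched."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, history):
    """Rank unseen movies by the similarity of the users who watched them."""
    seen = history[user]
    scores = {}
    for other, movies in history.items():
        if other == user:
            continue
        similarity = jaccard(seen, movies)
        for movie in movies - seen:
            scores[movie] = scores.get(movie, 0.0) + similarity
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda movie: (-scores[movie], movie))


picks = recommend("ana", watched)
print(picks)
```

Here "ben" shares two of four movies with "ana," so his unseen pick ranks ahead of the one from "cam," who shares only one. As more viewing data arrives, the scores shift, which is the sense in which the system "learns."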
Visualization and communication
Having created a system that seems to reliably provide smart suggestions, you might think your job is done. But there’s still gold to be pulled from this system. You can now present information about the connections between users and the movies they watch. Do certain age groups prefer movies by a specific actor? How do films that are less expensive to acquire fare with the demographics we’d like to expand our subscriber base into? Being able to translate data into actionable content, also known as data storytelling, is one of the most valuable skills a data scientist can have.
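A question like the age-group one above often starts with a simple summary before any chart is drawn. The sketch below is one assumed way to produce it: group viewing records into age bands and find the most-watched actor in each. The records, actor names and age bands are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical viewing records: the viewer's age and the lead actor of the movie.
views = [
    {"age": 19, "actor": "Actor A"},
    {"age": 22, "actor": "Actor A"},
    {"age": 23, "actor": "Actor B"},
    {"age": 31, "actor": "Actor B"},
    {"age": 38, "actor": "Actor B"},
    {"age": 52, "actor": "Actor A"},
]


def age_group(age):
    """Bucket an age into one of three assumed bands."""
    if age < 25:
        return "under 25"
    if age < 45:
        return "25-44"
    return "45+"


def top_actor_by_group(records):
    """Count views per actor within each age band and return the leader of each."""
    counts = defaultdict(Counter)
    for record in records:
        counts[age_group(record["age"])][record["actor"]] += 1
    return {group: tally.most_common(1)[0][0] for group, tally in counts.items()}


summary = top_actor_by_group(views)
for group, actor in summary.items():
    print(f"{group}: {actor}")
```

A small table like this is the raw material for data storytelling: it can be turned into a chart or a one-line takeaway for people who never look at the underlying data.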
“There are a lot of data scientists who can hunch over a computer but who can’t communicate the results to people who have nothing to do with data,” Dominik says. “If you want to become a data scientist, learn analytics communication so you’re able to interpret models and relate them to outcomes that matter the most.”
Though your experience will vary depending on what company you actually work for and whether you choose to specialize, these skills are all important in a data scientist’s repertoire.
Beyond the data scientist job description
Like any career, duties and skills are only a portion of what the job will actually be. We talked to Derek Wilson, president and CEO of the consulting group CDO Advisors, to get a clear idea of the lesser-known challenges unique to data science.
“Business owners trust their gut more than data. This is how they have been running their business for years,” Wilson says. “Now you have data contradicting what they have ‘known’ for years, it’s going to take a lot of explaining to get them to change how they want to run the business.”
Though data science has been around for over 20 years and has climbed to the #1 ranking on Glassdoor’s “Best Jobs in America,” many have yet to see the potential of this profession. Wilson points to the fact that numerous organizations have yet to utilize data science. He cautions that these businesses can be very difficult to convince of its value.
“In organizations just getting started with data science, the hardest part is getting the business owners on board with the process and how to integrate data science outcomes into their business processes,” says Wilson.
Despite this challenge, however, he still enjoys working closely with commerce: “I like working directly with the business to determine possible cases that can have an impact quickly. This lets me understand their business process and enables them to see how data science really is a science.”
Wilson encourages the prospective data scientist by saying, “There is a lot of trial and error before getting to solutions that work” and offers this advice: “Make sure you know the business environment and challenges they face.”
Interested in a data-driven career?
Now that you have a better idea of what data science is all about and how these skills are applied in current business settings, you might want to learn more about the path to a data-focused career. A Bachelor’s degree in Data Analytics is certainly a great starting point—but you should know some advanced data science roles may prefer candidates with a master’s degree. If you’d like to learn more about the potential opportunities available to those with a Master of Science in Data Science degree, check out our article, “6 Data Science Careers You Could Launch with a Master’s Degree.”
Python is a registered trademark of The Python Software Foundation.
Java is a registered trademark of Oracle Corporation.