First, it may be helpful to answer a fundamental question: What do data scientists do, exactly? Predictably, just about every task a professional in this field takes on has to do with data. Data scientists collect, clean, analyze and visualize massive quantities of information, then distill what they find into valuable business insights for company decision-makers. These experts build machine learning models and data products that empower organizations to better understand their operations and customers and to make more effective decisions about them.
None of these tasks is easy. But the work that data scientists do is critical for a business sector that increasingly relies on big data to drive performance.
According to research conducted by the multinational professional services company Accenture, 79 percent of enterprise executives agree that any organization that does not incorporate big data into its growth strategy will lose its competitive edge and potentially go out of business. Researchers also noted that 83 percent of companies surveyed had pursued big data projects to become more competitive. Given this, it’s not surprising that a separate study published by Wikibon in 2018 projected that revenues for the global big data market would increase from $42 billion in the year studied to $103 billion by 2027.
In this big-data-centric environment, data scientists are more than useful; they’re crucial to business success. If you’re interested in exploring data’s potential, you could join their professional ranks and lay the groundwork for a personally and professionally rewarding career. But you’ll need to meet the technical and interpersonal qualifications that the role requires.
Part 1: Technical Data Scientist Skills
Let’s get technical.
Any and every data scientist has undergone an extensive period of training and gained a strong knowledge foundation in data science. The hard fact of the matter is that data scientists face some of the most stringent educational requirements of any IT-related profession.
Data published by IT Career Finder reveals that roughly 40 percent of data scientist positions require an advanced degree such as a master’s or Ph.D. However, some others may be open to candidates with only a bachelor’s degree in Math, Statistics, Economics, Engineering or Computer Science. If aspiring data scientists really want to home in on a specialty and boost their resume above their competitors’, they might also opt into targeted training programs or boot camps in analytical disciplines like predictive analytics, data mining or database management.
Eventually, most data scientists do choose to specialize. According to one analysis written for the Harvard Business Review, these experts tend to fall into one of three categories in their later-stage careers:
- Business Intelligence: This category involves organizing company data into easy-to-understand dashboards, reports and emails.
- Decision Science: These specialists focus on using data to help companies make smarter, well-supported business strategy decisions.
- Machine Learning: Data scientists in this vertical build and apply models that continually learn from incoming data and use the results to further business operations.
All this said, aspiring data scientists must build a foundation of necessary technical skills before branching off into one subspecialty or another. Below, we’ve listed a few that you must perfect before you venture out into the job market.
1. Data Visualization
Data visualization is a critical part of any data scientist’s day-to-day work. With this skill, analytics professionals can turn intimidating walls of numerical and textual information into more accessible charts, maps and graphs. These illustrations empower people who lack advanced technical training — say, for example, team leaders and company decision-makers — to quickly grasp trends and data patterns without too much additional explanation.
Science writer Betsy Mason describes data visualization as “storytelling with a purpose”:
“Imagine a science textbook without images,” Mason suggests in an article for Knowable Magazine. “No charts, no graphs, no illustrations or diagrams with arrows and labels. The science would be a lot harder to understand…If you’ve ever stared at a massive spreadsheet of data and couldn’t see a trend, you know how much more effective a visualization can be.”
The ability to visualize data is an absolute necessity for aspiring data scientists. After all, if you can’t share the insights you’ve gleaned from data, you may as well have never discovered them in the first place.
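To make the point concrete, here is a minimal sketch of turning a handful of numbers into a picture. It assumes nothing beyond the Python standard library and draws a plain-text bar chart; the sales figures and the `text_bar_chart` helper are hypothetical, and in practice a dedicated charting library would usually be the better choice.

```python
# Hypothetical monthly sales figures, invented for illustration only.
monthly_sales = {"Jan": 120, "Feb": 95, "Mar": 160, "Apr": 210}


def text_bar_chart(data, width=40):
    """Return one line per category, with a bar scaled to the largest value."""
    largest = max(data.values())
    lines = []
    for label, value in data.items():
        bar_length = round(value / largest * width)
        lines.append(f"{label:<5}{'#' * bar_length} {value}")
    return lines


for line in text_bar_chart(monthly_sales):
    print(line)
```

Even in this crude form, the longest bar is easier to spot than the largest number in a column of figures.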
2. Python
Python is the programming language to beat in the data science world. Towards Data Science reports that in 2018, a whopping 66 percent of surveyed data scientists claimed to use Python daily. The language also topped IEEE Spectrum’s 2019 ranking of programming languages. It does offer a few notable perks; for instance, NumPy, one of Python’s most-used libraries, provides a wide variety of high-level mathematical functions and support for large, multi-dimensional arrays.
It is also worth noting that researchers for the IEEE Spectrum survey reported that Python’s popularity was “driven in no small part by the vast number of specialized libraries available for it, particularly in the domain of artificial intelligence.”
This point is crucial. All aspiring data scientists need to have at least a basic understanding of AI-adjacent skills. Artificial intelligence has seen incredible growth over the last several years: Recent research from Gartner indicates that the rate of enterprise-level AI deployment rose over 270 percent in the four years between 2015 and 2019. As Gartner’s Research Vice President concluded in a write-up on the study, “If you are a CIO and your organization doesn’t use AI, chances are high that your competitors do and this should be a concern.”
Given AI’s fast-paced expansion and growing importance in the tech sector, having an understanding of AI-adjacent tools like Python is an absolute necessity.
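As a small illustration of the NumPy features mentioned above, the following sketch builds a two-dimensional array and applies a few of the library's mathematical functions to it. The spending data is hypothetical, and the example assumes NumPy is installed.

```python
import numpy as np

# Hypothetical data: three customers (rows) by four quarters of spending (columns).
spending = np.array([
    [120.0, 135.0, 150.0, 160.0],
    [80.0, 82.0, 79.0, 90.0],
    [200.0, 180.0, 210.0, 230.0],
])

# Aggregate along an axis without writing explicit loops.
total_per_customer = spending.sum(axis=1)  # one total per row
mean_per_quarter = spending.mean(axis=0)   # one average per column

# Element-wise math applies to the whole array at once.
log_spending = np.log(spending)

print(spending.shape)
print(total_per_customer)
print(mean_per_quarter)
```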
3. SQL/NoSQL
Structured Query Language, or SQL (pronounced either SEE-quel or ess-cue-ELL by those in the know), is a must-know language for data scientists.
SQL offers a means to manipulate and query data in relational databases, and it is so widely used that the American National Standards Institute (ANSI) has adopted it as the standard language for relational database management systems. It’s easy to use and all but ubiquitous in data analytics work. Odds are, you won’t find a data science position that doesn’t require you to use SQL at least once in a while.
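For a sense of what querying a relational database looks like, here is a minimal sketch that uses Python's built-in `sqlite3` module with an in-memory database. The `orders` table and its rows are hypothetical, and SQLite stands in for whatever database system an employer actually runs.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 20.0), ("carol", 45.0)],
)

# Total spend per customer, largest first.
query = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
"""
totals = conn.execute(query).fetchall()
print(totals)
conn.close()
```

The `SELECT ... GROUP BY ... ORDER BY` pattern shown here carries over to most relational systems with little or no change.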
That said, SQL isn’t the be-all-end-all of databases. Aspiring data scientists should also know how to productively interact with non-relational (NoSQL) data stores when necessary. For context: NoSQL databases organize data in non-relational ways and tend to be simpler in design than their SQL counterparts. They also provide finer control over availability and more flexibility than the fixed rows and columns of relational database tables typically do.
If you want to gain a better understanding of NoSQL databases, it may help to familiarize yourself with a popular database like MongoDB, which sets aside relational tables in favor of a flexible, document-based model.
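To make the contrast with relational tables concrete, here is a minimal sketch that uses plain Python dictionaries as stand-ins for the nested, schema-flexible records a NoSQL store such as MongoDB holds. It does not call MongoDB itself; the customer records and the `total_spent` helper are hypothetical.

```python
import json

# Two hypothetical "documents" in one collection. Unlike rows in a relational
# table, they do not have to share the same fields.
customers = [
    {
        "name": "Alice",
        "email": "alice@example.com",
        "orders": [
            {"item": "keyboard", "price": 45.0},
            {"item": "monitor", "price": 180.0},
        ],
    },
    {
        "name": "Bob",
        "orders": [{"item": "mouse", "price": 20.0}],
        "loyalty_tier": "gold",
    },
]


def total_spent(document):
    """Sum the prices of the orders nested inside one customer document."""
    return sum(order["price"] for order in document.get("orders", []))


for customer in customers:
    print(customer["name"], total_spent(customer))

# Records of this shape serialize naturally to JSON.
print(json.dumps(customers[1], indent=2))
```

Note that the orders live inside each customer record rather than in a separate table joined by a key, which is the main structural difference from the relational approach.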
As you might imagine, SQL and NoSQL databases both have their place in data science applications. Whether a data team chooses one over the other hinges entirely on the challenge at hand; both skills are necessary for aspiring data scientists.
4. Social Media Mining
Social media mining refers to the process of excavating data from social media platforms like Facebook, Twitter, Instagram, etc. Skilled data scientists can use this data to identify useful patterns and distill insights that a business can then use to develop a greater understanding of an audience’s preferences and social media behaviors. This kind of analysis is crucial to developing an enterprise-level social media marketing strategy.
Given social media’s importance in day-to-day business and its potential to stick around for the long term, developing greater social media data mining skills is certainly a good idea for aspiring data scientists.
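As a rough sketch of the pattern-finding described above, the following example counts hashtags across a few posts. The posts are invented for illustration, and the `count_hashtags` helper is hypothetical; collecting real data from any platform is a separate step not shown here.

```python
import re
from collections import Counter

# Hypothetical posts, invented for illustration.
posts = [
    "Loving the new #coffee blend! #MondayMotivation",
    "Is it just me or is #Coffee better iced? #summer",
    "Store closed early again... #disappointed",
    "Free refills today only #coffee #deal",
]


def count_hashtags(texts):
    """Count hashtags across posts, ignoring letter case."""
    counts = Counter()
    for text in texts:
        counts.update(tag.lower() for tag in re.findall(r"#(\w+)", text))
    return counts


hashtag_counts = count_hashtags(posts)
print(hashtag_counts.most_common(3))
```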
5. Fundamental Statistics
When it comes to building the essential skill set for a career in data science, there are few skills more important than statistics. From a high level, statistics involves the gathering, organization, analysis and interpretation of data — all points that facilitate the daily practices of data science. A thorough understanding of statistical principles also empowers data scientists to create mathematical and statistical models for their data; without it, data scientists would struggle to gain a full understanding of the data they are responsible for analyzing. As writers for Elite Data Science note in an article on the matter, “Data analysis requires descriptive statistics and probability theory, at a minimum.”
More realistically, though, aspiring data scientists should have a working knowledge of several more statistical concepts, including probability, statistical significance, regression and hypothesis testing. Those who have an interest in working on AI applications should also look into a principle that underpins much of the field: Bayesian thinking. Bayesian thinking centers on the idea that beliefs should be updated as a person gathers additional data.
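The idea of updating a belief as data arrives can be sketched in a few lines using Bayes' rule. The fraud scenario, its probabilities and the `bayes_update` helper below are all hypothetical, and the example assumes each alert counts as independent evidence, which is a simplification.

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Return P(hypothesis | evidence) using Bayes' rule.

    prior:               P(hypothesis) before seeing the evidence
    likelihood_if_true:  P(evidence | hypothesis is true)
    likelihood_if_false: P(evidence | hypothesis is false)
    """
    evidence = likelihood_if_true * prior + likelihood_if_false * (1 - prior)
    return likelihood_if_true * prior / evidence


# Hypothetical numbers: 1% of transactions are fraudulent; an alert fires on
# 90% of fraudulent transactions and on 5% of legitimate ones.
belief = 0.01
for alert_number in range(1, 4):
    # Each new alert is treated as independent evidence (a simplifying assumption).
    belief = bayes_update(belief, 0.90, 0.05)
    print(f"after alert {alert_number}: {belief:.3f}")
```

Each pass through the loop uses the previous result as the new prior, so the belief shifts further with every additional piece of evidence.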
Regardless of which ideas you choose to pursue, however, a general understanding of statistics and statistical thinking is an absolute must-have for skilled data scientists.
6. Natural Language Processing/Machine Learning
It might seem odd to say, but it’s true: computers do have a language — and sometimes, they even need a translator.
Natural language processing (NLP) is a subfield of artificial intelligence that strives to bridge the gap between human language and machine understanding. To borrow a quote from SAS Insights on the subject, “Natural language processing helps computers communicate with humans in their own language and scales other language-related tasks. For example, NLP makes it possible for computers to read text, hear speech, interpret it, measure sentiment and determine which parts are important.”
As you can probably guess, NLP is critical to advancing AI functionality. However, it has other purposes, too. It helps machines parse text and organize data in a meaningful way, and it greatly increases data scientists’ ability to analyze large quantities of text data effectively. Any data scientist who wants to break into AI development or machine learning should at least consider honing their NLP skills.
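Two of the tasks mentioned above, reading text and measuring sentiment, can be sketched in a very simplified form. The word lists and helper functions below are assumptions made for this example; production NLP systems typically rely on trained models rather than hand-made lists.

```python
import re
from collections import Counter

# Tiny hand-made word lists, assumed for this sketch only.
POSITIVE_WORDS = {"great", "love", "fast", "helpful"}
NEGATIVE_WORDS = {"slow", "broken", "terrible", "rude"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Positive-word count minus negative-word count for one piece of text."""
    counts = Counter(tokenize(text))
    positive = sum(counts[word] for word in POSITIVE_WORDS)
    negative = sum(counts[word] for word in NEGATIVE_WORDS)
    return positive - negative


reviews = [
    "Great product, fast shipping, love it!",
    "Terrible support. The app is slow and broken.",
]
for review in reviews:
    print(sentiment_score(review), review)
```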
7. Microsoft Excel
It’s true, Excel may seem like an odd — even clunky — application to include in a list of advanced data scientist skills, but it truly is necessary for success in the field. Contrary to what you might expect from something included in Microsoft’s seemingly basic Office bundle, Excel is a critical tool for many data scientists. It sports its own programming language, VBA, as well as a tool pack (creatively named Analysis ToolPak) that contains valuable aids for statistical modeling and data analysis.
Data scientists trained in Excel can use VBA to develop macros: recorded sequences of commands that make routine, frequently performed tasks like updating payroll, accounting or project management records significantly easier for the people who handle them. Excel also provides PivotTables, a feature that lets data scientists quickly summarize raw data and draw conclusions from it.
So yes, an odd choice it might be — but learning it would certainly be to your benefit!
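For readers who have not used a PivotTable, the following sketch shows the kind of summary one produces, written in Python rather than Excel. The sales records and the `pivot_sum` helper are hypothetical; this is an analogue of the idea, not a description of how Excel implements it.

```python
from collections import defaultdict

# Hypothetical raw records, the kind of rows a spreadsheet might hold.
sales = [
    {"region": "East", "quarter": "Q1", "revenue": 100},
    {"region": "East", "quarter": "Q2", "revenue": 150},
    {"region": "West", "quarter": "Q1", "revenue": 90},
    {"region": "West", "quarter": "Q1", "revenue": 60},
    {"region": "West", "quarter": "Q2", "revenue": 120},
]


def pivot_sum(rows, row_key, column_key, value_key):
    """Sum value_key for every (row_key, column_key) combination."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[row_key]][row[column_key]] += row[value_key]
    # Convert to plain dicts so the result prints and compares cleanly.
    return {outer: dict(inner) for outer, inner in table.items()}


summary = pivot_sum(sales, "region", "quarter", "revenue")
print(summary)
```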
8. High-Level Math
“It always pays to know the machinery under the hood [rather] than being just the guy behind the wheel,” data scientist Tirthajyoti Sarkar once wrote in an article for Towards Data Science.
Sarkar has a point. Advanced mathematics truly is the backbone of data science. After all, mathematical philosophies underpin practical techniques and drive technological development. While it’s theoretically possible to be a capable analyst without a solid background in high-level math, ignorance of data science’s underlying mathematical principles can hold you back. To continue Sarkar’s metaphor, a hobbyist driver doesn’t know how to restart their car when it breaks down on the road. A similar idea holds in data science.
Moreover, it’s important to note that many organizations will not hire a data scientist who lacks a foundational understanding of advanced mathematical principles. One tech writer for Analytics Vidhya puts the matter bluntly in an article on the subject.
“Let’s get this out of the way right now,” he writes. “You need to understand the mathematics behind machine learning algorithms to become a data scientist. There is no way around it. It is an intrinsic part of a data scientist’s role, and every recruiter and experienced machine learning professional will vouch for this.”
So, what mathematical pursuits should be on a data scientist’s skill list? Basic linear algebra and multivariable calculus offer a decent jumping-off point — but further exploration is always encouraged.
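To show how calculus sits under even a simple model, here is a minimal sketch that fits a straight line to data by gradient descent. The data points, learning rate, step count and the `fit_line` helper are all assumptions chosen for illustration; libraries normally handle this step for you.

```python
# Hypothetical data that roughly follows y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]


def fit_line(xs, ys, learning_rate=0.02, steps=5000):
    """Fit y = slope * x + intercept by gradient descent on mean squared error."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the mean squared error (this is the calculus).
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        grad_slope = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2 / n * sum(errors)
        # Step against the gradient to reduce the error.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


slope, intercept = fit_line(xs, ys)
print(round(slope, 3), round(intercept, 3))
```

Knowing where the two gradient expressions come from is what lets you diagnose a model that fails to converge, which is the "under the hood" knowledge this section describes.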
Part 2: Non-Technical Data Scientist Skills
Let’s say, for argument’s sake, that you’re the best data scientist the job market has ever seen. You’re technically adept, well-educated and bring a plethora of creative ideas and opinions to the table. But, for some reason, professional doors keep closing in your face. Employers decide that they don’t want to invest in your towering data science skills; teams try to expel you from their ranks.
Why are they so hostile, if your portfolio of hard skills is so well-packed? The issue might lie in your non-technical skills — or, rather, the lack thereof. Here are the three “softer” skills that you should prioritize when preparing for a career in data science.
9. Teamwork
If you flip through job postings on sites like Indeed or Monster, you’ll see one phrase repeated like clockwork: must work well in a team.
Contrary to what pop culture might tell you, most scientific groups — including those in data science — don’t rely on a single, brilliant thinker to enact forward progress. The cohesion and collaborative power of a team are usually more important than the intelligence or creativity of any one member. If you don’t play well with others or think that you don’t need help from your teammates, you won’t contribute to success. If anything, your toxic attitude may lead the team to stress, lower levels of achievement and failure.
In 2015, Harvard researchers discovered that even “modest” levels of toxic employee behavior could increase turnover, lower employee morale and decrease team performance. Eighty percent of surveyed employees reported that they lost time worrying about coworker incivility. Seventy-eight percent said that their commitment to their employer had dropped because of toxicity, and 66 percent said that their performance had declined.
Here’s the truth: it’s far more productive and rewarding to be a team player than it is to be a solo act. Work on your collaboration skills, and both you and your team will benefit from the returns!
10. Communication
Capable data scientists need to know how to convey the conclusions they gather from data. If you don’t have the skills necessary to translate technical jargon into plain English, it won’t matter how valuable your findings are: your audience won’t understand them.
Communication is one of the most critical skills a data scientist can develop, and often the one that professionals struggle with most. One 2017 survey that sought to identify the barriers data scientists faced most often at work found that the majority of hurdles were non-technical. The top seven obstacles included categories such as “explaining data science to others,” “lack of management/financial support,” and “results not used by decision-makers.” All of these point to an issue in communication, not technical know-how.
If you can’t convey, you fail — so, learn how to translate! Practice simplifying complex ideas into digestible explanations; work to convince your audience of a point, rather than give a dry report.
11. Business Savvy
Sure, you can explain obscure mathematical theory on demand, but can you explain how that theory could be used to drive business forward? It’s true that data scientists need a thorough understanding of their discipline and a firm foundation of technical skills. But at the end of the day, if you are using those skills to further a corporate goal, you need to have some degree of business acumen, as well.
Taking a few business classes won’t just help you better bridge the gap between your data scientist peers and business-minded leaders. It could also help you better apply the technical skills you have to developing helpful strategic insights for your employer.
Working in data science is a professionally and personally rewarding endeavor — but you need to put in the time to develop your skill set before you can hope to advance. So, get to work! Start building the educational foundation you need for a robust and long-lasting career in data science.