Odds are that you’ve heard the increasingly trendy business term data scientist. To be sure, there’s no shortage of myths around the role, but I have yet to meet many people who can answer, in plain English, “Just what does a data scientist do, actually?”
It’s a question that, to answer properly, requires more questions.
Do they even exist, or are they unicorns? What does one look like, anyway?
Yes. Here you go.
Of which sub-fields does data science consist?
Data science is a bouillabaisse of several related quantitative and technical disciplines: math, dataviz, statistics, data engineering, pattern recognition and learning, advanced computing, uncertainty modeling, data warehousing, and high-performance computing (HPC).
Is supply keeping up with demand?
In short, no. That’s one reason they command large salaries.
While the numbers vary, there’s general consensus that we need more of them. Many more. The vaunted management consulting firm McKinsey estimates that “The United States alone faces a shortage of 140,000 to 190,000 people with analytical expertise …to understand and make decisions based on the analysis of Big Data.”
Here’s more proof from Indeed.com:
Is there any overlap between the modern data scientist and a business analyst?
Yes, but there are real differences—including job growth:
Put differently, the difference is not simply a matter of nomenclature; the two roles are not one and the same. As I wrote in Too Big to Ignore:
By tapping into these varied disciplines, data scientists are able to extract meaning from data in innovative ways. Not only can they answer the questions that currently vex organizations, they can find better ones to ask.
This might seem a bit abstract, so let’s make it more concrete.
A traditional data or business analyst typically examines data from a single source, such as a CRM or ERP application. By contrast, a data scientist will go deeper. S/he will explore and examine data from sources that are often external to the enterprise, including social data, linked data, and open data. (For more on the latter, see my interview with Joel Gurin on his excellent book Open Data Now.)
What about the tools?
It’s the difference between checkers and chess. A business analyst will almost always use Microsoft Excel or Access. Data scientists will use far more powerful programming languages, such as R and Python, that support statistical and predictive modeling.
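To make “predictive” a bit more concrete, here is a minimal Python sketch of about the simplest forecasting task there is: fitting an ordinary least-squares trend line to a few months of sales and projecting the next month. The sales figures and the fit_line helper are hypothetical, invented purely for illustration; real predictive work involves far more data, validation, and modeling choices.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared x deviations and sum of x-y cross deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical monthly sales for months 1-6, made up for this example.
months = [1, 2, 3, 4, 5, 6]
sales = [100, 110, 125, 130, 145, 150]

slope, intercept = fit_line(months, sales)
# Project the fitted trend one month ahead.
forecast_month_7 = slope * 7 + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, month 7 forecast={forecast_month_7:.1f}")
```

With these made-up numbers, the trend works out to roughly 10.3 units per month and the month-7 forecast to about 162.7. The point is not the arithmetic, which a spreadsheet can also handle, but that the same few lines of code scale to larger data and more sophisticated models.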
Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.
— Josh Wills (@josh_wills) May 3, 2012
Does a data scientist need to know Bayes’ theorem?
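For readers who haven’t run into it, Bayes’ theorem describes how to update the probability of a hypothesis H after observing evidence E: P(H | E) = P(E | H) × P(H) / P(E). The minimal Python sketch below applies it to a hypothetical fraud-screening rule; the numbers and the bayes_posterior helper are invented for illustration, not drawn from any real system.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(H | E) via Bayes' theorem for a yes/no hypothesis H (hypothetical helper).

    prior: P(H)
    likelihood: P(E | H)
    false_positive_rate: P(E | not H)
    """
    # Total probability of the evidence, P(E), summed over H and not-H.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% of transactions are fraudulent, and the rule
# flags 90% of fraudulent transactions but also 5% of legitimate ones.
p = bayes_posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05)
print(f"P(fraud | flagged) = {p:.3f}")
```

Even though this hypothetical rule catches 90% of fraud, only about 15% of flagged transactions turn out to be fraudulent, because legitimate transactions vastly outnumber fraudulent ones. Results like this are a common illustration of why base rates matter.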
Is strong business acumen required?
Absolutely. It’s a moral imperative.
What other non-technical skills are essential?
The best data scientists excel at critical thinking. The job entails, and in fact requires, a high degree of human judgment. This goes double when selecting and defining the problem. It’s downright false to claim that data scientists are slaves to computers, automation, and data. As IBM puts it, “they will pick the right problems that have the most value to the organization.” Excellent communication and presentation skills are also sine qua nons: being able to determine “the answer” means nothing if you can’t explain it effectively to upper management and laypersons.
For more on this, check out a great post on Priceonomics.
IBM sponsored this post.