“If you torture the data long enough, it will confess.”
The new year has arrived. As such, it’s tough to call Big Data new anymore. (It’s been on my radar for a tad under five years, and Wiley published Too Big to Ignore in March of 2012.) Recent technologies such as Hadoop have matured during that time. Still, tech alone only gets us so far. The need for education on the topic is as strong as ever—if not more so.
Put differently, there’s no shortage of widely held myths around Big Data. Perhaps the most dangerous is that Big Data knows all and that it obviates the need for human judgment. The almighty “data” will unequivocally tell us what to do and when and how to do it. In this way, data is like Gabbo from The Simpsons.
Nothing could be further from the truth, but don’t take my word for it.
The Essential and Oft-Ignored Human Element of Big Data
I’ll let that sink in for a moment.
As anyone with a modicum of statistics knowledge knows, even mature, ostensibly “objective” statistical and quantitative methods such as regression analysis don’t run themselves, even on small datasets. They require key human elements (read: judgment and decision making), and because of this, they are far from perfect. This is why there’s a world of difference between an analyst and a true data scientist. With regard to regressions, frequent errors from newbies include:
- Neglecting key independent variables.
- Stating that a relationship exists among variables when one does not (and vice versa).
- Getting the causal chain completely wrong. (For instance, saying that A causes B when B causes A.)
What’s more, we make these mistakes both inadvertently and intentionally. (For more on this, see Eli Pariser’s excellent book The Filter Bubble [affiliate link].)
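To make the first mistake on that list concrete, here is a minimal sketch in Python using made-up, simulated data. Everything in it is an assumption for illustration: the variable names (`x`, `y`, `z`), the helper functions `slope` and `residuals`, and the numbers. In this toy world, `z` drives both `x` and `y`, and `x` has no effect on `y` at all. Leave `z` out of the regression and `x` still looks like a strong predictor.

```python
import random


def slope(xs, ys):
    """Least-squares slope of ys on xs (one predictor plus an intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def residuals(xs, ys):
    """What is left of ys after removing the part explained by xs."""
    b = slope(xs, ys)
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return [y - mean_y - b * (x - mean_x) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical data: z is the neglected variable that drives both x and y.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]      # x depends on z
y = [2 * zi + rng.gauss(0, 1) for zi in z]  # y depends on z only, never on x

# Regression that neglects z: the slope on x is far from zero.
# (The population value in this setup is 2 * 1 / (1 + 1) = 1.)
naive = slope(x, y)

# Controlling for z by regressing it out of both x and y first.
# The slope that remains is the effect of x with z held fixed: about zero.
controlled = slope(residuals(z, x), residuals(z, y))

print(f"slope on x, ignoring z:        {naive:.2f}")
print(f"slope on x, controlling for z: {controlled:.2f}")
```

The software runs both regressions without complaint. Knowing that `z` belongs in the model is a judgment call that a person has to make.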
This raises the question: How do we square this circle? How can we realize the legitimate benefits of Big Data while minimizing the chance for error?
Simon Says: Big Data and confirmation bias go hand in hand.
Big Data does not obviate the need for human judgment.
Remember the following when getting started with Big Data. First, recognize that confirmation bias is alive and well. If you’re intent on finding something, you will. The more important question is, What else are you missing?
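Here is a small, hedged illustration of "if you're intent on finding something, you will." The Python sketch below uses pure random noise; the names `target` and `candidates`, the helper `pearson`, and the sizes (30 observations, 200 candidate variables) are all assumptions chosen for the example. None of the candidates has any real relationship with the target, yet if we search through enough of them, one will look convincingly correlated.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(42)  # fixed seed so the run is repeatable
n_obs = 30               # observations per variable (hypothetical)
n_candidates = 200       # unrelated variables we "torture" (hypothetical)

# The metric we want to explain, and 200 candidate explanations.
# All of it is independent random noise, so every true correlation is zero.
target = [rng.gauss(0, 1) for _ in range(n_obs)]
candidates = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_candidates)]

correlations = [pearson(c, target) for c in candidates]

# Keep only the most impressive result, as a motivated analyst might.
best = max(correlations, key=abs)
print(f"strongest correlation found in pure noise: {best:.2f}")
```

Reported on its own, that strongest correlation looks like a finding. Reported alongside the 199 that went nowhere, it looks like what it is: noise.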
Second, question everything. Far too few people are willing to go where the data takes them. This is especially pronounced as they ascend to senior levels within organizations. Many senior folks are loath to challenge preexisting assumptions and to question what they know. At the same time, though, Big Data does not negate or minimize the importance of intuition.
What say you?