“Errors using inadequate data are much less than those using no data at all.”
–Charles Babbage
A few weeks ago, I wrote about the difficulty of defining Big Data. In that way, it is like obscenity and porn. I stand by that position, although a consensus does exist around the three V’s of Big Data: volume, velocity, and variety. In short, more data from more sources are coming at us faster than ever. But is there a fourth V?
IBM and Oxford’s Saïd Business School recently published the results of a global Big Data survey of “more than 1,110 business and IT professionals in 95 countries.” Among its key findings: fewer than half of the organizations surveyed collect and analyze data from social media. I’m not shocked by that discovery. From personal experience, I’ve seen organizations use Websense and other filters on corporate networks to prevent employees from wasting time on “The Twitter” and Facebook. Of course, in a world of BYOD and smartphones, that’s an exercise in futility. If you want to tweet on company time, you’ll find a way.
So, why do so many organizations ignore such a potentially valuable data source? As Peter Cohan writes in his Forbes piece on the study:
One reason is they don’t know how to manage data uncertainty that goes hand in hand with the fourth V in the IBM study, Veracity—information about “weather, the economy, or the sentiment and truthfulness of people expressed on social networks.”
In other words, because there’s so much noise around social media, organizations don’t attempt to find the signal. By this rationale, no data is better than mostly inaccurate data. Charles Babbage is rolling over in his grave.
Simon Says: Think Different
This line of thinking is hooey. Yes, most tweets, Facebook likes, Google +1’s, and other unstructured forms of data probably don’t mean that much to your organization, department, and team. None of that matters. Increasingly affordable Big Data solutions help organizations dial up that signal and reduce that noise. There’s a veritable gold mine out there, even if you have to sift through some dirt and sand to find it.
Moreover, traditional notions of data quality, master data, and data integrity are rooted in structured, transactional data. Every field in every record should be accurate and have meaning. With unstructured and semi-structured data, however, it’s time to think different. (This is especially true with data from social networks.) Sentiment, and the degree of that sentiment, will never be precise. Sufficiently large sample sizes and the law of large numbers obviate the need for precision.
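The large-sample argument can be sketched in a few lines of Python. The figures below are hypothetical, purely for illustration: suppose 60% of posts are truly positive, but a sentiment classifier mislabels any individual post 30% of the time. The per-item readings are noisy, yet the aggregate share stabilizes as the sample grows, and because the error rate is known, the true share can even be backed out of the observed one.

```python
import random

random.seed(7)

# Hypothetical numbers, for illustration only.
TRUE_POSITIVE_SHARE = 0.6   # actual share of positive posts
CLASSIFIER_ACCURACY = 0.7   # chance any one post is labeled correctly

def classify_one():
    """Simulate classifying one post; returns the (possibly wrong) label."""
    truly_positive = random.random() < TRUE_POSITIVE_SHARE
    labeled_correctly = random.random() < CLASSIFIER_ACCURACY
    return truly_positive if labeled_correctly else not truly_positive

def estimate_true_share(observed_share, accuracy):
    """Invert the noise: observed = true*acc + (1 - true)*(1 - acc)."""
    return (observed_share - (1 - accuracy)) / (2 * accuracy - 1)

for n in (100, 10_000, 1_000_000):
    observed = sum(classify_one() for _ in range(n)) / n
    corrected = estimate_true_share(observed, CLASSIFIER_ACCURACY)
    print(f"n={n:>9}: observed={observed:.3f}, corrected={corrected:.3f}")
```

At n=100 the estimate bounces around; by n=1,000,000 the corrected figure sits within a fraction of a point of the true 60%. That is the whole point: individual tweets are unreliable signals, but the aggregate is not.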
What say you?
I wrote this post as part of the IBM for Midsize Business program.