“I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description [“hard-core pornography”]; and perhaps I could never succeed in intelligibly doing so. But I know it when I see it, and the motion picture involved in this case is not that.” [Emphasis added.]
—Justice Potter Stewart, concurring opinion in Jacobellis v. Ohio 378 U.S. 184 (1964), regarding possible obscenity in The Lovers.
What Big Data Is Not
The above quote comes from one of the most quoted opinions in U.S. Supreme Court history. The line “I know it when I see it” has stood the test of time. What’s more, those seven words illustrate a number of things, not the least of which is the difficulty that even really smart people have in defining ostensibly simple terms.
Fast forward 48 years and many learned folks are having the same issue with respect to Big Data. Just what the heck is it, anyway?
Much like the term cloud computing, you can search in vain for days for “the right” definition of Big Data. I’d argue that such a definition doesn’t exist. Who can say with absolute certainty that one definition of the term is objectively better than another?
Much like obscenity or pornography, perhaps Big Data is actually best defined against its inverse, i.e., that which it is not. In that vein, I love this definition from The Register:
Big Data is any data that doesn’t fit well into tables and that generally responds poorly to manipulation by SQL.
[T]he most important feature of Big Data is its structure, with different classes of big data having very different structures.
With that definition, we can start to look at examples. A Twitter feed is Big Data; the census isn’t. Images, graphical traces, Call Detail Records (CDRs) from telecoms companies, web logs, social data, and radio-frequency identification (RFID) output all fall under the umbrella of Big Data. (Don’t just think of lists of your employees, customers, products, etc.)
Is this a bit techie for most folks? Sure, and while the definition is instructive, it’s hardly perfect. I’m sure that someone, somewhere out there has created a spreadsheet, database table, or flat file that contains the following fields:
- Twitter handle
- Time of tweet
- Date of tweet
- Actual tweet
- Hashtags
As a general rule, though, these traditional data management tools aren’t enough to truly harness the power of Big Data. Microsoft Excel, Access, and even traditional relational databases just don’t cut it.
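To make the tweet example concrete, here is a minimal sketch, assuming Python's built-in sqlite3 module and a handful of invented rows (the handles, dates, and tweet text are hypothetical, as are the table and function names). It shows that the five fields above do fit in a table and that a simple count per day works fine in SQL, but that even tallying hashtags means stepping outside the SELECT statement.

```python
import sqlite3
from collections import Counter

# Hypothetical sample rows, invented purely for illustration:
# (handle, time of tweet, date of tweet, actual tweet, hashtags)
ROWS = [
    ("@alice", "09:15", "2012-06-01", "Loving the new phone #gadgets #happy", "#gadgets #happy"),
    ("@bob", "09:20", "2012-06-01", "Worst flight delay ever #travel", "#travel"),
    ("@carol", "10:02", "2012-06-02", "New phone arrived, not impressed #gadgets", "#gadgets"),
]


def build_table(rows):
    """Load the rows into an in-memory table with the five fields listed above."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tweets "
        "(handle TEXT, tweet_time TEXT, tweet_date TEXT, tweet TEXT, hashtags TEXT)"
    )
    conn.executemany("INSERT INTO tweets VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def tweets_per_day(conn):
    """A structured question that plain SQL answers easily."""
    return conn.execute(
        "SELECT tweet_date, COUNT(*) FROM tweets "
        "GROUP BY tweet_date ORDER BY tweet_date"
    ).fetchall()


def hashtag_counts(conn):
    """Hashtags packed into one text column have to be split outside SQL."""
    counts = Counter()
    for (tags,) in conn.execute("SELECT hashtags FROM tweets"):
        counts.update(tags.split())
    return counts


conn = build_table(ROWS)
print(tweets_per_day(conn))
print(hashtag_counts(conn).most_common())
```

The table holds the tweets without complaint. What it cannot do on its own is say anything about what the tweets mean, which is where the real value lies.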
Simon Says: Big Data is a mind-set.
While we’re on the subject of tools, Big Data is about so much more than merely buying, downloading, and/or deploying a new tool. Yes, Big Data requires new tools like Hadoop. You’re not going to get sentiment analysis out of SELECT statements.
More important, though, Big Data necessitates a new mind-set. Regardless of your own personal definition of the term, don’t make the mistake of assuming that existing methods and applications are sufficient. They’re not.
I wrote this post as part of the IBM for Midsize Business program.