Is Hadoop secure?
It’s a fair question, and one that many CIOs are asking themselves as they realize that Big Data is here to stay. There are major benefits to consolidating data sources, but what are the drawbacks?
About a year ago, Computerworld ran a story about Hadoop and security issues. From the piece comes the following quote from Richard Clayton, a software engineer with Berico Technologies, an IT services contractor for federal agencies:
“Aggregating data into one environment also increases the risk of data theft and accidental disclosure.”
You’ll get no argument from me that there’s a potential downside to one-stop shopping. After all, if all of an organization’s data is in one place, then it’s theoretically easier to steal, right? And I’d be silly to completely discount legitimate security concerns like this one. There are enough bad guys out there without helping them out.
Silos Aren’t the Solution
But what’s the alternative? For some, the answer can be summed up in two words: data silos. And this just grinds my gears.
I have encountered my fair share of tricky data issues throughout my consulting career. None has infuriated me more than data silos, primarily because of the other problems that they cause. Exhibit A: lack of master data (read: a single version of the truth). For a long time, I dismissed master data management because I felt that all transactional data should be stored in an organization’s system of record. Period.
Now, I’ve softened my stance on that issue over the years, but I still can’t stomach data silos. I recognize that they can be valuable, as does my friend Jim Harris. Harris writes on his blog that “data silos are bad when different business units are redundantly storing and maintaining their own private copies of the same data, but data silos are good when they are used to protect sensitive data that should not be shared.”
Fair enough, but put me squarely in the anti-silo camp. In my view, the costs of data silos far exceed their benefits. It’s kind of like taking a very strong grip in golf to correct for a nasty slice. You’re “fixing” one problem by adding another.
Why not attempt to beef up security on operational and analytic systems throughout the enterprise? That way, Hadoop or a data warehouse contains both comprehensive and accurate data protected from troublemakers.
What say you?
I wrote this post as part of the IBM for Midsize Business program.