The Case for Data Lakes
In my last post, I briefly defined a data lake and described how it differs from a traditional data warehouse. Today I'll make the case for using one and offer a few words of caution before getting started.
Before I do, though, I'd like to take a little trip down memory lane. In my pre-author and pre-professor days, I frequently wrote complex reports from enterprise systems. At a high level, I can say three things about that data. First, much of the time, it was incomplete, duplicated, or flat-out wrong. Second, it was everywhere. The data typically lived in a number of different places: relational databases, legacy systems, business-intelligence applications, Microsoft Access databases, Excel spreadsheets, and so on.
It'll only take a moment.