Just what is Big Data anyways? Loosely defined as a “term used to describe data sets so large and complex that they become awkward to work with using standard statistical software,” this ends up describing much of the data the industry deals with on a daily basis. In reality, however, the issue is not so much finding the right tools to analyze the data (though the in-memory tools certainly become a bottleneck) as acquiring, augmenting and managing the data itself for the long term. In common cases, your data may not be “Big” on its own, but once it is coupled with the datasets needed to properly understand and analyze it, the existing tools and hardware slow to a crawl or fail outright. Frequently, too, analysis becomes limited to manageable segments of time, which makes missing a larger trend or underlying cause much more likely than if the entire set were available for analysis. Your data may not have started out very big ten years ago, but it certainly seems to be heading that way.
Also apparent is the fact that, with the advent of technologies like Hadoop, the mere presence of the capability to store big data has triggered a need to leverage these tools and collect still more data than would ever have been collected before, in the hope that there is gold hidden somewhere in the accumulated mountains of data. As a business grows, so do the metrics used to measure that business, and all of the data behind those measures has to live somewhere.
The reality is that most organizations are data-driven, not data-centric, and though the temptation exists to build massive infrastructures around handling this data in the same way that internal IT managed data before, the rules have changed. The massive upfront capital expenditures, along with the strong technical teams needed to properly deploy big data solutions, limit the ability of all but a few to properly design, build, and manage a big data store over time. Further, when looking at ROI, the high costs of maintaining teams that combine DevOps, data scientists and business analysts, alongside the infrastructure improvements needed to deal with big data problems alone, become harder and harder to justify.
This is precisely why the Cloud comes into play. With Software/Infrastructure/