Big Data, Little Clarity
Big data -- the process of looking for patterns in data sets so large they resist traditional methods of analysis -- rates big buzz in the boardroom these days [source: Arthur]. But is bigger always better?
It's a rule that's drummed into most researchers in their first stats class: When encountering a sea of data, resist the urge to go on a fishing expedition. Given enough data, patience and methodological leeway, finding correlations is almost inevitable, even if the practice is ethically dubious and the results are largely useless.
After all, the mere correlation between two variables does not imply causation; nor does it, in many cases, point to much of a relationship. For one thing, researchers cannot use statistical measures of correlation willy-nilly; each carries certain assumptions and limitations that fishing expeditions too often ignore, to say nothing of the hidden variables, sampling problems and flaws in interpretation that can gum up a poorly designed study.
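The fishing-expedition problem is easy to see in a small simulation. The sketch below is an illustration only, not a method taken from any of the cited sources: it fills a table with pure random noise, then reports the strongest Pearson correlation found among all pairs of columns. The function names, the seed and the choice of 20 observations per variable are all assumptions made for the example.

```python
import math
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_chance_correlation(n_variables, n_observations, seed=0):
    """Largest |r| among all pairs of independent random-noise columns.

    Hypothetical helper for illustration: every column is unrelated to
    every other by construction, so any correlation found is pure chance.
    """
    rng = random.Random(seed)
    columns = [
        [rng.gauss(0.0, 1.0) for _ in range(n_observations)]
        for _ in range(n_variables)
    ]
    return max(abs(pearson(a, b)) for a, b in combinations(columns, 2))


if __name__ == "__main__":
    for n_variables in (2, 10, 100):
        n_pairs = n_variables * (n_variables - 1) // 2
        r = strongest_chance_correlation(n_variables, n_observations=20)
        print(f"{n_variables:>3} variables, {n_pairs:>4} pairs: strongest |r| = {r:.2f}")
```

Because every column is noise, none of these "relationships" mean anything. Yet the strongest one can only grow as more variables are thrown in, since each new variable adds more pairs to search. That is the trouble with sifting a huge data set without a hypothesis in hand.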
Granted, big data has its uses. Inventory control thrives on discovering purchasing patterns, however mysterious their underlying causes. To take a somewhat creepy example, Target has used purchasing patterns to identify pregnant customers and then send them targeted coupons [sources: Duhigg; Hill; Taylor]. So enjoy that rewards card -- and 10 percent off your prenatal vitamins -- but don't expect too much out of big data in the causality department.