There is a dark side to big data: personal privacy. There are obvious privacy risks from the accidental or intended disclosure of collected "hard" personal data, but to my way of thinking the real danger comes from derived or predictive data, produced with mathematical constructs like Bayesian inference and other tools. Applied to large datasets, these tools are melded into business intelligence cubes that work wonders for the bottom line, but they violate privacy in a fundamental way: they predict human behavior from inferential probability. Those predictions may carry a large degree of error in individual cases, yet they are useful enough on a macro scale to be profitable. Credit scores are a good example. Just because 80 percent of people who combine action A with action B default on loans 55 percent more often than people who do not exhibit those behaviors doesn't mean that everyone in that demographic will default, yet they are all judged as if they will.
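To make the arithmetic behind that concrete, here is a minimal sketch in Python. The 5 percent baseline default rate and the group of 10,000 people are assumptions of mine, chosen purely for illustration, and the function name is hypothetical. Only the 55 percent relative increase comes from the example above.

```python
def flagged_outcomes(baseline_default_rate, relative_increase, group_size):
    """Expected (defaulters, non_defaulters) in a group flagged as higher risk."""
    # The flagged group defaults at the baseline rate scaled up by the increase.
    flagged_rate = baseline_default_rate * (1 + relative_increase)
    defaulters = round(group_size * flagged_rate)
    return defaulters, group_size - defaulters


# Assumed inputs: 5% baseline default rate (hypothetical), 55% relative
# increase (from the example in the text), 10,000 flagged people (hypothetical).
defaulters, non_defaulters = flagged_outcomes(0.05, 0.55, 10_000)
print(f"Expected to default:     {defaulters}")
print(f"Expected not to default: {non_defaulters}")
```

Under these assumed numbers, roughly 775 of the 10,000 flagged people would be expected to default, which leaves 9,225 who would not, even though all 10,000 carry the same mark against them.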
The real danger of this predictive stuff comes from aggregators who combine predictive data with actual personal data and sell it to other companies. Judgements will be made that may be untrue, but that may result in denial of things like college entrance, handgun ownership, club memberships, professional certifications, career choices and other life events where some sort of body has authority over certain aspects of our lives. Suppose that you are of a certain height, and the data says that people of that height do not do well in a particular professional sport. Yet we all know stories of the little guy who could.
One of the current thrusts of Big Data is to find non-intuitive behavioral predictors. For example, we have heard of Target sending pregnancy coupons to a 15-year-old girl. Her parents threw a fit, until they discovered that their daughter was actually pregnant. Target figured it out using probabilities: in a certain demographic, it had found a correlation between buying beauty products and vitamins and buying pregnancy-related products five months later. Supermarkets have long known that putting beer and diapers together on a Saturday results in a large increase in sales. (Wife sends hubby to the store for diapers, but the big game will be on later in the weekend and hubby's buddies are coming over.) All this is fine and dandy because it happens on an anonymous level, but when this sort of predictive stuff is tied to identifying data, it could become dangerous.
What is a CIO or CTO to do? To my way of thinking, the chief responsibility is to management, shareholders and the bottom line, and not to the privacy of the masses. Business is the last venue of civilized men for uncivilized warfare, and as a result, I am predicting a further erosion of privacy from Big Data. It is a force majeure, an unstoppable tsunami of assaults against our privacy that will rival any effort of the NSA or any other organization intent on cataloging the behaviors of the masses.