All Things Techie With Huge, Unstructured, Intuitive Leaps

Giving The Shaft To Data Mining And Obfuscating IBM & Twitter's Privacy Intrusion on Your Life


Those b*st*rds are going too far. Even though I am a data miner, I am also a data privacy advocate, and this concerns me greatly. Essentially, Twitter & IBM are teaming up to mine your Twitter stream and monetize your posts. They will take your tweets and try to sell crap to you, or worse, sell your data to other companies.

Here's how it will work. If you post that your mother died, you will see crematorium or undertaker ads. Tweet about spending some time in the hospital, and you might pay a higher health insurance premium because they will sell that info to insurance companies. The same goes for tweeting about driving fast. Tweet about your kid going to college, and you will get a full court press on everything from college choices to clothes for university life.

It sucks. It just isn't right. You have a few choices. You can vote with your feet and leave Twitter. I have already left Facebook and LinkedIn. Twitter is my last stand.

You can carry on, but as I mentioned in a previous blog post, the most dangerous thing about Big Data mining is that it can make assumptions about you that simply aren't true, and you may be put on a list that you don't want to be on. It could affect your job, your security clearance, your credit score or who knows what.

You could self-censor, but censorship is wrong, even self-censoring.

I like the last option - f*ck with the machine learning, deep learning and data mining. How? Obfuscate. Here are a few things that I will do.

1) Disable all location services for tweets.
2) Disable location tagging for the photos that your smartphone takes. Otherwise the camera writes the location into the EXIF data, along with the date, time, camera type, etc.
3) Google for a free EXIF editor, and remove all EXIF data from your pics.
4) Do not put your actual location in your bio. For example, I follow a dude whose location is: Where I Have To Be
5) Put in a fake town where you live. If you have a dog named Rover, put down that you live in Roverville.  You can still keep your same state.
6) Never use your middle name or initial. It's just one more authentication factor.
7) When social media streams are mined using NLP (Natural Language Processing), an important part of that is finding "possessive determiners". Don't use them. Possessive determiners are words like my, your, her, etc. If you tweet "It's my birthday", even the dumbest NLP data mining machine can pick it up. However, if you say "Welcome to Birthdayville, Population Me", not even the smartest NLP machine can pick that up. Get rid of possessive determiners in your tweets.
8) Practice Typoglycemia.  http://en.wikipedia.org/wiki/Typoglycemia  Here is an example that would totally screw up a deep learning machine:

"I cdn'uolt blveiee taht I cluod aulaclty uesdnatnrd waht I was rdanieg: the phaonmneel pweor of the hmuan mnid. Aoccdrnig to a rseearch taem at Cmabrigde Uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoatnt tihng is taht the frist and lsat ltteer be in the rghit pclae. The rset can be a taotl mses and you can sitll raed it wouthit a porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe. Scuh a cdonition is arppoiatrely cllaed Typoglycemia .
"Amzanig huh? Yaeh and you awlyas thguoht slpeling was ipmorantt."

9) Use slang. If your gas pedal foot itches to drive a BMW, call it a beamer or a beemer and don't capitalize the word.

10) Use alternate spelling. Ime a bygg phan of Neel Yoongs mewsic.

11) Throw in rand o m   s pac es   in yo ur  sente nce.  Or e*ven the od*d star will do.

12) Never tweet your age, your spouse or partner (I see "married to @sweetiePie" all the time) or any other personal information. It is okay to list your employer or academic institution, and that leaves a lot of room to fool the NLP machines if you work at the Big Blue, or teach @ the Yard (thanks to the Harvard profs that follow me -- appreciate it).
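
For tips 2 and 3, if you would rather not trust a random free EXIF editor, here is a minimal Python sketch of what such a tool does under the hood. The function name strip_exif is made up for this illustration, and the code assumes a plain JPEG in which every segment before the image data carries a length field. It only drops the Exif APP1 segment; other metadata (XMP, for example) would survive, so treat it as an illustration and not a privacy guarantee.

```python
import struct

SOI = b"\xff\xd8"   # JPEG start-of-image marker
APP1 = 0xE1         # segment type that carries Exif data
SOS = 0xDA          # start of scan: compressed image data follows


def strip_exif(jpeg: bytes) -> bytes:
    """Return a copy of the JPEG bytes without any Exif APP1 segment."""
    if jpeg[:2] != SOI:
        raise ValueError("not a JPEG: missing start-of-image marker")
    out = bytearray(SOI)
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("unexpected byte where a marker should be")
        marker = jpeg[pos + 1]
        if marker == SOS:
            # Everything from here on is image data; copy it untouched.
            out += jpeg[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        segment = jpeg[pos:pos + 2 + length]
        is_exif = marker == APP1 and segment[4:10] == b"Exif\x00\x00"
        if not is_exif:
            out += segment
        pos += 2 + length
    out += jpeg[pos:]
    return bytes(out)
```

Something like `open("clean.jpg", "wb").write(strip_exif(open("photo.jpg", "rb").read()))` would write a cleaned copy; both file names are placeholders.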

Using these simple tips will cause the data mining and perceptrons scanning your feed to take a pass on what you type. Now is the time to bowdlerize or obfuscate your account.
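
To make tips 7 and 8 concrete, here is a rough Python sketch with two toys: a "miner" that looks for a possessive determiner followed by a word, and a scrambler that applies typoglycemia to a sentence. Both are assumptions for the sake of illustration. The names naive_facts, scramble_word and typoglycemia are invented here, and real NLP pipelines are far more sophisticated than a single regular expression, so this only shows the simplest kind of pattern matching being sidestepped.

```python
import random
import re

# Toy "miner": a possessive determiner followed by one word.
# This is an assumed stand-in, not how any real product works.
POSSESSIVE = re.compile(
    r"\b(?:my|your|his|her|our|their)\s+(\w+)", re.IGNORECASE
)


def naive_facts(tweet):
    """Return the words that a possessive determiner points at, lowercased."""
    return [m.group(1).lower() for m in POSSESSIVE.finditer(tweet)]


def scramble_word(word, rng):
    """Shuffle the inner letters, keeping the first and last in place."""
    if len(word) < 4:
        return word  # nothing to shuffle in words of three letters or fewer
    inner = list(word[1:-1])
    rng.shuffle(inner)
    return word[0] + "".join(inner) + word[-1]


def typoglycemia(text, seed=None):
    """Scramble every alphabetic word in the text (tip 8)."""
    rng = random.Random(seed)
    return re.sub(
        r"[A-Za-z]+", lambda m: scramble_word(m.group(0), rng), text
    )


print(naive_facts("It's my birthday"))                         # ['birthday']
print(naive_facts("Welcome to Birthdayville, Population Me"))  # []
print(typoglycemia("Spelling was never important"))            # varies per run
```

The first tweet hands the toy miner a fact on a plate; the second gives it nothing to grab. The scrambler keeps the first and last letter of each word, so a human can still read the result.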

I think that the bigger answer is to start up a new hybrid of Twitter and Facebook that guarantees information privacy. But in the meantime, let's be careful out there as to what we post. And remember, it's not that difficult to deke out smart machines.

