All Things Techie With Huge, Unstructured, Intuitive Leaps

An End To Dangerous Big Data Stalking


You are being stalked. Every website that you visit may add a stalker in the form of tracking cookies to your browser. They know where you have been.  And with just a modicum of inference they know who you are.

This web tracking is pervasive. It all goes into a big database. If you enter your name on a form and the form is submitted to the website as an HTTP POST request, they will harvest your name. But even without your name, they will know what demographic you belong to. They will know your financial standing and how much you earn. They will know what music you listen to and what clothes you buy. And all of this information is processed without human eyes ever sorting or classifying it. Machine Learning is pervasive.

But here is what is most dangerous about these stalkers: they can make the wrong inference and put you on a watch list that you may not even know about, and that may be impossible to get off. Here is a scenario that could make you a terrorist according to Big Data and Machine Learning.

You are sipping your morning coffee, looking at Facebook, and you see a heartbreaking picture of a child caught in the clutches of war in the Middle East. You "Like" the photo. Then it is time for you to go to the airport. You are flying business class and are given a choice of food, including Halal meals. You are an adventurous foodie, so you tick the box to try one. On top of that, you have an aisle seat. Then you check your Twitter feed. Someone posts about "Freedom of Religion", and you favorite the tweet. In the business section of a European website, you see an ad for a hedge fund that promises great returns. You click for more information. What you don't know is that you have put the Big Data Digital Stalkers into overdrive, and you are now a person of interest to several agencies.

As it turns out, the photo that you "Liked" was posted by a terrorist group to garner sympathy.  All of the "Likes" are collected as possible links to these terrorists. You are in another database because you chose Halal food instead of the bacon cheeseburger.  The aisle seat is problematic. Hijackers do not take window seats.  The "Freedom of Religion" tweet was sponsored by the Muslim Anti-Defamation League. Into another database you go.  The hedge fund promising great returns is headquartered in the Cayman Islands. The IRS is suddenly interested in you.

The most dangerous thing about Big Data Stalkers is that they make Bayesian inferences, and those inferences are probabilities. Probabilities are just that: they are not certainty. Even with a 99% probability, the next event in the sample space could go the other way, contrary to what the probability predicts. Machine Learning and Big Data Stalkers are a clear and present danger to personal privacy.
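To put numbers on why a probability is not a certainty, here is a small application of Bayes' theorem. All of the figures are made up for illustration: a hypothetical screening model that flags 99% of real threats and wrongly flags 1% of innocent people, applied to a population in which only 1 person in 100,000 is a real threat.

```python
# Worked example of Bayes' theorem with HYPOTHETICAL numbers, showing how a
# "99% accurate" classifier still flags mostly innocent people when the thing
# it hunts for is rare.

def posterior(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """Return P(threat | flagged) via Bayes' theorem."""
    p_flagged = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / p_flagged

prior = 1 / 100_000          # assumed base rate of real threats
sensitivity = 0.99           # assumed P(flagged | threat)
false_positive_rate = 0.01   # assumed P(flagged | innocent)
population = 10_000_000      # assumed number of people screened

p = posterior(prior, sensitivity, false_positive_rate)
flagged_innocent = false_positive_rate * population * (1 - prior)

print(f"P(threat | flagged) = {p:.4%}")
print(f"Innocent people flagged out of {population:,}: {flagged_innocent:,.0f}")
```

With these assumed numbers, only about 0.1% of the people who get flagged are actual threats, while roughly 100,000 innocent people out of ten million end up flagged. The rarer the thing being hunted, the more the false positives dominate.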

The other intrusion on your life from Big Data Stalking comes from commercial enterprises. They aim to learn absolutely everything they can about you, because they can sell that data. Big Data can produce new or enhanced revenue streams. Is there a way out of this?

I say that there can be. With a paradigm shift, the consumers of Big Data can get what they want, and your privacy can be protected. How, you ask? With a little dash of technology.

Let's suppose that you turn the tables and consent to limited data tracking. That tracking data is then bowdlerized, meaning that sensitive personal details are obfuscated or removed. This is done by an app on your device, whether a cell phone, tablet or computer. The data is then sold to the highest bidder, and you are paid for it. Everyone is happy, and you, the consumer, benefit from the data collection.
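Here is a minimal sketch of the on-device bowdlerizing step, assuming each tracking record arrives as a Python dictionary. The field names, the ten-year age bands and the three-character postal prefix are all hypothetical choices for illustration, not any standard.

```python
# Hypothetical on-device "bowdlerizer": removes identifying fields and coarsens
# quasi-identifiers before a tracking record is released for sale.
# Field names and coarsening rules are illustrative assumptions.

SENSITIVE_FIELDS = {"name", "email", "phone", "street_address"}

def age_band(age: int) -> str:
    """Coarsen an exact age into a ten-year band, e.g. 37 -> '30-39'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"

def bowdlerize(record: dict) -> dict:
    """Return a copy of record with identifying fields dropped and others coarsened."""
    cleaned = {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}
    if "age" in cleaned:
        cleaned["age"] = age_band(cleaned["age"])
    if "postal_code" in cleaned:
        # Keep only a coarse region: the first three characters.
        cleaned["postal_code"] = str(cleaned["postal_code"])[:3]
    return cleaned

record = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "age": 37,
    "postal_code": "K1A0B1",
    "pages_visited": ["news", "recipes", "music"],
}
print(bowdlerize(record))
```

The buyer still learns the browsing interests and a rough demographic, which is what advertisers mostly want, while the name and contact details never leave the device.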

As for the other stuff, technology can help too. I am a huge proponent of Artificial Intelligence. Suppose that you had a proxy entity digital assistant called Blocker. Blocker would surf the web for you, executing your Likes and Dislikes while retaining your anonymity. Blocker would run on a proxy service, so that even your IP address would be hidden. On top of that, it would surf in anonymous mode. If there wasn't any personal user data to be had, your privacy would be protected. The data flow wouldn't be entirely impeded, because content analysis could still produce pretty good inferences about the humans behind any wall. For example, a grandma living in Norway probably wouldn't be listening to rap music, but her grandson might be.
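Blocker is hypothetical, but two of its ingredients, routing traffic through a proxy and refusing to keep cookies, are easy to sketch with Python's standard library. The proxy address is a placeholder assumption, and this sketch covers only IP hiding and cookie-free browsing, not other tracking techniques such as browser fingerprinting.

```python
import urllib.request

# Placeholder proxy endpoint (an assumption). Websites would see the proxy's
# IP address instead of the user's.
PROXY_URL = "http://proxy.example:8080"

def build_blocker_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Build an opener that sends requests via a proxy and stores no cookies."""
    proxy = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    # build_opener adds no HTTPCookieProcessor unless one is passed in, so
    # cookies set by websites are never stored or sent back.
    opener = urllib.request.build_opener(proxy)
    # Replace the default "Python-urllib/x.y" User-Agent with a generic one.
    opener.addheaders = [("User-Agent", "Mozilla/5.0")]
    return opener

opener = build_blocker_opener(PROXY_URL)
# opener.open("https://example.com/") would now fetch the page via the proxy.
```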

So, with a bit of different thinking, we can mitigate the dangers of Big Data Stalkers. The unfortunate thing is that many denizens of the Internet don't know or don't care about the Stalkers.

The Semantic Web and a Possible Rules Engine that Rocks

The entry below on the putative consciousness of Google got me thinking about "The Semantic Web". It was (and still is) a W3C initiative to make web pages machine-readable.

A good example of making dumb web pages smart is the "Apples for Sale" example. Picture this: an HTML web page offers apples for sale. It is a simple page. There is a picture of an apple, a piece of text that says "Apples For Sale", another piece of text that says "$1.00" and another that says "Each". A machine reading that page's HTML would not know that it was a commerce page offering something for sale. It would not know that $1.00 is the price, that apples are the item being offered, or that "each" is the unit to which the price applies.

The Semantic Web would change all that. It would mark up a web page, attaching machine-readable meaning to the elements of the HTML so that a machine could sort through it.
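As one illustration (not the only way to do it), here is how the apples page could describe itself using the schema.org vocabulary serialized as JSON-LD, a W3C format for linked data. The item, price and unit come from the example page; the currency (USD) is an assumption.

```python
import json

# A machine-readable description of the "Apples for Sale" page using the
# schema.org Offer and UnitPriceSpecification types, serialized as JSON-LD.
# The currency is an assumption; the other values mirror the example page.
offer = {
    "@context": "https://schema.org",
    "@type": "Offer",
    "itemOffered": {"@type": "Product", "name": "Apple"},
    "price": "1.00",
    "priceCurrency": "USD",
    "priceSpecification": {
        "@type": "UnitPriceSpecification",
        "price": "1.00",
        "priceCurrency": "USD",
        "unitText": "each",
    },
}

# Embedded in the page's HTML, this block is read directly by crawlers.
snippet = ('<script type="application/ld+json">\n'
           + json.dumps(offer, indent=2)
           + "\n</script>")
print(snippet)
```

Now a machine no longer has to guess: "Offer" says something is for sale, "price" says what it costs, and "unitText" says the price is per apple.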

A few years back, the "next big thing" was a rules engine. A rules engine would be incorporated into an application, and if the business rules change, you wouldn't have to change your application. You would just change a rules file that the rules engine read.
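Here is a minimal sketch of that separation, assuming the rules are stored as JSON. The application code below stays fixed; changing a threshold or a resulting status means editing only the rules text. The field names and rule format are hypothetical.

```python
import json
import operator

# Hypothetical rules "file", inlined as a string. Business users could edit
# this text without touching the application code below.
RULES_JSON = """
[
  {"field": "purchases_ytd", "op": ">",  "value": 5000, "set": {"status": "Gold"}},
  {"field": "purchases_ytd", "op": "<=", "value": 5000, "set": {"status": "Standard"}}
]
"""

# Comparison operators the rules file is allowed to use.
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "==": operator.eq}

def apply_rules(record: dict, rules: list) -> dict:
    """Apply every rule whose condition holds on the original record."""
    result = dict(record)
    for rule in rules:
        if OPS[rule["op"]](record[rule["field"]], rule["value"]):
            result.update(rule["set"])
    return result

rules = json.loads(RULES_JSON)
print(apply_rules({"name": "Acme", "purchases_ytd": 7200}, rules))
```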

I used a rules engine for a network policy tool that decided which server would provide what services in a LAN. I expected rules engines to progress a lot further, but they have remained a sideline rather than becoming mainstream.

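Rules engines fit into the Semantic Web because a Rule Interchange Format is part of its infrastructure: if machines are to read and understand web pages, they must agree on how rules are expressed. Rules engines come in different flavours. Production (inference) rule engines are invoked by an application to answer a question, for example calculating loan risk during a credit application, while reactive (event-condition-action) rule engines respond automatically to events, for example telling humans or other machines when inventory items are getting low. Under the hood, an inference engine can reason by forward chaining (data-driven: start from the known facts and keep firing rules until nothing new can be concluded) or by backward chaining (goal-driven: start from a conclusion and work back to the facts that would support it).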
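The two reasoning strategies can be contrasted in a few lines of Python. The facts and rules are hypothetical and loosely follow the loan example. A real rules engine would use a far more efficient matching algorithm (Rete, for instance), but the logic is the same.

```python
# Rules are (premises, conclusion) pairs over simple string facts.
# The facts and rules are hypothetical, loosely based on a loan application.
RULES = [
    ({"steady_income", "low_debt"}, "good_credit"),
    ({"good_credit", "collateral"}, "low_loan_risk"),
    ({"low_loan_risk"}, "approve_loan"),
]

def forward_chain(facts: set, rules) -> set:
    """Data-driven: fire rules whose premises are all known until nothing new appears."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

def backward_chain(goal: str, facts: set, rules, seen=None) -> bool:
    """Goal-driven: prove goal by proving all premises of some rule that concludes it."""
    if goal in facts:
        return True
    seen = set() if seen is None else seen
    if goal in seen:  # avoid infinite loops on cyclic rules
        return False
    seen = seen | {goal}
    return any(
        all(backward_chain(p, facts, rules, seen) for p in premises)
        for premises, conclusion in rules
        if conclusion == goal
    )

facts = {"steady_income", "low_debt", "collateral"}
print(sorted(forward_chain(facts, RULES)))
print(backward_chain("approve_loan", facts, RULES))
print(backward_chain("approve_loan", {"steady_income"}, RULES))
```

Forward chaining derives everything it can from the facts it has; backward chaining only explores the rules needed to answer the one question asked.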

Rules engines have not been widely used and, in my shortsighted humble opinion, that is because they are bulky, non-intuitive and impose a performance hit on applications. However, I may have an algorithm for a rules engine that rocks.

Consider the following code. It is a rule written in the Production Rule Dialect of the Rule Interchange Format (RIF), which is a W3C Recommendation:

Prefix(ex )
(* ex:rule_1 *)
Forall ?customer ?purchasesYTD (
  If And( ?customer#ex:Customer
          ?customer[ex:purchasesYTD->?purchasesYTD]
          External(pred:numeric-greater-than(?purchasesYTD 5000)) )
  Then Do( Modify(?customer[ex:status->"Gold"]) ) )

RIF rules follow an "If (some condition) Then (do this)" pattern. This rule has a commercial entity check each customer's year-to-date purchases and, if they are greater than $5,000, upgrade that customer's status to "Gold".

The thought struck me that one could have a rules engine that operated directly on the database. It would parse the RIF language and automagically convert it to SQL. (I will race you to the patent office on this idea.)
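As a toy sketch of that idea, the function below handles only rules shaped exactly like the one above (one class test, one property binding, one numeric comparison, one Modify action), and it uses regular expressions rather than a real RIF parser. The table and column mappings (CustomerTable, YearToDate) are assumptions, and as a design choice it emits a single set-based UPDATE rather than a cursor loop.

```python
import re

# Toy translator for RIF-PRD rules of exactly this shape:
#   If And( ?x#ex:Class  ?x[ex:prop->?v]  External(pred:<comparison>(?v N)) )
#   Then Do( Modify(?x[ex:field->"value"]) )
# The table/column mappings below are hypothetical.
TABLE_MAP = {"Customer": "CustomerTable"}
COLUMN_MAP = {"purchasesYTD": "YearToDate"}
COMPARISONS = {
    "numeric-greater-than": ">",
    "numeric-greater-than-or-equal": ">=",
    "numeric-less-than": "<",
    "numeric-less-than-or-equal": "<=",
    "numeric-equal": "=",
}

RULE = '''
Forall ?customer ?purchasesYTD (
  If And( ?customer#ex:Customer
          ?customer[ex:purchasesYTD->?purchasesYTD]
          External(pred:numeric-greater-than(?purchasesYTD 5000)) )
  Then Do( Modify(?customer[ex:status->"Gold"]) ) )
'''

def rif_to_sql(rule: str) -> str:
    """Translate one simple RIF-PRD rule into a single SQL UPDATE statement."""
    cls = re.search(r"\?\w+#ex:(\w+)", rule).group(1)
    # Map each bound variable to the property it was read from.
    bindings = {var: prop for prop, var in re.findall(r"\?\w+\[ex:(\w+)->\?(\w+)\]", rule)}
    pred, var, number = re.search(
        r"pred:([\w-]+)\(\?(\w+) (-?\d+(?:\.\d+)?)\)", rule).groups()
    field, value = re.search(r'Modify\(\?\w+\[ex:(\w+)->"([^"]*)"\]\)', rule).groups()
    table = TABLE_MAP.get(cls, cls)
    where_col = COLUMN_MAP.get(bindings[var], bindings[var])
    set_col = COLUMN_MAP.get(field, field)
    value_sql = value.replace("'", "''")  # escape single quotes in the SQL literal
    return (f"UPDATE {table} SET {set_col} = '{value_sql}' "
            f"WHERE {where_col} {COMPARISONS[pred]} {number}")

print(rif_to_sql(RULE))
```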

My rules engine would generate an SQL statement such as SELECT * FROM CustomerTable WHERE YearToDate > 5000 to open a cursor over the matching customers. Then I would loop through the cursor and update each customer's status to "Gold".
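Here is a sketch comparing the cursor-and-loop approach with a single set-based UPDATE, using an in-memory SQLite table. The columns and sample rows are hypothetical. Both approaches leave the table in the same state; the set-based version lets the database do the looping, which is usually faster on large tables.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the hypothetical CustomerTable."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE CustomerTable ("
                 "id INTEGER PRIMARY KEY, name TEXT, YearToDate REAL, status TEXT)")
    conn.executemany(
        "INSERT INTO CustomerTable (name, YearToDate, status) VALUES (?, ?, 'Standard')",
        [("Alice", 7200.00), ("Bob", 4999.99), ("Carol", 5000.01)],
    )
    return conn

def upgrade_with_cursor(conn: sqlite3.Connection, threshold: float) -> None:
    """Select matching rows first, then update each one in a Python loop."""
    # fetchall() materializes the ids so we are not updating while iterating.
    rows = conn.execute("SELECT id FROM CustomerTable WHERE YearToDate > ?",
                        (threshold,)).fetchall()
    for (customer_id,) in rows:
        conn.execute("UPDATE CustomerTable SET status = 'Gold' WHERE id = ?",
                     (customer_id,))
    conn.commit()

def upgrade_set_based(conn: sqlite3.Connection, threshold: float) -> None:
    """One UPDATE statement; the database engine does the looping."""
    conn.execute("UPDATE CustomerTable SET status = 'Gold' WHERE YearToDate > ?",
                 (threshold,))
    conn.commit()

def statuses(conn: sqlite3.Connection) -> list:
    return conn.execute("SELECT name, status FROM CustomerTable ORDER BY name").fetchall()

a, b = make_db(), make_db()
upgrade_with_cursor(a, 5000)
upgrade_set_based(b, 5000)
print(statuses(a))
print(statuses(a) == statuses(b))
```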

The great thing about this is that this rules engine that rocks would revolutionize data mining and database reporting. The more I think about it, the more I am convinced that this could be the NEXT BIG THING in data mining.

And as for the Semantic Web, in my opinion it is a no-go. Who is going to mark up the few billion pages that are already out there? Nor will the Internet's archived history be reworked, so all of that content will remain useless to the Semantic Web. I see this function being performed at a single point, the web server, which would run context engines that recognize content and mark up each page as it is served. Now that is a workable plan.
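Here is a deliberately naive sketch of such a server-side context engine: it scans outgoing HTML for an "X for sale" phrase and a dollar price and, if it finds both, injects a schema.org JSON-LD description into the page head. The regular expressions, the USD assumption and the function name annotate are all hypothetical; a real context engine would need far better recognition than two regexes.

```python
import json
import re

# Naive recognizers (hypothetical heuristics, not a real context engine).
PRICE = re.compile(r"\$(\d+(?:\.\d{2})?)")
FOR_SALE = re.compile(r"(\w+)\s+for\s+sale", re.IGNORECASE)

def annotate(html: str) -> str:
    """Inject a JSON-LD Offer into html if it looks like a sale page."""
    item, price = FOR_SALE.search(html), PRICE.search(html)
    if not (item and price):
        return html  # nothing recognized: serve the page unchanged
    offer = {
        "@context": "https://schema.org",
        "@type": "Offer",
        "itemOffered": {"@type": "Product", "name": item.group(1)},
        "price": price.group(1),
        "priceCurrency": "USD",  # assumption: "$" means US dollars
    }
    script = '<script type="application/ld+json">' + json.dumps(offer) + "</script>"
    # Insert just before </head> if present, otherwise prepend to the page.
    if "</head>" in html:
        return html.replace("</head>", script + "</head>", 1)
    return script + html

page = ("<html><head><title>Shop</title></head><body>"
        "<img src='apple.jpg'><p>Apples For Sale</p><p>$1.00</p><p>Each</p>"
        "</body></html>")
print(annotate(page))
```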

I'd write more on this, but I have to open up an IDE and test this rules engine idea. Later.