All Things Techie With Huge, Unstructured, Intuitive Leaps

A Possible Technology Solution To Fake News



I just read an article in Finance Monthly about how fake news is killing the economy, and it is really disturbing to me how various sectors of the political scene can assert a total untruth and have thousands of sheeple believe it to be gospel. I keep thinking that surely we are better as a people and more enlightened, but apparently that is not the case. We seem to have regressed over the last couple of years. Fake news has turned into a monster, and certain classes of people cannot find their moral compass enough to fight it. So maybe technology can offer some solutions.

Thanks to some of my friends at @Microsoft, I once read a bulletin put out by them with a title that resonated with me. It differentiated between attackers and defenders. To be successful against fake news, you cannot merely defend the truth; you have to attack the fake stuff. There is a difference between those two actions. Defenders use lists in their arsenal of tools. Attackers use graphs. That was the Microsoft assertion, and it is true. All techies will have nodded when they read the sentence about attackers using graphs.

When I talk about graphs, I don't mean those pretty pictures that Excel puts out of profit and loss, or the rise of the price of Bitcoin over the last year. I mean graph in the sense of mathematical graph theory. If you aren't up on this, let me explain. A graph is a theoretical structure consisting of a set of objects or ideas, some pairs of which are related. The relationships between them (the lines connecting them) are called edges. The objects or ideas are called nodes or vertices. Graphs are a part of discrete mathematics that translates easily into real-life scenarios. A picture is worth a thousand words, so here is a picture of a graph.

[Image: an example graph of nodes connected by labeled, directed edges]

You will notice that in the above graph there are things (nouns), which are the vertices or nodes, and there are states (is, lives, has), which are the edges. Edges have properties, and the properties can have sub-properties. A sub-property in this case is that the edge is directed, with an arrowhead. This makes the information in the graph semantic -- or composed of meaning that is apparent from the structure of the graph. The nice part is that there are discrete mathematical methods for traversing the graph and extracting not only data, but knowledge. A graph is capable of creating a level of abstraction. For example, the discrete data is a news story. A level of abstraction is the assertion or inference that the particular news story is fake news.
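
To make the idea concrete, here is a minimal sketch in Python (my own illustration, not part of any production stack) that builds a tiny directed graph with labeled nodes and edges using the networkx library. The node names and relations are made up for demonstration:

import networkx as nx

# Build a small directed graph: nodes are things (nouns),
# edges are states (is, lives, has) carrying a "relation" property.
g = nx.DiGraph()
g.add_node("John", type="Person")
g.add_node("Mary", type="Person")
g.add_node("Dog", type="Animal")

g.add_edge("John", "Mary", relation="knows")
g.add_edge("Mary", "Dog", relation="has")
g.add_edge("John", "Vancouver", relation="lives_in")   # adds the node implicitly

# Traverse the graph and read the semantics straight out of the structure.
for subject, obj, data in g.edges(data=True):
    print(subject, data["relation"], obj)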

When fake news appears, the defenders of the truth manage it in list form. Here is their list:


  • How many untruths are there? List them.
  • Find countervailing, documented data to refute each one.
  • Do this for all of the untruths.
  • Come to the conclusion that the whole article is fake news.

In the meantime, the originator of the fake news uses a more complex, graph-like process to promulgate the fake news. It starts with an inconvenient truth that is pejorative, and an attempt is made to neutralize it. First they must define the audience that is ready, willing and able to uncritically accept any falsehood. They must also craft the "alternative facts" to be plausible, at least possible if highly improbable. Then they have to find the opportunity and the medium in which to place the fake news. This involves a network of perfidy that is itself a graph of the underbelly of spreading falsehoods for personal, pecuniary or political gain.

So, the solution is that there must be an impartial, balanced methodology for determining and labeling fake news. This is the nub of the problem. Another problem is that the sheer volume of news coming out makes human content moderation an almost impossible job, unless you have deep pockets like Google or Facebook. Although, from past experience, Facebook is for sale to anyone who wants to buy ad space, Russian trolls and democracy-destabilizers and all. The obvious answer is machine learning and artificial intelligence monitoring and labeling fake news. You can't suppress fake news, no matter how egregious the lies are, because of the First Amendment and freedom of speech, but you can label it with the Scarlet Letter of fake news, and those who cite it are obviously lacking some cognitive ability.

What does the Fake News BS Detector technology stack look like? First you have to give the system some context for current events. This is where AI comes in. Graphs have to be created and semantically understood. Luckily for this, we have wonderful graph databases. My current favorite is @Neo4j. Some of the graphs that your AI machine will create will be something like this:

CREATE (djt:Person {name:"Donald J. Trump"})
RETURN djt

MATCH (djt:Person {name:"Donald J. Trump"})
CREATE (djt)-[status:HOLDS_OFFICE]->(potus:Position {name:"President"})
RETURN djt, status, potus

The above happens to be a simplistic example of Cypher, the language used to create graphs in Neo4j. You get the idea. The AI machine does lexical, syntactic and semantic analysis to create the graphs.
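
If you wanted to drive this from code rather than from the Neo4j console, a minimal sketch in Python using the official neo4j driver might look like the following. The connection URI, credentials and extracted entities are placeholders, not a working configuration:

from neo4j import GraphDatabase

# Placeholder connection details -- substitute your own Neo4j instance.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def store_assertion(tx, person, office):
    # Parameterized Cypher: create the person, the position and the HOLDS_OFFICE edge.
    tx.run(
        "MERGE (p:Person {name: $person}) "
        "MERGE (o:Position {name: $office}) "
        "MERGE (p)-[:HOLDS_OFFICE]->(o)",
        person=person, office=office,
    )

with driver.session() as session:
    # In the real pipeline these values would come from the AI machine's
    # lexical, syntactic and semantic analysis of a news item.
    session.write_transaction(store_assertion, "Donald J. Trump", "President")

driver.close()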

So you run the AI machine, and you get a bunch of graphs. I was a little stumped as to how to teach the machine true from false once the semantic analysis was complete. You need some human intervention somewhere at the beginning of the cycle, and I wondered how to do it. However, just recently I read a seminal article by Dimitri De Jonghe about Curated Governance with Stake Machines, the light bulb switched on, and I had the Eureka moment.

I wasn't totally unfamiliar with Dimitri. He is one of the key members of the @BigchainDB team. I had communicated with him about smart contracts, and they graciously granted me access to their GitHub repository on smart contracts before it was released.

The article on Curated Governance with Stake Machines is a perfect example of how our lives will be tokenized by blockchain. Essentially, you steer token holders to earn more tokens by curating items (graphs) that are or are not fake. The token holders themselves have their opinions weighted by reputation and bias, which are empiricized by the curation automata. Essentially, you have created a token-curated registry of graphs. These curators could be reporters and news media types, much like Reddit moderators. Let me quote from the article linked above: "So long as there are parties which would desire to be curated into a given list, a market can exist in which the incentives of rational, self-interested token holders are aligned towards curating a list of high quality."
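
As a back-of-the-envelope illustration of that incentive mechanism (this is my own toy sketch in Python, not Dimitri's stake-machine design), curators stake tokens on whether a graph belongs in the "fake" registry, and the winning side of the vote splits the losing side's stake:

from collections import defaultdict

def settle_curation_round(stakes):
    """stakes maps curator name -> (vote, amount), where vote is "fake" or "real".
    Returns the winning label and the token payouts per curator."""
    totals = defaultdict(float)
    for vote, amount in stakes.values():
        totals[vote] += amount

    # The majority of staked tokens decides the label for this graph.
    winning_vote = max(totals, key=totals.get)
    losing_pool = sum(amount for vote, amount in stakes.values() if vote != winning_vote)

    payouts = {}
    for curator, (vote, amount) in stakes.items():
        if vote == winning_vote:
            # Winners get their stake back plus a pro-rata share of the losers' pool.
            payouts[curator] = amount + losing_pool * (amount / totals[winning_vote])
        else:
            payouts[curator] = 0.0
    return winning_vote, payouts

# Hypothetical round for one curated graph.
label, payouts = settle_curation_round({
    "alice": ("fake", 10.0),
    "bob":   ("fake", 5.0),
    "carol": ("real", 8.0),
})
print(label, payouts)   # "fake", with carol's 8 tokens shared among alice and bob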

Naturally, these verified graphs would be stored in a data-centric blockchain like BigchainDB, which could also handle the tokenization of the curation. The data payload of BigchainDB is well suited to a textual or key:value pair representation of graphs.
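
As a rough sketch of what that payload could look like (the field names here are my own invention, not a BigchainDB schema), a curated graph can be flattened into a JSON-style asset of nodes and edges before being written to the chain:

# Hypothetical asset payload for one curated graph, expressed as the kind of
# JSON-serializable dictionary that BigchainDB stores as asset data.
curated_graph_asset = {
    "data": {
        "graph_id": "entity-russia-election-2016",        # made-up identifier
        "nodes": [
            {"id": "russia",       "label": "Entity", "name": "Russia"},
            {"id": "election2016", "label": "Object", "name": "US Election 2016"},
        ],
        "edges": [
            {"from": "russia", "to": "election2016", "relation": "INTERFERED"},
        ],
        "curation": {
            "verdict": "verified",      # outcome of the token-curated vote
            "stake_total": 23.0,        # tokens staked in the round above
        },
    }
}

# The bigchaindb_driver package could then prepare, sign and send a CREATE
# transaction carrying this asset; the exact driver calls are omitted here.
import json
print(json.dumps(curated_graph_asset, indent=2))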

Now onto the automatic part. Suppose you built the machine and Twitter bought it to scan posted items and put a red stop-sign icon on anything that is fake news. You have the consensus of the curators for a graph, for example, that Entity: "Russia" -> Action: "Interfered" -> Object: "US Election 2016". The fake news article is read by the platform and fed into the stack. The algorithm to check for fakeness can be a method like Latent Dirichlet Allocation. This throws the data at the platform as a document and lets the platform sort it out, as opposed to having a manual model. If you are a techie and have done eCommerce recommender systems, you will see that this is similar to matrix factorization models. If the previous sentence is Greek to you, essentially you have a matrix where the rows are documents and the columns are words. These matrices are built not over an arbitrary sequence of words, but over an index of the words found in the nodes or edges of the graphs that you already have. Thus, you can calculate a probability (via a Bayesian process) of the new item being fake or not. This methodology is a generative model, meaning that it models how examples of fake and real news are generated, and so it can tell the difference.
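
Here is a minimal sketch of that step in Python, assuming scikit-learn (the documents, labels and vocabulary are made up for illustration): documents are counted over a vocabulary drawn from the nodes and edges of the curated graphs, Latent Dirichlet Allocation turns the counts into topic mixtures, and a simple classifier trained on the curators' fake/real labels produces a probability of fakeness:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression

# Hypothetical labelled corpus from the token-curated registry (1 = fake, 0 = real).
docs = [
    "russia interfered in the us election in 2016 say sources",
    "the senate passed the budget bill after a late night vote",
]
labels = [1, 0]

# Restrict the vocabulary to words appearing in the nodes/edges of curated graphs.
graph_vocabulary = ["russia", "interfered", "election", "senate", "budget", "vote"]

vectorizer = CountVectorizer(vocabulary=graph_vocabulary)
counts = vectorizer.fit_transform(docs)                 # document-word matrix

lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_mixtures = lda.fit_transform(counts)              # document-topic matrix

classifier = LogisticRegression().fit(topic_mixtures, labels)

def fakeness_probability(text):
    """Score a new item: probability that it looks like the curated 'fake' examples."""
    new_counts = vectorizer.transform([text])
    new_topics = lda.transform(new_counts)
    return classifier.predict_proba(new_topics)[0, 1]

print(fakeness_probability("claims that russia interfered in the election"))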

This type of architecture can be extended as the number of memes and graphs grows, using an algorithm called the Hierarchical Dirichlet Process, where the number of topics chooses itself automatically and grows according to the input data (which can be assisted by token curation when necessary) via non-parametric machine learning.
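
A short sketch of that extension, assuming the gensim library (the tokenized documents here are placeholders): the Hierarchical Dirichlet Process model infers the number of topics from the data instead of taking it as a fixed parameter:

from gensim.corpora import Dictionary
from gensim.models import HdpModel

# Placeholder tokenized documents; in practice these would be news items
# tokenized against the vocabulary of the curated graphs.
texts = [
    ["russia", "interfered", "election", "2016"],
    ["senate", "budget", "bill", "vote"],
    ["russia", "election", "hacking", "claims"],
]

dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

# HDP: the number of topics is not fixed up front; it grows with the data.
hdp = HdpModel(corpus, id2word=dictionary)

for topic in hdp.print_topics(num_topics=5, num_words=4):
    print(topic)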

These ideas need some research and development, but they could point the way to "trusted" news adjudicated by machines that were "taught" by trusted token-curated registries.

We really need to do something about how we have degenerated as a human species, from the ethical, altruistic moral high ground of the truth to a third of the American people being willing to believe lies in spite of what rational evidence tells them. Perhaps it is time for the machines to step in.
