Future Imperfect & Software Stream of Consciousness : Bayesian Approach

A couple of years ago, I was searching for untapped horizons in data mining, and I came across a course given by Professor Wil van der Aalst, who pioneered the technology of mining business processes from server event logs. Naturally I signed up for the course. It was, and is, a fascinating course, not only for its in-depth and non-trivial treatment of gleaning knowledge from data, but also because it got my creative juices flowing about where else it could be applied. I was so intrigued by the possibilities that I created a Google Scholar Alert for Professor van der Aalst's publications. The latest Google alert was on January 31st, and it was for a paper entitled "Connecting databases with process mining". The link is here: http://repository.tue.nl/858271 It was this paper that triggered this article.

I am a huge proponent of AI, Machine Learning and Analytics. In Machine Learning, you gather large datasets, clean the data, split the data into smaller sets for training & evaluation, and then train a model for hundreds, perhaps thousands, of training epochs until the probability of gaining the sought-after knowledge crosses an appropriate threshold. Machine intelligence is a huge field of endeavor and it is becoming a major part of everyday life. However, it is time-consuming to teach the machine and get it right. Professor van der Aalst's area of expertise can provide a better way. Let me explain:

My particular interest is that I am building a semantic blockchain to record all of the data coupled to vehicles, autonomous or not. A blockchain, of course, is an immutable, trusted data ledger that is autonomous in its own operation, disintermediates third parties and is outage-resistant. Autonomous vehicles will, by law, be required to log every move, keep records of their software revisions, and keep records of things like post-crash behavior.

I immediately saw the possibilities of using this data. Suppose that you are in an autonomous vehicle and that vehicle has never been on a tricky roadway that you need to navigate to get to your destination. Your car doesn't know the route parameters, but thousands of other autonomous vehicles do, including many with your kind of operating system and software. With the connected car, your vehicle would know its GPS coordinates and could query a system for the driving details of this piece of roadway that is unknown to its computer. Instead of relying on the intense computation otherwise required to navigate, the car could download a recipe of driving features.

Rather than garnering those instructions from repeated training epochs in machine learning, one could apply process mining to the logs to extract the knowledge required. There are already semantic methods of communicating processes, from decision trees to Petri nets, and if the general process were already known to the machine, it would reduce the computational load. As a matter of fact, each vehicle could have a process mining module to extract high-level algorithms for the roads that it drives regularly. That in itself would reduce the computational load of the vehicle. It would know in advance where the stop signs are, for example, and you wouldn't have YouTube videos of self-driving cars going through red lights and stop signs.
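To make the idea a little more concrete, here is a minimal sketch of one of the simplest building blocks in process mining: counting which logged action directly follows which. The traces, the action names and the function name directly_follows are all made up for illustration; a real vehicle log would be far richer, and a real process mining tool would go on to build a Petri net or similar model from counts like these.

```python
from collections import Counter


def directly_follows(traces):
    """Count how often action a is immediately followed by action b."""
    counts = Counter()
    for trace in traces:
        # Pair each event with its successor in the same trace.
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


# Hypothetical traces: one list of logged actions per vehicle pass
# through the same intersection (invented data, not from a real log).
traces = [
    ["approach", "slow", "stop", "proceed"],
    ["approach", "slow", "stop", "proceed"],
    ["approach", "slow", "proceed"],
]

dfg = directly_follows(traces)
for (a, b), n in sorted(dfg.items()):
    print(f"{a} -> {b}: {n}")
```

In this toy log, "slow" is followed by "stop" in two of the three passes, which is the kind of regularity a vehicle could use as a hint that there is a stop sign on that stretch of road.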

It goes a lot further than autonomous vehicles. This concept of creating high-level machine processes from event logs can be applied to fields as diverse as robotic manufacturing and cloud server monitoring, and to numerous other fields where human operators or real-world human judgement are required.

Process mining could either eliminate machine learning in a lot of instances, or it could supplement it, with a mix of technologies. The aim is the same, which is aggregating data into information and integrating information into knowledge, both for humans and machines.

This process mining business reminds me of the history behind Bayesian Inference. The Reverend Thomas Bayes worked out equations relating probability and prior belief. They sat on a dusty shelf for over 200 years before they were re-purposed for computer inference and machine intelligence. I think that Professor van der Aalst's methodologies will be re-purposed for things as yet unimagined, and they will not take 200 years to come to fruition.

While waiting for Honda Xcelerator in Silicon Valley to evaluate my latest disruptive auto tech pitch, I got a little weary of documenting the API and creating more entry points, so I started thinking about revenue streams and startups. I received the Warren Buffett biography for Christmas, and by coincidence, I came across a passage in the book where a startup was pitched to Warren. It gave me pause to think.

Warren had bought the Wall Street firm Salomon Brothers, and it was a problem-child investment. The company was caught up in a Treasury bond scandal, and Warren had to beg and plead with the government and regulators not to shut it down and destroy his investment. As a mea culpa, heads had to roll, and one of the heads was John "JM" Meriwether. JM had reported the transgression of one of his employees that caused the evolving scandal, and JM's superiors sat on the information without immediately reporting it to the regulators. After it was all said and done, JM was a victim as well because of his position, although he had no culpability in hiding the fraud. He left Salomon Brothers and started a hedge fund called Long-Term Capital Management. He approached Warren Buffett to invest in it. It was Meriwether's approach that got my attention.

Warren was still on good terms with JM after the DCBM (contractors and consultants know this term -- it is "Don't Come Back on Monday"). Although JM got the DCBM, he was still welcome at Warren's table. If you are in Warren's inner circle, you get invited to a steak dinner at Gorat's in Omaha, Nebraska. JM had a history of arbitrage and trading at Salomon, and he had compiled the numerical results of his successes and failures while heading the arb team. If you know anything about statistics, you should now be able to at least start feeling the heat of the Bayesian Approach.

Over the course of ingesting the finer bovine parts, JM pulled out a schedule to show Buffett different probabilities (another Bayesian bell rings) of results and how much money his hedge fund, Long-Term, could make, based on those probabilities. Also in the schedule were the probabilities of various strategies involving small or large trades with different parameters of leveraged capital. To someone like me, the approach was brilliant. It was totally Bayesian, and it provided some evidence for pro forma revenues beyond wishful thinking and shots in the dark at a dart board.

Every venture capitalist knows that over 99.999% of the business plans that they receive show pro forma revenues of over a million dollars after two years. It is almost a de rigueur feature of a business plan and pitch deck. And we all know that almost none of them ever hit that benchmark. Taking a Bayesian Approach to revenue forecasting could be a breath of fresh air for business plans, pitch decks and venture capitalism in general, even though it didn't work on Warren Buffett.

So what is the Bayesian Approach? Bayes’ theorem is named after Rev. Thomas Bayes (1701–1761), who first provided an equation that allows new evidence to update beliefs (Wikipedia). The formula in mathematical terms is given as:

P(A|B) = P(B|A) x P(A) / P(B)

Describing it in words goes like this: A and B are related events and the probability of B happening is not 0. The probability of A happening, given that B has happened = the probability that B will happen given A, times the probability of A, all divided by the probability of B.
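A quick numerical check can make the formula less abstract. The sketch below uses invented numbers and a made-up scenario: A is "this prospect becomes a paying customer" and B is "this prospect asked for a demo". The function name bayes_posterior and all three input probabilities are assumptions for illustration only. P(B) is computed with the law of total probability, P(B) = P(B|A) x P(A) + P(B|not A) x P(not A).

```python
def bayes_posterior(p_b_given_a, p_a, p_b_given_not_a):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    # Law of total probability: B can happen with A or without A.
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)
    if p_b == 0:
        raise ValueError("P(B) must not be 0")
    return p_b_given_a * p_a / p_b


# Invented numbers for illustration:
p_a = 0.01              # prior: 1% of prospects become customers
p_b_given_a = 0.90      # 90% of eventual customers asked for a demo
p_b_given_not_a = 0.05  # 5% of non-customers asked for a demo

posterior = bayes_posterior(p_b_given_a, p_a, p_b_given_not_a)
print(f"P(customer | asked for demo) = {posterior:.3f}")
```

With these made-up inputs, the demo request moves the belief from the 1% prior to roughly 15%, which is the "new evidence updates beliefs" idea in miniature.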

It doesn't sound like much, but the Bayes formula has staggering implications. It solves practical questions that were unanswerable by any other means: the defenders of Captain Dreyfus used it to demonstrate his innocence in the Dreyfus spying affair; insurance actuaries used it to set rates; Alan Turing used it to decode the German Enigma cipher and arguably save the Allies from losing the Second World War; the U.S. Navy used it to search for a missing H-bomb and to locate Soviet subs; RAND Corporation used it to assess the likelihood of a nuclear accident; and Harvard and Chicago researchers used it to verify the authorship of the Federalist Papers (The Less Wrong Blog). It is also the basis of some machine learning and artificial intelligence.

I think that it is a brilliant strategy for demonstrating revenue possibilities for start-ups. You could take a pool of known customers and a customer conversion rate (which is a probability based on your efforts to date), couple them to a variety of strategies for converting those customers and to a variety of probabilities of what they will pay, and if you have done your homework, you will come up with a believable, if less spectacular, pro forma revenue statement for your startup.
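Here is one way the arithmetic could look, as a rough sketch rather than a recipe. Every number is invented, and the names posterior_conversion_rate and expected_revenue are hypothetical. The conversion rate is estimated with a standard Beta-Binomial update: starting from a uniform Beta(1, 1) prior and observing k conversions out of n contacts, the posterior mean is (k + 1) / (n + 2). The price a customer will pay is treated as a small probability table.

```python
def posterior_conversion_rate(conversions, contacts, prior_a=1.0, prior_b=1.0):
    """Posterior mean of a conversion rate under a Beta(prior_a, prior_b) prior."""
    return (prior_a + conversions) / (prior_a + prior_b + contacts)


def expected_revenue(pool_size, conversion_rate, price_probs):
    """Expected revenue = pool size * conversion rate * expected price."""
    total_prob = sum(price_probs.values())
    if abs(total_prob - 1.0) > 1e-9:
        raise ValueError("price probabilities must sum to 1")
    expected_price = sum(price * p for price, p in price_probs.items())
    return pool_size * conversion_rate * expected_price


# Invented inputs for illustration:
rate = posterior_conversion_rate(conversions=5, contacts=98)  # 6 / 100
price_probs = {100: 0.5, 200: 0.3, 500: 0.2}  # annual price -> probability
forecast = expected_revenue(pool_size=1000, conversion_rate=rate,
                            price_probs=price_probs)
print(f"conversion rate ~ {rate:.2f}, expected revenue ~ ${forecast:,.0f}")
```

The point of the exercise is that each input is something an investor can challenge: the size of the pool, the evidence behind the conversion rate, and the spread of prices. A single expected value hides the uncertainty, so a fuller treatment would report a range of outcomes as well.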

While the approach is brilliant, it didn't work on Warren Buffett. Why? Warren & crew had this to say about it: "We thought that they were very smart people. But we were a little leery of the complexity and leverage of their business. We were very leery of being used as a sales lead. We knew that others would follow if we got in." (Munger - The Snowball). Warren thought that there was a flaw in the original premise of how they were going to use their leverage. He didn't want to be a Judas goat -- a wise old goat that is used for its entire lifetime to lead other goats to slaughter every day.

So while it didn't convince billionaire Buffett, taking a Bayesian approach to revenue forecasting for a startup just might land you a round of financing.


Process Mining From Event Logs -- An Untapped Resource And Wave of The Future

How Not To Convince Warren Buffett - Bayesian Approach To Revenue Forecasting For Startups