All Things Techie With Huge, Unstructured, Intuitive Leaps
Showing posts with label Bayesian inference. Show all posts

How Not To Convince Warren Buffett - Bayesian Approach To Revenue Forecasting For Startups



While waiting for Honda Xcelerator in Silicon Valley to evaluate my latest disruptive auto tech pitch, I got a little weary of documenting the API and creating more entry points, so I was thinking about revenue streams and startups. I received the Warren Buffett biography for Christmas, and by coincidence, I came across a passage in the book where a startup was pitched to Warren. It gave me pause to think.

Warren had bought the Wall Street firm Salomon Brothers, and it was a problem-child investment. The company was caught up in a treasury bond scandal, and Warren had to beg and plead with the government and regulators not to shut it down and destroy his investment. As a mea culpa, heads had to roll, and one of the heads was John "JM" Meriwether. JM had reported the transgression of one of his employees that caused the evolving scandal, and JM's superiors sat on the information without immediately reporting it to the regulators. After it was all said and done, JM was a victim as well because of his position, although he had no culpability in hiding the fraud. He left Salomon Brothers and started a hedge fund called Long Term Capital. He approached Warren Buffett to invest in it. It was Meriwether's approach that got my attention.

Warren was still on good terms with JM after the DCBM (contractors and consultants know this term -- it is "Don't Come Back on Monday"). Although JM got the DCBM, he was still welcome at Warren's table. If you are in Warren's inner circle, you get invited to a steak dinner at Gorat's in Omaha, Nebraska. JM had a history of arbitrage and trading at Salomon, and he compiled the numerical results of his successes and failures while heading the arb team. If you know anything about statistics, you should now be able to at least start feeling the heat in terms of the Bayesian Approach.

Over the course of ingesting the finer bovine parts, JM pulled out a schedule to show Buffett different probabilities (another Bayesian bell rings) of results and how much money his hedge fund, Long Term, could make, based on those probabilities. Also in the schedule were the probabilities of various strategies involving small or large trades with different parameters of leveraged capital. To someone like me, the approach was brilliant. It was totally Bayesian, and it provided some evidence of pro forma revenues other than wishful thinking and shots in the dark at a dart board.

Every venture capitalist knows that over 99.999% of the business plans that they receive show pro forma revenues of over a million dollars after two years. It is almost a de rigueur feature of a business plan and pitch deck. And we all know almost all of them never hit that benchmark. Taking a Bayesian Approach to revenue forecasting could be a breath of fresh air to business plans, pitch decks and venture capitalism in general, even though it didn't work on Warren Buffett.

So what is the Bayesian Approach? Bayes’ theorem is named after Rev. Thomas Bayes (1701–1761), who first provided an equation that allows new evidence to update beliefs (Wikipedia). The formula in mathematical terms is given as:

P(A|B) = P(B|A) x P(A) / P(B)

Describing it in words goes like this: A and B are related events, and the probability of B happening is not 0. The probability of A happening, given that B has happened, equals the probability that B will happen given A, times the probability of A, all divided by the probability of B.
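To make the formula concrete, here is a quick numeric sketch in Python (the events and probabilities are invented purely for illustration):

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
# Hypothetical example: A = "startup succeeds", B = "strong pilot results"
p_a = 0.10          # prior: 10% of startups succeed
p_b_given_a = 0.80  # 80% of successful startups had strong pilots
p_b = 0.25          # 25% of all startups show strong pilots

p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))  # 0.32 -- the strong pilot tripled the odds
```

Note how the evidence moves the prior from 10% to 32%; that updating step is the whole trick.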

It doesn't sound like much, but the Bayes formula has staggering implications. It solves practical questions that were unanswerable by any other means: the defenders of Captain Dreyfus used it to demonstrate his innocence in the Dreyfus spying affair; insurance actuaries used it to set rates; Alan Turing used it to decode the German Enigma cipher and arguably save the Allies from losing the Second World War; the U.S. Navy used it to search for a missing H-bomb and to locate Soviet subs; RAND Corporation used it to assess the likelihood of a nuclear accident; and Harvard and Chicago researchers used it to verify the authorship of the Federalist Papers (The Less Wrong Blog). It is also the basis of some machine learning and artificial intelligence.

I think that it is a brilliant strategy for demonstrating revenue possibilities for start-ups. You could take a pool of known customers, a customer conversion rate (which is a probability based on your efforts to date) coupled to a variety of strategies for converting them, coupled to a variety of probabilities of what they will pay, and if you have done your homework, you will come up with a believable, but less spectacular, pro forma revenue statement for your startup.
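A minimal sketch of such a probability-weighted forecast might look like this (the pool, conversion rates and price points below are hypothetical numbers, not data from any real startup):

```python
# A hedged sketch of a probability-weighted pro forma revenue estimate.
pool = 500  # known prospective customers

# Hypothetical conversion probability for each sales strategy
strategies = {"cold email": 0.02, "warm intro": 0.10, "pilot program": 0.25}

# What a converted customer might pay per year, with probabilities
price_points = [(1000, 0.5), (5000, 0.35), (20000, 0.15)]
expected_price = sum(price * p for price, p in price_points)  # $5,250

for name, conversion in strategies.items():
    expected_revenue = pool * conversion * expected_price
    print(f"{name}: ${expected_revenue:,.0f}")
```

Running this gives a spread of believable forecasts per strategy rather than one hockey-stick number, which is exactly the kind of schedule JM put in front of Buffett.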

While the approach is brilliant, it didn't work on Warren Buffett. Why? Warren & crew had this to say about it: "We thought that they were very smart people. But we were a little leery of the complexity and leverage of their business. We were very leery of being used as a sales lead. We knew that others would follow if we got in." (Munger - The Snowball). Warren thought that there was a flaw in the original premise of how they were going to use their leverage. He didn't want to be a Judas goat -- a wise old goat that is used for its entire lifetime to lead other goats, day after day, to slaughter.

So while it didn't convince billionaire Buffett, taking a Bayesian approach to revenue forecasting for a startup, just might land you a round of financing.

An End To Dangerous Big Data Stalking


You are being stalked. Every website that you visit may add a stalker in the form of tracking cookies to your browser. They know where you have been.  And with just a modicum of inference they know who you are.

This web tracking is pervasive. It all goes into a big database. If, for some reason, you enter your name on a form, and the form is transmitted to the website in what is known as an HTTP POST, they will harvest your name. But even without your name, they will know what demographic you belong to. They will know your financial standing and how much you earn. They will know what music you listen to and what clothes you buy. And all of this information is processed without the benefit of human eyes sorting and classifying this data. Machine Learning is pervasive.

But here is what is most dangerous about these stalkers.  They can make the wrong inference, and put you on a watch list that may be impossible to get off, or you may not even know about.  Here is a scenario that could make you a terrorist according to Big Data and Machine Learning.

You are sipping your morning coffee looking at Facebook, and you see a heartbreaking picture of a child caught in the clutches of war in the Middle East. You "Like" the photo. Then it is time for you to go to the airport. You are flying business class and are given a choice of food. There are Halal meals. You are an adventurous foodie, so you tick the box to try one. Coupled to that, you have an aisle seat. Then you check your Twitter feed. Someone posts about "Freedom of Religion". You favorite the tweet. In the business section of a European website, you see the ad for a hedge fund that promises great returns. You click for more information. What you don't know is that you have put the Big Data Digital Stalkers into overdrive, and you are now a person of interest to several agencies.

As it turns out, the photo that you "Liked" was posted by a terrorist group to garner sympathy.  All of the "Likes" are collected as possible links to these terrorists. You are in another database because you chose Halal food instead of the bacon cheeseburger.  The aisle seat is problematic. Hijackers do not take window seats.  The "Freedom of Religion" tweet was sponsored by the Muslim Anti-Defamation League. Into another database you go.  The hedge fund promising great returns is headquartered in the Cayman Islands. The IRS is suddenly interested in you.

The most dangerous thing about Big Data Stalkers is that they make Bayesian inferences, which are probabilities. Probabilities are just that: they are not certainty. Even with a 99% probability, the next event in the sample space could be wrong -- not what the probability predicts. Machine Learning and Big Data Stalkers are a clear and present danger to personal privacy.
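The danger is easy to quantify with Bayes' rule itself: when the base rate of real threats is tiny, even an accurate flag produces overwhelmingly false alarms. A sketch with invented numbers:

```python
# The base-rate problem behind probabilistic watch-list flags.
# All rates below are invented for illustration.
base_rate = 1e-6        # one person in a million is an actual threat
sensitivity = 0.99      # P(flagged | threat)
false_positive = 0.001  # P(flagged | innocent)

# Bayes: P(threat | flagged)
p_flagged = sensitivity * base_rate + false_positive * (1 - base_rate)
p_threat_given_flag = sensitivity * base_rate / p_flagged
print(f"{p_threat_given_flag:.4%}")  # under 0.1% -- almost every flag is a false alarm
```

A 99%-sensitive detector still flags roughly a thousand innocent people for every real threat, which is exactly why inference-based watch lists are so dangerous.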

The other intrusion on your life from Big Data Stalking is the stuff done by commercial enterprises. They aim to learn absolutely everything they can about you, because they can sell that data. Big Data can produce new or enhanced revenue streams. Is there a way out of this?

I say that there can be. With a paradigm shift, the consumers of Big Data can get what they want, and your privacy can be protected. How, you ask? With a little dash of technology.

Let's suppose that you turn the tables and consent to limited data tracking. That data tracking is now bowdlerized, meaning that sensitive personal stuff is obfuscated or removed. This is done by an app on your device, be it cell phone, tablet or computer. Then that data is sold to the highest bidder, and you are paid for it. Everyone is happy, and you the consumer benefit from the data collection.

As for the other stuff, technology can help too. I am a huge proponent of Artificial Intelligence. Suppose that you had a proxy digital assistant called Blocker. Blocker would surf the web for you, executing your Likes and Dislikes while retaining your anonymity. Blocker would run on a proxy service, so that even IP addresses would be hidden. On top of that, it would surf in anonymous mode. If there wasn't any personal user data to be had, your privacy would be protected. The data flow wouldn't be entirely impeded because, through content analysis, one could still make pretty good inferences about the humans behind any wall. For example, a grandma living in Norway wouldn't be listening to rap music, but her grandson might be.

So, with a bit of different thinking, we can mitigate the dangers of Big Data Stalkers. The unfortunate thing is that many denizens of the Internet don't know or don't care about the Stalkers.

A Returned-Probability Artificial Neural Network - The Quantum Artificial Neural Network


Artificial Neural Networks, used in Deep Learning and in Machine Learning with supervised and unsupervised training, are fairly good at figuring out deterministic things. For example, they can find an open door for a robot to enter. They can find patterns in a given matrix, collection or field.

However, sometimes there is no evident computable function. In other words, suppose that you are looking at an event or action that results from a whole bunch of unknown things, with a random bit of chaos thrown in. It is impossible to derive a computable function without years of study and knowledge of the underlying principles. And even then, it still may be impossible to quantify with an equation, regression formula or such.

But Artificial Neural Nets can be trained to identify things without actually knowing anything about the background causes. If you have a training set of size k with the answers or results (k being a series of cases), then you can always train your Artificial Neural Networks or Multilayer Perceptrons on k-1 cases and evaluate how well you are doing with the last one. You measure the error rate and back-propagate, and off you go to another training epoch if necessary.
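That train-on-k-1, test-on-the-held-out-case scheme is essentially leave-one-out cross-validation. Here is a minimal sketch, with a 1-nearest-neighbour classifier standing in for the neural net and invented data points:

```python
# Leave-one-out evaluation: train on k-1 cases, test on the held-out one.
# A 1-nearest-neighbour model stands in for the neural net; data is invented.
def nearest_neighbour(train, query):
    # Classify query by the label of the closest training point
    return min(train, key=lambda case: abs(case[0] - query))[1]

cases = [(1.0, "A"), (1.2, "A"), (3.0, "B"), (3.3, "B"), (1.1, "A")]

errors = 0
for i, (x, label) in enumerate(cases):
    held_out = cases[:i] + cases[i + 1:]  # the k-1 training cases
    if nearest_neighbour(held_out, x) != label:
        errors += 1

print(f"leave-one-out error rate: {errors / len(cases):.2f}")
```

With a real network you would back-propagate on the k-1 cases each round instead of just memorizing them, but the evaluation loop is the same.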

This is happening with predicting solar flares and the resultant chaos that they cause with electronics and radio communications when solar storms hit the earth. Here is a link to the article, where an ANN does the predicting:

http://www.dailymail.co.uk/sciencetech/article-2919263/The-computer-predict-SUN-AI-forecasts-devastating-solar-flares-knock-power-grids-Earth.html

In this case, the ANNs have shown that there is a relationship between the vector magnetic fields of the surface of the sun, the solar atmosphere and solar flares. That's all well and dandy for deterministic events, but what if the determinism was a probability and not a direct causal relationship mapped to its input parameters? What if there were other unknown or unknowable influence factors?

That's where you need an ANN (Artificial Neural Network) to return a probability as the hypothesis value. This is an easy task for a stats package working on database tables, churning out averages, probabilities, degrees of confidence, standard deviations and so on, but I am left wondering if it could be done internally in the guts of the artificial neuron.

The artificial neuron is pretty basic. It sums up all of the inputs and biases multiplied by their weights, and feeds the result to an activation function. It does this many times over in many layers. What if you could encode the guts of the neuron to spit out the probability of the results of what is being inputted? What if somehow you changed the inner workings of the perceptron or neuron to calculate the probability? It seems to me that the activation function is ideally suited to this adaptation, because it can be constructed to deliver an activation value between 0 and 1, which matches probability notation.
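For reference, the basic neuron described above fits in a few lines; the logistic sigmoid already delivers a value in (0, 1), which is what makes its output look probability-shaped (the inputs and weights below are arbitrary):

```python
import math

# The basic artificial neuron: weighted sum of inputs plus bias,
# fed through a sigmoid activation that lands strictly between 0 and 1.
def neuron(inputs, weights, bias):
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # logistic sigmoid

# Arbitrary example inputs and weights
p = neuron([0.5, 1.0, -0.3], [0.8, -0.4, 1.2], bias=0.1)
print(round(p, 3))  # ≈ 0.435
```

The open question in the post is whether that 0-to-1 activation can be made to mean a calibrated probability, rather than merely resembling one; a plain sigmoid output is not calibrated by itself.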

Our human brains work well with fuzziness in our chaotic world. We unconsciously map patterns and assign probabilities to them. There is another word for fuzzy values: it is a "quantum" property. The more you know about one property of an object, the less you know about another. Fuzziness. The great leap forward for Artificial Neural Networks is to become quantum and deliver a probability. Once we can get an Artificial Neural Net machine to determine probability, then we can apply Bayesian mechanics. That's when it can make inferences, and get a computer on the road to thinking from first principles -- by things that it has learned by itself.

Elliott Wave Principle Re-visited With Computer Trading

I write software. I'm pretty good at it. My strength lies not in streamlined code, but in algorithms. Any code monkey can write code. Most coders today could NOT write an object sorter using recursion, and recursion is kiddie scripting compared to some of the functions that I code.

The real aim of the game is not to get paid for writing software, but to write software that makes money. A quant is exactly that:

quant [kwahnt]
noun
Business slang: an expert in quantitative analysis.


So, one of the ways to write software to make money is to develop trading software for stocks, bonds, derivatives and Forex. Everyone has their own proprietary technical analysis trading software, but they all start with Weighted Moving Averages and all sorts of statistical charting and apparent correlations that give you signals of when to buy and sell.

The great granddaddy of them all is the Elliott Wave Principle. If you don't know what the Elliott Wave Principle is, you can read about it HERE.

A typical Elliott Wave pattern stock price looks like this:


Elliott developed his ideas some 60 years ago, and I idly wondered if Elliott Wave patterns were still valid in this day and age of computer trading. Would computer trades at split-second speed skew an Elliott Wave pattern if and when they occur? (The reason why I say "if" is that determining the milestones of the Elliott Wave pattern is a very subjective thing. Many technical analysts try to debunk the principle, while its adherents swear by it.)

So the burning question is and was: Is there something to the Elliott Wave, and how has computerized trading changed the Elliott Wave, if at all?

To answer it, I needed some data, and not just large time-domain general data. I wanted data points demarcated by seconds, not days or hours. After all, computers trade by the second. So I captured the real live second-by-second trading of Facebook on its opening IPO, where volume records were shattered but the price remained flat.

Here is a sample of that data:


To prevent subjective interpretation, I wrote a computer object -- a model of the wave that was magnitude agnostic (meaning that I was just searching for the pattern and didn't care about the price). One of the biggest problems with the Elliott Wave is interpretation and where does one begin to count for the wave pattern. I let the computer do that for me. If the signal (serial stock price changes) didn't fit the pattern, I advanced to the next data point, and tried again. I have to say that the results were pretty dismal.
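A magnitude-agnostic matcher of this sort can be sketched as a sign test on consecutive price moves against the idealized five-wave impulse shape. This is my simplified reconstruction with invented prices, not the actual trading object:

```python
# Slide along the price series and test whether the signs of consecutive
# moves follow the idealized 5-wave impulse shape (up, down, up, down, up).
IMPULSE = [1, -1, 1, -1, 1]  # directions of waves 1..5

def matches_at(prices, start):
    for step, direction in enumerate(IMPULSE):
        delta = prices[start + step + 1] - prices[start + step]
        if delta * direction <= 0:  # wrong direction (or flat): no match
            return False
    return True

prices = [10, 12, 11, 14, 13, 16, 15, 15, 14]  # invented ticks
hits = [i for i in range(len(prices) - len(IMPULSE)) if matches_at(prices, i)]
print(hits)  # [0] -- a pattern starts at the first data point
```

Because only the sign of each move matters, the matcher is magnitude-agnostic; if no pattern starts at a data point, the loop simply advances to the next one, exactly as described above.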

Then it struck me -- I needed an "ish" engine on this. I have previously discussed "ish" on this blog. It is a form of fuzzy logic that can ignore the odd outlier whilst still identifying the pattern. I used the ish engine to categorize wildly divergent answer schemes of health surveys in Nigeria. Once I incorporated the ish engine into my model, I started to get many more hits where I did identify the Elliott Wave pattern.

To answer the question of how computerized trading was affecting the analysis principle, I had to collect models of the deviation of the Elliott Wave. The first thing that the ish engine picked up was that computerized trading injected many more outliers that were in fact intermediate steps in the pattern. From a macro perspective, the Elliott Wave still sort-of resembled the pattern, but on a micro level, the fractal pattern was different, and like fractals, this was carried over onto the larger pattern.

Here is a graphic illustration of the outliers where intermediate steps are introduced into the wave pattern:


Instead of going from 1 to 2, now there is a 1A step inserted into the pattern. This was when I tested for 1 deviation per step.

Then I allowed the computer to test for two deviations per step. Now one can see two outliers as the wave progresses from 2 to 2A to 2B to 3. This is so simple to do when you have a computer object that models the wave and allows for ish or deviation. One can run many many epochs (data sessions) over and over again and change the parameters each time.
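The deviation-tolerant matching can be sketched as a mismatch budget: count the steps that move against the pattern and accept the fit if the count stays under a limit. This is my own simplification of the ish idea, with invented numbers:

```python
# Fuzzy ("ish") pattern matching: tolerate a budget of direction outliers.
def fuzzy_match(deltas, pattern, max_outliers):
    # Count steps whose direction disagrees with the pattern
    outliers = sum(1 for d, p in zip(deltas, pattern) if d * p <= 0)
    return outliers <= max_outliers

pattern = [1, -1, 1, -1, 1]  # idealized impulse directions
deltas = [2, -1, 1, 0.5, 3]  # invented moves; one step (0.5) goes the "wrong" way

print(fuzzy_match(deltas, pattern, max_outliers=0))  # False: strict matching
print(fuzzy_match(deltas, pattern, max_outliers=1))  # True: one deviation allowed
```

Re-running the same data with the deviation budget raised from 0 to 1 or 2 per step is what lets one run epoch after epoch with changed parameters, as described above.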

If one thinks of the wave as a series of vectors, then one begins to see how a direction vector can be incorporated into the ish engine or fuzzy logic. Let's suppose that Taleb is right (and I am sure that he is) and there is a lot more randomness than one suspects. My posit was that computerized trading is responsible for generating the randomness.

When I altered the wave model to accommodate a deviation in the direction of the vector component in the wave, the computer came up with a model that was topless:



So, I now had models that the computer had saved. The next step was to assign Bayesian probabilities to each model. The first injection of Bayesian probability was for predictive effect. Based on where I was at the moment, what magnitude and direction of the price vector would happen next? Then I determined the probability of which overall model it would fit. From there, one can make larger price determinations. Incidentally, no-fit is also an outcome in this model, where there simply isn't a pattern.
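Assigning probabilities to saved models and updating them on new evidence is a straightforward application of Bayes' rule. Here is a sketch with invented priors and likelihoods; the model names are placeholders, not the actual saved models:

```python
# Bayesian model selection over saved wave models (all numbers invented).
# Priors: how often each saved model fit historical data; no-fit is an outcome too.
priors = {"impulse": 0.30, "corrective": 0.25, "topless": 0.15, "no-fit": 0.30}
# Likelihood of observing an up-move next, under each model
likelihood_up = {"impulse": 0.70, "corrective": 0.30, "topless": 0.50, "no-fit": 0.50}

# Posterior P(model | up-move) by Bayes' rule
evidence = sum(priors[m] * likelihood_up[m] for m in priors)
posterior = {m: priors[m] * likelihood_up[m] / evidence for m in priors}
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))  # impulse 0.412
```

Each new tick updates the posteriors, so the most probable overall model (including no-fit) is always available for the larger price determination.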

What's the next step? The next step is to introduce artificial intelligence multi-layer perceptrons as a fall-through model to analyze the price signal in real time. Then the perceptrons keep correcting themselves based on real time outcomes.

Can this updated algorithm score alpha and make money on stocks, futures, derivatives and Forex? I don't know yet, but I am too busy earning a living to take this to the next step. Are there any fund managers out there willing to fund a research project with the updated Elliott Wave coupled to fuzzy logic, artificial intelligence and Bayesian Inference?

The End of the Line for the Business Intelligence Cube?

I was deep in conversation with a tech-savvy epidemiologist at a dinner party. He is a physician who is the head of an NGO (non-government organization) with offices in various countries on a few continents. He happened to mention that he had over a million record sets that needed data-mining in a very specific way.

His organization had ascertained that the easiest way to convey epidemic data to policy makers was via a 'weather map' where the geographic areas that were in the greatest danger would progress from green to yellow to red when a full-blown epidemic developed. To that end, they created a data-mining tool for reports. However, there was one major flaw with the tool. It could only show results after the fact and didn't perform predictions. Predictions are important to epidemiologists.

I suggested that what his data mining gizmo needed was a Bayesian Inference Engine. Bayesian inference principles are used for logical inference and prediction on imperfect data sets. A Bayesian operation takes historical data and calculates the probabilities of a number of events happening when their predecessor events have taken place. Bayesian inference is a tool in the arsenal of artificial intelligence. It is the perfect tool for running predictions on evolving data. In an epidemic situation, data evolves rapidly. One cannot wait until it is all said and done to run the analysis.

I described to my medical friend how one would make a real-time inference engine. Before any row of data is inserted into a database, an inference factory instantiates an inference object. The inference object looks up the probabilistic meta-data for the permutations and combinations of the columns in the row of data (it examines each data dimension) and recalculates the inferential probability with the input of the new data. The output is filtered and deposited into a results table.
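An insert-time inference object of this kind can be sketched as running co-occurrence counts that are updated before each row is stored; the column names and rows below are hypothetical:

```python
# Sketch of insert-time inference: before each row is stored, update running
# counts so conditional probabilities across columns are always current.
from collections import defaultdict
from itertools import combinations

pair_counts = defaultdict(int)
value_counts = defaultdict(int)

def insert_row(row):
    # row: dict of column -> value; update co-occurrence counts, then "store" it
    items = sorted(row.items())
    for col_val in items:
        value_counts[col_val] += 1
    for a, b in combinations(items, 2):
        pair_counts[(a, b)] += 1

def conditional(col_val_a, col_val_b):
    # P(a | b) estimated from the running counts
    key = tuple(sorted([col_val_a, col_val_b]))
    return pair_counts[key] / value_counts[col_val_b]

insert_row({"region": "north", "alert": "red"})
insert_row({"region": "north", "alert": "yellow"})
insert_row({"region": "north", "alert": "red"})

print(conditional(("alert", "red"), ("region", "north")))  # 2/3
```

Because the counts are maintained on every insert, a report filter can read the probabilities at any moment, with no ETL batch or cube rebuild in between.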

Then the thought struck me that if this function were built into the database engine, there wouldn't be much need for business intelligence cubes that require vast amounts of ETL (Extract, Transform and Load), data dimensioning, data marts and obscure SQL statements the size of a novel.

All of the data would be digested in real time, and mined and refined in one shot. The inferential factory in the database engine would calculate in real time on every data insert, and various filters would be defined for reporting.

With the exabytes and exabytes of data that we are generating, this could be one way of handling the tsunami of data without being overwhelmed by it. And IBM would be awfully sorry that they bought Cognos Business Intelligence Cube software.

Who Was At The Computer -- Solving a Whodunnit

I was idly watching some of the Casey Anthony murder trial being streamed on the Internet. She is charged with brutally disposing of her bothersome two-year-old child who was impinging on her party life.

One of the expert witnesses was an ex-police officer turned geek who wrote the program called "Cache Back". What the program does is recover the browser cache of the web history after it has been deleted. He discovered that the browsing history contained terms like "chloroform" and searches on how to kill people.

The defense lawyer stands up and tells the computer expert that there is no way that he could tell who was at the keyboard when the queries were made. The computer expert had to agree. Well, if they had geekazoids like me, there is a way to state the probability of who was sitting at the computer.

Consider the following equation:

P(H|E) = P(E|H) x P(H) / P(E)
This equation is the basis of Bayesian inference. It is one of the keystones of data analysis and artificial intelligence. A quick explanation of the terms is as follows:

  • H represents a specific hypothesis, which may or may not be some null hypothesis.
  • E represents the evidence that has been observed.
  • P(H) is called the prior probability of H that was inferred before new evidence became available.
  • P(E | H) is called the conditional probability of seeing the evidence E if the hypothesis H happens to be true. It is also called a likelihood function when it is considered as a function of H for fixed E.
  • P(E) is called the marginal probability of E: the a priori probability of witnessing the new evidence E under all possible hypotheses.

The theory behind this concept is the idea of querencia: when people log onto a computer, they usually follow a core of usual, habitual, persistent URLs. They check their email, Twitter and Facebook page, and then perhaps check the weather or news or such.

So in this methodology to determine who was sitting behind the computer for a particular history, one examines the whole history. One finds the sequences where there is no doubt about the supposed user in question. This could be determined by the URL of a Facebook page or email.

Then one assembles a statistical model of the URLs visited, and calculates the variance from the Venn set of URLs as well as the deviation from the usual pattern.

By calculating probabilities from the browsing model, one can then take an unidentified session and, using Bayesian inference, determine the probability that the user in question was behind it.
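The whole methodology can be sketched as a naive-Bayes style comparison of an unattributed session against each candidate's habitual-URL profile; the users, URLs and probabilities below are invented for illustration:

```python
# Score an unattributed browsing session against each user's known
# habitual-URL ("querencia") profile with a naive-Bayes style product.
profiles = {
    "user_1": {"facebook.com": 0.9, "myspace.com": 0.8, "weather.com": 0.1},
    "user_2": {"facebook.com": 0.2, "myspace.com": 0.1, "weather.com": 0.7},
}
priors = {"user_1": 0.5, "user_2": 0.5}
FLOOR = 0.05  # probability assigned to URLs absent from a profile

def posterior(session_urls):
    scores = {}
    for user, profile in profiles.items():
        p = priors[user]
        for url in session_urls:
            p *= profile.get(url, FLOOR)  # multiply in each URL's likelihood
        scores[user] = p
    total = sum(scores.values())
    return {user: p / total for user, p in scores.items()}

result = posterior(["facebook.com", "myspace.com"])
top = max(result, key=result.get)
print(top, round(result[top], 3))  # user_1 0.973
```

The output is a probability, not an identification, which is precisely the "degree of probability" caveat in the next paragraph.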

This is by no means a smoking gun of proof, but it can add one more piece to a circumstantial chain of evidence. It can answer the question of "Who was using the computer?" with a degree of probability.

This would also be a useful system in a corporate environment to determine what users had breached company policy in visiting banned websites.