His organization had ascertained that the easiest way to convey epidemic data to policy makers was via a 'weather map', where the geographic areas in the greatest danger would progress from green to yellow to red as a full-blown epidemic developed. To that end, they created a data-mining tool for reports. However, the tool had one major flaw: it could only show results after the fact and didn't perform predictions, and predictions are what epidemiologists need most.
I suggested that what his data-mining gizmo needed was a Bayesian inference engine. Bayesian inference principles are used for logical inference and prediction on imperfect data sets. A Bayesian operation takes historical data and calculates the probabilities of a number of events happening given that their predecessor events have taken place. Bayesian inference is a tool in the arsenal of artificial intelligence, and it is well suited to running predictions on evolving data. In an epidemic situation, data evolves rapidly. One cannot wait until it is all said and done to run the analysis.
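To make the idea of "probabilities of events given their predecessor events" concrete, here is a minimal sketch in Python. It only estimates conditional probabilities from counts in historical data, which is the simplest building block rather than a full Bayesian model. The function name and the sample observations are made up for illustration.

```python
from collections import Counter


def conditional_probabilities(history):
    """Estimate P(event | predecessor) from (predecessor, event) pairs."""
    pair_counts = Counter(history)
    predecessor_counts = Counter(pred for pred, _ in history)
    return {
        (pred, event): count / predecessor_counts[pred]
        for (pred, event), count in pair_counts.items()
    }


# Hypothetical observations: (predecessor event, event that followed).
history = [
    ("fever_cluster", "outbreak"),
    ("fever_cluster", "outbreak"),
    ("fever_cluster", "no_outbreak"),
    ("no_cluster", "no_outbreak"),
]

probs = conditional_probabilities(history)
for (pred, event), p in sorted(probs.items()):
    print(f"P({event} | {pred}) = {p:.2f}")
```

With so few observations the estimates are crude; a real engine would presumably add priors or smoothing so that a single report does not swing a region from green to red.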
I described to my medical friend how one would make a real-time inference engine. Before any row of data is inserted into a database, an inference factory instantiates an inference object. The inference object looks up the probabilistic metadata for the permutations and combinations of the columns in the row (it examines each data dimension) and recalculates the inferential probability with the new data as input. The output is filtered and deposited into a results table.
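The flow described above could be sketched roughly as follows. This is an illustration under my own assumptions, not a design for any real database engine: the class names, the `insert_row` hook, the in-memory lists standing in for tables, and the threshold filter are all hypothetical, and the "probabilistic metadata" is reduced to running counts per combination of column values.

```python
from collections import defaultdict
from itertools import combinations


class InferenceObject:
    """Updates the shared counts for one row and returns refreshed probabilities."""

    def __init__(self, counts, totals, outcome_column):
        self.counts = counts
        self.totals = totals
        self.outcome_column = outcome_column

    def update(self, row):
        outcome = row[self.outcome_column]
        features = sorted(
            (col, val) for col, val in row.items() if col != self.outcome_column
        )
        results = {}
        # Examine every combination of the row's data dimensions.
        for size in range(1, len(features) + 1):
            for combo in combinations(features, size):
                self.totals[combo] += 1
                self.counts[(combo, outcome)] += 1
                # Running estimate of P(outcome | combo).
                results[combo] = self.counts[(combo, outcome)] / self.totals[combo]
        return results


class InferenceFactory:
    """Holds the probabilistic metadata and hands out inference objects."""

    def __init__(self, outcome_column):
        self.counts = defaultdict(int)
        self.totals = defaultdict(int)
        self.outcome_column = outcome_column
        self.results_table = []

    def create(self):
        return InferenceObject(self.counts, self.totals, self.outcome_column)


def insert_row(table, factory, row, threshold=0.5):
    """Hypothetical insert hook: run the inference first, then store the row."""
    inference = factory.create()
    for combo, p in inference.update(row).items():
        if p >= threshold:  # the reporting filter
            factory.results_table.append((combo, row[factory.outcome_column], p))
    table.append(row)


table = []
factory = InferenceFactory(outcome_column="outbreak")
insert_row(table, factory, {"region": "north", "symptom": "fever", "outbreak": True})
insert_row(table, factory, {"region": "north", "symptom": "cough", "outbreak": False})
```

Note that the number of combinations grows exponentially with the number of columns, so anything beyond a handful of dimensions would need pruning or a smarter structure than the brute-force loop used here.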
Then the thought struck me: if this function were built into the database engine, there would be far less need for business intelligence cubes that require vast amounts of ETL (Extract, Transform, Load) work, data dimensioning, data marts, and obscure SQL statements the size of a novel.
All of the data would be digested in real time, and mined and refined in one shot. The inferential factory in the database engine would calculate in real time on every data insert, and various filters would be defined for reporting.
With the exabytes and exabytes of data that we are generating, this could be one way of handling the tsunami of data without being overwhelmed by it. And IBM would be awfully sorry that they bought Cognos Business Intelligence Cube software.