All Things Techie With Huge, Unstructured, Intuitive Leaps

Who Was At The Computer -- Solving a Whodunnit

I was idly watching some of the Casey Anthony murder trial being streamed on the Internet. She is charged with brutally disposing of her bothersome two-year-old child who was impinging on her party life.

One of the expert witnesses was an ex-police officer turned geek who wrote a program called "Cache Back". The program recovers the web browsing history from the browser cache after it has been deleted. He discovered that the browsing history contained search terms like "chloroform" and how to kill people.

The defense lawyer stood up and told the computer expert that there was no way he could tell who was at the keyboard when the queries were made. The computer expert had to agree. Well, if they had geekazoids like me, there is a way to state the probability of who was sitting at the computer.

Consider the following equation:

$$P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)}$$

This equation is the basis of Bayesian inference. It is one of the keystones of data analysis and artificial intelligence. A quick explanation of the terms is as follows:

  • H represents a specific hypothesis, which may or may not be some null hypothesis.
  • E represents the evidence that has been observed.
  • P(H) is called the prior probability of H that was inferred before new evidence became available.
  • P(E | H) is called the conditional probability of seeing the evidence E if the hypothesis H happens to be true. It is also called a likelihood function when it is considered as a function of H for fixed E.
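  • P(E) is called the marginal probability of E: the a priori probability of witnessing the new evidence E under all possible hypotheses.
  • P(H | E) is called the posterior probability of H given E: the updated probability of the hypothesis once the evidence has been taken into account.
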
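For the whodunnit in this post, the hypotheses are the candidate users of the computer. Assuming they are mutually exclusive and exhaustive (exactly one of them was at the keyboard), the marginal probability P(E) can be written out with the law of total probability. The index i and the labels H_i and H_k are notation introduced here for illustration only:

$$P(E) = \sum_i P(E \mid H_i)\, P(H_i), \qquad P(H_k \mid E) = \frac{P(E \mid H_k)\, P(H_k)}{\sum_i P(E \mid H_i)\, P(H_i)}$$
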
The theory behind this concept is the idea of querencia: the familiar ground that a creature habitually returns to. When people log onto a computer, they usually follow a core of habitual, persistent URLs. They check their email, Twitter and Facebook pages, and then perhaps the weather or the news.

So, to determine who was sitting at the computer for a particular stretch of history, one examines the whole history. One first finds the sequences where there is no doubt about the user in question. This could be established by the URL of a Facebook page or an email account.
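A minimal sketch of that first step, under some loud assumptions: the history is a list of (timestamp, URL) records, a 30-minute idle gap ends a session, and a user is identified by a URL fragment tied to their account. The marker strings, user names, and function names below are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical markers: URL fragments that tie a visit to a known account.
IDENTIFYING_MARKERS = {
    "facebook.com/alice.example": "alice",
    "mail.example.com/u/bob": "bob",
}

SESSION_GAP = timedelta(minutes=30)  # assumed idle gap that ends a session


def split_sessions(history, gap=SESSION_GAP):
    """Group (timestamp, url) records into sessions separated by idle gaps."""
    sessions = []
    for ts, url in sorted(history):
        # Continue the current session if the previous visit was recent enough.
        if sessions and ts - sessions[-1][-1][0] <= gap:
            sessions[-1].append((ts, url))
        else:
            sessions.append([(ts, url)])
    return sessions


def label_session(session, markers=IDENTIFYING_MARKERS):
    """Return the one user whose marker appears in the session, else None.

    A session that matches markers of more than one user stays unlabeled.
    """
    users = {user for _, url in session
             for marker, user in markers.items() if marker in url}
    return users.pop() if len(users) == 1 else None


# Made-up history: one identifiable session and one anonymous session.
t0 = datetime(2020, 1, 1, 9, 0)
history = [
    (t0, "https://facebook.com/alice.example"),
    (t0 + timedelta(minutes=5), "https://weather.example.com/"),
    (t0 + timedelta(hours=3), "https://search.example.com/?q=something"),
]
sessions = split_sessions(history)
labels = [label_session(s) for s in sessions]
```

Here the first two visits fall into one session labeled with the hypothetical user "alice", and the later search stays unlabeled. The unlabeled sessions are the ones the rest of the method tries to attribute.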

Then one assembles a statistical model of the URLs visited in those sequences, and calculates the variance from that user's set of URLs (how much the visited sites overlap, as in a Venn diagram) as well as the deviation from the usual pattern.
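One simple way to build such a model, offered as a sketch and not as the only option, is to count how often each domain shows up in a user's identified sessions and turn the counts into smoothed probabilities. This swaps in a plain frequency model with add-one (Laplace) smoothing in place of a variance calculation. The function names and the vocabulary_size parameter are assumptions made for illustration.

```python
from collections import Counter
from urllib.parse import urlparse


def domain(url):
    """Reduce a URL to its host name, e.g. 'mail.example.com'."""
    return urlparse(url).netloc.lower()


def build_profile(urls):
    """Count how often each domain appears in a user's identified sessions."""
    return Counter(domain(u) for u in urls)


def domain_probability(profile, dom, vocabulary_size, alpha=1.0):
    """Smoothed estimate of P(domain | user).

    alpha is an assumed smoothing constant, so a domain the user has never
    visited still gets a small non-zero probability instead of zero.
    """
    total = sum(profile.values())
    return (profile[dom] + alpha) / (total + alpha * vocabulary_size)


# Made-up identified visits for one user.
profile = build_profile([
    "https://mail.example.com/inbox",
    "https://mail.example.com/sent",
    "https://news.example.org/",
])
p_mail = domain_probability(profile, "mail.example.com", vocabulary_size=3)
p_unseen = domain_probability(profile, "shop.example.net", vocabulary_size=3)
```

The smoothing matters because a single never-before-seen site would otherwise drive the probability of a whole session to zero for that user.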

By calculating probabilities from the browsing model, one can then take an unidentified sequence and, using Bayesian inference, estimate the probability that a known user was the unidentified user.
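The inference step could look something like the sketch below. It assumes that visits are independent given the user (a naive Bayes simplification), that the candidate users are the only possibilities, and that per-user domain counts are already available. The counts, priors, user names, and the posterior function are invented for illustration.

```python
import math


def posterior(session_domains, profiles, priors, vocabulary_size, alpha=1.0):
    """Return P(user | session) for every candidate user via Bayes' theorem.

    profiles: {user: {domain: count}} from that user's identified sessions.
    priors:   {user: P(user)} before looking at the session.
    Assumes visits are independent given the user (naive Bayes).
    """
    log_joint = {}
    for user, counts in profiles.items():
        total = sum(counts.values())
        score = math.log(priors[user])            # log P(H)
        for dom in session_domains:               # add log P(E | H) per visit
            p = (counts.get(dom, 0) + alpha) / (total + alpha * vocabulary_size)
            score += math.log(p)
        log_joint[user] = score
    # Divide by P(E), the sum over all candidates, working in log space
    # so long sessions do not underflow.
    peak = max(log_joint.values())
    evidence = sum(math.exp(s - peak) for s in log_joint.values())
    return {u: math.exp(s - peak) / evidence for u, s in log_joint.items()}


# Made-up domain counts for two hypothetical users with equal priors.
profiles = {
    "alice": {"mail.example.com": 8, "news.example.org": 2},
    "bob": {"sports.example.net": 9, "news.example.org": 1},
}
priors = {"alice": 0.5, "bob": 0.5}
result = posterior(["mail.example.com", "news.example.org"],
                   profiles, priors, vocabulary_size=3)
```

With these invented numbers the unidentified session comes out at about 0.93 for "alice" and 0.07 for "bob". The priors carry real weight here: someone who rarely had access to the machine should start with a lower P(H).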

This is by no means a smoking gun, but it can add one more link to a circumstantial chain of evidence. It can answer the question of "Who was using the computer?" with a degree of probability.

This would also be a useful system in a corporate environment to determine which users had breached company policy by visiting banned websites.
