All Things Techie With Huge, Unstructured, Intuitive Leaps

Process Mining From Event Logs -- An Untapped Resource And Wave of The Future


A couple of years ago, I was searching for untapped horizons in data mining, and I came across a course given by Professor Wil van der Aalst, who pioneered the technology of mining business processes from server event logs. Naturally, I signed up. It was a fascinating course, not only for its in-depth and non-trivial treatment of gleaning knowledge from data, but because it got my creative juices flowing about where else the techniques could be applied. I was so intrigued by the possibilities that I created a Google Scholar alert for Professor van der Aalst's publications. The latest alert arrived on January 31st, for a paper entitled "Connecting databases with process mining". The link is here: http://repository.tue.nl/858271 It was this paper that triggered this article.

I am a huge proponent of AI, machine learning and analytics. In machine learning, you gather large datasets, clean the data, partition it into smaller sets for training and evaluation, and then train a model over hundreds, perhaps thousands, of training epochs until the probability of capturing the sought-after knowledge crosses an appropriate threshold. Machine intelligence is a huge field of endeavor, and it is progressing to become a major part of everyday life. However, it is time-consuming to teach the machine and get it right. Professor van der Aalst's area of expertise can provide a better way. Let me explain:

My particular interest is that I am building a semantic blockchain to record all of the data coupled to vehicles, autonomous or not. A blockchain, of course, is an immutable data ledger that is itself autonomous in operation, disintermediates third parties, and is outage-resistant. Autonomous vehicles will, by law, be required to log every move, keep records of their software revisions, and keep records of things like post-crash behavior.

I immediately saw the possibilities of using this data. Suppose that you are in an autonomous vehicle, and that vehicle has never been on a tricky roadway that you need to navigate to reach your destination. Your car doesn't know the route parameters, but thousands of other autonomous vehicles have driven it, including many with your kind of operating system and software. With the connected car, your vehicle would know its GPS coordinates and could query a system for the driving details of this stretch of roadway that is unknown to its computer. Instead of requiring intense computation to navigate, a recipe of driving features could be downloaded.
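The kind of lookup I have in mind can be sketched in a few lines of Python. Everything here (the tile precision, the recipe fields, the coordinates) is hypothetical; a real system would be far richer:

```python
# Hypothetical shared service: pre-mined "driving recipes" keyed by
# coarse GPS tiles, so a vehicle can fetch knowledge about a road
# segment it has never driven.
ROUTE_RECIPES = {
    (43.651, -79.347): {
        "speed_limit_kmh": 40,
        "features": ["blind_crest", "stop_sign_at_120m", "sharp_left"],
    },
}

def recipe_for(lat, lon, precision=3):
    """Look up a recipe by rounding coordinates onto a tile grid."""
    return ROUTE_RECIPES.get((round(lat, precision), round(lon, precision)))

# The vehicle reports its raw GPS fix and gets back the segment's recipe.
recipe = recipe_for(43.6512, -79.3471)
```

The point of the sketch is only the shape of the interaction: a cheap lookup replaces an expensive on-board computation.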

Rather than garnering those instructions from repeated training epochs in machine learning, one could apply process mining to the logs to extract the knowledge required. There are already semantic methods of communicating processes, from decision trees to Petri nets, and if the general process were already known to the machine, it would reduce the computational load. As a matter of fact, each vehicle could have a process-mining module to extract high-level algorithms for the roads that it drives regularly. That in itself would reduce the computational load on the vehicle. It would know in advance where the stop signs are, for example, and you wouldn't have YouTube videos of self-driving cars running red lights and stop signs.
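At its simplest, mining a process from logs starts with counting which activity directly follows which. Here is a minimal Python sketch of that directly-follows idea, using made-up driving events; the algorithms taught in the course go much further, all the way to discovering full Petri nets:

```python
from collections import Counter

def directly_follows(traces):
    """Count how often activity a is directly followed by activity b."""
    dfg = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg

# Hypothetical driving logs: each trace is one vehicle's pass over a segment.
logs = [
    ["enter_segment", "slow_to_40", "stop_sign", "proceed", "exit_segment"],
    ["enter_segment", "slow_to_40", "stop_sign", "proceed", "exit_segment"],
    ["enter_segment", "stop_sign", "proceed", "exit_segment"],
]
dfg = directly_follows(logs)
# The dominant paths reveal the "recipe" for the segment: a stop sign
# reliably appears, so the vehicle can anticipate it in advance.
```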

It goes a lot further than autonomous vehicles. This concept of deriving high-level machine processes from event logs can be applied to fields as diverse as robotic manufacturing and cloud server monitoring, and to numerous domains where human operators or real-world human judgement are currently required.

Process mining could either replace machine learning in many instances or supplement it in a mix of technologies. The aim is the same: aggregating data into information and integrating information into knowledge, both for humans and for machines.

This process-mining business reminds me of the history behind Bayesian inference. The Reverend Thomas Bayes formulated the equations relating probability and prior belief. They sat on a dusty shelf for over 200 years before being re-purposed for computer inference and machine intelligence. I think that Professor van der Aalst's methodologies will be re-purposed for things yet unimagined, and it will not take 200 years to come to fruition.



Event Logs, Process Mining and Artificial Intelligence


In my course on process mining from the Eindhoven University of Technology in the Netherlands, a person asked on the course forum where to get event logs for process mining. This was the question posted:

Anyhow, as I was watching the lecture on Guidelines for Event Logging, I was struck by the question that usually occurs to me in such courses: But how to do it in practice?

I'm assuming that logging for the Internet of Things is part of the Things that Make Up that Internet. But otherwise? I absolutely abhor having to program, never struck me as that interesting. So how is it done in practice? Do you guys have preset functions/libraries? In case a human needs to log their behaviour, how do you ensure compliance - that they don't forget etc. etc.?

I'd love to hear more on that!

I took it upon myself to reply, and this is what I said:

I am a technical architect (and Chief Technology Officer!) for an eCommerce platform that deals in high-dollar-value goods marketed in exclusive circles. We had the benefit of creating the technology ourselves, so we created event logs for everything. Here is an example:

1) When you log in, we record the time, the username, the IP address that the login came from, and whether the user was on a desktop computer or a mobile platform.

2) When you check your messages on our system, they are marked as read with a timestamp. That creates another event log.

3) When you go to view offerings, what you look at is recorded, so we can gather data on what the user likes to buy.

4) If the user is a seller, we record what he uploads into a database table, and every entry has a timestamp column to record when the data was added.

5) Each sale is recorded along with a timestamp.

6) Each logout is recorded, along with a timestamp. All of this is very easy to do, because when we create a database table to store data, we construct it such that each entry (called a row) has a column labeled timestamp, where the computer puts the NOW() date/time when the data is recorded. This automagically creates the event logs for us, for all tasks, even tasks that are considered non-process-related (which really add value to our processes).

When you are online, every time data is recorded, a timestamp goes along with it. We have event logs for everything.
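The timestamp-column trick is trivial to set up. A minimal sketch with SQLite (the table and column names are hypothetical, and SQLite's CURRENT_TIMESTAMP plays the role of NOW()):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE login_events (
        id        INTEGER PRIMARY KEY,
        username  TEXT NOT NULL,
        ip        TEXT NOT NULL,
        device    TEXT NOT NULL,
        ts        TIMESTAMP DEFAULT CURRENT_TIMESTAMP  -- the NOW() idea
    )
""")

# Application code records only the business data; the timestamp
# "automagically" comes along for free, building the event log.
conn.execute(
    "INSERT INTO login_events (username, ip, device) VALUES (?, ?, ?)",
    ("alice", "203.0.113.7", "mobile"),
)
row = conn.execute("SELECT username, ts FROM login_events").fetchone()
```

Every table built this way is an event log waiting to be mined.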

But you can also have paper-based event logs that can be transcribed. For example, we looked at an auto repair shop and did some very rudimentary process mining while constructing a business app for the company. They took appointments and recorded the time of the call and the time the customer was going to bring the car into the shop. Then the service writer recorded the details on an invoice/service sheet when the customer arrived, and pushed the invoice into a time clock, stamping the arrival time. We then looked at the mechanic's time sheet to see what hours he billed for the job. Finally, we knew when the customer picked the car up, because the payment invoice was timestamped (usually with the cash register receipt).

They had a complete event log on various bits of paper floating around the business, and once the business was computerized, they could determine the bottlenecks (which turned out to be waiting for ordered parts). This was my first experience of event logs in a non-computerized fashion. Since then, I timestamp every database table that I construct -- even the ones that store metadata and mined data, such as the standard deviation of an aggregate of characteristics of our top buyers and sellers.
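Once the paper timestamps are transcribed, finding the bottleneck is just a matter of sorting the events and measuring the waits between them. A sketch with invented timestamps for one job:

```python
from datetime import datetime

# Hypothetical timestamps transcribed from the shop's paperwork for one job.
events = {
    "call_received":  datetime(2017, 3, 1,  9,  0),
    "car_arrived":    datetime(2017, 3, 1, 10, 30),
    "work_started":   datetime(2017, 3, 1, 11,  0),
    "parts_ordered":  datetime(2017, 3, 1, 11, 15),
    "parts_arrived":  datetime(2017, 3, 2, 14,  0),
    "work_finished":  datetime(2017, 3, 2, 16,  0),
    "car_picked_up":  datetime(2017, 3, 2, 17, 30),
}

# Order the events in time and compute the wait between each pair.
ordered = sorted(events.items(), key=lambda kv: kv[1])
waits = {f"{a} -> {b}": tb - ta for (a, ta), (b, tb) in zip(ordered, ordered[1:])}

# The longest wait is the bottleneck: here, waiting for ordered parts.
bottleneck = max(waits, key=waits.get)
```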

There is now a burgeoning field of NoSQL graph databases (we like Neo4j), which can map semantic and/or fuzzy relationships very easily, and this course has taught me to timestamp nodes and edges to monitor significant but transient process relationships in the business milieu.

Hope this helps,

Most people do not realize how many internet tracks they leave that are event logs when they do ordinary things online. The Germans have a word for this. It is not a nice word. It translates to "digital slime".

Event logs are ubiquitous, and it is my contention that all of these digital tracks and Big Data will lead to a plethora of training data for artificial intelligence. Artificial neural nets need concrete examples to learn from, iterating and re-iterating to "get smart." Process mining will be huge in that respect. Process mining is the first step for computers to learn human behavior. Neural-net machines and multilayer perceptrons will pore over process maps gleaned from human behavior and learn to mimic and reproduce expert behavior far more repeatably than humans can.

Process Mining, OpenStack and Possibly a New Java Framework


In my process-mining course, on the internal forums, an OpenStack developer asked how the event logs from using OpenStack could be used in process mining. This is how I replied:

First of all, let me congratulate you on OpenStack. I am both a user, and I use the services of an OpenStack-driven Platform-as-a-Service to host the development of my mobile apps.

I would see several potentially huge benefits if you incorporated process mining into the OpenStack platform. For example, spammers now use OpenStack to set up a virtual machine, do their spamming or hacking, and then tear down the machine, never to be seen again. If you had a signature or a process model of this activity, you could theoretically intercept it while it is happening.
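A crude first version of that signature could be as simple as flagging tenants whose VMs are created and torn down within some threshold. A Python sketch with a made-up event-log format (OpenStack's real notification events differ, so treat the field names as assumptions):

```python
from datetime import datetime, timedelta

def flag_ephemeral_vms(events, max_lifetime=timedelta(hours=1)):
    """Flag tenants whose VMs live for max_lifetime or less."""
    created = {}   # vm_id -> (create time, tenant)
    flagged = []
    for ts, tenant, action, vm in events:
        if action == "create":
            created[vm] = (ts, tenant)
        elif action == "delete" and vm in created:
            start, owner = created.pop(vm)
            if ts - start <= max_lifetime:
                flagged.append((owner, vm, ts - start))
    return flagged

# Hypothetical event-log rows: (timestamp, tenant, action, vm_id)
log = [
    (datetime(2017, 2, 1, 9, 0),  "t-spam", "create", "vm1"),
    (datetime(2017, 2, 1, 9, 20), "t-spam", "delete", "vm1"),  # 20-minute VM
    (datetime(2017, 2, 1, 8, 0),  "t-good", "create", "vm2"),
    (datetime(2017, 2, 2, 8, 0),  "t-good", "delete", "vm2"),  # day-long VM
]
suspects = flag_ephemeral_vms(log)
```

A real interceptor would mine the full create-use-teardown process, not just the lifetime, but the principle is the same.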

Another possibility is that every time the software does a create, an instantiation of a virtual machine, or any other operation, you could record the timestamps of these machine events and provide a QoS (quality of service) metric, both for monitoring the cloud and for detecting limitations caused by hardware, software or middleware bottlenecks.
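A QoS metric built from those timestamps might be as simple as a tail-latency percentile over operation durations. A sketch with invented numbers and a hypothetical 5-second threshold:

```python
import math

# Hypothetical durations (seconds) between a "create requested" and a
# "create completed" event pair, taken from timestamped machine events.
durations = [1.2, 1.4, 1.3, 1.5, 9.8, 1.1, 1.2, 1.6, 1.3, 1.4]

def percentile(values, p):
    """Nearest-rank percentile: the value at rank ceil(p/100 * N)."""
    ranked = sorted(values)
    k = max(0, math.ceil(p / 100 * len(ranked)) - 1)
    return ranked[k]

# One slow outlier is invisible in the average but dominates the tail,
# which is exactly what a QoS monitor should surface.
p95 = percentile(durations, 95)
slo_breached = p95 > 5.0  # hypothetical QoS threshold of 5 seconds
```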

I can also see the possibility of catching misconfigured parameters that degrade service quality; these would be picked up by a process-mining analysis that detects missing setup steps in the process. In other words, an arc routing around a required region of the process model would indicate that required steps were skipped.
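The simplest form of that check is a set difference between the steps a correct setup should contain and the steps the log actually shows. All of the step names below are hypothetical:

```python
# Hypothetical required setup steps for provisioning an instance.
REQUIRED_SETUP = {"allocate_network", "attach_volume", "inject_keys", "set_firewall"}

def missing_steps(observed_trace):
    """Return the required setup steps absent from an observed trace."""
    return REQUIRED_SETUP - set(observed_trace)

# An observed provisioning trace that skipped two required steps:
trace = ["allocate_network", "attach_volume", "boot"]
gaps = missing_steps(trace)
# Non-empty gaps correspond to the "arc around a required region"
# in the mined process model.
```

Real conformance checking also verifies ordering, not just presence, but presence alone already catches the misconfiguration case.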

This course has inspired me to start working on a Java framework (maybe a ProM plugin) that operates on an independent thread (maybe in an OpenStack incarnation), monitors activity on a server, compares it to ideal processes in real time, and flags someone if a crucial process deviates. I think that I could get this going in a timely fashion.

Once again, this course has opened my eyes to potential methodologies and algorithms that can be applied to non-traditional fields.

Note: ProM is an open-source process-mining tool. The process-mining course is given by the Eindhoven University of Technology in the Netherlands.