Future Imperfect & Software Stream of Consciousness: February 2017

Connected Autonomous Cars, Big Data, and Not Re-inventing The Wheel

Smart Roads Need Not Be So Smart

The introduction of new technologies into daily life lets us abandon old paradigms and ways of doing things. It also lets us jettison conventional ideas. Last night at dinner, I was in a deep conversation with a philosopher friend, and I told him that I was working with automotive blockchain as a true ledger, especially for self-driving cars. I mentioned that perhaps we would need smart roads or smart road-sign sensors to indicate things like speed limits to the autonomous car.

We got into a discussion on how self-driving cars will change everything about mobility, even the concept of your car sitting in a parking lot all day. For example, after your self-driving car drops you off at work, you can send it out to earn money as an Uber car, and it comes back to pick you up when your work day is done. Or you can send it home.

My friend opined that with this and other technologies, what can be implemented is limited only by the imagination. He didn't think that we would need smart roads at all. He pointed out that by using Big Data, the computational load of self-driving cars could be significantly reduced, and we wouldn't need smart-road hardware embedded at geographic locations. It was brilliant.

Here is how it would work. My blockchain is intended as a vehicle black-box recorder. Everything the connected car does is recorded in real time, including GPS coordinates, date, time, and all of the instructions issued by the vehicle's operating system to drive a particular stretch of road. Here is the clever bit.

Suppose all of this data is uploaded to a central, searchable repository. Upon entering a specific roadway, the connected autonomous vehicle would access this information. Through Big Data analytics, it would know the average driving conditions and speeds for the time of day, season of the year, rush hour, rain, sleet, and snow, and it would know the salient features of the roadway. For example, you won't have self-driving cars running red lights or stop signs like the ones you see on YouTube now, because those features will already be available to the car. It will know things like where to watch out for other vehicles exiting a driveway (based on the history of cars stopping to let those vehicles out).

In other words, you will have a smart roadway without sensors and without Internet of Things (IoT) indicators. It will be like Google Street View for autonomous vehicles. The vehicles will be able to search, find, and download roadway features and use them to navigate without an intense computational load on the car's operating system. The onboard driving system would only have to detect anomalies and other traffic. You would not be re-inventing a computational feature map every time you went down that road.
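
To make the idea concrete, here is a minimal sketch of the aggregation step, assuming the uploaded logs have already been map-matched to road segments. The segment IDs, the row format, and the 80% stop threshold are hypothetical choices for illustration, not part of any real system. Conditions such as season or weather could be added as extra grouping keys in the same way.

```python
from collections import defaultdict
from statistics import mean

def build_profiles(rows, stop_share=0.8):
    """Aggregate uploaded log rows into per-segment driving parameters.

    rows: iterable of (segment_id, hour_of_day, speed_kmh, stopped) tuples.
    Returns (average speed per (segment, hour), set of segments where most
    vehicles stop, e.g. a likely stop sign or traffic light).
    """
    speeds = defaultdict(list)                 # (segment, hour) -> observed speeds
    stop_counts = defaultdict(lambda: [0, 0])  # segment -> [stops, observations]
    for segment, hour, speed, stopped in rows:
        speeds[(segment, hour)].append(speed)
        stop_counts[segment][0] += int(stopped)
        stop_counts[segment][1] += 1
    avg_speed = {key: mean(vals) for key, vals in speeds.items()}
    likely_stops = {seg for seg, (n_stop, n_obs) in stop_counts.items()
                    if n_stop / n_obs >= stop_share}
    return avg_speed, likely_stops

# Hypothetical uploaded history from several vehicles.
rows = [
    ("elm-st-0420", 8, 32.0, False),
    ("elm-st-0420", 8, 28.0, False),
    ("elm-st-0420", 17, 18.0, False),
    ("elm-st-0421", 8, 0.0, True),
    ("elm-st-0421", 8, 0.0, True),
    ("elm-st-0421", 17, 0.0, True),
]
avg_speed, likely_stops = build_profiles(rows)
print(avg_speed[("elm-st-0420", 8)])  # 30.0
print(likely_stops)                   # {'elm-st-0421'}
```

A car entering "elm-st-0421" could then treat it as a probable stop without having to recognize the sign from scratch.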

Smart roads would be smart because there would be a driving-instruction history, created by thousands of vehicles, on how to navigate them. They would be mapped with a GIS (geographic information system) that included driving parameters.
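
A GIS lookup of this kind can be sketched with a great-circle (haversine) distance check against a layer of mined roadway features. The feature layer, its fields, the coordinates, and the 200 m search radius below are all hypothetical; a production system would use a spatial index rather than scanning every feature.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Hypothetical feature layer: roadway features mined from driving histories.
FEATURES = [
    {"kind": "stop_sign", "lat": 43.6510, "lon": -79.3470},
    {"kind": "speed_limit", "lat": 43.6532, "lon": -79.3832, "value_kmh": 50},
    {"kind": "driveway_exit", "lat": 43.6535, "lon": -79.3840},
]

def features_near(lat, lon, radius_m=200):
    """Return the features within radius_m of (lat, lon), nearest first."""
    hits = [(haversine_m(lat, lon, f["lat"], f["lon"]), f) for f in FEATURES]
    hits.sort(key=lambda hit: hit[0])
    return [f for dist, f in hits if dist <= radius_m]

nearby = features_near(43.6533, -79.3835)
print([f["kind"] for f in nearby])  # ['speed_limit', 'driveway_exit']
```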

It would be the Google search engine for the brain inside your car. I am sure that Google has already thought of this concept; they were forward-looking enough to start Street View. But there is always room for a better mousetrap hatched by a disrupter. The disruption in this case is to present the driving parameters in a way that all self-driving cars will understand. Therein lies the next billion-dollar play.

When The Customer Isn't King - Account & Data Security Breaches That Can Be Prevented

The news for two major retail giants in Canada has not been good for them or their customers in the past few days. Loblaws, a grocery and dry goods retailer, had its PC Plus loyalty points system breached. One customer had $110 worth of points spent in the province of Quebec, a province she has never even visited. Another customer, a system administrator who said he used a different password for every account, had his points stolen as well. News link: http://globalnews.ca/news/3237876/ps-plus-points-stolen-security-breach/

As well, Canadian Tire, a retail giant that sells everything from automobile accessories to sporting goods to snack foods, has been hacked, compromising both loyalty points and credit card balances online. News link: http://globalnews.ca/news/3236903/exclusive-canadian-tire-website-breached-consumer-accounts-in-question/

The financial losses from hacks such as these are tremendous. After Target's 2013 breach, the company estimated its losses at $148 million, according to an article in Time magazine. In 2014, job losses in Europe due to customer data breaches were estimated at 150,000. The global picture is frightening: McAfee, the Intel security company, estimates monetary losses of $160 billion per year from data breaches.

Hacking isn't exactly a new phenomenon. In 1979, the infamous convicted hacker Kevin Mitnick broke into his first major computer system, the Ark, which Digital Equipment Corporation (DEC) used to develop its RSTS/E operating system software. Perhaps the most embarrassing privacy breach came when Ashley Madison, a website for arranging extramarital affairs, was hacked and over 30 million users' names and payment details were exposed, causing at least two suicides.

So in this day and age, why does this happen? Can it be prevented?

Aside from inside jobs, one of the reasons hacking succeeds is the antiquated way that servers, databases, and accounts are accessed. To connect to a server, one usually needs a username and a password, and this is true even for administrator access. However, one doesn't need administrator access to break into data and accounts. Customer account information is typically stored in a relational database queried with SQL, a so-called 4GL (fourth-generation language). This table-driven database usually sits on its own server or cluster and is exposed to the outside world so that its data can be accessed by platforms, analytics, and web interfaces. Again, with a username and password, one can gain entrance to the data store and exploit the data. Many databases still use "root" as the username for God-like access, and all an attacker has to do is guess, derive, or steal the password. Many administrators commit the cardinal sin of using the same password on all accounts, and that password may be derived from something like the name of their pet, which is available on social media.

For years, the huge database company Oracle shipped its databases with a default account named "Scott" with the password "Tiger", a leftover from one of the original developers that many installations never removed. As a consultant, I walked into many data centers, typed in Scott/Tiger, and got access to the crown jewels.
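
The weaknesses described above (vendor default accounts, reused passwords, and passwords built from publicly known facts) can be checked mechanically. This is a minimal sketch that assumes an administrator's own inventory of credentials as input. The account records are hypothetical, and apart from Scott/Tiger, the list of defaults is a short illustrative sample, not an exhaustive catalogue.

```python
from collections import defaultdict

# Illustrative sample of well-known default credentials (username, password).
KNOWN_DEFAULTS = {("scott", "tiger"), ("root", ""), ("root", "root"), ("admin", "admin")}

def audit(accounts, public_words=()):
    """Flag default credentials, reused passwords, and guessable passwords."""
    findings = []
    by_password = defaultdict(list)  # password -> systems that use it
    for acct in accounts:
        user, pwd = acct["username"], acct["password"]
        if (user.lower(), pwd.lower()) in KNOWN_DEFAULTS:
            findings.append(f"{acct['system']}: default credentials {user}/{pwd}")
        # Words an attacker could find on social media (pet names, etc.).
        if any(word.lower() in pwd.lower() for word in public_words):
            findings.append(f"{acct['system']}: password contains publicly known word")
        by_password[pwd].append(acct["system"])
    for systems in by_password.values():
        if len(systems) > 1:
            findings.append("password reused on: " + ", ".join(sorted(systems)))
    return findings

# Hypothetical inventory.
accounts = [
    {"system": "orders-db", "username": "SCOTT", "password": "TIGER"},
    {"system": "crm-db", "username": "root", "password": "Rex2016!"},
    {"system": "billing-db", "username": "root", "password": "Rex2016!"},
]
for finding in audit(accounts, public_words=["rex"]):
    print(finding)
```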

No matter how much security is built into a system, it is still vulnerable through its shaky username-and-password access. There is a better way. It is inexpensive, fairly autonomous, easy to use, and orders of magnitude more secure than a conventional database approach to storing customer data. It is a blockchain.

People know blockchain from the digital cryptocurrency Bitcoin, and that fact alone has poisoned the well for quick adoption of blockchain technology. Blockchain is a technology and methodology for digitally recording transactions, events, derived metadata, and the chronological log of any business transaction that requires security, integrity, transparency, efficiency, auditability, and resistance to outages. It is the acme of trusted data. It can also store values like cryptocurrency, digital cash, and loyalty points, but its main selling point is that it is a true, autonomous ledger. Period.

When a technology evangelist mentions blockchain to the C-suite, several things happen. If executives have heard of blockchain and its association with Bitcoin, there is pushback because of press coverage of how cryptocurrencies have been exploited. If they haven't heard of blockchain, or have heard of it but do not understand it, there is a fear of committing to the unknown. There are only about 2,000 blockchain developers worldwide, and most of them are still building proofs of concept. C-level technology officers do not have the in-house talent to adopt this technology immediately, and it is perceived as untested, bleeding-edge stuff (not true). The other fly in the ointment is that there is a blockchain consortium built around the Ethereum platform. That may be all well and good, but Fortune 500 companies are better suited to a private blockchain that they control themselves, since they are responsible for their data.

So why is a blockchain more secure? For starters, any responsible blockchain implementation does away with usernames and passwords. Authentication is done with a private encryption key that stays on the device, so no amount of keylogging or password trapping will enable a breach. On top of that, a conscientiously constructed authentication scheme should collect the MAC address or MDID of the device in tandem with the key. A MAC address is the hardware identifier embedded in a computer's network interface card; an ordinary web page cannot read it, so it must be collected by client software installed on the machine. MDID is the hardware serial number of a mobile phone or tablet, which software on the device can query, subject to the operating system's restrictions. Thus, any machine making changes to the data can be identified by both device and encryption key.
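
Key-based login of this kind is usually a challenge-response: the server sends a fresh random challenge, the device signs it with a private key that never leaves the device, and the server checks the signature against the public key it registered for that device. Python's standard library has no public-key signature module, so this sketch swaps in a Lamport one-time signature, which needs only hashlib; a real deployment would use ECDSA or Ed25519 from a vetted library. The "device-001" identifier (standing in for a MAC address or MDID) and the in-memory registry are hypothetical, and a Lamport key must sign only one message, so keys would have to be rotated after each use.

```python
import hashlib
import hmac
import secrets

def _bits(digest):
    """Yield the 256 bits of a SHA-256 digest, most significant bit first."""
    for byte in digest:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1

def keygen():
    """Lamport key pair: 256 pairs of random secrets and their SHA-256 hashes."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(hashlib.sha256(a).digest(), hashlib.sha256(b).digest()) for a, b in private]
    return private, public

def sign(private, message):
    """Reveal one secret from each pair, chosen by the bits of the message hash."""
    digest = hashlib.sha256(message).digest()
    return [pair[bit] for pair, bit in zip(private, _bits(digest))]

def verify(public, message, signature):
    """Check that every revealed secret hashes to the matching public value."""
    if len(signature) != 256:
        return False
    digest = hashlib.sha256(message).digest()
    return all(
        hmac.compare_digest(hashlib.sha256(s).digest(), pair[bit])
        for s, pair, bit in zip(signature, public, _bits(digest))
    )

# Hypothetical server-side registry: device identifier -> public key.
REGISTRY = {}

def authenticate(device_id, challenge, signature):
    """Accept only a registered device whose signature covers this exact challenge."""
    public = REGISTRY.get(device_id)
    if public is None:
        return False
    return verify(public, device_id.encode() + b"|" + challenge, signature)

# Enrolment: the private key stays on the device; only the public key is sent.
private, public = keygen()
REGISTRY["device-001"] = public

# Login: the server issues a fresh random challenge and the device signs it.
challenge = secrets.token_bytes(16)
signature = sign(private, b"device-001|" + challenge)
print(authenticate("device-001", challenge, signature))  # True
print(authenticate("device-002", challenge, signature))  # False: unknown device
```

Because the signature covers both the device identifier and the one-time challenge, a captured signature cannot be replayed for a later login or from another registered device.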

On top of all that, each blockchain query agent needs an encryption key just to read the blockchain. No amount of brute-force hacking can get you into the blockchain unless you are authorized to do so and have a key created for you.

Blockchains can hold not only digital values like money or loyalty points but also bits of code that enable smart contracts: when certain conditions are met, actions happen securely because of code embedded in the blockchain. In fact, a blockchain can store a digital anything. Blockchains are highly resistant to fraudulent alteration because each block of transactions is linked to the previous one by a cryptographic hash. To perpetrate a fraud, you would have to recompute not just the record you changed but every record that follows it.
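
The hash linking described above can be sketched in a few lines. This is a simplified, single-node illustration with hypothetical loyalty-point records: a real blockchain adds digital signatures, replication, and consensus on top of the hash chain, because hashing alone would not stop someone who controls every copy from recomputing the whole chain.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(index, prev_hash, payload):
    """SHA-256 over canonical JSON (sorted keys) of the block's contents."""
    body = json.dumps({"index": index, "prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def append(chain, payload):
    """Add a block whose hash covers its payload and the previous block's hash."""
    prev = chain[-1]["hash"] if chain else GENESIS
    index = len(chain)
    chain.append({"index": index, "prev": prev, "payload": payload,
                  "hash": block_hash(index, prev, payload)})

def is_valid(chain):
    """Recompute every hash and check that each block points at its predecessor."""
    prev = GENESIS
    for block in chain:
        if block["prev"] != prev:
            return False
        if block["hash"] != block_hash(block["index"], prev, block["payload"]):
            return False
        prev = block["hash"]
    return True

chain = []
append(chain, {"member": "A-17", "points": 500, "type": "earn"})
append(chain, {"member": "A-17", "points": -110, "type": "redeem"})
print(is_valid(chain))  # True

chain[0]["payload"]["points"] = 5000  # tamper with history
print(is_valid(chain))  # False
```

To hide the edit, a forger would have to recompute block 0's hash and then every later block's "prev" and "hash" fields.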

The last benefit of blockchains is not so obvious, but it is highly desirable. You can write any information to the payload of a blockchain. So if you store transactions with semantic, machine-readable identifiers, you can perform stream analytics on the transactions in real time. This can be coupled to machine learning, not only to identify fraud, but also to enable wallet-stretch: selling consumers more of the things they really need.
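
As a minimal sketch of stream analytics over semantically typed transactions, the monitor below applies two simple rules as events arrive (it is rule-based, not machine learning): a redemption in a region where the member has never earned points, which is the pattern in the Quebec loyalty-point case above, and too many redemptions in a short window. The "loyalty:Earn" and "loyalty:Redeem" identifiers, the event fields, and the thresholds are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical semantic event types; a real system might use URIs from a shared vocabulary.
EARN, REDEEM = "loyalty:Earn", "loyalty:Redeem"

class FraudMonitor:
    """Flags suspicious redemptions as events stream in (illustrative rules only)."""

    def __init__(self, max_redeems=3, window_s=3600):
        self.earn_regions = defaultdict(set)  # member -> regions where points were earned
        self.recent = defaultdict(deque)      # member -> timestamps of recent redemptions
        self.max_redeems = max_redeems
        self.window_s = window_s

    def observe(self, event):
        member, region, ts = event["member"], event["region"], event["ts"]
        alerts = []
        if event["type"] == REDEEM:
            known = self.earn_regions[member]
            if known and region not in known:
                alerts.append(f"{member}: redemption in unfamiliar region {region}")
            window = self.recent[member]
            window.append(ts)
            # Drop redemptions that fall outside the sliding time window.
            while window and ts - window[0] > self.window_s:
                window.popleft()
            if len(window) > self.max_redeems:
                alerts.append(f"{member}: {len(window)} redemptions within {self.window_s}s")
        elif event["type"] == EARN:
            self.earn_regions[member].add(region)
        return alerts

monitor = FraudMonitor()
events = [
    {"type": EARN, "member": "M1", "region": "ON", "ts": 0},
    {"type": REDEEM, "member": "M1", "region": "ON", "ts": 100},
    {"type": REDEEM, "member": "M1", "region": "QC", "ts": 200},
]
all_alerts = [alert for e in events for alert in monitor.observe(e)]
print(all_alerts)  # ['M1: redemption in unfamiliar region QC']
```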

Does a beast such as a private semantic blockchain exist? You bet. Ping me.

Process Mining From Event Logs -- An Untapped Resource And Wave of The Future

A couple of years ago, I was searching for untapped horizons in data mining, and I came across a course given by Professor Wil van der Aalst, who pioneered the technology of business process mining from server event logs. Naturally, I signed up for the course. It is and was a fascinating course, not only for its in-depth and non-trivial treatment of gleaning knowledge from data, but also because it got my creative juices flowing about where else it could be applied. I was so intrigued by the possibilities that I created a Google Scholar Alert for Professor van der Aalst's publications. The latest alert came on January 31st, for a paper entitled "Connecting databases with process mining". It was this paper that triggered this article; the link is here: http://repository.tue.nl/858271

I am a huge proponent of AI, machine learning, and analytics. In machine learning, you gather large datasets, clean the data, split it into smaller sets for training and evaluation, and then train a model over hundreds, perhaps thousands, of training epochs until its accuracy on the evaluation data crosses an appropriate threshold. Machine intelligence is a huge field of endeavor, and it is becoming a major part of everyday life. However, it is time-consuming to teach the machine and get it right. Professor van der Aalst's area of expertise can provide a better way. Let me explain.

My particular interest stems from the fact that I am building a semantic blockchain to record all of the data coupled to vehicles, autonomous or not. A blockchain, of course, is an immutable data ledger that is true, autonomous in its own operation, disintermediates third parties, and is outage-resistant. Autonomous vehicles will, by law, be required to log every move, keep records of their software revisions, and record things like post-crash behavior.

I immediately saw the possibilities of using this data. Suppose that you are in an autonomous vehicle that has never been on a tricky roadway you need to navigate to reach your destination. Your car doesn't know the route parameters, but thousands of other autonomous vehicles have driven it, including many running your kind of operating system and software. As a connected car, your vehicle would know its GPS coordinates and could query a system for the driving details of this piece of roadway that is unknown to its computer. Instead of requiring intense computation to navigate, it could download a recipe of driving features.

Rather than garnering those instructions from repeated training epochs in machine learning, one could apply process mining to the logs to extract the required knowledge. There are already semantic methods of representing processes, from decision trees to Petri nets, and if the general process were already known to the machine, the computational load would drop. As a matter of fact, each vehicle could have a process mining module to extract high-level algorithms for the roads that it drives regularly, which in itself would reduce the vehicle's computational load. It would know in advance where the stop signs are, for example, and you won't have YouTube videos of self-driving cars going through red lights and stop signs.
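
The first step of many process discovery algorithms is to derive the directly-follows relation from an event log of (case, timestamp, activity) records. The sketch below does that, builds the footprint of relations that van der Aalst's alpha algorithm uses to construct a Petri net (the Petri net construction itself is omitted), and flags transitions in a new trace that never appeared in the history, which is one simple way an onboard system could spot anomalies. The trip IDs and driving activities are hypothetical.

```python
from collections import Counter, defaultdict

def traces_from_log(events):
    """Group (case_id, timestamp, activity) events into time-ordered traces."""
    by_case = defaultdict(list)
    for case_id, ts, activity in events:
        by_case[case_id].append((ts, activity))
    return {case: [a for _, a in sorted(evts)] for case, evts in by_case.items()}

def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b."""
    dfg = Counter()
    for trace in traces.values():
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg

def footprint(dfg):
    """Alpha-style relations: '->' causal, '<-' reverse, '||' parallel, '#' never adjacent."""
    acts = sorted({x for pair in dfg for x in pair})
    rel = {}
    for a in acts:
        for b in acts:
            ab, ba = (a, b) in dfg, (b, a) in dfg
            rel[(a, b)] = "||" if ab and ba else "->" if ab else "<-" if ba else "#"
    return rel

def unseen_steps(trace, dfg):
    """Transitions in a new trace that never occurred in the mined log."""
    return [(a, b) for a, b in zip(trace, trace[1:]) if (a, b) not in dfg]

# Hypothetical log of three trips through the same intersection.
events = [
    ("trip1", 1, "approach"), ("trip1", 2, "brake"), ("trip1", 3, "stop"), ("trip1", 4, "accelerate"),
    ("trip2", 1, "approach"), ("trip2", 2, "brake"), ("trip2", 3, "stop"), ("trip2", 4, "accelerate"),
    ("trip3", 1, "approach"), ("trip3", 2, "brake"), ("trip3", 3, "yield"), ("trip3", 4, "accelerate"),
]
dfg = directly_follows(traces_from_log(events))
relations = footprint(dfg)
print(dfg[("approach", "brake")])   # 3
print(relations[("brake", "stop")])  # ->
print(relations[("stop", "yield")])  # #
print(unseen_steps(["approach", "stop", "accelerate"], dfg))  # [('approach', 'stop')]
```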

It goes a lot further than autonomous vehicles. This concept of creating high-level machine processes from event logs can be applied to fields as diverse as robotic manufacturing, cloud server monitoring, and numerous others where human operators or real-world human judgement are required.

Process mining could either replace machine learning in many instances or supplement it as part of a mix of technologies. The aim is the same: aggregating data into information and integrating information into knowledge, for both humans and machines.

This process mining business reminds me of the history behind Bayesian inference. The Reverend Thomas Bayes discovered an equation relating probability to prior belief. It sat on a dusty shelf for over 200 years before being re-purposed for inference in computer intelligence. I think that Professor van der Aalst's methodologies will be re-purposed for things as yet unimagined, and it will not take 200 years for them to come to fruition.

Professor van der Aalst's next course in process mining begins online on February 20th of this month. Here is the link:

https://www.coursera.org/learn/process-mining