All Things Techie With Huge, Unstructured, Intuitive Leaps

Standard Artificial Neural Network Template



For the past few weeks, I have been thinking about taking a trained artificial neural network on one computer and transferring it to another computer or another operating system, or even selling the trained network on a digital exchange in the future.

It really doesn't matter what programming language artificial neural networks are written in.  They all have the same parameters: inputs, outputs, weights, biases and so on.  All of these values are particularly suited to being fed into a program using an XML document based on an .XSD schema, or a light-weight protocol like JSON.  However, to my knowledge, this hasn't been done, so I took it upon myself to crack one out.

It is not only useful for creating portability in an untrained network; it also has data elements for a trained network, making the results of deep learning, machine learning and AI training portable and available.

Even where binaries already exist, creating a front end to input these values would take minimal programming, re-programming or updating.

I also took the opportunity to make it extensible and flexible. Some elements rely on capabilities that do not yet exist (like an XML XSD tag for a function), but I put them in so that the capability is there once it is developed.

There are a few other interesting things included.  There is the option to define more than one activation function, and the values for the local gradient, the alpha and other parameters are included for further back propagation.

There is room to include a link to the original dataset on which these nets were trained (it could be a URL, a directory pathway, a database URL, etc.).  There is an element to record the number of training epochs.  With all of this information, the artificial neural net can be re-created from scratch.

There is extensibility in case this network is chained to another. There is an added data dimension in case other types of neurons are invented, such as accumulators or neurons that return a probability.

I put this .xsd template on Github as a public repository. You can download it from here:

http://github.com/kenbodnar/ann_template

Or, if you wish, here are the contents of the .xsd, called ann.xsd.  It is heavily commented for convenience.


<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="artificial_neural_network">
    <xs:complexType>
      <xs:sequence>
        <!-- The "name" element is the name of the network. They should have friendly names that can be referred to if it ever goes up for sale, rent, swap, donate, or promulgate.-->
        <xs:element name="name" type="xs:string" minOccurs="1" maxOccurs="1"/>
        <!-- The "id" element is optional and can be the pkid if the values of this network are stored in an SQL (or NOSQL) database, to be called out and assembled into a network on an ad hoc basis-->
        <xs:element name="id" type="xs:integer" minOccurs="0" maxOccurs="1"/>
        <!-- The "revision" element is for configuration control-->
        <xs:element name="revision" type="xs:string" minOccurs="1" maxOccurs="1"/>
        <!-- The "revision_history" is optional and is an element to describe changes to the network -->
        <xs:element name="revision_history" type="xs:string" minOccurs="0" maxOccurs="1"/>
        <!-- The "classification" element is put in for later use. Someone will come up with a classification algorithm for types of neural nets. There is room for a multiplicity of classifications-->
        <xs:element name="classification" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
        <!-- The "region" element is optional and will be important if the networks are chained together, and the neurons have different functions than a standard neuron, like an accumulator or a probability computer
        and are grouped by region, disk, server, cloud, partition, etc-->
        <xs:element name="region" type="xs:string" minOccurs="0" maxOccurs="1"/>
        <!-- The "description" element is an optional field, however a very useful one.-->
        <xs:element name="description" type="xs:string" minOccurs="0" maxOccurs="1"/>
        <!-- The "creator" element is optional and denotes who trained these nets -->
        <xs:element name="creator" type="xs:string" minOccurs="0" maxOccurs="1"/>
        <!-- The "notes" element is optional and is self explanatory-->
        <xs:element name="notes" type="xs:string" minOccurs="0" maxOccurs="1"/>
        <!-- The "dataset_source" element defines the origin of the training data. It could be a URL -->
        <xs:element name="dataset_source" type="xs:string" minOccurs="0" maxOccurs="1"/>
        <!-- This optional element, together with the source data helps to recreate this network should it go wonky -->
        <xs:element name="number_of_training_epochs" type="xs:integer" minOccurs="0" maxOccurs="1"/>
        <!-- The "number_of_layers" element defines the total number of layers-->
        <xs:element name="number_of_layers" type="xs:integer" minOccurs="1" maxOccurs="1"/>
        <xs:element name="layers">
          <xs:complexType>
            <xs:sequence>
              <!-- Repeat as necessary for number of layers-->
              <xs:element name="layer" minOccurs="1" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:sequence>
                    <!-- Layer Naming and Neuron Naming will ultimately have a recognized convention eg. L2-N1 is Layer 2, Neuron #1-->
                    <xs:element name="layer_name" type="xs:string" minOccurs="0" maxOccurs="1"/>
                    <!-- number of neurons is for the benefit of an object-oriented constructor-->
                    <xs:element name="number_of_neurons" type="xs:integer" minOccurs="1" maxOccurs="1"/>
                    <!-- defining the neuron this is repeated as many times as necessary-->
                    <xs:element name="neuron" maxOccurs="unbounded">
                      <xs:complexType>
                        <xs:sequence>
                          <!--optional ~  currently it could be a perceptron, but it could also be a new type, like an accumulator, or probability calculator-->
                          <xs:element name="type" type="xs:string" minOccurs="0" maxOccurs="1"/>
                          <!-- name is optional ~ name will be standardized eg. L1-N1 layer/neuron pair. The reason is that there might be benefit in synaptic joining of this layer to other networks and one must define the joins -->
                          <xs:element name="name" type="xs:string" minOccurs="0" maxOccurs="1"/>
                          <!-- optional ~ again, someone will come up with a classification system-->
                          <xs:element name="neuron_classification" type="xs:string" minOccurs="0" maxOccurs="1"/>
                          <!-- number of inputs-->
                          <xs:element name="number_of_inputs" type="xs:integer" minOccurs="1" maxOccurs="1"/>
                          <!-- required if the input layer is also an output layer - eg. sigmoid, heaviside etc-->
                          <xs:element name="primary_activation_function_name" type="xs:string" minOccurs="0" maxOccurs="1"/>
                          <!-- ~ optional - there is no such thing as an xs:function type yet, so a string stands in until there is -->
                          <xs:element name="primary_activation_function" type="xs:string" minOccurs="0" maxOccurs="1"/>
                          <!-- in lieu of an embeddable function, a description could go here ~ optional -->
                          <xs:element name="primary_activation_function_description" type="xs:string" minOccurs="0" maxOccurs="1"/>
                          <!-- possible alternate activation functions eg. sigmoid, heaviside etc-->
                          <xs:element name="alternate_activation_function_name" type="xs:string" minOccurs="0" maxOccurs="1"/>
                          <!-- ~ optional - there is no such thing as an xs:function type yet, so a string stands in until there is -->
                          <xs:element name="alternate_activation_function" type="xs:string" minOccurs="0" maxOccurs="1"/>
                          <!-- in lieu of an embeddable function, a description could go here ~ optional -->
                          <xs:element name="alternate_activation_function_description" type="xs:string" minOccurs="0" maxOccurs="1"/>
                          <!-- if this is an output layer or requires an activation threshold-->
                          <xs:element name="activation_threshold" type="xs:double" minOccurs="0" maxOccurs="1"/>
                          <xs:element name="learning_rate" type="xs:double" minOccurs="1" maxOccurs="1"/>
                          <!-- the alpha or the 'movement' is used in the back propagation formula to calculate new weights-->
                          <xs:element name="alpha" type="xs:double" minOccurs="1" maxOccurs="1"/>
                          <!-- the local gradient is used in back propagation-->
                          <xs:element name="local_gradient" type="xs:double" minOccurs="1" maxOccurs="1"/>
                          <!-- inputs ~ as many as needed-->
                          <xs:element name="input" maxOccurs="unbounded">
                            <xs:complexType>
                              <xs:sequence>
                                <!-- Inputs optionally named in case order is necessary for definition -->
                                <xs:element name="input_name" type="xs:string" minOccurs="0" maxOccurs="1"/>
                                <!-- use appropriate type-->
                                <xs:element name="input_value_double" type="xs:double" minOccurs="0" maxOccurs="unbounded"/>
                                <!-- use appropriate type-->
                                <xs:element name="input_value_integer" type="xs:integer" minOccurs="0" maxOccurs="unbounded"/>
                                <!-- weight for this input-->
                                <xs:element name="input_value_weight" type="xs:double" minOccurs="1" maxOccurs="1"/>
                                <!-- added as a convenience for continuation of back propagation if the network is relocated, moved, cloned, etc-->
                                <xs:element name="input_value_previous_weight" type="xs:double" minOccurs="1" maxOccurs="1"/>
                              </xs:sequence>
                            </xs:complexType>
                          </xs:element>
                          <!-- end of input-->
                          <!-- bias start-->
                          <xs:element name="bias">
                            <xs:complexType>
                              <xs:sequence>
                                <xs:element name="bias_value" type="xs:double" minOccurs="1" maxOccurs="1"/>
                                <xs:element name="bias_value_weight" type="xs:double" minOccurs="1" maxOccurs="1"/>
                                <!-- added as a convenience for continuation of back propagation if the network is relocated, moved, cloned, etc-->
                                <xs:element name="bias_value_previous_weight" type="xs:double" minOccurs="1" maxOccurs="1"/>
                              </xs:sequence>
                            </xs:complexType>
                          </xs:element>
                          <!-- end of bias-->
                          <xs:element name="output">
                            <xs:complexType>
                              <xs:sequence>
                                <!-- outputs optionally named in case order is necessary for definition -->
                                <xs:element name="output_name" type="xs:string" minOccurs="0" maxOccurs="1"/>
                                <xs:element name="output_value_double" type="xs:double" minOccurs="0" maxOccurs="unbounded"/>
                                <!-- hypothetical value is a description of what it means if the neuron activates and fires as output if this is the last layer-->
                                <xs:element name="hypothetical_value" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
                              </xs:sequence>
                            </xs:complexType>
                          </xs:element>
                          <!-- end of output-->
                        </xs:sequence>
                      </xs:complexType>
                    </xs:element>
                    <!-- end of neuron-->
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
              <!-- end of layer-->
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <!-- end of layers-->
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <!-- network-->
</xs:schema>
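To make the schema concrete, here is a minimal, hand-written instance sketch for a toy one-neuron network. All the values are made up for illustration, and this fragment has not been run through a validator:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<artificial_neural_network>
  <name>toy-door-finder</name>
  <revision>1.0</revision>
  <number_of_layers>1</number_of_layers>
  <layers>
    <layer>
      <layer_name>L1</layer_name>
      <number_of_neurons>1</number_of_neurons>
      <neuron>
        <name>L1-N1</name>
        <number_of_inputs>1</number_of_inputs>
        <primary_activation_function_name>tanh</primary_activation_function_name>
        <activation_threshold>0.5</activation_threshold>
        <learning_rate>0.1</learning_rate>
        <alpha>0.9</alpha>
        <local_gradient>0.0</local_gradient>
        <input>
          <input_name>x1</input_name>
          <input_value_double>1.0</input_value_double>
          <input_value_weight>0.25</input_value_weight>
          <input_value_previous_weight>0.20</input_value_previous_weight>
        </input>
        <bias>
          <bias_value>1.0</bias_value>
          <bias_value_weight>-0.5</bias_value_weight>
          <bias_value_previous_weight>-0.45</bias_value_previous_weight>
        </bias>
        <output>
          <output_value_double>0.73</output_value_double>
          <hypothetical_value>door is open</hypothetical_value>
        </output>
      </neuron>
    </layer>
  </layers>
</artificial_neural_network>
```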

I hope this helps someone. This is open source. Please use it and pass it on if you find it useful.

Perils of Overtraining in AI Deep Learning


When we partnered with a local university department of Computer Science to create some Artificial Neural Networks (ANNs) for our platform, we gave them several years of data to play with.  They massaged the input data, created an ANN machine and ran training epochs to kingdom come.

The trouble with ANNs is that you can over-train them.  This means that they respond to the training data set with high accuracy, but they are not general enough to accurately process new data.  To put it in general terms, their point-of-view is too narrow, and encompasses only the data that they were trained on.

In the training process, I intuitively guessed that accuracy would improve exponentially with each iterative training epoch.  I was wrong.  Here is a graph showing that the learning curve is linear rather than exponential over the training cycle.


So the minute that the graph stops being linear is when you stop training.  However, as our university friends found out, they had no way to regress the machine to exactly one training epoch back.  They had no record of the weights, biases, adjusted weights, etc. of each epoch after the hours of back propagation, and as a result they had to re-run all of the training.

Me, I had a rather primitive way of saving the states of the neurons and layers. I mentioned it before: I wrote my machine in Java using object-oriented programming, and those objects have the ability to be serialized.  In other words, binary objects in memory can be preserved in a given state, written to disk, and then resurrected to be active in the last state that they were in.  Kind of like freezing a body cryogenically, but having the ability to bring it back to life.

So after every training epoch, I serialize the machine.  If I over-train the neural nets, I can get a signal by examining and/or plotting the error rates, which are inverse to the accuracy of the nets. In the above graph, once the function stops being linear, I know that I am approaching the over-training event horizon.  Then I can regress with my saved serialized versions of the AI machine.
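Here is a minimal sketch of that serialization trick. The `Network` class below is just a stand-in for my real machine (one layer of weights and an epoch counter), but the checkpoint-and-restore mechanics are the same:

```java
import java.io.*;

// Sketch: checkpoint a network after each epoch via Java serialization,
// so training can be rolled back one epoch if over-training sets in.
public class EpochCheckpoint {
    // Stand-in for the real network object: just weights and an epoch number.
    public static class Network implements Serializable {
        private static final long serialVersionUID = 1L;
        public double[] weights;
        public int epoch;
        public Network(double[] weights, int epoch) {
            this.weights = weights;
            this.epoch = epoch;
        }
    }

    // Freeze the network's state for a given epoch to disk.
    public static void save(Network net, String path) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(path))) {
            out.writeObject(net);
        }
    }

    // Resurrect the network exactly as it was when saved.
    public static Network load(String path) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(path))) {
            return (Network) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Network net = new Network(new double[]{0.5, -0.25}, 7);
        save(net, "epoch7.ser");
        Network restored = load("epoch7.ser");
        System.out.println(restored.epoch);      // prints 7
        System.out.println(restored.weights[1]); // prints -0.25
    }
}
```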

Then the Eureka moment struck me! I had discovered a quick and easy cure for over-training.

In a previous blog article, a few posts down from here (or at http://coderzen.blogspot.com/2015/01/brain-cells-for-sale-need-for.html ), I made the case for a standardized AI machine, where you could have an XML or JSON light-weight representation of the layers, inputs, number of neurons, outputs and even the hypothetical value mappings for the outputs, and then you wouldn't need to serialize the whole machine.  At the end of every training epoch, you just output the recipe for the layers, weights, biases, etc., and you could revert to an earlier training incarnation by inputting a new XML file or a JSON object.

It's really time to draw up the .XSD schema for the standardized neuron. I want it to be open source. It would be horrible to be famous for thinking of having a standardized neural net. Besides, being famous is just a job.

A Returned-Probability Artificial Neural Network - The Quantum Artificial Neural Network


Artificial Neural Networks, associated with deep learning and machine learning using supervised and unsupervised learning, are fairly good at figuring out deterministic things. For example, they can find an open door for a robot to enter. They can find patterns in a given matrix, collection or field.

However, sometimes there is no evident computable function. In other words, suppose that you are looking at an event or action that results from a whole bunch of unknown things, with a random bit of chaos thrown in.  It is impossible to derive a computable function without years of study and knowledge of the underlying principles. And even then, it still may be impossible to quantify with an equation, regression formula or such.

But Artificial Neural Nets can be trained to identify things without actually knowing anything about the background causes.  If you have a training set of size k with the answers or results (k being the number of cases), then you can always train your Artificial Neural Networks or Multilayer Perceptrons on k-1 of the cases, and evaluate how well you are doing with the last one. You measure the error rate and back propagate, and off you go to another training epoch if necessary.
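The k-1 split can be sketched in a few lines of Java (class and method names are mine, purely for illustration; each case is a row of features and answers):

```java
import java.util.Arrays;

// Sketch of the hold-out scheme: with k labelled cases, train on the
// first k-1 and keep the last one back to measure the error rate.
public class HoldOut {
    // The k-1 cases used for training epochs.
    public static double[][] trainingSlice(double[][] cases) {
        return Arrays.copyOfRange(cases, 0, cases.length - 1);
    }

    // The single held-out case used to evaluate how well we are doing.
    public static double[] heldOutCase(double[][] cases) {
        return cases[cases.length - 1];
    }

    public static void main(String[] args) {
        double[][] cases = {{0.0, 0.0}, {0.0, 1.0}, {1.0, 0.0}, {1.1, 1.0}};
        System.out.println(trainingSlice(cases).length); // prints 3
    }
}
```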

This is happening with predicting solar flares and the resultant chaos that they cause with electronics and radio communications when the solar winds hit the earth.  Here is a link to an article where an ANN does the predicting:

http://www.dailymail.co.uk/sciencetech/article-2919263/The-computer-predict-SUN-AI-forecasts-devastating-solar-flares-knock-power-grids-Earth.html

In this case, the ANNs have shown that there is a relationship between the vector magnetic fields of the surface of the sun, the solar atmosphere and solar flares.  That's all well and dandy for deterministic events, but what if the determinism were a probability and not a direct causal relationship mapped to its input parameters? What if there were other unknown or unknowable influencing factors?

That's where you need an ANN (Artificial Neural Network) to return a probability as the hypothesis value. This is an easy task for a stats package working on database tables, churning out averages, probabilities, degrees of confidence, standard deviations, etc., but I am left wondering if it could be done internally, in the guts of the artificial neuron.

The artificial neuron is pretty basic. It sums up all of the inputs and biases multiplied by their weights, and feeds the result to an activation function.  It does this many times over, in many layers.  What if you could encode the guts of the neuron to spit out the probability of the results of what is being inputted? What if, somehow, you changed the inner workings of the perceptron or neuron to calculate the probability?  It seems to me that the activation function is ideally suited to this adaptation, because it can be constructed to deliver an activation value of between 0 and 1, which matches probability notation.
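Here is a sketch of that summing-and-squashing step, with a logistic activation whose (0, 1) range reads naturally as a probability. The class and method names are mine, and this is illustrative, not a working probability-returning ANN:

```java
// Sketch: a neuron computes the weighted sum of its inputs plus a bias,
// then applies a logistic activation whose output lies in (0, 1),
// which matches probability notation.
public class ProbNeuron {
    // The logistic squashing function: maps any real v into (0, 1).
    public static double logistic(double v) {
        return 1.0 / (1.0 + Math.exp(-v));
    }

    // Weighted sum of inputs plus bias, fed through the activation.
    public static double fire(double[] inputs, double[] weights, double bias) {
        double sum = bias;
        for (int i = 0; i < inputs.length; i++) {
            sum += inputs[i] * weights[i];
        }
        return logistic(sum);
    }

    public static void main(String[] args) {
        // A net input of zero activates at exactly 0.5 -- maximal uncertainty.
        System.out.println(fire(new double[]{1.0, -1.0}, new double[]{0.5, 0.5}, 0.0)); // prints 0.5
    }
}
```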

Our human brains work well with fuzziness in our chaotic world.   We unconsciously map patterns and assign probabilities to them. There is another word for fuzzy values. It is a "quantum" property. The more you know about one property of an object, the less you know about another.  Fuzziness. The great leap forward for Artificial Neural Networks, is to become Quantum and deliver a probability.  Once we can get an Artificial Neural Net machine to determine probability, then we can apply Bayesian mechanics. That's when it can make inferences, and get a computer on the road to thinking from first principles -- by things that it has learned by itself.

Brain Cells For Sale ~ The Need For Standardization of Artificial Neural Nets


When it comes to Artificial Neural Networks, the world is awash with roll-your-own. Everyone has their own brand and implementation.  Although the theory and practice are well thought out, tested and put into use, the implementation in almost every case is different. In our company, we have a partner university training artificial neural nets for our field of endeavor as a research project for graduate students.

Very few roll-your-own ANNs, or Artificial Neural Networks, are object-oriented in terms of the way they are programmed. This is because it is easier to have a monolithic program where each layer resides in an array, and the neurons can input and output to each other easily.  ANNs are coded in everything from Java to C, C++, C# and kiddie scripting.  I am here to preach today that there should be a standard Artificial Neuron.  To be more explicit, the standardization should be in the recipe for layers, inputs, weights, biases and outputs.  Let me explain.

While the roll-your-own is efficient for each application, it has several major drawbacks.  Let me go through some of them.

The first one is portability. We have a multitude of platforms on everything from Windows to Linux, to Objective C in the iOS native format, to QNX to folks putting Artificial Neural Networks on silicon, and programming right down to the bare metal, or the semi-metals that dope the silicon matrix in the transistor junctions of the chips. We need to be able to run a particular set of specifically trained neural nets on a variety of platforms.

The multiplicity of platforms was seen early on, and as a result we had strange things like CORBA, or Common Object Request Broker Architecture, being formulated ( http://en.wikipedia.org/wiki/Common_Object_Request_Broker_Architecture ). CORBA came about in the early 1990s in its initial incarnations; however, it is bulky and adds a code-heavy layer of abstraction to each platform when you want to transport silicon brainiacs like a multilayer perceptron machine. The idea of distributed computing is an enticing one, but due to a large variety of factors, including security and the continued exponential multiplication of integrated transistors on a chip according to Moore's Law, it is a concept that has been obviated for the present time.

My contention is that if you had a standard for a neural net, then you wouldn't have to call some foreign memory or code object on a foreign computer. You would just use a very simple light-weight data protocol (like JSON) to transfer the post-learning layers, weights and biases, and bingo -- you can replicate smartness on a new machine in minutes, without access to the training data or the time spent training the artificial neural net. It would be like unpacking a thinker in a box. You could be dumber than a second coat of paint, but no one would notice, because your mobile phone did your thinking for you.
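As a sketch of how light-weight such a recipe transfer could be, here is a hand-rolled export of one neuron's weights and bias to JSON. No JSON library is used, the field names are mine, and the format is only illustrative:

```java
// Sketch: serialize a trained neuron's weights and bias as a tiny
// JSON recipe string that any platform could parse and reassemble.
public class RecipeExport {
    public static String toJson(double[] weights, double bias) {
        StringBuilder sb = new StringBuilder("{\"weights\":[");
        for (int i = 0; i < weights.length; i++) {
            if (i > 0) sb.append(',');
            sb.append(weights[i]);
        }
        sb.append("],\"bias\":").append(bias).append('}');
        return sb.toString();
    }

    public static void main(String[] args) {
        // A two-input neuron's post-learning state as a portable recipe.
        System.out.println(toJson(new double[]{0.5, -0.25}, 1.0));
    }
}
```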

There is another aspect to this, and it is the commercial aspect.  If I came across a unique data set and trained a bunch of neural networks to predict stuff in the realm of that data set, I could potentially have a bunch of very valuable neural nets that I could sell to you.  All that you would have to do is pay me the money, download my neural net recipe with its standardized notation, and be in business generating your own revenue stream. It wouldn't matter what platform, operating system or chip set your computer or device used -- the notation for the recipe of the artificial neural network would be agnostic to the binaries.

We are in a very strange time, with the underpinnings of our society changing at a very fast pace.  My contention is that the very nature of employment may change for many people.  We will no longer need to import the cheap goods from China that fill the dollar stores. You will order the recipe for a 3D printer and make whatever you need.  This paradigm alone will kill many manufacturing jobs. As a result, the nature of work will change.  People will find a niche and supply the knowledge in that niche, knowledge that can be utilized or even materialized into what they need.   We will transcend the present paradigm of people supporting themselves by making crafts and selling them on Etsy or writing books and selling them on Amazon.  People will make and sell knowledge products, and one could sell trained neural nets for any field of endeavor.

Just as rooms full of young men in Third World countries game all day and sell the rewards online to impatient First World gamers, you will have people spending days and weeks training neural nets and selling them on an online marketplace.

That day is coming shortly, and the sooner that we have a standard for Artificial Neural Net recipes, the sooner that we will see intelligence embedded in devices and trained neural nets for sale. You can count on it.

These thoughts were spawned on my daily walk, and you can bet that I have already started to create a schema for a neural net transference, as well as a Java Interface for one version of a standardized neural net.  Stay tuned.

Synaptic Pruning in Artificial Neural Networks and Multilayer Perceptrons

What happens in a baby's mind is fascinating.  While the baby is sleeping, it processes all of the information that its senses took in, and puts it through a huge Mixmaster, creating all sorts of connections to memory, storage, logic and emotions.  I love the way that Mother Nature plays dice.  The baby's brain makes synaptic connections even between bits of data that are inappropriate. This is hugely beneficial, because once these connections are made, the logic circuits can evaluate whether they are sound and reflect the outside world.  A baby's brain grows to five times its size by adulthood, largely from the creation of synapses, or links to neurons (plus other biological infrastructure).  This is why a child's imagination is so fertile.

Then we have synaptic pruning near the onset of puberty. ( http://en.wikipedia.org/wiki/Synaptic_pruning ). Once we start thinking about sex, we start pruning the synapses that we think are inappropriate.  The cartoon below gives a very simplistic diagram of pruning inappropriate synapses.  I use the word inappropriate in the sense of what is considered inappropriate by adults and keen rationalists or fairy-tale dogmatics.



How did I get onto this?  I saw a tweet by a hard-core religious fundamentalist who stated that neuroplasticity was the deity's way of fixing a brain.  (In that context, I think that he was implying neural re-wiring to fix apostasy, homosexuality, atheism, and everything else that he didn't approve of.)  I had heard of neuroplasticity, but I googled it to ascertain the current scientific thinking on it.  Simply put, neuroplasticity is the rewiring or creation of synapses to take over functions of the brain that have been destroyed by trauma, injury and/or accident.  For example, it has been reported that brain function controlling, say, motor activity has been discovered in a portion of an accident victim's brain not known for that activity.  The term synaptic pruning was in this article, and I had to investigate the term.

Once I googled it, it reminded me of the works of Dr. Stephen L. Thaler, PhD.  He has a raft of scientific discoveries and patents, and he was an early adopter of artificial neural networks ( http://imagination-engines.com/iei_founder.php ). In a nutshell, he did some work on cognition, consciousness and creativity in artificial neural networks, for which he holds patents.  He discovered that if you randomly destroyed neurons in a massive array of artificial neural networks, then as the network was expiring, it came up with creative outputs or solutions.  As a result, he added another layer of neural nets to observe this.  In essence, by killing off neurons randomly, he was doing synaptic pruning of a sort.

Let me quote from Dr. Thaler's website:

After witnessing some really great ideas emerge from the near-death experience of artificial neural networks, Thaler decided to add additional nets to automatically observe and filter for any emerging brainstorms. From this network architecture was born the Creativity Machine (US Patent 5,659,666). Thaler has proposed such neural cascade as a canonical model of consciousness in which the former net manifests what can only be called a stream of consciousness while the second net develops an attitude about the cognitive turnover within the first net (i.e., the subjective feel of consciousness). In this theory, all aspects of both human and animal cognition are modeled in terms of confabulation generation. Thaler is therefore both the founder and architect of confabulation theory and the patent holder for all neural systems that contemplate, invent, and discover via such confabulations.


The idea then struck me, that perhaps it wasn't necessary to destroy the neuron in the network to achieve what Dr. Thaler saw, but rather just do the synaptic pruning, by randomly destroying inputs (and as a result their weights) in the hidden layers of multilayer perceptrons.

After the connection was destroyed, you would still run the AI machine including back propagation and see what comes out.  What a fascinating concept, and I am itching to try this once I find the time.
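A sketch of what I have in mind follows. Zeroing a weight as a stand-in for destroying the connection is my own assumption, and the class name is made up; this is not Dr. Thaler's method, just the pruning step I am itching to try:

```java
import java.util.Random;

// Sketch of the synaptic-pruning experiment: randomly sever a fraction
// of a hidden layer's connections by zeroing their weights, then keep
// running the machine (including back propagation) and see what emerges.
public class SynapticPruning {
    // Zero out roughly the given fraction of weights; returns how many
    // connections were severed. The seed makes the experiment repeatable.
    public static int prune(double[] weights, double fraction, long seed) {
        Random rnd = new Random(seed);
        int severed = 0;
        for (int i = 0; i < weights.length; i++) {
            if (rnd.nextDouble() < fraction) {
                weights[i] = 0.0;  // connection (and its weight) destroyed
                severed++;
            }
        }
        return severed;
    }

    public static void main(String[] args) {
        double[] hiddenLayerWeights = {0.4, -0.7, 0.1, 0.9};
        int cut = prune(hiddenLayerWeights, 0.5, 42L);
        System.out.println(cut + " connections pruned");
    }
}
```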

I am sure that all sorts of people might think that Dr. Thaler is a nutbar, but those were the same people who thought that Benoit Mandelbrot's ideas on fractal geometry were child's play with no practical applications.  Or consider the Rev. Thomas Bayes, a relative unknown who published only two papers in his lifetime and died in 1761, yet postulated the Bayesian inference that is so important in Machine Learning today.

So Artificial Neural Networks come and go in popularity in the computing field.  I am sure that Dr. Thaler is onto something, and his theories may pan out to be seminal in the field of machine consciousness the way that Alan Turing's ideas became pivotal in this modern age of technology.  And somewhere in there, synaptic pruning will take place, and it just may not be a footnote in the development of artificial consciousness.

If you are looking for ideas for a master or doctoral thesis, you are welcome.





What I learned from playing with Artificial Neural Nets and Multilayer Perceptrons


Anecdotal Observations about Artificial Neural Networks and Multilayer Perceptrons

I like experimenting. I have had a lot of fun trying to embed knowledge in the thresholds of massively parallel Artificial Neural Networks, specifically multi-layer perceptrons.  The field of artificial intelligence is a space where one can create magnificent experiments without the physical ramifications of things going very wrong very quickly, as in other experiments using high energy explosives, raw high voltage electricity or caustic, extremely fast exothermic reactions. Everything happens in the five pounds of laptop sitting on your knees, without smoke, blown fuses or having to call the fire department.

I created my own Java framework with each perceptron being its own object.  I am told that object-oriented programming is the most advanced form of programming, and in my quest for a possible Nobel Prize, I have to use the most advanced tools available to me.  My perceptrons were connected by axon objects which held the outputs and fed them into the next hidden layer. It was intended to be a fine example of silicon mimicry of noggin topography.

I didn't do anything fancy with the activation function. Rather than a Heaviside function, even though I rather admire the works of the very eccentric Oliver Heaviside ( http://en.wikipedia.org/wiki/Oliver_Heaviside ).  He showed that the math involved in understanding Einstein's theories is less than complex in the mathematics of describing what happens inside an electric power cable.  But je digress.   I chose a sigmoid activation function and there are two common ones:

the logistic function, σ(v) = 1 / (1 + e^(-v)), and the hyperbolic tangent, tanh(v) = (e^v - e^(-v)) / (e^v + e^(-v)).

I chose tanh because, for the back propagation, I knew that the derivative of tanh is 1 - tanh^2, and that would save me some time coding the weight adjustments for back propagation.  I used the standard formula for the induced local field of a node, the fancy-schmancy-looking equation that merely states the sum of all of the inputs multiplied by their weights, including a bias and bias weight:
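A minimal sketch of that arithmetic (not my original framework, just the idea):

```java
public class TanhNeuron {

    // Induced local field: v = sum(w_i * x_i) + bias
    public static double inducedField(double[] w, double[] x, double bias) {
        double v = bias;
        for (int i = 0; i < w.length; i++) {
            v += w[i] * x[i];
        }
        return v;
    }

    // Sigmoid activation: tanh squashes v into (-1, 1)
    public static double activate(double v) {
        return Math.tanh(v);
    }

    // The derivative of tanh is 1 - tanh^2(v), which is what makes
    // the back propagation arithmetic cheap
    public static double activateDerivative(double v) {
        double t = Math.tanh(v);
        return 1.0 - t * t;
    }
}
```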

v = Σ (w_i · x_i) + b · w_b

The back propagation uses gradient descent (hence the need to know the derivative of the sigmoid function), and the weight correction is given by:

Δw_i = η · δ · x_i,  where the local gradient δ = e · φ′(v)
which again is a very smart-looking, complicated-looking equation that adds some real provenance of superior intelligence to this blog entry. Any kindergarten prodigy could understand the concept: relate the local gradient to the derivative of the activation or squashing function to get an approximation of the correct weights when the perceptron gives an answer as dumb as a doorpost, or an NFL player hit on the head one too many times.
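Stripped of the fancy notation, the weight correction can be sketched like this (a toy version, assuming a tanh output neuron):

```java
public class DeltaRule {

    // Local gradient for an output neuron with tanh activation:
    // delta = error * phi'(v), where phi'(v) = 1 - tanh^2(v)
    public static double localGradient(double error, double v) {
        double t = Math.tanh(v);
        return error * (1.0 - t * t);
    }

    // Gradient-descent correction to each weight: w_i += eta * delta * x_i
    public static void updateWeights(double[] w, double[] x,
                                     double error, double v, double eta) {
        double delta = localGradient(error, v);
        for (int i = 0; i < w.length; i++) {
            w[i] += eta * delta * x[i];
        }
    }
}
```

The learning rate eta controls how big a step each correction takes down the error surface.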

So having said all of that, these are my observations about multi-layer perceptrons:

1) Bigger is not better

It takes a hell of a lot of training epochs to get a large number of hidden layers to move significantly toward a more correct output. I start getting better approximations to the training set more quickly with fewer layers. I naively thought that the more, the merrier and much smarter.  Fewer layers get smarter more quickly. There is a proviso, though. Deep layers, like Deep Throat, do eventually give more satisfying results: a measure of accuracy on very complex inputs. However, it takes a hell of a lot of training and spinning of silicon gears to get there. Moi -- I am the impatient type who likes results quickly even though, like the smoked bloaters that I bought yesterday -- they are a bit off.

2) It pays to be explicit.  Since I was using a sigmoid activation function, I figured that the hidden layers would act as some huge flip-flop or boolean gate array to magically come up with an answer with a minimum number of neurons. By this I mean: suppose that you had a problem with three inputs, and the hypothesis values of the outputs were three, two, one or zero. Since the output neurons can be activated or not, those four possible output values can be represented in binary by only two neurons (counting from zero to three in binary goes 00, 01, 10, 11).  I soon learned to be explicit. If you have four hypothetical output values, you should have four output neurons in the output layer to minimize training epochs.
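The explicit approach amounts to one-hot output encoding: one neuron per hypothesis value. A sketch (the helper names are my own, for illustration):

```java
public class OneHot {

    // One neuron per hypothesis value: class k becomes a vector
    // with a single 1.0 at position k
    public static double[] encode(int classIndex, int numClasses) {
        double[] out = new double[numClasses];
        out[classIndex] = 1.0;
        return out;
    }

    // The winning hypothesis is simply the most activated output neuron
    public static int decode(double[] outputs) {
        int best = 0;
        for (int i = 1; i < outputs.length; i++) {
            if (outputs[i] > outputs[best]) best = i;
        }
        return best;
    }
}
```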

3) It pays to be a wild and crazy guy. With eager anticipation, I fired up my newly made artificial intelligence machine, expecting a Frankenstein equivalent of Einstein to machine-learn and do my tasks for me, and generally make me look brilliant.  I figured that I was on the verge of artificial genius to enhance my brain capacity, which I already figured to be roughly the size of a small planet. So when it came to setting weights and biases, I either went with the integer 1 or a random integer, and figured that the back propagation would clean up and get me to the appropriate figure.  Again, that would be the case if I had the patience to sit through a few million training epochs.  When my perceptrons had the intellectual ability of Popeye the Sailor Man, I was sorely tempted to give up, until I started doing crazy things with the initial weights. In one case, I got satisfaction by starting with a value of 10^-3.
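The lesson boils down to seeding weights with small random values rather than 1s or random integers. A sketch of what worked for me, give or take the scale:

```java
import java.util.Random;

public class WeightInit {

    // Small random weights in [-scale, scale], e.g. scale = 1e-3,
    // instead of 1s or large random integers
    public static double[] init(int n, double scale, long seed) {
        Random rnd = new Random(seed);
        double[] w = new double[n];
        for (int i = 0; i < n; i++) {
            w[i] = (rnd.nextDouble() * 2.0 - 1.0) * scale;
        }
        return w;
    }
}
```

Small initial weights keep the tanh units out of their flat saturated regions, where the derivative (and hence the weight correction) is nearly zero.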

4) It pays to initially log everything. I was getting fabulous results in early testing with minimal training. I figured that I was well on the way to creating my own silicon-based Watson that would reside on my laptop at my beck and call. However, as the number of training epochs climbed, the weight correction stalled and the logs reported that the local gradient was NaN, or not a number. Several WTF sessions resulted, and it took me through the travail of logging everything to discover that the phi equation, where I was supposed to calculate the local gradient, had a mistake. I was squaring Math.tan instead of Math.tanh. It was disappointing to learn that a programming error initially added a lot of accuracy and amazement to my artificial intelligence machine, but as it progressed, it got dumber and dumber. I suppose that it's a good model for the intellectual capacity of a human being's Life journey, but that wasn't what I was aiming for.
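A simple guard, sketched below, would have flagged the bad local gradients much earlier (this is an illustration, not my original code):

```java
public class GradientGuard {

    // Compute the local gradient, but log and zero it out if it goes NaN
    // or infinite, instead of silently poisoning the weights
    public static double checkedGradient(double error, double v) {
        double t = Math.tanh(v); // the original bug: Math.tan instead of Math.tanh
        double delta = error * (1.0 - t * t);
        if (Double.isNaN(delta) || Double.isInfinite(delta)) {
            System.err.println("Bad local gradient: error=" + error + " v=" + v);
            return 0.0; // skip this update
        }
        return delta;
    }
}
```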

5) One of the most difficult things using multi-layer perceptrons is framing the tool such that the functioning maps clearly to a hypothesis value with a minimal amount of jumping through loops and hoops and a pile of remedial programming. In other words, to make the thing do real life work (instead of constructing a piece of software that autonomously mimics an exclusive OR gate, as most tutorials in this field do) you have to design it such that real world inputs can be mapped into the morphology, topology and operating methodology of an artificial neural net, and the outputs can be mapped to significant hypothesis values.

In other words, a high school girl can code a functional multi-layer perceptron machine (and she has -- if you watch TED talks, she diagnoses cancer with it), however it takes a bit of real work to make it solve real life problems. But when you do, machine learning is one of the most sublime achievements of the human race. The machines achieve a level of logic that their carbon-based creators cannot.  And that is why Dr. Stephen Hawking says that Artificial Intelligence poses a threat to mankind.  I am not worried about the threat of artificial intelligence. I am more worried about the threat of a fanatic, with a bunch of explosives strapped to his chest. It is only logical, and it doesn't take very many training epochs to figure that one out.

Event Logs, Process Mining and Artificial Intelligence


In my course on process mining from the Eindhoven University of Technology in the Netherlands, a person on the course forum asked about where to get event logs for process mining.  This was the question posted:

Anyhow, as I was watching the lecture on Guidelines for Event Logging, I was struck by the question that usually occurs to me in such courses: But how to do it in practice?

I'm assuming that logging for the Internet of Things is part of the Things that Make Up that Internet. But otherwise? I absolutely abhor having to program, never struck me as that interesting. So how is it done in practice? Do you guys have preset functions/libraries? In case a human needs to log their behaviour, how do you ensure compliance - that they don't forget etc. etc.?

I'd love to hear more on that!

I took it upon myself to reply, and this is what I said:

I am a technical architect (and Chief Technology Officer !) for an eCommerce platform that deals with high-dollar value goods marketed in exclusive circles. We have had the benefit of creating the technology so we created event logs for everything.  Here is an example:

1) When you log in, we record the time, the username, the IP address of where the login came and whether the user was using a desk computer or a mobile platform.

2) When you check your messages on our system, they are marked as read with a timestamp. That creates another event log.

3) When you go to view offerings, what you look at is recorded, so we can gather data on what the user likes to buy.

4) If the user is a seller, we record what he uploads into a database table, and every entry has a timestamp column to detect when the data was added.
5) Each sale is recorded along with a timestamp.

6) Each log out is recorded, along with a timestamp.

All of this is very easy to do, because when we create a database to store data, we construct it such that each entry (called a row) has a column labeled timestamp, where the computer puts the NOW() date/time when the data is recorded.  This automagically creates the event logs for us, for all tasks, even tasks that are considered non-process related (which really add value to our processes).

When you are online, every time data is recorded, a timestamp goes along with it. We have event logs for everything.
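In code terms, the timestamp-on-everything habit is just attaching the current instant to every recorded action before it hits the database. A Java sketch (the names are hypothetical, not our platform's actual schema):

```java
import java.time.Instant;

public class EventLog {

    // One row per event; the timestamp field is filled automatically,
    // just like a NOW() default on a timestamp column
    public record Event(String user, String action, Instant at) {}

    public static Event log(String user, String action) {
        return new Event(user, action, Instant.now());
    }
}
```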

But you can also have paper-based event logs that can be transcribed.  For example, we looked at an auto repair shop and did some very rudimentary process mining when we were constructing a business app for the company. They took appointments and recorded the time of the call and the time the customer was going to bring the car into the shop. Then the service writer recorded the details on an invoice/service sheet when the customer arrived, and pushed the invoice into a time clock, stamping the arrival time. We then looked at the mechanic's time sheet to see what hours he billed for the job. Then we knew when the customer picked it up, because the payment invoice was timestamped (usually with the cash register receipt).  They had a complete event log on various bits of paper floating around the business, and once the business was computerized, they could determine the bottlenecks (which turned out to be waiting for ordered parts).  This was my first experience of event logs in a non-computerized fashion.  Since then, I timestamp every database table that I construct -- even the ones which store metadata and mined data, such as the standard deviation of an aggregate of characteristics of our top buyers and sellers.

There is now a burgeoning field of using non-sql graph databases (we like neo4j) which can map semantic and/or fuzzy relationships very easily, and this course has taught me to timestamp graphs and edges to monitor significant but transient process relationships in the business milieu.

Hope this helps,

Most people do not realize how many internet tracks they leave that are event logs when they do ordinary things online. The Germans have a word for this. It is not a nice word. It translates to "digital slime".

Event logs are ubiquitous, and it is my contention that all of these digital tracks and Big Data will lead to a plethora of training data for artificial intelligence. Artificial neural nets need concrete examples to learn from, iterating and re-iterating to "get smart." Process mining will be huge in that respect. Process mining is the first step for computers to learn human behavior. Neural net machines and multilayer perceptrons will pore over process maps gleaned from human behavior and learn to mimic and reproduce expert behavior in a far more repeatable fashion than humans can.

Elliott Wave Principle Re-visited With Computer Trading

I write software. I'm pretty good at it. My strength lies not in streamlined code, but in algorithms. Any code monkey can write code. Most coders today could NOT write an object sorter using recursion, and recursion is kiddie scripting compared to some of the functions that I code.

The real aim of the game is not to get paid for writing software, but to write software that makes money. A quant is exactly that:

quant

[kwahnt]
noun
Business Slang . an expert in quantitative analysis.


So, one of the ways to write software that makes money is to develop trading software for stocks, bonds, derivatives and Forex. Everyone has their own proprietary technical analysis trading software, but they all start with weighted moving averages and all sorts of statistical charting and apparent correlations that give signals on when to buy and sell.
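For reference, the linearly weighted flavor of moving average gives the most recent prices the largest weights. A sketch:

```java
public class Wma {

    // Linearly weighted moving average over the last `window` prices:
    // the oldest price gets weight 1, the newest gets weight `window`
    public static double wma(double[] prices, int window) {
        double num = 0.0, den = 0.0;
        int start = prices.length - window;
        for (int i = 0; i < window; i++) {
            int weight = i + 1;
            num += weight * prices[start + i];
            den += weight;
        }
        return num / den;
    }
}
```

A crossover of a short-window WMA over a long-window one is a typical (and typically crude) buy signal.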

The great granddaddy of them all is the Elliott Wave Principle. If you don't know what the Elliott Wave Principle is, you can read about it HERE.

A typical Elliott Wave pattern stock price looks like this:


Elliott developed his ideas some 60-odd years ago, and I idly wondered if Elliott Wave patterns were still valid in this day and age of computer trading. Would computer trades at split seconds skew an Elliott Wave pattern if and when they occur? (The reason why I say "if" is that determining the milestones of the Elliott Wave pattern is a very subjective thing. Many technical analysts try to debunk the principle, while its adherents swear by it.)

So the burning question is and was: Is there something to the Elliott Wave, and how has computerized trading changed the Elliott Wave, if at all?

To do that, I needed some data, and not just large time-domain general data. I wanted data points demarcated by seconds, not days or hours. After all, computers trade by the second. So I captured the real live second-by-second trading of Facebook on its opening IPO, where volume records were shattered but the price remained flat.

Here is a sample of that data:


To prevent subjective interpretation, I wrote a computer object -- a model of the wave that was magnitude agnostic (meaning that I was just searching for the pattern and didn't care about the price). One of the biggest problems with the Elliott Wave is interpretation and where does one begin to count for the wave pattern. I let the computer do that for me. If the signal (serial stock price changes) didn't fit the pattern, I advanced to the next data point, and tried again. I have to say that the results were pretty dismal.

Then it struck me -- I needed an "ish" engine on this. I have previously discussed "ish" on this blog. It is a form of fuzzy logic that can ignore the odd outlier whilst still identifying the pattern. I used the ish engine to categorize wildly divergent answer schemes of health surveys in Nigeria. Once I incorporated the ish engine into my model, I started to get many more hits where I did identify the Elliott Wave pattern.
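The gist of the ish engine, sketched very crudely: compare the signs of successive price moves against an expected wave pattern, and tolerate a bounded number of outliers instead of demanding an exact fit (the names here are hypothetical; my actual engine is more involved):

```java
public class IshMatcher {

    // Compare the signs of successive price moves against an expected
    // pattern (+1 up, -1 down), tolerating up to maxOutliers mismatches
    public static boolean matchesIsh(double[] prices, int[] expectedSigns,
                                     int maxOutliers) {
        if (prices.length < expectedSigns.length + 1) return false;
        int outliers = 0;
        for (int i = 0; i < expectedSigns.length; i++) {
            int sign = (int) Math.signum(prices[i + 1] - prices[i]);
            if (sign != expectedSigns[i]) {
                outliers++;
                if (outliers > maxOutliers) return false;
            }
        }
        return true;
    }
}
```

Setting maxOutliers to zero recovers the strict matcher that gave me such dismal results; raising it is what started producing hits.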

To answer the question of how computerized trading was affecting the principle, I had to collect models of the deviations from the Elliott Wave. The first thing that the ish engine picked up was that computerized trading injected many more outliers that were in fact intermediate steps in the pattern. From a macro perspective, the Elliott Wave still sort-of resembled the pattern, but on a micro level, the fractal pattern was different, and, as with fractals, this was carried over onto the larger pattern.

Here is a graphic illustration of the outliers where intermediate steps are introduced into the wave pattern:


Instead of going from 1 to 2, now there is a 1A step inserted into the pattern. This was when I tested for 1 deviation per step.

Then I allowed the computer to test for two deviations per step. Now one can see two outliers as the wave progresses from 2 to 2A to 2B to 3. This is so simple to do when you have a computer object that models the wave and allows for ish or deviation. One can run many many epochs (data sessions) over and over again and change the parameters each time.

If one thinks of the wave as a series of vectors, then one begins to see how a direction vector can be incorporated into the ish engine or fuzzy logic. Let's suppose that Taleb is right (and I am sure that he is) and there is a lot more randomness than one suspects. My posit was that computerized trading is responsible for generating the randomness.

When I altered the wave model to accommodate a deviation in the direction of the vector component in the wave, the computer came up with a model that was topless:



So, I now had models that the computer had saved. The next step was to assign Bayesian probabilities to each model. The first injection of Bayesian probability was for predictive effect: based on where I was at the moment, what magnitude and direction of price vector would happen next? Then I determined the probability of which overall model it would fit. From there, one can make larger price determinations. Incidentally, no-fit is also an outcome in this model, where there simply isn't a pattern.
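The model-scoring step is plain Bayes' rule: the posterior probability of each saved wave model (including the no-fit model) is proportional to its prior times the likelihood of the observed price moves under it. A sketch, with the arrays standing in for however the models are actually scored:

```java
public class ModelPosterior {

    // Bayes' rule over candidate wave models: posterior_i is proportional
    // to prior_i * likelihood_i, normalized so the posteriors sum to 1
    public static double[] posterior(double[] priors, double[] likelihoods) {
        double[] post = new double[priors.length];
        double z = 0.0;
        for (int i = 0; i < priors.length; i++) {
            post[i] = priors[i] * likelihoods[i];
            z += post[i];
        }
        for (int i = 0; i < post.length; i++) {
            post[i] /= z;
        }
        return post;
    }
}
```

As new price vectors arrive, the posteriors become the next round's priors, so the model ranking updates in real time.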

What's the next step? The next step is to introduce artificial intelligence multi-layer perceptrons as a fall-through model to analyze the price signal in real time. Then the perceptrons keep correcting themselves based on real time outcomes.

Can this updated algorithm score alpha and make money on stocks, futures, derivatives and Forex? I don't know yet, but I am too busy earning a living to take this to the next step. Are there any fund managers out there willing to fund a research project with the updated Elliott Wave coupled to fuzzy logic, artificial intelligence and Bayesian Inference?