Future Imperfect & Software Stream of Consciousness : Perils of Overtraining in AI Deep Learning

When we partnered with a local university department of Computer Science to create some Artificial Neural Networks (ANNs) for our platform, we gave them several years of data to play with. They massaged the input data, created an ANN machine and ran training epochs to kingdom come.

The trouble with ANNs is that you can over-train them. An over-trained network responds with high accuracy to the specific data set it was trained on, but it is not general enough to process new data accurately. To put it in general terms, its point of view is too narrow and encompasses only the data that it was trained on.

In the training process, I had intuitively guessed that accuracy would improve exponentially with each training epoch. I was wrong. Here is a graph showing that the rate of learning is linear rather than exponential over the training cycle.

So the moment the graph stops being linear is when you stop training. However, as our university friends found out, they had no way to regress the machine to exactly one training epoch back. They had kept no record of the weights, biases, adjusted weights, etc. from each epoch after the hours of back propagation, and as a result they had to re-run all of the training.

Me, I had a rather primitive way of saving the states of the neurons and layers. I mentioned it before. I wrote my machine in Java using object-oriented programming, and those objects can be serialized. In other words, objects in memory can be preserved in their current state, written to disk in binary form, and then resurrected in the last state that they were in. Kind of like freezing a body cryogenically, but having the ability to bring it back to life.
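Here is a minimal sketch of that freeze-and-thaw idea using standard Java serialization. The class TinyNet and its fields are hypothetical stand-ins, not the actual machine described in this post, and the sketch freezes to a byte array to stay self-contained; wrapping a FileOutputStream and FileInputStream instead would put the same bytes on disk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a trained network; the field names are assumptions.
class TinyNet implements Serializable {
    private static final long serialVersionUID = 1L;

    final double[][] weights; // weights[neuron][input]
    final double[] biases;    // one bias per neuron
    final int epoch;          // training epoch this state belongs to

    TinyNet(double[][] weights, double[] biases, int epoch) {
        this.weights = weights;
        this.biases = biases;
        this.epoch = epoch;
    }

    // Freeze: write the whole object graph into a byte array.
    static byte[] freeze(TinyNet net) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(net);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Thaw: rebuild an independent copy in the state it was frozen in.
    static TinyNet thaw(byte[] frozen) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(frozen))) {
            return (TinyNet) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("TinyNet class not on the classpath", e);
        }
    }
}
```

Calling TinyNet.thaw(TinyNet.freeze(net)) gives back a separate object with the same weights, biases and epoch, which is all that is needed to step back to an earlier state.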

So after every training epoch, I serialize the machine. If I over-train the neural nets, I can get a signal by examining and/or plotting the error rates, which are the inverse of the accuracy of the nets. In the above graph, once the function stops being linear, I know that I am approaching the over-training event horizon. Then I can regress to one of my saved serialized versions of the AI machine.
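The save-every-epoch-and-regress routine can be sketched roughly as follows. Everything here is an assumption made for illustration: the class EpochCheckpoints is hypothetical, each snapshot is a flat array of weights rather than a full serialized machine, and "stops being linear" is read crudely as "the per-epoch change in error drifts away from the change seen between the first two epochs by more than a chosen tolerance".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: keeps one snapshot and one error value per epoch.
class EpochCheckpoints {
    private final List<double[]> snapshots = new ArrayList<>();
    private final List<Double> errors = new ArrayList<>();

    // Call once at the end of every training epoch.
    void record(double[] weights, double error) {
        snapshots.add(weights.clone()); // copy, so later training cannot alter it
        errors.add(error);
    }

    // Index of the last epoch before the error curve leaves its initial
    // straight line. The tolerance is a fraction of the initial per-epoch step.
    int lastLinearEpoch(double tolerance) {
        if (errors.isEmpty()) {
            throw new IllegalStateException("no epochs recorded");
        }
        if (errors.size() < 3) {
            return errors.size() - 1; // too few points to judge linearity
        }
        double baseline = errors.get(1) - errors.get(0);
        for (int i = 2; i < errors.size(); i++) {
            double step = errors.get(i) - errors.get(i - 1);
            if (Math.abs(step - baseline) > tolerance * Math.abs(baseline)) {
                return i - 1;
            }
        }
        return errors.size() - 1;
    }

    // Regress to the snapshot saved at that epoch.
    double[] rollBack(double tolerance) {
        return snapshots.get(lastLinearEpoch(tolerance)).clone();
    }
}
```

With recorded errors of 0.50, 0.40, 0.30, 0.20, 0.19 and 0.25 and a tolerance of 0.5, the steady drop of 0.10 per epoch breaks at the fifth value, so the sketch rolls back to the fourth snapshot (index 3). A real run would more likely judge the error on data held out from training, which this post does not describe.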

Then the Eureka moment struck me! I had discovered a quick and easy cure for over-training.

In a previous blog article, a few down from here (or http://coderzen.blogspot.com/2015/01/brain-cells-for-sale-need-for.html ), I made the case for a standardized AI machine, where you could have a lightweight XML or JSON representation of the layers, inputs, number of neurons, outputs, and even hypothetical value mappings for the outputs. Then you wouldn't need to serialize the whole machine. At the end of every training epoch, you just output the recipe for the layers, weights, biases, etc., and you could revert to an earlier training incarnation by inputting a new XML file or a JSON object.
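To make the recipe idea concrete, here is a minimal sketch that writes one layer, and then a whole network, as JSON using only the Java standard library. The class LayerRecipe and the key names "epoch", "layers", "inputs", "neurons", "weights" and "biases" are invented for this example; they are not the standardized schema proposed above, which has yet to be drawn up. Reading the recipe back in would normally be done with a JSON library and is left out.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical per-layer recipe; the key names in the JSON are assumptions.
class LayerRecipe {
    final double[][] weights; // weights[neuron][input]
    final double[] biases;    // one bias per neuron

    LayerRecipe(double[][] weights, double[] biases) {
        if (weights.length == 0 || weights.length != biases.length) {
            throw new IllegalArgumentException("need one bias per neuron");
        }
        this.weights = weights;
        this.biases = biases;
    }

    // One layer as a compact JSON object.
    String toJson() {
        StringJoiner rows = new StringJoiner(",", "[", "]");
        for (double[] row : weights) {
            rows.add(array(row));
        }
        return "{\"inputs\":" + weights[0].length
            + ",\"neurons\":" + biases.length
            + ",\"weights\":" + rows
            + ",\"biases\":" + array(biases) + "}";
    }

    // The whole network at a given epoch: an epoch number plus its layers.
    static String networkToJson(int epoch, List<LayerRecipe> layers) {
        StringJoiner all = new StringJoiner(",", "[", "]");
        for (LayerRecipe layer : layers) {
            all.add(layer.toJson());
        }
        return "{\"epoch\":" + epoch + ",\"layers\":" + all + "}";
    }

    // Finite doubles only: NaN and Infinity are not valid JSON numbers.
    private static String array(double[] values) {
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        for (double v : values) {
            joiner.add(Double.toString(v));
        }
        return joiner.toString();
    }
}
```

Writing one such string per epoch gives a small text file for each training incarnation, which is far lighter than a serialized object graph and can be read by tools that are not written in Java.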

It's really time to draw up the .XSD schema for the standardized neuron. I want it to be open source. It would be horrible to be famous for thinking of having a standardized neural net. Besides, being famous is just a job.
