Cognizer-Understanding AI 2.0

Introducing AI 3.0 / Natural Language Intelligence / The Corporate Brain

By Jack Porter

July 29, 2019

Where AI 1.0 relied on brittle, engineered procedures, AI 2.0 saw data scientists turn to advanced mathematics. This started with basic statistics and grew into hundreds of algorithms that could predict some form of trend or classification.

Used independently, these algorithms are rarely able to achieve accuracies much above 50%. This is primarily because most data science problems are non-linear; that is, the data space is not consistent. If the problem you are trying to solve is churn at a bank, some customers could be leaving because they were turned down for a car loan, while others leave because of banking fees. Still others could be leaving because they moved, divorced or even died. Each of these behaviors has its own pattern and therefore needs its own model for prediction.

Data scientists tried to solve this by creating “ensembles” of models, each predicting one form of behavior. This helped a lot, but it required them to understand the underlying parameters of each behavior pattern. Often these patterns were highly sophisticated, involving hundreds or thousands of features over time.
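To make the idea concrete, here is a minimal sketch of an ensemble for the bank-churn example above. The per-segment models, thresholds, and customer fields (`fees`, `loan_declined`, `moved`) are all hypothetical illustrations, not anything from a real banking system; each simple model captures one churn pattern, and a soft vote combines them.

```python
# Hypothetical per-behavior models for the churn example; each one
# captures a single pattern that a customer might follow.
def fee_model(customer):
    # Churns when monthly fees are high (threshold is made up)
    return 1.0 if customer["fees"] > 30 else 0.0

def loan_model(customer):
    # Churns after being turned down for a car loan
    return 1.0 if customer["loan_declined"] else 0.0

def life_event_model(customer):
    # Churns after a life event such as moving
    return 1.0 if customer["moved"] else 0.0

def ensemble_predict(customer, threshold=0.3):
    # "Soft vote": average the individual predictions, then threshold.
    votes = [fee_model(customer), loan_model(customer), life_event_model(customer)]
    return sum(votes) / len(votes) >= threshold

alice = {"fees": 45, "loan_declined": False, "moved": False}
print(ensemble_predict(alice))  # the fee model alone flags her
```

In practice each sub-model would itself be a trained classifier rather than a hand-written rule, but the combining step (vote or average) works the same way.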

In 2006, Geoffrey Hinton introduced his seminal paper on deep belief networks that became the inflection point for Deep Learning. This was not the first research in this area. In fact, Hinton had co-authored the landmark paper on backpropagation two decades earlier, in 1986. But the 2006 paper was timed well, and the concept of Deep Learning was born. Then, in 2012, Hinton, Ilya Sutskever and Alex Krizhevsky used Deep Learning in the acclaimed ImageNet competition and blew the record away. Their Deep Learning model improved the error rate of image recognition by a whopping 10.8 percentage points, a 41% advantage over the next competitor. With that, Deep Learning was off to the races.

Since that time, Deep Learning has gone from Hinton’s team getting a 23% error rate to 29 out of 38 teams in the ImageNet competition getting less than a 5% error rate by 2017. All used Deep Learning. In fact, in 2019, ImageNet researchers consistently get error rates below 2%.

But Deep Learning’s amazing performance is not restricted to image recognition. It is good at essentially any classification problem where there is plenty of labeled data. This includes voice recognition, cancer screening, autonomous cars and robotics. It can be applied to business data for customer engagement, fraud, anti-money laundering and retention. It is used in industries as diverse as banking, pharmaceuticals, chemicals, oil and gas, and agriculture. In each of these situations, good data scientists with lots of data can push classification accuracy above 90%. Again, this is a game changer.

The concept of Deep Learning is that the data scientist creates a stacked neural network and “feeds forward” labeled data. If the model’s output does not match the label, the error is backpropagated through the network, adjusting the weights of each neuron as it goes. This process is iterated using gradient descent until the error collapses and the outcome converges.
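The loop described above can be sketched in a few lines of NumPy. This is a toy illustration, not a production model: the network shape (2 inputs, 4 hidden neurons, 1 output), the XOR labels, and the learning rate are all arbitrary choices made for the example.

```python
import numpy as np

# Minimal feed-forward + backpropagation + gradient descent on a tiny
# "stacked" network (2 inputs -> 4 hidden -> 1 output), trained on XOR.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # the labels

W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr, losses = 1.0, []
for step in range(5000):
    # Feed the labeled data forward through the stack
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    err = out - y                      # prediction vs. label
    losses.append(float((err ** 2).mean()))

    # Backpropagate the error, layer by layer
    d_out = err * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Gradient-descent step: adjust each layer's weights
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Watching the printed loss fall is the “error collapsing” the text describes; the hidden-layer weights that emerge are the automatically learned features discussed below.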

As Deep Learning took hold, it began to diversify into several architectures: Convolutional Neural Networks for spatial problems such as image recognition, Recurrent Neural Networks for longitudinal (sequence) analysis, and Self-Organizing Maps for dimensionality reduction. Today, there are more than 25 distinct architectures and many more variations. Deep Learning’s advantage is that “feature detection” is done automatically. Data scientists do not have to guess what is causing the predictive behavior; the network picks this up by itself as the weights of the neurons converge.

The problem with Deep Learning is that it requires a lot of data, and the data must be labeled. There are many research projects trying to reduce this requirement, but it remains a significant limitation. In addition, Deep Learning is really only focused on classification, either spatial or temporal, but not both at the same time. This means it is great at classifying images, but not at predicting sequences of data.

This is where our brain is formidable. Unlike AI based on engineered procedures or mathematically calculated classifications, our brain is a “Prediction Engine.” It excels at constructing a model of the world, then predicting future outcomes and identifying anomalies based on those predictions. It can do this with very little data, and it uses transfer learning to generalize to similar situations.

This is what scientists consider intelligence, and this will be the basis of the next generation of AI, which will work much more like our human brains.