In this quest for making machines think, the neurophysiologist Warren McCulloch and the mathematician Walter Pitts worked together in 1943 to understand how neurons work. They modeled a simple neural network using electrical circuits, the first step toward an artificial neuron. The model's simplicity was a major limitation: it accepted only binary inputs, used a threshold step activation function and did not weight its inputs. In 1949, Donald Hebb proposed that when two neurons fire together, the connection between them is strengthened; this is now regarded as one of the fundamental operations underlying learning and memory. In the 1950s, Nathaniel Rochester of the IBM Research Laboratories tried to simulate a neural network on the IBM 704. He was the chief architect of the IBM 701, and his group worked on pattern recognition, information theory and related problems. In 1956, Nathaniel Rochester (IBM Laboratories), together with John McCarthy (Dartmouth College), Claude E. Shannon (Bell Telephone Laboratories) and Marvin L. Minsky (Harvard University), submitted a proposal for a summer workshop to the Rockefeller Foundation.
This workshop took place in July and August 1956 and is recognized as the official birthplace of Artificial Intelligence. Attendees came from diverse backgrounds: electrical engineering, psychology, mathematics and more. The proposal for the workshop, quoted below, gives a broad definition of Artificial Intelligence:
“An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. … For the present purpose the artificial intelligence problem is taken to be that of making a machine behave in ways that would be called intelligent if a human were so behaving.” [9]
“The attendees at the 1956 Dartmouth conference shared a common defining belief, namely that the act of thinking is not something unique either to humans or indeed even biological beings. Rather, they believed that computation is a formally deducible phenomenon which can be understood in a scientific way and that the best nonhuman instrument for doing so is the digital computer (McCormack, 1979).” [9]
Marvin Minsky, Claude Shannon, Ray Solomonoff and other scientists at the Dartmouth Summer Research Project on Artificial Intelligence (Photo: Margaret Minsky) [9]
An image of the perceptron from Rosenblatt's “The Design of an Intelligent Automaton,” Summer 1958 [8]
This was a period when everyone was trying to apply neural networks to real-time applications. Although the traditional von Neumann architecture dominated computing, John von Neumann himself attempted to imitate neural functions with telegraph relays and vacuum tubes in 1957, without much success. Frank Rosenblatt developed the first perceptron by modifying the McCulloch-Pitts neuron. The perceptron follows Hebb's rule in that it weights its inputs, and it became the building block of neural networks. In July 1958, an IBM 704, a 5-ton computer the size of a room, was fed a series of punch cards, and in 50 trials it taught itself to distinguish cards marked on the left from cards marked on the right. This was the first demonstration of the perceptron, a machine that learns from examples, an idea that is still in use. Rosenblatt discussed the perceptron in detail in his 1962 book, Principles of Neurodynamics.
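As an illustration of the idea (not the original Mark I hardware), the sketch below implements a Rosenblatt-style perceptron update: a weighted sum, a step threshold and an error-driven weight adjustment. The one-feature "mark position" data is a hypothetical stand-in for the left-versus-right card demo.

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=1.0):
    """Rosenblatt-style perceptron: weighted sum + step threshold.
    X: (n_samples, n_features), y: labels in {0, 1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1 if np.dot(w, xi) + b > 0 else 0   # step activation
            error = target - pred
            w += lr * error * xi                        # nudge weights toward the target
            b += lr * error
    return w, b

# Toy stand-in for the "mark on the left vs. right" demo: the feature is the mark position.
X = np.array([[0.1], [0.2], [0.8], [0.9]])
y = np.array([0, 0, 1, 1])   # 0 = left, 1 = right
w, b = train_perceptron(X, y)
print([1 if np.dot(w, x) + b > 0 else 0 for x in X])   # expect [0, 0, 1, 1]
```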
Around the same time, in 1959, Bernard Widrow and Marcian Hoff of Stanford developed models called ADALINE and MADALINE (in Stanford's love of acronyms, ADAptive LINear Element and Multiple ADAptive LINear Elements). ADALINE could recognize binary patterns: reading streaming bits from a phone line, it could predict the next bit. MADALINE was the first neural network applied to a real-world problem, eliminating echoes on phone lines with an adaptive filter. Though the system is as old as air traffic control systems, it is still in use.
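As a rough illustration of the ADALINE idea, an adaptive linear element trained with a least-mean-squares style update, the sketch below predicts the next sample of a synthetic signal from its recent history. The signal and step size are illustrative; the actual phone-line echo canceller is not reproduced here.

```python
import numpy as np

def lms_predict(signal, taps=4, mu=0.05):
    """ADALINE-style adaptive filter: predict the next sample from the previous `taps` samples."""
    w = np.zeros(taps)
    preds = []
    for t in range(taps, len(signal)):
        x = signal[t - taps:t]          # recent history
        y_hat = np.dot(w, x)            # linear prediction
        e = signal[t] - y_hat           # prediction error
        w += mu * e * x                 # least-mean-squares weight update
        preds.append(y_hat)
    return np.array(preds), w

# Synthetic periodic signal standing in for a streaming phone-line trace.
t = np.arange(400)
signal = np.sin(0.2 * t)
preds, w = lms_predict(signal)
print(np.mean((signal[4:] - preds) ** 2))   # average squared prediction error over the run
```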
In 1969, Marvin Minsky and Seymour Papert proved that the perceptron was limited in their book Perceptrons. At conferences, Minsky and Rosenblatt publicly debated the viability of the perceptron.
“Rosenblatt had a vision that he could make computers see and understand language. And Marvin Minsky pointed out that that’s not going to happen, because the functions are just too simple.” [8]
The problem with Rosenblatt's perceptron was that it had only one layer, whereas modern networks have many. The concrete limitation was that a single-layer perceptron can realize the AND, OR, NAND and NOR gates but not XOR, because XOR is not linearly separable. This led to a winter period in which neural network research largely halted for several years.
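To make the XOR limitation concrete, the sketch below trains a tiny two-layer network on the XOR truth table with plain gradient descent; a single-layer perceptron cannot fit this table because the classes are not linearly separable. The layer size and learning rate are illustrative choices, not taken from any of the cited sources.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR truth table

# One hidden layer is enough; a single-layer perceptron cannot separate XOR.
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(5000):
    h = sigmoid(X @ W1 + b1)            # hidden layer
    out = sigmoid(h @ W2 + b2)          # output layer
    # Backpropagate the squared error.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(0)
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(0)

print(np.round(out.ravel()))   # approaches [0, 1, 1, 0]
```

The hidden layer is what supplies the extra decision boundary that the single-layer perceptron lacks.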
The thawing of this frosty AI winter began in 1982, when John Hopfield presented what is known today as the Hopfield network to the National Academy of Sciences. Around the same time, the field got a much-needed competitive boost when Japan announced its Fifth Generation project, which included research on neural networks. This helped US research institutes argue for larger amounts of funding, and in 1985 the American Institute of Physics established the annual Neural Networks for Computing meeting, followed by the Institute of Electrical and Electronics Engineers (IEEE) with its first International Conference on Neural Networks in 1987.
The year 1997 was another milestone. The LSTM, a recurrent neural network architecture, was proposed by Hochreiter and Schmidhuber. They introduced constant error carousel units to deal with the vanishing gradient problem.
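As a rough illustration (not the original 1997 formulation, which for instance had no forget gate), the sketch below shows a single LSTM step in NumPy; the mostly additive, gated update of the cell state is what the constant error carousel refers to. All sizes and weights here are illustrative.

```python
import numpy as np

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. The cell state c is updated additively (gated),
    which keeps the error signal from vanishing over long time spans."""
    z = W @ np.concatenate([x, h_prev]) + b
    i, f, o, g = np.split(z, 4)                      # input, forget, output gates; candidate
    i, f, o = 1/(1+np.exp(-i)), 1/(1+np.exp(-f)), 1/(1+np.exp(-o))
    g = np.tanh(g)
    c = f * c_prev + i * g                           # "carousel": mostly additive update
    h = o * np.tanh(c)
    return h, c

# Illustrative sizes: 3-dim input, 5-dim hidden state.
rng = np.random.default_rng(1)
n_in, n_h = 3, 5
W = rng.normal(scale=0.1, size=(4 * n_h, n_in + n_h))
b = np.zeros(4 * n_h)
h, c = np.zeros(n_h), np.zeros(n_h)
for x in rng.normal(size=(10, n_in)):                # run over a short sequence
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)   # (5,) (5,)
```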
In 1998, Yann LeCun and colleagues published a seminal paper, Gradient-Based Learning Applied to Document Recognition. Gradient-based learning provided a framework for building learning architectures that can handle high-dimensional inputs, a high degree of variability and complex nonlinear relationships between inputs and outputs.
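The best-known system built in that framework is the convolutional network LeNet-5. As an illustration (not the paper's implementation), the sketch below shows the plain 2D convolution that such document-recognition networks stack, applied to a toy edge-detection example.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (cross-correlation), the core operation of
    convolutional networks such as LeNet-5."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kH, j:j+kW] * kernel)
    return out

# A vertical-edge detector applied to a toy 6x6 "document" patch.
image = np.zeros((6, 6)); image[:, 3:] = 1.0
kernel = np.array([[1., -1.], [1., -1.]])
print(conv2d(image, kernel))   # nonzero responses only at the edge column
```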
Some of the important academic papers are summarized below:
1. Attention Is All You Need
The authors propose a simple network architecture called the Transformer, based entirely on attention mechanisms, dispensing with the recurrent and convolutional layers used in earlier sequence models. The model has an encoder-decoder structure: the encoder is a stack of 6 identical layers, each with two sublayers, a multi-head self-attention mechanism and a position-wise fully connected feed-forward network; the decoder is also a stack of 6 identical layers, with an additional sublayer that performs multi-head attention over the output of the encoder stack. Attention is applied in three different ways (a minimal sketch of the underlying scaled dot-product attention follows the list):
- The “encoder-decoder” attention layers mimic the typical encoder-decoder attention mechanism in sequence-to-sequence models.
- The encoder contains self-attention layers in which the keys, values and queries all come from the output of the previous encoder layer, so each position in the encoder can attend to all positions in the previous layer.
- The self-attention layers in the decoder allow each position in the decoder to attend to all positions in the decoder up to and including that position (enforced with masking).
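The sketch below implements the paper's scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, in NumPy. The shapes and random inputs are purely illustrative, and the multi-head projections are omitted.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, as defined in the paper."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # similarity of each query to each key
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # e.g. the causal mask in the decoder
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)  # softmax over the keys
    return weights @ V

# Illustrative shapes: 4 query positions, 6 key/value positions, d_k = d_v = 8.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=(6, 8))
print(scaled_dot_product_attention(Q, K, V).shape)   # (4, 8)
```

In the full model this operation is run several times in parallel with learned projections and the results concatenated, which is what the paper calls multi-head attention.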
The paper compares self-attention to recurrent and convolutional layers and concludes that self-attention layers are faster than recurrent layers in per-layer computational complexity when the sequence length is smaller than the representation dimensionality. The complexity of a separable convolution is equal to that of the combination of a self-attention layer and a point-wise feed-forward layer, which is exactly the approach taken in the paper.
The paper achieves a new state of the art on machine translation tasks.
The code is available at https://github.com/tensorflow/tensor2tensor.
2. Bandwidth Prediction and Congestion Control for ABR Traffic Based on Neural Networks
The paper uses backpropagation neural networks for congestion control in ATM networks, predicting the bursty bandwidth available for ABR traffic and forcing the queue level in the buffer into a desired region. Fairness among connections is achieved through a fairness algorithm.
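The paper's actual inputs and network topology are not reproduced here; as a hedged illustration of the general idea, the sketch below trains a small backpropagation network (scikit-learn's MLPRegressor) to predict the next bandwidth sample from a short window of previous samples, using a synthetic trace.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor  # backpropagation-trained MLP

# Hypothetical bandwidth trace (Mbps); the paper's actual ABR traffic data is not reproduced here.
rng = np.random.default_rng(0)
t = np.arange(500)
bw = 10 + 3 * np.sin(0.1 * t) + rng.normal(scale=0.5, size=t.size)

# Predict the next sample from the previous 4 samples.
window = 4
X = np.array([bw[i:i + window] for i in range(len(bw) - window)])
y = bw[window:]

model = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
model.fit(X[:400], y[:400])
print(model.score(X[400:], y[400:]))   # R^2 on held-out samples
```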
3. Using Artificial Neural Network Modeling in Forecasting Revenue: Case Study in National Insurance Company/Iraq
The paper forecasts the insurance premium revenue of the National Insurance Company for the years 2012 to 2053 using an artificial neural network trained on annual premium-revenue data from 1970 to 2011. The authors used a neural network fitting tool to select the data, create and train a network and evaluate its performance, taking the annual investment income of the National Insurance Company as the independent (input) variable. The experiments show that the best fitting architecture has one input, a hidden layer of five neurons and one output (1-5-1), and that premium revenue increases by approximately 120% over the period from 2012 to 2053.
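As an illustration of the reported 1-5-1 shape (one input, five hidden neurons, one output), the sketch below fits such a network to synthetic data; the company's actual revenue series is not reproduced here, and scikit-learn is only a stand-in for whatever fitting tool the authors used.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# 1-5-1 architecture: one input (annual investment income), five hidden neurons, one output
# (premium revenue). The series below is synthetic, not the paper's data.
rng = np.random.default_rng(0)
income = np.linspace(1.0, 10.0, 42).reshape(-1, 1)            # stand-in for the 1970-2011 inputs
revenue = 2.0 * income.ravel() + rng.normal(scale=0.3, size=42)

net = MLPRegressor(hidden_layer_sizes=(5,), max_iter=5000, random_state=0)
net.fit(income, revenue)
print(net.predict([[12.0]]))   # extrapolated forecast for a future income level
```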
4. Neural Network Approach to Forecast the State of the Internet of Things Elements
The aim of this paper is to use neural networks to predict the states of the elements in an IoT architecture. The proposed model is a combination of a multilayer perceptron and probabilistic neural networks. The authors analyze the model's accuracy and efficiency. The combined ANN realizes a forecasting and monitoring model for the Internet of Things, which reduces IoT administration costs and speeds up emergency resolution.
5. AnyNets: Adaptive Deep Neural Networks for medical data with missing values
The paper introduces a novel class of adaptive deep neural networks called AnyNets, designed to remove the need for imputing missing values in medical patient records. A large number of patient records contain incomplete information and measurements, which are typically filled in with default values; this introduces bias and limits generalization. The paper processes various kinds of inputs from medical datasets under both supervised and unsupervised learning and achieves better results on both electronic medical record and registry data.
References:
- https://home.csulb.edu/~cwallis/artificialn/History.htm#:~:text=One%20of%20the%20difficulties%20with,incorporate%20weighting%20the%20different%20inputs.
- https://cs.stanford.edu/people/eroberts/courses/soco/projects/neural-networks/History/history1.html
- https://medium.com/analytics-vidhya/brief-history-of-neural-networks-44c2bf72eec
- https://towardsdatascience.com/a-concise-history-of-neural-networks-2070655d3fec
- https://blogs.umass.edu/comphon/2017/06/15/did-frank-rosenblatt-invent-deep-learning-in-1962/
- https://news.cornell.edu/stories/2019/09/professors-perceptron-paved-way-ai-60-years-too-soon
- https://www.cantorsparadise.com/the-birthplace-of-ai-9ab7d4e5fb00
- https://medium.com/analytics-vidhya/understanding-basics-of-deep-learning-by-solving-xor-problem-cb3ff6a18a06
- https://www.forbes.com/sites/gilpress/2017/08/27/artificial-intelligence-ai-defined/?sh=55526a987661
- https://developer.ibm.com/articles/cc-cognitive-neural-networks-deep-dive/