A Quick History of AI, ML, and DL

Have you ever wondered about the history of AI, ML, and DL? These terms come up constantly today, and if you look closely at the technical products around you, you will find that many of them depend heavily on concepts from AI, ML, and DL. Advances in algorithms and technology have created huge industrial demand for these domains, so many people are trying to catch the trend. You might assume that these are brand-new technologies that exploded onto the scene.

But you may be surprised to learn that their roots go back to the early 1940s. It would be somewhat inappropriate to ask who invented these domains: they are the combined work of many individuals, each of whom contributed distinct algorithms, methods, frameworks, and so on. The history of AI and ML is quite an interesting topic.

So, let us begin our journey with a quick trip through time to see how it all originated and how it became a daily necessity.

1943 – The first mathematical representation of a neural network

Warren Sturgis McCulloch, a neurophysiologist, and Walter Harry Pitts, a logician, proposed the first mathematical model of a neural network. Their work aimed to mimic human thought processes, and the units of the proposed model became known as McCulloch-Pitts neurons. You can read about their work in “McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4), 115-133.”
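To make the idea concrete, here is a minimal Python sketch of a McCulloch-Pitts style threshold unit. It is a common textbook simplification, not the notation of the 1943 paper: the function name `mcculloch_pitts`, the split into excitatory and inhibitory inputs, and the gate helpers are all assumptions made for illustration.

```python
def mcculloch_pitts(excitatory, inhibitory, threshold):
    """Return 1 if the unit fires, else 0. Every input is 0 or 1."""
    # Assumption: any active inhibitory input blocks firing outright.
    if any(inhibitory):
        return 0
    # Otherwise the unit fires when enough excitatory inputs are active.
    return 1 if sum(excitatory) >= threshold else 0


def and_gate(a, b):
    # Both inputs must be active to reach the threshold of 2.
    return mcculloch_pitts([a, b], [], threshold=2)


def or_gate(a, b):
    # A single active input is enough to reach the threshold of 1.
    return mcculloch_pitts([a, b], [], threshold=1)


def not_gate(a):
    # Threshold 0 means the unit fires unless the inhibitory input is active.
    return mcculloch_pitts([], [a], threshold=0)
```

With fixed thresholds and no learning, such units can already express basic logic gates, which is why the model is often described as a logical calculus of neural activity.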

1949 – The organization of behavior: A neuropsychological theory

The book “The Organization of Behavior” by Donald Olding Hebb, a psychologist, was published in 1949. It discussed how behavior relates to neural networks and brain activity, and it later became one of the foundation stones of machine learning. You can read the book in its reprint edition: “Hebb, D. O. (2005). The organization of behavior: A neuropsychological theory. Psychology Press.”


1951 – First Artificial Neural Network

Marvin Minsky, a computer scientist, and Dean Edmonds built the first artificial neural network machine. It consisted of 40 interconnected neurons with short- and long-term memory. For more insight into their work, see https://cyberneticzoo.com/mazesolvers/1951-maze-solver-minsky-edmonds-american/.

1950 – The foresight of machine learning

Alan Turing was a mathematician famous for breaking the encryption of the German Enigma machines during the Second World War and for describing the Turing Test, which helped form the basis of artificial intelligence. In his 1950 paper “Computing Machinery and Intelligence”, he introduced the test, which aims to determine whether a machine can think, and he anticipated the development of machine learning. You can read his work in the reprint “Turing, A. M. (2009). Computing machinery and intelligence. Springer, Dordrecht.”

1952 – Arthur Samuel coins the term Machine Learning

Arthur Lee Samuel, a computer scientist, was a pioneer of artificial intelligence who managed to make computers learn from experience. While at IBM, he formulated some of the first machine learning algorithms, which were designed to play checkers. The program was special in that it got better and better as it played, correcting its errors and finding more reliable ways to win from the data it gathered. It was one of the first examples of learning by machines. You can have a look at “Samuel, A. L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3(3), 210-229.”


1956 – The Dartmouth Workshop

The Dartmouth Summer Research Project on Artificial Intelligence was a 1956 summer workshop widely considered to be the founding event of artificial intelligence as a field. The project lasted approximately six to eight weeks. During the workshop, prominent scientists from mathematics, engineering, computer science, and the cognitive sciences brainstormed about AI and ML research. You can have a look at the proposal in “McCarthy, J., Minsky, M. L., Rochester, N., & Shannon, C. E. (2006). A proposal for the Dartmouth summer research project on artificial intelligence, August 31, 1955. AI Magazine, 27(4), 12.”


1957 – Planting the seeds for deep neural networks


Frank Rosenblatt was a psychologist notable in the field of artificial intelligence. In 1957 he authored the report “The Perceptron: A Perceiving and Recognizing Automaton”, in which he discussed the construction of an electronic or electromechanical system. The system aimed to learn to recognize similarities or correspondences between patterns of optical, electrical, or tonal data in a way that is roughly comparable to the processing of a biological brain. You can have a look at “Rosenblatt, F. (1957). The perceptron, a perceiving and recognizing automaton Project Para. Cornell Aeronautical Laboratory.”
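In software, the perceptron idea is usually presented as a simple error-correcting learning rule. The sketch below is that textbook version, not Rosenblatt's hardware design: the function names, the learning rate, the number of epochs, and the tiny OR dataset are all assumptions chosen for illustration.

```python
def predict(weights, bias, x):
    """Fire (1) when the weighted sum of the inputs is positive, else 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, epochs=10, lr=1.0):
    """Textbook perceptron rule: nudge the weights only after a mistake."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)  # -1, 0, or +1
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Hypothetical toy data: the logical OR function, which is linearly separable.
OR_INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
OR_LABELS = [0, 1, 1, 1]

weights, bias = train_perceptron(OR_INPUTS, OR_LABELS)
predictions = [predict(weights, bias, x) for x in OR_INPUTS]
```

Because OR is linearly separable, the rule settles on weights that classify all four inputs correctly after a few passes over the data.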

1959 – Discovery of cells in visual cortex


Two neurophysiologists, David Hunter Hubel and Torsten Nils Wiesel, worked together to discover two types of cells in the primary visual cortex: simple cells and complex cells. Their research inspired the formulation of various forms of artificial neural networks. You can have a look at “Hubel, D. H., & Wiesel, T. N. (1959). Receptive fields of single neurons in the cat’s striate cortex. The Journal of Physiology, 148(3), 574-591.”

1960 – Basics of a continuous backpropagation

Henry J. Kelley was a professor of aerospace and ocean engineering. In his paper “Gradient Theory of Optimal Flight Paths”, he discussed how the behavior of a system with inputs can be adjusted based on feedback. This idea laid the foundation for continuous backpropagation, which over the years became an essential technique for reducing the loss in neural networks. You can read “Kelley, H. J. (1960). Gradient theory of optimal flight paths. ARS Journal, 30(10), 947-954.”


1965 – Multi-Layer Perceptron


Alexey Ivakhnenko, a mathematician, and Valentin Lapa built a working deep learning network, which is sometimes called the first-ever multilayer perceptron. The network uses a polynomial activation function and is trained with the Group Method of Data Handling (GMDH).

GMDH refers to a family of inductive algorithms for computer-based mathematical modeling of multi-parameter data in which the structure of the model is determined fully automatically. The approach builds a deep feed-forward multilayer network and uses statistical methods at every layer to select the best features and forward them through the network. You can read “Ivakhnenko, A. G., & Ivakhnenko, G. A. (1995). The review of problems solvable by algorithms of the group method of data handling (GMDH). Pattern Recognition and Image Analysis c/c of Raspoznavaniye Obrazov i Analiz Izobrazhenii, 5, 527-535.”


1967 – Nearest Neighbor Algorithm

Thomas M. Cover, an information theorist, and Peter E. Hart, a computer scientist, published a paper on the nearest neighbor algorithm in IEEE Transactions on Information Theory in 1967. The Nearest Neighbor (NN) rule is one of the fundamental decision procedures used for classification: it assigns a sample to the category of its nearest neighbor. You can read “Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1), 21-27.”
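The nearest neighbor rule is short enough to sketch in a few lines of Python. This is a minimal illustration that assumes Euclidean distance and a tiny made-up dataset; the function name `nearest_neighbor`, the points, and the labels are hypothetical and do not come from the paper.

```python
import math


def nearest_neighbor(train_points, train_labels, query):
    """Return the label of the training point closest to the query."""
    # Assumption: closeness is measured with Euclidean distance.
    closest = min(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    return train_labels[closest]


# Hypothetical toy data: two points near the origin, one far away.
points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
labels = ["a", "a", "b"]

label_far = nearest_neighbor(points, labels, (4.0, 4.5))   # closest to (5, 5)
label_near = nearest_neighbor(points, labels, (0.2, 0.1))  # closest to (0, 0)
```

There is no training step at all: the rule simply stores the labeled samples and compares each new sample against them.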

1979 – Neural Network learns to recognize images

Kunihiko Fukushima, a computer scientist, is widely recognized for his work on the Neocognitron, a multilayered neural network used to identify patterns in images. The framework has been used for recognizing handwritten patterns, recommendation systems, and natural language processing. In the following years, his work helped pave the way for the first Convolutional Neural Networks (CNNs). You can read “Fukushima, K. (1979). A neural network model for a mechanism of pattern recognition unaffected by shift in position-Neocognitron. IEICE Technical Report, A62(10), 658-665.” and “Fukushima, K., & Miyake, S. (1982). Neocognitron: A self-organizing neural network model for a mechanism of visual pattern recognition. In Competition and cooperation in neural nets (pp. 267-285). Springer, Berlin, Heidelberg.”


1982 – Associative Neural Networks

John Joseph Hopfield, a scientist, is widely recognized for his work on associative neural networks. His model, now known as the Hopfield network, is a recurrent network inspired by the function and structure of neural connections in the brain. It works as an associative memory: it stores patterns in its connection weights and recalls a stored pattern from a partial or noisy version of it. You can go through “Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79(8), 2554-2558.”
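The network described in Hopfield's paper can act as a memory that recovers a stored pattern from a corrupted copy. The following is a minimal sketch of that behavior, assuming the common textbook formulation with states of +1 and -1 and Hebbian outer-product weights. The function names, the number of update sweeps, and the six-unit example pattern are assumptions for illustration only.

```python
def train_hopfield(patterns):
    """Build a symmetric weight matrix from +1/-1 patterns (Hebbian rule)."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for pattern in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += pattern[i] * pattern[j]
    return weights


def recall(weights, state, sweeps=5):
    """Update the units one at a time, letting the state settle."""
    state = list(state)
    for _ in range(sweeps):
        for i in range(len(state)):
            total = sum(weights[i][j] * state[j] for j in range(len(state)))
            state[i] = 1 if total >= 0 else -1
    return state


# Hypothetical example: store one pattern, then flip its first unit.
stored = [1, -1, 1, -1, 1, -1]
noisy = [-1, -1, 1, -1, 1, -1]

hopfield_weights = train_hopfield([stored])
recovered = recall(hopfield_weights, noisy)
```

Starting from the noisy input, the updates pull the state back to the stored pattern, which is the associative recall that made the model influential.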


1985 – A program learns to pronounce English words the way a baby does

Terry Sejnowski, a computer researcher, is famous for combining his expertise in biology and neural networks. In 1985 he built NETtalk, a program that learned to pronounce English words much as a baby does, improving over time. You can go through “Sejnowski, T. J., & Rosenberg, C. R. (1987). Parallel networks that learn to pronounce English text. Complex Systems, 1(1), 145-168.”


1986 – Learning Representations by Back-propagating Errors

David Rumelhart, a psychologist, Geoffrey Hinton, a computer scientist, and Ronald J. Williams, a professor, together authored the paper “Learning Representations by Back-propagating Errors” in 1986. The paper discusses backpropagation in much greater detail and explains how it can improve neural networks for many tasks. You can read “Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533-536.”


1986 – Restricted Boltzmann machine (RBM)

Paul Smolensky, a cognitive scientist, introduced the concept of the restricted Boltzmann machine (RBM). An RBM is a generative stochastic artificial neural network that can learn a probability distribution over its set of inputs. It is useful for dimensionality reduction, classification, regression, collaborative filtering, feature learning, and topic modeling. For an overview, you can read “Zhang, N., Ding, S., Zhang, J., & Xue, Y. (2018). An overview of restricted Boltzmann machines. Neurocomputing, 275, 1186-1199.”


1989 – Handwritten digit recognition with a Backpropagation network

Yann André LeCun, a computer scientist, combined convolutional neural networks with backpropagation to read and recognize handwritten digits in 1989. You can read “LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., & Jackel, L. D. (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4), 541-551.”


1989 – Q-learning

Christopher Watkins wrote his thesis, “Learning from Delayed Rewards”, in 1989. In it he presented the theory of Q-learning, which considerably improves the practicality and usefulness of reinforcement learning.

The algorithm can learn optimal control directly, without modeling the transition probabilities of the Markov Decision Process. You can have a look at “Watkins, C. J., & Dayan, P. (1992). Q-learning. Machine Learning, 8(3-4), 279-292.”
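To illustrate how Q-learning learns from experience alone, here is a minimal tabular sketch in Python. Everything specific in it is an assumption made for illustration: the four-state corridor environment, the reward of 1 at the goal, the purely random exploration, and the values of `alpha` and `gamma`. The learner only sees sampled transitions and never reads the environment's transition model.

```python
import random

N_STATES = 4        # states 0..3; state 3 is the goal and ends the episode
ACTIONS = (0, 1)    # 0 = move left, 1 = move right


def step(state, action):
    """Hypothetical corridor: returns (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)  # explore with random actions
            nxt, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


q_table = q_learning()
# Greedy action in each non-terminal state after learning.
greedy_policy = [
    max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)
]
```

Even though the agent acts at random while learning, the greedy policy read from the table heads toward the goal, because the update always bootstraps from the best next action.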


1995 – Support Vector Machine (SVM)

Though SVMs have been around since the 1960s, Corinna Cortes, a computer scientist, and Vladimir Naumovich Vapnik developed the current standard model in 1995. A Support Vector Machine (SVM) is a model for classification and regression problems. It can solve linear problems directly and non-linear ones with the help of kernel functions, and it works well for many practical problems. The idea is simple: the algorithm finds a line or a hyperplane that separates the data into classes. You can go through “Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273-297.”


1995 – Random Decision Forests

Tin Kam Ho, a computer scientist, introduced the concept of random decision forests in 1995. Random forests, or random decision forests, are an ensemble learning method for classification, regression, and other tasks that builds multiple decision trees at training time. For classification tasks, the output of the random forest is the class selected by most trees. You can go through “Ho, T. K. (1995, August). Random decision forests. In Proceedings of 3rd International Conference on Document Analysis and Recognition (Vol. 1, pp. 278-282). IEEE.” and “Ho, T. K. (1998). The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8), 832-844.”
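The voting step of a random forest can be sketched without any tree-building code. In the example below, the "trees" are hand-written stand-in functions rather than trained decision trees, and the feature names `links` and `caps_ratio` are made up, so treat it only as an illustration of majority voting.

```python
from collections import Counter


def forest_predict(trees, sample):
    """Return the class chosen by the largest number of trees."""
    votes = [tree(sample) for tree in trees]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical stand-ins for trained trees: each looks at different features.
def tree_links(sample):
    return "spam" if sample["links"] > 3 else "ham"


def tree_caps(sample):
    return "spam" if sample["caps_ratio"] > 0.5 else "ham"


def tree_both(sample):
    return "spam" if sample["links"] > 1 and sample["caps_ratio"] > 0.2 else "ham"


forest = [tree_links, tree_caps, tree_both]
message = {"links": 5, "caps_ratio": 0.3}

# tree_links and tree_both vote "spam", tree_caps votes "ham".
prediction = forest_predict(forest, message)
```

In a real forest, each tree would be trained on a random subset of the data or features, which is what makes the individual votes differ.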


1997 – Long short-term memory (LSTM)

Long short-term memory (LSTM) was introduced in 1997 by two computer scientists, Sepp Hochreiter and Jürgen Schmidhuber. It improved both the effectiveness and the practicality of recurrent neural networks by mitigating the long-term dependency problem. LSTM is a recurrent neural network architecture used in the field of deep learning. Unlike standard feedforward neural networks, it has feedback connections, so it can process not only single data points but also entire sequences of data. You can read “Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.”

1997 – Computer beats Garry Kasparov

IBM's computer Deep Blue defeated the world chess champion, Garry Kasparov, in a six-game match. The event was widely taken as a sign that machine intelligence was catching up with human intelligence. You can read about the match at https://bit.ly/3nnrMY0.


2009 – ImageNet

Fei-Fei Li, a computer scientist, launched “ImageNet”, an extensive visual database of labeled images. She set out to expand the data available for training algorithms, reasoning that AI and ML needed large training sets that reflect real-world scenarios. The database offers more than 14 million labeled images (14,197,122 at last count) to researchers, educators, and students. You can have a look at the data at https://www.kaggle.com/c/imagenet-object-localization-challenge.


2012 – AlexNet

Alex Krizhevsky is a computer scientist most noted for his work on artificial neural networks and deep learning. In 2012 he introduced AlexNet, which improved upon LeNet-5. It contained only eight layers: five convolutional layers followed by three fully connected layers, with rectified linear units as activations. You can read about the architecture in “Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 1097-1105.”


2012 – Google Brain learns to identify cats in photos


Drawing on a comprehensive machine learning infrastructure, Google's X Lab built a large neural network as part of the Google Brain project. In 2012 the network became famous for its image processing, as it learned to recognize cats in pictures. It was trained on 10,000,000 unlabeled images sampled at random and was able to identify cat images with 74.8 percent accuracy. This was another step forward in image recognition. You can read the article at https://bit.ly/2ZAOYKu.


2014 – Generative Adversarial Networks (GAN)

Ian J. Goodfellow, a researcher, and his team introduced Generative Adversarial Networks (GANs) in 2014. GANs could be among the most powerful algorithms in AI. A GAN is a machine learning model in which two neural networks, a generator and a discriminator, compete with each other, and each becomes more accurate as a result. GANs typically run unsupervised and use a zero-sum game framework to learn. You can read more about GANs in “Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … & Bengio, Y. (2014). Generative adversarial nets. Advances in Neural Information Processing Systems, 27.”


2014 – DeepFace

In 2014, the Facebook research team built DeepFace, a deep learning facial recognition system built around a nine-layer neural network trained on 4 million images of Facebook users. The network can decide whether two images show the same person with an accuracy of 97.35%. You can read more about DeepFace in “Taigman, Y., Yang, M., Ranzato, M. A., & Wolf, L. (2014). DeepFace: Closing the gap to human-level performance in face verification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1701-1708).”


2014 – Chatbot

Vladimir Veselov, Eugene Demchenko, and Sergey Ulasen developed a chatbot named Eugene Goostman. In 2014, some regarded it as having passed the Turing test, a test of a computer’s ability to communicate indistinguishably from a human. Goostman is portrayed as a 13-year-old Ukrainian boy, a persona intended to make the people it talks to more forgiving of its grammatical errors and lack of general knowledge. You can read more about the chatbot at https://en.wikipedia.org/wiki/Eugene_Goostman.


2016 – Face2Face


A group of scientists presented Face2Face at the Conference on Computer Vision and Pattern Recognition in 2016. It is an approach for real-time facial reenactment of a monocular target video sequence. The source sequence is also a monocular video stream, captured live with a commodity webcam. You can read more about Face2Face in “Thies, J., Zollhofer, M., Stamminger, M., Theobalt, C., & Nießner, M. (2016). Face2face: Real-time face capture and reenactment of RGB videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2387-2395).”


2018 – BERT


In 2018 Google developed BERT, the first deeply bidirectional, unsupervised language representation model. Using transfer learning, it can be adapted to perform several natural language processing tasks. You can read more about BERT in “Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.”

