The Artificial Intelligence Papers

Original Research Papers With Tutorial Commentaries

About this book

Modern artificial intelligence (AI) is built upon a relatively small number of foundational research papers, which have been collected and republished in this unique 350-page book. The first chapter provides a summary of the historical roots of AI, and subsequent chapters trace its development, from Rosenblatt's perceptron in 1958 to one of the early GPT models in 2019. Each paper is introduced with a commentary on its historical context and a tutorial-style technical summary. In several chapters, additional context is provided by the paper's original author(s). Written in an informal style, with a comprehensive glossary and tutorial appendices, this book is essential reading for students and researchers who wish to understand the fundamental building blocks of modern AI.

Published July 2024

Paperback ISBN: 9781068620003

Download Chapter 1 (PDF)

Corrections

Contents

Preface


1. The Origins of Modern Artificial Intelligence 

1.1 Introduction
1.2 Turing on Computing Machinery and Intelligence
1.3 The Dartmouth Summer Research Project
1.4 The Origins of Artificial Neural Networks
1.5 Modern Neural Networks 

1.6 Reinforcement Learning


2. The Perceptron - 1958

Context

Technical Summary

2.1 Architecture
2.2 Activation
2.3 Learning
2.4 Hebbian Learning
2.5 The Perceptron's Nemesis: Exclusive OR
2.6 List of Mathematical Symbols

Research Paper: The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain

3. Hopfield Nets - 1982

Context

Technical Summary

3.1 Learning
3.2 Recall: Content Addressable Memory
3.3 Tolerance to Damage
3.4 The Energy Function
3.5 Results

3.6 Comments by the Paper's Author: JJ Hopfield

Research Paper: Neural Networks and Physical Systems with Emergent Collective Computational Abilities


4. Boltzmann Machines - 1984

Context

Technical Summary

4.1 The Boltzmann Machine Energy Function
4.2 Simulated Annealing

4.3 Learning by Sculpting Distributions
4.4 Learning in Boltzmann Machines
4.5 Learning by Minimising Kullback-Leibler Distance
4.6 Learning by Maximising Likelihood
4.7 Results: Autoencoders and Exclusive OR
4.8 List of Mathematical Symbols
4.9 Comments by the Paper's Author: T Sejnowski

Research Paper: Boltzmann Machines: Constraint Satisfaction Networks That Learn


5. Backpropagation Networks - 1985

Context

Technical Summary

5.1 The Backpropagation Algorithm: Summary
5.2 Forward Propagation of Input States
5.3 Backward Propagation of Errors
5.4 Weights as Vectors
5.5 Results: Exclusive OR and Autoencoders
5.6 List of Mathematical Symbols

Research Paper: Learning Internal Representations by Error Propagation


6. Reinforcement Learning - 1983

Context

Technical Summary

6.1 The Associative Search Element (ASE)
6.2 The Adaptive Critic Element (ACE)
6.3 Results
6.4 Comments by the Paper's Authors: A Barto, R Sutton and C Anderson

Research Paper: Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems


7. Convolutional Neural Networks - 1989

Context

Technical Summary

7.1 The Convolutional Neural Network
7.2 LeNet5: Convolutional Neural Networks in 1998
7.3 Results of the LeCun et al. (1989) Paper

Research Paper: Backpropagation Applied to Handwritten Zip Code Recognition


8. Deep Convolutional Neural Networks - 2012

Context

Technical Summary

8.1 AlexNet Architecture
8.2 Training
8.3 Results

Research Paper: ImageNet Classification With Deep Convolutional Neural Networks


9. Variational Autoencoders - 2013

Context

Technical Summary

9.1 Overview
9.2 Latent Variables and Manifolds
9.3 Key Quantities
9.4 How Variational Autoencoders Work
9.5 The Evidence Lower Bound
9.6 Maximising the Lower Bound
9.7 Results

9.8 List of Mathematical Symbols 

Research Paper: Auto-Encoding Variational Bayes


10. Generative Adversarial Networks - 2014

Context

Technical Summary

10.1 The Generative Adversarial Net Architecture
10.2 Training Generative Adversarial Nets

Research Paper: Generative Adversarial Nets


11. Diffusion Models - 2015

Context

Technical Summary

11.1 The Forward Trajectory: Encoding as Diffusion
11.2 The Reverse Trajectory: Decoding

11.3 Defining a Lower Bound

11.4 Architecture and Training

11.5 Results

11.6 List of Mathematical Symbols

Research Paper: Deep Unsupervised Learning Using Nonequilibrium Thermodynamics


12. Interlude: Learning Sequences 

12.1 Introduction 

12.2 Static Networks for Sequences

12.3 Dynamic Networks for Sequences

12.4 Temporal Deep Neural Networks


13. Neural Probabilistic Language Model - 2000

Context

Technical Summary

13.1 Measuring Linguistic Performance

13.2 Architecture and Training

13.3 Results

13.4 List of Mathematical Symbols

13.5 Comments by the Paper's Author: Y Bengio

Research Paper: A Neural Probabilistic Language Model


14. Transformer Networks - 2017

Context

Technical Summary

14.1 The Short Version

14.2 The Long Version

14.3 Results

14.4 List of Mathematical Symbols

Research Paper: Attention Is All You Need


15. GPT-2 - 2019

Context

Technical Summary

Research Paper: Language Models Are Unsupervised Multitask Learners


16. Conclusion

16.1 Steam-Powered AI

16.2 Black Boxes

16.3 AI: Back to the Future 


Appendices

A. Glossary
B. A Vector Matrix Tutorial
C. Maximum Likelihood Estimation
D. Bayes' Theorem


References


Index

Creative Commons License

Figures and text not derived from other sources in The Artificial Intelligence Papers by James V Stone are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Reviews

"James Stone has done it again: another masterful book that takes you straight to the heart of current thinking in artificial intelligence (AI) -- and its foundations. From perceptrons in 1958 to generative pre-trained transformers (GPTs), this book scaffolds the history of AI with landmark papers that chart progress over the last half-century -- as witnessed by the author. In short, this book represents an intellectual string of pearls that would complement the bookshelf of anyone invested in the forthcoming age of artificial intelligence."

Karl J Friston, MBBS, MA, MRCPsych, MAE, FMedSci, FRBS, FRS.

Scientific Director: Wellcome Centre for Human Neuroimaging.

Professor: Queen Square Institute of Neurology, University College London.

Honorary Consultant: The National Hospital for Neurology and Neurosurgery.


"I learned a lot from this collection of classic papers about the neural network approach to artificial intelligence. Spanning all the major advances from perceptrons to large language models (e.g. GPT), the collection is expertly curated and accompanied by insightful tutorials, along with intimate reminiscences from several of the pioneering researchers themselves."

Steven Strogatz, Professor of Applied Mathematics, Cornell University. Author of Nonlinear Dynamics and Chaos, 2024. 


"To define the future, one must study the past.  Stone's book collects together the most significant papers on neural networks from the perceptron to GPT-2.  Each paper is explained in modern terms, and in many cases, comments by the original authors are included.  This book describes a riveting intellectual journey that is only just beginning."

Simon Prince,  Honorary Professor of Computer Science, University of Bath, England. Author of Understanding Deep Learning, 2023.


"Connectionist models of the brain date back to the work of Hebb in 1949, and the first faltering first steps towards practical applications followed soon after Rosenblatt's seminal 1958 paper on the perceptron. As of 2024, models firmly rooted in connectionism, from generative adversarial networks (GANs) to transformers, have heralded a renaissance in artificial intelligence that is revolutionising the nature of our digital age. This latest volume by James Stone collects the pivotal connectionist papers from 1958 right up to today's radical innovations, and provides an illuminating descriptive narrative charting the theoretical, technical, and application-based historical development in a lucid tutorial style.  A welcome, much needed, and valuable addition to the current canon on artificial intelligence." 

Mark A Girolami, FREng, FRSE. Chief Scientist: The Alan Turing Institute. Sir Kirby Laing Professor of Civil Engineering, University of Cambridge, England.