The Artificial Intelligence Papers
Original Research Papers With Tutorial Commentaries
About this book
Modern artificial intelligence (AI) is built upon a relatively small number of foundational research papers, which have been collected and republished in this unique 350-page book. The first chapter provides a summary of the historical roots of AI, and subsequent chapters trace its development, from Rosenblatt's perceptron in 1958 to one of the early GPT models in 2019. Each paper is introduced with a commentary on its historical context and a tutorial-style technical summary. In several chapters, additional context is provided by the paper's original author(s). Written in an informal style, with a comprehensive glossary and tutorial appendices, this book is essential reading for students and researchers who wish to understand the fundamental building blocks of modern AI.
Published July 2024
Paperback ISBN: 9781068620003
Contents
Preface
1. The Origins of Modern Artificial Intelligence
1.1 Introduction
1.2 Turing on Computing Machinery and Intelligence
1.3 The Dartmouth Summer Research Project
1.4 The Origins of Artificial Neural Networks
1.5 Modern Neural Networks
1.6 Reinforcement Learning
2. The Perceptron - 1958
Context
Technical Summary
2.1 Architecture
2.2 Activation
2.3 Learning
2.4 Hebbian Learning
2.5 The Perceptron's Nemesis: Exclusive OR
2.6 List of Mathematical Symbols
Research Paper: The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain
3. Hopfield Nets - 1982
Context
Technical Summary
3.1 Learning
3.2 Recall: Content Addressable Memory
3.3 Tolerance to Damage
3.4 The Energy Function
3.5 Results
3.6 Comments by the Paper's Author: JJ Hopfield
Research Paper: Neural Networks and Physical Systems with Emergent Collective Computational Abilities
4. Boltzmann Machines - 1984
Context
Technical Summary
4.1 The Boltzmann Machine Energy Function
4.2 Simulated Annealing
4.3 Learning by Sculpting Distributions
4.4 Learning in Boltzmann Machines
4.5 Learning by Minimising Kullback-Leibler Distance
4.6 Learning by Maximising Likelihood
4.7 Results: Autoencoders and Exclusive OR
4.8 List of Mathematical Symbols
4.9 Comments by the Paper's Author: T Sejnowski
Research Paper: Boltzmann Machines: Constraint Satisfaction Networks That Learn
5. Backpropagation Networks - 1985
Context
Technical Summary
5.1 The Backpropagation Algorithm: Summary
5.2 Forward Propagation of Input States
5.3 Backward Propagation of Errors
5.4 Weights as Vectors
5.5 Results: Exclusive OR and Autoencoders
5.6 List of Mathematical Symbols
Research Paper: Learning Internal Representations by Error Propagation
6. Reinforcement Learning - 1983
Context
Technical Summary
6.1 The Associative Search Element (ASE)
6.2 The Adaptive Critic Element (ACE)
6.3 Results
6.4 Comments by the Paper's Authors: A Barto, R Sutton and C Anderson
Research Paper: Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems
7. Convolutional Neural Networks - 1989
Context
Technical Summary
7.1 The Convolutional Neural Network
7.2 LeNet5: Convolutional Neural Networks in 1998
7.3 Results of the LeCun et al. (1989) Paper
Research Paper: Backpropagation Applied to Handwritten Zip Code Recognition
8. Deep Convolutional Neural Networks - 2012
Context
Technical Summary
8.1 AlexNet Architecture
8.2 Training
8.3 Results
Research Paper: ImageNet Classification With Deep Convolutional Neural Networks
9. Variational Autoencoders - 2013
Context
Technical Summary
9.1 Overview
9.2 Latent Variables and Manifolds
9.3 Key Quantities
9.4 How Variational Autoencoders Work
9.5 The Evidence Lower Bound
9.6 Maximising the Lower Bound
9.7 Results
9.8 List of Mathematical Symbols
Research Paper: Auto-Encoding Variational Bayes
10. Generative Adversarial Networks - 2014
Context
Technical Summary
10.1 The Generative Adversarial Net Architecture
10.2 Training Generative Adversarial Nets
Research Paper: Generative Adversarial Nets
11. Diffusion Models - 2015
Context
Technical Summary
11.1 The Forward Trajectory: Encoding as Diffusion
11.2 The Reverse Trajectory: Decoding
11.3 Defining a Lower Bound
11.4 Architecture and Training
11.5 Results
11.6 List of Mathematical Symbols
Research Paper: Deep Unsupervised Learning Using Nonequilibrium Thermodynamics
12. Interlude: Learning Sequences
12.1 Introduction
12.2 Static Networks for Sequences
12.3 Dynamic Networks for Sequences
12.4 Temporal Deep Neural Networks
13. Neural Probabilistic Language Model - 2000
Context
Technical Summary
13.1 Measuring Linguistic Performance
13.2 Architecture and Training
13.3 Results
13.4 List of Mathematical Symbols
13.5 Comments by the Paper's Author: Y Bengio
Research Paper: A Neural Probabilistic Language Model
14. Transformer Networks - 2017
Context
Technical Summary
14.1 The Short Version
14.2 The Long Version
14.3 Results
14.4 List of Mathematical Symbols
Research Paper: Attention Is All You Need
15. GPT-2 - 2019
Context
Technical Summary
Research Paper: Language Models Are Unsupervised Multitask Learners
16. Conclusion
16.1 Steam-Powered AI
16.2 Black Boxes
16.3 AI: Back to the Future
Appendices
A. Glossary
B. A Vector Matrix Tutorial
C. Maximum Likelihood Estimation
D. Bayes' Theorem
References
Index
Creative Commons License
Figures and text not derived from other sources in The Artificial Intelligence Papers by James V Stone are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Reviews
"James Stone has done it again: another masterful book that takes you straight to the heart of current thinking in artificial intelligence (AI) -- and its foundations. From perceptrons in 1958 to generative pre-trained transformers (GPTs), this book scaffolds the history of AI with landmark papers that chart progress over the last half-century -- as witnessed by the author. In short, this book represents an intellectual string of pearls that would complement the bookshelf of anyone invested in the forthcoming age of artificial intelligence."
Karl J Friston, MBBS, MA, MRCPsych, MAE, FMedSci, FRSB, FRS.
Scientific Director: Wellcome Centre for Human Neuroimaging.
Professor: Queen Square Institute of Neurology, University College London.
Honorary Consultant: The National Hospital for Neurology and Neurosurgery.
"I learned a lot from this collection of classic papers about the neural network approach to artificial intelligence. Spanning all the major advances from perceptrons to large language models (e.g. GPT), the collection is expertly curated and accompanied by insightful tutorials, along with intimate reminiscences from several of the pioneering researchers themselves."
Steven Strogatz, Professor of Applied Mathematics, Cornell University. Author of Nonlinear Dynamics and Chaos, 2024.
"To define the future, one must study the past. Stone's book collects together the most significant papers on neural networks from the perceptron to GPT-2. Each paper is explained in modern terms, and in many cases, comments by the original authors are included. This book describes a riveting intellectual journey that is only just beginning."
Simon Prince, Honorary Professor of Computer Science, University of Bath, England. Author of Understanding Deep Learning, 2023.
"Connectionist models of the brain date back to the work of Hebb in 1949, and the first faltering first steps towards practical applications followed soon after Rosenblatt's seminal 1958 paper on the perceptron. As of 2024, models firmly rooted in connectionism, from generative adversarial networks (GANs) to transformers, have heralded a renaissance in artificial intelligence that is revolutionising the nature of our digital age. This latest volume by James Stone collects the pivotal connectionist papers from 1958 right up to today's radical innovations, and provides an illuminating descriptive narrative charting the theoretical, technical, and application-based historical development in a lucid tutorial style. A welcome, much needed, and valuable addition to the current canon on artificial intelligence."
Mark A Girolami, FREng, FRSE. Chief Scientist: The Alan Turing Institute. Sir Kirby Laing Professor of Civil Engineering, University of Cambridge, England.