Neural Computation: From McCulloch-Pitts to Modern Networks
In 1942, a 19-year-old runaway named Walter Pitts was living in the attic of Warren McCulloch’s house in Chicago. Pitts had fled an abusive home in Detroit, spent time sleeping in the University of Chicago library, briefly attended Bertrand Russell’s lectures there, and ended up in correspondence with McCulloch, a neurophysiologist at the Illinois Neuropsychiatric Institute who was trying to formalize what a neuron does in mathematical terms.
The paper they published together in 1943 — “A Logical Calculus of the Ideas Immanent in Nervous Activity,” in The Bulletin of Mathematical Biophysics — is the founding document of both artificial intelligence and modern neuroscience. Its central claim was that the “all-or-none” character of nervous activity — a neuron either fires or it does not — means that neural events can be treated with propositional logic. A neuron is a binary unit. A network of neurons is a logical circuit. And any logical proposition can, in principle, be computed by some configuration of such a network.
This is, in all essentials, the description of a cellular automaton. The McCulloch-Pitts neuron preceded von Neumann’s CA work by several years; in a real sense, the neuron came first.
The McCulloch-Pitts Neuron as a CA Cell
The formal McCulloch-Pitts neuron is simple:
- It has a set of excitatory inputs and a set of inhibitory inputs
- It fires (outputs 1) if the sum of its excitatory inputs minus the sum of its inhibitory inputs exceeds a threshold T
- Otherwise it is silent (outputs 0)
This is a threshold function of its inputs — a specific case of the general Boolean functions Kauffman would later study in gene regulatory networks, and structurally identical to the birth-and-survival rule of a Life-like CA. The Life rule for a dead cell is: fire (become alive) if and only if the sum of live neighbors equals exactly 3. Replace “sum equals 3” with “sum exceeds threshold T” and you have a McCulloch-Pitts neuron.
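The parallel can be made concrete in a few lines. A minimal sketch (the function names are ours, not from the 1943 paper):

```python
def mp_neuron(excitatory, inhibitory, threshold):
    """McCulloch-Pitts unit: fire (1) iff the sum of excitatory inputs
    minus the sum of inhibitory inputs exceeds the threshold T."""
    return 1 if sum(excitatory) - sum(inhibitory) > threshold else 0

def life_birth(neighbors):
    """Life's rule for a dead cell: come alive iff exactly 3 of the
    8 neighbors are alive -- an equality test instead of a threshold."""
    return 1 if sum(neighbors) == 3 else 0
```

Swapping the equality test for a strict threshold turns the Life birth rule into a McCulloch-Pitts neuron; everything else (binary state, local inputs, synchronous update) is shared.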
McCulloch and Pitts proved several results about networks of such neurons. First: any logical proposition expressible in propositional calculus can be computed by some finite network of McCulloch-Pitts neurons. Second: such networks, when given access to unbounded memory (an external tape), are equivalent in power to Turing machines; a finite network on its own matches a finite-state machine. This was not a statement about biology; it was a statement about the logical power of threshold automata, and it held regardless of whether real neurons actually behaved this way.
The biological plausibility of the model was, and remains, limited. Real neurons are not binary — they fire in complex patterns, with variable timing that carries information. Real synapses are not binary weights — they have continuous strengths that change over time. Real neural dynamics involve continuous-time differential equations, not synchronous discrete updates. But the McCulloch-Pitts model captures something essential: the local, threshold-based, binary character of neural decision-making. It is the CA of neuroscience — idealized, powerful, and foundational.
Neural Networks as CA
A network of McCulloch-Pitts neurons can be viewed as a CA with an irregular graph structure. The cells are neurons, the connections are synapses (defining the neighborhood of each cell), and the update rule for each cell is the threshold function of its inputs.
Where Conway’s Life has a regular grid with a uniform 8-neighbor neighborhood and a single shared rule, a neural network has an arbitrary connectivity graph with a different neighborhood for each cell and a potentially different threshold for each cell. This makes neural networks far more expressive than Life-like CA — they can represent a vastly larger set of possible computational structures — while preserving the essential CA property: global behavior emerges from local updates.
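One synchronous step of such a network can be sketched as a CA update over an arbitrary graph. A toy illustration (the dictionary representation, node names, and thresholds are invented for the example):

```python
def network_step(state, inputs, thresholds):
    """One synchronous CA-style update of a threshold network.
    state:      {node: 0 or 1}
    inputs:     {node: list of nodes feeding it} (its 'neighborhood')
    thresholds: {node: that node's own threshold}
    Unlike Life, each cell has its own neighborhood and threshold."""
    return {
        node: 1 if sum(state[src] for src in inputs[node]) > thresholds[node] else 0
        for node in state
    }
```

On a three-node loop this already produces nontrivial dynamics (here, a two-step oscillation), with the global behavior emerging entirely from per-node local rules.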
The influence runs in both directions. From CA to neural networks: the study of CA rules helped clarify what kinds of computation are possible with purely local interactions, and this understanding has informed neural network theory. From neural networks to CA: the neural network literature developed techniques (backpropagation, gradient descent, energy-function analysis) that have since been applied to CA design and analysis. The two traditions were, for decades, largely separate. They have been converging since the 1980s.
Hopfield Networks: Memory as Attractor Dynamics
In 1982, John Hopfield published a paper that transformed the study of neural networks. “Neural networks and physical systems with emergent collective computational abilities,” published in the Proceedings of the National Academy of Sciences, introduced what is now called the Hopfield network: a fully connected network of binary neurons (each connected to every other neuron, no self-connections) with symmetric weights, updating according to a threshold rule.
Hopfield’s key contribution was to show that such a network has a Lyapunov function — an “energy” that decreases monotonically under the network’s dynamics. This guaranteed that the dynamics converge: no matter what state you start in, the network evolves downhill on the energy landscape until it reaches a local minimum — a fixed-point attractor.
The computational idea is associative memory: store a set of patterns by setting the connection weights according to the Hebbian rule (increase the weight between two neurons that co-fire), and the stored patterns become the energy minima. Start the network in a partial or degraded version of a stored pattern, and the dynamics pull it toward the complete pattern. The network remembers by completing what it’s given.
The connection to CA is direct and deep. A Hopfield network is a CA: cells (neurons) with binary states and a local update rule (the threshold function, parameterized by the weights), updated asynchronously. (Hopfield’s convergence guarantee holds for asynchronous updates; fully synchronous updating of the same network can instead settle into a two-step cycle.) Its attractor structure, the set of energy minima and their basins, is the same mathematical object that Kauffman’s NK model uses to describe gene regulatory dynamics and that Life’s state space uses to characterize pattern evolution.
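The dynamics are easy to exercise directly. A minimal sketch with Hebbian storage and asynchronous threshold updates, using the common ±1 state convention (the function names are ours):

```python
import numpy as np

def hebbian_weights(patterns):
    """Hebbian storage: strengthen the weight between every pair of
    units that agree across the stored +/-1 patterns; no self-weights."""
    P = np.asarray(patterns)
    W = P.T @ P
    np.fill_diagonal(W, 0)
    return W

def energy(W, s):
    """Hopfield's Lyapunov function: never increases under async updates."""
    return -0.5 * s @ W @ s

def recall(W, s, steps=200, seed=0):
    """Asynchronous updates: repeatedly set one randomly chosen unit to
    the sign of its summed input, descending the energy landscape."""
    rng = np.random.default_rng(seed)
    s = s.copy()
    for _ in range(steps):
        i = rng.integers(len(s))
        s[i] = 1 if W[i] @ s >= 0 else -1
    return s
```

Starting from a corrupted copy of a stored pattern (a higher-energy state), repeated asynchronous updates slide downhill until the full pattern is restored.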
Hopfield’s energy function also made contact with the Ising model of statistical physics, a two-dimensional grid of binary spins interacting with their neighbors, used to model ferromagnetism, whose dynamics can themselves be cast as a stochastic CA. Hopfield networks, Ising models, Kauffman networks, and Conway’s Life are all instances of the same general structure: binary-state units with local update rules, studied through their attractor dynamics. The 1982 paper revealed that physics and neural computation were studying the same mathematics from different directions.
A typical Hopfield network with N neurons can reliably store approximately 0.14N patterns before memory errors become common — the so-called storage capacity. More recent work (Krotov and Hopfield 2016; Ramsauer et al. 2020) has shown that modern “dense associative memory” networks can store exponentially more patterns, at the cost of requiring more complex update rules.
Excitable Media: The Brain’s CA in Living Tissue
Beyond formal models, there is a physical CA occurring in living neural and cardiac tissue right now. It is called an excitable medium, and its most dramatic manifestation is the spiral wave.
An excitable medium is a spatially extended system in which each local region can exist in one of three states: resting (quiescent), excited (firing), and refractory (unable to fire again for a recovery period). When a resting region is excited by its neighbors, it fires; it then enters refractory, during which it cannot be re-excited; and it finally returns to resting. This is a three-state CA, and its dynamics are governed by the same local rules that govern Life-like automata, with one important addition: the refractory period creates a directional asymmetry that standard Life lacks.
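The three-state rule can be written down directly. A sketch in the Greenberg-Hastings style, with assumptions the text leaves open made explicit: periodic boundaries, von Neumann (4-cell) neighborhoods, and resting cells firing when at least one neighbor is excited:

```python
import numpy as np

RESTING, EXCITED, REFRACTORY = 0, 1, 2

def excitable_step(grid):
    """One synchronous step of a three-state excitable medium.
    Excited cells become refractory; refractory cells recover to
    resting; resting cells fire iff >= 1 of their 4 neighbors is
    excited. np.roll gives periodic (wrap-around) boundaries."""
    excited = (grid == EXCITED).astype(int)
    n = (np.roll(excited, 1, axis=0) + np.roll(excited, -1, axis=0) +
         np.roll(excited, 1, axis=1) + np.roll(excited, -1, axis=1))
    return np.where(grid == EXCITED, REFRACTORY,
           np.where(grid == REFRACTORY, RESTING,
                    np.where(n > 0, EXCITED, RESTING)))
```

Seeding a single excited cell produces an expanding ring: the refractory wake behind the wavefront is what prevents the wave from propagating backward, the directional asymmetry noted above.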
The canonical chemical demonstration is the Belousov-Zhabotinsky reaction — a chemical oscillator in which reagents cycle through oxidized and reduced states. In a thin layer, this reaction spontaneously organizes into target waves (concentric rings of oxidation spreading outward) and spiral waves (rotating pinwheels). The BZ reaction is an excitable medium in a dish, and it has been used for half a century as a model system for studying wave dynamics in excitable media generally.
In cardiac tissue, the same dynamics occur, with direct medical relevance. Normal heartbeats are propagated by a single excitation wave originating at the sinoatrial node and sweeping across the heart. When this wave breaks — due to tissue damage, altered ion channel expression, or other factors — it can curl into a spiral wave, re-exciting tissue that has just recovered. The result is a tachycardia: the heart beats too fast, driven by a rotating spiral rather than the normal pacemaker. If multiple spirals form and fragment, the result is fibrillation: chaotic, unsynchronized excitation that cannot pump blood. Ventricular fibrillation kills within minutes.
This is the cellular automaton of the heart. The three-state excitable medium model, studied by James Greenberg, Stuart Hastings, and others in the 1970s and 1980s, predicted spiral wave formation and the conditions under which spirals would stabilize, drift, or break apart. These predictions have since been confirmed in both chemical and biological preparations, and they have informed the development of defibrillation protocols: the electric shock used to treat fibrillation depolarizes the tissue all at once, driving it into the refractory state and extinguishing every wavefront, which allows normal pacemaker activity to re-establish itself.
Neural tissue shows analogous dynamics. Cortical spreading depression, the slowly traveling wave of depolarization and subsequent suppression of neural activity associated with migraine aura, is an excitable medium phenomenon. The characteristic visual aura of migraine, a fortification spectrum of flickering geometric patterns, is the visual cortex’s CA dynamics made visible to the person experiencing them.
Neural Cellular Automata: Closing the Loop in 2020
For more than seventy years, the connection between neural networks and cellular automata was structural (they were instances of the same mathematical framework), but the two traditions developed largely separately. Neural network researchers used continuous dynamics, gradient descent, and learned weights. CA researchers used discrete rules, synchronous updates, and exhaustive simulation.
In February 2020, Alexander Mordvintsev, Ettore Randazzo, Eyvind Niklasson, and Michael Levin published “Growing Neural Cellular Automata” in Distill, merging the two traditions explicitly. Their system: each cell in a 2D grid runs a small neural network that takes the cell’s current state and its 3×3 neighborhood as input, and outputs the cell’s next state. The neural network weights are shared across all cells (translation invariance, exactly as in Life). The whole system is therefore a CA — but with a learned rule rather than a hand-specified one.
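The architecture is easy to sketch. The following is a structural toy, not the Distill model: it omits that paper’s Sobel-filter perception, 16-channel states, stochastic cell updates, and alive masking, keeping only the shared-local-rule skeleton (the weight shapes and the two-layer form are our own choices):

```python
import numpy as np

def nca_step(grid, w1, b1, w2, b2):
    """One neural-CA step: every cell runs the SAME small two-layer
    network on its flattened 3x3 neighborhood of C-channel states and
    adds the output to its own state (a residual update).
    grid: (H, W, C); w1: (9*C, hidden); w2: (hidden, C)."""
    H, W, C = grid.shape
    padded = np.pad(grid, ((1, 1), (1, 1), (0, 0)))   # zero boundary
    out = np.empty_like(grid)
    for i in range(H):
        for j in range(W):
            patch = padded[i:i+3, j:j+3].reshape(-1)  # 9*C local inputs
            hidden = np.maximum(patch @ w1 + b1, 0.0)  # ReLU
            out[i, j] = grid[i, j] + hidden @ w2 + b2
    return out
```

Because the weights are shared across all cells, gradient descent trains the rule rather than any particular location, which is why the learned behavior is translation-invariant.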
They trained this system to grow a target image (typically a 40×40-pixel emoji) from a single seed cell, using gradient descent on the difference between the grown pattern and the target. The results were striking. The system learned to grow the target pattern reliably, and then — without any specific training for this — to self-repair: when cells were randomly destroyed, the surrounding cells regenerated the missing pattern within a few dozen steps.
The self-repair result is the one that made biologists take notice. Real organisms regenerate. Salamanders regrow lost limbs. Planarian flatworms regrow entire heads, complete with functioning brains. This regenerative capacity has been one of the deepest puzzles in developmental biology: how does a fragment of an organism “know” what shape to recreate? Mordvintsev’s neural CA demonstrated that a local update rule, learned to produce a target global pattern, acquires regenerative behavior as a consequence — not as an additional objective, but because any rule that reliably grows a pattern from a single seed also has enough local pattern-recognition to restore the pattern when parts are removed.
Michael Levin, the developmental biologist on that team, has since argued that this suggests a new framework for understanding biological regeneration: not as a separate capacity requiring separate explanation, but as a consequence of the local rules that control normal development. The CA framework makes this connection explicit.
The Deep Learning Connection: CA as Inductive Bias
The 2020 neural CA paper has a broader implication for machine learning. Convolutional neural networks (CNNs) — the architecture responsible for most of deep learning’s practical successes — are, in a precise sense, learned CA rules. A convolution is a local operation: each output position is computed from a fixed-size neighborhood of input positions. CNN layers update every spatial position in parallel based on the same local operation. A CNN forward pass is a CA step.
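The correspondence can be made literal: Conway’s Life itself is one convolution plus one pointwise nonlinearity. A sketch using shifted-array sums in place of an explicit convolution call, with periodic boundaries assumed:

```python
import numpy as np

def life_step_conv(grid):
    """One Game of Life step in CNN form: a 3x3 'kernel' sums the live
    neighbors, then a pointwise rule maps (state, count) -> next state.
    np.roll wraps around, i.e. periodic boundary conditions."""
    n = sum(np.roll(np.roll(grid, di, axis=0), dj, axis=1)
            for di in (-1, 0, 1) for dj in (-1, 0, 1)
            if (di, dj) != (0, 0))
    # birth on exactly 3 neighbors; survival on 2 or 3
    return ((n == 3) | ((grid == 1) & (n == 2))).astype(int)
```

A vertical blinker flips to horizontal and back under this function, exactly as under the hand-specified rule: the convolution is the neighborhood, the pointwise rule is the transition table.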
This connection is not merely formal. The properties that make CNNs powerful — their ability to learn translation-invariant, locally-structured features — are exactly the properties that make CA rules interesting. A CNN trained on image recognition learns a hierarchy of local rules that extract progressively more abstract features. A CA with an interesting rule produces progressively more complex global behavior from the same local rule applied at every cell.
The growing field of neural CA research is, in effect, taking this connection seriously and building systems that foreground it. Rather than designing deep networks with many layers and millions of parameters, neural CA researchers ask: what single-step local rule, applied repeatedly, produces the desired global pattern? This is Conway’s question, asked with gradient descent as the search tool.
The answer, increasingly, is: a richer variety of global behaviors than anyone would have predicted from the simplicity of the local rules. Which is exactly what the Game of Life told us in 1970.