Cellular Automata and Machine Learning
There is a short paper in Distill — a peer-reviewed machine learning journal that runs in a web browser, where figures move and readers can interact with the experiments — that begins with a single animated image: a lizard emoji. The lizard is seeded from a single pixel and grows, over the course of a few hundred steps, into its full 40×40-pixel form. Then some of its pixels are removed at random. The remaining pixels regenerate the missing ones. More pixels are removed. It regenerates again.
The paper, “Growing Neural Cellular Automata,” was written by Alexander Mordvintsev, Ettore Randazzo, Eyvind Niklasson, and Michael Levin, and published in February 2020. It demonstrated something that felt simultaneously obvious and astonishing: that a neural network trained to produce a pattern from a seed could acquire, as a consequence of that training objective alone, the capacity to repair that pattern when parts were destroyed.
The implications ran in two directions at once. Into machine learning: toward architectures that learn local rules rather than global mappings. Into biology: toward a mechanistic account of how organisms maintain and regenerate their shape. At the junction of these two directions sat the same mathematical object Conway had been working with in 1970: a grid of cells, a local rule, global behavior that surprises you.
What Neural Cellular Automata Are
A neural cellular automaton (NCA) is a cellular automaton in which the update rule — the function that determines each cell’s next state from its current state and its neighborhood — is parameterized by a neural network whose weights are learned by gradient descent.
In a conventional CA like Conway’s Life, the update rule is specified by hand (B3/S23: a dead cell is born with exactly 3 live neighbors; a live cell survives with 2 or 3 live neighbors). The rule is simple and fixed. In an NCA, the rule is a small neural network — typically 2–4 layers, a few hundred to a few thousand parameters — that takes the cell’s current state vector and its neighborhood as inputs and outputs the cell’s next state vector. The weights of this network are shared across all cells (translation invariance), so the system remains a CA in the precise sense: the same rule applied everywhere simultaneously.
The key addition is differentiability. The entire CA forward pass — running the system for N steps from a seed — can be unrolled into a computational graph and differentiated with respect to the network weights. This means you can specify a loss function (how well does the final state match a target pattern?) and use gradient descent to find weights that minimize it. You are not designing the CA rule; you are training the CA rule.
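The structure of a single update step can be sketched in a few lines of numpy. This is a forward pass only, with illustrative sizes; real NCAs express the same computation in an autodiff framework so that the unrolled steps can be differentiated end to end. The Sobel-filter perception and the zero-initialized output layer follow the Distill paper's design; everything else here is a stand-in.

```python
import numpy as np

# Forward pass of one NCA update step, with illustrative sizes. Real
# implementations express this in an autodiff framework so the unrolled
# steps can be differentiated end to end.

H, W, C = 16, 16, 8             # grid size and per-cell state channels
HIDDEN = 32                     # hidden width of the shared update network

rng = np.random.default_rng(0)
state = np.zeros((H, W, C))
state[H // 2, W // 2, :] = 1.0  # a single seed cell

# Perception: each cell sees its own state plus Sobel-filtered gradients
# of its 3x3 neighborhood (as in the Distill paper).
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]) / 8.0
sobel_y = sobel_x.T

def conv3x3(grid, kernel):
    """Depthwise 3x3 convolution with zero padding, applied per channel."""
    p = np.pad(grid, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(grid)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * p[dy:dy + grid.shape[0],
                                      dx:dx + grid.shape[1]]
    return out

# The learned part: a tiny two-layer network whose weights are shared by
# every cell. The output layer is zero-initialized (also as in the paper),
# so the untrained rule is the identity: "do nothing" until trained.
W1 = rng.normal(0, 0.1, (3 * C, HIDDEN))
W2 = np.zeros((HIDDEN, C))

def nca_step(state):
    perception = np.concatenate(
        [state, conv3x3(state, sobel_x), conv3x3(state, sobel_y)], axis=-1)
    hidden = np.maximum(perception @ W1, 0.0)   # ReLU
    return state + hidden @ W2                  # residual (incremental) update

next_state = nca_step(state)
print(next_state.shape)   # (16, 16, 8)
```

Because the output layer starts at zero, the first untrained step leaves the state unchanged; training moves the weights away from this safe identity.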
The 2020 Mordvintsev Paper: What It Found
Mordvintsev and colleagues trained NCAs on a simple task: grow a target image (a 40×40-pixel emoji rendered as a pattern of RGBA values) from a single colored seed cell, in a grid initialized to transparent. The training procedure was conceptually elegant: run the CA for a random number of steps between 64 and 96, compute a pixel-wise L2 loss between the visible channels and the target image, backpropagate through the entire step sequence, and update the weights.
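The loop can be sketched with a toy stand-in: a single-channel *linear* CA, with backpropagation through the unrolled steps written out by hand so the example stays self-contained. The real system uses a multi-channel nonlinear rule, 64–96 unrolled steps, and an autodiff framework; the grid, kernel, target, and learning rate here are all invented for illustration.

```python
import numpy as np

# Toy stand-in for the NCA training loop: unroll, compare to target,
# backpropagate through every step, update the rule's parameters.

rng = np.random.default_rng(0)
N, T = 9, 8                           # grid size, unroll length
target = np.zeros((N, N))
target[3:6, 3:6] = 1.0                # toy target "image": a filled square

seed = np.zeros((N, N))
seed[4, 4] = 1.0                      # single seed cell

K = rng.normal(0, 0.01, (3, 3))       # the learnable 3x3 update kernel

def corr(grid, kernel):
    """3x3 cross-correlation with zero padding."""
    p = np.pad(grid, 1)
    return sum(kernel[1 + dy, 1 + dx] * p[1 + dy:1 + dy + N, 1 + dx:1 + dx + N]
               for dy in (-1, 0, 1) for dx in (-1, 0, 1))

def unroll(K):
    """Run the CA for T steps from the seed, keeping every state."""
    states = [seed]
    for _ in range(T):
        states.append(states[-1] + corr(states[-1], K))
    return states

lr, losses = 0.001, []
for _ in range(500):
    states = unroll(K)
    diff = states[-1] - target
    losses.append((diff ** 2).mean())
    # Backpropagate through the entire step sequence, by hand:
    g = 2.0 * diff / diff.size                      # dL/d(final state)
    gK = np.zeros((3, 3))
    for t in range(T - 1, -1, -1):
        s = np.pad(states[t], 1)
        for dy in (-1, 0, 1):                       # kernel gradient from step t
            for dx in (-1, 0, 1):
                shifted = s[1 + dy:1 + dy + N, 1 + dx:1 + dx + N]
                gK[1 + dy, 1 + dx] += (g * shifted).sum()
        g = g + corr(g, K[::-1, ::-1])              # gradient w.r.t. state t
    K -= lr * gK                                    # gradient descent step

print(round(losses[0], 4), round(losses[-1], 4))    # loss falls during training
```

The point of the sketch is the shape of the computation: nothing designs the rule; gradient descent through the unrolled dynamics finds it.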
The trained NCAs grew their target images reliably. But the more interesting results came from stress tests.
Self-repair under damage. The trained NCA was run to its stable state (grown lizard). Then some fraction of cells were deleted — set to transparent. The CA continued running. Within 25–50 steps, the missing cells had been regenerated from the surrounding context. This was not explicitly trained. The NCA had learned, as a consequence of being trained to grow, that it should also respond to local pattern information and fill in what was missing.
Self-repair under persistent damage. Even when cells were randomly deleted at every step, the NCA maintained its pattern — growing faster than it was damaged. The system had acquired robustness as a structural consequence of learning to grow.
Rotation invariance. A follow-up version, trained with an isotropic (rotationally symmetric) update rule, produced NCAs that could grow and repair their pattern at any orientation, not just recover from deletions.
The biological parallel that Mordvintsev and Levin emphasized was planarian flatworm regeneration. Cut a planarian in half, and each half regrows the missing portion — including a complete head with functional brain. This is not a pre-specified response; it is a consequence of the local rules governing cell behavior, which were shaped by evolution to produce and maintain a specific global form. The NCA demonstrates, in a controlled experimental system, that this is possible: a local rule can produce both growth and regeneration as two aspects of the same underlying dynamics.
Self-Organising Textures
The Mordvintsev paper launched a small explosion of NCA applications. One of the most productive was texture synthesis.
Classical texture synthesis (in the tradition of Efros and Leung 1999, and Heeger and Bergen 1995) works by sampling patches from an example texture and stitching them together, or by matching statistics of filter responses across scales. These are feed-forward methods: given an example, they generate a new sample. They do not grow the texture from local rules; they copy it from global statistics.
NCA texture synthesis works differently. Train an NCA to maintain a target texture rather than to grow a target image from a seed. The trained NCA then acts as a texture generator with three useful properties: it can be evaluated at any resolution; it produces seamless textures by construction, since the same local rule applies everywhere and there are no boundary effects; and two trained rules can be smoothly interpolated, producing a blend that maintains the local coherence of both textures.
The deeper property is self-organization. The NCA texture is not stored; it is generated at each step by the local rule. If cells are perturbed — the texture damaged — the rule repairs the damage. The texture is an attractor of the dynamical system, not a fixed pattern.
This is exactly what Conway’s Life demonstrated: the still lifes and oscillators of Life are attractors of the B3/S23 rule, not patterns that were placed there. They emerge from the dynamics and are maintained by the dynamics. NCA textures extend this principle to learned rules optimized for arbitrary target textures.
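The claim about Life is easy to verify directly. A minimal numpy sketch of one B3/S23 step shows that the 2×2 block still life is a fixed point of the dynamics, and the blinker returns to itself after two steps — both maintained by the rule, not stored:

```python
import numpy as np

# One synchronous B3/S23 update on a zero-padded integer grid.

def life_step(grid):
    """Apply one Life generation: B3/S23."""
    p = np.pad(grid, 1)
    h, w = grid.shape
    neighbors = sum(p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if (dy, dx) != (0, 0))
    born = (grid == 0) & (neighbors == 3)
    survives = (grid == 1) & ((neighbors == 2) | (neighbors == 3))
    return (born | survives).astype(grid.dtype)

block = np.zeros((6, 6), dtype=int)
block[2:4, 2:4] = 1                   # the block: four live cells in a square
print(np.array_equal(life_step(block), block))   # True: a fixed point of B3/S23

blinker = np.zeros((5, 5), dtype=int)
blinker[2, 1:4] = 1                   # the blinker: a period-2 oscillator
after_two = life_step(life_step(blinker))
print(np.array_equal(after_two, blinker))        # True: back after two steps
```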
Graph Neural Networks: CA on Irregular Grids
The architectures most responsible for modern machine learning’s practical successes — transformers for language, convolutional networks for images — can be understood as special cases of a more general principle: computing local functions of structured data.
Graph neural networks (GNNs) make this generalization explicit. In a GNN, data is represented on a graph: nodes have feature vectors, and edges connect related nodes. At each layer, each node updates its feature vector by aggregating messages from its neighbors according to a learned function. This is a CA on an irregular grid: the “neighborhood” of each cell is defined by the graph edges rather than by spatial adjacency, and the update rule (the message-passing function and the aggregation) is learned rather than specified.
The analogy is precise:
| CA component | GNN component |
|---|---|
| Grid cell | Graph node |
| Cell state | Node feature vector |
| Spatial neighborhood | Graph neighborhood (edges) |
| Update rule | Message-passing + aggregation function |
| Synchronous step | GNN layer |
The CA perspective clarifies something about GNNs that is otherwise obscured by their generality: the expressive power of a GNN is limited by the locality of its message-passing. A GNN with K layers can only pass information between nodes that are at most K edges apart. This is exactly the speed-of-light limitation in Life: no information can travel faster than one cell per generation. Deep GNNs (many layers) correspond to CA run for many steps.
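Both the layer-as-CA-step correspondence and the speed-of-light limit can be checked in a few lines of numpy. The sketch below uses random, untrained weights on a hypothetical five-node path graph; the point is the structure of the computation, not the function it computes:

```python
import numpy as np

# One message-passing layer as a synchronous CA step on an irregular grid:
# every node applies the same (here random, untrained) rule to an
# aggregate of its neighbors' features.

rng = np.random.default_rng(0)
n_nodes, dim = 5, 4
features = rng.normal(size=(n_nodes, dim))       # node state vectors
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]         # a path graph: 0-1-2-3-4

adj = np.zeros((n_nodes, n_nodes))               # symmetric adjacency
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

W_msg = 0.5 * rng.normal(size=(dim, dim))        # shared message weights
W_self = 0.5 * rng.normal(size=(dim, dim))       # shared self-update weights

def gnn_layer(x):
    messages = adj @ (x @ W_msg)                 # sum-aggregate neighbor messages
    return np.tanh(x @ W_self + messages)        # same rule at every node

def run_layers(x, k):
    for _ in range(k):
        x = gnn_layer(x)
    return x

# Locality: information moves at most one edge per layer. Perturbing node 0
# cannot reach node 4 (four edges away) in three layers, but can in four.
perturbed = features.copy()
perturbed[0] += 1.0
print(np.allclose(run_layers(features, 3)[4], run_layers(perturbed, 3)[4]))  # True
print(np.allclose(run_layers(features, 4)[4], run_layers(perturbed, 4)[4]))
```

The first comparison is exact: with only three layers, node 4's output is computed entirely from nodes within three hops, so node 0's perturbation cannot appear in it.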
The connection also suggests a failure mode. Very deep GNNs suffer from “over-smoothing” — the tendency of repeated local averaging to wash out node feature differences until all nodes have similar representations. This is the GNN analog of a CA rule that produces uniformity: repeated application of a mean-field update rule drives every cell toward the average. Fixing over-smoothing in GNNs and fixing uniformity in CA rules are, in the CA framing, the same problem.
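The uniformity failure mode is easy to reproduce with the simplest possible aggregation, plain neighborhood averaging, on a toy six-node ring:

```python
import numpy as np

# Over-smoothing in miniature: repeated mean aggregation drives every
# node's features toward the global average, erasing node distinctions.

n = 6
features = np.arange(n, dtype=float)         # six nodes with distinct features
adj = np.eye(n)                              # ring graph, with self-loops
for i in range(n):
    adj[i, (i + 1) % n] = adj[i, (i - 1) % n] = 1.0
avg = adj / adj.sum(axis=1, keepdims=True)   # row-normalized: mean of self + neighbors

x = features.copy()
for _ in range(100):
    x = avg @ x                              # one "layer" of mean aggregation

print(x.round(6))   # every entry is (approximately) the global mean, 2.5
```

After a hundred rounds of averaging, every node carries the same value, the global mean: the mean-field CA has driven the system to uniformity.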
Flow-Lenia: Evolution Within the CA
A different strand of ML-meets-CA research has converged on Lenia — a continuous-space, continuous-state generalization of Conway’s Life developed by Bert Wang-Chak Chan and first published in 2019. Lenia supports hundreds of complex self-organizing “lifeforms” that resemble microscopic organisms, emerging from smooth update rules applied across a continuous grid.
In December 2022, researchers published Flow-Lenia (arXiv:2212.07906), which extended Lenia toward open-ended evolution through mass conservation and parameter localization. The key innovation for evolvability is the localization: rather than fixing the Lenia update rule's parameters globally, Flow-Lenia makes the parameters local. Each cell carries its own rule parameters as part of its state, and they evolve along with the other state variables.
The result was a system that could support multiple different “species” of Lenia organisms — each governed by slightly different local rules — coexisting and interacting in the same grid. Evolution could operate not just on the organisms’ morphology but on their governing physics. The CA rules themselves became evolvable.
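A toy one-dimensional sketch conveys the parameter-localization idea (this is not Flow-Lenia's actual mass-conserving flow dynamics; the relaxation rule, sizes, and rates are invented for illustration): each cell carries its own update parameter, so different regions of one grid obey different local physics.

```python
import numpy as np

# Toy sketch of parameter localization -- NOT Flow-Lenia's real dynamics.
# Each cell carries its own update-rule parameter (a relaxation rate), so
# the two halves of the grid evolve under different local physics.

n = 12
mass = np.zeros(n)
mass[3] = mass[9] = 1.0            # one "organism" seed in each half
rate = np.full(n, 0.05)            # slow physics on the left half...
rate[6:] = 0.45                    # ...fast physics on the right half

def step(mass, rate):
    left, right = np.roll(mass, 1), np.roll(mass, -1)
    # each cell relaxes toward its neighborhood mean at its OWN local rate
    return mass + rate * ((left + right) / 2.0 - mass)

for _ in range(20):
    mass = step(mass, rate)

# The fast half has smeared its seed out; the slow half keeps a sharp peak.
print(mass[:6].max() > mass[6:].max())   # True
```

In Flow-Lenia proper, the locally carried parameters are transported with the matter they govern, which is what lets distinct "species" with distinct physics coexist in one grid.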
This closes a loop that Avida and Tierra left open. Avida evolved organisms within a fixed fitness landscape; Flow-Lenia evolves organisms that can modify the rules of their own universe. Whether this leads to the open-ended evolution that characterizes life on Earth — perpetually increasing complexity without convergence — remains an open research question. But the framework is the right one, and the connection to Life is explicit: Flow-Lenia is the natural endpoint of the trajectory that started with Conway in 1970.
The Unifying Insight
There is a narrative that connects Conway’s 1970 paper to Mordvintsev’s 2020 paper, and it can be stated in a single sentence:
Machine learning is rediscovering that local interaction rules can produce global structure.
Conway’s Life demonstrated this with four rules and a binary grid. The field spent fifty years building on this insight: Wolfram studied the space of possible rules; Kauffman applied Boolean networks to biology; Hopfield showed that neural dynamics converge to attractors. Then machine learning arrived, built powerful global function approximators (deep networks with millions of parameters trained on massive datasets), achieved remarkable results — and began, in the 2020s, to rediscover the efficiency of local rules.
NCAs are more parameter-efficient than global networks for many tasks involving spatial structure, because spatial structure is inherently local. A local rule learned by gradient descent also transfers across grid sizes in a way that a global function approximator cannot: an NCA trained on a 40×40 grid can be run on a 400×400 grid (the same rule, more cells). A network with a fixed-size input trained on 40×40 images cannot be applied to 400×400 images at all without architectural changes, and even a fully convolutional network computes a fixed-depth function rather than dynamics that can keep running until the larger pattern is complete.
The pattern-forming capacity that makes NCAs powerful is the same capacity that makes Life’s gliders interesting: a small local configuration that, when placed in the right context, propagates and interacts in complex ways. The difference is that in Life, the rule is fixed and the patterns are discovered; in NCAs, the patterns are specified and the rule is learned.
Conway’s question was: what simple rule produces interesting global behavior? Neural cellular automata ask the inverse: what rule produces this specific global behavior? Both questions have the same mathematical structure, and both are answerable.
That is not a coincidence. It is a fact about what computation is.