Closed-Loop Discovery Systems
In September 2021, David Baker’s lab at the University of Washington reported proteins designed entirely by a generative model — no natural template, no homolog in the Protein Data Bank — that folded into their designed structures with atomic accuracy. The proteins were confirmed by X-ray crystallography. Some of them had no natural analog: they were folds that evolution had never explored. The pipeline that produced them ran through the same loop repeatedly: generate a sequence, predict the fold, score the predicted structure against design objectives, update the generative model, repeat.
This is the closed-loop discovery architecture: generative model proposes candidates, simulator evaluates them, fitness function scores the evaluation, generator updates. No human judgment intervenes between proposal and evaluation. The loop runs until a stopping criterion is met or the score converges.
The structure of the loop is not new. Evolutionary algorithms have operated this way since the 1960s. Bayesian optimization closed the loop for expensive black-box functions in the 1990s. What is new is the scale and the domain coverage: closed-loop systems now run in drug discovery, materials science, CA rule exploration, and automated scientific research. The loop has become fast enough, and the generative models powerful enough, that the domain of application has expanded dramatically.
The Architecture
The components are consistent across domains. A generative model produces candidates: sequences, rule tables, molecular structures, experimental protocols. A simulator runs each candidate forward: fold prediction, CA evolution, molecular docking, finite element analysis. A fitness function converts the simulator output to a scalar score: binding affinity, pattern complexity, material property, experimental metric. The scored candidates update the generative model — either directly, through fine-tuning or gradient descent, or indirectly, by serving as training examples for the next generation.
The loop is the architecture. Each component can be swapped independently. You can replace a physics-based simulator with a learned surrogate model to speed up evaluation. You can replace a simple fitness function with a neural reward model trained to approximate human preferences. You can replace gradient descent with evolutionary operators. The loop structure persists through all these variations, and its properties are the loop’s properties.
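The loop structure described above can be sketched in a few lines. Everything here is illustrative: the component names (`propose`, `simulate`, `score`, `update`) are generic placeholders for whatever a given system plugs in, not any real system's API.

```python
import random

def closed_loop(propose, simulate, score, update, n_rounds=50, batch=32):
    """Generic generate -> simulate -> score -> update loop.

    propose(state, n)      -> list of candidates
    simulate(candidate)    -> simulator output
    score(output)          -> scalar fitness
    update(state, scored)  -> new generator state

    Each component can be swapped independently: a learned surrogate
    for simulate, a reward model for score, evolutionary operators or
    fine-tuning for update. The loop itself never changes.
    """
    state = None
    best = (float("-inf"), None)
    for _ in range(n_rounds):
        candidates = propose(state, batch)
        scored = [(score(simulate(c)), c) for c in candidates]
        best = max(best, max(scored, key=lambda s: s[0]))
        state = update(state, scored)
    return best

# Toy instantiation: a (1, lambda)-style hill climb toward x = 3.
random.seed(0)
propose = lambda st, n: [(st or 0.0) + random.gauss(0, 1) for _ in range(n)]
simulate = lambda x: x                      # identity "simulator"
score = lambda y: -(y - 3.0) ** 2           # peak at y = 3
update = lambda st, scored: max(scored)[1]  # keep this round's best candidate
best_score, best_x = closed_loop(propose, simulate, score, update)
```

The toy run converges near x = 3 within a few rounds; swapping any one lambda for a heavier component changes the system's character without touching `closed_loop` itself.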
AlphaFold / AlphaProteo demonstrates the mature version. AlphaFold (2020–2021) predicted protein folding with near-experimental accuracy; AlphaProteo (2024) extended this to protein design — generating sequences that fold into target structures and bind target proteins. The fitness function included binding affinity predictions and structural stability metrics. The loop produced binders with affinities competitive with antibodies, in weeks rather than years.
Drug discovery illustrates a different regime. Insilico Medicine’s generative chemistry pipeline (2023) produced novel molecular candidates for a therapeutic target, screened them computationally, selected candidates for synthesis, tested them experimentally, and used the experimental results to update the generative model. The resulting candidate, INS018_055, entered Phase II clinical trials — a timeline roughly four years faster than conventional drug development for a comparable starting point.
CA rule exploration uses the same architecture at smaller scale. A generative model (an LLM, a genetic algorithm, or a random search with a learned surrogate) proposes rule tables. A CA simulator evaluates each rule for target properties — glider support, self-replication, Class IV behavior. A complexity score ranks the candidates. The top candidates inform the next round of proposals. Searches that would take years of manual exploration complete in hours.
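One round of such a search fits in a short script. The sketch below uses 1-D elementary CA (rules 0 through 255) for tractability, and a crude complexity proxy of my own choosing: the variance of live-cell density over time, which separates dead or frozen rules (variance zero) from rules with ongoing dynamics. It is a stand-in for the glider and Class IV detectors described above, not a serious significance measure.

```python
import random
import statistics

def step(cells, rule):
    """One synchronous update of an elementary CA with periodic boundaries.

    The 3-cell neighborhood (left, center, right) indexes a bit of the
    rule number, per standard Wolfram numbering.
    """
    n = len(cells)
    return [(rule >> ((cells[(i - 1) % n] << 2)
                      | (cells[i] << 1)
                      | cells[(i + 1) % n])) & 1
            for i in range(n)]

def density_score(rule, width=64, steps=128, seed=0):
    """Variance of live-cell density over a run from a random start."""
    rng = random.Random(seed)
    cells = [rng.randint(0, 1) for _ in range(width)]
    densities = []
    for _ in range(steps):
        cells = step(cells, rule)
        densities.append(sum(cells) / width)
    return statistics.pvariance(densities)

# One round of the loop: propose random rules, score, keep the top few.
rules = random.Random(1).sample(range(256), 32)
ranked = sorted(rules, key=density_score, reverse=True)
top = ranked[:5]
```

Rule 0 (everything dies) and rule 204 (the identity) both score exactly zero under this proxy, while rule 110 scores positive; the `top` candidates would seed the next round of proposals.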
What the Loop Produces
The results are not marginal improvements. AlphaProteo produced protein binders that outperformed existing experimental results in several target categories. Insilico’s pipeline identified a molecular scaffold that was genuinely novel — not a modification of a known drug — and demonstrated therapeutic activity. The CA searches have found rule tables with properties not previously documented in the literature.
The loop produces results because optimization is powerful when the fitness function captures the right objective. If you can specify what you want precisely enough to score it, the loop will find something that scores well. That is the correct characterization of what the loop does: it optimizes against a score. The output is the highest-scoring candidate the generative model could reach in the search space the optimizer explored.
That framing also describes the limitation.
The Theoretical Grounding Problem
A closed-loop system can find things. Understanding why those things work — why a protein with a specific sequence folds to bind a specific target, why a CA rule with a specific transition table supports gliders — requires the same kind of analysis that the loop bypasses.
In emergence terms: the loop runs the simulation, but it doesn’t analyze the rule. You have the output; you have the rule that produced it; you don’t have the explanation of why that rule produces that output. For Conway’s Life, the loop would find gliders immediately. It would not find the insight that gliders emerge because the B3/S23 rule is balanced near the edge of chaos and supports both local stability and information propagation, nor the further insight that these properties are what make Life Turing complete.
The specific tradeoff is between optimization and generalization. A closed-loop drug discovery pipeline that finds a molecule binding a specific protein doesn’t learn why certain chemical motifs bind that protein class in general. The next discovery, for a different but structurally related protein, starts the loop again from scratch. The loop doesn’t accumulate transferable knowledge; it accumulates results. This is useful for producing results. It is not useful for building theory.
AlphaFold makes the tradeoff visible. The model predicts folding with extraordinary accuracy — but the prediction mechanism is not a theory of how proteins fold. The model has learned a mapping from sequence to structure that accurately captures the regularities in the training data. The causal process by which a polypeptide chain collapses into its native structure, driven by hydrophobic forces, hydrogen bonding, and entropic constraints, is not represented. If you want to understand folding kinetics, misfolding pathways, or why some sequences are particularly aggregation-prone, AlphaFold is not the right tool. For those questions, you need the mechanism, not the map.
For cellular automata specifically, the same tradeoff holds. A closed-loop search can find a CA rule with interesting properties in an hour. Determining whether those properties are mathematically significant — whether they generalize, whether they connect to known results, whether the rule is Turing complete — requires analysis that the loop does not perform. The loop produces the starting point for that analysis, not the analysis itself.
The Open Problem
The question that matters for emergence research is how to extract theoretical understanding from closed-loop pipelines. Two partial approaches are currently being pursued:
Surrogate interpretability: once the loop has found something interesting, apply interpretability methods to the trained surrogate models inside the loop — the learned fitness approximators, the generative model’s learned priors — to understand what structure the loop has implicitly learned to seek. This gives you the loop’s implicit theory, which may be a good approximation to the correct theory. It is also expensive to extract and not guaranteed to be faithful.
Guided loops: inject theoretical constraints into the fitness function and generator. If you know that Class IV rules cluster near Langton’s lambda = 0.45, you can bias the search toward that region and reduce the output to rules worth analyzing. This requires the theoretical insight before the loop runs — it doesn’t extract theory from the loop’s output.
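A guided proposal step of this kind is simple to write down. For binary elementary CA, Langton's lambda reduces to the fraction of rule-table entries that map to the non-quiescent state, i.e. the popcount of the rule number over 8. The 0.45 target and the rejection-sampling scheme below are illustrative choices, not a prescribed method.

```python
import random

def langton_lambda(rule, table_size=8):
    """Fraction of non-quiescent transitions in a binary rule table."""
    return bin(rule).count("1") / table_size

def propose_guided(n, target=0.45, tol=0.1, seed=0):
    """Rejection-sample elementary CA rules with lambda near the target."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        rule = rng.randrange(256)
        if abs(langton_lambda(rule) - target) <= tol:
            out.append(rule)
    return out

candidates = propose_guided(16)
```

Every candidate lands in the lambda band [0.35, 0.55], so the downstream simulator spends its budget on the region where Class IV behavior is expected. The cost, as noted, is that the insight about where to look had to exist before the loop ran.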
Neither approach fully solves the problem. What would solve it is a fitness function that rewards theoretical significance rather than raw performance — one that assigns higher scores to results that are more surprising given existing theory, more generalizable, more connected to known open problems. Building that fitness function is, in effect, encoding scientific taste into a scalar. That is hard in any domain. In emergence research, where significance depends on connections to a broad mathematical framework that is still being developed, it is the central unsolved problem.