Information Theory and Life

On a July morning in 1948, the Bell System Technical Journal published a paper by Claude Shannon titled “A Mathematical Theory of Communication.” It was not, by the standards of the time, an obviously important paper. It was about telephone and telegraph systems — about how to transmit messages reliably through noisy channels. But the mathematical framework Shannon built to answer that engineering question turned out to describe something far more fundamental than telephone lines.

Shannon showed that information had a precise, quantifiable definition — entropy — and that this definition was not merely useful for engineers but was connected to the deepest questions in physics, biology, and computation. The paper has since been called the “Magna Carta of the Information Age” by Scientific American. It has tens of thousands of citations, spanning every field that touches on data, communication, or organized complexity.

Conway’s Life, published twenty-two years later, is one of the richest examples of Shannon’s framework in action. A Life simulation begins with maximum entropy — a random soup of cells — and evolves toward low entropy: still lifes, oscillators, persistent structures. Information is created, propagated, processed, and destroyed. Landauer’s principle, formulated in 1961, tells us that this last operation — the destruction of information — has a thermodynamic cost. Life pays that cost at every generation.


Shannon Entropy: A Definition

Shannon’s central concept is entropy, defined for a probability distribution p₁, p₂, …, pₙ as:

H = -Σ pᵢ log₂(pᵢ)

The units are bits. The entropy measures the average uncertainty about a random variable — or equivalently, the average information content of an observation.

For a Life grid, the relevant probability distribution is the distribution of cell states. If each cell is independently alive or dead with probability 1/2, the entropy is one bit per cell — maximum entropy. There is nothing to know about one cell that tells you anything about another; the grid is maximally unpredictable.
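Treating each cell as an independent coin flip, the per-cell entropy follows directly from the live-cell density. A minimal sketch (the independence assumption ignores correlations, so this estimate upper-bounds the true per-cell entropy of a structured grid):

```python
import numpy as np

def per_cell_entropy(grid):
    """Shannon entropy in bits per cell, treating cells as independent
    draws with p = live-cell density. Correlations are ignored, so this
    upper-bounds the true per-cell entropy of the grid."""
    p = float(grid.mean())
    if p == 0.0 or p == 1.0:
        return 0.0  # a uniform grid carries no uncertainty at all
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

rng = np.random.default_rng(0)
soup = rng.integers(0, 2, size=(100, 100))  # each cell alive with probability 1/2
print(per_cell_entropy(soup))  # very close to 1.0 bit per cell
```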

As Life evolves from a random initial configuration, this changes. Cells that die in the first generation had fewer than two or more than three live neighbors — precisely the configurations where the rules produce death. The structure of the rules imposes correlations between cells: knowing the state of a cell at generation t gives you probabilistic information about its neighbors’ states at generation t-1. The entropy of the grid falls below one bit per cell.

By the time a typical random soup has reached a stable state — say, 1,000 generations in — the entropy is much lower. Still lifes are perfectly predictable: knowing that a cell is part of a block or a beehive tells you exactly what all its neighbors are. Oscillators are locally predictable: their entropy is low, though their states cycle. The entropy reduction from the initial random soup to the final stable state represents the information that Life’s rules have “extracted” — the structure they have imposed on the initial randomness.
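The decline is easy to see numerically. The sketch below is a hypothetical minimal implementation, using a toroidal grid for convenience rather than Life's true unbounded plane, and the density-based entropy estimate described above:

```python
import numpy as np

def life_step(grid):
    """One Life generation on a torus: count the eight neighbors by
    rolling the grid, then apply birth-on-3, survival-on-2-or-3."""
    n = sum(np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0))
    return ((n == 3) | ((grid == 1) & (n == 2))).astype(np.uint8)

def per_cell_entropy(grid):
    """Density-based entropy estimate, in bits per cell."""
    p = float(grid.mean())
    return 0.0 if p in (0.0, 1.0) else -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

rng = np.random.default_rng(1)
grid = rng.integers(0, 2, size=(64, 64), dtype=np.uint8)
for gen in range(101):
    if gen % 25 == 0:
        print(gen, round(per_cell_entropy(grid), 3))
    grid = life_step(grid)
# The estimate starts near 1 bit/cell and falls sharply as cells die off.
```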


The Speed of Light: Life’s Information Limit

In physics, the speed of light limits how quickly information can travel from one point in space to another. No signal can propagate faster than c. This limit is not an engineering constraint but a fundamental consequence of the structure of spacetime.

Life has an analog. Because each cell’s state at generation t+1 depends only on its own state and the states of its eight immediate neighbors at generation t, information can propagate at most one cell per generation. A disturbance at position (x, y) can affect position (x+n, y+m) only after at least max(|n|, |m|) generations. This is Life’s “speed of light” — often written c in Life notation, with specific spaceship speeds measured as fractions of it.

The glider travels at c/4 (diagonally, one cell every four generations). The lightweight spaceship travels at c/2 (orthogonally, two cells every four generations). No signal, no pattern, no influence can move faster than one cell per generation in any direction. This is a precise, provable statement about the information-theoretic structure of Life.
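The c/4 claim can be verified directly with a minimal, hypothetical toroidal implementation of the rule (not a standard library): evolve a glider four generations and compare against a copy shifted one cell diagonally.

```python
import numpy as np

def life_step(grid):
    # One Life generation on a torus: eight rolled copies give neighbor counts.
    n = sum(np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0))
    return ((n == 3) | ((grid == 1) & (n == 2))).astype(np.uint8)

grid = np.zeros((20, 20), dtype=np.uint8)
for y, x in [(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)]:  # a glider heading down-right
    grid[5 + y, 5 + x] = 1

expected = np.roll(np.roll(grid, 1, axis=0), 1, axis=1)  # same glider, one cell diagonal
for _ in range(4):
    grid = life_step(grid)
assert np.array_equal(grid, expected)  # c/4: one diagonal cell per four generations
```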

The speed-of-light limit has practical consequences for Life computation. In a Life-based computer, signals travel at speeds bounded by c. Logic gates that operate on colliding glider streams must be arranged so that the signals arrive at the right place at the right time — a constraint that the speed limit makes precise. And the limit implies that Life’s universe has a causal structure analogous to the light-cones of special relativity: the set of cells that can causally influence a given cell at generation t is bounded by a diamond-shaped “past light-cone” whose radius grows by one cell per generation going back in time.
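The light-cone bound itself can be tested numerically: flip a single cell, evolve the original and perturbed grids side by side, and check that the Chebyshev radius of the region where they differ never exceeds the number of generations elapsed. (The life_step below is a minimal toroidal sketch, an implementation convenience not in the text.)

```python
import numpy as np

def life_step(grid):
    # One Life generation on a torus.
    n = sum(np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0))
    return ((n == 3) | ((grid == 1) & (n == 2))).astype(np.uint8)

rng = np.random.default_rng(2)
a = rng.integers(0, 2, size=(64, 64), dtype=np.uint8)
b = a.copy()
b[32, 32] ^= 1  # a single-cell disturbance at the center

for t in range(1, 16):
    a, b = life_step(a), life_step(b)
    ys, xs = np.nonzero(a != b)
    if len(ys):
        # Chebyshev (max-coordinate) radius of the disturbance from the flip site
        radius = max(np.abs(ys - 32).max(), np.abs(xs - 32).max())
        assert radius <= t  # spreads at most one cell per generation
```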


Mutual Information and the Decay of Correlations

Two cells in a Life grid that are far apart start out (in a random initial configuration) as statistically independent: knowing the state of one tells you nothing about the state of the other. As the grid evolves, this changes.

The mutual information between two cells — a measure of how much knowing one cell’s state reduces uncertainty about the other’s — grows as Life evolves, at least initially. Neighboring cells become correlated because the rules couple them directly. Cells several sites apart become correlated because information propagates through the intermediate cells at the speed-of-light limit.

This process has a spatial structure: the mutual information between two cells at distance d takes at least d generations to develop (the information must propagate across the intermediate cells). After the initial transient, in a typical evolved Life grid, the correlations have a characteristic range determined by the size of the stable structures: cells within the same still life are perfectly correlated; cells in different still lifes, separated by empty space, become nearly independent again.
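These correlations can be measured. The sketch below estimates mutual information between horizontally adjacent cells from their joint frequencies across the whole grid (a stationarity assumption), comparing a fresh soup with an evolved one; life_step is a minimal toroidal implementation assumed for illustration.

```python
import numpy as np

def life_step(grid):
    # One Life generation on a torus.
    n = sum(np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0))
    return ((n == 3) | ((grid == 1) & (n == 2))).astype(np.uint8)

def mutual_information(a, b):
    """MI in bits between two binary arrays, from their joint frequency table."""
    joint = np.bincount(2 * a.ravel().astype(int) + b.ravel().astype(int),
                        minlength=4).reshape(2, 2) / a.size
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    terms = [joint[i, j] * np.log2(joint[i, j] / (px[i] * py[j]))
             for i in (0, 1) for j in (0, 1) if joint[i, j] > 0]
    return float(sum(terms))

rng = np.random.default_rng(3)
grid = rng.integers(0, 2, size=(128, 128), dtype=np.uint8)
mi_soup = mutual_information(grid[:, :-1], grid[:, 1:])  # adjacent-column pairs
for _ in range(200):
    grid = life_step(grid)
mi_evolved = mutual_information(grid[:, :-1], grid[:, 1:])
print(mi_soup, mi_evolved)  # near zero for the soup, larger once structure forms
```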

The spatial decay of correlations is one signature of the difference between Life and physical systems with long-range interactions. In Life, correlations are strictly bounded by the light-cone structure. In quantum mechanics, entanglement can create correlations between distant particles that are not mediated by any local propagation. Life is a classical, local system — correlations respect the speed-of-light limit, and entropy is local in a way that quantum entropy is not.


Landauer’s Principle and the Cost of Forgetting

In 1961, Rolf Landauer published a paper in IBM Journal of Research and Development that addressed a question no one had thought to ask: is there a thermodynamic cost to erasing information?

The answer, Landauer showed, is yes. Erasing one bit of information — logically irreversibly reducing two possible states to one — requires dissipating at least k_B T ln 2 of heat into the environment, where k_B is Boltzmann’s constant and T is the temperature of the environment. At room temperature (300 K), this is about 2.9 × 10⁻²¹ joules per bit — a vanishingly small amount, far below the energy consumption of any current transistor, but physically real and theoretically significant.
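The figure is a one-line computation from the exact SI value of Boltzmann's constant:

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K (exact in the 2019 SI)
T = 300.0           # room temperature, K

landauer = K_B * T * math.log(2)           # minimum heat per erased bit
print(f"{landauer:.2e} J per erased bit")  # ≈ 2.87e-21 J, the ~2.9e-21 figure above
```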

Landauer’s principle connects information to physics in a precise way. Information is not an abstract quantity floating above the physical world — it is instantiated in physical states, and changing those states has physical costs. Logical irreversibility (many-to-one operations: AND, OR, ERASE) implies physical irreversibility, which implies heat dissipation.

Life is logically irreversible. At every generation, the Life rule takes multiple possible predecessor configurations and maps them to a single successor configuration. Many different arrangements of cells at time t produce the same configuration at time t+1. You cannot, in general, reconstruct the configuration at time t from the configuration at time t+1. By Landauer’s principle, this logical irreversibility has a thermodynamic cost.
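The many-to-one character is easy to exhibit: two different configurations with the same successor. In the sketch below (using a minimal toroidal Life step, an implementation detail not in the text), a lone cell and a pair of isolated cells both map to the empty grid:

```python
import numpy as np

def life_step(grid):
    # One Life generation on a torus.
    n = sum(np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0))
    return ((n == 3) | ((grid == 1) & (n == 2))).astype(np.uint8)

empty = np.zeros((8, 8), dtype=np.uint8)
a = empty.copy(); a[3, 3] = 1             # one isolated live cell
b = empty.copy(); b[1, 1] = b[6, 6] = 1   # two isolated live cells

# Both predecessors collapse to the same successor: the empty grid.
assert np.array_equal(life_step(a), empty)
assert np.array_equal(life_step(b), empty)
# Given the empty grid at t+1, there is no way to tell which was the state at t.
```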

This means that a physical implementation of Life — a physical device that instantiates the Life rules in actual matter — must dissipate heat at every generation. The minimum heat per bit erased is k_B T ln 2, and at each generation, Life erases many bits: every cell that dies erases the distinction between its possible pasts, such as whether it died of underpopulation or of overpopulation. The specific mechanism differs, but the cost is real.


Charles Bennett and Reversible Computation

Landauer’s principle was initially controversial — the idea that logical operations had physical costs seemed to confuse abstract mathematics with physics. The controversy was resolved in the 1970s and 1980s by Charles H. Bennett, a physicist at IBM Research, who demonstrated that reversible computing — using only logic gates that preserve all input information, so that every computation can be run backward — could in principle be accomplished at arbitrarily low thermodynamic cost.

Bennett’s insight: if a computation is logically reversible, it is physically reversible, and a physically reversible process can exchange arbitrarily little heat with its environment (in the limit of infinitely slow operation). Erasure is only necessary for logically irreversible operations, and logically reversible computers need not erase anything.
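A canonical reversible gate (not discussed in the text, but the standard example) is the Toffoli gate, a controlled-controlled-NOT that is its own inverse and so erases nothing:

```python
from itertools import product

def toffoli(a, b, c):
    """Controlled-controlled-NOT: flips c exactly when a and b are both 1."""
    return (a, b, c ^ (a & b))

inputs = list(product((0, 1), repeat=3))
# The gate is a bijection on 3-bit states: no two inputs share an output...
assert len({toffoli(*bits) for bits in inputs}) == 8
# ...and applying it twice recovers every input: the computation runs backward.
assert all(toffoli(*toffoli(*bits)) == bits for bits in inputs)
```

With ancilla bits, the Toffoli gate is universal for classical computation, so in principle any computation can be composed entirely from erasure-free gates.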

This has implications for Life. A reversible cellular automaton — one in which every configuration has exactly one predecessor — can be implemented at arbitrarily low thermodynamic cost, in principle. Life is not reversible: its many-to-one structure means that it necessarily erases information at every step, and it necessarily dissipates heat. A reversible variant of Life (several have been designed, including Margolus’s billiard ball model and Creutz’s reversible CA) would be thermodynamically free to run.

The comparison between reversible and irreversible CA is illuminating for thinking about what makes Life computationally rich. Life’s irreversibility is part of what makes it interesting: the many-to-one structure means that information is compressed, organized, and structured as the simulation runs, which is precisely what creates the persistent low-entropy structures (still lifes, oscillators) that make Life visually and mathematically fascinating. Reversible CA cannot do this: because every configuration has exactly one predecessor, the dynamics cannot compress the state space, and a random initial configuration stays random-looking forever. The complexity of Life is paid for, in a deep thermodynamic sense, by the information it erases.


Entropy in Life: A Summary

The information-theoretic picture of Life can be summarized in a few statements:

Initial state (random soup): Maximum entropy (≈1 bit per cell). No correlations. Maximum uncertainty.

Early evolution (first 100 generations): Rapid entropy decrease. Correlations develop. Most cells die or stabilize into recognizable structures.

Methuselah phase (generations 100–1,000 for typical starting configurations): Slow entropy decrease. Complex patterns simplify. Long-lived structures resolve.

Final state (stable configuration): Low entropy. Still lifes are perfectly predictable. Oscillators are locally periodic. Significant structure has been imposed on the initial randomness.

The information that was present in the initial random soup has been partially preserved (in the structure of the final configuration) and partially destroyed (erased by the logically irreversible Life rules). The destroyed information, by Landauer’s principle, became heat. The preserved information became structure — the spatial correlations that make still lifes look like something rather than nothing.

This is, in miniature, a model of physical processes. Stars form from random distributions of gas by a combination of gravity and thermodynamics. Snowflakes form from supercooled water by crystallization. Life — the real phenomenon — maintains its organization against the entropy increase demanded by the second law by dissipating heat constantly. Conway’s Life, in a physical implementation, does the same. The rules are different; the thermodynamic structure is the same.


The Physical Realization Question

A final question, which Landauer’s principle makes precise: how much energy would it take to run Life on a physical substrate at the theoretical minimum?

For a 1,000 × 1,000 grid, running for 1,000 generations, with an average cell death rate of, say, 30% per generation (rough estimate for typical Life dynamics), the number of bits erased is roughly 1,000 × 1,000 × 1,000 × 0.3 = 3 × 10⁸ bits. At k_B T ln 2 per bit (room temperature), the minimum thermodynamic energy cost is roughly 8.7 × 10⁻¹³ joules — less than one picojoule. Current transistors dissipate roughly 10⁻¹⁸ to 10⁻¹⁶ joules per operation, so a modern silicon implementation of Life already operates within a few orders of magnitude of the Landauer limit.
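The arithmetic behind this estimate, spelled out (the 30% death rate is the rough assumption stated above, not a measured constant):

```python
import math

K_B, T = 1.380649e-23, 300.0   # Boltzmann constant (J/K), room temperature (K)
cells = 1_000 * 1_000          # 1,000 x 1,000 grid
generations = 1_000
death_rate = 0.3               # assumed per-generation fraction of cells erased

bits_erased = cells * generations * death_rate
energy = bits_erased * K_B * T * math.log(2)
print(f"{bits_erased:.0e} bits, {energy:.1e} J")  # about 3e8 bits and 8.6e-13 J
```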

The gap between current computing energy costs and the Landauer limit is one of the major research themes in energy-efficient computing. Life, because it is a concrete and well-defined computation, is a useful benchmark for this research: as physical implementations approach the Landauer limit, a Life simulator becomes a progressively more efficient and more interesting machine. At the limit, it would be thermodynamically reversible — but then it could no longer be Life.


Further Reading