
Open-Ended Discovery · Neural Cellular Automata · Emergent Dynamics
¹FLAIR, University of Oxford · ²ELLIS Institute Tübingen · ³University of Freiburg · ⁴Prior Labs
PBT-NCA is a meta-optimization framework for evolving Petri Dish Neural Cellular Automata under novelty-driven competitive pressure. Instead of collapsing into static order or noise, the system continually discovers new lifelike behaviors ranging from coordinated motion and scattering to colonization and symbiotic partitioning. By rewarding diversity within and across worlds, PBT-NCA sustains open-ended dynamics at the edge of chaos.
Without any handcrafted targets, PBT-NCA spontaneously evolves a diverse ecosystem of structures like shooters, gliders, amoebas, colonies, and spaceships. All arise from pure multi-agent competition.
Fluid, shape-shifting macro-structures that migrate across the substrate with coordinated, differentiated behavior, mirroring primordial multicellular organisms.
Directed projectiles ejected from stable territorial clusters, similar to glider-gun structures in the classical CA literature.
Persistent traveling waves that self-replicate across the grid without a fixed template, a mark of open-ended CA systems.
A macroscopic entity (red) emits a small cluster of cells across the substrate to perform spatial colonization.
Decentralized foraging behavior where agents shape the environment initially occupied by other passive agents.
Local interactions producing highly structured, periodically replicating entities with internal substructure, pointing to the computational ubiquity of cellular automata.
Small, persistent clusters occupying territory in real time, resembling the terraforming of an archipelago.
This animated figure recreates the composite novelty score plot from the paper: the smoothed population novelty score grows across meta-iterations while representative emergent worlds appear at the moments they enter the evolutionary record.
A meta-optimization loop that transforms standard population-based training into an open-ended regime discovery engine by replacing stationary fitness with novelty-driven selection pressure operating at two timescales.
Each of the P = 30 worlds is rolled out for T_w inner steps. Agents update via gradient-based learning while competing on the shared grid. Trajectories are scored by the dual-novelty fitness.
Top-m behavioral descriptors and species occupancy statistics (μ, σ, δ, entropy, alive-mass change) are appended to a bounded FIFO archive. Archive novelty is computed as k-NN distance (k = 8) in descriptor space.
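The archive step above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the class name `NoveltyArchive`, the `capacity` value, the Euclidean metric, and the choice to average the distances to the k nearest neighbors (rather than, say, taking the distance to the k-th neighbor) are all assumptions made for the example.

```python
from collections import deque

import numpy as np


class NoveltyArchive:
    """Bounded FIFO archive of descriptor vectors with k-NN novelty scoring.

    Hypothetical helper: capacity, metric, and aggregation are assumptions.
    """

    def __init__(self, capacity=1024, k=8):
        self.k = k
        # deque(maxlen=...) evicts the oldest entry once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def add(self, descriptor):
        self.buffer.append(np.asarray(descriptor, dtype=np.float64))

    def novelty(self, descriptor):
        """Mean Euclidean distance to the k nearest archived descriptors."""
        if not self.buffer:
            return 0.0
        archive = np.stack(list(self.buffer))
        query = np.asarray(descriptor, dtype=np.float64)
        dists = np.linalg.norm(archive - query, axis=1)
        k = min(self.k, len(dists))  # archive may hold fewer than k entries
        return float(np.sort(dists)[:k].mean())
```

A world's descriptor would typically be scored with `novelty` before being appended with `add`, so that it is not compared against itself.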
Each frame is encoded by a frozen DINOv2 encoder. Per-world diversity is the median cosine distance to all other worlds at the same timestep, averaged over the rollout to reward novel morphology beyond what handcrafted descriptors capture.
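The per-world diversity term can be illustrated as follows, assuming the frame embeddings have already been computed and stacked into an array of shape `(T, P, D)` (timesteps, worlds, embedding dimension). The function name `visual_diversity` and the array layout are hypothetical; the encoder itself is not shown.

```python
import numpy as np


def visual_diversity(embeddings):
    """Median cosine distance to the other worlds, averaged over time.

    embeddings: array of shape (T, P, D), assumed to be precomputed
    frame embeddings. Returns an array of shape (P,), one score per world.
    """
    emb = np.asarray(embeddings, dtype=np.float64)
    norms = np.maximum(np.linalg.norm(emb, axis=-1, keepdims=True), 1e-12)
    unit = emb / norms
    # Pairwise cosine distance between worlds at each timestep: (T, P, P).
    cos_dist = 1.0 - np.einsum("tpd,tqd->tpq", unit, unit)
    T, P, _ = cos_dist.shape
    # Drop the diagonal so a world is never compared with itself.
    off_diag = ~np.eye(P, dtype=bool)
    others = cos_dist[:, off_diag].reshape(T, P, P - 1)
    # Median over the other worlds, then mean over the rollout.
    return np.median(others, axis=-1).mean(axis=0)
```

The median makes the score robust to a single outlier world: a world scores highly only if it looks different from most of the population at that timestep.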
Every K meta-iterations, the lowest-fitness worlds are replaced by Lamarckian copies of elite parents: weights, optimizer state, and ecological context are inherited, then crossover, mutation, and Gaussian weight perturbation are applied.








Clone weights, optimizer state, and context.
Combine strong lineages during exploit.
Alter hyperparameters and world-level choices.
Apply Gaussian noise to diversify behavior.
Spawn new worlds to replace the discarded ones.
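The replacement steps listed above can be sketched as a single function. This is an illustrative sketch under stated assumptions: each world is represented as a plain dict with hypothetical keys `weights`, `opt_state`, and `hparams`; crossover is modeled as a uniform mix of hyperparameters from two elite parents; mutation rescales each hyperparameter by 0.8 or 1.25; and `sigma` is an arbitrary noise scale. None of these specifics come from the paper.

```python
import copy

import numpy as np


def exploit_and_explore(worlds, fitness, n_replace, rng, sigma=0.01):
    """Replace the n_replace lowest-fitness worlds with perturbed elite copies.

    worlds: list of dicts with keys "weights" (ndarray), "opt_state", and
    "hparams" (dict of floats). This layout is hypothetical.
    """
    assert 2 * n_replace <= len(worlds), "elites and losers must not overlap"
    order = np.argsort(fitness)  # ascending, so the worst worlds come first
    losers, elites = order[:n_replace], order[-n_replace:]
    for idx in losers:
        pa, pb = rng.choice(elites, size=2, replace=True)
        # Inherit: clone weights, optimizer state, and context from parent A.
        child = copy.deepcopy(worlds[pa])
        for name in child["hparams"]:
            # Crossover: take each hyperparameter from parent B half the time.
            if rng.random() < 0.5:
                child["hparams"][name] = worlds[pb]["hparams"][name]
            # Mutation: rescale the hyperparameter down or up.
            child["hparams"][name] *= rng.choice([0.8, 1.25])
        # Perturbation: Gaussian noise on the inherited weights.
        noise = rng.standard_normal(child["weights"].shape)
        child["weights"] = child["weights"] + sigma * noise
        worlds[idx] = child  # the new world takes the discarded slot
    return worlds
```

Because the child is a deep copy, later training of the child does not modify the parent's weights or optimizer state.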





If you find this work useful, please cite the paper: