Chapter 1

Context

The study of living organisms has traditionally been undertaken on two fronts: experimental biology and theoretical biology. With the advent of computers, a third category of inquiry has become possible: simulation. In some respects, simulation is a functional mixture of the two traditional methods, combining the mathematical rigor of theoretical biology with the possibility of discovery reminiscent of experimentation.

In other respects computer simulation complements conventional biology. Biologists traditionally use the reductionist approach to natural systems, breaking them down into successively smaller and smaller parts. By analyzing subsystems, biologists deduce basic mechanisms and make inferences about general principles underlying the behavior of life on Earth. In contrast, simulation allows a researcher to reverse the trend and use a synthetic approach, building life-like systems from the bottom-up rather than top-down [Langton 89].

Thesis goal

This thesis describes the design and implementation of a C++ class library[1] called SIMBIOSYS. The goal of developing this system is to facilitate the construction of simulations that capture the complex dynamic and adaptive behaviors exhibited by natural living systems. The motivation for creating these simulations is twofold:

Simulation as science

The study of life is complicated by the difficulties involved in data acquisition and the complex nature of the data itself. Evolutionary biologists are especially disadvantaged by the time scales inherent in their research. Computer simulations offer biologists an instrument for performing experiments that would be impossible to conduct in the field or the lab. Biologist Charles Taylor puts it succinctly:

Though still nascent, the field of artificial life has begun to provide the means for better understanding emergent properties. [...] Reductionist science, the method which characterizes most serious academic research today, is largely analytic, breaking things down, and so misses these emergent properties. Reductionist science has been enormously successful in many respects, but many features are being ignored. This occurs not because those features are uninteresting or unimportant. To the contrary, one would love to study them, but the appropriate tools and effective methods for researching them have been lacking. [Taylor 92]

Heinz Pagels writes in The Dreams of Reason:

Possession of a program with unique analytic capabilities puts a scientist in as much of a privileged position to make new discoveries as the possession of a powerful telescope. These cognitive instruments are in the realm of concepts and information; nonetheless they are used by scientists as tools, extensions of our ordinary human analytic capability. [Pagels 89]

At the time of this writing there are no evolutionary simulation libraries in the public domain; clearly there is a need for better simulation tools. Though a system spanning all of biology from biochemistry to macroevolution would be ideal, the scope of SIMBIOSYS has been restricted to the latter end of the spectrum, from ethology to macroevolution, to make the project tractable. The class library lends itself well to problems in evolutionary biology, particularly those dealing with evolutionarily stable strategies, i.e., strategies that do well in populations dominated by the same strategy [Maynard Smith 74]. Examples of these types of problems include the following:

Sexual selection and female choice: the paradox of the evolution of female preferences for otherwise maladaptive traits in males is an old problem [Darwin 71]. The classic example from nature is the peacock. Males grow enormous, brightly coloured tail feathers which make them easy for predators to spot and hamper their movement. This liability is offset by the preference of females for mating with long-tailed males.

Sexual vs. asexual reproduction: the origin, maintenance, and prevalence of sexual reproduction is paradoxical because sexual reproduction costs more resources than asexual reproduction. Simulations can be used to explore hypotheses for why natural selection has not removed species that rely on the more costly reproductive method. One such theory suggests that host-parasite coevolution can result in selection for higher recombination rates in the host species [Hamilton 90].

Haploidy vs. diploidy: ploidy refers to the number of sets of chromosomes contained in a cell, haploid for one set and diploid for two. Most sexually reproducing species are diploid and it is unclear what advantages are conferred by having a second set of chromosomes.

Sex ratios in haplodiploid species: some species of social insects (ants, bees, wasps) are referred to as haplodiploid because all males are haploid and all females are diploid. Predicting an evolutionarily stable sex ratio is non-trivial because the queen and the workers, being differently related to male and female brood, favor different ratios.

The extended phenotype: the theory of the extended phenotype [Dawkins 83] challenges the central theorem of modern ethology, that organisms behave in such a way as to maximize their inclusive fitness. Dawkins argues that there is no reason that the phenotype (manifestation, or influence) of the genes must stop at the organism's body. Perhaps an organism's behavior benefits a gene located in another organism's cells. Simulations modeling the effects of the extended phenotype would be of great interest.

Simulation as engineering

Computer simulations offer natural scientists important new insights into the processes they are studying. In turn, the study of nature offers powerful new metaphors to the fields of computer science and software engineering. Two major themes from biology are currently being exploited to build useful systems: emergent computation and selective systems. They are especially relevant to the fields of adaptive systems, parallel processing, and cognitive and biological modeling [Forrest 91].

Emergent computation is characterized by the following constituents:

1. a collection of agents, each following explicit instructions,

2. interactions among the agents which form implicit global phenomena at the macroscopic level, i.e. epiphenomena,

3. a natural interpretation of the epiphenomena as computations.

In nature, the evolutionary process occurs whenever the following four conditions are satisfied:

1. an entity has the ability to reproduce itself,

2. there is a population of such entities,

3. there is some variety among the entities, and

4. some difference in ability to reproduce in the environment is associated with the variety.

Both emergent computation and selective systems can be classified along a dimension representing the abstraction level of the system, from more abstract at one end of the spectrum to more biologically plausible at the other, as shown in Table 1. Each type is discussed briefly in the following sections.


                  emergent behavior    selective systems
more abstract     cellular automata    genetic algorithms
hybrid            neural networks      genetic programming
more biological   neuronal networks    evolutionary programming

Table 1. Biological computation framework

Genetic algorithms

A genetic algorithm (GA) is a general search technique inspired by Darwinian natural selection. It simulates evolution in that the dual processes of variation and selection are used to explore an abstract search space, but in most genetic algorithms there is little emphasis on modeling natural evolution per se [Collins 92b].

A genetic algorithm works by generating an initial population of random bit strings. The bit strings are evaluated by some arbitrary external fitness function, and the ones that perform relatively better become more prevalent in the population, replacing ones that do not perform as well. Variation is introduced with genetic operators which mimic those found in nature. A more detailed description of genetic algorithms can be found on p.12.
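The generate-evaluate-replace loop just described can be sketched in a few lines of C++. The sketch below is illustrative only (its names and parameters are invented and are not part of the SIMBIOSYS interface); fitness is simply the number of 1-bits in the string, standing in for an arbitrary external fitness function.

```cpp
#include <algorithm>
#include <bitset>
#include <cstdlib>
#include <vector>

// Minimal generational GA sketch.  Individuals are fixed-length bit
// strings; the "one-max" fitness (count of 1-bits) stands in for an
// arbitrary external evaluation function.
const int kBits = 32;
typedef std::bitset<kBits> Genome;

int fitness(const Genome& g) { return static_cast<int>(g.count()); }

Genome random_genome() {
    Genome g;
    for (int i = 0; i < kBits; ++i) g[i] = std::rand() % 2;
    return g;
}

// Single-point crossover followed by per-bit mutation.
Genome breed(const Genome& a, const Genome& b) {
    Genome child;
    int cut = std::rand() % kBits;
    for (int i = 0; i < kBits; ++i) child[i] = (i < cut) ? a[i] : b[i];
    for (int i = 0; i < kBits; ++i)
        if (std::rand() % 100 == 0) child.flip(i);  // 1% mutation rate
    return child;
}

// Tournament selection: the fitter of two random individuals reproduces,
// so better strings become more prevalent over time.
const Genome& select_parent(const std::vector<Genome>& pop) {
    const Genome& x = pop[std::rand() % pop.size()];
    const Genome& y = pop[std::rand() % pop.size()];
    return fitness(x) >= fitness(y) ? x : y;
}

// Returns the best fitness found after the given number of generations.
int run_ga(int pop_size, int generations) {
    std::vector<Genome> pop(pop_size);
    for (int i = 0; i < pop_size; ++i) pop[i] = random_genome();
    for (int gen = 0; gen < generations; ++gen) {
        std::vector<Genome> next(pop_size);
        for (int i = 0; i < pop_size; ++i)
            next[i] = breed(select_parent(pop), select_parent(pop));
        pop.swap(next);
    }
    int best = 0;
    for (size_t i = 0; i < pop.size(); ++i)
        best = std::max(best, fitness(pop[i]));
    return best;
}
```

Tournament selection is used here in place of fitness-proportionate selection for brevity; any scheme in which fitter strings reproduce more often gives the same qualitative behavior.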

Genetic programming

Genetic programming is a specialization of genetic algorithms in which the bit strings are interpreted as computer programs. A wide variety of seemingly different problems from many different fields can be reformulated in terms of program induction, that is, searching for an algorithm that will compute some desired output when presented with particular inputs [Koza 92]. Examples include the following:

Optimal control involves finding a strategy that uses the current state variables of a system to choose values for one or more control variables in order to move the system toward some desired state while simultaneously minimizing some cost function.

Planning in artificial intelligence and robotics requires finding a sequence of actions that uses information from sensors about the state of a system to select effector actions which change that state.

Game playing involves finding a strategy that specifies what move a player is to make at each point in a game given known information about the game.

Empirical discovery requires finding a model that relates a given finite sampling of values of the independent variables and the associated values of the dependent variables for some observed system in the real world.

Induction of decision trees is one approach to the problem of classifying an object in a universe into a particular class on the basis of its attributes. A decision tree corresponds to a program consisting of functions that test the attributes of the object.

Automatic programming of cellular automata requires induction of the set of state-transition rules which will generate some desired global behavior.

Evolution of emergent behavior involves finding a set of seemingly simple rules which, when applied repetitively, lead to complex overall behavior. One example is the problem of finding a set of rules governing the behavior of an ant such that a colony of ants operating in parallel will work together to find all available food and transport it back to a nest.

Each of these problems can be attacked by applying a genetic algorithm to a representation of a computer program. The GA operates on the programs as passive data, generating a population of random programs to start, then introducing variation by applying (possibly specialized) genetic operators. The fitness function is evaluated by running each program in the population, and comparing its output or behavior against the desired result.
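A minimal flavor of program induction can be sketched with a linear program representation. The instruction set, target function, and mutation-only search below are invented for this illustration and are far simpler than tree-based genetic programming systems; they show only the core idea of searching program space by varying and evaluating candidate programs.

```cpp
#include <cstdlib>
#include <vector>

// A "program" is a sequence of opcodes applied to an accumulator that
// starts at the input value x.
enum Op { ADD1, DOUBLE, SQUARE, NEGATE, NUM_OPS };

int run_program(const std::vector<Op>& prog, int x) {
    int acc = x;
    for (size_t i = 0; i < prog.size(); ++i) {
        switch (prog[i]) {
            case ADD1:   acc += 1;   break;
            case DOUBLE: acc *= 2;   break;
            case SQUARE: acc *= acc; break;
            case NEGATE: acc = -acc; break;
            default:                 break;
        }
    }
    return acc;
}

// Fitness = accumulated absolute error against the target x^2 + 1
// over a few sample inputs; lower is better, zero is a perfect program.
long error(const std::vector<Op>& prog) {
    long err = 0;
    for (int x = -3; x <= 3; ++x) {
        long diff = run_program(prog, x) - (long(x) * x + 1);
        err += diff < 0 ? -diff : diff;
    }
    return err;
}

// A random-mutation hill-climber stands in for the full GA loop:
// mutate one opcode at a time and keep changes that do not hurt.
std::vector<Op> induce(int steps) {
    std::vector<Op> best(3, ADD1);
    for (int s = 0; s < steps; ++s) {
        std::vector<Op> cand = best;
        cand[std::rand() % cand.size()] = Op(std::rand() % NUM_OPS);
        if (error(cand) <= error(best)) best = cand;
    }
    return best;
}
```

The target x^2 + 1 is reachable in this instruction set (e.g. NEGATE, SQUARE, ADD1), so the search has a perfect program to find.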

Evolutionary programming

Evolutionary programming is a specialization of genetic programming with special emphasis on biological realism [Collins 92b]. The bit string is interpreted as a program that, when decoded and executed, creates an agent that interacts with an environment. The fitness function is an implicit property of the environment, which often includes other evolving agents. In contrast to most abstract genetic algorithms, the selection and mating process includes spatial structure; a given agent has a higher probability of mating with another in its proximity.
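The spatially structured mating just described can be sketched as follows. The toroidal grid and the pick_mate helper are illustrative assumptions for this sketch, not the SIMBIOSYS interface: an agent at (x, y) chooses its mate from the cells within a small radius rather than from the whole population.

```cpp
#include <cstdlib>

// Grid coordinate of a prospective mate.
struct Cell { int x, y; };

// Pick a mate uniformly from the neighbourhood of (x, y) on a
// width-by-height toroidal grid; coordinates wrap at the edges.
Cell pick_mate(int x, int y, int width, int height, int radius) {
    int dx = std::rand() % (2 * radius + 1) - radius;
    int dy = std::rand() % (2 * radius + 1) - radius;
    Cell mate;
    mate.x = (x + dx + width) % width;    // wrap around the torus
    mate.y = (y + dy + height) % height;
    return mate;
}
```

Restricting mating to a neighbourhood in this way lets locally adapted subpopulations form, which a panmictic GA cannot model.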

Cellular automata

Cellular automata (CA) represent emergent behavior in its most abstract form. CA are dynamical systems in which space and time are (usually) discrete [Culik 90, but see MacLennan 90]. The states of cells in a regular n-dimensional lattice are updated synchronously according to a set of deterministic local interaction rules. Each cell obeys the same set of rules and has a finite number of possible states.

CA are well-suited for modeling natural systems that can be described as massive collections of relatively simple objects interacting locally with one another. Since von Neumann first attempted to model a self-reproducing system with CA, they have found diverse applications in modeling complex systems in physics, chemistry and biology.
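A one-dimensional elementary CA illustrates the scheme in miniature (the helper below is an illustrative sketch, not part of SIMBIOSYS): each cell's next state is looked up in an 8-entry rule table, using Wolfram's rule numbering, indexed by the states of the cell and its two neighbours.

```cpp
#include <vector>

// One synchronous update of a 1-D elementary CA.  Each cell is 0 or 1;
// its next state depends only on (left, self, right), looked up in the
// bits of `rule` (e.g. rule 110).  The lattice wraps around (periodic
// boundary conditions), as is common in CA simulations.
std::vector<int> step(const std::vector<int>& cells, int rule) {
    int n = static_cast<int>(cells.size());
    std::vector<int> next(n);
    for (int i = 0; i < n; ++i) {
        int left   = cells[(i + n - 1) % n];
        int centre = cells[i];
        int right  = cells[(i + 1) % n];
        int index  = (left << 2) | (centre << 1) | right;  // 0..7
        next[i] = (rule >> index) & 1;
    }
    return next;
}
```

All the defining CA properties appear here: discrete space and time, a finite state set, the same deterministic rule at every cell, and strictly local interaction.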

Neural networks

A great deal of attention has been given to modeling cognitive processes with emergent computation, often in the form of an artificial neural network. Artificial neural networks, like cellular automata, exploit the emergent computation of a collection of simple processors called formal neurons or simply units.

The units are generally simplified abstractions of biological neurons connected in a directed graph. Each unit has a numerical activation value which it computes from the values of neighboring elements using a simple formula. The connections carry a numerical strength or weight. In an obvious neural allusion, positive weights are called excitatory and negative weights are called inhibitory [Smolensky 88]. A more detailed description of neural networks can be found on p.17.
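The computation performed by a single unit can be sketched as follows (an illustrative fragment, not the SIMBIOSYS neuron class): the activation is a squashing function applied to the weighted sum of the unit's inputs plus a bias term.

```cpp
#include <cmath>
#include <vector>

// Logistic squashing function mapping the net input into (0, 1).
double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Activation of one formal neuron: positive weights excite the unit,
// negative weights inhibit it.
double activation(const std::vector<double>& inputs,
                  const std::vector<double>& weights,
                  double bias) {
    double net = bias;
    for (size_t i = 0; i < inputs.size(); ++i)
        net += inputs[i] * weights[i];  // excitatory (+) or inhibitory (-)
    return sigmoid(net);
}
```

A network is then just a directed graph of such units, with each unit's output feeding the inputs of its neighbours.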

Neuronal networks

Neuronal networks are distinguished from neural networks by a more biologically plausible neuron model. Realistic simulations are based upon mathematical models of the biophysical processes underlying the operation of nerve cells taking into account such attributes as resting potential, passive and active membrane properties, and complex synaptic and dendritic interactions [Beer 90].

Artificial life and artificial intelligence

The field of artificial life brings together the themes of emergent computation and selective systems into a study of human-made systems that exhibit behaviors characteristic of natural living systems. The fields of artificial intelligence and artificial life have much in common. Both reconsider conventional notions by removing them from a natural setting and placing them in a computational perspective.

It has been suggested that artificial life may represent a foundation for artificial intelligence in that the basic repertoire of abilities an organism needs to stay alive can be used as building blocks for higher level intelligence [Belew 91]. Ecologists have long recognized that the complexity of an organism's behavior is related to the complexity of the environment it must "solve". For instance, if food is a plentiful commodity, a random walk may be a sufficient foraging strategy. As food becomes more scarce, some form of perception and locomotion control is necessary for survival. If multiple food types are required, then resource-specific strategies and priority setting (planning) become necessary.

Artificial life may also help answer criticisms leveled against artificial neural networks, which have been attacked from a philosophical perspective as ungrounded and therefore meaningless [Dreyfus 88]. Computational neuroethology has been proposed as a method of eliminating the ad hoc semantics of contemporary connectionist models [Cliff 90]. The idea is to embed the connectionist model in a sensori-motor system: future models should be situated in closed-environment simulation systems, with the output of the simulated nervous system expressed as observable behavior.

Structure of the thesis

Chapter 2, "Survey", begins with more detailed descriptions of genetic algorithms and artificial neural networks. This provides background information for readers unfamiliar with those particular areas. Then an overview of many current evolutionary simulation systems is presented. Emphasis is placed on the aspects relevant to designing a general evolutionary simulation toolkit.

Chapter 3, "Analysis", develops a framework for comparing the systems surveyed in Chapter 2. Common elements are factored out into programmatic abstractions used as the basis for designing the library classes.

Chapter 4, "Design", details the process of applying a user-centered toolkit design methodology to the problem of an evolutionary simulation class library. The principles behind object-oriented programming and class libraries are also discussed.

Chapter 5, "Implementation", describes the sets of features provided by the SIMBIOSYS classes. Following the principles developed in Chapter 4, a set of classes is created to fit into a simulation architecture. This includes flexible abstract base classes and specialized derived classes for the following:

Chapter 6, "Evaluation", tests the class library by applying it to three diverse problems. The first is an implementation of Conway's Game of Life which demonstrates some of the basic facilities provided by the class framework. The second reproduces the Genesys/Tracker simulation [Jefferson 92; Collins 92b]. Two types of ants, one driven by a finite state machine and one driven by an artificial neural network, are evolved to follow a trail of food. The third simulation implements a variation on the prisoner's dilemma problem [Axelrod 87] which adds a spatial component.

Chapter 7, "Conclusions", summarizes the contributions of the thesis and presents several areas for future work.