Chapter 4
Design

In this chapter developer-oriented design is applied to the SIMBIOSYS framework classes based on the analysis in the previous chapter. Justifications are provided for using an object-oriented framework. Finally, the objects comprising the framework are described along with their respective relationships and interactions.

Developer-oriented toolkit design

Software developers use toolkits to build better programs faster. A well-designed library not only decreases the time and effort spent programming, it also increases the quality of the resulting program by using well-constructed and principled modules. The toolkit usually provides a set of library subroutines or, increasingly, object-oriented class libraries [Booch 90]. The libraries provide either general components, which the applications developer can extend, or specialized components, which the developer can use as-is.

The success of a toolkit depends on how well it meets the varied needs of applications developers. In the design of end-user applications, the practice of user-centered design [Norman 86] has proven reasonably successful at determining the requirements for new systems. This methodology consists of observing users of the system, designing and prototyping, eliciting feedback, and repeating the process. Roseman argues that an analogous methodology can be applied to toolkit design, and the SIMBIOSYS class library was designed using the variation of user-centered design methodology described in [Roseman 93]. The seven steps comprising Roseman's approach are outlined below.

1. Specify toolkit domain

2. Identify developers

3. Identify use of toolkit

4. Consider target applications

5. Design for proper use

6. Apply design affordances

7. Iterate design

Specify toolkit domain. The first step in toolkit design is to identify the domain of applications for which it is being designed. It is an important and difficult task to generalize the features enough to be useful in a broad range of applications within the domain while, at the same time, specializing them enough to meet the specific needs of the application developers. The application domain considered in this thesis is that of ethological simulation applications, with an emphasis on evolving behavior. Non-biological applications of evolutionary programming, such as using genetic algorithms to optimize the values of a set of parameters in an equation, are explicitly excluded. Although it would have been useful to include functionality supporting the simulation of biological morphogenesis, it was also excluded to keep the scope of this project feasible for a Master's thesis.

Identify developers. The next step in user-centered toolkit design is to identify the target developers. This consideration directly affects design decisions at all levels: the platform(s) the toolkit will support, the language of implementation, the structure and architecture of the modules in the toolkit, and the selection of features to be provided. Since most ethological simulations are developed in research labs, it is assumed that the application developers will be relatively sophisticated. The toolkit will be delivered on the MS Windows platform because it is the platform most accessible to biology labs (as opposed to the UNIX platforms common in computer science labs). The language of choice for MS Windows development is C++, specifically MS Visual C++ with the Microsoft Foundation Classes (MFC). The toolkit will be built as an extension of MFC to take advantage of the facilities it already implements.

Identify use of toolkit. A toolkit may provide objects usable as-is for building end-user applications. However, the developer must still do some work to assemble the components into a program: implementation by composition. The danger with this closed module approach is that it is very difficult, if not impossible, to design a priori a set of objects that can be used to construct a broad range of applications without modification, especially for a discipline as young as evolutionary simulation.

Object-oriented technology provides a possible solution called the open module approach [Meyer 88]. A module is said to be open if it is available for extension, for example by adding new data fields or by overriding a function it performs with a different implementation. In object-oriented terminology, a module combining data with the functions that operate on that data is called a class. One class can be derived from another, in which case it inherits all the properties of the other, including its data and functions. A derived class (also called a subclass) inherits from a base class (also called a superclass) [Ellis 90]. A base class is available for extension in Meyer's terms because a derived class can add new data fields or override functions provided by the base class.

Concrete classes can be instantiated by a program; that is, a program can create objects (also called instances) using the class as a type. Abstract base classes cannot be instantiated; they can only be used as bases from which other classes inherit.

The approach taken in this project combines the two: the major components of the system will be abstract base classes designed for extension, while concrete classes derived from these base classes will be usable in new applications without modification. The derived classes will also illustrate how the base classes are meant to be used.
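To make the open module idea concrete, the following is a minimal sketch in modern C++ (C++14 or later, not the original MFC-era code). The classes Food and SpoilingFood and the method energyValue are hypothetical. It shows an abstract base class, a concrete class usable as-is, and an application-level subclass that adds a data field and overrides a function without touching the toolkit's source.

```cpp
#include <cassert>

// Abstract base class: designed for extension, cannot be instantiated.
class Thing {
public:
    virtual ~Thing() = default;
    virtual double energyValue() const = 0;  // every subclass must define this
};

// Concrete class shipped with the toolkit: usable as-is.
class Food : public Thing {
public:
    explicit Food(double energy) : energy_(energy) {}
    double energyValue() const override { return energy_; }

private:
    double energy_;
};

// An application developer extends Food: a new data field (age_) plus an
// overridden function that reuses the base-class implementation.
class SpoilingFood : public Food {
public:
    SpoilingFood(double energy, int age) : Food(energy), age_(age) {}
    double energyValue() const override {
        double left = Food::energyValue() - age_;  // loses one unit per unit of age
        return left > 0 ? left : 0;
    }

private:
    int age_;
};
```

Because energyValue is virtual, code holding a `const Thing&` automatically calls the overriding version, so extensions slot in without changes to existing callers.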

Consider target applications. The task of identifying which components to include in the framework was accomplished by informal toolkit induction, by analogy with category induction: by analyzing existing applications in the same domain, it is possible to extract common components that are amenable to reification. This was done in the previous chapter; however, the scope of the framework goes beyond merely recreating what has already been done, so during the analysis an effort was also made to consider possible future applications.

Design for proper use. The previous steps all work towards identifying the set of components and their respective features to include in the toolkit. Beyond the set of components, a framework must also define the relationships between components and the ways in which the components interact dynamically. A toolkit should embody a philosophy of how applications should be developed using the toolkit, and should encourage developers to build programs properly [Roseman 93].

Apply design affordances. A design affordance is a property of a toolkit object that suggests how it can be used. For example, the raised appearance of a pushbutton control in a graphical user interface suggests "pushing" to the user. The same principle can be applied to framework components which suggest appropriate uses of the toolkit to application developers.

The SIMBIOSYS framework applies design affordances in the architecture of its components. The classes were designed so that it is easy for the developer to combine instances in the intended way. This is done by defining methods for building a memory-resident model of the simulation, consisting of instances of the various framework classes and subclasses. For example, the World class implements a method to add Things to the World. Since the method's parameter is of type Thing, only instances of Thing or of its subclasses can be added to the world through that method, which makes the intended use both easy and natural for the programmer.
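The type-based affordance can be sketched as follows. This is a minimal illustration in modern C++, assuming a hypothetical `World::add` that takes ownership of a `Thing` through `std::unique_ptr`; the original framework's method names and signatures may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the framework classes.
class Thing {
public:
    virtual ~Thing() = default;
    virtual std::string kind() const = 0;
};

class Food : public Thing {
public:
    std::string kind() const override { return "food"; }
};

class World {
public:
    // The parameter type is the affordance: only a Thing (or a subclass of
    // Thing) can be passed here, so the compiler steers developers toward
    // the intended way of populating the model.
    void add(std::unique_ptr<Thing> thing) { things_.push_back(std::move(thing)); }
    std::size_t size() const { return things_.size(); }

private:
    std::vector<std::unique_ptr<Thing>> things_;
};
```

Passing anything that is not a Thing, for example `std::make_unique<int>(3)`, is rejected at compile time.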

Iterate design. It is an unfortunate fact that, for any sufficiently complex design, the designer is unlikely to get it right the first time. There are too many uncertainties in the assumptions made, and a perfectly rational design decision may be invalidated in light of new information. The developer-oriented approach to this problem is to apply a sort of supervised genetic algorithm to the toolkit itself: apply a preliminary design to standard applications, identify problems, redesign and iterate.

The original incarnation of the SIMBIOSYS framework was a program written for DOS on an IBM XT, based on [Nolfi 90]. The classes were later ported to a UNIX/Motif platform and generalized for two graduate course term projects. Recently the framework has moved back to a PC platform, albeit to Microsoft Windows this time, and it is still undergoing incremental revisions. One goal of this thesis is to develop the toolkit to the point where an initial release can be given to potential developers. It is important to note that this is merely the first iteration of developer-oriented design.

Classes and frameworks

Class libraries provide a practical means of fulfilling the simulation system criteria outlined at the end of the previous chapter. Modularity is the key to the aims of reusability and extensibility [Meyer 88; Stroustrup 86]. Modular design facilitates the achievement of the following design goals:

decomposability - breaking a problem down into several subproblems,

composability - constructing a system from several subsystems,

understandability - minimizing the number of related modules needed to understand a particular module,

continuity - minimizing the number of modules that need to be changed to make a small change in a problem specification,

protection - minimizing the number of modules affected by a run-time exception.

A framework is a significant collection of collaborating classes that capture both the small-scale patterns and major mechanisms that, in turn, implement the common requirements and design in a specific application domain.

The subsystems identified in the analysis of the existing evolutionary simulations make good candidates for abstract base classes in the SIMBIOSYS class framework. An abstract base class cannot be instantiated in an application. Instead, the application programmer must derive a subclass from the abstract base class and implement the one or more methods that the base class declares but does not implement. The purpose of the abstract base class is twofold. First, it defines an interface: the set of routines other objects will use to communicate with subclasses of the base class. Second, it provides some of the functionality needed by the subclasses. Taken as a group, the abstract base classes in the SIMBIOSYS framework define an architecture for the simulations.
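A minimal C++ sketch of this twofold role, using a hypothetical `Instrument` base whose `sample` method is shared functionality and whose pure virtual `record` method must be supplied by each subclass. The names and the sampling-interval behavior are illustrative assumptions, not part of the SIMBIOSYS design described here.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Abstract base class: fixes the interface other objects use (sample) and
// supplies shared behavior (only acting every `interval` cycles), leaving the
// subclass-specific part (record) unimplemented.
class Instrument {
public:
    explicit Instrument(int interval) : interval_(interval) {}
    virtual ~Instrument() = default;

    // Shared functionality, implemented once in the base class.
    void sample(int cycle) {
        if (cycle % interval_ == 0) record(cycle);
    }

protected:
    // Pure virtual: every concrete subclass must provide this method.
    virtual void record(int cycle) = 0;

private:
    int interval_;
};

// A concrete subclass only has to fill in the missing method.
class CycleLog : public Instrument {
public:
    using Instrument::Instrument;  // inherit the constructor
    std::vector<int> cycles;

protected:
    void record(int cycle) override { cycles.push_back(cycle); }
};

// The compiler enforces the "cannot be instantiated" rule.
static_assert(std::is_abstract<Instrument>::value, "Instrument is abstract");
static_assert(!std::is_abstract<CycleLog>::value, "CycleLog is concrete");
```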

The Environment

An abstract base class representing the environment is required. It is responsible for the "physics" of the world: space and time. It is also responsible for keeping track of all objects in the world and for controlling which object has the thread of control. Objects contained in the world will be able to query the world about what other objects are in their local environment. Subclasses derived from the generic World class will implement the specifics, such as the geometry of space. Possible subclasses include a two-dimensional discrete world and a two-dimensional continuous world.
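The World responsibilities above might be sketched like this, under the assumption that space is a grid of integer cells and that "local" means within a square neighborhood. `GridWorld`, `nearby` and `tick` are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Minimal object record for this sketch.
struct Thing {
    int id;
    int x, y;  // cell coordinates in a discrete world
};

// Abstract World: owns time and the object list, and answers
// local-environment queries whose geometry subclasses define.
class World {
public:
    virtual ~World() = default;
    void add(const Thing& t) { things_.push_back(t); }
    void tick() { ++time_; }  // time advances only through the world
    long time() const { return time_; }

    // Ids of other objects within `radius` of `self`.
    virtual std::vector<int> nearby(const Thing& self, int radius) const = 0;

protected:
    std::vector<Thing> things_;
    long time_ = 0;
};

// Two-dimensional discrete world: the neighborhood is a square of cells,
// excluding the querying object itself.
class GridWorld : public World {
public:
    std::vector<int> nearby(const Thing& self, int radius) const override {
        std::vector<int> ids;
        for (const Thing& t : things_) {
            if (t.id == self.id) continue;
            if (std::abs(t.x - self.x) <= radius && std::abs(t.y - self.y) <= radius)
                ids.push_back(t.id);
        }
        return ids;
    }
};
```

A continuous world would override only `nearby`, for example using Euclidean distance on floating-point positions, leaving the bookkeeping in the base class untouched.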

Denizens of the World

The library requires several classes to represent the objects which inhabit the environment. All of them will be derived from an abstract base class, Thing, which will define the protocols for agent/environment and agent/agent interaction, implemented with a form of dynamic binding. This top-level class will also store the information common to all objects in the world, including position, orientation, and stored energy. Although subclasses of Thing will have read access to this information, only the world object will be able to change it.

Possible subclasses include obstacles which prevent passage, food which stores energy, and agents which are described in the next section. All subclasses will have to implement a method to draw themselves in a window for animation/visualization purposes.
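One way to express the read-only rule and the drawing requirement in C++ is sketched below. It assumes, hypothetically, that World is declared a friend of Thing so that only it can write position and energy, and it uses an output stream as a stand-in for a real window's drawing surface.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

class World;  // only the world may change a Thing's state

// Abstract base for everything that inhabits the world.
class Thing {
public:
    virtual ~Thing() = default;

    // Read access for subclasses and other objects.
    double x() const { return x_; }
    double y() const { return y_; }
    double heading() const { return heading_; }
    double energy() const { return energy_; }

    // Every subclass must know how to display itself.
    virtual void draw(std::ostream& out) const = 0;

private:
    friend class World;  // write access is reserved for the world object
    double x_ = 0, y_ = 0, heading_ = 0, energy_ = 0;
};

class Obstacle : public Thing {
public:
    void draw(std::ostream& out) const override { out << '#'; }
};

class Food : public Thing {
public:
    void draw(std::ostream& out) const override { out << '*'; }
};

class World {
public:
    void place(Thing& t, double x, double y) { t.x_ = x; t.y_ = y; }
    void setEnergy(Thing& t, double e) { t.energy_ = e; }
};

// Convenience for logging: render a thing to a string.
std::string render(const Thing& t) {
    std::ostringstream out;
    t.draw(out);
    return out.str();
}
```

Friendship is not inherited, so Obstacle and Food can read these fields but cannot change them.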

Agents

Agents are the most interesting part of the simulation: the objects in the world that display some kind of active, autonomous behavior. Whereas all other objects in the world are passive, agents will have the ability to perceive their local environment and act. Every agent will store an instance of a Program class, which drives its behavior.

The definition of the Agent class is necessarily abstract: it only defines an interface through which the world object mediates communication and interaction between agents and their environment. Subclasses of Agent will be responsible for creating an instance of Program and for using that program instance to determine their intentions.
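The division of labor between Agent and its subclasses can be sketched as follows. The Program interface, the `Invert` program and the convention that an intention is the index of the largest output are all illustrative assumptions.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

using Vector = std::vector<double>;

// Hypothetical Program protocol: the agent's "brain".
class Program {
public:
    virtual ~Program() = default;
    virtual Vector run(const Vector& input) = 0;
};

// Abstract Agent: it stores a Program, but which Program to create and how
// to turn its output into an intention is left to subclasses.
class Agent {
public:
    virtual ~Agent() = default;
    virtual int intention(const Vector& perception) = 0;  // called by the world

protected:
    explicit Agent(std::unique_ptr<Program> program) : program_(std::move(program)) {}
    Program& program() { return *program_; }

private:
    std::unique_ptr<Program> program_;
};

// A trivial program used only for illustration: it inverts its inputs.
class Invert : public Program {
public:
    Vector run(const Vector& input) override {
        Vector out;
        for (double x : input) out.push_back(1.0 - x);
        return out;
    }
};

// Concrete agent: creates its own Program and maps the output to an
// intention code (the index of the largest output).
class SimpleAgent : public Agent {
public:
    SimpleAgent() : Agent(std::make_unique<Invert>()) {}
    int intention(const Vector& perception) override {
        Vector out = program().run(perception);
        int best = 0;
        for (int i = 1; i < static_cast<int>(out.size()); ++i)
            if (out[i] > out[best]) best = i;
        return best;
    }
};
```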

Programs

When the agent is given the thread of control by the world instance, it will query the world for some relevant information from its local environment (e.g. what objects are in the area directly in front of me?), and encode this information into an input vector which is passed to the program instance. The program then uses this input to generate a corresponding output vector which the agent translates into an intention that is passed on to the world.
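The encode, run and decode sequence of a single agent turn can be illustrated with the hypothetical sketch below. It assumes a perception consisting only of what occupies the cell directly ahead, a one-hot input encoding, a winner-take-all decoding into four intentions, and a hand-written function standing in for a Program instance.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <iterator>
#include <vector>

// What the world reports about the cell ahead, and the intentions an agent
// can hand back to the world (both hypothetical).
enum class Seen { Empty, Food, Obstacle, Agent };
enum class Intention { MoveForward, TurnLeft, TurnRight, Eat };

using Vector = std::vector<double>;
using ProgramFn = std::function<Vector(const Vector&)>;  // stand-in for a Program

// Encode the perception as a one-hot input vector (one slot per Seen value).
Vector encode(Seen ahead) {
    Vector in(4, 0.0);
    in[static_cast<int>(ahead)] = 1.0;
    return in;
}

// Decode the program's output: the strongest of the four outputs wins.
Intention decode(const Vector& out) {
    assert(out.size() == 4);
    auto best = std::max_element(out.begin(), out.end());
    return static_cast<Intention>(std::distance(out.begin(), best));
}

// One "perceive, think, intend" step for a single agent.
Intention decide(Seen ahead, const ProgramFn& program) {
    return decode(program(encode(ahead)));
}

// Hand-written stand-in for an evolved program: eat food, turn away from
// obstacles, otherwise keep moving.
Vector simpleProgram(const Vector& in) {
    Vector out(4, 0.0);
    if (in[1] > 0) out[3] = 1.0;       // Food ahead -> Eat
    else if (in[2] > 0) out[1] = 1.0;  // Obstacle ahead -> TurnLeft
    else out[0] = 1.0;                 // otherwise -> MoveForward
    return out;
}
```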

The abstract base class Program will define the protocol used to communicate between the agent and its program. In this respect, the program is acting as the agent's brain. Subclasses of Program will implement specific types of programs, e.g. a finite-state machine, a Turing machine, a feed-forward neural network, a recurrent neural network, etc.
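As an example of one such subclass, the following sketch implements the Program protocol with a single-layer feed-forward network of threshold units. The weight layout (one row per output with a trailing bias) and the step activation are assumptions chosen for brevity; the text does not prescribe them.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Vector = std::vector<double>;

// Hypothetical Program protocol: input vector in, output vector out.
class Program {
public:
    virtual ~Program() = default;
    virtual Vector run(const Vector& input) = 0;
};

// Single-layer feed-forward network with threshold units. Weights are stored
// row-major, one row per output, each row ending with a bias term.
class FeedForwardNet : public Program {
public:
    FeedForwardNet(std::size_t inputs, std::size_t outputs, Vector weights)
        : in_(inputs), out_(outputs), w_(std::move(weights)) {
        assert(w_.size() == out_ * (in_ + 1));
    }

    Vector run(const Vector& input) override {
        assert(input.size() == in_);
        Vector output(out_);
        for (std::size_t j = 0; j < out_; ++j) {
            const double* row = &w_[j * (in_ + 1)];
            double sum = row[in_];  // bias term
            for (std::size_t i = 0; i < in_; ++i) sum += row[i] * input[i];
            output[j] = sum > 0.0 ? 1.0 : 0.0;  // step activation
        }
        return output;
    }

private:
    std::size_t in_, out_;
    Vector w_;
};
```

For instance, the weights {1, 1, -1.5, 1, 1, -0.5} make the two outputs compute AND and OR of two binary inputs. Other subclasses, such as a finite-state machine, would implement the same `run` protocol differently.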

Genetics

The library will contain several classes which together implement evolutionary mechanisms. An abstract base class, Genotype, will define several operations common to all types of genotypes, including mutation and crossover operators and methods to query the values at specified positions. A haploid subclass will store a single bitstring, while a diploid subclass will store two bitstrings; the latter will require a different crossover operator.
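The Genotype hierarchy might look like the sketch below. Bit strings are stored as `std::vector<bool>`; one-point crossover, per-bit mutation and the "1 is dominant" expression rule for diploids are assumptions for illustration, as is the diploid crossover scheme in which each parent recombines its own two strands into one gamete.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

using Bits = std::vector<bool>;

// Abstract Genotype: shared operators live here as helpers.
class Genotype {
public:
    virtual ~Genotype() = default;
    virtual std::size_t length() const = 0;
    virtual bool at(std::size_t pos) const = 0;  // query the expressed value
    virtual void mutate(double rate, std::mt19937& rng) = 0;

protected:
    // Flip each bit independently with probability `rate`.
    static void flipBits(Bits& s, double rate, std::mt19937& rng) {
        std::bernoulli_distribution flip(rate);
        for (std::size_t i = 0; i < s.size(); ++i)
            if (flip(rng)) s[i] = !s[i];
    }
    // One-point crossover of two equal-length strings at position `cut`.
    static Bits onePoint(const Bits& a, const Bits& b, std::size_t cut) {
        assert(a.size() == b.size() && cut <= a.size());
        const auto c = static_cast<std::ptrdiff_t>(cut);
        Bits child(a.begin(), a.begin() + c);
        child.insert(child.end(), b.begin() + c, b.end());
        return child;
    }
};

class Haploid : public Genotype {
public:
    explicit Haploid(Bits s) : strand_(std::move(s)) {}
    std::size_t length() const override { return strand_.size(); }
    bool at(std::size_t pos) const override { return strand_.at(pos); }
    void mutate(double rate, std::mt19937& rng) override { flipBits(strand_, rate, rng); }

    // Crossover between the single strands of two parents.
    static Haploid cross(const Haploid& a, const Haploid& b, std::size_t cut) {
        return Haploid(onePoint(a.strand_, b.strand_, cut));
    }

private:
    Bits strand_;
};

class Diploid : public Genotype {
public:
    Diploid(Bits a, Bits b) : a_(std::move(a)), b_(std::move(b)) {
        assert(a_.size() == b_.size());
    }
    std::size_t length() const override { return a_.size(); }
    // Assumed expression rule: allele 1 is dominant.
    bool at(std::size_t pos) const override { return a_.at(pos) || b_.at(pos); }
    void mutate(double rate, std::mt19937& rng) override {
        flipBits(a_, rate, rng);
        flipBits(b_, rate, rng);
    }

    // Each parent recombines its two strands into one gamete; the child
    // receives one gamete from each parent.
    static Diploid cross(const Diploid& m, const Diploid& f, std::size_t cutM, std::size_t cutF) {
        return Diploid(onePoint(m.a_, m.b_, cutM), onePoint(f.a_, f.b_, cutF));
    }

private:
    Bits a_, b_;
};
```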

A Phenotype subclass of Agent will store an instance of Genotype, which it will use to build itself before it is added to the world. The obvious application of this is for a phenotype to use its genotype to specify its own program. In this way, the behavior of the population will evolve over time, adapting to its environment to increase its fitness, however fitness is defined.
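A minimal sketch of a phenotype building its program from its genotype, assuming a hypothetical encoding in which each 4-bit field is read as a two's-complement integer in [-8, 7] and scaled by 0.25 to give one network weight. The Agent base class is omitted for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Bits = std::vector<bool>;

// Hypothetical genotype-to-program mapping: every 4 bits encode one weight.
std::vector<double> decodeWeights(const Bits& genes) {
    assert(genes.size() % 4 == 0);
    std::vector<double> weights;
    for (std::size_t i = 0; i < genes.size(); i += 4) {
        int v = 0;
        for (std::size_t k = 0; k < 4; ++k) v = (v << 1) | (genes[i + k] ? 1 : 0);
        if (v >= 8) v -= 16;  // two's-complement reading of the 4-bit field
        weights.push_back(v * 0.25);
    }
    return weights;
}

// A Phenotype builds its program parameters from its genotype on
// construction, i.e. before it is added to the world.
class Phenotype {
public:
    explicit Phenotype(Bits genotype)
        : genotype_(std::move(genotype)), weights_(decodeWeights(genotype_)) {}

    const std::vector<double>& programWeights() const { return weights_; }
    double fitness = 0.0;  // however the application chooses to define it

private:
    Bits genotype_;                 // declared first, so initialized first
    std::vector<double> weights_;
};
```

With this encoding, the bits 0111 decode to 1.75 and 1000 to -2.0.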

Instruments

This category of framework classes implements the user interface of the simulation. Since standard windowing platforms already have well-established class frameworks for interface programming, these classes will have to be designed to complement the native window classes rather than replace them. As such, they act more as an interface to the window classes, which the rest of the framework can use for graphical display and user interaction.

The main instrument subclass and primary means for the user to visualize the simulation will be the Map class. It will be responsible for displaying an animation of the world instance with all the objects in the world. Another instrument will be the line graph which will display the history of one or more relevant system variables such as population size or average fitness.

The Simulation

Finally, a class is needed to represent the simulation itself. Its responsibilities include keeping track, directly or indirectly, of all other objects in the application; mediating the thread of control; and starting, pausing, stopping, saving and restoring the simulation. Functions and objects within the application but outside the simulation framework (such as user-interface components, e.g. a pushbutton callback function) will be able to manipulate the simulation through this object.
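The control responsibilities of the Simulation object can be sketched as a small state machine. This is a hypothetical illustration: save and restore are reduced to a single cycle counter, whereas a real implementation would serialize the whole model.

```cpp
#include <cassert>
#include <string>

// Facade through which UI code (e.g. a pushbutton handler) drives the run.
class Simulation {
public:
    enum class State { Stopped, Running, Paused };

    void start()  { cycle_ = 0; state_ = State::Running; }
    void pause()  { if (state_ == State::Running) state_ = State::Paused; }
    void resume() { if (state_ == State::Paused) state_ = State::Running; }
    void stop()   { state_ = State::Stopped; }

    // Advance one action cycle, but only while running.
    void step() { if (state_ == State::Running) ++cycle_; }

    // Save/restore reduced to the cycle counter for this sketch; a restored
    // simulation starts out paused.
    std::string save() const { return std::to_string(cycle_); }
    void restore(const std::string& s) { cycle_ = std::stol(s); state_ = State::Paused; }

    State state() const { return state_; }
    long cycle() const { return cycle_; }

private:
    State state_ = State::Stopped;
    long cycle_ = 0;
};
```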

Static Architecture

Figure 18. Simulation Model Entity Relationship Diagram

This section will describe the relationships the instances of the framework classes will have with one another. An entity relationship diagram illustrating the simulation model is shown in Figure 18. At the top level the application is represented by a single simulation object. The simulation contains a world, one or more populations and any number of instruments. Each population consists of a collection of phenotypes, each of which in turn stores a program (because phenotypes are agents) and a genotype. The world object has a collection of things including the phenotypes of each population.
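The containment relationships of Figure 18 can be mirrored in C++ ownership, as in this hypothetical sketch: the simulation owns the world, populations and instruments, each population owns its phenotypes, and the world holds non-owning pointers to the things it contains. The Agent level and all member details are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Program {};
struct Genotype {};
struct Thing { virtual ~Thing() = default; };
struct Instrument { virtual ~Instrument() = default; };

// A phenotype is a thing (via Agent, omitted here) storing a program and a genotype.
struct Phenotype : Thing {
    std::unique_ptr<Program> program = std::make_unique<Program>();
    std::unique_ptr<Genotype> genotype = std::make_unique<Genotype>();
};

struct Population {
    std::vector<std::unique_ptr<Phenotype>> members;  // owns its phenotypes
};

struct World {
    std::vector<Thing*> things;  // non-owning: includes every phenotype
};

struct Simulation {
    World world;                                           // exactly one world
    std::vector<Population> populations;                   // one or more
    std::vector<std::unique_ptr<Instrument>> instruments;  // any number

    // Create a phenotype in population `pop` and register it with the world.
    Phenotype& spawn(std::size_t pop) {
        populations.at(pop).members.push_back(std::make_unique<Phenotype>());
        Phenotype& p = *populations.at(pop).members.back();
        world.things.push_back(&p);
        return p;
    }
};
```

Because each phenotype is heap-allocated, the world's pointers stay valid even when a population's vector grows.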

Dynamic Interaction

This section describes the built-in behavior of the system once the static structure is in place. An important consideration here is that the framework is being designed for a windows-based platform, which is intrinsically event-driven. This means that the framework classes cannot keep the thread of control for the whole application; control must return to the underlying windowing system reasonably often to allow for display output (redrawing windows and controls) and user interaction (e.g. menu selection).
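One common way to honor this constraint is cooperative time slicing, sketched below with hypothetical names: the host calls `runSlice` repeatedly, for example from an MFC `CWinApp::OnIdle` override or a timer handler, and each call performs a bounded number of action cycles before returning.

```cpp
#include <cassert>

// Hypothetical cooperative runner: never keeps the thread of control for
// more than `maxCycles` action cycles at a time.
class SlicedSimulation {
public:
    explicit SlicedSimulation(long totalCycles) : remaining_(totalCycles) {}

    // Returns true while work remains, i.e. the host should call again.
    bool runSlice(int maxCycles) {
        for (int i = 0; i < maxCycles && remaining_ > 0; ++i) {
            actionCycle();
            --remaining_;
        }
        return remaining_ > 0;
    }

    long done() const { return done_; }

private:
    void actionCycle() { ++done_; }  // placeholder for perceive/act/resolve
    long remaining_;
    long done_ = 0;
};
```

Between calls the windowing system is free to repaint and dispatch menu commands; the slice size trades responsiveness against overhead.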

Simulation Cycles

A simulation run consists of the execution of a hierarchy of cycles. At the top level, typically hundreds or thousands of breeding cycles are executed. Each breeding cycle consists of one or more environmental cycles, each of which is in turn composed of a number of action cycles (terminology borrowed from [MacLennan 93]). During an action cycle, each agent determines an intention by perceiving its local environment, sending its perception to its internal program, and translating the output of the program into a specific intention. The intentions are collected by the world object, which resolves them into actions that determine the next state of the world. This interaction is displayed schematically in Figure 19.
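The nesting of cycles and the two-phase structure of an action cycle (collect all intentions, then resolve) can be sketched with a toy world: agents on a one-dimensional track of cells 0 to 9 that always intend to move right. The cycle counts and the resolution rule are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cycle counts; the text does not fix these values.
struct CycleCounts { int breeding, environmental, action; };

// Tallies of how often each level of the hierarchy ran.
struct Tally { long actions = 0, environments = 0, breedings = 0; };

const int kTrackEnd = 9;  // toy world: cells 0..9 on a one-dimensional track

// Action cycle: first every agent forms an intention (here simply "move
// right by one"), then the world resolves all intentions together, in this
// toy case by stopping agents at the end of the track.
void actionCycle(std::vector<int>& positions, Tally& t) {
    std::vector<int> intentions(positions.size(), +1);
    for (std::size_t i = 0; i < positions.size(); ++i)
        positions[i] = std::min(positions[i] + intentions[i], kTrackEnd);
    ++t.actions;
}

void run(const CycleCounts& n, std::vector<int>& positions, Tally& t) {
    for (int b = 0; b < n.breeding; ++b) {
        for (int e = 0; e < n.environmental; ++e) {
            for (int a = 0; a < n.action; ++a) actionCycle(positions, t);
            ++t.environments;  // statistics, fitness evaluation, world events
        }
        ++t.breedings;         // selection and reproduction
    }
}
```

Collecting every intention before applying any of them means the outcome of a cycle does not depend on the order in which agents are visited.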

Figure 19. Action Cycle Interaction

After a number of action cycles, an environmental cycle is executed. At this point statistics may be collected and calculated, the fitnesses of phenotypes are calculated, and processes independent of the actions of the agents may change the world in a significant way. After a number of environmental cycles, a breeding cycle takes place. For each population in the world, the phenotypes are sorted by fitness. Parents are chosen, and selected members of the population are replaced with new phenotypes derived from those parents.
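A breeding cycle of this kind can be sketched with truncation selection: sort by fitness, keep the best `survivors`, and overwrite the rest with offspring of pairs of survivors. The selection scheme, the fixed crossover point and the pairing rule are assumptions; the text above does not specify how parents are chosen.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Each phenotype reduced to a genome and a fitness value for this sketch.
struct Individual {
    std::vector<bool> genome;
    double fitness = 0.0;
};

// One-point crossover at a fixed cut, to keep the sketch deterministic.
std::vector<bool> crossover(const std::vector<bool>& a, const std::vector<bool>& b,
                            std::size_t cut) {
    assert(a.size() == b.size() && cut <= a.size());
    const auto c = static_cast<std::ptrdiff_t>(cut);
    std::vector<bool> child(a.begin(), a.begin() + c);
    child.insert(child.end(), b.begin() + c, b.end());
    return child;
}

// Sort by fitness (best first), keep the top `survivors`, and replace the
// rest with offspring of consecutive pairs of survivors.
void breed(std::vector<Individual>& pop, std::size_t survivors, std::size_t cut) {
    assert(survivors >= 2 && survivors <= pop.size());
    std::sort(pop.begin(), pop.end(),
              [](const Individual& x, const Individual& y) { return x.fitness > y.fitness; });
    for (std::size_t i = survivors, k = 0; i < pop.size(); ++i, ++k) {
        const Individual& mom = pop[k % survivors];
        const Individual& dad = pop[(k + 1) % survivors];
        pop[i].genome = crossover(mom.genome, dad.genome, cut);
        pop[i].fitness = 0.0;  // offspring have not been evaluated yet
    }
}
```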