Several methods to incorporate semantic awareness in genetic programming have been proposed in the last few years. These methods cover fundamental parts of the evolutionary process: from the population initialization, through different ways of modifying or extending the existing genetic operators, to formal methods, until the definition of completely new genetic operators. The objectives are also distinct: from the maintenance of semantic diversity to the study of semantic locality; from the use of semantics for constructing solutions which obey certain constraints to the exploitation of the geometry of the semantic topological space aimed at defining easy-to-search fitness landscapes. All these approaches have shown, in different ways and amounts, that incorporating semantic awareness may help improving the power of genetic programming. This survey analyzes and discusses the state of the art in the field, organizing the existing methods into different categories. It restricts itself to studies where semantics is intended as the set of output values of a program on the training data, a definition that is common to a rather large set of recent contributions. It does not discuss methods for incorporating semantic information into grammar-based genetic programming or approaches based on formal methods. The objective is keeping the community updated on this interesting research track, hoping to motivate new and stimulating contributions.
Geometric semantic operators are new and promising genetic operators for genetic programming. They have the property of inducing a unimodal error surface for any supervised learning problem, i.e., any problem consisting in finding the match between a set of input data and known target values (like regression and classification). Thanks to an efficient implementation of these operators, it was possible to apply them to a set of real-life problems, obtaining very encouraging results. We have now made this implementation publicly available as open source software, and here we describe how to use it. We also reveal details of the implementation and perform an investigation of its efficiency in terms of running time and memory occupation, both theoretically and experimentally. The source code and documentation are available for download at http://gsgp.sourceforge.net.
We describe a broad class of problems, called “uncompromising problems,” which are characterized by the requirement that solutions must perform optimally on each of many test cases. Many of the problems that have long motivated genetic programming research, including the automation of many traditional programming tasks, are uncompromising. We describe and analyze the recently proposed “lexicase” parent selection algorithm and show that it can facilitate the solution of uncompromising problems by genetic programming. Unlike most traditional parent selection techniques, lexicase selection does not base selection on a fitness value that is aggregated over all test cases; rather, it considers test cases one at a time in random order. We present results comparing lexicase selection to more traditional parent selection methods, including standard tournament selection and implicit fitness sharing, on four uncompromising problems: 1) finding terms in finite algebras; 2) designing digital multipliers; 3) counting words in files; and 4) performing symbolic regression of the factorial function. We provide evidence that lexicase selection maintains higher levels of population diversity than other selection methods, which may partially explain its utility as a parent selection algorithm in the context of uncompromising problems.
Energy consumption has long been emphasized as an important policy issue in today's economies. In particular, the energy efficiency of residential buildings is considered a top priority of a country's energy policy. The paper proposes a genetic programming-based framework for estimating the energy performance of residential buildings. The objective is to build a model able to predict the heating load and the cooling load of residential buildings. An accurate prediction of these parameters facilitates a better control of energy consumption and, moreover, it helps choosing the energy supplier that better fits the energy needs, which is considered an important issue in the deregulated energy market. The proposed framework blends a recently developed version of genetic programming with a local search method and linear scaling. The resulting system enables us to build a model that produces an accurate estimation of both considered parameters. Extensive simulations on 768 diverse residential buildings confirm the suitability of the proposed method in predicting heating load and cooling load. In particular, the proposed method is more accurate than the existing state-of-the-art techniques.
Recent interest in the development and use of non-trivial benchmark problems for genetic programming research has highlighted the scarcity of general program synthesis (also called "traditional programming") benchmark problems. We present a suite of 29 general program synthesis benchmark problems systematically selected from sources of introductory computer science programming problems. This suite is suitable for experiments with any program synthesis system driven by input/output examples. We present results from illustrative experiments using our reference implementation of the problems in the PushGP genetic programming system. The results show that the problems in the suite vary in difficulty and can be useful for assessing the capabilities of a program synthesis system.