I have been getting to grips with the tidy LSPM package from the folks at FOSS trading. One of the issues I needed to understand better was the optimization methods used by LSPM – I was finding that the optimizations were taking huuuuge amounts of time and were often unstable (i.e. re-running them gave significantly different results). So, while, I originally planned on posting about LSPM itself with some suggestions as to how best to use it, I need to lead in with a post about the differential evolution optimization package used by LSPM.

## Introduction

What follows is a basic introduction to the Differential Evolution Algorithm, originally designed by Reiner Storn and Kenneth Price. I cannot find a web presence for Mr Price, so a link to the book he wrote on the subject will have to suffice for the time being!

## How Does It Work?

As always, my favorite approach is to consider the simplest example necessary to illustrate how something works. In this case that would be a 2-D decision space. All evolutionary algorithms follow the same basic pattern:

Initialize population with size, NP Evaluate each member of initial population wrt objective DO UNTIL (termination criteria are met) Create mutant population Cross mutants with existing population Evaluate progeny wrt objective Create new generation, replacing weak with strong Report results

In the previously explored Harmony Search Algorithm the population is updated one at a time. That is, we create a single new candidate from the existing population and see if it “knocks out” the weakest current member. However, Differential Evolution creates one new candidate (or “Trial Vector”) for **each** existing member (“Candidate Vector”) of the population, so there are NP potential new solutions. As we shall see, DE pits the trial against the candidate from which it was derived and keeps the “fitter” vector. If you think about it, both methods ensure that strong new candidates do not “wipe out” all of the weaker members of the existing populations – even weak population members have value in their contribution to bio-diversity!

Let’s take a look at a 2-D example:

### Mutation

The diagram on the left shows the “mutation” step. x1 and x2 are the axes of the decision space – i.e. the place we are going to look for the optimum solution to some objective function. So, __Xi__ = (xi1, xi2), called a “candidate vector” represents a solution that currently exists in the population, there are NP such solutions in each generation. I have left __Xi__ off the left hand diagram for clarity. __Xr0__, __Xr1__ and __Xr2__ are randomly selected **different** members of the population. __d__ is the vector connecting __Xr1__ and __Xr2__. __d__ is multiplied by the scale factor, F, and added to __Xr0__. The result is called the “mutant vector”, __V__. __Xr0__ is referred to as the “parent vector”. F usually takes a value between 0 and 2.

### Crossover

The diagram on the right illustrates the “crossover” step in which the mutant vector and the candidate vector are “crossed” to see if a fitter “child” results. The new candidate is called the “trial vector”. The algorithm sets the values of the elements of the trial vector by randomly choosing the corresponding element from the mutant or the candidate. The probability of selecting the element from the mutant is equal to CR, the “crossover rate”. Clearly there is a finite probability that none of the mutant vector’s elements are selected: (1 – CR)^2 in the 2-D case. To avoid this, the algorithm randomly chooses from {1, 2} with uniform probability and that element is taken from the mutant vector. This means that the trial vector never equals the candidate vector. The trial vector equals the mutant vector with probability CR (because at least 1 element will change with probability 1). The two other possibilities for the trial vector (moving on the x1 axis OR moving on the x2 axis) occur with probabilities 0.5.(1 – CR).

### Procedure

In our simple example, we would repeat the steps above for all NP members of the current generation. Once complete we evaluate the objective function for each of the trial vectors and compare each outcome to the outcome for the candidate vector (which we would have kept “on file” from a previous iteration). If the trial objective function value is better than that produced by the candidate, then the trial vector replaces the candidate. In this way a new generation of NP candidates is ready for the next iteration.

We stop when we have reached a pre-set maximum number of iterations, when the objective function has reached a pre-set target, or the objective function has not improved for a certain number of iterations or by at least a certain amount.

## Discussion

We can deduce a number of things from this simple example:

- Larger values of NP lead to more thorough exploration of the decision space and greater probability of finding the global minimum.
- The minimum population size is 4 (3 to create the mutant vector plus the candidate). i.e. NP > 3
- The lower the value of F, the smaller the perturbations on the parent vector. This has the risk of getting stuck at a local minimum. Larger values of F can lead to “thrashing around” the decision space
- Low values of CR lead to trial vectors moving away from the candidate parallel to the axes. This is good if the change in the objective function as a result of a change in x1 is not affected by the value of x2. In trading system design there is often strong interaction between the parameters, so higher values of CR work better.
- As the population homes in on the objective function minimum, the vectors get closer together, thus |
__d__| decreases and the granularity of the search improves. i.e. the algorithm is adaptive to the state of the search!

I want to draw attention to a couple of additional differences in DE vs Harmony Search (How does it work?). DE draws its “genetic material” for each trial vector from only a subset of the population (4 vectors), whereas HS draws at random from the entire population when building a trial vector. I think this means the mutant vectors will be closer in the decision space to the existing generation. In addition, HS does the crossover first, then mutates the result to get a trial vector – I don’t see he significance of this right now. Finally, the degree of mutation in HS does not change over the generations.

There are all kinds of variations on this general DE theme. They likely are suited more to one problem than another. Some variations include adding in some scaling of the distance between the parent and candidate vectors when creating the mutant vector. Also, one could weight the probabilities depending upon the fitness of the various vectors, etc. Rather than comparing a trial vector to a specific candidate vector, one can simply rank the current and proposed generation and take the top half. My sneaking suspicion is that additional complication leads to increasing specialization; the more decisions that have to be made, the less stochastic the process becomes and one loses the very thing that makes these algorithms effective.

Since much of the testing in system trading is stochastic in nature as a result of employing Monte Carlo simulations, we will find ourselves optimizing stochastic objective functions. Therefore, we should always re-run the objective function against an apparently optimal solution vector to ensure that we did not just happen upon an outlier that falls back into the pack upon re-examination. An alternative is to take an average of all the solutions when the termination condition is reached. Fortunately, these variations are available in the R package DEoptim.

To get a feel for DE in action, visit Reiner Storn’s web page which contains a Java applet demonstrating the search for a 4 or an 8 term polynomial that threads a U-shaped passageway. You can change the settings for the algorithm and the passageway and watch the algorithm go to work.

I will try to post on the relationship between LSPM and DEoptim shortly to explain how I think you can get the best out of these two brilliant packages.

EDIT: couple typos, plus expanded on the comparison of DE to Harmony Search.

## Share

If you found this post informative, please share it!