The Efficient Set GA for Stock Portfolios

Jacqueline Shoaf and James A. Foster

Dept. of Computer Science, University of Idaho, Moscow, Idaho

email: ,

Abstract

The genetic algorithm (GA) for the efficient set portfolio problem based on the Markowitz model introduced by Shoaf and Foster[4] offers significant benefits over the quadratic programming approach. These benefits include simultaneous optimization of risk and return. The efficient set GA uses an indirect representation style in order to avoid unfeasible solutions and penalty functions. This representation is generally applicable to problems which seek an optimal partition for a given amount of some resource which includes both negative and positive allocations. Efficient set GA evolution scales well and is O(n log n) with a small constant for portfolios containing up to n=100 stocks. Using demes further improves the quality of solution and the run time for this GA.

1.   Introduction

The efficient set portfolio problem is to find the allocation of investments for given set of securities with minimum risk for any given rate of return. Markowitz’s [2] approach to solving this problem uses the covariance matrix derived from historical rates of return to predict the variance, or “risk factor” of any allocation of resources. The Markowitz model accomodates both long and short positions. A long position represents an allocation for purchase of securities, whereas a short position represents an allocation from the sale of borrowed securities.

In the Markowitz model, the weighted sum of the values in the rates of return covariance matrix represents the overall variance, srp2 , of a portfolio. Let n be the number of stocks in the portfolio, xi be the proportion of resources allocated for stock i (negative for short positions), E(r*p) be the given expected rate of return for the portfolio, and E(rj) be the expected rate of return for each security. The objective equation for the efficient set portfolio problem is:

min s(rp)2 =

with the following constraints:

1) E(r*p)=

2) 1.0 =

The quadratic programming approach to this problem described by Haugen [1, Appendix 3] requires the objective equation to be rewritten in Lagrangian form. For minimization, the partial derivative of each variable is taken and set to 0. This leaves a set of linear simultaneous equations which can be solved for the coefficients of allocation for the minimum variance portfolio. This portfolio will have the given expected rate of return , E(r*p), which is specified as a constraint. Note that E(r*p) is required as input, so the quadratic programming approach can only solve the efficient set problem for one portfolio rate of return. Also, the algorithm for solving a set of linear simultaneous equations has time complexity between O(n2) and O(n3), according to Smith [5].

2. The Efficient Set GA

The GA solution to the efficient set problem Shoaf and Foster [4] alters the problem slightly to solve for an efficient set portfolio over the entire range of potential expected portfolio returns. Each member of the GA population represents an allocation of resources for the portfolio. The user selects a desirable balance between risk and return using adjustable constants in the GA fitness function:

where a,g, d are set by the user, E(rp) represents the expected rate of return of the portfolio represented by the population member, and E(r*p) represents the user’s target expected rate of portfolio return.

Because the efficient set portfolio problem is an allocation problem, a direct representation of resource allocation by each population member in the GA will not work well. This type of representation will result in predominantly unfeasible solutions in every generation, where the allocations do not sum to 1.0.

Our representation has a single field of k+1 bits for each security. The first bit indicates whether the position on that stock will be long (one) or short (zero). The remaining k bits are an unsigned index onto an “allocation wheel”. Conceptually, this is a wheel representing the resources to be allocated. It is divided into 2k equal sections, each indexed by a k bit binary value. The distance between an index and the index of the next long position, plus any enclosed short position wedges, is the percentage of the total resource allocation for the security with that index. (See Figure 1.)

More precisely, suppose that i1,…,in are the indices for n securities, in non-decreasing order, mod 2k (so that 000, for example, follows 111). Now, let S be the set of indexes of securities with short positions. Let, L(j) be the next index of a long position on the allocation wheel. Now, let dj be:

with a=-1 and b=1 when jÎ S, and a=1 and b=0 otherwise.

Pictorially, dj is the length of the arc between the index ij and the next long security, with any subtended short position arcs added in. Figure 1 is an example of this representation for k=3 and n=5.

An important benefit of this representation is that , for any set of short positions S and any chromosome. That is, the total investment is always one hundred percent of the available resources. This makes it impossible to create an unfeasible solution. So, every member of the population in each generation can contribute to the next generation and no valuable schemata are discarded. This also avoids any unpredictable influence on evolution by penalty functions in the fitness function, since they are not necessary. In general this representation makes the GA efficient. This indirect style of representation should work for any optimization problem where allocation proportions may be either positive or negative and must sum to a given value.

Sum of Allocation s for Stocks 0-4 (in order):

.125 + -.25 + -.125 + .625 + .625 = 1.00

Figure 1. Allocation based on solution representation

One of the less obvious effects of this representation style, however, is the sensitivity of the efficient set GA to increases in mutation and crossover rates. A change in the index of one security can affect one to two other security allocations.

Experiments were conducted using a small set of 5 stocks [3] comparing the effective set GA to the quadratic programming technique and demonstrated the benefits of simultaneously optimizing for return and risk. These experiments demonstrated that the GA could find portfolio allocations with similar risk and higher rates of return than the risk-constrained quadratic programming solution.

The data for these experiments was derived from end-of-week closing data accumulated over an eleven month period beginning October 3, 1994. The covariance matrix for stock rates of return is shown in Table 1 and the averaged annualized rates of return are shown in Table 2.

Table 1. Stock covariance matrix

CYBE / ISLI / NBL / ORLY
CYBE
ISLI / 2.45
NBL / 1.36 / 2.76
ORLY / 0.07 / -0.10 / 0.99
RGIS / 0.55 / 0.44 / -0.02 / 0.68

Table 2. Average Rate of Return

CYBE / ISLI / NBL / ORLY / RGIS
0.589 / -1.573 / 1.219 / .159 / -0.094

The allocation proportions, obtained using the quadratic programming technique with a given expected return rate of .15, are shown in Table 3. This solution yields a minimum risk, srp2, of .405.

Table 3. Allocation by quadratic programming

CYBE / ISLI / NBL / ORLY / RGIS
-0.l0 / 0.15 / 0.30 / 0.55 / 0.10

Five randomly-seeded GA runs were conducted using the same data [3]. The length of each chromosome was 35 bits, with individual fields of 7 bits. Each run lasted 200 generations using a population size of 300.

The allocation proportions obtained using the GA are shown in Table 4. This solution is the best over all five runs. These allocations yield an expected rate of return, of .52 with an associated minimum risk, srp2 , of .384. In this case the efficient set GA was clearly able to find a better solution than quadratic programming.

Table 4. Allocation by GA

CYBE / ISLI / NBL / ORLY / RGIS
0.0 / 0.047 / .422 / .516 / 0.016

Figure 2.

The convergence graph of the averaged GA solution, which plots average generation fitness and best-of-generation fitness against generation, is shown in Figure 2. The plot demonstrates the expected exponential increase in fitness values.

3. Complexity of the Efficient Set GA

We also ran to determine the exact expected time complexity of the GA as a function of n, the number of securities in the portfolio. Since n is a variable that is local only to the objective function of the GA, this would confirm that the time complexity of the fitness function dominates the GA as n increases. An efficient algorithm in terms of n implies that the GA can be used for portfolio allocations involving large numbers of securities. Notice that the size of the chromosomes, and therefore the complexity of the GA, depends on both the number of securities and the number of slices on the allocation wheel, so it is not obvious a priori that the number of securities is the critical performance parameter.

Each chromosome contains n fields, representing an investment position (long or short) and the index used for allocation for each security. Thus the product of n and the field size determines the total length of each chromosome. However, the fitness function does a number of transformations based on the values in each field, in order to convert the chromosome into a portfolio allocation. These transformations affect the algorithmic time complexity of the GA. The indirect representation method for allocation determination described earlier requires that the fields of the chromosome be sorted, an operation of average expected complexity in O(n log n) using a quicksort (the O() worst case behavior rarely shows up in practice). We anticipated that sorting would dominate the time complexity of the fitness function and the GA.

The experiments consisted of 25 sets of GA runs, one set for each of 5 different values of n: 8, 16, 32, 64 and 100. Time was clocked on either side of the evolution step in the GA in order to bypass time required by chromosome initialization, which is assumed to be linear in n with a small constant. Between the sets, all other GA constants remained the same. However, since the absolute length of the chromosome is also affected by the change in n, a separate set of experiments was conducted to determine which type of change dominated time complexity of the GA.

A fixed population size of 100 chromosomes and runs lasting 100 generations were used. These and other basic GA parameters are noted in Table 5.

Table 5. Basic GA parameters for complexity experiments

GA Type / Simple
Crossover / 2-Pt
Selection / Roulette-Wheel
Field Size / 7
Population Size / 100
Crossover Rate / .6
Mutation Rate / .001
Generations / 100

The averaged experimental results are summarized on the chart in Figure 3. Let t be the time, in seconds, required to evolve this population 100 generations. The algorithmic complexity of the fitness function based on the data used here appears to very well fit the equation t=F(n)= c*n log2 n . In this case the constant c is between .1 and .2.

Figure 3

In order to determine whether the GA run time was dominated by changes in n or by changes in overall chromosome length, we ran an additional set of experiments in which n, the number of securities, remains constant but the chromosome length changes based on the field size

The number of bits in a field determines the minimum allocation proportion for any stock in the portfolio and the maximum number of securities in the portfolio with allocations greater than 0. Increasing the number of securities in the portfolio also makes it desirable to increase the field size to reduce the likelihood of index collision. So, in practice these two values should be correlated. But for our experiments, we used a constant number of fields (securities), n=32, and varied the size of each field from 3 to 7, effectively changing the chromosome length from 96 to 224 bits. With the exception of field size, all other parameters from Table 5 remained the same.

The averaged results from sets of 25 experiments in each configuration, summarized in the chart in Figure 4, shows that increasing the number of bits in each field has a more gradual effect on time complexity than increasing n, the number of fields. The graph shows constant linear growth in time with increasing chromosome length. For this population, t=a*c*(bits per field), with a=14.3 and c=0.49. The effect of increasing number of bits in the chromosome, without increasing n, may be to increase the time required for the mutation operation, which is dependent only on total number of bits.

Figure 4.

Therefore, the empirical data from these experiments confirms that overall algorithmic time complexity for this GA application is affected mainly by changes in the number of security in the portfolio, n, and that the order of expected time complexity for the fitness function and the GA is O(n log n) with small constants.

4. The Deme Modification