Notes: FACTSET Variable Code Selection and Alpha Tester Conventions

Notes: FACTSET Variable Code Selection and Alpha Tester Conventions

Duke Investment Analytics was the team from Global Asset Allocation and Stock Selection. Members: Claudio Aritomi, Sam Ding, Mak Pitke, Marcus Shaw, Brian Wachob

Fuqua Investment Analytics was the team from Quantitative Stock Selection. Members: Stefan Gertsch, Brian Wachob.

In-sample date range: 1/31/1987 – 11/31/2001

Out-of-sample date range: 12/31/2001 – 12/31/2004

(Note that using 31 as the last day of the month when specifying the date range is necessary—even when there is no 31st day of the specified month. If not used in this way, lagged variables may not work properly in Alpha Tester.)

(Also note that FactSet’s convention is to associate the returns from the following month with the data- and portfolio selected- for the final day of a given month. For example, the first in-sample portfolio is formed on Jan. 31, 1987 based on data available on Jan. 31, 1987 and the first return recorded in-sample is the return realized in February 1987 – though this return is of course associated with the date Jan. 31, 1987.)

In alpha tester, always select low fractile = low values. This convention will maintain consistency across our screens and alpha tests. (This convention was used in the work of Duke Investment Analytics for Global Asset Allocation and Stock Selection. For the second course, Quantitative Stock Selection, Fuqua Investment Analytics has chosen to associate the low numbered fractiles with high factor values.)

Also resolve ties in ranking factor by choose midpoint.

Standard universe screens:

US stocks only (P_COUNTRY_ISO = “US”).

***This should be double-checked. Does use of this variable introduce problems—for example, could a company that was historically a US company but recently reincorporated overseas be excluded from the historical dataset because this parameter, like so many others, may not update historically? ***

Because of what appear to be bad Compustat returns data, we also completely excluded AU, CXW, and TKC from our universe (for Duke Investment Analytics). Fuqua Investment Analytics figured out how to exclude these stocks only on the months (one month for each stock) for which they had what appear to be bad returns data. Look for more details towards the end of this document.

This screen was added by FIA

Exclude ETFs and other investment funds.

We are trying to model factors that affect firm returns, not index or investment fund returns. Additionally, to include these effectively double-counts component companies.

Find more info later in this document pertaining to implementation of this screen.

This screen was added by FIA

Minimum daily dollar volume

Average daily dollar volume over the past month must be greater than $500,000 in 2005.

(S&P averaged a 10.4% CAGR from year-end 1984 through year-end 2004. Thus, we have chosen to reduce our dollar volume screening threshold by 10.4% each year we go back into time. It is $500K for 2005 and $84K for 1987.)

(This threshold is scaled downward in earlier periods because a portfolio manager would have had lesser dollar-liquidity requirements because he would have been managing a lesser number of nominal dollars. Likening a minimum dollar-volume threshold to growth in the S&P500 may not be the best method of specifying this screen- perhaps a better scaling rate than 10.4% time-constant threshold growth can be determined.)

(CM_VOL(0)/(P_TRADING_DAYS(-1M,0M)-1))*((CM_PH(0)+CM_PL(0))/2) > 0.5*POWER(1.104,(INT(CM_DNC(0)/100)-2005))

Find more info later in this document pertaining to weaknesses in methods for estimating average daily dollar volume and market cap.

This screen was added by FIA, replacing a different market cap limit methodology used by DIA

Minimum market cap

Market cap must be greater than $200M in 2005.

We reduced our market cap screening threshold by 7% each year we go back into time. It is $200M for 2005 and $59M for 1987.

(This threshold is scaled downward in earlier periods because of currency inflation and growth in the stock market over time.)

MSHS(0)*MP(0) > 200*POWER(1.07,(INT(CM_DNC(0)/100)-2005))

Find more info later in this document pertaining to weaknesses in methods for estimating average daily dollar volume and market cap.

Code used to select US-traded stocks and to exclude ETFs and other funds and questionable returns data:

(((CA_CUSIP>"90011120" OR CM_DNC(0)>200109) AND (CA_CUSIP>"03512820" OR CM_DNC(0)>199806) AND (CA_CUSIP>"22025Y40" OR CM_DNC(0)>200008)) AND ((SUM(AVAIL(CA_SIC_CODE_HIST(0), G_SIC_CODE), 0)>6722 AND SUM(AVAIL(CA_SIC_CODE_HIST(0), G_SIC_CODE),0)>6726)=1) AND (P_COUNTRY_ISO="US"))=1

This screen was used by DIA, but flaws were discovered by FIA. Thus, this screen was abandoned.

NYSE, NASDAQ, AMEX only

Top 60% of these by market cap.

(Note that using this universe definition yields roughly 1100 firms in January 1987 and roughly 3500 firms by November 2001. The market cap thresholds correspond to roughly $50mil in January 1987 and $200mil in February 2005.)

*** The parameter used for this screen has turned out to be faulty. The values reported for G_EXCHANGE_NAME do not update historically. Thus, companies such as Enron who are presently delisted, but were on the NYSE historically are wrongly excluded from all backtests! ***

Also note that we investigated defining our universe by S&P or Russell index constituents, but it looks as if such data is not presently available with our subscriptions.

Standard formulas to use for common variables:

Exchange:

I could find no parameter that would give me historically updated exchange listing information. FactSet help gave me the following response:

Dear Brian,

This is George from FactSet. I have just found the problem, and it is two fold. First, we don't have historical constituents for the NYSE. Second, an additional subscription is required for historical constituents for the NASDAQ and AMEX. Because of this, you will be unable to see the list as it was back in, say, 1990. Even with the additional subscription, AMEX historical constituents only goes back until 1998.

Sorry for the bad news. Let me know if you have any other questions.

Sincerely,

George T. Hogan

George T. Hogan | Consultant | FactSet Research Systems

| | 0120.779.465 Consulting Services

Considered— G_EXCHANGE_NAME, IB_EXCHANGE, CA_EXCHANGE_NAME, CA_EXCHANGEN, IH_EXCHANGE

G_EXCHANGE_NAME gives only record of current exchange listing. IH_EXCHANGE appears to come the closest to working, but also has many errors and missing data among its records.

Exclusion of ETFs and other Funds:

(SUM(AVAIL(CA_SIC_CODE_HIST(0), G_SIC_CODE),0)>6722 AND SUM(AVAIL(CA_SIC_CODE_HIST(0), G_SIC_CODE),0)>6726)=1

I might have alternately used PTYPEN, but in looking at the selection of companies screened out/in, I thought this method did a better job (it was not totally clear which was the better method though).

Price per share:

MP(0)

This checks out ok on splits (all historical prices are revised when a split occurs; thus, they align with current price).

Also considered— CM_P, P(0), @AVAIL(P(0),MP(0))

P(0) gives all or almost all NAs

Shares Outstanding:

MSHS(0)

This checks out ok on splits (all historical values are revised when a split occurs; thus, they align with current shares outstanding).

Though this reflects as-of-fiscal-period-end share counts that were not necessarily available (ie not as-reported), it eliminates other problems with non-contemporaneous lagged data being used in the current period. For example, companies for which shares outstanding should have been NA were finding an old (lagged 45 days) value that, multiplied with their current day price, pushed them into the market cap consideration set of companies— this caused some 80000% returns to enter our data- problems.

Also considered— CM_SHS(0), CM_SHS(0 L45D)

For future investigation- IH_SHRS_OUT

Perhaps this variable could be combined with a non-split adjusted price (perhaps available via CM_P(xxx)?).

Market Capitalization:

MP(0) * MSHS(0)

Also considered— RI_MKTCAP(0)/1000, CM_MKT_VALUE(0), @AVAIL(P(0),MP(0)) * CM_SHS(0 L45D)

Average Daily Dollar Volume Over the Past Month:

(CM_VOL(0)/(P_TRADING_DAYS(-1M,0M)-1))*((CM_PH(0)*CM_PL(0))/2)

This checks out ok on splits (all historical prices and volumes are revised when a split occurs; thus, they align with current price and shares outstanding; volume and price metrics here are consistent for estimation of dollar volume).

This expression could still use a lot of improvement, but it’s the best we have so far (ideally, we would probably calculate the median daily dollar volume for the preceding 21 trading days—- which may now be possible with P_VOLUME and P_PRICE parameters because the Calendar setting problem has been rectified).

The parameter P_TRADING_DAYS(-1M,0M) was found to work (well enough) in Alpha Testing but not in Universal Screening Reports. It appears to give one more than the actual number of trading days in Alpha Testing. It appears to give one more than the actual number of calendar days (!) in Universal Screening Reports. Perhaps other parameters are similarly functional in Alpha Testing but not in Universal Screening Reports? Note that this calendar day versus trading day problem was later resolved—one must take care to ensure that under Options-> Calendar, the Other: US calendar is selected (not 7-day calendar for example).

???

SUM20(P_PRICE(0)*P_VOLUME(0))

???

Problem with P_VOLUME(-1), P_VOLUME(-2), etc. is that it appears to increment backwards by calendar days, not trading days as desired. Same problem with P_VOLUME(-1,0,1000), P_VOLUME(-2,0,1000), etc. These problems appear to occur in the Universal Screening Report output, but not necessarily in the Excel environment.

***I think that this was a result of the same problem:

{Note that this calendar day versus trading day problem was later resolved—one must take care to ensure that under Options-> Calendar, the Other: US calendar is selected (not 7-day calendar for example).}

The following (in blue) works in Excel, but not in Universal Screening

AVG(P_PRICE(-1M,0D,D)*P_VOLUME(-1M,0D,D))

Note that -1M bounds the date range at the last day of the preceding month. -1AM would have picked out the same day as the day referenced by 0D, but a month earlier. See FactSet Help PageID 1964.

I tested this code with regard to stock splits for one stock in a recent time period and found that historical P_PRICE and P_VOLUME numbers had indeed been accurately revised in the P_ database to align with the post-split share levels. …at least this was my initial conclusion when comparing historical volume quotes for NYSE:SM in Yahoo! Finance with those outputted from FactSet to an Excel spreadsheet. Note that, at least in the case of the ERICY reverse split (and also the recent SM split I believe), it is evident that the historical volumes listed in Yahoo! Finance ARE NOT changed to reflect the new number of shares outstanding. This matter requires further attention/review.

See FactSet Help PageID 614 for info on handling of Dividends, Stock Splits, and Spinoffs.

IB_VOL_1D does not appear to increment backwards through time (at least not in Universal Screening Report (i.e. IB_VOL_1D gives same value as IB_VOL_1D(-1) or IB_VOL_1D(-100)).

Note that median dollar volume would be preferable to mean dollar volume so that our estimated liquidity of a given stock is not thrown off by a one day spike in volume that enters the trailing window sample period.

Minimum Threshold for Average Daily Dollar Volume Over the Past Month:

0.5*POWER(1.104,(INT(CM_DNC(0)/100)-2005))

Note that using the same threshold for both NASDAQ and NYSE stocks is not ideal. Per Bernstein research, “Volume for NASDAQ stocks is inflated relative to NYSE and AMEX stocks due to double-counting of dealer trades.” Thus, a better screen would use different minimum dollar-volume thresholds for NASDAQ stocks than for NYSE and AMEX stocks.

(See “Standard Universe Screens” section for more.)

Book value per share:

AVAIL(CM_BK(0 L45D), G_BOOK_PS_USD(0 L45D))

An improvement?:

IF((SUM(0,IHLQEPSDNC(-12))=CM_DNC(-13) OR SUM(0,IHLQEPSDNC(-12)) = CM_DNC(-14)), AVAIL(CM_BK(0), G_BOOK_PS_USD(0)), AVAIL(CM_BK(0 L2M), G_BOOK_PS_USD(0 L2M)))

This checks out ok on splits (all historical book values per share are revised when a split occurs; thus, they align with current shares outstanding).

CM_BK(0) appears to update quarterly as-of-fiscal-period-end

CM_BK(0) appears to have less missing data (NAs) than G_BOOK_PS_USD(0)

G_BOOK_PS_USD appears to update annually as-of-fiscal-period-end

After finding problems with the G_ dividend variables using current rather than historical values in alpha tester, I re-checked these G_ variables and fortunately did not find this same problem.

Book Value-to-Price:

AVAIL(CM_BK(0 L45D), G_BOOK_PS_USD(0 L45D))/MP(0)

See immediately above for potential improvement

This checks out ok on splits (all historical values are revised when a split occurs. thus, they align with current price and are consistent for the ratio calculation).

It looks as if CM_BK and G_BOOK_PS_USD may only update annually in the 88-89 timeframe. Ideally, we would find a variable that updates quarterly.

Also considered— CM_PBK(0), G_PBK(0)

CM_PBK and G_PBK appear to contain the same data series

Trailing EPS:

AVAIL(IH_EPS_ACT_LTM(0), CM_EPS(0 L45D))

Use method shown for Book Value for improved timeliness of data?

This checks out ok on splits (all historical EPSs are revised when a split occurs; thus, they align with current shares outstanding).

Note that there are significant differences between these two options.

IH_EPS_ACT_LTM is evaluated based on analyst consensus methods of adjustment to earnings (to align with IBES consensus forecast earnings).

CM_EPS matches the sum of the numbers reported in the firm’s 4 most recent 10-Q’s. I believe these are basic EPS, not diluted.

Perhaps in a future analysis, I will use diluted EPS- this would probably be preferable.

Forward EPS Estimate:

IH_MEAN_NTM(0) (A)

Alternates:

AVAIL(IH_MEAN_NTM(0), AVAIL((G_IBES_FY1_MEAN_USD(0)+IH_MEAN_NTMYR(0))/2, AVAIL(IH_MEAN_NTMYR(0), G_IBES_FY1_MEAN_USD(0)))) (B)

AVAIL(IH_MEDIAN_NTM(0), G_IBES_FY1_MED_USD(0)) (C)

IH_MEDIAN_FY2(0) (D)

Use definition C, except for negative FwdEPS. For these, a combination/transform of IH_MEDIAN_FY2(0), IH_MEDIAN_FY3(0), and CQ_SALES_PS_LTM(0 L45D) (F)

AVAIL(IH_MEDIAN_NTM(0), IH_MEDIAN_NTM(-1), IH_MEDIAN_NTM(-2), IH_MEDIAN_NTM(-3), IH_MEDIAN_NTM(-4), IH_MEDIAN_NTM(-5), IH_MEDIAN_NTM(-6), IH_MEDIAN_NTM(-7), IH_MEDIAN_NTM(-8), IH_MEDIAN_NTM(-9), IH_MEDIAN_NTM(-10), IH_MEDIAN_NTM(-11), IH_MEDIAN_NTM(-12), G_IBES_FY1_MED_USD(0)) (G)

AVAIL(IH_MED_EPS_NTMA(0), IH_MEDIAN_NTM(0), G_IBES_FY1_MED_USD(0)) (H)

AVAIL(IH_MED_EPS_NTMA(0), IH_MEDIAN_NTM(0), G_IBES_FY1_MED_USD(0)) (J)

Note that in Definition J, I added an additional condition (screen) that excluded any stocks with forward earnings yield estimates greater than 1. There are only a couple stocks for which this is an issue (i.e. CUSIP 81600630 SEIBELS BRUCE GROUP INC in the late 80s). Clearly this seems to be some sort of data error because a stock cannot be expected to earn for in the coming year than it is worth!