Sample Size

Slide 1

The goal of this lecture on sample size is to discuss the basic issues associated with selecting a sample size and to discuss the basic approaches that one might take in determining an appropriate sample size.

Slide 2

Sampling error is related to sample size, and the returns to increasing sample size are diminishing. With a very small sample, the random sampling error will be quite large. As sample size increases, random sampling error drops quickly at first, but each further increase reduces it by less than the one before. Hence the opportunity to identify an optimal sample size relative to error and budget.
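To make the diminishing returns concrete, here is a minimal sketch. It assumes a simple random sample, a yes/no survey question with a true proportion of 0.5, and a 95% confidence level (Z = 1.96). The function name and the sample sizes are illustrative choices, not values from the slide.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate +/- margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(p * (1 - p) / n)

# Each step quadruples the sample size, yet only halves the margin of error.
for n in (25, 100, 400, 1600, 6400):
    print(f"n = {n:>5}: +/- {margin_of_error(n):.1%}")
```

Because the error shrinks with the square root of n, going from 25 to 100 respondents buys far more precision than going from 1,600 to 1,675.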

Slide 3

As mentioned in the lecture on different types of samples, non-probability or non-scientific samples have no statistical properties. Therefore, researchers don’t worry about sampling error for a non-probability sample because they’re uninterested in extrapolating the results from the sample to a larger population. If they’re working with a probability sample, then they’re interested in extrapolating results to a larger population.

There are some practical issues regarding the reduction in random sampling error. The practical issues are threefold: financial, statistical, and managerial.

  • Financial, in the sense that data is a resource and each completed survey has a cost associated with it. The previous graph shows that sampling error declines at a declining rate as sample size increases. The first thing a marketing manager must consider is how much each additional data point will cost and how much reduction in error that cost buys. Any real-world marketing project has a budget.
  • Statistical, in the sense that point estimates are assumed to contain error. Recall any recent election; poll results are always stated as plus or minus some percent. For the leading candidate, newscasters often indicate whether his/her lead is within the margin of error. If it is within that margin, then the results might have been reversed had a different sample of voters been queried. From a statistical standpoint, knowing that the point estimate could be lower or higher than the true value raises the question of what plus or minus range is acceptable. The larger the sample, the closer the endpoints of the range are to the point estimate. A point estimate may be +/- 10% for a small sample but only +/- 1% for a large sample.
  • Managerial, in the Bayesian sense of the preferred level of confidence about the outcome. How much does additional data reduce uncertainty? Is a high degree of confidence in the estimates necessary, or will a ballpark estimate suffice?

All three considerations (budgetary, statistical, and managerial) influence the appropriate sample size for a probability sample.
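The trade-off between cost and precision can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the standard sample size formula for a proportion (Z squared times p times q, divided by E squared), a 95% confidence level, p = 0.5, and a hypothetical cost of $15 per completed survey. None of these figures come from the lecture.

```python
def respondents_needed(margin, p=0.5, z=1.96):
    """Sample size for a proportion at a target margin of error (rounded to nearest)."""
    return round(z ** 2 * p * (1 - p) / margin ** 2)

COST_PER_COMPLETE = 15.00  # hypothetical cost per completed survey, in dollars

for margin in (0.10, 0.05, 0.02, 0.01):
    n = respondents_needed(margin)
    print(f"+/- {margin:.0%}: n = {n:>5}, cost = ${n * COST_PER_COMPLETE:,.0f}")
```

Under these assumptions, tightening the range from +/- 10% to +/- 1% multiplies the required sample, and therefore the data collection cost, by about 100.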

Slide 4

Here are several approaches that a manager might use to determine a sample size for a probability sample. Some are more suboptimal than others. A few especially poor ways:

  • The worst is the blind guess. Guessing that 300 respondents sounds good is a terrible way to determine the sample size. Such a guess probably will be wrong.
  • Managers might also use an available budget. The manager might think he has $10,000 to conduct a study and $6,000 of that $10,000 should be spent on collecting data; therefore, the sample will be as large as $6,000 permits. That’s a terrible way to set a sample size. Consider the analogy to an advertising budget: the point of advertising is to accomplish some goal, so advertisers should spend whatever is necessary to accomplish that goal. Spending too much is a waste and spending too little won’t accomplish the goal. The same is true in selecting a sample size. Using the available budget is a sub-optimal decision rule.
  • How much does one need their uncertainty reduced? A Bayesian-based approach may be reasonable but too complex for most marketing managers.
  • A fourth approach is to use basic rules of thumb. From a statistical standpoint, one rule of thumb is 100 cases for every main group and between 20 and 100 cases for every sub-group. A sample of that size should include enough respondents to avoid major random sampling error. For example, in a study on gender differences, the main groups would be males and females, with 100 respondents in each. That same study also might examine differences by gender and age, so the age categories within each gender would be the sub-groups.

There are better ways to identify an appropriate sample size.

  • One easy way is to use conventional wisdom and follow the standards for comparable studies. Some clever people have already examined the statistical and cost implications of different sample sizes and have identified the appropriate size for different types of studies. I recommend this approach because it doesn’t require great knowledge of statistics, making assumptions, or doing calculations.
  • A more sophisticated approach considers statistical precision: the acceptable plus or minus percent for point estimates. However, this statistical sophistication may be beyond most managers’ grasp, which makes its use problematic.

Slide 5

Here’s what I mean by typical sample size for studies. Test market penetration studies should include no fewer than 200 respondents, but preferably between 300 and 500 respondents. A TV commercial test should include at least 150 respondents, but preferably 200 to 300 respondents per commercial. Such guidelines are readily available and easy to access and use.

Slide 6

Assuming an appropriate level of statistical sophistication and the availability of certain types of information, the statistical precision approach would be preferred. Here are the things that one must know to use this approach.

  • The variability of the total population and of the individual strata. To make the most efficient use of data collection dollars, one should oversample more variable strata and undersample less variable strata. Overall variability of the population also is important; to reduce random sampling error, the sample size should be larger if the population is more variable.
  • The acceptable level of random sampling error. This level could be high or low, depending on the needed level of precision (+/- percent) in the estimates.
  • The way in which data are distributed. If data are normally distributed, then a sample of a certain size is needed to ensure a minimal random sampling error. If the data are non-normally distributed (for example, bi-modally or uniformly distributed), then a larger sample is needed, relative to normally distributed data, to ensure a minimal random sampling error.
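One common way to act on the first point, oversampling the more variable strata, is Neyman allocation, in which each stratum's share of the sample is proportional to its population size times its standard deviation. The lecture does not name a specific allocation rule, so treat this as one possible sketch. The strata names, sizes, and standard deviations below are hypothetical.

```python
def neyman_allocation(total_n, strata):
    """Split total_n across strata in proportion to (population size x std. dev.).

    strata maps a stratum name to a (population_size, std_dev) pair.
    """
    weights = {name: size * sd for name, (size, sd) in strata.items()}
    total_weight = sum(weights.values())
    return {name: round(total_n * w / total_weight) for name, w in weights.items()}

# Hypothetical strata: heavy users spend much more variably than light users.
strata = {
    "light users": (6000, 5.0),
    "heavy users": (4000, 20.0),
}
allocation = neyman_allocation(500, strata)
print(allocation)
```

A purely proportional split of 500 respondents would assign 300 to light users and 200 to heavy users. Weighting by variability shifts most of the sample to the heavy users, where each additional respondent reduces error the most.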

Slide 7

If you’re uncomfortable with performing calculations, then you might consider online sample calculators like the one linked to in this slide.

Slide 8

The assumption in the statistical precision and sample calculator approaches is that there’s one key variable on which to base sample size. That variable could be the most important question in a survey. If you would like to do the calculations yourself, the remaining slides suggest the appropriate formulas for determining sample size. The formula for a mean is relatively straightforward: multiply the Z score for the chosen confidence level by the estimated standard deviation, divide by the acceptable magnitude of error, and square the result to get n (the sample size).
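Written out in symbols, the formula for a mean that matches the worked numbers on the following slides is shown below. The symbol names are assumed labels, since the slide itself is not reproduced in these notes: Z is the Z score for the chosen confidence level, S is the estimated standard deviation, and E is the acceptable magnitude of error.

```latex
n = \left( \frac{Z \, S}{E} \right)^{2}
```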

Slide 9

Here are two examples based on the formula. Suppose a survey researcher, studying expenditures on lipstick, wishes to have a 95% confidence level and a range of error less than $2.00. The estimate of the standard deviation is $29.00. That estimated standard deviation is possibly based on previous studies, or it could be a guess, but the quality of that guesstimate is critical to properly determining sample size. To apply the statistical precision approach, you must know certain things and feel confident that you know them.

Slide 10

Plugging in the values from the previous slide, and remembering that the Z score for a 95% confidence level is 1.96, we run through this calculation and discover that the appropriate sample size is 808 respondents.
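The arithmetic for this example, assuming the formula is the square of Z times the standard deviation divided by the acceptable error, works out as follows:

```latex
n = \left( \frac{1.96 \times 29}{2} \right)^{2} = (28.42)^{2} \approx 807.7 \;\Rightarrow\; n \approx 808
```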

Slide 11

Let’s take this same example, but let’s double the range of the error from +/- $2 to +/- $4. By how much is sample size reduced when the acceptable range of error is doubled?

Slide 12

By doubling the acceptable range of error, from +/- $2 to +/- $4, the necessary sample size shrinks from 808 to 202. Doubling the acceptable range of error cuts the required sample to ¼ of its original size, because the error term is squared in the formula.

Slide 13

Instead of being 95% confident, assume we want to be 99% confident. Instead of a Z score of 1.96, it’s 2.57. Given the same set of calculations, the sample sizes grow from 808 and 202 to 1389 and 347, respectively. Going from a 95% confidence level to a 99% confidence level increases the required sample size by roughly 70%.
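The four sample sizes from the lipstick example can be reproduced with a short sketch. It assumes the formula described earlier (Z times the standard deviation, divided by the acceptable error, then squared) and rounds to the nearest whole respondent to match the slide figures; rounding up is the more cautious convention. The function name is illustrative.

```python
def sample_size_for_mean(z, std_dev, error):
    """Sample size for estimating a mean, rounded to the nearest respondent."""
    return round((z * std_dev / error) ** 2)

STD_DEV = 29.00  # estimated standard deviation of lipstick spending, in dollars

# Z scores as used in the lecture: 1.96 for 95% and 2.57 for 99%.
for level, z in (("95%", 1.96), ("99%", 2.57)):
    for error in (2.00, 4.00):
        n = sample_size_for_mean(z, STD_DEV, error)
        print(f"{level} confidence, +/- ${error:.2f}: n = {n}")
```

Note that statistical tables often list the 99% Z score as 2.576 rather than 2.57, which would give slightly larger sample sizes.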

Slide 14

Suppose the key variable in our statistical precision approach is a proportion. Think about presidential polls and the share of voters choosing one candidate over another, which is a proportion. If that were the key question, then the appropriate sample size formula would be the one shown here.

Slide 15

For a proportion, the sample size n is calculated as follows: Z squared times p times q, divided by E squared. Z is the Z score for the chosen confidence level; at a 95% confidence level, Z squared is 1.96 squared. p is the estimated proportion of successes, and q is the estimated proportion of failures (or 1 – p). E squared is the square of the maximum acceptable error between the true proportion and the sample proportion.
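In symbols, using the terms just defined, the proportion formula can be written as below. This is a reconstruction from the definitions and from the worked example on the next slide, not a copy of the slide itself.

```latex
n = \frac{Z^{2} \, p \, q}{E^{2}}, \qquad q = 1 - p
```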

Slide 16

Here’s an example of a calculation based on that previous formula. Assume that p is 0.6, which makes q equal to 0.4. Also assume the difference between the estimated proportion and true proportion is 0.035. Plugging those numbers into the formula produces the appropriate size for a probability sample of 753.
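Here is a minimal sketch of that calculation, assuming a 95% confidence level (Z = 1.96) and rounding to the nearest whole respondent. The function name is illustrative. The second call, with p = 0.5, is an addition not in the lecture: it shows the largest sample the formula can require for a given error, which is why 0.5 is often used when no prior estimate of p is available.

```python
def sample_size_for_proportion(p, error, z=1.96):
    """Sample size for estimating a proportion, rounded to the nearest respondent."""
    q = 1 - p
    return round(z ** 2 * p * q / error ** 2)

n_example = sample_size_for_proportion(p=0.6, error=0.035)
n_worst_case = sample_size_for_proportion(p=0.5, error=0.035)
print(n_example)     # 753
print(n_worst_case)  # 784
```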

Slide 17

As an alternative to the last formula, one could use the sample size calculator shown here. Scale A indicates the percent favorable responses, which is the estimated proportion p. Scale C indicates the acceptable percent error at a given confidence level, either 95% on the left side of the scale or 99% on the right side. All that’s necessary is to take a straight edge, place one end on the left-hand scale A at the estimated percent favorable, and place the other end on the right-hand scale C at the acceptable percent error. The point at which the straight edge crosses scale B indicates the appropriate sample size.
