Research Questions, Variables, and Hypotheses

Jeff Harris

Draft 6/24/05

In the initial stages of designing research it is important to clearly define what to study. The research process starts with a research question, specifying concepts or constructs of interest. For instance, a research question might be, “Do omega-3 fatty acids affect blood clotting?” A common mistake made by researchers is to stop here and start planning a research design and appropriate methods for the study. It is more appropriate to follow the research question by defining the variables to be measured.

There are two general types of variables, continuous and discrete. Continuous variables are sometimes also called quantitative variables. They are measured on a scale in which a value could be placed between any two numbers. Serum cholesterol, blood urea nitrogen, systolic blood pressure, and waist circumference are examples of variables measured on a continuous scale.

Discrete variables are sometimes referred to as categorical variables. A discrete variable consists of distinct categories either with an inherent order (ordinal) or with no defined order (nominal). Examples of ordinal variables would be pain severity, amount of stress, or tastiness of a meal measured on a Likert scale. Examples of nominal variables would be ethnicity, marital status, or geographic area.

As one defines variables to measure the concepts specified in the research question, the researcher needs to decide whether the variables are discrete or continuous. It also needs to be determined how the concepts will be measured. We have the concepts of omega-3 fatty acids and blood clotting in our original research question. What measurable form can be defined for these concepts? Will we measure fatty fish intake or omega-3 fatty acids administered in supplemental form (fish oil capsules)? For blood clotting, can we use a biochemical parameter such as clotting time derived from a plasma sample? The key is to specify the measurable variable versions of the concepts in the research question.

At this stage of planning research it is also important to brainstorm potential confounding variables so they can either be controlled or measured in the research design. Confounding variables are those that can serve as alternative explanations for the results of the study. For the research question presented above, a confounding variable could be the taking of a certain dosage of aspirin by study participants. Awareness of confounding variables will create a context for deriving conclusions from the research.

Finally, a measurable hypothesis must be developed to answer the question, “What is really being studied?” In the statement of the hypothesis it is important to include the population being studied, the time frame, the alpha level for defining statistical significance, the type of relationship being examined, and the variables being studied. So for our previously stated research question an appropriate hypothesis might be, “There is no statistically significant difference at the p < 0.05 level of significance in plasma clotting times between 45- to 75-year-old female American Indians taking either 3 grams of combined DHA and EPA in capsule form or a placebo.” Stated this way, researchers are much more likely to pick a valid research design, know what methods to choose to measure the variables of interest, and be able to interpret the data derived from the study more specifically. It should be noted that hypotheses can be stated as a null hypothesis (there is no difference or relationship) or as an alternative hypothesis (there is a difference or relationship). It is very disappointing to see manuscripts or published articles that really do not answer the research question or have minimal basis for valuable interpretation because measurable hypotheses were not defined.

So to reiterate, there are three initial steps in deciding what issue will be studied. The first is to state a research question including concepts of interest. Second, brainstorm primary and confounding variables in addressing the question. Finally, state a specific, measurable hypothesis from which research designs and methods can be defined.

Measures of Central Tendency and Standard Deviations

The most well-recognized measure of central tendency is the arithmetic mean or average. For the following five serum LDL measurements (in mg/dl), 74, 94, 113, 121, and 135, the mean is about 107. The five values are added together and divided by the number of values. In summarizing data it is appropriate to report the mean when the data are normally distributed. Also, most parametric statistical tests (t-tests, ANOVA, etc.) assume that samples being compared are normally distributed because they utilize all the data, including the means. When the data are not normally distributed, with skewed distributions or extreme values, the mean is not a very good measure of central tendency. For example, in the LDL example given above, if the last listed value is not 135 but 180 the mean changes dramatically to about 116 mg/dl. So what other measure can we look at that is a better description of central tendency if we have data that aren’t normally distributed? The median can be reported.

The median is the data value that splits the data array in half. Half of the data values are below it and half above. For the LDL examples above the median is 113 mg/dl. Notice that it doesn’t change when the largest value is replaced by an extreme value. When data are not normally distributed it is important to report medians, not means, unless the data can be transformed to normality successfully by using log or trigonometric transformations. It is important to note that the mean of the transformed data must be reported in this case and not the mean of the raw, untransformed data. This transformed mean is often not very meaningful for the reader of a manuscript, so it still might be advisable to report the median of the data.
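The LDL figures above can be checked with a few lines of code. This is a minimal sketch using Python's standard statistics module; the variable names are illustrative and are not part of the original example.

```python
from statistics import mean, median

# Serum LDL values (mg/dl) from the example in the text.
ldl = [74, 94, 113, 121, 135]
# The same data with the last value replaced by an extreme value.
ldl_extreme = [74, 94, 113, 121, 180]

mean_original = mean(ldl)              # 107.4
mean_extreme = mean(ldl_extreme)       # 116.4
median_original = median(ldl)          # 113
median_extreme = median(ldl_extreme)   # 113

print(f"mean:   {mean_original:.1f} -> {mean_extreme:.1f} mg/dl")
print(f"median: {median_original} -> {median_extreme} mg/dl")
```

The mean moves by 9 mg/dl when a single value changes, while the median stays at 113 mg/dl.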

If the data are not normally distributed and cannot be successfully transformed, then parametric tests cannot be used. Nonparametric tests such as the Mann-Whitney Test and the Kruskal-Wallis Test must be used instead.

A common mistake researchers make is not to test their data for normality. Another is to report means and standard deviations and use parametric tests even when the data are not normal.

It is important to note that standard deviations also lose their relevance if the data are not normally distributed. The calculation of a standard deviation uses all the data, including extreme values, as well as the mean. So it is not appropriate to report standard deviations for non-normal data.
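The sensitivity of the standard deviation to a single extreme value can be seen with the same five LDL measurements used earlier. This is a rough sketch using the sample standard deviation (n - 1 in the denominator) from Python's standard library; the variable names are illustrative.

```python
from statistics import stdev

# Serum LDL values (mg/dl) from the earlier example.
ldl = [74, 94, 113, 121, 135]
# The same data with the last value replaced by an extreme value.
ldl_extreme = [74, 94, 113, 121, 180]

# stdev() computes the sample standard deviation (divides by n - 1).
sd_original = stdev(ldl)          # about 23.8
sd_extreme = stdev(ldl_extreme)   # about 39.9

print(f"SD: {sd_original:.1f} -> {sd_extreme:.1f} mg/dl")
```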

Analysis of Variance (ANOVA)

There are four common types of analysis of variance. They are One-Way ANOVA, Multi-Way ANOVA, Repeated Measures ANOVA, and Multivariate ANOVA (MANOVA). Each is applied to a different type of data. All types of ANOVA assume that the samples compared are normally distributed and that the variances of the samples are equal. If there is too much deviation from these assumptions, nonparametric versions of these tests must be used.

One-Way ANOVA is used when a relationship is being examined between a continuous variable and a discrete variable with more than two categories. Typically, if the discrete variable has two categories an independent t-test is used. One-Way ANOVA examines whether the mean of a continuous variable differs among more than two categories of a discrete variable. For example, if a comparison was being made of mean hemoglobin A1-C values between Type 2 diabetics receiving an exercise program, a low glycemic index diet, metformin, and a placebo, a One-Way ANOVA would be used. The ANOVA only determines if there is a significant difference somewhere between the different groups, but not which groups are different from one another. If the F-test (the test statistic for an ANOVA) is significant, then post hoc tests such as the Scheffe Test and the Tukey Test are run to determine where the differences are between groups.
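To make the F-test concrete, the sketch below computes the One-Way ANOVA F statistic by hand as the ratio of the between-group mean square to the within-group mean square. The function name and the HbA1-C values are made up for illustration only and are not real study data. The sketch does not compute a p-value, which requires the F distribution and would normally come from a statistical package.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical HbA1-C values (%) for four treatment groups.
hba1c = {
    "exercise": [7.1, 6.8, 7.4, 7.0],
    "low GI diet": [7.3, 7.0, 7.6, 7.2],
    "metformin": [6.5, 6.7, 6.4, 6.9],
    "placebo": [7.9, 8.1, 7.7, 8.0],
}
f_stat, df_b, df_w = one_way_anova_f(list(hba1c.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F value indicates only that at least one group mean differs from the others; identifying which groups differ still requires the post hoc tests described above.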

Multi-Way ANOVA is used when a relationship is being examined between a continuous variable and more than one discrete variable. For example, using the example given above, one might want to look at differences in HbA1-C not only between different types of treatment but also between African-Americans and Caucasians. With this type of test the independent effects of the discrete variables as well as their joint effects can be examined. The independent effects are called main effects and the joint effects are called interactions. It is important for those analyzing the data to recognize that if there are interactions, the relevance of the main effects is moot. If it is discovered that metformin works in lowering HbA1-C for African-Americans but not for Caucasians, it is irrelevant to answer the general question, “Is metformin an effective treatment?” The answer would be, “It depends on whether you are African-American or not.” Similar to One-Way ANOVA, post hoc tests must be run to see where the specific differences actually are.

Repeated Measures ANOVA is used when a researcher is looking at changes in a continuous variable over time or changes in a group of subjects when different treatments are applied to them. Let’s say that a group of 50 people were housed in a metabolic ward to see the effects of different diets on systolic blood pressure. Each person was put on a typical American diet for two weeks, a DASH diet for two weeks, a Mediterranean diet for two weeks, and a high animal protein, low carbohydrate diet for two weeks. At the end of each diet period systolic pressure was measured in each person. Repeated Measures ANOVA would look at the differential effect of the different diets on systolic blood pressure. Since each subject received all the diets, there is now variation within subjects. When conducting One-Way ANOVA all subjects don’t get all treatments, so there is no separate within-subjects variation. For Repeated Measures ANOVA this variation must be factored in. Again, post hoc tests must be run after running the ANOVA to determine where the differences lie.

MANOVA is used when a relationship between one or more discrete variables and more than one continuous variable is being examined. Using one of the examples given above, let’s say it was the intention to examine the relationship between different treatment modalities for Type 2 diabetes (exercise, metformin, etc.) and both HbA1-C and serum LDL levels. MANOVA is a multivariate statistical test that allows us to answer such questions with one statistical test rather than running multiple tests. We could do separate One-Way ANOVAs for HbA1-C and serum LDL. But by doing multiple tests on the same sample we increase the chances of committing a Type 1 error. If we can do one test, this inflation of the Type 1 error risk is avoided. With MANOVA, post hoc tests will need to be conducted to ferret out the specific categorical differences.
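The way multiple tests inflate the Type 1 error risk can be illustrated with a simple calculation. The sketch below assumes the tests are independent, which is a simplification: HbA1-C and serum LDL measured on the same subjects are likely to be correlated, so the true figure would differ somewhat. The function name is illustrative.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one Type 1 error across independent tests.

    Assumes every null hypothesis is true and the tests are independent.
    """
    return 1 - (1 - alpha) ** n_tests

# Two separate One-Way ANOVAs, each run at alpha = 0.05.
two_tests = familywise_error_rate(0.05, 2)   # about 0.0975
# A single test keeps the risk at the nominal alpha level.
one_test = familywise_error_rate(0.05, 1)    # about 0.05

print(f"one test:  {one_test:.4f}")
print(f"two tests: {two_tests:.4f}")
```

Under these assumptions, running two tests at the 0.05 level nearly doubles the chance of at least one false positive.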

Nonparametric Tests

Typically researchers are most familiar with parametric statistical tests (t-tests, ANOVA, Pearson’s Correlation, etc.), which rely on specific assumptions. Samples used in parametric analysis must be normally distributed or able to be mathematically transformed into normally distributed data. Also, the samples should have equal variances. In addition, parametric tests are most effective when used with large sample sizes. When these assumptions are violated there are other kinds of tests, nonparametric tests, that can be used to test research hypotheses.

For each of the common parametric tests there is a nonparametric alternative. For analyzing two dependent samples the Sign Test or the Wilcoxon Signed-Rank Test can be used rather than the paired t-test. For comparing two independent samples the Wilcoxon Rank Sum Test or the Mann-Whitney U Test can be used rather than the independent t-test. For testing whether two continuous variables are correlated the Spearman Correlation can be used as an alternative to the Pearson Correlation. The Kruskal-Wallis Test is the alternative for the One-Way ANOVA. Finally, the Friedman Test is the alternative for Repeated Measures ANOVA.
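The pairings listed above can be kept as a simple lookup table when planning an analysis. The sketch below only restates those pairings; the dictionary and function names are illustrative and do not refer to any statistical library.

```python
# Parametric tests and the nonparametric alternatives named in the text.
NONPARAMETRIC_ALTERNATIVES = {
    "paired t-test": ["Sign Test", "Wilcoxon Signed-Rank Test"],
    "independent t-test": ["Wilcoxon Rank Sum Test", "Mann-Whitney U Test"],
    "Pearson Correlation": ["Spearman Correlation"],
    "One-Way ANOVA": ["Kruskal-Wallis Test"],
    "Repeated Measures ANOVA": ["Friedman Test"],
}

def nonparametric_alternatives(parametric_test):
    """Return the listed alternatives, or an empty list if none is listed."""
    return NONPARAMETRIC_ALTERNATIVES.get(parametric_test, [])

for test_name in NONPARAMETRIC_ALTERNATIVES:
    options = " or ".join(nonparametric_alternatives(test_name))
    print(f"{test_name}: {options}")
```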

As manuscripts are reviewed, common errors seen are failure to test for normality of samples, the use of means when medians would be more appropriate, and failure to use nonparametric tests when parametric assumptions are violated. Manuscript reviewers would like to see nonparametric tests used more often. Authors often don’t like them because they are conservative, meaning that it is often more difficult to reject the null hypothesis than with parametric tests.