Lecture 14 - Meta-Analysis and Quasi-Experiments

From The American Heritage College Dictionary, 3rd ed. (1997), Houghton Mifflin Company:

Meta-analysis: "The analysis of analyses."

Study Results as unit of analysis.

Two major functions – identical to the functions of all research, but on a larger scale . . .

1) Descriptive: To summarize results (usually relationships between DVs and IVs) across collections of studies

Summaries of relationships across studies used to be done with narrative reviews. The problem: there was no systematic way of synthesizing the results of individual studies, so conclusions were variable and unreliable. Two reviewers might review the same studies and arrive at different conclusions.

Part of the problem was the tendency to report findings simply as "significant" or "not significant". When several studies of the same phenomenon reported different findings, some "significant" and some "not significant", psychological research lost credibility.

Example of a “summarize relationships” meta-analysis:

Connolly, J. J., & Viswesvaran, C. (2000). The role of affectivity in job satisfaction: A meta-analysis. Personality and Individual Differences, 29, 265-281.

This meta-analysis was related to the dispositional theory of job satisfaction – persons are satisfied on the job because that’s the way they are, not because of the type of job they have or the way they’re treated.

The researchers found a mean correlation of .49 between measures of positive affectivity and job satisfaction. Since the squared correlation gives the proportion of variance explained (.49² ≈ .24), this suggests that roughly 10 to 25% of the variance in job satisfaction could be due to individual differences in affectivity.

This could have been done within a single organization, but it carries much more weight when done across multiple organizations.


2) Inferential: To examine moderation – factors that affect the sizes of relationships across collections of studies.

Example of a “moderation” meta-analysis:

Griffeth, R. W., Hom, P. W., & Gaertner, S. (2000). A meta-analysis of antecedents and correlates of employee turnover: Update, moderator tests, and research implications for the next millennium. Journal of Management, 26, 463-488.

They found that the correlation between pay and turnover depended on whether the organization offered reward contingencies. High pay predicted low turnover only when reward contingencies were offered. This study could not have been conducted in one organization with only one policy; it required multiple organizations, some with reward contingencies and others without them.

Why not have just one big study?

1. Because it is likely not possible for a single researcher to conduct a study as big as the combination of many other studies.

Some meta-analyses have Ns of 10,000 or more.

2. Because the use of multiple studies increases generalizability of the results.

From an interview with Frank Schmidt . . .

Those advocating it usually argue that a single large N study can provide the same precision of estimation as the numerous smaller N studies that go into a meta-analysis. But with a single large N study, there is no way to estimate SD-rho or SD-delta, the variation of the population parameters. This means there is no way to demonstrate external validity. A meta-analysis based on a large number of small studies based on different populations, measurement scales, time periods, and so on, allows one to assess the effects of these methodological differences.


6 Steps, from Understanding Meta-Analysis by Joseph A. Durlak

1. Formulating the research question

Which effects do we wish to summarize?

What relationships do we wish to study?

Currently in I/O – the relationship of scores on the Big Five personality dimensions to job performance and to job satisfaction.

2. Performing the literature search for articles to include in the M-A.

A. Computerized searches of databases, such as PsycINFO

B. Manual Searches of journal tables of contents.

C. Searches of reference lists of relevant articles.

D. Calls or emails to people doing research in the area.

E. Being cognizant of the round-file (file-drawer) effect – positive bias in the rs due to looking only at published articles.

3. Coding each study.

For each study, identifying the value that study has on each independent variable being investigated.

e.g., suppose the dependent variable is efficacy of therapy. For each study, code:

Type of therapy: behavioral or talking

Format: group or individual

Provider: professional therapist or lay person

4. Deciding on a measure of effect size for the studies and computing that measure for each study.

Most meta-analyses use either . . .

A. A mean difference measure analogous to d = (μ1-μ2)/σ

This measure, or measures equivalent to it, has been defined for many common research designs. Often it can be computed from reported values of t or F.

B. A correlation measure, equal to or analogous to Pearson r.

Must compute the same measure from each study.
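Since different studies report different statistics (t, F, d, or r), conversion to a common metric is routine. Below is a minimal Python sketch using the standard textbook conversions for a two-group design (e.g., Lipsey & Wilson); the study numbers are hypothetical.

```python
import math

def d_from_t(t, n1, n2):
    """Cohen's d from an independent-samples t statistic (two-group design).
    For a one-df F test, use t = sqrt(F) first."""
    return t * math.sqrt(1.0 / n1 + 1.0 / n2)

def r_from_d(d, n1, n2):
    """Effect size r from d; the a term handles unequal group sizes."""
    a = (n1 + n2) ** 2 / (n1 * n2)
    return d / math.sqrt(d ** 2 + a)

# e.g., a hypothetical study reporting t(58) = 2.4 with 30 participants per group
d = d_from_t(2.4, 30, 30)   # ~0.62
r = r_from_d(d, 30, 30)     # ~0.30
print(round(d, 2), round(r, 2))
```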


5. Statistical analysis.

From each study, a measure of effect size is computed. There may be more than one measure if more than one DV was measured.

Descriptive Studies

A weighted mean effect size is computed using the formula below (illustrated assuming the effect size for study i is ri, based on a sample of size Ni):

r-bar = Σ(Ni ri) / Σ Ni

(Strictly speaking, this should be called the weighted r-bar.)

The variance of the effect sizes is computed using the formula

S²r = Σ Ni(ri − r-bar)² / Σ Ni

This is an extension of the basic variance formula that allows differential weighting of each r based on the size of the sample from which it was obtained.
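A minimal Python sketch of these two formulas, using hypothetical effect sizes and sample sizes for five studies:

```python
# Hypothetical rs and Ns for k = 5 studies
rs = [0.30, 0.45, 0.22, 0.50, 0.38]
Ns = [60, 150, 45, 200, 90]

total_N = sum(Ns)

# Sample-size-weighted mean r
r_bar = sum(N * r for N, r in zip(Ns, rs)) / total_N

# Sample-size-weighted variance of the rs around r-bar
S2_r = sum(N * (r - r_bar) ** 2 for N, r in zip(Ns, rs)) / total_N

print(round(r_bar, 3), round(S2_r, 4))
```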

Adjustment of Mean and Variance

The mean and variance are adjusted for

A) Reliability of the independent variable

B) Reliability of the dependent variable

C) Range restriction of the dependent variable.

The variance is also adjusted for

D) Sampling error.

These adjustments are somewhat complex, but doable with a good spreadsheet program.
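As one concrete illustration, the simplest of these adjustments, the bare-bones correction of the variance for sampling error in the Hunter-Schmidt style, can be sketched in a few lines of Python, continuing the hypothetical five-study example above. (Corrections for unreliability and range restriction require per-study artifact information and are omitted here.)

```python
# Same hypothetical five-study example as in the sketch above
rs = [0.30, 0.45, 0.22, 0.50, 0.38]
Ns = [60, 150, 45, 200, 90]
k, total_N = len(rs), sum(Ns)
r_bar = sum(N * r for N, r in zip(Ns, rs)) / total_N
S2_r = sum(N * (r - r_bar) ** 2 for N, r in zip(Ns, rs)) / total_N

# Expected variance of the rs due to sampling error alone
# (one common Hunter-Schmidt bare-bones form of the formula)
var_e = (1 - r_bar ** 2) ** 2 * k / total_N

# Residual variance attributed to real differences across studies
var_rho = max(S2_r - var_e, 0.0)
print(round(var_e, 4), round(var_rho, 4))
```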

There are several meta-analysis programs available, some for free.

I have the Schmidt and Hunter program, available for use by students.

6. Testing for existence of systematic differences between studies

The Q statistic:

χ²(K−1) = [N / (1 − r-bar²)²] · S²r

where N is the total sample size across studies and K is the number of studies.

If the χ² is not significant, then it will be assumed that the effect sizes came from a single population of studies, and the mean is an estimate of the mean effect size of that population.

If the χ² is significant, then it must be assumed that two or more populations are represented by the group of studies, and some attempt to distinguish among them must be made.
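In Python, continuing the same hypothetical running example, the test is one line plus a p-value lookup (SciPy assumed available):

```python
from scipy.stats import chi2

# Same hypothetical five-study example as above
rs = [0.30, 0.45, 0.22, 0.50, 0.38]
Ns = [60, 150, 45, 200, 90]
k, total_N = len(rs), sum(Ns)
r_bar = sum(N * r for N, r in zip(Ns, rs)) / total_N
S2_r = sum(N * (r - r_bar) ** 2 for N, r in zip(Ns, rs)) / total_N

# Chi-square homogeneity test with K - 1 degrees of freedom
chi_sq = total_N * S2_r / (1 - r_bar ** 2) ** 2
p_value = chi2.sf(chi_sq, df=k - 1)
print(round(chi_sq, 2), round(p_value, 4))
```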

Finding sources of systematic differences between studies.

Individual studies are treated as rows of a data matrix.

Individual study effect sizes are the scores that are analyzed.

Regression analysis may be used to examine the relationship between ESs and quantitative IVs.

t-tests or analysis of variance may be used to compare mean ESs across subgroups defined by the independent variables. In this case, however, the investigator should first verify, using the Q statistic, that each subgroup of studies is homogeneous before conducting the group mean comparison tests. A sketch of the subgroup comparison is given below.
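A minimal sketch of the subgroup comparison, assuming each study has been coded on a hypothetical moderator (therapy type, as in the coding example above):

```python
from collections import defaultdict

# Same hypothetical five studies, now with a hypothetical moderator code
rs = [0.30, 0.45, 0.22, 0.50, 0.38]
Ns = [60, 150, 45, 200, 90]
types = ["behavioral", "talking", "behavioral", "behavioral", "talking"]

groups = defaultdict(list)
for t, r, N in zip(types, rs, Ns):
    groups[t].append((r, N))

# Weighted mean ES within each subgroup; each subgroup should also pass
# the Q homogeneity test before its mean is compared with the others
for t, studies in groups.items():
    n_g = sum(N for _, N in studies)
    r_bar_g = sum(N * r for r, N in studies) / n_g
    print(t, round(r_bar_g, 3))
```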


Examples of results of a meta-analysis:

From . . .

Most researchers report the true operational validity when estimating a population correlation for selection situations.

They report the true-score correlation when estimating a population correlation in contexts other than selection.
Example 2 from . . .


THE ROLE OF EMOTIONAL INTELLIGENCE IN LEADERSHIP EFFECTIVENESS: A META-ANALYSIS

A Thesis Presented for the

Master of Science Degree

The University of Tennessee at Chattanooga

Ashleigh D. Farrar

May 2009

Abstract

Leaders are an essential element of the business world. While good leaders can provide many benefits for an organization, unsuccessful leaders can be detrimental. The notion that emotional intelligence plays a part in whether a leader is effective or not effective has recently been introduced. This study sought to unify the literature evaluating the possible link between emotional intelligence and leadership effectiveness. Meta-analytic techniques were used to analyze this relationship. Results revealed that overall, there is a positive relationship between emotional intelligence and leadership effectiveness. Also, while the type of emotional intelligence measure used served as a moderator to this relationship, a second and third meta-analysis supported the overall positive relationship of emotional intelligence and leadership effectiveness for each type of EI.

Results

The central aim of the present study was to examine the overall relationship of EI and leadership effectiveness. The initial meta-analysis was conducted using all of the included studies. The results of this meta-analysis are provided in Table 1. A total of 20 correlations were used from 20 studies, with a total sample size of 3,295. After correcting for unreliability in both EI and leadership effectiveness measures, the sample-size-weighted mean rho linking the constructs was .458. The 80% credibility interval did not include zero, indicating that there was a relationship between EI and leadership effectiveness. These results supported Hypothesis 1.



Ashleigh's results

                             Table 1:        Table 2:            Table 3:
                             All studies     EI mixed-model      EI ability-model
                                             measures            measures
k                            20              12                  8
Total sample size            3295            2265                1030
Mean rho                     0.457           0.427               0.536
Variance of rho              0.028           0.030               0.013
80% credibility interval     .24 to .67      .20 to .65          .39 to .68
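The credibility intervals in these tables follow directly from the mean and variance of rho, assuming the normal-distribution form of the 80% interval (mean ± 1.28 SD) used in Hunter-Schmidt meta-analysis. A quick Python check reproduces Table 1's interval:

```python
import math

mean_rho, var_rho = 0.457, 0.028   # Table 1 values
sd_rho = math.sqrt(var_rho)

# 1.28 is the z value bracketing the middle 80% of a normal distribution
lower = mean_rho - 1.28 * sd_rho
upper = mean_rho + 1.28 * sd_rho

print(round(lower, 2), round(upper, 2))   # -> 0.24 0.67, matching ".24 to .67"
```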


From Frank Schmidt’s presentation at the 2012 RCIO conference

Table 1. Selection methods for job performance

Selection procedures/predictors / Operational validity (r) / Multiple R / Gain in validity / % gain in validity / Standardized regression weights (GMA, supplement)

GMA tests / .65
Integrity tests / .46 / .78 / .130 / 20% / .63 / .43
Employment interviews (structured) / .58 / .76 / .117 / 18% / .52 / .43
Employment interviews (unstructured) / .60 / .75 / .099 / 15% / .48 / .41
Conscientiousness / .22 / .70 / .053 / 8% / .67 / .27
Reference checks / .26 / .70 / .050 / 8% / .65 / .26
Biographical data measures / .35 / .68 / .036 / 6% / .91 / -.34
Job experience / .13 / .67 / .023 / 4% / .66 / .17
Person-job fit measures / .18 / .67 / .020 / 3% / .64 / .16
SJT (knowledge) / .26 / .67 / .018 / 3% / .76 / -.19
Assessment centers / .37 / .66 / .014 / 2% / .78 / -.19
Peer ratings / .49 / .66 / .013 / 2% / .55 / .16
T & E point method / .11 / .66 / .009 / 1% / .65 / .11
Years of education / .10 / .66 / .008 / 1% / .65 / .10
Interests / .10 / .66 / .008 / 1% / .65 / .10
Emotional Intelligence (ability) / .24 / .65 / .007 / 1% / .70 / -.11
Emotional Intelligence (mixed) / .24 / .65 / .005 / 1% / .63 / .09
GPA / .34 / .65 / .004 / 1% / .71 / -.10
Person-organization fit measures / .13 / .65 / .004 / 1% / .64 / .07
Work sample tests / .33 / .65 / .003 / 0% / .69 / -.07
SJT (behavioral tendency) / .26 / .65 / .000 / 0% / .64 / .03
Emotional Stability / .12 / .65 / .000 / 0% / .64 / .02
Job tryout procedure / .44 / .65 / .000 / 0% / .63 / .02
Behavioral consistency method / .45 / .65 / .000 / 0% / .64 / .02
Job knowledge / .48 / .65 / .000 / 0% / .65 / -.01

Table 2. Selection methods for training performance

Selection procedures/predictors / Operational validity (r) / Multiple R / Gain in validity / % gain in validity / Standardized regression weights (GMA, supplement)

GMA tests / .67
Integrity tests / .43 / .78 / .109 / 16% / .65 / .40
Biographical data measures / .30 / .74 / .073 / 11% / 1.04 / -.50
Conscientiousness / .25 / .73 / .061 / 9% / .69 / .29
Employment interviews / .48 / .72 / .051 / 8% / .57 / .28
Reference checks / .23 / .71 / .038 / 6% / .67 / .23
Years of education / .20 / .70 / .029 / 4% / .67 / .20
Interests / .18 / .69 / .024 / 4% / .67 / .18
Peer ratings / .36 / .67 / .002 / 0% / .70 / -.06
Emotional Stability / .14 / .67 / .001 / 0% / .66 / .03
Job experience (years) / .01 / .67 / .000 / 0% / .67 / .01

Note. Operational validity estimates in parentheses are those reported in Schmidt and Hunter (1998, Table 2). Selection procedures whose operational validity is equal to or greater than .10 are listed in order of gain in operational validity.

Unless otherwise noted, all operational validity estimates are corrected for measurement error in the criterion measure and indirect range restriction (IRR) on the predictor measure to estimate operational validity for applicant populations.


Quasi-Experiments

True Experiment

Design in which individual participants are randomly assigned to conditions.

Quasi-Experiment

Anything else.

Most often: Designs in which different treatments are assigned (perhaps randomly) to already existing groups.

Always: Designs comparing subject variables, such as gender, graduate program, age, or any prior condition.

Sometimes: Designs for which participants could be randomly assigned but are not, for one reason or another.

Some authors differentiate designs for which assignment of conditions to groups is possible (e.g., Training programs to different buildings) from those for which it is not (e.g., Gender).

Pretest-Posttest with Nonequivalent Groups Design

Also called the Nonequivalent Control Groups Design with Pretest (NECG with Pretest Design)

The most frequently discussed Quasi-Experimental Design

The design involves pretesting both groups, designating one the experimental group and the other the control group, and then taking a posttest observation of both.

Diagrammed as

       Pre    Condition    Post
       O1     XE           O2
       -----------------------
       O1     XC           O2

The dashed line signifies nonequivalent groups.

The true experimental counterpart is the Randomized Groups Design (RG Design), with individual participants randomly assigned to conditions:

       Pre    Condition    Post
R      O1     XE           O2
R      O1     XC           O2

(R signifies random assignment.)