Lecture I

Basic Concepts in Statistics

(see Howell, 1992, Chap. 1-3)

·  Sample vs. population

·  Random sampling vs. random assignment

·  Discrete (categorical) variable vs. continuous (quantitative) variable

Basic Concepts: 2

·  Independent vs. dependent variable (IV vs. DV)

·  Descriptive vs. inferential statistics

·  Parameters vs. statistics

Measurement Scales

·  Nominal (Categorical)

·  Ordinal

·  Interval

·  Ratio

Plotting Data

For Categorical Data

·  Pie charts

·  Bar charts

For Quantitative Data

·  Histograms

To See Extreme Values

·  Boxplots

·  Stem-and-leaf plots
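
A minimal Python sketch of these plot types, assuming matplotlib is available; the category counts and exam scores are invented for illustration.

import matplotlib.pyplot as plt

# Invented categorical data: handedness counts in a class
categories = ["Left", "Right", "Ambidextrous"]
counts = [4, 25, 1]

# Invented quantitative data: exam scores
scores = [52, 61, 58, 70, 66, 59, 73, 48, 65, 62, 90]

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
axes[0, 0].pie(counts, labels=categories)   # pie chart (categorical)
axes[0, 0].set_title("Pie chart")
axes[0, 1].bar(categories, counts)          # bar chart (categorical)
axes[0, 1].set_title("Bar chart")
axes[1, 0].hist(scores, bins=5)             # histogram (quantitative)
axes[1, 0].set_title("Histogram")
axes[1, 1].boxplot(scores)                  # boxplot shows extreme values
axes[1, 1].set_title("Boxplot")
plt.tight_layout()
plt.show()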

Summary Sample Statistics:

Central Tendency Estimators

For a sample of N observations on a variable x, where the i-th observation is x_i:

MEAN: m_x = (Σ_i x_i) / N

MEDIAN: the 50th percentile; rank the observations in order and take the 0.5(N+1)-th ranked value

MODE: the most frequent observation
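
A minimal Python sketch of the three estimators, using the standard-library statistics module on an invented sample.

from statistics import mean, median, mode

x = [2, 3, 3, 4, 5, 7, 11]        # invented sample, N = 7

print(mean(x))    # (Σ x_i)/N -> 5
print(median(x))  # 0.5*(N+1) = 4th ranked value -> 4
print(mode(x))    # most frequent value -> 3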

Summary Statistics:

Dispersion Estimators

·  Range

·  Interquartile Range

·  Average Deviation

·  Mean Absolute Deviation

·  Variance (Var)

Sample (s²)

s² = Σ_i (x_i - m_x)² / (N - 1)

Population (σ²)

σ² = Σ_i (x_i - μ_x)² / N

·  Standard Deviation (S.D.) is the square root of the variance, i.e. √(Var)
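
A short NumPy sketch of the two calculations on the same invented sample: ddof=1 gives the N-1 (sample) divisor, ddof=0 the N (population) divisor.

import numpy as np

x = np.array([2, 3, 3, 4, 5, 7, 11], dtype=float)   # invented sample

sample_var = x.var(ddof=1)    # Σ(x_i - m_x)² / (N - 1)  -> 9.67
pop_var = x.var(ddof=0)       # Σ(x_i - m_x)² / N        -> 8.29
print(sample_var, pop_var)
print(np.sqrt(sample_var))    # S.D. = √(Var)            -> 3.11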

Judging Estimators/Statistics: 1

There are 4 criteria …

·  Sufficiency: A sufficient statistic uses all the information in a sample that is relevant to the parameter being estimated

·  Unbiasedness: An unbiased statistic has an expected value that is equal to the parameter being estimated

Judging Estimators/Statistics: 2

·  Efficiency: (a relative property) an efficient statistic has a smaller standard error than its rivals, so its estimates cluster more tightly around the parameter being estimated

·  Resistance: resistant estimators are relatively uninfluenced by outliers in the data (see the sketch after the table below)

Judging Estimates of Central Tendency

                 Mean    Median   Mode
Sufficiency      Yes     No       No
Unbiasedness     Yes     Yes*     Yes*
Efficiency       High    Lower    Low
Resistance       Low     High     High

*for symmetric distributions
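
A quick Python illustration of resistance, with invented scores: one extreme observation drags the mean a long way but barely moves the median.

from statistics import mean, median

clean = [48, 52, 58, 59, 61, 62, 65]    # invented scores
with_outlier = clean + [400]            # add one extreme value

print(mean(clean), median(clean))                # ~57.9, 59
print(mean(with_outlier), median(with_outlier))  # 100.625, 60.0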

Judging Estimates of Dispersion

VARIANCE (or S.D.) is a sufficient, unbiased and efficient estimator of dispersion (and the others are not)

·  Bias: the reason the sample and population variances are calculated differently; dividing by N-1 rather than N removes the bias in the sample estimate

DEGREES OF FREEDOM (D.F.)

In a sample of N observations:

·  with a fixed mean of 100.0, how many observations are free to vary?

·  what are the degrees of freedom for a sample variance?
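
One way to see the answer to the bias question is by simulation. The sketch below (NumPy, with an invented population of mean 100 and s.d. 15) draws many samples of size N and shows that the average sample variance computed with the N-1 divisor recovers the true variance, while the N divisor underestimates it.

import numpy as np

rng = np.random.default_rng(0)
true_var = 15.0 ** 2                 # invented population: mean 100, s.d. 15
N = 10

vars_ddof1, vars_ddof0 = [], []
for _ in range(20000):
    x = rng.normal(100.0, 15.0, size=N)
    vars_ddof1.append(x.var(ddof=1))   # divide by N - 1
    vars_ddof0.append(x.var(ddof=0))   # divide by N

print(true_var)               # 225
print(np.mean(vars_ddof1))    # ~225: unbiased
print(np.mean(vars_ddof0))    # ~225 * (N-1)/N = 202.5: biased low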

TRANSFORMING DATA

·  Linear: X_new = b·X_old + c

·  Z-transform (Standardisation):

X_new = (X_old - m_x) / s_x

·  Effects of Linear Transformations

·  How do we convert raw test scores into, say, IQ on the usual scale (mean = 100; s.d. = 15)?
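
A minimal Python sketch of that conversion, with invented raw scores: z-transform first, then apply the linear transform IQ = 15·z + 100.

import numpy as np

raw = np.array([12, 15, 18, 20, 23, 27, 31], dtype=float)   # invented raw scores

z = (raw - raw.mean()) / raw.std(ddof=1)   # standardise: mean 0, s.d. 1
iq = 15.0 * z + 100.0                      # rescale: mean 100, s.d. 15

print(iq.round(1))
print(iq.mean(), iq.std(ddof=1))           # 100.0, 15.0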

Effects of Linear Transformations

                                   Mean            Var.               S.D.
Add/subtract by constant (c)       Add/sub by c    No change          No change
Multiply/divide by constant (b)    Mult/div by b   Mult/div by b²     Mult/div by b
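
The rules in the table can be checked numerically; a small NumPy sketch with invented data and constants:

import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])   # invented data
b, c = 3.0, 7.0

for new in (x, x + c, b * x):
    print(new.mean(), new.var(ddof=1), new.std(ddof=1))

# x:      mean 6,  var 10, s.d. 3.16
# x + 7:  mean 13, var 10, s.d. 3.16   (mean shifted, dispersion unchanged)
# 3 * x:  mean 18, var 90, s.d. 9.49   (mean x3, var x9, s.d. x3)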

THE NORMAL DISTRIBUTION

Properties:

-  symmetrical

-  bell-shaped

-  tails approach the x-axis asymptotically as x goes to plus/minus infinity

-  mean=median=mode

-  approximately 95% of the distribution lies within ±1.96 (roughly ±2) s.d. of the mean

THE NORMAL DISTRIBUTION: CONTINUED

-  Standard normal distribution (Mean=0; s.d.=1)

-  Z to prob. conversion: p=CDF.Normal(z, mean, sd)

CDF=cumulative distribution function (also z-1)

-  prob. to Z conversion (z or “probit” function): z=IDF.Normal(p, mean, sd)

IDF=inverse (cumulative) distribution function

-  Areas under the normal curve

-  Confidence limits
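
CDF.Normal and IDF.Normal above are SPSS syntax; an equivalent sketch in Python, assuming SciPy, uses norm.cdf (z to p) and norm.ppf (p to z, the probit direction).

from scipy.stats import norm

# z -> p: area under the standard normal curve to the left of z
print(norm.cdf(1.96, loc=0, scale=1))    # ~0.975

# p -> z: inverse CDF / probit
print(norm.ppf(0.975, loc=0, scale=1))   # ~1.96

# 95% confidence limits for a normal with mean 100, s.d. 15
print(norm.ppf([0.025, 0.975], loc=100, scale=15))   # ~[70.6, 129.4]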

Distortions to Distributions

SKEWNESS

Positive vs. Negative

KURTOSIS

Platykurtic – flatter than the normal, with lighter tails

Mesokurtic – normal

Leptokurtic – more peaked than the normal, with heavier tails
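
A short Python sketch, assuming SciPy and NumPy, of how these measures behave for a roughly normal, a positively skewed, and a heavy-tailed (leptokurtic) sample; scipy.stats.kurtosis reports excess kurtosis, so a normal sample comes out near 0. The samples are simulated, not real data.

import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(1)
normal_ish = rng.normal(size=10000)               # ~mesokurtic, symmetric
right_skewed = rng.exponential(size=10000)        # positive skew
heavy_tailed = rng.standard_t(df=3, size=10000)   # leptokurtic

for sample in (normal_ish, right_skewed, heavy_tailed):
    print(round(skew(sample), 2), round(kurtosis(sample), 2))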
