Lecture I
Basic Concepts in Statistics
(see Howell, 1992, Chap. 1-3)
· Sample vs. population
· Random sampling vs. random assignment
· Discrete (categorical) variable vs. continuous (quantitative) variable
Basic Concepts: 2
· Independent vs. dependent variable (IV vs. DV)
· Descriptive vs. inferential statistics
· Parameters vs. statistics
Measurement Scales
· Nominal (Categorical)
· Ordinal
· Interval
· Ratio
Plotting Data
For Categorical Data
· Pie charts
· Bar charts
For Quantitative Data
· Histograms
To See Extreme Values
· Boxplots
· Stem and Leaf Plots
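A stem-and-leaf plot can be built by hand or with a few lines of code. The sketch below is one minimal way to do it, assuming non-negative whole-number scores, with the tens as the stem and the units as the leaf; the function name `stem_and_leaf` and the data are made up for illustration.

```python
def stem_and_leaf(values):
    """Return stem-and-leaf rows for non-negative integers (stem = tens, leaf = units)."""
    ranked = sorted(values)
    rows = []
    # Include empty stems so that gaps before extreme values stay visible.
    for stem in range(ranked[0] // 10, ranked[-1] // 10 + 1):
        leaves = [str(v % 10) for v in ranked if v // 10 == stem]
        rows.append(f"{stem} | {' '.join(leaves)}".rstrip())
    return rows


scores = [12, 15, 21, 23, 23, 28, 34, 35, 71]  # hypothetical data with one extreme value
rows = stem_and_leaf(scores)
print("\n".join(rows))
```

The empty rows for stems 4 to 6 make the isolated score of 71 stand out, which is the point of using this plot to spot extreme values.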
Summary Sample Statistics:
Central Tendency Estimators
For a sample of N observations on a variable x, where the ith observation is $x_i$:
MEAN: $m_x = \frac{\sum_i x_i}{N}$
MEDIAN: the 50th percentile; rank the observations, and the observation at rank 0.5*(N+1) is the median (when N is even this rank falls halfway between two observations, so take their average)
MODE: the most frequent observation
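The three definitions above can be checked on a small sample. This is only a sketch: the data are invented, and `median_by_rank` is a hypothetical helper that applies the 0.5*(N+1) rank rule, averaging the two neighbouring observations when N is even.

```python
import math
import statistics


def median_by_rank(values):
    """Median via the 0.5*(N+1) rank rule (assumes a non-empty list)."""
    ranked = sorted(values)
    rank = 0.5 * (len(ranked) + 1)            # 1-based rank; half-integer when N is even
    lower = ranked[math.floor(rank) - 1]
    upper = ranked[math.ceil(rank) - 1]
    return (lower + upper) / 2


scores = [2, 3, 3, 5, 7, 8, 9]                # hypothetical sample, N = 7
mean = sum(scores) / len(scores)              # sum of the observations divided by N
median = median_by_rank(scores)               # 4th ranked observation
mode = statistics.mode(scores)                # most frequent observation

print(mean, median, mode)
```

The hand-rolled median agrees with `statistics.median` from the Python standard library, which applies the same rule.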
Summary Statistics:
Dispersion Estimators
· Range
· Interquartile Range
· Average Deviation
· Mean Absolute Deviation
· Variance (Var)
Sample ($s^2$)
$s^2 = \frac{\sum_i (x_i - m_x)^2}{N-1}$
Population ($\sigma^2$)
$\sigma^2 = \frac{\sum_i (x_i - m_x)^2}{N}$
· Standard Deviation (S.D.) is the square root of the variance, i.e. $\sqrt{\mathrm{Var}}$
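As a worked illustration of the dispersion measures listed above, the following sketch computes each one for a made-up sample. Quartile conventions differ between packages; here the interquartile range relies on the default ("exclusive") method of Python's `statistics.quantiles`, which is an assumption of this example and not something the lecture specifies.

```python
import math
import statistics

x = [4, 8, 6, 5, 3, 7, 9]                        # hypothetical sample, N = 7
n = len(x)
m = sum(x) / n                                   # sample mean

data_range = max(x) - min(x)                     # range
q1, q2, q3 = statistics.quantiles(x, n=4)        # quartile cut points
iqr = q3 - q1                                    # interquartile range
mean_abs_dev = sum(abs(xi - m) for xi in x) / n  # mean absolute deviation

ss = sum((xi - m) ** 2 for xi in x)              # sum of squared deviations
sample_var = ss / (n - 1)                        # sample variance (divide by N-1)
pop_var = ss / n                                 # population formula (divide by N)
sample_sd = math.sqrt(sample_var)                # standard deviation

print(data_range, iqr, mean_abs_dev, sample_var, pop_var, sample_sd)
```

The two variance figures differ only in the divisor, so the N-1 version is always the larger of the two.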
Judging Estimators/Statistics: 1
There are 4 criteria …
· Sufficiency: A sufficient statistic uses all the information in a sample that is relevant to the parameter being estimated
· Unbiasedness: An unbiased statistic has an expected value that is equal to the parameter being estimated
Judging Estimators/Statistics: 2
· Efficiency: (a relative property) an efficient statistic is more efficient than its rivals; the standard error of the statistic is smaller than that of its rivals
· Resistance: Resistant estimators are those which are relatively uninfluenced by outliers in the data
Judging Estimates of Central Tendency
Criterion / Mean / Median / Mode
Suff. / Yes / No / No
Unbias. / Yes / Yes* / Yes*
Eff. / High / Lower / Low
Resist. / Low / High / High
*for symmetric distributions
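The low resistance of the mean and the high resistance of the median can be seen by contaminating a sample with a single outlier. The data below are invented for illustration.

```python
import statistics

clean = [10, 11, 12, 13, 14]
contaminated = [10, 11, 12, 13, 140]   # same sample with one hypothetical recording error

mean_shift = statistics.mean(contaminated) - statistics.mean(clean)
median_shift = statistics.median(contaminated) - statistics.median(clean)

print(mean_shift, median_shift)
```

One bad value moves the mean by more than 25 points and leaves the median where it was.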
Judging Estimates of Dispersion
VARIANCE (or S.D.) is a sufficient, unbiased and efficient estimator of dispersion (and the others are not)
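· Bias – relates to the different calculations for sample and population variance: dividing the sample sum of squares by N underestimates the population variance on average, and dividing by N-1 corrects this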
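The bias can be demonstrated exactly, without simulation, by listing every possible sample from a tiny population. The sketch assumes sampling with replacement, a made-up population of four values and a sample size of 2; `variance` is a hypothetical helper that takes the divisor as an argument.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]   # hypothetical tiny population
n = 2                       # sample size


def variance(values, divisor):
    """Sum of squared deviations from the mean of `values`, over `divisor`."""
    m = Fraction(sum(values), len(values))
    return sum((v - m) ** 2 for v in values) / divisor


pop_var = variance(population, len(population))

# All 16 equally likely ordered samples of size 2, drawn with replacement.
samples = list(product(population, repeat=n))
expected_var_n = sum(variance(s, n) for s in samples) / len(samples)
expected_var_n_minus_1 = sum(variance(s, n - 1) for s in samples) / len(samples)

print(pop_var, expected_var_n, expected_var_n_minus_1)
```

Averaged over all samples, the N-1 formula recovers the population variance exactly, while the N formula gives only (N-1)/N of it.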
DEGREES OF FREEDOM (D.F.)
In a sample of N observations:
· with a fixed mean of 100.0, how many observations are free to vary?
· what are the degrees of freedom for a sample variance?
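A small sketch may help with these two questions. Once the mean is fixed, only N-1 observations can be chosen freely, because the last one is forced; the sample variance is built from deviations about that mean, which is why it has N-1 degrees of freedom. The helper `last_observation` and the values are hypothetical.

```python
def last_observation(free_values, fixed_mean):
    """Value the final observation must take so that all N observations average `fixed_mean`."""
    n = len(free_values) + 1
    return fixed_mean * n - sum(free_values)


free = [90.0, 105.0, 98.0, 110.0]       # N-1 = 4 observations chosen freely
last = last_observation(free, 100.0)    # the 5th observation has no freedom left
sample = free + [last]

print(last, sum(sample) / len(sample))
```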
TRANSFORMING DATA
· Linear: $X_{new} = b \cdot X_{old} + c$
· Z-transform (Standardisation):
$X_{new} = (X_{old} - m_x) / s_x$
· Effects of Linear Transformations
· How do we convert raw test scores into say IQ on the usual scale (mean=100; s.d.=15)?
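One way to answer the IQ question is to standardise the raw scores first and then apply a linear transformation with b = 15 and c = 100. The sketch below assumes the sample mean and sample s.d. (N-1 divisor) are used for the standardisation; the function name `to_iq` and the raw scores are invented.

```python
import statistics


def to_iq(scores, target_mean=100.0, target_sd=15.0):
    """Z-transform the scores, then rescale to the target mean and s.d."""
    m = statistics.mean(scores)
    s = statistics.stdev(scores)                  # sample s.d. (N-1 divisor)
    z = [(x - m) / s for x in scores]             # standardised: mean 0, s.d. 1
    return [target_mean + target_sd * zi for zi in z]


raw = [12, 15, 9, 20, 14]                         # hypothetical raw test scores
iq = to_iq(raw)

print([round(v, 1) for v in iq])
```

A raw score equal to the sample mean (14 here) maps to exactly 100.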
Effects of Linear Transformations
Operation / Mean / Var. / S.D.
Add/subtract by constant (c) / Add/sub by const. / No change / No change
Multiply/divide by constant (b) / Mult/div by const. / Mult/div by (const.)^2 / Mult/div by const.
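These effects can be verified numerically. The sketch applies Xnew = b*Xold + c to a made-up sample with b = 3 and c = 10; note that for a negative b the s.d. is multiplied by the absolute value of b, since an s.d. cannot be negative.

```python
import statistics

x = [2.0, 4.0, 4.0, 6.0, 9.0]          # hypothetical sample
b, c = 3.0, 10.0                       # multiply by b, then add c
y = [b * xi + c for xi in x]

mean_x, mean_y = statistics.mean(x), statistics.mean(y)
var_x, var_y = statistics.variance(x), statistics.variance(y)
sd_x, sd_y = statistics.stdev(x), statistics.stdev(y)

print(mean_y, b * mean_x + c)          # mean: multiplied by b, then shifted by c
print(var_y, b ** 2 * var_x)           # variance: multiplied by b squared, unaffected by c
print(sd_y, abs(b) * sd_x)             # s.d.: multiplied by |b|, unaffected by c
```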
THE NORMAL DISTRIBUTION
Properties:
- symmetrical
- bell-shaped
- tails approach the horizontal axis asymptotically towards plus/minus infinity
- mean=median=mode
- approximately 95% of the distribution lies within 2 s.d. either side of the mean (more precisely, within 1.96 s.d.)
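The area within a given number of standard deviations of the mean can be checked with the normal cumulative distribution function. This sketch uses `NormalDist` from Python's standard library; `area_within` is a hypothetical helper.

```python
from statistics import NormalDist

standard_normal = NormalDist()          # mean 0, s.d. 1


def area_within(k):
    """Proportion of a normal distribution lying within k s.d. of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 1.96, 2, 3):
    print(k, round(area_within(k), 4))
```

The area within 2 s.d. comes out slightly above 95%, and 1.96 s.d. gives 95% almost exactly.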
THE NORMAL DISTRIBUTION: CONTINUED
- Standard normal distribution (Mean=0; s.d.=1)
- Z to prob. conversion: p=CDF.Normal(z, mean, sd)
CDF=cumulative distribution function (also written $z^{-1}$, the inverse of the z function)
- prob. to Z conversion (z or “probit” function): z=IDF.Normal(p, mean, sd)
IDF=inverse (cumulative) distribution function
- Areas under the normal curve
- Confidence limits
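The two conversions can be tried out in Python, where `statistics.NormalDist` stands in here for the CDF.Normal and IDF.Normal functions named above: `cdf` converts a value to a cumulative probability and `inv_cdf` converts a probability back to a value. The IQ scale (mean=100; s.d.=15) is used as the example, and the limits shown are simply those enclosing the central 95% of that distribution, which is an assumption about what is wanted.

```python
from statistics import NormalDist

standard_normal = NormalDist()                # mean 0, s.d. 1
iq = NormalDist(mu=100, sigma=15)             # IQ scale

# Z (or raw score) to probability: area under the curve to the left of the value.
p_below_z2 = standard_normal.cdf(2.0)
p_below_130 = iq.cdf(130)                     # 130 is 2 s.d. above the mean

# Probability to Z: the "probit" direction.
z_for_975 = standard_normal.inv_cdf(0.975)

# Limits enclosing the central 95% of the IQ distribution.
lower = iq.inv_cdf(0.025)
upper = iq.inv_cdf(0.975)

print(round(p_below_130, 4), round(z_for_975, 3), round(lower, 1), round(upper, 1))
```

Because 130 sits 2 s.d. above the IQ mean, it has the same cumulative probability as z = 2 on the standard normal distribution.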
Distortions to Distributions
SKEWNESS
Positive vs. Negative
KURTOSIS
Platykurtic – thin (light) tails
Mesokurtic – normal
Leptokurtic – heavy (broad) tails
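Skewness and kurtosis can be quantified from the central moments of a sample. The sketch below uses the simple moment-based formulas (third and fourth central moments scaled by powers of the second, with no small-sample correction), which is one common convention and an assumption here; statistical packages often apply corrections and may report slightly different values. Kurtosis is given as excess kurtosis, so a normal distribution scores 0.

```python
def central_moment(values, k):
    """k-th central moment, using N as the divisor."""
    m = sum(values) / len(values)
    return sum((v - m) ** k for v in values) / len(values)


def skewness(values):
    """Positive for a long right tail, negative for a long left tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Negative = platykurtic, about 0 = mesokurtic, positive = leptokurtic."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]            # hypothetical flat, symmetric sample
right_skewed = [1, 1, 2, 2, 3, 10]     # hypothetical sample with a long right tail

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed))
```

The flat symmetric sample has zero skewness and negative excess kurtosis (platykurtic), and the second sample is positively skewed.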