JMP Tutorial #3 Summaries for a Single Numerical Variable

Data File: / Sleep-time.JMP
Background: / These data come from a study comparing the time it takes for smokers and non-smokers to fall asleep.
Variables: / Sleep-time.JMP
> Smoking Status - smoker or non-smoker
> Sleep Time - time to fall asleep

Three things are needed to describe a numerical variable adequately: a measure of location, a measure of variability, and a measure of shape.
To begin, select Analyze > Distribution. In the Distribution window that appears, place Sleep Time in the Y, Columns box as follows.

To obtain an extended list of summaries, right-click on the diamond next to Moments and select Display Options > More Moments.

Measures of Location:

There are several measures of location.

Number / Description
1 / The total sample size (n = 84).
Note: JMP uses capital N to denote sample size.
2 / The sample mean for all individuals in the study is 20.48 minutes to fall asleep.
3 / The sample median is 20.45 minutes. This says 50% of the individuals in the study fell asleep in less than 20.45 minutes and 50% of the individuals took longer. Note also that the sample mean and sample median are very close in value, indicating that the distribution of times to fall asleep is nearly symmetric.
4 / The smallest observed time to fall asleep was 15 minutes.
5 / The largest observed time to fall asleep was 25.8 minutes. The range is therefore 10.8 minutes (25.8 – 15.0).
6 / The first quartile or 25th percentile/quantile is 18.025 minutes which says that 25% of the individuals in our sample fell asleep before 18.025 minutes and 75% of the individuals took longer.
7 / The third quartile or 75th percentile/quantile is 23.0 minutes which says that 75% of the individuals in our sample fell asleep before 23.0 minutes and 25% of the individuals took longer. The interquartile range (IQR) is the difference between the third and first quartiles:
IQR = 23.00 – 18.025 = 4.975 minutes
This is the range of the middle 50% of the data.
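The location summaries above can also be computed outside JMP. The following is a minimal Python sketch, assuming a small made-up list of times: the `times` list and the `location_summary` helper are hypothetical and are not the Sleep-time.JMP data. Quartile conventions differ between packages, so the quartiles from `statistics.quantiles` may not match JMP's values exactly.

```python
import statistics

# Hypothetical times (minutes) for illustration only; NOT the Sleep-time.JMP data.
times = [15.0, 16.5, 18.0, 19.5, 20.0, 21.0, 22.5, 23.0, 24.5, 25.8]


def location_summary(values):
    """Return the basic location summaries for a list of numbers."""
    # The "inclusive" method treats the data as the whole population of interest;
    # other packages (including JMP) may use a slightly different quartile rule.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "q1": q1,
        "q3": q3,
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }


summary = location_summary(times)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

The range and IQR are simply differences of the other summaries, which is why they appear again under the measures of variability.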

Measures of Variability:

There are also several measures of variability, which are described next.

Number / Description
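1 / The sample variance is s² = (3.054)² ≈ 9.33 minutes² (the square of the standard deviation below). Because its units are squared minutes, this quantity has no direct interpretation.
2 / The sample standard deviation is s = 3.054 minutes.
We can use Chebyshev’s Theorem to say that at least 75% of individuals fall asleep within two standard deviations of the mean, that is, between:
20.48 – 2(3.054) = 14.37 minutes and 20.48 + 2(3.054) = 26.59 minutes
In actuality, all individuals in our sample had times to fall asleep in this range (the observed times run from 15.0 to 25.8 minutes). This illustrates the “at least” part of Chebyshev’s Theorem.
3 / The standard error of the mean is SE = s/√n = 3.054/√84 ≈ 0.333 minutes.
The standard error of the mean gives an estimate of the precision of our sample mean. As a rule of thumb, the sample mean give or take two standard errors gives a range of values that is very likely to cover the true population mean (μ) (roughly a 95% chance). Here this would give the following:
20.48 ± 2(0.333) = 20.48 ± 0.67, or about 19.81 to 21.15 minutes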
4 / Range = Maximum – Minimum = 25.8 – 15.0 = 10.8 minutes.
5 / Interquartile Range (IQR) = 75th Percentile – 25th Percentile = 23.0 – 18.025 = 4.975 minutes.
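The arithmetic behind the Chebyshev range and the standard error can be checked from the three summary values reported above (n = 84, mean = 20.48, s = 3.054). The following Python sketch is only an illustration of those calculations; the variable names are our own, and the results carry the rounding of the reported inputs.

```python
import math

# Summary values reported in the tutorial's JMP output.
n = 84          # sample size
mean = 20.48    # sample mean (minutes)
s = 3.054       # sample standard deviation (minutes)

variance = s ** 2               # sample variance (minutes squared)
std_error = s / math.sqrt(n)    # standard error of the mean (minutes)

# Chebyshev's Theorem: at least 1 - 1/k^2 of the values lie within
# k standard deviations of the mean, whatever the distribution's shape.
k = 2
chebyshev_share = 1 - 1 / k ** 2
chebyshev_interval = (mean - k * s, mean + k * s)

# Rule of thumb: sample mean give or take two standard errors.
mean_interval = (mean - 2 * std_error, mean + 2 * std_error)

print(f"variance           = {variance:.2f} minutes^2")
print(f"standard error     = {std_error:.3f} minutes")
print(f"Chebyshev (>= {chebyshev_share:.0%}) = "
      f"{chebyshev_interval[0]:.2f} to {chebyshev_interval[1]:.2f} minutes")
print(f"mean +/- 2 SE      = {mean_interval[0]:.2f} to {mean_interval[1]:.2f} minutes")
```

Note the difference in width: the Chebyshev range describes where individual times fall, while the much narrower second range describes where the population mean is likely to be.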

Measures of Shape:

There are two basic visual displays for shape: the histogram and the boxplot. These displays are usually drawn horizontally rather than vertically, which is how JMP draws them by default. Right-click on Sleep Time and select Display Options > Horizontal Layout to change the graph to landscape view.

Number / Description
1 / This is a histogram of the times to fall asleep. The distributional shape is almost uniform.
You can change the number of bins/class intervals used to construct the histogram by changing the mouse to the hand mode, holding down the left mouse button, and moving the mouse up and down.
Disadvantage: Changing the bins may change your perception of shape. (** See hand tool below.)
2 / This is an outlier boxplot of the times to fall asleep. There do not appear to be any outliers in these data.
3 / This bracket highlights where the most densely packed 50% of the data lies.
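The "no outliers" reading of the boxplot can be checked numerically. The sketch below assumes the common convention that an outlier boxplot flags points lying more than 1.5 × IQR beyond the quartiles (the "fences"); it uses only the quartiles, minimum, and maximum reported earlier, and the variable names are our own.

```python
# Quartiles and extremes reported in the tutorial's JMP output (minutes).
q1, q3 = 18.025, 23.0
minimum, maximum = 15.0, 25.8

iqr = q3 - q1

# Assumed convention: points beyond 1.5 * IQR from a quartile are flagged.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# If even the extremes sit inside the fences, no observation can be flagged.
has_low_outlier = minimum < lower_fence
has_high_outlier = maximum > upper_fence

print(f"fences: {lower_fence:.4f} to {upper_fence:.4f} minutes")
print(f"any outliers: {has_low_outlier or has_high_outlier}")
```

Because the smallest and largest times both fall inside the fences, no individual observation would be drawn as an outlier under this rule.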

Comment: You can change the number of bins in the histogram by clicking the hand button on the menu bar and placing the cursor over the graph. Moving the cursor up while holding down the left mouse button will increase the number of bins; moving it down will decrease the number of bins. Click on the arrow icon to turn this feature off.
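To see why changing the bins can change the perceived shape, it helps to count the same values under different bin settings. The following Python sketch uses a made-up list of times (`times` is hypothetical, not the Sleep-time.JMP data) and a hypothetical helper, `bin_counts`, that splits the observed range into equal-width bins; JMP may choose its bin edges differently.

```python
def bin_counts(values, n_bins):
    """Count how many values fall in each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in values:
        # The largest value would land just past the last bin, so clamp it.
        index = min(int((x - lo) / width), n_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical times (minutes) for illustration only.
times = [15.0, 15.5, 16.0, 17.0, 18.5, 19.0, 20.0,
         20.5, 21.0, 22.0, 23.5, 24.0, 25.0, 25.8]

for n_bins in (2, 4):
    print(n_bins, "bins:", bin_counts(times, n_bins))
```

With two bins these values look perfectly flat, while four bins reveal slightly more observations at the two ends than in the middle.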

Before we discuss measures of shape, consider the following descriptions.

The most common distribution is the normal distribution which is bell-shaped. A picture of a normal distribution is given here.

The normal distribution is symmetric because the shape above and below the center is the same. A distribution that does not have the same shape above and below the center is a skewed distribution.

Pictures of Skewed Distributions
Skewed Right / Skewed Left

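Kurtosis measures how steep or flat the high point of a distribution is relative to the normal distribution. Negative kurtosis indicates a distribution that is less peaked than the normal; positive kurtosis indicates one that is more peaked.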

Understanding Kurtosis
Negative Kurtosis / Positive Kurtosis

We now interpret these values for our data set.

Number / Description
1 / We have slight skewness to the left because the skewness statistic is negative. However, it is very near 0, so it is better in this case to say that the distribution is nearly symmetric.
2 / The kurtosis is negative indicating that this distribution is less peaked than the normal distribution.
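The sign conventions for skewness and kurtosis can be illustrated with a short calculation. The Python sketch below uses the simple moment-based definitions (third and fourth central moments scaled by the variance); the helper name `moment_shape` and both data lists are hypothetical, and JMP's reported values may include small-sample adjustments, so they can differ slightly from these formulas.

```python
def moment_shape(values):
    """Return (skewness, excess kurtosis) from central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # variance (divisor n)
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    # Subtracting 3 makes the normal distribution's kurtosis equal to 0.
    excess_kurtosis = m4 / m2 ** 2 - 3
    return skewness, excess_kurtosis


# Hypothetical examples: a flat, symmetric set and a set with a long right tail.
flat = [1, 2, 3, 4, 5]
right_tail = [1, 1, 2, 2, 3, 10]

flat_skew, flat_kurt = moment_shape(flat)
tail_skew, tail_kurt = moment_shape(right_tail)

print(f"flat:       skewness = {flat_skew:.3f}, kurtosis = {flat_kurt:.3f}")
print(f"right tail: skewness = {tail_skew:.3f}, kurtosis = {tail_kurt:.3f}")
```

The flat, symmetric values give a skewness of 0 and a negative kurtosis, the same pattern described for the sleep times, while the long right tail produces a positive skewness.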

Many statistical procedures require that the distribution of all measurements take on a certain form. For example, a common assumption of many procedures is that the measurements follow a normal distribution. JMP allows us to visually compare our estimated, or empirical, distribution to several common distributions.

To check how closely the data follows a normal distribution, right click on the header of the histogram/boxplot chart, select Fit Distribution > Normal.

The histogram now contains a red line for the best fitting normal distribution for that data.

How well does this data fit a normal distribution?

Not very well in this case. Our distribution is more spread out and less peaked than the normal ideal.

A smoothed histogram can also help in understanding distributional shape. To obtain a smoothed curve estimate select Smooth Curve from the Fit Distribution pull-out menu.

Here we see that the normal curve and the smoothed histogram curve do not match very well. One would not characterize the times to fall asleep as having a normal distribution.

Comparative Displays:
Suppose the goal is to compare/contrast the time to fall asleep between smokers and non-smokers. This can be done fairly easily in JMP.

To begin, select Analyze > Distribution. Place Sleep Time in the Y, Columns box and place Smoking Status in the By box. Click OK.

JMP returns the following output.

Notice that the histograms have different scaling on the horizontal axes. To put them on equal scaling we can change the range of the histogram for non-smokers to go from 14 to 26 minutes also. To do this in JMP select Distributions > Uniform Scaling as shown below.

The resulting histograms with uniform horizontal axis scaling are shown below.

One of the best ways to compare a single numerical variable (sleep time) across the levels of a categorical variable (smoking status) is to create side-by-side boxplots.

This is done in JMP by selecting Analyze > Fit Y by X. Place Sleep Time in the Y, Response box, place Smoking Status in the X, Factor box, and click OK.

JMP first gives just a dotplot of the sleep times. To put boxes over the points, right-click on the header of the graph, select Display Options, and click Box Plots and Points Jittered.

What do we see in this plot? What are the similarities? What are the differences?
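The numbers behind side-by-side boxplots are the five-number summaries of each group. The following Python sketch shows one way to compute them by group, assuming records of the form (smoking status, sleep time). The `records` list is made up, with values chosen arbitrarily that imply nothing about the real study, and the helper names are hypothetical. As before, the quartile rule used here may differ from JMP's.

```python
import statistics

# Hypothetical (status, minutes) records; NOT the Sleep-time.JMP data.
records = [
    ("smoker", 21.0), ("smoker", 22.5), ("smoker", 24.0),
    ("smoker", 19.5), ("smoker", 23.0),
    ("non-smoker", 16.0), ("non-smoker", 18.5), ("non-smoker", 17.0),
    ("non-smoker", 20.0), ("non-smoker", 15.5),
]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a boxplot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


def summarize_by_group(rows):
    """Group values by label, then summarize each group separately."""
    groups = {}
    for label, value in rows:
        groups.setdefault(label, []).append(value)
    return {label: five_number_summary(vals) for label, vals in groups.items()}


by_group = summarize_by_group(records)
for label, summary in by_group.items():
    print(label, summary)
```

Comparing the medians shows differences in location, and comparing the box lengths (Q3 minus Q1) shows differences in spread, which are the same comparisons to make when reading the JMP plot.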