Lecture Notes
Introduction and Chapter 1 (Moore)
(Italics = Handouts)
Introduction
What is statistics? Many definitions
Utts/Heckard: Statistics is a collection of procedures and principles for gathering data and analyzing information in order to help people make decisions when faced with uncertainty.
DeVeaux/Velleman: Statistics is a way of reasoning, along with a collection of tools and methods, designed to help us understand the world.
Agresti/Franklin: Statistics is the art and science of designing studies and analyzing the data that those studies produce. Its ultimate goal is translating data into knowledge and understanding of the world around us. In short, statistics is the art and science of learning from data.
Triola: Statistics is the science of planning studies and experiments, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions based on the data.
Moore: Statistics is the science of learning from data.
What does statistics do? Lots of things!
Statistics plays a role in making sense of the complex world in which we live today.
To be an effective citizen of the 21st century, one must be able to make use of the available data and to critically analyze the statistics one is presented with.
Statistics transforms raw data into useful information.
How important is statistics? Some Quotes
1. Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write. (H. G. Wells)
2. The major problem of man living in the 20th century is to learn to live with uncertainty. (Bertrand Russell)
3. To understand God’s thoughts we must study statistics, for these are the measure of His purpose. (Florence Nightingale)
The three main areas of “practical statistics” (pg. xviii):
1) Data Analysis (also called Descriptive Statistics: displays and summaries)
2) Data Production
3) Statistical Inference (our goal; it is what makes statistics so important in our modern world: making good decisions in the face of uncertainty)
READ THE BOOK
My approach is more conceptual than computational. There are several reasons that I take this approach:
1) When students memorize formulas for a course, they usually forget them quickly. On the other hand, once concepts are understood, they are likely to be remembered for a long time.
2) We now have technology that can do the computational work for us. What technology can’t do is decide which procedures to use and interpret the results.
3) Very few, if any, of you will ever actually have to “do” statistics, but all of you will be faced with statistical data and results that you must use to make good decisions. We live in a data-driven world; if you are going to be an effective citizen of the 21st century, you must be able to make good use of data.
Moore states that “The key to learning is persistence” and that “The gain will be worth the pain.” I wholeheartedly agree.
Part 1: Exploring Data
1st question: What do the data say? (reveal the distribution)
1) Organize with charts, tables and graphs
2) Summarize with descriptive measures
Chapter 1: Organizing data and picturing distributions graphically
Individuals – objects of the study
Variable – a characteristic of an individual
Types of data: categorical, quantitative
Data Tables (often in a spreadsheet-type array) (see fig. 1.1, pg. 4)
row = case (a single individual); column = variable
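To make the row/column layout concrete, here is a minimal Python sketch of a data table stored as a list of rows. The students, their values, and the `variable_type` helper are all hypothetical, made up for illustration. The helper's rule of thumb (every value is a number, so the variable is quantitative) is only a first guess: as the W’s point out, the Why can change a variable’s type, and a number such as a zip code is really categorical.

```python
# Hypothetical data table: each row is a case (one individual),
# each key/column is a variable recorded about that individual.
students = [
    {"name": "Ana", "major": "Biology", "age": 19, "gpa": 3.4},
    {"name": "Ben", "major": "History", "age": 22, "gpa": 2.9},
    {"name": "Cho", "major": "Biology", "age": 20, "gpa": 3.8},
]

def variable_type(table, column):
    """Rough first guess: 'quantitative' if every value in the column is a
    number, otherwise 'categorical'. Context (the Why) can override this."""
    values = [row[column] for row in table]
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "quantitative"
    return "categorical"

# "name" identifies the individual (the Who), so skip it; the rest are variables.
for column in ["major", "age", "gpa"]:
    print(column, "->", variable_type(students, column))
```

This prints `major -> categorical`, `age -> quantitative`, and `gpa -> quantitative`.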
When looking at data, the metadata (information about the data) is important and should be considered. One statistician summarized it as the W’s:
Who – the individual about which we record characteristics
What – what we record about the individuals, variable(s)
categorical
quantitative
Why – the questions we ask of a variable (can affect type); why are we recording values of this variable
Where
When
hoW (e.g. how percentage body fat was measured)
Metadata adds context and meaning to our data.
Do exercises 1.1, 1.2 on page 5
Categorical Variables: pie charts and bar graphs
EDA, exploratory data analysis, is a set of statistical tools and ideas that help us examine data in order to describe their main features.
Distribution of a variable: what values it assumes and how often – one of the main goals of exploring data is to reveal the distribution. This can be done with graphs or by computing summary measures.
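A distribution can be tabulated before it is graphed. The sketch below, in Python with made-up survey answers (one answer per person), counts how often each value occurs and converts the counts to percents; these are the numbers a pie chart or bar graph of a categorical variable would display. The data and names are hypothetical.

```python
from collections import Counter

# Hypothetical answers to "Which device do you use most?" (one answer per person).
responses = ["phone", "laptop", "phone", "tablet", "phone", "laptop", "desktop", "phone"]

counts = Counter(responses)   # what values occur and how often
n = len(responses)

# Frequency table in descending order of frequency, with percents.
for value, count in counts.most_common():
    print(f"{value:8s} {count:2d}  {100 * count / n:5.1f}%")
```

Because each person gives exactly one answer, the percents add to 100%.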
Pie Charts
Bar Graphs (values can be in alphabetical order, first-come order, or descending order of frequency)
(Republican candidates: categ data Repub candidates 09Jan16.mtw)
Note: bar graphs are more versatile than pie charts because they can also display data such as the percentage of users who “love” each device (ex. 1.3, pg. 8, very dated data; or “movie source”, movie sources.mtw), where the percentages add to more than 100%. A pie chart only works when the categories are parts of one whole.
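Here is a minimal Python sketch of why totals over 100% rule out a pie chart. The device names and percents are hypothetical, not the data from ex. 1.3. Because each person may “love” several devices, the percents are not shares of one whole, so a bar graph (drawn here with `#` characters, one per 5 percentage points, in descending order of frequency) is the sensible display.

```python
# Hypothetical survey: percent of users who say they "love" each device.
# One person can love several devices, so the percents need not sum to 100.
love_pct = {"phone": 70, "laptop": 45, "tablet": 30, "smartwatch": 15}

total = sum(love_pct.values())
print("sum of percents:", total)  # more than 100, so not slices of one pie

# Text bar graph: one '#' per 5 percentage points, tallest bar first.
for device, pct in sorted(love_pct.items(), key=lambda item: item[1], reverse=True):
    print(f"{device:10s} {'#' * (pct // 5)} {pct}%")
```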
Do exercises 1.3, 1.4 on pages 8 and 9.
Quantitative Variables: histograms (e.g. example 1.6, scores on vocab test, use Moore data set eg01-6.mtp)
1) choose the classes (bins, class limits, cutpoints)
same size
no overlap, no gaps
neither too many nor too few
2) get the counts (frequencies) for each class and make a frequency distribution table.
3) draw the histogram
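The three steps above can be sketched in Python as follows. The scores are hypothetical (not Moore’s eg01-6 data), the `frequency_table` helper is made up for illustration, and two conventions are assumptions of this sketch: a value that lands exactly on a cutpoint goes into the higher class, and values outside the chosen classes are not counted. Rows of `*` stand in for the bars of step 3.

```python
# Hypothetical quantitative data (for example, scores on a short quiz).
scores = [3, 7, 8, 9, 10, 10, 11, 12, 12, 13, 14, 15, 18]

def frequency_table(data, start, width, n_classes):
    """Steps 1 and 2: build equal-width classes [low, low + width) that touch
    without gaps or overlap, and count how many values fall in each."""
    table = []
    for k in range(n_classes):
        low = start + k * width
        high = low + width
        count = sum(1 for x in data if low <= x < high)
        table.append((low, high, count))
    return table

# Step 3: a sideways text histogram, one '*' per observation.
for low, high, count in frequency_table(scores, start=0, width=5, n_classes=4):
    print(f"{low:2d} to <{high:2d} | {'*' * count} ({count})")
```

Four classes of width 5 give counts 1, 3, 7, and 2, which add back to the 13 observations.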
Describing the distribution:
1) shape
2) center
3) spread
4) outliers or other unusual features
Shapes
symmetric
skewed (left or negatively, right or positively) the “tail tells the tale”
unimodal, bimodal
bell-shaped
rectangular (uniform)
Organizing or summarizing = some loss of information
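A small Python illustration of that loss of information, using two made-up data sets and a hypothetical `class_counts` helper: the data sets differ, yet their frequency tables (and therefore their histograms) are identical, so the original values cannot be recovered from the summary.

```python
def class_counts(data, start, width, n_classes):
    """Count values in equal-width classes [low, low + width)."""
    return [
        sum(1 for x in data if start + k * width <= x < start + (k + 1) * width)
        for k in range(n_classes)
    ]

a = [11, 12, 13, 25, 26]  # hypothetical data set A
b = [10, 14, 14, 21, 29]  # hypothetical data set B, with different values

print(class_counts(a, 10, 10, 2))
print(class_counts(b, 10, 10, 2))
```

Both calls print the same counts, `[3, 2]`.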
Stemplots (stem-and-leaf plots) (pg. 16) (stem2 and hist example)
unordered/ordered leaves
rounding and/or truncating
splitting stems
back-to-back (see exercise 1.32)
depth
leaf-units
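The stemplot ideas above (ordered leaves, truncating, splitting stems, leaf units) can be sketched in Python. This is a simplified sketch assuming non-negative whole-number data and a whole-number leaf unit such as 1 or 10. The `stemplot` function is hypothetical; it truncates rather than rounds, keeps empty stems so gaps stay visible, and does not print depths or back-to-back plots.

```python
from collections import defaultdict

def stemplot(data, leaf_unit=1, split=False):
    """Return the lines of a text stemplot for non-negative whole numbers.

    Each value is divided by leaf_unit and truncated; its last digit becomes
    the leaf and the remaining digits the stem. With split=True, leaves 0-4
    and 5-9 go on two separate lines for each stem."""
    rows = defaultdict(list)
    for x in data:
        scaled = x // leaf_unit              # truncate: extra digits are dropped
        stem, leaf = divmod(scaled, 10)
        half = 1 if (split and leaf >= 5) else 0
        rows[(stem, half)].append(leaf)

    stems = [stem for stem, _ in rows]
    halves = (0, 1) if split else (0,)
    lines = []
    for stem in range(min(stems), max(stems) + 1):       # include empty stems
        for half in halves:
            leaves = sorted(rows.get((stem, half), []))  # ordered leaves
            lines.append(f"{stem:3d} | " + "".join(str(leaf) for leaf in leaves))
    return lines

for line in stemplot([12, 15, 21, 23, 27, 27, 34, 48]):
    print(line)
```

With `leaf_unit=10`, a value such as 158 is truncated to 15 (stem 1, leaf 5) rather than rounded to 16; passing `split=True` gives each stem two lines.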
Histograms and Stemplots (kittens)
Do exercises 1.6 (use cutpoints 14, 16, …, 32; in Minitab notation 16:32/2), 1.7, and 1.9
Time plots (Minimum Wage Time Series.MPJ)
Chapter Exercises: 1.12, 1.13, 1.15, 1.17, 1.20, 1.22, 1.26, 1.32