Lab 1: Welcome to the Laboratory!

Summer, 2011

Objectives:

•Introduction to the lab environment.

•Log into your statistics account

•Learn about printing quota

•Get to know your directories

•Construction of groups

•Introduction to Software

•Excel worksheets

•Opening a data set

•Calculating basic summary statistics

1.1: Logging into the MSRC Lab

•Select the Windows 2000 prompt.

•Type your username and password in the Windows 2000 prompt. Your username consists of thefirst 8 letters of your registration name(first name, middle name, last name). For instance, if yourregistration name is Joan Ann Beckman, then your user name is joanannb (note that you must uselower-case letters).

•Use the Tab key to switch between dialog boxes. Your password is given by acapital S followed bythe first 7 digits of your student number. For instance, if your student number is 23567989, then yourpassword is S2356798.

•Change your default password. You should change your default password immediately after accessing your account, by pressing the keys Ctrl, Alt and Delete simultaneously. It is your responsibility to remember your password. However, in case you ever forget it, you should send an e-mail in order to let the System Manager know about the problem. Please provideyour full name, student number and lab section in your e-mail. You must wait for a day to receiveyour new password.

•In some cases, the names of more than one person lead to the same username. In these cases, some will not have a username as described above. If you have trouble logging in, the TA will be able to provide you with the correct username.

1.2: Printing information

The printer is located in this building, outside this room. We anticipating that your printing needs will only be for assignments, not for the lab activities.

You have a print quota of 35 two-sided pages for the term to print out homework assignments. If

you exceed your print quota, it will not be refilled automatically. You can renew your quota for $20/50

double-sided pages through the Main Office in the Department of Statistics (Room 333, LSK building).

1.3: Directories and Website

You will have access to two directories. The Share or S: directory is accessible by all students and contains folders for multiple courses. Naturally, the folder of interest for this course is titled “Stat200”. It is here that you will find data sets and other files of interest. The other directory is your personal one (Z;). It is here that you will save your work for future references. It is important that you don’t save your work in the S: directory.

The course website is found on the SLATE website at . SLATE has links to the websites of many Statistics Department courses. Find the Stat 200 website for the correct semester(2011Summer Term 1STAT 200).

2.1: General layout of Labs

The labs will account for a portion of your course grade. There will be a total of 8 lab activities, including this one, which will require work to be handed in for grading. So people later in the week are not at a possible advantage, the lab activities will change throughout the week (although the concepts being visited will remain the same).

Since labs are short, they have been divided into four steps to maximize efficiency (both in terms of the learning experience and time management).

  1. Pre-reading of material to help prepare for the lab. This will contain objectives, commands for relevant tasks in the software and may introduce the problem. Pre-reading material will be posted on the course website in the Labs section. Please read the material before your lab.
  2. Briefing: In the first five to ten minutes of the lab, the TA will explain the problem that is to be solved and go over useful tips/reminders.
  3. Problem Solving: The middle 35 minutes will be used to solve the problem at hand. The TA will be available to answer questions. Groups who finish early should work quietly while others finish up.
  4. De-Briefing: The last 5 - 10 minutes are used to recap what has been covered. You may be asked to summarize some aspect of the lab. The TA will also recap, explaining the solution to the problem and why it is the recommended approach.

2.2: Working Groups

To increase productivity and improve the learning experience, you will be paired into groups of three or four. These lab groups will remain the same throughout the semester and each group member will receive the same grade for the work handed in. During the de-briefing, the TA will randomly select students to share answers they have obtained. Wrong answers will lead to a penalty on the lab grade for that group.The purpose here is to ensure everyone participates in the labs, that each group works together as a group.

Working in a group can present challenges. But learning how to work in a group is a valuable life skill, one that you will need in the “real” working world. Here are some points to remember.

•If you are slower than the others or are shy, try to speak out. Participate! Watching others work can lead you to believe you understand when in fact you don’t.

•If you are faster than the others or tend to dominate, slow down, allow others to work to their potential, remember to allow others a chance to speak.

•If needed – and wanted - help other group members to do things. But don’t do things for them. Doing things for others might seem the fast way at the time, but it will not be fast in the long run. Everyone must learn to work in the lab.

•Treat your team-mates with respect.

•If there are problems, try to work them out (respectfully!) within your group. If you find that difficult, please talk to the TA or your instructor.

The TA will determine group membership during this lab, by some random mechanism. Your first task as a group is to come up with a group name. Next, you will each sit at a computer, but close to each other.

3.1: Navigate an Excel worksheet

Each group member will do everything in Section 3.1. Remember to help your fellow team-mates if they are having trouble (and they want your help).

Open the Excel program which can be found in the Start menu (or on the Desktop). If a workbook does notopen then go to File→New→Blank Workbook.

An Excel worksheet has the columns denoted by letters and the rows by numbers. Cell B6 is the cell in

the second column and in the sixth row. A single worksheet is limited to 65,536 rows and 256 columns.

Data can be structured in an array. In Excel an array is group on contiguous cells, for example A1:A12

is an array of twelve cells in a single column. An array can extend across columns as well, A1:C2. This arraycovers 6 cells: A1, A2, B1, B2, C1, and C2.

3.2: Open a data set in Excel

Each group member should do everything in Section 3.2. Remember to help your fellow team-mates if they are having trouble (and they want your help).

Here we will simply go step by step to open a file. This process will work in the Z: directory. We will open the monkey data set in Excel. The monkey data set is in the S: directory.

•Click on File; click on Open.

•Navigate to the Stat 200 folder in the S: directory. If you are unfamiliar with doing this, ask theinstructor for help (S:\Stat200\Lab 1).

•Double click on monkeys.xls. This is a very small data set on 10 monkeys trained to help handicapped people.

•Save it to your personal drive (Z:) by clicking on File, then Save and navigating to Z:.

NOTE: Files with the extension .txt, .csv, and .xls can be easily opened in excel. These are the most commontypes of data files that you may see.

Monkey data set

Some disabled people have access to trained monkey helpers that can perform household tasks likeswitching things on and off. This data set gives the number of tasks each of nine monkeys can perform alongwith the number of years the monkeys have been working with disabled people.

Variables

•Name: Name of monkey,

•Years: Number of years the monkey has worked with disabled people, and

•Tasks: Number of tasks the monkey can perform.

Data Source

StatSci.org ( OzDASL, Journal of Rehabilitation Research and Development 28, No.2, Spring 1991, pages 91-96. The data was contributed by Rex Boggs, Glenmore StateHigh School, Rockhampton, Queensland, Australia.

Take a look at the raw data

Whenever you are working with a new data set it is good practice to take a look at the raw data. This

may seem obvious, but many people make serious mistakes when they do not do this step. Scanning the

data set, looking at the first and last 10 observations, or randomly selecting some observations to look at

are all good ways to get an idea of what the data actually look like. This type of observation may reveal

any misunderstanding you may have about the data, evoke questions you need to have answered before youcan analyze the data, or it may reveal problems with the data (e.g. missing information) that need to beaddressed in your analysis.

Please look at your raw data.

3.3: Summary Statistics

Each group member will do a different component of this section. Some of the components can be done as a group. The specifics are indicated on the sheet distributed by the TA. You’ll hand in this sheet AS A GROUP near the end of the lab.

Once we have looked at the data, the next step is to calculate some summary statistics and visualize the

data. We will use the years of service variable.

Average (Mean)

There are two ways to obtain the average. The first is as follows,

•In cell B11 type ‘=sum(b2 :b10)’. The sum function will add up the cells in column B. This isequivalent to typing ‘=b2 +b3 +b4 +b5 +b6 +b7 +b8 +b9 +b10’.

•In cell B12 type ‘=count(b2 :b10)’. This function counts the number of cells that contain some information or data.

•In cell B13 type ‘=b11/b12’. This will calculate the average number of years that a monkey helped a disabled person.

The second method uses the Excel functionaverage. In cell B14 type ‘=average(b2 :b10)’. The result in

cell B14 will equal the result in B13, which is 4.26 years.

Median

In cell B15 type ‘=median(b2 :b10)’. The median number of years that a monkey has helped a handicappedperson is 5 years. We can confirm this by,

•selecting cells (or array) A2 to C10,

•selecting Data→Sort and then selectingYearsfrom the sort by drop down box,

•selecting OK (This will order all the rows by the years of service - 0.4 to 10),

•identifying the middle observation - the fifth - which is Jo who served for 5 years.

Quartiles, percentiles, and Excel Help

Excel has a function for quartiles and percentiles. The first quartile is the 25th percentile (25% of theobservations are below and 75% of the observations are above). If we did not know that Excel had a quartilefunction, we would look up the function in the Excel help.

•Select Help→Microsoft Excel Help.

•In the help wizard typequartile.

•Select QUARTILE worksheet function. Here is the relevant information that is needed to understandhow to use the function followed by a simple example.

•Find the 25th, 50th, and 75thpercentiles. What do you notice about the 50thpercentile and themedian?

Note: Excel uses a slightly different convention from the book when calculating quartiles.

Min, max, variance, and standard deviation

Find the minimum, maximum, variance, and the standard deviation for the years of service. If you do notknow these Excel functions, you can find them using Excel Help.