Up and Running with R

Ann Arbor Chapter,

American Statistical Association (ASA)

Instructor:

Brady T. West

September 28, 2010


1. Introduction: What is R?

The software package known as R is an interactive computing language and environment for statistical analysis, computing, and graphics. R is an open source software package: the source code behind the software is free for all to look at / modify / play around with, and R in fact grows by leaps and bounds as people from all fields develop new functions for use within R’s computing environment. This is part of what makes R extremely useful! Several extremely complex statistical routines not available in other software packages have been programmed in R, and these routines are freely available for use by anyone.

R is completely interactive; users type commands and program functions as they go. The software is extremely similar in many ways to the commercial software package S-Plus, and offers many of the same features. R, however, can be downloaded for free, while S-Plus is a commercial package that costs money. S-Plus may be slightly easier to use than R, but after this workshop, you should be familiar enough with R and how it functions to do pretty much anything that you would like to do without a hassle!

The software provides users with a wide array of powerful and enlightening graphical techniques, and this is why many researchers love using R; the graphical capabilities are tremendous, and easy to implement. Once you are able to grasp how to work with R’s graphical facilities, you will have a limitless supply of graphical tools at your fingertips that will enhance the appearance of your research presentations in many ways.

I strongly encourage you to visit the central web site of the R project, which I will refer to frequently throughout this workshop:

http://www.r-project.org/

Here you will find links for downloading R, downloading additional packages for R, and everything else that you would like to know about the software or the people behind it.


2. How to Obtain R

The R Project Web Page

At the R Project Web Page, you will find a variety of information about the R Project, which you can peruse at your leisure. The most important link appears on the left-hand side of the screen, under the “Download” heading. Click on the CRAN link (Comprehensive R Archive Network), and after you choose one of the U.S. mirrors (http://cran.stat.ucla.edu/ is recommended), you will be taken to the page that you will use to download everything R-related.

Once you find the CRAN web page, take the following steps to obtain R:

1.  Click on the “R Binaries” link on the left-hand side of the page under the “Software” heading.

2.  Click on the folder that best describes your operating system.

3.  When using Windows, click on the “base” subdirectory. This will allow you to download the base R package.

4.  Click the “Download R 2.X.X for Windows” link. R is updated quite frequently, and the version number is always changing (at the time of this writing, Version 2.11.1 is available). Save the .exe file somewhere on your computer.

5.  Double-click on the .exe file once it has been downloaded. A wizard will appear that will guide you through the setup of the R software on your machine.

6.  Once you are finished, you should have an R icon on your desktop that gives you a shortcut to the R system. Double-click on this icon, and you are ready to go!

Adding Packages to R

At step 3 above, you also have the option of clicking on the “contrib” subdirectory. Doing this will allow you to download additional contributed packages for R. So what exactly are “additional contributed packages”? As mentioned in the introduction, R is an open source software package, meaning users of R are free to explore the code behind the software and write their own new code. Many statisticians and researchers have written additional packages for R that perform complex or less common analyses, and in order to use these packages and the functions within them, you first need to download them. The base R distribution comes with several additional packages, but odds are that you will discover an uncommon analysis technique in your research that requires a package not included with the base distribution. There are many contributed packages, so don’t hesitate to explore them to see if someone has developed a package that implements a technique that you are interested in!

To download contributed packages, follow steps 1 and 2 above, and then click on the “contrib” link. Then, follow these steps:

1.  Select the version of R that you are using (the newest version for Windows at the time of this workshop is Version 2.11.1).

2.  Scroll through the list of contributed packages (in .zip format), and click on the package that you would like to download. You can find descriptions of all of these contributed packages and the techniques implemented within them by clicking on the “Packages” link under the “Software” heading on the CRAN web page. This page will also have links to help manuals for the packages.

3.  Save the .zip file in a directory on your machine that you can remember.

4.  When using R, select Install package(s) from local zip files… from the Packages menu. Locate the .zip file for the package that you downloaded onto your machine, click on Open, and R will install that package so that it is ready for use.

5.  The package will now be ready to use when you start R!

FAQs on the CRAN Web Page

Under the “Documentation” heading on the left-hand side of the CRAN web page, click on the “FAQs” link. This will take you to an FAQ page that answers many of the most commonly asked questions about R, whether they are simple or difficult.

Searching on the CRAN Web Page

Under the “CRAN” heading on the left-hand side of the CRAN web page, you can click on the “Search” link. Although there is no formal search engine on the CRAN web page, this will take you to a set of links allowing you to search the R archives (manuals, mail, help files, etc.) for anything that you would like. This is often useful if you are faced with a tough analysis question, and you want to see if another R user has addressed the question before.

Starting R / Loading Contributed Packages

At this point (if you haven’t already), you should be able to start R! If you asked for a shortcut to R to be created on your desktop, simply double click on the R icon to start R. This will open the RGui (Graphical User Interface). You should see a window inside the RGui containing the R Console. This is where you will specify all of your commands and programs interactively, at the red command prompt.

For an example command, we’ll load a contributed package into R for use. Let’s download the “quantreg” package from the CRAN mirror and save it to the desktop, and then install the package as described above. After the package has been installed, simply type library(quantreg) at the command prompt:

> library(quantreg)

Press enter after you type this command to submit the command to R. If you don’t see anything aside from another command prompt, the library was loaded successfully, and you can use all of the functions associated with it! If you see the error message

Error in library(quantreg) : there is no package called 'quantreg'

the quantreg package was not installed correctly (see “Adding Packages to R” above). A contributed package must be downloaded and installed into the R library folder before you can use it.

This is how you load contributed packages into R for your personal use. When you submit a command to R, you will either see nothing but another command prompt (good), a result (good), or an error message (bad).
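The three outcomes described above can be seen in a short sketch. The package name notARealPackage is deliberately made up (an assumption for illustration only), and tryCatch() is used here simply so that the error message is captured rather than stopping the script:

```r
# Outcome 1: an assignment prints nothing; you simply get a new prompt
x <- 5

# Outcome 2: an expression prints a result
x + 1

# Outcome 3: an error. 'notARealPackage' is a made-up name used only to
# trigger the "no package called" error; tryCatch() captures the message
# instead of halting.
msg <- tryCatch(
  {
    library(notARealPackage)
    "loaded"
  },
  error = function(e) conditionMessage(e)
)
msg
```

When you type library() directly at the prompt without tryCatch(), the same message is shown as an error in the console.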

An even quicker way to install packages is to simply select “Install package(s)…” from the Packages menu. You can pick a CRAN mirror, and then directly install a package and all of its related components. This is probably the quickest way to install a package. You would still need to load the package in order to use its functions.
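The menu-driven installation can also be done from the command prompt with install.packages(). The sketch below is only an illustration: the mirror URL is an assumption (any CRAN mirror should work), and the install line is left commented out so that nothing is downloaded unless you choose to run it.

```r
# Name of the contributed package used in this section
pkg <- "quantreg"

# TRUE if the package is already installed; this does not attach the package
already_installed <- requireNamespace(pkg, quietly = TRUE)

if (!already_installed) {
  # Assumed mirror URL; uncomment the next line to install from CRAN
  # install.packages(pkg, repos = "https://cloud.r-project.org")
  message("Package '", pkg, "' is not installed yet.")
} else {
  # Installing is not loading: library() is still needed in each session
  library(pkg, character.only = TRUE)
}
```

Checking first avoids re-downloading a package that is already in your R library folder.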

At any point in a given R session, you can submit the command

> installed.packages()

to view packages that have been installed.
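Because installed.packages() returns a matrix with one row per package, its output can be trimmed down to the columns of interest. A minimal sketch (the object name ip is arbitrary):

```r
# Matrix of installed packages, one row per package
ip <- installed.packages()

# Show only the package name and version for the first few packages
head(ip[, c("Package", "Version")])

# Check whether a particular package is installed (TRUE or FALSE)
"quantreg" %in% rownames(ip)
```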

You are now ready to use R!


3. Help Tools

In most well-written statistical software packages, help is never far away. This holds true for R. Although the help is somewhat technical in nature and requires a good understanding of the R language, it is very easy to access.

Once you’ve gained some experience in working with the R language, 90% of your help questions will be directed at how particular functions in R work, what arguments they take, etc. For help on ANY function in R that is a part of a package that has already been loaded, simply type and submit

> help(function.name)

in the R console, where function.name is the name of the function that you would like to see a help window for. Try typing help(lm)!

If you have typed the correct name of a function that belongs to a package that has already been loaded into R, you will see a help window pop up that describes the function, its arguments, and what the function returns, and also presents some examples of using the function. Oftentimes, there will be contact information for the person/people who wrote the function.
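A few related lookups are often handy alongside help(). The following is a short sketch using functions that ship with R; the search terms are only examples, and the object names lm_args and mean_like are arbitrary:

```r
# Argument names of lm(), without opening the full help page
lm_args <- names(formals(lm))
lm_args

# All objects on the search path whose names contain "mean"
mean_like <- apropos("mean")
mean_like

# Not run here because they open help windows or print long output:
# ?lm                         # shorthand for help(lm)
# example(lm)                 # run the examples from the help page
# help.search("regression")   # search installed help pages by keyword
```

apropos() is useful when you remember only part of a function's name, and help.search() when you know the topic but not the name.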

If you would like to see a list of all of the functions that come with the base R package, including brief descriptions of each, you can simply type

> library(help = "base")

to generate the list.
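If you would rather have the names of the base functions as an object that you can search, ls() can list the contents of an attached package. A small sketch (the object name base_names is arbitrary):

```r
# Names of all objects in the base package
base_names <- ls("package:base")

# How many there are, and the first few
length(base_names)
head(base_names)

# Names in the base package that start with "read"
grep("^read", base_names, value = TRUE)
```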

Hint: Don’t forget, R is an open source language! If you want to see exactly how a given function has been written, simply type

> fix(function.name)

to see the code in a program editor. You can copy it, update it, and do whatever else you would like with it. Just make sure not to save any changes to the code behind a function unless you know that they will work!!!
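If you only want to read a function's code, a safer alternative to fix() is to print the function without opening an editor. A minimal sketch, using sd() from the stats package purely as an example:

```r
# Typing a function's name without parentheses prints its definition
sd

# The same code as a character vector, one element per line of source
sd_source <- deparse(sd)
sd_source

# Note: some functions print only UseMethod(...) or .Primitive(...),
# because their real work is done in methods or in compiled code.
```

Nothing is changed by printing a function this way, so there is no risk of accidentally saving a broken version.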

Another easy way of obtaining help, this time in your web browser, is to type and submit

> help.start()

Doing so will open up a web-based help system that is very easy to navigate.

A third and obvious way to obtain help is via the Help menu when you are working in the R Console. Here you will find FAQs, help with navigating the console, and most importantly the official R manuals from the authors of R themselves. Again, these are somewhat technical in nature, but very useful once you have been working with R for a while. I would recommend the “An Introduction to R” manual very highly.

Finally, don’t hesitate to contact the Center for Statistical Consultation and Research, or CSCAR (734.764.7828), if you need further assistance with performing your analyses in R!


4. Importing / Exporting Data Sets

The “bank2” Data Set

All of the following examples of using R will be demonstrated using a data set that appears in a variety of formats in the archive at www.umich.edu/~bwest. The “bank2” data set contains a variety of information on each of the 474 employees who work for a large bank. The most important first step in using R for statistical analysis is of course to import a data set!

Objects in R

Before you can successfully import a data set, you need to know about objects in R. The entire R computing environment is based on objects. What exactly is an object? Objects take numerous forms:

·  Numbers

·  Vectors of numbers

·  Matrices of numbers

·  Results of analyses

·  Data sets

·  Many others!
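Each of the forms listed above can be stored under a name of your choice with the assignment operator <-, which the next paragraph introduces. The sketch below uses made-up values and object names purely for illustration; none of them come from the bank2 data set:

```r
# A single number
a_number <- 9

# A vector of numbers
a_vector <- c(2, 4, 6, 8)

# A matrix of numbers with 2 rows and 3 columns (filled column by column)
a_matrix <- matrix(1:6, nrow = 2, ncol = 3)

# A small data set (data frame) with made-up variables
a_dataset <- data.frame(id = 1:3, salary = c(30000, 42000, 51000))

# class() reports what kind of object each one is
class(a_number)
class(a_matrix)
class(a_dataset)
```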

The operator <- is used in R to assign numbers / vectors / matrices / data sets / results of analyses to objects with names of your choice when you are interactively typing commands at the R command prompt. For example, typing

> nine <- 9

will create an object called nine. In this case, we have an object that is nothing more than a number (9). You can “look” at an object by simply typing the name of the object at the command prompt:

> nine

[1] 9

R, in this case, returns the value of the object (a number). Many objects (such as results of analyses) are much more complex, and there are ways to look at specific aspects of objects. Fields within objects (e.g. variables within data set objects, or parts of result objects, such as the estimated regression coefficients that come from a regression analysis) can be accessed using the object$field syntax. Suppose we run a regression analysis, and then want to investigate the resulting coefficients. Submitting the command

> fit <- lm(mo.fail ~ lc, data)

tells R to fit a simple linear regression model to two variables in a data set object named data, using the lm() function for general linear modeling. We are trying to predict the mo.fail variable with the lc variable, and storing the results of the regression analysis in an object called fit. This object contains the results of a regression analysis. So what does it look like?

> fit

Call:

lm(formula = mo.fail ~ lc, data = data)

Coefficients:

(Intercept)           lc
     64.139       -9.195

This is not very exciting on the surface. But there is more to this object than meets the eye. For example, suppose we wanted to see the regression coefficients that are a part of this “results” object: