Introduction to R

Any entity that exists in R is called an object. Some examples of an object are a vector, a matrix, and a function.

To create a vector, the following could be used:

> x = c(5,3,14)

> x

[1] 5 3 14

Here = is the assignment operator, which assigns the vector (5, 3, 14) to the object x. The c function “combines” similar objects into a vector.
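
As a small illustration of the c function, the sketch below combines two existing vectors into a longer one. It also uses <-, the more common assignment operator in R, which behaves like = in this setting. The names a, b and ab are made up for this example.

```r
# Hypothetical names a, b, ab are for illustration only
a <- c(5, 3, 14)      # same values as the vector x above, assigned with <-
b = c(2, 8)           # = works the same way at the top level
ab = c(a, b)          # c() also combines existing vectors
print(ab)
```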

Generating patterned data

Using the sequence operator (:):

> y = 1:8

> y

[1] 1 2 3 4 5 6 7 8

Using the seq function:

> x = seq(from=-1,to=1,by=0.2)

> x

[1] -1.0 -0.8 -0.6 -0.4 -0.2 0.0 0.2 0.4 0.6 0.8 1.0

The argument by=0.2 in the function specifies the increment size of the sequence. Alternatively, one could use a different argument in the seq function:

> seq(from=-1,to=1,len=10)

[1] -1.0000000 -0.7777778 -0.5555556 -0.3333333 -0.1111111 0.1111111

[7] 0.3333333 0.5555556 0.7777778 1.0000000

The argument len= specifies the length of the sequence.
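
The two forms are related: for n equally spaced values from a to b, the increment is (b - a)/(n - 1). The following sketch checks this with base R only. Note that len is an abbreviation of the full argument name length.out; the names s1 and step are chosen for this example.

```r
s1 = seq(from = -1, to = 1, length.out = 10)   # length.out is the full name of len
step = (1 - (-1)) / (10 - 1)                   # increment implied by 10 points
print(step)
print(all.equal(diff(s1), rep(step, 9)))       # diff() gives successive differences
```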

To generate repeated values, you may use the rep function. The first argument specifies the number or object to be repeated and the second argument determines the number of repetitions.

> rep(4,7)

[1] 4 4 4 4 4 4 4

> rep(1:5,3)

[1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5

> rep(1:5,c(2,1,3,2,2))

[1] 1 1 2 3 3 3 4 4 5 5

In the previous command, the vector c(2,1,3,2,2) in the second argument tells R to produce a vector with two 1’s, one 2, three 3’s, two 4’s and two 5’s. This vector must be of the same length as the vector in the first argument.
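
The second argument of rep is named times. The rep function also accepts an each argument, which repeats every element before moving on to the next. A brief sketch contrasting the variants (r1, r2 and r3 are names chosen for this example):

```r
r1 = rep(1:3, times = 2)             # whole vector repeated twice
r2 = rep(1:3, each = 2)              # each element repeated twice
r3 = rep(1:3, times = c(2, 1, 3))    # per-element repetition counts
print(r1)
print(r2)
print(r3)
```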

To find the length of a vector, use the function length:

> x = 5:12

> x

[1] 5 6 7 8 9 10 11 12

> length(x)

[1] 8

Creating matrices

To create a matrix, use the matrix function. The first argument in the function is a vector, with the additional arguments specifying the number of rows or columns, and other attributes.

> x = c(3,5,1,2,9,7)

> my.matrix = matrix(x,nrow=2)

> my.matrix

     [,1] [,2] [,3]
[1,]    3    1    9
[2,]    5    2    7

> my.matrix = matrix(x,ncol=2)

> my.matrix

     [,1] [,2]
[1,]    3    2
[2,]    5    9
[3,]    1    7

To transpose a matrix, use the t function:

> t(my.matrix)

     [,1] [,2] [,3]
[1,]    3    5    1
[2,]    2    9    7

Note that using the vector given in the first argument, R will by default form a matrix column by column from left to right. To form a matrix row by row from top to bottom using the given vector, set the byrow= argument to TRUE, i.e., byrow=T.

> my.matrix = matrix(x,ncol=2,byrow=T)

> my.matrix

     [,1] [,2]
[1,]    3    5
[2,]    1    2
[3,]    9    7
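
One way to see what byrow does: filling a 3 x 2 matrix row by row gives the transpose of the 2 x 3 matrix filled column by column from the same vector. The sketch below checks this and also uses dim to report the dimensions. It spells out TRUE rather than T, since T is only a variable preset to TRUE and can be overwritten. The names by.row and by.col are chosen for this example.

```r
x = c(3, 5, 1, 2, 9, 7)
by.row = matrix(x, ncol = 2, byrow = TRUE)   # filled row by row, 3 x 2
by.col = matrix(x, nrow = 2)                 # default: filled column by column, 2 x 3
print(dim(by.row))                           # number of rows, number of columns
print(all(by.row == t(by.col)))              # the two constructions agree
```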

Arithmetic operators

Operator   Function
+          addition
-          subtraction
*          multiplication
/          division
%*%        matrix multiplication
^          exponentiation
%%         modulus

> x = 1:5

> x*3

[1] 3 6 9 12 15

> 4-x

[1] 3 2 1 0 -1

> x^2

[1] 1 4 9 16 25

> x%%3

[1] 1 2 0 1 2

> y = c(3,0,7,1,5)

> x-y

[1] -2 2 -4 3 0

> y/x

[1] 3.000000 0.000000 2.333333 0.250000 1.000000

> z = matrix(1,nrow=3,ncol=2)

> z

     [,1] [,2]
[1,]    1    1
[2,]    1    1
[3,]    1    1

> u=c(4,7)

> z%*%u

     [,1]
[1,]   11
[2,]   11
[3,]   11
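
The operators * and %*% are easy to confuse. On matrices, * multiplies matching entries, while %*% is the usual row-by-column matrix product. A small sketch with a 2 x 2 matrix (the names A, elementwise and product are chosen for this example):

```r
A = matrix(1:4, nrow = 2)   # columns are (1,2) and (3,4)
elementwise = A * A         # multiplies matching entries
product = A %*% A           # row-by-column matrix product
print(elementwise)
print(product)
```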

Mathematical functions

Function   Description
abs        absolute value
exp        exponential (e to a power)
gamma      gamma function
log        natural logarithm (base e by default)
log10      logarithm (base 10)
sqrt       square root
cos        cosine
sin        sine
tan        tangent
acos       arc cosine

> log(2)

[1] 0.6931472

> gamma(0.5)

[1] 1.772454

> sqrt(16)

[1] 4
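
A few quick checks of these functions follow, as a sketch. The log function takes an optional base argument, and gamma(n) equals (n-1)! for a positive integer n. The variable names are chosen for this example.

```r
l2 = log(8, base = 2)      # logarithm with an explicit base
l10 = log10(1000)          # base-10 shortcut
g = gamma(5)               # equals 4! = 24
back = exp(log(2))         # exp undoes the natural logarithm
print(c(l2, l10, g, back))
```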

A few mathematical functions for matrices

Function   Description
diag       create diagonal matrix or extract diagonal values
solve      solve system of linear equations; find inverse
t          transpose

> diag(c(2,3,1))

     [,1] [,2] [,3]
[1,]    2    0    0
[2,]    0    3    0
[3,]    0    0    1

> z = matrix(log(1:9),ncol=3)

> z

          [,1]     [,2]     [,3]
[1,] 0.0000000 1.386294 1.945910
[2,] 0.6931472 1.609438 2.079442
[3,] 1.0986123 1.791759 2.197225

> diag(z)

[1] 0.000000 1.609438 2.197225

> solve(z)

           [,1]      [,2]      [,3]
[1,]  -5.973389  13.88403  -7.84961
[2,]  23.995963 -67.36519  42.50270
[3,] -16.581171  47.99193 -30.27953

> z%*%solve(z)

             [,1]          [,2]          [,3]
[1,] 1.000000e+00  1.414147e-14  2.435552e-15
[2,] 2.435552e-15  1.000000e+00 -3.632511e-15
[3,] 2.706169e-16 -2.720046e-15  1.000000e+00

Note that because of the finite accuracy of numerical computations, the matrix product of z and its inverse, z%*%solve(z), is not exactly equal to the identity matrix, but it is very close.
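
Because of such rounding errors, an exact comparison with == is usually the wrong test for computed matrices. One possible approach, sketched below, is all.equal, which compares up to a small tolerance. The second half uses solve with two arguments to solve a linear system directly. The matrix A and vector b are made-up values for illustration.

```r
z = matrix(log(1:9), ncol = 3)
I3 = diag(3)                           # diag(n) gives the n x n identity matrix
print(all.equal(z %*% solve(z), I3))   # equality up to a small tolerance
print(round(z %*% solve(z), 10))       # rounding hides the tiny errors

# Hypothetical system: 2a + b = 3, a + 3b = 5
A = matrix(c(2, 1, 1, 3), nrow = 2)
b = c(3, 5)
sol = solve(A, b)                      # solves A %*% sol = b without forming the inverse
print(sol)
```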

Functions for simple statistics

Function   Description
mean       arithmetic mean
median     median
min        smallest value
max        largest value
quantile   quantiles
range      min and max of a vector
sample     random sample
sum        arithmetic sum
var        variance and covariance

> x = c(2,8,5)

> mean(x)

[1] 5

> var(x)

[1] 9

> quantile(x)

  0%  25%  50%  75% 100%
 2.0  3.5  5.0  6.5  8.0
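
The var function computes the sample variance, which divides by n - 1 rather than n. The sketch below reproduces the value 9 from the definition and shows a few more functions from the table. The names n and v are chosen for this example.

```r
x = c(2, 8, 5)
n = length(x)
v = sum((x - mean(x))^2) / (n - 1)   # sample variance, divisor n - 1
print(v)
print(range(x))                      # smallest and largest value
print(median(x))
print(quantile(x, 0.25))             # a single quantile instead of all five
```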

Functions for Probability Distributions

All the functions for probability distributions begin with one of the letters d, p, q, or r, followed by the name of the distribution (which is abbreviated in R).

Density (d)

These functions evaluate the p.d.f. or p.m.f. of the specified distribution. The first argument is the value of x; the other arguments specify the parameters of the distribution.

Probability (p)

These functions evaluate the c.d.f. of the specified distribution. The first argument is the value of x; the other arguments specify the parameters of the distribution.

Quantile (q)

These functions provide the desired quantile (percentile/100) of the specified distribution. The first argument is the value of probability between 0 and 1; the other arguments specify the parameters of the distribution.

Random sample (r)

These functions generate a random sample, returned as a vector, from the specified distribution. The first argument is the desired size of the random sample; the other arguments specify the parameters of the distribution.
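
To illustrate the four prefixes together, the sketch below uses the normal distribution, abbreviated norm in R, with its default parameters (mean 0, standard deviation 1). The seed value is arbitrary and the variable names are chosen for this example.

```r
d = dnorm(0)        # p.d.f. of N(0, 1) at 0, i.e. 1/sqrt(2*pi)
p = pnorm(1.3)      # c.d.f. at 1.3, i.e. P(Z <= 1.3)
q = qnorm(p)        # the quantile function inverts the c.d.f.
set.seed(1)         # arbitrary seed, so the draws can be reproduced
r = rnorm(5)        # five random draws from N(0, 1)
print(c(d, p, q))
print(r)
```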

Note: R uses slightly different forms of parameters or variables for some distributions. For example, see the negative binomial distribution below.

To find P(X = 4), where X has a binomial distribution with n = 7 and p = 0.2, i.e., X ~ Bin(7, 0.2):

> dbinom(4,7,0.2)

[1] 0.028672

Note that the 2nd argument is the number of trials n, and the third argument is the probability of success p.
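
As a check, the same value can be computed from the binomial p.m.f. directly, using choose for the binomial coefficient. The cumulative probability comes from the matching p function. The name by.formula is chosen for this example.

```r
n = 7; p = 0.2; x = 4
by.formula = choose(n, x) * p^x * (1 - p)^(n - x)   # binomial p.m.f. at x
print(by.formula)
print(dbinom(x, n, p))
print(pbinom(x, n, p))      # c.d.f. at x, i.e. the sum of the p.m.f. over 0, ..., x
```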

Recall that for the negative binomial distribution, X = number of trials required for the rth success. However, R uses a different variable: Y = number of failures in the sequence of trials where the last trial ends in the rth success, so that Y = X - r. So to find the probability P(X = 6) for a negative binomial random variable X with parameters r = 4 successes and p = 0.6, use y = 6 - 4 = 2 failures:

> dnbinom(2,4,0.6)

[1] 0.20736

Note that the 1st argument is the value of y (the number of failures), followed by r and p.
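
The conversion between the two variables can be sketched as follows. Here x is the number of trials, y = x - r is the number of failures passed to R, and the formula is the negative binomial p.m.f. written in terms of the number of trials. The variable names are chosen for this example.

```r
r = 4; p = 0.6
x = 6                  # number of trials needed for the 4th success
y = x - r              # number of failures, which is what dnbinom expects
by.formula = choose(x - 1, r - 1) * p^r * (1 - p)^(x - r)
print(by.formula)
print(dnbinom(y, r, p))
```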

The p.d.f. of the gamma distribution with shape parameter 0.5 and scale parameter 3, evaluated at x = 1, is

> dgamma(1,shape=0.5,scale=3)

[1] 0.2333993

The c.d.f. of the uniform distribution over the interval (2, 6) evaluated at x = 3 is

> punif(3,2,6)

[1] 0.25
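
Both results can be verified from the usual formulas, as in the sketch below. It assumes the standard parameterizations: the uniform c.d.f. (x - a)/(b - a) on (a, b), and the gamma p.d.f. with shape a and scale s. The variable names are chosen for this example.

```r
# Uniform(2, 6): the c.d.f. is (x - a)/(b - a) for a <= x <= b
cdf.unif = (3 - 2) / (6 - 2)
print(cdf.unif)
print(punif(3, 2, 6))

# Gamma with shape a and scale s: x^(a-1) * exp(-x/s) / (gamma(a) * s^a)
a = 0.5; s = 3; x = 1
pdf.gamma = x^(a - 1) * exp(-x / s) / (gamma(a) * s^a)
print(pdf.gamma)
print(dgamma(x, shape = a, scale = s))
```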

The 0.6 quantile, i.e., the 60th percentile, of the exponential distribution with mean 3 (rate 1/3) is

> qexp(0.6,1/3)

[1] 2.748872

Note that the 2nd argument of the function is the rate, i.e., the reciprocal of the mean (here 1/3).
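
R parameterizes the exponential distribution by its rate, so the call above passes 1/3 for a distribution with mean 3. The sketch below checks the quantile against the closed form -mean * log(1 - prob) and confirms that the c.d.f. at that quantile returns 0.6. The name theta for the mean is an assumption made for this example.

```r
theta = 3                          # mean of the distribution (symbol name assumed)
q = qexp(0.6, rate = 1/theta)      # 60th percentile
by.formula = -theta * log(1 - 0.6) # closed form of the exponential quantile
print(q)
print(by.formula)
print(pexp(q, rate = 1/theta))     # the c.d.f. at q gives back 0.6
```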

To generate a random sample of size 12 from the normal distribution with mean 20 and standard deviation 3:

> rnorm(12,20,3)

[1] 20.39824 25.20233 18.39812 19.65888 26.37264 20.22977 19.52098

[8] 21.34009 20.21618 12.73368 18.79826 19.99685
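
Since the values are random, running the command again gives different numbers. One common way to make a sample reproducible is to call set.seed first, as sketched below. The seed value is arbitrary and the names s1 and s2 are chosen for this example.

```r
set.seed(2024)                       # arbitrary seed chosen for this example
s1 = rnorm(12, mean = 20, sd = 3)
set.seed(2024)                       # resetting the seed repeats the same draws
s2 = rnorm(12, mean = 20, sd = 3)
print(identical(s1, s2))
print(mean(s1))                      # sample mean, should be near 20
print(sd(s1))                        # sample standard deviation, should be near 3
```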

The abbreviations of other special distributions are given in your textbook.

Functions for graphing

In order to plot a graph, the set of values of x and the set of values of y, both of which are in vector form, must be specified. The lengths of the vector x and vector y must be the same.

For example, to plot the graph of y = exp(x) on the interval [-2, 2], first create a vector of 50 x values between -2 and 2, inclusive; then create the vector y and use the vectors in the plot command:

> x = seq(from=-2,to=2,len=50)

> y = exp(x)

> plot(x,y)

The command plot(x,y) draws only the individual points, i.e., the points specified by the vectors x and y are not connected by a smooth curve. To get a smooth curve of y = exp(x), add the argument type="l", where l stands for line.

> plot(x,y,type="l")

To add or overlay another line graph on the plot, say the graph of y = x^2, use the lines function:

> u = x^2

> lines(x,u)

> lines(x,u,lty=2)

The argument lty specifies the line type; lty = 1 gives a solid line (the same as that of the plot function with type="l"), while lty = 2, lty = 3 and so on give different types of dotted and dashed lines.

To overlay a set of points on the graph, use the points function:

> plot(x,y,type="l")

> u = x^2

> points(x,u)

To specify the dimensions of the graph (rather than settling for the default values determined by R), the following can be used:

> plot(x,y,type="l",xlim=c(-3,3),ylim=c(-1,10))

The arguments xlim=c(-3,3) and ylim=c(-1,10) set the limits of the x-axis to -3 and 3, and the limits of the y-axis to -1 and 10.
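
Putting the pieces together, a complete plotting session might look like the sketch below. The legend call is an addition not covered above; it labels the two curves by line type. When run non-interactively, R sends the plot to a file instead of a window.

```r
x = seq(from = -2, to = 2, length.out = 50)
y = exp(x)
u = x^2
plot(x, y, type = "l", xlim = c(-3, 3), ylim = c(-1, 10))   # solid curve of exp(x)
lines(x, u, lty = 2)                                        # dashed curve of x^2
points(x[c(1, 50)], y[c(1, 50)])                            # mark the two endpoints of exp(x)
legend("topleft", legend = c("exp(x)", "x^2"), lty = c(1, 2))
```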
