Francis, UNL, 16 April 2015

Computational Statistics in MATLAB

By: Francis Ayimiah – Nterful

Department of Statistics - UNL

Outline

Part I

Introduction

What is MATLAB?

The MATLAB System

MATLAB Online Help

MATLAB Development Environment

Starting and Quitting MATLAB

MATLAB Desktop and Desktop Tools

Manipulating Arrays And Matrices In MATLAB

MATLAB Graphics

Programming With MATLAB

Part II

Sampling From Random Variables

Sampling from the standard distributions

Sampling from non-standard distribution

-Inverse Transform Sampling

Part I

1.Introduction

In this presentation, we will discuss the basics of how to use MATLAB along with the basic statistical aspects of it, including inverse transform sampling.

1.1What Is MATLAB?

  • MATLAB is a high-performance language for technical computing.
  • It is an integrated computing environment for numeric computation, visualization and programming. Typical uses include

-Math and computation

-Algorithm development

-Data acquisition

-Modeling, simulation, and visualization

-Scientific and engineering graphics

-Application development, including graphical user interface building.

  • MATLAB is an interactive system whose basic data element is a matrix. Programming features in MATLAB are similar to those of other computer languages; examples are functions, IF statements and FOR loops.
  • MATLAB provides GUI tools so the user can develop applications.
  • The name MATLAB stands for matrix (MAT) laboratory (LAB) for the reason that it was originally written to provide easy access to matrix software developed by the LINPACK and EISPACK projects.
  • Latest version of MATLAB releasedon February 12, 2015.
  • MATLAB features a family of add-on application-specific solutions called toolboxes. Toolboxes allow you to learn and apply specialized technology. They are comprehensive collections of MATLAB functions (M-files) that extend the MATLAB environment to solve particular classes of problems.
  • Areas in which toolboxes are available include:

-Signal processing

-Optimization

-Communication

-Control system

-System identification

-Neural networks

-Statistics (this include)

  • Probability distribution
  • Descriptive statistics
  • Hypothesis tests
  • Cluster analysis
  • Linear models
  • Nonlinear models
  • Multivariate statistics
  • Statistical plots
  • Design of experiment, and
  • Statistical process control

-Image processing, and many others.

  • There are numerous individuals that offer third party toolboxes in MATLAB. These can be found at the MATLAB webpage:

1.2The MATLAB System

  • The MATLAB system consists of five main parts. These are:

1.Development Environment – This is all the components that help you use MATLAB functions and files, see your results, and interact with data from other sources. It includes the MATLAB desktop and Command Window, a command history, an editor and debugger, browsers for viewing help, the workspace, files, and the search path.

The MATLAB development environment is available for the following operating systems:

-Microsoft-Window

-Macintosh

-UNIX / Linux

2.The MATLAB Mathematical Function Library – MATLAB contains a vast collection of computational algorithms that comprise of simple functions like sine, cosine, sum, etc, as well as sophisticated functions for matrix computation and manipulations, statistical analysis, signal processing and many others.

3.The MATLAB Language–At the heart of MATLAB is a high-level programming language with matrix as its basic entity. All the usual programming features such as control flow statements, functions, data structure, input-output, and object-oriented features are available.

4.MATLAB Graphics – MATLAB has extensive tools for displaying vectors and matrices as graphs, as well as annotating and printing these graphs. Handle Graphics is available through the MATLAB Language for MATLAB programmers to build custom user interfaces and applications.

5.The MATLAB Application Program Interface (API) - This includes a library of functions that allows you to write programs that interact with MATLAB. With this library, your C and Fortran programs,for example, can access MATLAB functions, access MATLAB data files, and call MATLAB as a computational engine.

1.3MATLAB Online Help

  • To view the online documentation, select MATLAB Help from the Help menu in MATLAB.(Further detail in appendix 1)

2.MATLAB Development Environment

As discussed above, the MATLAB development environment is the set of tools and facilities that help you use MATLAB functions and files.

2.1Starting MATLAB

  • On Windows platforms, double-click the MATLAB shortcut icon;, on your Windows desktop to start MATLAB.
  • On UNIX platforms, to start MATLAB, type matlabat the operating system prompt.
  • After starting MATLAB, the MATLAB desktop opens – see Figure 1 below:

Figure 1: MATLAB Desktop

2.2 Quitting MATLAB

  • To end your MATLAB session, select Exit MATLAB from the File menu in the desktop, or simply type quit in the Command Window and hit enter.
  • To execute specified functions each time a MATLAB quits, such as saving the workspace, you can create and run a finish.m script.

2.3MATLAB Desktop

  • The MATLAB desktop contains a number of tools. The tools include

-Command Window: Used to enter variables and run functions and M-files. You can run external programs from the MATLAB Command Window.

-Command History:Statements you enter in the Command Window are logged in the Command History. So, you can view previously run statements, copy and execute selected statements in the Command History.

-Help Browser: Use Help Browser to search and view documentation and demos for all MathWork products. You can also typedocor help in the Command Window to view documentation (e.g. doc plot).

-Workspace Browser: This consists of the set of variables (named arrays) built up during a MATLAB session and stored in memory. You add variables to the workspace by using functions, running M-files, and loading saved workspace.

-Editor/Debugger:Use the Editor/Debugger to create and debug M-files, which are programs you write to run MATLAB functions. The Editor/Debugger provides a graphical user interface for basic text editing, as well as for M-file debugging.

3.Manipulating Array and Matrices

3.1Array/Matrix operations

  • Like other computer languages, MATLAB provides high-level operators and functions for creating and manipulating arrays.
  • Arithmetic operations on arrays are done element-by-element. This means that addition and subtraction are the same for arrays and matrices, but that multiplicative operations are different. The list of operators includes

- Subtraction

+ Addition

.* Element-by-element multiplication

. / Element-by-element division

. \ Element-by-element left division

.^ Element-by-element power

.’Unconjugated array transpose

Examples

  • Addition and Subtraction

> a = 1:5 % vector containing the integers 1 to 10

> b = 3:7 % vector containing the integers 3 to 7

a+b =

4 6 8 10 12

  • Squaring a vector

t = [ 1 2 3 4 5 ]; % vector of values to square

m = t.^2; % square all the values

m=

1 2 9 16 25

  • Reshape an array

> a=1:6

a =

1 2 3 4 5 6

reshape(a,2,3) % reshape vector a to 2 x 3 matrix

ans =

1 3 5

2 4 6

3.2Building Tables

  • Array operations are useful for building tables. Similar to other familiar languages, MATLAB uses column-oriented analysis for multivariate statistical data. Each column in a data set represents a variable and each row an observation.
  • The (i,j)th element is the ith observation of the jth variable.

Example: Consider a dataset, D, with three variables: v1, v2, and v3 (making up the columns). For 5 observations, the array is given as follows

> D = [72 134 3.2; 81 201 3.5; 69 156 7.1; 82 148 2.4; 75 170 1.2];

D = 72 134 3.2

81 201 3.5

69 156 7.1

82 148 2.4

75 170 1.2

  • To obtain the mean and standard deviation of each column, we have

>mu = mean (D), sigma = std(D)

mu =

75.8 161.8 3.48

Sigma =

5.6303 25.499 2.2107

  • So, mean (v1) = 75.8, mean (v2) = 161.8, and mean (v3) = 3.48. Similarly, the standard deviations are std (v1) = 5.6303, std(v2) = 25.499, and std(v3) = 2.2107
  • Can also find correlation among variables, andmany others.

4MATLAB Graphics

4.1 2-D Plots

  • The basic 2-D plotting function in MATLAB is plot.(Further details in appendix 2)

Example 1: using the plotmatrix command

> x =randn(50,3); %Normally distributed random values

> y = x*[-1 2 1;2 0 1;1 -2 3]';

figure(1);

plotmatrix(y) % createsa matrix of subaxes; same as plotmatrix(y,y)

title('Matrix plot')

Example 2: Multiple plots in one figure

% subplot (nrows,ncols,plot_number). This is multiple plots in one figure

figure(2);

x=0:.1:2*pi; % x vector from 0 to 2*pi, dx = 0.1

subplot(2,2,1); % plot sine function

plot(x,sin(x)); title('sin(x)')

subplot(2,2,2); % plot cosine function

plot(x,cos(x)); title('cos(x)')

subplot(2,2,3) % plot negative exponential function

plot(x,exp(-x)); title('exp(-x)')

%Put all above curves together to form the 4th plot

subplot(2,2,4);

plot(x, sin(x),'k-', x, cos(x),'b+',x, exp(-x),'ro');

legend('sin(x)','cos(x)','exp(-x)')

4.23-D plot

Example

t=0:pi/10:2*pi; % a vector of t values from 0 to 2pi

% in increment of pi/10

[X,Y,Z]=cylinder(4*cos(t));% returns the x-, y-,and

%z-coordinates of the cylinder

figure(3);

subplot(2,2,1); mesh(X);

subplot(2,2,2); mesh(Y);

subplot(2,2,3); mesh(Z);

subplot(2,2,4); mesh(X,Y,Z);

% mesh produces wireframe surfaces that color only the lines connecting the defining points

5.Programming With MATLAB

MATLAB provides extensive programming features, with just a few mentioned here. This include

  • Flow control constructs such as if, for, while, continue, and break.
  • Scripts and Functions, which are called M-files. While Scripts do not accept input argument or return output arguments, Functions can accept input arguments and return our arguments.
  • Demonstration programs

Part II

6 Sampling from Random Variables

6.1Sampling from the standard distributions

  • The MATLAB Statistics Toolbox supports about 20 probability distributions. For each distribution, there are 5 associated functions. These are

-Probability distribution function (pdf)

-Cumulative distribution function (cdf)

-Inverse of the cumulative distribution function

-Random number generator

-Mean and variance as a function of the parameter

  • Table 6.1 lists some of the standard distributions supported by MATLAB and how to sample random values from them.
  • The MATLAB documentation lists many more distributions that can be simulated with MATLAB.
  • Using online resources, it is often easy to find support for a number of other common distributions.

Table 6.1: Examples of MATLAB functions for evaluating probability density, cumulative density and drawing random numbers

Distribution / PDF / CDF / Random # Generation
Normal
Uniform(continuous)
Beta
Exponential / normpdf
unifpdf
betapdf
exppdf / normcdf
unifcdf
betacdf
expcdf / norm
unifrnd
betarnd
exprnd
Uniform (discrete)
Binomial
Multinomial
Poisson / unidpdf
binopdf
mnpdf
poisspdf / unidcdf
binocdf
poisscdf / unidrnd
binornd
mnrnd
poissrnd
  • The Statistics Toolbox has functions for computing parameter estimates and confidence intervals of these data driven distributions.
  • As an illustration for some of these functions, we can use the MATLAB code below (Code 6.2) to visualize the Normal (µ, σ) distribution where µ = 100 and σ = 15.

-Let us assume that this distribution represents the observed variability of IQ coefficients in some population.

-The code shows how to display the probability density and the cumulative density.

-It also shows how to draw random values from this distribution and how to visualize the distribution of these random samples using thehist function.

Code 6.2: MATLAB code for visualizing the normal distribution

%% Explore the Normal distribution N(mu , sigma)

mu = 100; % the mean

sigma = 15; % the standard deviation

xmin = 70; % minimum x value for pdf and cdf plot

xmax = 130; % maximum x value for pdf and cdf plot

n = 100; % number of points on pdf and cdf plot

k = 10000; % number of random draws for histogram

% create a set of values ranging from xmin to xmax

x = linspace(xmin , xmax , n );

p = normpdf( x , mu , sigma ); % calculate the pdf

c = normcdf( x , mu , sigma ); % calculate the cdf

figure(4 ); clf; % create a new figure and clear the contents

subplot( 1,3,1 );

plot( x , p , 'k' );

xlabel('x' ); ylabel( 'pdf' );

title('Probability Density Function' );

subplot( 1,3,2 );

plot( x , c , 'k' );

xlabel('x' ); ylabel( 'cdf' );

title('Cumulative Density Function' );

% draw k random numbers from a N(mu, sigma)distribution

y = normrnd( mu , sigma , k , 1 );

subplot( 1,3,3 );

hist( y , 20 );

xlabel('x' ); ylabel( 'frequency' );

title('Histogram of random values' );

  • The code produces the output shown below (Figure 6.3).

Figure 6.3: Illustration of the Normal(µ, σ) distribution where µ = 100 and σ = 15

6.2Sampling from non-standard distribution

  • Here, we want to sample from a distribution that is not one of the standard distributions that is supported by MATLAB.
  • Let ; denote any cumulative distribution function (cdf) (continuous or not). (Properties of F in appendix 3)
  • Our objective is to generate (simulate) random variables X distributed as F; that is, wewant to simulate a random variable X such that
  • Define the generalized inverse of F, , via

6.2.1Inverse transform sampling

Theorem: (Inverse Transform Method)

  • Let be the cumulative density function (cdf) of our target variable X (continuous or not). Let be the inverse of this function, assuming that we can actually calculate this inverse. Define where has the continuous uniform distribution over the interval (0; 1). Then X is distributed as F, that is, .

Proof of theorem is in appendix 4.

  • Therefore, in order to generate a random variable X~F, we can generate U according to U(0,1) and then make the transformation X=F − 1(U)
  • The algorithm is

1. Draw U ∼Uniform (0, 1)

2. Set X = F−1 (U)

Example 1:

  • Suppose we want to sample random numbers from the exponential distribution. When the cumulative density function is Using some simple algebra, one can find the inverse of this function, which is .
  • This leads to the following sampling procedure to sample random numbers from an Exponential (λ) distribution:

1. Draw

2. Set

Code 6.4: MATLAB code for inverse transform sampling from exponential (

seed=12; rand('state',seed);

r=1000; % Let's take r samples

u=unifrnd(0,1,r,1); %Uniform (0,1);or can use rand

figure(5);

hist(u)

% this generates a fairly uniform diagram

Figure 6.5: Uniform distribution.

% let

x=-log(1-u)*2; %inverse cdf

%Alternative approach for inverse cdf

x2 = icdf('Exponential',u,2);

cd=cdf('Exponential',x2,2);

%Plotting histogram

figure(6); clf

subplot( 1,3,1 );

hist(x);

xlabel('x' ); ylabel( 'frequency' );

title('Histogram of random values-EDF');

% Actual distribution

Z=exprnd(2,r,1);

subplot( 1,3,2 );

hist( Z , 20);

xlabel('x' ); ylabel( 'frequency' );

title('Histogram of random values CDF');

%Plot CDF and EDF on the same graph

subplot( 1,3,3 );

tt=sort(x); %Sort random sample

mm=(1:r)/r; %Prob of rth sample

g=linspace(0,20);

w=expcdf(g,2);

plot(g,w,'k',tt,mm,'r');

legend({'CDF', 'EDF'})

xlabel('x' ); ylabel( 'Probability' );

title('CDF vs. EDF of random values ' );

Figure 6.6: Distribution graphs for the exponential(2)

Example 2: Bomial random variables with n = 9, p = 0.60;

  • ,
  • Let with linear search. Then in MATLAB, we have the following

Code 6.7: Generating Binomial RVs

n = 9; p=0.60; q = 1-p; r(1) = q^n; js=[1:n+1];

for j=1:n, r(j+1) = r(j)*p*(n-j+1)/(j*q); end

F = cumsum(r); %cumulative sum of r

K=10000; for k=1:K, X(k)=min(js(F>=rand))-1; end

figure(7);

subplot(1,2,1), hist(X),xlabel('x'),ylabel('Frequency')

title('Histogram of x')

subplot(1,2,2), plot(sort(X),[1:K]/K),xlabel('x'),ylabel('Probability')

title('EDF of x')

for k=1:K, X(k)=sum(rand(1,n)<p); end

Exercise 1 (Part I)

1. Adapt the Matlab program in Code 6.2 to illustrate the Beta(α, β) distribution where α = 2 and β = 3. Similarly, show the Exponential(λ) distribution where λ = 2.

2. Adapt the matlab program above to illustrate the Binomial(N, θ) distribution where N = 10 and θ = 0.7.

3. Write a demonstration program to sample 10 values from a Bernoulli(θ) distribution with θ = 0.3. Recall that there are only two possible outcomes, 0 and 1. With probability θ, the outcome is 1, and with probability 1 − θ, the outcome is 0. In other words, p(X = 1) = θ, and p(X = 0) = 1 − θ. In Matlab, you can simulate the Bernoulli distribution using the binomial distribution with N = 1. However, for the purpose of this exercise, please write the code needed to randomly sample Bernoulli distributed values that does not make use of the built-in binomial distribution. Using a seed of 21, verify that the first four observations agree with 1, 1, 0 and 1 respectively. Your MATLAB code could use the following line for the seeding:

seed=21; rand(’state’,seed); randn(’state’,seed);

Exercise 2 (Part II)

1.In this exercise, we want to generate the random variable x=(x1, x2) from Beta by inverse transform method.
a)Write down the algorithm for generating x.

b)Use MATLAB to construct two sets of N=1000 observations each from Beta (1,4). Set seed =131. Plot histograms of both sets on the same graph and print out the first 5 observations from each set. Observe that the first observations are x1=0.0251 and x2 = 0.0348.

c)Find the mean of each set generated.

7.References

1.

2. Computational Statistics Handbook with MATLAB; Wendy L. Martinez & Angel R. Martinez

1