Computer Assignment 1: the Random Walk

Physics 301

Computer Assignment 1: The Random Walk

Winter 2012

A “classic” problem in statistical physics is the random walk problem. It is directly related to many real world problem ranging from physics (diffusion of gases, electron motion in solids) to biology (various problems in genetics and ecology) to economics (price fluctuations).

Although the basic random walk problem can be completely and exactly analyzed using probability theory, we’re going to study the problem “empirically” with the use of random numbers. Many physics problems that lack an exact solution can nonetheless be well understood by looking at the statistics of many randomly generated events. This statistical approach to understanding complex situations is called the Monte Carlo method, named for the Monte Carlo Casino in Monaco where fortunes are won and loss in games of chance. For those of you who return for Physics 401, we’ll use the Monte Carlo method to understand ferromagnetism.

Part 1: The random walk in one dimension

The traditional statement of the random walk problem refers to a drunk trying to make his way down a narrow street. He’s so inebriated that he’s equally likely to take a step to the right (one way along the street) or the left (the opposite way). If he starts underneath the lamp post, what can we say about where he’s likely to end up? (Other than “in the gutter.”)

Let’s state the problem more precisely. Consider a particle that can move only on the x-axis, starting at x = 0. It can move only in steps of length 1, and for each step it has equal probability (50-50) of moving in the positive or negative direction. Where is it after N steps?

Clearly x0 = 0 before any step is taken (N = 0). After N = 1, the particle has a 50% chance of being at x1 = +1 and a 50% chance of being at x1 = –1. From each of these, it has a 50% chance of increasing x by 1 and a 50% chance of decreasing x by 1. The particle at x1 = +1 can go to +2 or 0; the particle at x1 = –1 can go to –2 or 0. You should convince yourself that x2 can be only +2, 0, or –2, and the probabilities for these are 25%, 50%, and 25%, respectively.

Note: Make sure you notice that even values of N allow only even values of xN. That is x100 might be 0, 2, 4, 6, ... but cannot be an odd number such as 7.

While it’s straightforward to use pencil and paper to track the possibilities for the first few steps, that quickly becomes unfeasible as N increases. Hence the role of the computer. We will attack the random walk program using Excel. The basic computation task is to generate a series of xi values, i = 0 to N, which we’ll place in one column of the spreadsheet from row 1 (i = 0) to row N+1 (i = N). Each cell is calculated from the cell above it using (assuming Column B)

B(i) = B(i–1) ±1

where a +1 and a –1 each have a 50% chance of occurring.

It’s easy to generate random numbers in Excel using the function RAND(). You have to have the parentheses, because this is a function, but there’s nothing inside the parentheses. Now RAND() generates a random number between 0 and 1. We need a random number that is +1 50% of the time and –1 50% of the time. First, note that 50% of the numbers generated by RAND are in the range 0.0–0.5, and 50% are between 0.5–1.0. Second, the function ROUND(x, 0) rounds whatever x is to the nearest integer (0 means zero decimal places – i.e., an integer). With a little thought, you should now be able use a combination of functions (that is, everything fits in 1 cell in the spreadsheet) to generate ±1 with 50% probability. You’ll want to generate something like 1000 random numbers this way and check that you really are generating equal numbers of 1 and –1.

Now let’s do something with it. Generate the series of integers 0 to 1000 in Column A. (One way is to use the Fill/Series command, under the Edit menu). Put a 0 in B1 as the starting point of the random walk, then insert a formula in cells B2 to B1001 to randomly add ±1 to the cell above it.

Now you have a 1000-step random walk. But looking at 1001 numbers isn’t easy. Further, this is just one of 21000 possible 1000-step walks. That’s ≈1000100 walks, so we can’t look at all of them. But we can look at a random sample of random walks. Copy Column B to the next 9 columns. Now you’ve got ten 1000-step random walks. And a nice way to look at the results is with a graph. So make an xy-graph with Column A as the x-axis and with 10 graphs displayed. It should look something like

As a general rule of thumb for making Excel graphs for physics:

· Turn OFF the display of the legend; we hardly ever need it.

· Turn OFF the horizontal grid lines.

· Format the plot area to No Fill rather than the default gray background.

For this graph, set the horizontal axis to end at 1000 and the vertical axis to go –100 to 100.

There’s one more thing to learn: How to recalculate the worksheet. With the calculations you might normally do in Excel, there’s only one answer. But with random numbers, you can get entirely different answers with different random numbers. You would like to recalculate the random walks with different random numbers, but without having to change anything in the worksheet. The command for doing that is Manually Calculate, which calculates a new random number every time it encounters RAND(). On a Mac, you recalculate the worksheet by pressing COMMAND + =. That is, pressing = while holding down COMMAND (the key with the 4-leaf-clover symbol). I don’t know on a PC, but find out by searching “manually calculate” in Excel Help.

After finding the Manually Calculate key, keep recalculating while watching the graphs. The goal is to get some awareness of how much variability there is. What’s changing every time? What features are always present?

Q1: Looking at the 10 graphs you get each time, and hitting Manually Calculate several times, do you see any graphs that end exactly at x1000 = 0? If so, does this occur frequently or rarely?

Q2: Looking at the 10 graphs you get each time, what would you estimate the average value of x1000 is? Give your reasoning. (Note: Your reasoning should be based on your data, not on what you think “ought” to happen.) Can you also explain why this “ought” to be the average?

Q3: As N increases, does the spread of the random walks – that is, the variation from most negative to most positive – increase, decrease, or stay about the same?

Q4: Does the spread increase linearly with N? Faster than linear? Less than linear? To answer this question, compare the spread – over many hits of Manually Calculate – at N = 1000 to the spread at N = 100. Because there’s no spread at N = 0, a linear increase in spread would mean that the N = 1000 spread is 10 times the N = 100 spread.

Note: Q1–Q4 are asking for your conclusions based simply on LOOKING at the graphs. You shouldn’t be doing any calculations. Drawing conclusions from graphs is an important skill.

For Part 1, turn in 1 graph showing the ten random walks and the answers to questions Q1 – Q4.

Part 2: The spread of random walks in one dimension

In Part 1 you should have found that the average value of x, denoted is zero, which is not surprising, but that individual particles wander farther and farther from zero as N increases. This is the process of diffusion, where a substance initially highly concentrated at one point “spreads out” as time passes. What can we learn about diffusion from studying the random walk?

To answer this question, we’re going to need lots more data. You’ve already used 10,000 cells (1000 x 10) to track 10 particles. It’s clearly not feasible to extend this to hundreds or thousands of particles. Fortunately, Excel has some powerful tools that will let us substantially increase the power of our calculations – and help make you a “power user” of Excel.

The calculation in Excel of a series of integers with B(i) = B(i–1) + 1 is an iteration, the same calculation done over and over. The random walk calculation of Part 1, B(i) = B(i–1) ±1, was also an iteration. In normal spreadsheet calculations, each iteration –each turn of the crank – gets its own cell. Thus very large iterative calculations require very large – often unfeasible – numbers of cells. Suppose we don’t want to know all the intermediate values, only the very last value of a series of iterations. As an example, suppose we want to know the sum of all the integers up to 50: 1 + 2 + 3 + ... + 50? You could easily do this in Excel, but taking up quite a few rows when all you really want to know is the very last value. Summing integers to 10,000 would be quite a pain.

Open a new spreadsheet, and in cell A2 type the formula = A2 + 1. This instantly generates an error message that you have a circular reference, that the formula in A2 refers to A2. Interestingly, we can get around this restriction. I’ll explain the procedure for a Mac; a PC will be similar, but you PC users will have to figure it our. On a Mac, go to Excel Preferences (under the Excel menu) and select Calculation. You’ll see that Excel, by default, is set to Automatic, meaning that the spreadsheet automatically recomputes each time you change anything. Change this to Manual calculation, meaning that Excel will now not do any calculations until you press the Manually Calculate command you learned about in Part 1. Also check the Iteration box in the same window, and set Maximum Iterations to 50. Close the Preferences window.

Back to your spreadsheet. First, put a 0 in cell A1. Then change the cell A2 formula to

= IF(A1=0, 0 , A2+1)

Cell A1 is now the switch. If A1=0, cell A2 will be initialized to 0. If A1=1 (or any other value), cell A2 executes “A2+1”. With the Iterate box checked, what was a circular reference will now be calculated as “take what’s in A2, add 1 to it, but the result back into A2 as a new or updated value.” It will do this for the number of times you’ve set as Maximum Iterations.

To “run” this procedure, first type 0 in A1, then issue the Manually Calculate command. A2 should be 0. Change A1 to 1, and Manually Calculate again. By iterating, you’ve added 1 to the initial 0 fifty times, so A2 should now be 50. You’ve made an Iteration Counter! Change Maximum Iterations to 500, reset the counter with A1=0 and Calculate Now, then run it with A1=1 and Calculate Now. A2 will count to 500 faster than you can see.

Note: If you know a programming language, you’ll recognize that we’re doing the same thing you would do in a DO loop, FOR loop, WHILE loop, or some other loop structure. Excel is not a language, but the Iterate procedure allows you to do some of the things you might do with a loop.

To sum integers, we want to add the current value of the iteration to an existing sum. We need a third cell. In A3, enter the formula

= IF(A1=0, 0 , A3+A2)

The A1 switch initializes A3 to 0. Then as A2 increases 1, 2, 3, ..., that value is added to the growing sum in A3. Verify that the sum 0 + 1 + 2 + ... + 50 = 1275.

Hint 1: To better see what’s happening, set Maximum Iterations to 1. Initialize the counter, then set A1=1 and press Manually Calculate. It will do 1 iteration, adding 0 + 1 to get 1. Without reinitializing, press Manually Calculate again. Excel will do one more iteration, increasing A2 to 2 and adding it to the 1 that was already in A3, giving 3. Another Manually Calculate will increase A2 to 3 and A3 to 6, and so on. When building a spreadsheet that uses iteration, it’s often good to iterate one step at a time to verify that things are behaving properly.

Hint 2: If you type something wrong, things can go horribly wrong in an iteration, sometimes getting stuck inside the iteration. To “break out” and halt the iterations on a Mac, type COMMAND + .. That is, hit the period key . while holding down COMMAND. It will be something similar on a PC, but you might want to experiment to find out

Q5: Build a 3-cell spreadsheet to calculate factorials. Report the formula you enter in each cell. Calculate 50! and 120!. You can check the first on your calculator, but the second is probably a larger number than your calculator can handle.

What does this have to do with random walks? The random walk calculation that took a 1001-row-long column in Part 1 can be collapsed into 1 cell that reports the final value x1000! Or, by simply changing the value of Maximum Iterations to N, the final value xN for any N.

Build a spreadsheet with

· Cell A1 being the 0 (to initialize) or 1 (to run) switch.

· Cell A2 being an iteration counter, as above.

· Cells B1 to B1000 calculating 1000 random walks by adding ±1 to the previous value. That is, each cell uses the power of iteration to compute the final position of an N-step random walk. The 1000 cells will compute xN for 1000 independent random walks.

· Cell A3 calculating as AVERAGE(B1:B1000).

· Set A3 to show 2 decimal places, the other cells to show 0 decimal places (integers).

Now there are so many calculation being performed, slowing down the spreadsheet, that you’ll be able to see the iteration counter increasing. The cell is a check that things are OK; the average of 1000 random walks should never stray very far from 0.00. If it does, something is wrong.