COMPUTER LAB 6

EARTH SYSTEMS SCIENCE II

PG251, Spring 2011

Lab 6. Data

In this lab we will look at data and metadata from meteorological stations in the Russian Arctic, and do some calculations with that data in EXCEL.

Part 1. Introduction

1a. Data.Data, along with theoretical analysis and modeling, is one of the cornerstones of science. Another word that is usually synonymous with data in earth and environmental sciences is “observations.” These refer to pieces of information that are obtained about something in the environment. The observation can be made using an instrument (e.g. measuringthe temperature using a thermometer) or without any instrument (e.g. measuring the percentage of the sky covered with cumulus clouds by estimating it visually).

A dataset is a group of data that go together. How groups of data are combined is subjective – it depends on who is organizing the data, and what the purpose is. For example, one dataset that we will use for this lab contains observations from many meteorological stations across the Russian Arctic. The data from any one of these stations could have been considered a dataset in its own right.

1b. Missing Data. In many environmental data sets, there are missing data (also called missing observations or missing values). This can occur for a number of reasons, such as instrument failure, (human) observer failure, weather conditions, etc. In the file with the data, missing data should not just be left blank. If one did so, it would be impossible for anyone looking at the data file to know whether the data are truly missing, or the data file was damaged during transmission over the internet, or someone made a typo. There is usually a “missing value” code that is used to represent missing values. The code is usually a numerical value that can not possibly be mistaken for an actual value. For example, if one has a data set of daily precipitation measured at a weather station in some town, in units of inches per day, a reasonable choice for the missing value code would be “999.9”. This is reasonable because “999.9” can not possibly occur as an actual value, and because it is easy to see such a number when looking at the data file.

1c. Metadata. This word may be less familiar to you, but to folks in science it is very familiar. Metadata is AS IMPORTANT AS DATA!Metadata is information, or data, about the data. Here’s an example of a type of metadata that is obvious: If someone tells you a temperature that was measured using a thermometer, that data is useless without the metadata telling you where and when that measurement was made.

Other types of metadata that may be less obvious are often very important, such as what type of instrument was used, what the uncertainty range of the instrument is, what height above the ground the measurement made, what type of surface is present at the measurement site, or what types of trees or buildings are near the measurement site. Depending on what is being measured, there are many possible types of information that could be included in metadata.Metadata can help the user decide whether the data are “good.” What is “good” data? If the data are sufficiently accurate for the user’s needs. For example, the temperature data from the roof at HunterCollege differ from the data in Central Park because of the difference between the surface material on the roof compared to the environment in the park. So, on sunny days (even in winter), the daytime temperatures on the roof are higher than those in Central Park (night time, dawn, and dusk temperatures are probably similar). This may preclude the use of our roof data for certain purposes.

Also, if and how things have changed over time can be important. Was the station, or measurement site, moved? Was the measurement technique changed? Was the type of instrument changed? Did the surrounding trees, buildings, or surface change over time? If any of these answers are “yes”, when and how often did these changes occur?Such changes can introduce “artificial trends” into the data. For example, if there is a temperature record for a site going back to 1950, if the weather station moved to a lower elevation spot(even if it’s nearby) in 1982, this may introduce a temperature increase in the record. Someone analyzing that record might misinterpret the warming as a climate change signal, when in fact it has absolutely nothing to do with any environmental change at all.

Another common problem with “artificial trends” is related to the urban heat island effect. There are many cases where temperature records go back more than 50 years. Many places have grown from very small villages to small cities during that time. Even if the weather station remains in the same location during that time, it might show that temperatures have been warming. But, the warming may be due to the growth of the urban heat island effect, rather than any real climatic change.

Additional problems may be associated with non-instrumental observations. How many different observers were there? Were they all trained the same way? One person’s estimate of 80% cloud might be observed as 70% cloud cover by another. If the observer changed at one particular time in the past, there might be an artificial trend introduced into the record. If there are different observers during different days of the week (or different seasons), then the data might appear to show that some days (or some seasons) are cloudier than other days.

In the business of global climate change analysis, a lot of resources are put towards trying to eliminate these sorts of “artificial trends” in the data, so that the data used by researchers represent actual climatic fluctuations. Procedures used to identify and eliminate questionable data values are called “quality control”, or “quality assurance/quality control”, or “QA/QC.”

Part 2. Working with Meteorological Observations from the Russian Arctic available from the National Snow and Ice Data Center (NSIDC)

1.a. Data. URL for the data and metadata:

(you will not have to download the data, I will provide it to you. However, you will have to use this URL to look at metadata).

For this exercise, we will be looking at the data, and metadata, from a dataset called “Meteorological Data from the Russian Arctic, 1961-2000.” This is a dataset provided by the National Snow and Ice Data Center (NSIDC) ( which is also the location of the WorldDataCenter for Glaciology ( World Data Centers were created as a way for all researchers across the globe to have easy access to quality-controlled data. NSIDC houses a lot of data related to the cryosphere and to polar regions. They are the professionals when it comes to data distribution and metadata, and so we will be looking at their web site as an example of excellent metadata. Figure 1 shows a map of Asia with blue squares at the locations of stations for which data is available in this data set (I copied and pasted this from the NSIDC web site).

Figure 1. Station Locations. Map from NSIDC web site.

2a. Metadata. To look at the metadata for this dataset, go to the URL given at the top of part 2 of this lab. On this page there is a brief summary of the metadata. There are also three additional links, which are labeled “Documentation”, “Access Data”, and “View Metadata Record.” For this lab we will not link to “Access Data”, because I have already done that and will provide the data to you. If you like, you can briefly link to “View Metadata Record” to see how NSIDC set up its standard metadata form.

Exercise 2a (45%). For this part of the assignment you will link to “Documentation.” Here you will find a wealth of information about this data set. You are to write a description of the metadata that is provided here for this data set. Include in your response descriptions of the following sections of the web page: Summary, Overview Table, Detailed Data Description, Table 4, and the Quality Control Methods. Also, mention what the missing value code is for air temperature.The length of your response should be 400-600 words(600 words is approximately 1-page, single spaced, using 12-point font). You should hand in this portion of the exercise in a MS WORD file (or equivalent).

2b. Data Analysis (45%). In this part of the exercise you will import data from one station into EXCEL, calculate monthly mean temperatures, and make a chart.Each student will be assigned one station. Your assignment is to make a chart that looks similar to figure 2 below (you are not required to include the standard deviation bars). I made this chart from one of the stations, but I am not revealing it’s name.

Figure 2. Annual (ANN) and monthly mean temperatures from “sta-name”. Lat = ??, Lon - ??. Bars show mean values, and vertical lines on the bars show the standard deviations.

Here are step-by-step instructions for making the chart. You may know another way to do it. If you do, feel free to go ahead and do it, but make sure it gives the correct results!

  1. Get the data file. Save the data file that I sent to you via Email on your disc someplace. Normally you would have to download the file from the website yourself, but I am saving you the trouble.
  2. Import Data into EXCEL. Open a blank EXCEL spreadsheet, and select File/Open. In the window, where it says “files of type” select “all files.” Go to the folder where you saved the file, and open it using the “fixed width” option. The data should appear in your spreadsheet.
  3. Look at the data. With relatively small files such as these, it is possible to just briefly “eyeball” the data to see if anything major went wrong. Do all the columns have numbers? Does anything stand out indicatingthat something is wrong just by scrolling through this file? There should not be any problems here, but sometimes a brief look can help you spot a problem in a file, and can sometimes save you hours of wasted time trying to analyze data from a bad file.
  4. Enter column names. Using the information in the NSIDC metadata, you should be able to figure out what information is contained in each column. The columns you are interested in are: year, month, latitude, longitude, and air temperature. Insert 5-10 lines at the top of your EXCEL file, and in one of the empty rows type in the column headers in the correct columns. Include at least the ones mentioned here, but you can include others if you like.
  5. Count missing air temperature values. In one of your inserted rows, which should be blank at this point, choose a cell in which you will type in an EXCEL formula that counts how many missing values there are in your monthly air temperature record.The formula that I used in a sample file looks something like this:“=COUNTIF(I6:I485,">999")”. Then, in a nearby cell, calculate the fraction of values that are missing. Here’s a hint on how to do that: the formula to count how many total values there are is “=COUNT(I6:I485).”
  6. Make 12 new columns, one for each month. To the right of the air temperature column, select the next 12 columns, and then click on “insert/columns”. This should give you 12 empty columns following the air temperature. Type in the name of each month, or just the first letter of each month, in the 12 new columns, so you will have one column for each month. In each column you will use an EXCEL formula to copy only the temperature values that have data for a particular month, and to keep all other rows blank. I used the following formula in my sample spreadsheet:“=IF(AND($C6=1,$I6<999),$I6,"")”. The part of the code that looks like “AND($C6=1,$I6<999)” chooses only rows that have column 6 (month) value equal to 1 (for January) and air temperature value less than 999. This is important to make sure that no missing value codes will be included in the monthly means. The “IF” statement copies the value in cell I6 for those rows that meet the criteria specified in the “AND” statement; for all other rows, the IF statement will put a NULL value, which is specified by two double-quotation marks (“”). Enter the formula in the first row of each column so that each column has the data for its own month, and then copy that row of formulas down to the last row of the data file.
  7. Calculate monthly mean values. In one of the blank rows at the top of the data file, use an EXCEL formula to calculate the mean of all the values in each monthly column. My formula looks like this: “=AVERAGE(J6:J485)”.
  8. Calculate the annual mean value. Use the “AVERAGE” function to calculate the annual mean value from all the monthly mean values. Put this value in the cell immediately before (to the left of) the January monthly mean value.
  9. Make a chart of annual and monthly mean values similar to figure 2 above. Select the row that has the month names, and the row that has the monthly mean values, and then click on the chart wizard. Make a column chart, and then fix it up so it looks nice.
  10. (Extra Credit) Make the chart show the monthly standard deviations, similar to what I did in figure 2 above.(I am not providing instructions for this.)
  11. (Extra Credit) Make the chart be a double-column chart that shows both the mean, and the median, for each month. (I am not providing instructions for this.)
  12. (Extra Credit) Make the chart be a double-column chart that shows the means for two periods: 1961-1980 and 1981-200 (see my example in figure 3). Explain whether this chart tells you anything about temperature changes at your station.

Extra credits up to 3% each

Figure 3. Annual (ANN) and monthly mean temperatures from “sta-name”. Lat = ??, Lon - ??. Comparison of mean from 1961-1980 versus 1981-2000

You are to hand in your EXCEL spreadsheet, and also copy and paste a copy of your chart into the WORD file that you hand in.

2c (10%). Answer the following questions in your WORD file.

2c1. What is the station name, latitude, and longitude of your station?

2c2. In which of the three climatic categories does your station fit, according to the following definitions?If you think that your station fits into an ecotone, explain why.

(a) Tundra / Arctic (mean temperature below 10 Cduring all months)

(b)Taiga / Subarctic (mean temperature above 10C for a maximum of 4 months, and below freezing for at least one month)?

(c) Temperate (neither Tundra nor Taiga)

1

ESS II, Spring 2011, Lab 6