Student Lesson

Lesson Title: Sampling and Bias

Lesson Summary: You will use a map to gather population data for three samples of Oregon cities. You will place the data in a spreadsheet to calculate basic statistics for each sample. Finally, you will identify potential bias in each sample by comparing your results to the statistics for the entire state of Oregon.

Lesson Objectives:

  • Students will describe the effect a sample has on calculated statistics.
  • Students will describe geographic patterns.
  • Students will explain geographic bias in a sample.
  • Students will list potential biases when using a sample.

Before you begin this module, you will need to know how to use a Web-based GIS viewer. You can learn this by watching the tutorial video or working through the tutorial. The tutorial video, student activity, and Web-based GIS Tutorial Viewer can be found under the “Modules” tab at the “Tutorial” link. The activity works best with a high-speed Internet connection.

Prior Skills: You will need to know how to turn layers on and off, use the ID tool, zoom in and out of the map, toggle from layers to the legend, and perform a search (Boolean) query.

Remember, computer steps are indicated by a symbol and questions you need to answer are numbered.

  • 1) Open the Oregon Reference map in Internet Explorer:

(Note: Some of the needed functionality is not present in Firefox.)

  • 2) On the right of the screen (in the layers column) uncheck Lakes, Rivers, National Parks, Federal Lands, and County Names. Also make “Cities” the active layer. Press the Refresh Map button when you are finished.
  • 3) On the left of the screen, turn off the map inset.
  • 4) Note that all of the purple blobs represent incorporated cities. Based on your observations:

Q1) Describe the spatial pattern of most cities in Oregon. Where are most of the cities and what regions appear to have no cities?

  • 5) Let’s locate the large cities using a query:

A) Choose the query button from the left (It’s the one with the question mark.)

B) Fill out the query to find cities with a total population greater than 50000: complete the boxes in the top row and press the “Add to Query String” button. Press “Execute” to run the query.
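If it helps to see what the query is doing behind the scenes, here is a minimal Python sketch of the same idea: keep only the rows where a field is greater than a value. The city names and populations are invented placeholders (not real Oregon data), the field name TOTAL_POP is assumed from the column list used later in this lesson, and the function query_greater_than is a made-up helper, not part of the GIS viewer.

```python
# Hypothetical rows: names and numbers are invented placeholders,
# not real Oregon data. TOTAL_POP is an assumed field name.
cities = [
    {"FIRST_NAME": "City A", "TOTAL_POP": 120000},
    {"FIRST_NAME": "City B", "TOTAL_POP": 50000},
    {"FIRST_NAME": "City C", "TOTAL_POP": 900},
]


def query_greater_than(rows, field, value):
    """Return the rows whose field is strictly greater than value."""
    return [row for row in rows if row[field] > value]


# Same condition as the map query: total population greater than 50000.
large_cities = query_greater_than(cities, "TOTAL_POP", 50000)
large_city_names = [row["FIRST_NAME"] for row in large_cities]
print(large_city_names)
```

Notice that a city with exactly 50000 people is left out, because “greater than” does not include the value itself.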

  • 6) On the bottom of the screen, you will see the list of cities. On the map, they will be highlighted in blue.

Q2) Describe the location of most of the large cities in Oregon. Are there any anomalies?

  • 7) Open a spreadsheet. Copy and paste the information provided about the cities from your query into the spreadsheet. Include the column titles.

  • 8) Perform a new query. This time, find all cities with a population less than or equal to 145.

Q3) Describe the patterns associated with the location of these cities. Are there any anomalies?

  • 9) Copy and paste the information provided about these cities into your spreadsheet. Leave at least four rows of space below your previous data.
  • 10) Perform one more query. This time, however, we want all of the cities in Lane County. Be careful to type in Lane (properly capitalized). Don’t forget to ‘Execute’ your query to get results.

Q4) Describe the location of Lane County. How does it compare to the location of our other samples?

  • 11) Copy and paste the information provided about these cities into your spreadsheet. Like before, leave at least four rows of space below your previous data.
  • 12) As is often the case, we have more data than we can understand just by looking at the table. To trim it down, we are going to delete many columns. (In Excel, right-click on the column letter and choose Delete.) You need to DELETE the following columns. Go carefully!

Rec / FID / OBJECTID / FIRST_CLAS
STATE / AREANAME
MALE / FEMALE
MALES18_ / FEMALE18_ / MALE65_ / FEMALE65_
INDIAN_A / CHINESE / FILIPINO / JAPANESE / KOREAN / VIETNAMESE / HAWAIIAN / GUAM__CHAM / SAMOAN
ALLWHITE / ALLBLACK / ALLNATAM / ALLASIAN / RACE_OTHER / TOTALPOP / HISPANIC_A
PUERTO_RIC / CUBAN / TOTAL_POPU / LIVEALONE / HSEHOLD65_ / HSEHOLDW18 / HSEHOLDW65
TOTAL_HOUS / VACANT_HOU / OWNER_OCCU / RENTER_OCC / Shape_Leng / Shape_Area
  • 13) Here is a list of the columns that you should still have. Make the columns wide enough to read easily.

FIRST_NAME / TOTAL_POP / MEDIANAGE / WHITE / BLACK / AMERICAN_I / ASIAN / MEXICAN / FAMILYSIZE / COUNTY
  • 14) In the rows directly below each set of data…

A) Calculate TOTALs for Population Data.

B) Calculate AVERAGEs for median age and Family Size[1]

C) Calculate the PERCENT of the total population for each race or ethnicity column.[2]

Remember that all formulas begin with “=” in spreadsheets, and that you can click on cells rather than re-type numbers.
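If you would like to check your spreadsheet formulas another way, the three calculations in step 14 can be sketched in Python. This is only an illustration: the two cities and all of their numbers are invented placeholders (not real Oregon data), and only one ethnicity column (WHITE) is shown to keep the example short.

```python
# Hypothetical sample: invented placeholder values, not real Oregon data.
sample = [
    {"FIRST_NAME": "City A", "TOTAL_POP": 1000, "MEDIANAGE": 30.0,
     "WHITE": 800, "FAMILYSIZE": 3.0},
    {"FIRST_NAME": "City B", "TOTAL_POP": 3000, "MEDIANAGE": 40.0,
     "WHITE": 2700, "FAMILYSIZE": 2.0},
]

# A) TOTALs for the population columns (like =SUM(...) in a spreadsheet).
total_pop = sum(row["TOTAL_POP"] for row in sample)
total_white = sum(row["WHITE"] for row in sample)

# B) Straight AVERAGEs for median age and family size (like =AVERAGE(...)).
avg_median_age = sum(row["MEDIANAGE"] for row in sample) / len(sample)
avg_family_size = sum(row["FAMILYSIZE"] for row in sample) / len(sample)

# C) PERCENT of the total population for one group:
#    group total divided by population total, times 100.
pct_white = 100 * total_white / total_pop

print(total_pop, total_white, avg_median_age, avg_family_size, pct_white)
```

The percent is calculated from the TOTAL row, not by averaging each city's percent, so large and small cities are combined correctly.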

[SEE SCREEN SHOT NEXT PAGE…shows what you should get for the first set]

  • 15) Compare the results that you calculated for the three different data sets.

Q5) How are they (median age, diversity, number of persons living together in a single house) different? How are they similar? Based on the location of the different samples, what does this tell you about Oregon?

  • 16) Think about the potential bias of each sample if that group of cities were used to represent Oregon in a study. The statistics for the ENTIRE state of Oregon are as follows:

Population: 3,432,399

Median Age: 36.3

% White: 86.6

% Black: 1.6

% American Indian: 1.3

% Asian: 3.0

% Mexican: 6.3

Family Size: 3.02
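One way to spot a possible bias is to subtract the statewide value from your sample's value for each statistic. The Python sketch below shows the idea. The statewide numbers are the ones listed above; the sample numbers are invented placeholders that you would replace with your own spreadsheet results.

```python
# Statewide values copied from the list above.
oregon = {"Median Age": 36.3, "% White": 86.6, "Family Size": 3.02}

# Hypothetical sample results: invented placeholders, not real data.
sample = {"Median Age": 35.0, "% White": 87.5, "Family Size": 2.5}

# Difference = sample value minus statewide value, rounded to 2 places.
# A positive number means the sample is higher than the state as a whole.
differences = {
    name: round(sample[name] - oregon[name], 2) for name in oregon
}

for name, diff in differences.items():
    print(name, diff)
```

A large difference in either direction suggests that the sample over-represents or under-represents some part of the state's population.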

Q6) List at least two potential biases that would have to be addressed for each sample if it were used to represent Oregon in a research study. One of your stated biases for each sample should be based on the statistics that you calculated. The other should be based on the geography of the sample.

  • 17) Turn in your responses to all six prompts and a copy of your spreadsheet.

Lesson Extension:

Find out what makes something a “city” in Oregon. What portion of Oregon’s population was not included in the map? Why might this be a problem for statisticians? Why do you think that they were not included?

What about where you live? If the general area where you live was used as a sample to represent your entire state, what biases would you expect to find? Age? Income? Race? Employment?

Career Extension:

A field biologist is one of the many careers in which scientists regularly complete research projects that involve samples. If a team of biologists needs to determine how many Douglas Fir trees are in a particular 10,000-acre forest, how should the researchers proceed? Give at least three ideas for samples together with their potential drawbacks.

Career Extension Option:

  • Go to the website
  • Find a career from the list that is of interest to you. For example: In the Natural Resources list, there is a link to forestry. In the forestry link, there are several job descriptions.
  • List four ways GIS is used in the career you choose.
  • Conduct an internet search to find information about salary ranges and possible job locations.

GEOSTAC NSF-ATE # 0903330

[1] Technically, we might want to use a weighted average (so that cities with larger populations are ‘worth more’ in the results), but we will just use a straight average for the purpose of this activity.

[2] Note that the percentages will not add up to 100% because some people report more than one race and others report races not included in our subset.
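For anyone curious about the weighted average mentioned in footnote 1, here is a minimal Python sketch comparing it with the straight average used in this activity. The two cities and their numbers are invented placeholders, not real Oregon data.

```python
# Hypothetical cities: invented placeholder values, not real Oregon data.
cities = [
    {"TOTAL_POP": 1000, "MEDIANAGE": 30.0},
    {"TOTAL_POP": 3000, "MEDIANAGE": 40.0},
]

# Straight average: every city counts the same.
straight_avg = sum(c["MEDIANAGE"] for c in cities) / len(cities)

# Weighted average: each city's value is multiplied by its population,
# then the sum is divided by the total population.
total_pop = sum(c["TOTAL_POP"] for c in cities)
weighted_avg = sum(c["MEDIANAGE"] * c["TOTAL_POP"] for c in cities) / total_pop

print(straight_avg, weighted_avg)
```

With these placeholder numbers the straight average is 35.0 and the weighted average is 37.5, because the larger city pulls the weighted result toward its own value.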