17-Incorporating Outside Data

  1. Open excel. Click Open, and navigate to the text or delimited file. We are using the file: Ozone_Data_Indio - Jackson Street_2008-2010.xlsx
  2. Use Autofilter to look through the data. This file contains hourly values by date, and we will only incorporate the MAX column into our database. Look carefully at the columns we will import (date and MAX) for error codes and blanks, just so you know they are there.
  3. Since we will be relying on the Date field, ensure that
  4. there are no duplicates (there will not be in AirNowTech data, but there might be in other outside data). The easiest way to check for duplicates is to use Excel, click on Data, Check for Duplicates, make sure the box is checked next to My data has headers, then Uncheck All (because we only want to check for duplicates in the Date field), then check the Date column. Excel will tell you that there are no duplicates. If you think there MIGHT be duplicates, refer back to the Data Cleaning exercise, because this tool will just remove rows and tell you how many duplicate rows it removed rather than asking you first.
  5. Check for blanks using Autofilter (check the box next to Blanks). You will see that there is one blank row that contains the hourly maxima. This is ok as long as we know what it is, as since we will use Date as the key field it will not import this row, which is fine as it is not a date row.
  6. Make a copy of the database you have been using and rename it 17-Incorporating_Outside_Data (or similar) and open it. Click External Data, Import, Excel

Click Browse, and navigate to the excel file. Click import to a NEW table. Click OK.

  1. Click First Row Contains Column Headings. You can import all the fields and delete columns later in Access, or click Do Not Import (Skip) next to every field except date and MAX. (To make it faster to go through the columns and check the Skip box, move the mouse slightly to the right and click to select each column then click Alt-S until you get to the MAX field.
  2. Tell Access that the Date field is a Date type that is Indexed, No Duplicates:
  3. and that the MAX field is a double type.
  4. Tell Access to use Date as a primary (key) field:

Click OK.

  1. Tell Access to name the new table Indio-Jackson. Click Next.
  2. You will see this error box, which is okay! It is because of the last blank row that contains the max hourly values. If you want to look at that row, refer back to the excel file.
  3. Open the new table, and you will see that it contains two columns: date, and MAX. the first rows contain -999 which is fine, as we can eliminate them in any queries. If you click on the MAX field, you can see the range of values and they look valid except for the -999:

  1. Open the query qry_2008>MDL_by_month, and then Save As qry_2009-Indio&MySite>MDL_by_month (because we will modify it to add the new data). Change to Design View. In the query design pane, right click and Show Tables, and double click Indio-Jackson to add it.
  2. Change the second column first field to MySiteMax2008_O3: O3
  3. Change the Avg to Max in that same column
  4. Change the date range to 2009 data, because as you have seen, there are only -999 values in 2008. Uncheck the Show box because this is a Where column.
  5. Add a Join by clicking in Date Time in the original table and dragging the cursor to the Date field in the new table.
  6. Drag down the MAX field twice, and add a label to the first MAX column as follows: IndioMAX_2008: MAX
  7. Change the 2nd MAX field to show max instead of avg, change Expression to Where, and add the criteria >=5 to ensure that there are no -999 values (or values less than the MDL). Uncheck the Show box because this is a Where column.
  8. Your query should now look like:

  1. Click Run to see the results:
  1. Copy this data into excel. Run the query again for 2010 data, and copy these into excel.
  2. Create a scatterplot graph as you did in the Aggregated Data excel file for the 2009 and 2010 monthly max.

1 (9/7/12)