Tufts OpenCourseware
Author: Barbara Parmenter, Ph.D.

1. Assignment 5 - GIS Data Quality Assessment (10 points)

Introduction to GIS for Urban and Environmental Analysis

Barbara Parmenter, PhD, Tufts University, 2007

Format : Text or PDF document.

The goal of this assignment is for you to gain a deeper understanding of existing GIS data layers and to start to learn how to assess the appropriateness of a data set for a given project.

Because we can only assess the appropriateness of any given data set based on a particular project's needs, you need to think about data quality in terms of those needs. For this assignment, pretend that you are working on a project and need to assess potential data for it. You define the overall project and its accuracy needs, and then answer the data quality questions below in terms of this defined project (note: this does not have to be your class project, although it can be).

Collect at least nine GIS data layers (the five required below, plus four more) for the area in which you live or an area with which you are very familiar. This should be at the scale of a few blocks so that you can take a very detailed look at the data and, if necessary, go out and visit the area on foot.

The following data layers are required and must be compared to each other and to your other data layers - see the Data Notes at the end of this page for more information and tips:

Roads :

Hydrography :

  • Hydrography from a city, state, or the National Hydrography Data set (http://nhd.usgs.gov/)

  • Hydrography from the Census TIGER data set (from same source as for TIGER road centerlines)

If there is no hydrography in your area, then use the MassGIS land use data layer and compare it with the National Land Cover Data set (NLCD), and if available, a parcel-level data layer with land use classes.

The following data layers are possible options for the other four layers but you can use others if you like:
Parks or other open space, floodplains (called FEMA Q3 Flood on MassGIS), schools, libraries, hospitals, land use, land cover, parcels, building footprints, other transportation modes, orthophotos, impervious cover. Note: it would be good practice to use a layer created by geocoding addresses or by adding tabular data that already has XY coordinates. You can use Reference USA (http://www.referenceusa.com) to generate a list of addresses for businesses within a zip code (go to Reference USA, choose custom search, and search by a NAICS code and a zip code).

Address the following questions, again using a pretend project as a benchmark where appropriate. If you can find orthophotos of your area, use them as well, especially to estimate how far off the data layers are from the orthophoto (which may itself be wrong). If metadata exists (e.g., in ArcCatalog or on the MassGIS web site), use the metadata to help you understand the data set and the adequacy of the data for your proposed project (note: it may help to put these answers in a tabular format):

  1. Provide a one-paragraph description of the project you are using as a benchmark to assess the data and what positional accuracy it will require (or what is good enough - think about how far off the position could be and still work for the project needs).

  2. Briefly discuss the three different road centerline data sets in terms of their positional relation to each other (look at how far apart they are at different points using the measure tool in ArcGIS, and whether the differences are consistent). Include some graphic examples to illustrate your points. Which data set would be best for your project?

  3. Do the same as above for the two hydrography layers.

  4. Can you provide a quantitative assessment of positional accuracy for each of your data layers (e.g., +/- 20 feet)? Why or why not?

  5. Give a qualitative assessment of positional accuracy of each of the four optional layers relative to the other layers (e.g., do streets run through buildings? are schools in the correct location along a road?).

  6. Are these optional layers appropriate for your project in terms of their positional accuracy?

  7. Completeness: Is each data set complete? (Does it cover the area in question, are all relevant features present, and is the attribute information complete for all features?)

  8. Currency: Are the data up to date? How do you know the answer to this?

  9. Attribute accuracy: provide a qualitative assessment of attribute accuracy for critical attribute items (e.g., land use codes, street names and address ranges, school names, etc). How adequate is the attribute information for your project needs?
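For the positional questions above, it can help to summarize your measurements with a few numbers. The sketch below is one possible approach, not a required method: it assumes you have written down the coordinates of the same real-world point (for example, a street intersection) as it appears in two layers, and that both layers are in the same projected coordinate system so the units are feet or meters. The function name offset_stats and the sample coordinates are made up for illustration.

```python
import math


def offset_stats(pairs):
    """Summarize offsets between matching points in two layers.

    pairs: list of ((x1, y1), (x2, y2)), where each tuple holds the
    coordinates of the same real-world point in layer 1 and layer 2.
    """
    # Straight-line distance between each pair of matching points
    distances = [math.hypot(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in pairs]
    n = len(distances)
    return {
        "count": n,
        "mean": sum(distances) / n,
        "max": max(distances),
        # Root mean square error gives extra weight to large offsets
        "rmse": math.sqrt(sum(d * d for d in distances) / n),
    }


# Hypothetical intersection coordinates (in feet) read from two road layers
sample_pairs = [
    ((1000.0, 2000.0), (1003.0, 2004.0)),
    ((1500.0, 2500.0), (1506.0, 2508.0)),
    ((1800.0, 2100.0), (1800.0, 2115.0)),
]

stats = offset_stats(sample_pairs)
print("mean offset: %.1f, max offset: %.1f, RMSE: %.1f"
      % (stats["mean"], stats["max"], stats["rmse"]))
```

These numbers only describe how far the two layers are from each other. They become an estimate of positional accuracy only if one of the layers (or an orthophoto) can be treated as the more trustworthy reference, which is itself a judgment you should explain.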

Data Notes :

See Exploring Data Quality demo sheet

Road Centerlines :

Hydrography Data :

  • MassGIS

  • National Hydrography Data Set - High Resolution (use Internet Explorer, as Mozilla causes problems). You can get the data in a couple of different ways from the National Hydrography Data web site (http://nhd.usgs.gov/data.html). Geodatabase files are prestaged by subbasin at the FTP site indicated; you need to use EPA's Surf Your Watershed to determine your subbasin, as directed at this site. Alternatively, you can go to the NHD Viewer, but you need to read the directions for Extracting Data there. Going to the FTP site is quicker.

  • TIGER hydrography (lines and water polygons) - for Massachusetts, you can get it from MassGIS. For the rest of the US, get it by county from ESRI - http://www.esri.com/data/download/census2000_tigerline/index.html - note that you need to define the projection for these ESRI TIGER water lines. The projection is "geographic coordinate system - North America - North American Datum 1983.prj" - see the Define Projection Tip Sheet for instructions on how to define the projection.

Tip for reducing your data layers to just your area:

  • Add all the data you will be comparing to an ArcMap session (make sure you have defined the projection for the TIGER data if necessary)

  • Zoom in to your area of interest 

  • Click on the Select Tool

  • Drag a box over the area you want to study to select all the features there - it will look like a mess because every feature in the box in every layer is selected, but that's a good thing!

  • One layer at a time, right-click on the data layer and choose Selection - Create Layer from Selected Features (or to create a new separate shape file, click on Data - Export)

  • Turn off (uncheck) the original layer as you create the subset

  • Do this for each layer you will be studying, and eventually you will have just the subset layers.
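If you prefer to see the logic of this tip spelled out, the sketch below imitates the box selection in plain Python. It is an illustration under simplifying assumptions, not how ArcMap works internally: each feature is a hypothetical dictionary with an "id" and a list of "coords", all layers share one coordinate system, and a feature counts as selected when its bounding envelope overlaps the box (so a diagonal line that only passes near a corner of the box may be selected even though it does not cross it).

```python
def envelope(coords):
    """Return (xmin, ymin, xmax, ymax) for a list of (x, y) vertices."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def intersects_box(coords, box):
    """True if the feature's envelope overlaps the selection box."""
    fxmin, fymin, fxmax, fymax = envelope(coords)
    bxmin, bymin, bxmax, bymax = box
    return (fxmin <= bxmax and fxmax >= bxmin and
            fymin <= bymax and fymax >= bymin)


def subset_layers(layers, box):
    """Keep, for every layer, only the features that fall in the box."""
    return {
        name: [f for f in features if intersects_box(f["coords"], box)]
        for name, features in layers.items()
    }


# Hypothetical layers: two road segments and one stream
layers = {
    "roads": [
        {"id": 1, "coords": [(0.0, 0.0), (10.0, 0.0)]},
        {"id": 2, "coords": [(100.0, 100.0), (120.0, 100.0)]},
    ],
    "hydro": [
        {"id": 7, "coords": [(5.0, 5.0), (6.0, 8.0)]},
    ],
}

study_box = (0.0, 0.0, 20.0, 20.0)  # xmin, ymin, xmax, ymax
subsets = subset_layers(layers, study_box)
for name, features in subsets.items():
    print(name, [f["id"] for f in features])
```

As in the ArcMap steps, the original layers are left untouched and each layer gets its own smaller subset.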

Other methods of creating smaller data sets from larger data sets can be found on the Tip Sheets.