Tye Parzybok, Wayne Gibson, Christopher Daly and George Taylor
Oregon State University, Corvallis, Oregon
A 99-year data set of monthly precipitation and monthly mean minimum and maximum temperature was developed for the VEMAP (Vegetation/Ecosystem Modeling and Analysis Project) ecological model intercomparison. The input data consisted of observations from about 8500 precipitation stations and about 5500 temperature stations for the period 1895-1993. The PRISM model was used to prepare GIS-compatible coverages on a latitude-longitude grid for each calendar month during the period. The resulting data set is the first temporally complete and geographically realistic representation of the historical climate record ever produced.
As work began on the data processing for the modeling effort, it became clear that significant data quality problems existed within the data set. Recognizing that the quality of the final products depends heavily on the quality of the input data, a systematic quality assurance program was designed to address these problems. Quality problems were especially troublesome in this project because of the high resolution (4 km) of the product.
A graphical interface was developed for assessing data quality on a month-by-month basis. Using the GRASS GIS, a C-shell script was developed to display a map of areas whose observations are inconsistent with either climatology or nearby stations. The user selects a region by point-and-click, and a table listing the actual station observations appears on screen. Evaluations of data quality are made using this table. A second script allows the user to change the values in the master data set. After all changes are made, PRISM is rerun.
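The screening step described above, in which observations inconsistent with climatology or with nearby stations are flagged for manual review, can be illustrated with a minimal Python sketch. This is not the GRASS and C-shell implementation used in the project. The function name, the data layout, the use of a neighbor median, and the single absolute tolerance max_dev are all assumptions made for illustration.

```python
from statistics import median


def flag_suspect_stations(observed, climatology, neighbors, max_dev):
    """Return {station: [reasons]} for values that look inconsistent.

    observed    -- {station: value for one month}
    climatology -- {station: long-term mean for that calendar month}
    neighbors   -- {station: [nearby station ids]} (hypothetical layout)
    max_dev     -- assumed absolute tolerance, in the same units as the data
    """
    suspects = {}
    for station, value in observed.items():
        reasons = []
        # Check 1: departure from the station's own climatology.
        if abs(value - climatology[station]) > max_dev:
            reasons.append("climatology")
        # Check 2: departure from the median of nearby stations.
        nearby = [observed[n] for n in neighbors.get(station, []) if n in observed]
        if nearby and abs(value - median(nearby)) > max_dev:
            reasons.append("neighbors")
        if reasons:
            suspects[station] = reasons
    return suspects


# Made-up monthly precipitation values (mm) for four hypothetical stations.
observed = {"A": 50.0, "B": 55.0, "C": 500.0, "D": 45.0}
climatology = {"A": 48.0, "B": 52.0, "C": 60.0, "D": 47.0}
neighbors = {
    "A": ["B", "C", "D"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["A", "B", "C"],
}

suspects = flag_suspect_stations(observed, climatology, neighbors, max_dev=100.0)
print(suspects)
```

In this made-up example only station C is flagged, and it fails both checks. A flagged value is only a candidate for review: as in the procedure above, the decision to change a value in the master data set is made by a person inspecting the station table.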
In the western one-fourth of the United States, the 99-year data set contained about 35 values that were judged to be erroneous. Although this represents a small fraction of the total number of observations, these data problems produced significant errors in the resultant precipitation field. Of the 35 errors, about 15 represented grossly bad or missing data; 12 were due to mislocated stations; and 8 were due to incorrect station elevations.