Connecticut DEP Property

From OpenStreetMap Wiki

Connecticut DEP Property is a semi-manual import/check of existing OSM data against the 'DEP Property' shapefile available from Connecticut Department of Energy and Environmental Protection.

Goals

Connecticut state-owned recreation lands, as of 2019-07-05, are mapped rather irregularly. While all but four of the lands designated as 'State Park' are mapped, mapping of the other classes of state-owned recreation land is much spottier. None of the State Fish Hatcheries is mapped. Only three of the Flood Control Areas (open to public recreation when not flooded), one of the Natural Area Preserves, a minority of the State Forests, and only three of the Water Access Areas are mapped. There is thus ample opportunity for adding public recreation lands. These are often difficult or impossible to map accurately without importing, because their borders are inaccessible; nevertheless, they are features of intense public interest.

An earlier import of Connecticut recreation land boundaries was conducted by OSM user BJD between July and October of 2010. This import credits the Connecticut Department of Environmental Protection (now the Department of Energy and Environmental Protection), and has a URL that purports to link to the data source. The link, as of July of 2019, resolves to the department's main GIS portal, and there is no indication which among several data sets of public land was the source of the import. An examination of the imported data suggests that it was the one entitled Protected Open Space Mapping. This set of shapefiles has numerous issues:

  • It appears to be an abandoned project, not having been updated in the last eight years.
  • It is lacking data for numerous townships; indeed, it contains polygons that identify the townships for which no data are provided.
  • The map for different townships appears to be on inconsistent datums, with a few townships having boundaries that are about 35 m off (probably due to NAD27/NAD83/WGS84 inconsistencies).

Because of these issues, particularly the incompleteness of the data, a semi-automatic import of the data set entitled DEP Property is proposed. This file contains all of Connecticut's State Parks and State Forests (and many other public-access facilities with other protection titles). Some spot checks suggest that the data quality is sound: parcels are topologically correct, and boundaries often align with highways and waterways as mapped in OSM. (Of course, further validation will be conducted during the import.)

Schedule

The current plan is to start a discussion of this import in the online fora (notably the 'imports' and 'talk-us' mailing lists) in the second week of July, 2019. Once there has been enough time for interested parties to weigh in, this page will be updated with a detailed schedule.

Import Data

Background

Data source site: http://cteco.uconn.edu/guides/DEP_Property.htm

Data license: Embedded in the metadata at http://cteco.uconn.edu/metadata/dep/document/DEP_PROPERTY_FGDC_Plus.htm, from which the following is quoted:

Access constraints: None. The data is in the public domain and may be redistributed.

Use constraints: None. There are no restrictions or legal prerequisites for using the data. Once acquired, any modification made to the data must be noted in the metadata. When printing this information on a map or using it in a software application, please acknowledge the State of Connecticut, Department of Environmental Protection as the original source for this information.

The requirement to annotate modifications is satisfied by the OSM changeset mechanism. The acknowledgement is already made in the Contributors page and will also be part of the tagging of any imported features.

Type of license: Public domain.

OSM attribution: Contributors#Connecticut

ODbL Compliance Verified: Metadata explicitly state that the data are in the Public Domain.

OSM Data Files

OSM data files for the project will be provided at a later time.

Import type

This import is initially conceived to be a one-time import. Since it consists of manually conflating a relatively small number of features (a few hundred areas at most, comprising about 1500 polygons), and the features are relatively stable (state parks and similar features seldom change very much), repeated imports using the same techniques are thought to be feasible. The user proposing the import has experience with handling regular updates to a couple of other datasets of public recreational land and has been able to cope with an annual schedule of updates.

Data Preparation

Data reduction and simplification

The data in the file are all thought to be worthy of import if they are not already in OSM, so no particular selection process is involved. The importer has at hand a PostGIS database created from a Geofabrik extract of North America, updated nightly. This database can be used to identify OSM features that share names with the Connecticut state parks and have nearly identical geometry. A criterion of 99.9% overlap (computed as the area of the intersection divided by the area of the union) appears to skip all areas with 'correct' geometry and to identify those whose geometry requires attention. In any case, all areas will need examination for protected area tagging, as described below.
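The overlap criterion is intersection-over-union. A minimal Python sketch of the test follows, assuming both shapes are simple convex polygons in a planar (projected) coordinate system with vertices listed counter-clockwise. It clips one polygon against the other with the Sutherland-Hodgman algorithm, measures areas with the shoelace formula, and takes the union area as area(A) + area(B) - area(A ∩ B). The function names are hypothetical. Real parcels are usually non-convex multipolygons, which is why the importer does this step in PostGIS (for instance as ST_Area(ST_Intersection(a, b)) / ST_Area(ST_Union(a, b))).

```python
# Hypothetical sketch of the intersection-over-union overlap test.
# Assumption: convex polygons, vertices as (x, y) in counter-clockwise order,
# coordinates in a planar projected CRS.

def polygon_area(pts):
    """Absolute polygon area via the shoelace formula."""
    total = 0.0
    for i in range(len(pts)):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % len(pts)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def _inside(p, a, b):
    # True if p lies on or to the left of the directed edge a -> b.
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0


def _crossing(p, q, a, b):
    # Point where segment p-q crosses the infinite line through a and b.
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p, q, a, b
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))


def clip_convex(subject, clip):
    """Sutherland-Hodgman: the part of `subject` inside convex polygon `clip`."""
    output = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        candidates, output = output, []
        if not candidates:
            break
        prev = candidates[-1]
        for cur in candidates:
            if _inside(cur, a, b):
                if not _inside(prev, a, b):
                    output.append(_crossing(prev, cur, a, b))
                output.append(cur)
            elif _inside(prev, a, b):
                output.append(_crossing(prev, cur, a, b))
            prev = cur
    return output


def overlap_ratio(a, b):
    """Area of intersection divided by area of union (1.0 = identical)."""
    inter = polygon_area(clip_convex(a, b))
    union = polygon_area(a) + polygon_area(b) - inter
    return inter / union if union > 0 else 0.0


def needs_attention(osm_poly, dep_poly, threshold=0.999):
    """Flag pairs whose geometries agree less closely than the threshold."""
    return overlap_ratio(osm_poly, dep_poly) < threshold
```

For two unit squares offset by half their width, the intersection is 0.5 and the union 1.5, so the ratio is 1/3 and the pair would be flagged for attention.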

The data to be imported have all the usual issues of most imported data, in that they have topologically inconsistent multipolygons, small slivers where parcels were supposed to merge but had inconsistent borders, and trivial overlaps between abutting features. The correction of these faults is discussed under Data transformation below.

Parcels that appear to be mapped with an excessive number of nodes may be simplified either with JOSM or with the ST_Simplify function of PostGIS.
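PostGIS documents ST_Simplify as using the Douglas-Peucker algorithm. A minimal pure-Python sketch of that algorithm may help when choosing a tolerance; it works on an open polyline of (x, y) tuples in projected units, and the function names are hypothetical. It is not PostGIS's implementation, and for a closed ring (first point equal to last) it falls back to distance from that single point, which is only a rough stand-in.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the infinite line through a and b."""
    if a == b:
        return math.dist(p, a)
    (x0, y0), (x1, y1), (x2, y2) = p, a, b
    num = abs((x2 - x1) * (y1 - y0) - (x1 - x0) * (y2 - y1))
    return num / math.hypot(x2 - x1, y2 - y1)


def douglas_peucker(points, tolerance):
    """Simplify a polyline, keeping vertices farther than `tolerance`
    from the chord joining the kept endpoints."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [first, last]
    # Split at the farthest vertex and simplify each half recursively.
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right
```

Because plain Douglas-Peucker can produce self-intersecting polygons, PostGIS also provides ST_SimplifyPreserveTopology, which may be the safer choice for parcel boundaries.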

Tagging plans

Tagging the land use of state parks is a controversial topic, and resolving the conflict is explicitly out of scope for this import. For any feature that already exists in OSM, the plan is to leave any amenity=*, landuse=*, leisure=*, park:type=*, or tourism=* tags exactly as they were found, with the exception of obvious errors (for instance, there are three campgrounds that are currently tagged as either leisure=nature_reserve or landuse=recreation_ground, and these should be retagged tourism=camp_site).

Since only two parcels of state land currently carry boundary=protected_area tagging, this tagging will be added to all features that are conflated. In addition, new features will receive both this tagging and an appropriate principal land use designation. This designation will be determined by the AV_Legend attribute in the input data set, which shall also be copied to the protection_title=* tag of the object. The mapping shall be as given in the following table:

Protection and land use, by title

protection_title | protect_class | protection_object | land use tags
DEEP Owned Waterbody | None. DO NOT IMPORT.
Fish Hatchery | 4 | fish | landuse=aquaculture
Flood Control | 15 | floodwater | leisure=nature_reserve
Historic Preserve | 22 | heritage | historic=yes, heritage=4, and other appropriate tagging determined by consulting resources such as the National Register of Historic Places
Natural Area Preserve | 5 | | leisure=nature_reserve
Other | Tagged on a case-by-case basis. The features range from office=government to boundary=aboriginal_lands.
State Forest | 6 | forestry | landuse=forest, leisure=nature_reserve
State Park | 21 | recreation | leisure=park, park:type=state_park [1]
State Park Scenic Reserve | 5 | landscape | leisure=nature_reserve
State Park Trail | 21 | recreation | leisure=park, park:type=trailway [1]
Water Access | 21 | recreation | leisure=nature_reserve [2]
Wildlife Area | 4 | hunting;fishing | leisure=nature_reserve
Wildlife Sanctuary | 4 | wildlife | leisure=nature_reserve

[1] It is recognized that the park:type=* tag is deprecated. Nevertheless, it remains in common use in Connecticut, and tagging these features consistently with existing ones is expected to simplify future edits.

[2] Some water access areas may be more accurately tagged as leisure=fishing or leisure=marina and will be so tagged if a local mapper identifies them.
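The table can be transcribed into a lookup for the transformation step. The following Python sketch is hypothetical (the names TITLE_TABLE and tags_for are invented here). It gives every imported feature boundary=protected_area and protection_title=* as described above, adds the per-title tags from the table, skips DEEP Owned Waterbody, and flags Historic Preserve and Other for the manual tagging the table calls for.

```python
# Hypothetical transcription of the "Protection and land use, by title" table.
# Each entry: (protect_class, protection_object or None, land use tags).
TITLE_TABLE = {
    "Fish Hatchery": ("4", "fish", {"landuse": "aquaculture"}),
    "Flood Control": ("15", "floodwater", {"leisure": "nature_reserve"}),
    "Historic Preserve": ("22", "heritage", {"historic": "yes", "heritage": "4"}),
    "Natural Area Preserve": ("5", None, {"leisure": "nature_reserve"}),
    "State Forest": ("6", "forestry",
                     {"landuse": "forest", "leisure": "nature_reserve"}),
    "State Park": ("21", "recreation",
                   {"leisure": "park", "park:type": "state_park"}),
    "State Park Scenic Reserve": ("5", "landscape", {"leisure": "nature_reserve"}),
    "State Park Trail": ("21", "recreation",
                         {"leisure": "park", "park:type": "trailway"}),
    "Water Access": ("21", "recreation", {"leisure": "nature_reserve"}),
    "Wildlife Area": ("4", "hunting;fishing", {"leisure": "nature_reserve"}),
    "Wildlife Sanctuary": ("4", "wildlife", {"leisure": "nature_reserve"}),
}
DO_NOT_IMPORT = {"DEEP Owned Waterbody"}
MANUAL_REVIEW = {"Historic Preserve"}  # needs extra historic research


def tags_for(av_legend):
    """Return (tags, needs_manual_review), or None if the title is skipped."""
    if av_legend in DO_NOT_IMPORT:
        return None
    tags = {"boundary": "protected_area", "protection_title": av_legend}
    if av_legend in TITLE_TABLE:
        protect_class, protection_object, landuse_tags = TITLE_TABLE[av_legend]
        tags["protect_class"] = protect_class
        if protection_object is not None:
            tags["protection_object"] = protection_object
        tags.update(landuse_tags)
        return tags, av_legend in MANUAL_REVIEW
    # "Other" (or any unexpected title) is tagged case by case.
    return tags, True
```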

All features will receive the following tagging:

A diligent search will be made for appropriate website=* and wikipedia=en:* tags.

Changeset tags

For each import changeset, the comment=* will include at least the name of the imported parcel(s); the created_by=* will be a dedicated import user; and source=ftp://ftp.state.ct.us/pub/dep/gis/shapefile_format_zip/DEP_Property_shp.zip will be included.

Data transformation

The data transformation begins by importing the shapefile into PostGIS with a command like:

ogr2ogr -t_srs EPSG:3857 -f PostgreSQL \
    "PG:dbname=gis" DEEP_Property/DEP_PROPERTY.shp \
    -nln conn_dep_property -nlt MULTIPOLYGON -lco PRECISION=NO 

As mentioned above, some of the imported data contain topological errors or faults such as polygon intersections, ring self-intersections, and tiny slivers where parcels align imperfectly. Virtually all of these may be tidied with a PostGIS query like:

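        -- Repair each parcel and close small gaps between its pieces.
        SELECT ST_Buffer(
                   ST_Buffer(
                       -- make valid, then keep only the polygonal parts (type 3)
                       ST_CollectionExtract(ST_MakeValid(wkb_geometry), 3),
                       2.5),   -- dilate: fills slivers and tiny gaps
                   -5.0)       -- erode: undoes the dilation, with a net inward shift
                 AS geom,
               property,
               av_legend
          FROM conn_dep_property;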

This query forces the geometry to be valid, discards any degenerate (point or line) fragments left by the repair, then dilates the boundaries by 2.5 metres and shrinks them by 5.0 metres. (Strictly, these distances are in EPSG:3857 units, which at Connecticut's latitude of about 41.5° N correspond to roughly 0.75 ground metres each.) The slight net shrinkage this gives to the parcel boundary is almost unnoticeable for these typically very large parcels, and the result is that slivers are removed and tiny gaps are filled. The plan is to apply this transformation only to those parcels that need it.
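The effect of the dilate-then-erode step is easiest to see in one dimension, where a parcel is a set of intervals along a line. The following Python sketch is a simplified analogue, not what PostGIS does internally (ST_Buffer works on 2-D shapes and rounds corners), and its function names are hypothetical. With the distances used in the query, gaps up to 5 units wide (twice the 2.5 dilation) are closed, outer edges move inward by a net 2.5 units, and any piece no wider than 5 units disappears.

```python
def _merge(intervals):
    """Sort and merge overlapping or touching (lo, hi) intervals."""
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def buffer_1d(intervals, d):
    """1-D analogue of a buffer: grow (d > 0) or shrink (d < 0) a union
    of intervals, dropping any piece that shrinks to nothing."""
    pieces = _merge(intervals)  # operate on the union, not the raw inputs
    grown = [(lo - d, hi + d) for lo, hi in pieces if (hi - lo) + 2 * d > 0]
    return _merge(grown)


def close_gaps(intervals, grow=2.5, shrink=5.0):
    """Dilate by `grow`, then erode by `shrink`, mirroring the query above."""
    return buffer_1d(buffer_1d(intervals, grow), -shrink)
```

For example, two parcels at (0, 100) and (103, 200) become a single piece (2.5, 197.5), while a lone 3-unit sliver vanishes. The last behaviour illustrates why the transformation is reserved for parcels that need it: a narrow strip, such as a trail corridor, could be erased.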

The data transformation above is simple enough that relatively straightforward SQL code can populate a table with columns corresponding to the OSM tags. This SQL code can be used as input to the Ogr2osm script to produce the OSM XML.

The OSM user who has proposed this import has at hand some scripts written in the Tcl programming language that can assist with conflation, by using the JOSM API to download existing objects, select those areas that are candidates for conflation, and present the imported areas in a separate layer. Using JOSM functions such as 'Replace geometry' makes fairly quick work of an uncomplicated conflation.

Care must be exercised not to undo the hard work of other mappers; when in doubt, do not import or conflate. Fully automated conflation is appropriate only where either a feature does not yet exist in OSM (in which case there is nothing to conflate) or the existing feature's geometry was the result of the earlier import of Connecticut DEP data and is unmodified since the import.
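The rule above can be expressed as a small predicate. This Python sketch is hypothetical: it assumes the edit history of an existing feature (and, for a multipolygon, of its member ways) has already been fetched as (user, date) pairs, one per version, and it treats "unmodified since the import" as "every version was made by BJD during the July to October 2010 import window". Whether that heuristic matches how the earlier import was actually uploaded would need to be confirmed.

```python
from datetime import date

# Assumed markers of the earlier import (user and period named on this page).
IMPORT_USER = "BJD"
IMPORT_START = date(2010, 7, 1)
IMPORT_END = date(2010, 10, 31)


def may_auto_conflate(history):
    """history is None when no OSM feature exists yet; otherwise a list of
    (user, edit_date) pairs covering every version of the feature and,
    by assumption, of its member ways."""
    if history is None:
        return True  # nothing to conflate: a plain addition
    if not history:
        return False  # no history available: when in doubt, do not conflate
    return all(
        user == IMPORT_USER and IMPORT_START <= when <= IMPORT_END
        for user, when in history
    )
```

Anything for which this returns False falls back to manual conflation, or is left alone.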

Data transformation results

Sample OSM XML has been produced and is available at Kevin Kenny's personal site. These data are somewhat raw, having been produced by an automated process. Tagging of historic preserves, and of facilities in the 'Other' category, has not been attempted. In addition, a plain-text file lists the features to be processed. This list includes the OSM IDs and names of other features that should be checked and either conflated or have their boundary conflicts resolved. In some cases multiple State of Connecticut features are included in the same group, because they have one or more OSM features in common.

If a tasking manager is used to parcel out these data, each group (groups are separated with lines consisting of four hyphens) should be a single task, since data inconsistencies are virtually certain to result if all the features in a group are not tackled at once.
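Splitting the plain-text feature list into tasks is mechanical. A hypothetical Python sketch, assuming only the stated separator (a line consisting of four hyphens) plus one feature per non-blank line:

```python
def parse_groups(text):
    """Split the feature list into groups; each group becomes one task."""
    groups, current = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if line == "----":  # group separator
            if current:
                groups.append(current)
            current = []
        elif line:  # ignore blank lines
            current.append(line)
    if current:  # the last group may lack a trailing separator
        groups.append(current)
    return groups
```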

Data Merge Workflow

Team approach

The relatively small number of features to be imported makes this import feasible as a solo project. Nevertheless, as a matter of due diligence and of promoting community ownership, direct messages have been sent via openstreetmap.org to most[1] users who have edited area features that are to be conflated or that overlap or abut features to be imported, inviting them both to review this proposal and to participate in the conflation. It is anticipated that at least a few features will have complicated conflation that requires local knowledge to resolve. (These features will remain untouched if appropriate local mappers cannot be found.) About a dozen mappers responded actively in the first few days after the messages were sent. Most expressed general approval. Some warned that they had put work into editing geometry, and that an import must not damage the local mapper's changes. Since this caution is a general principle of imports, it does not change the import procedure, except that the users in question have also expressed willingness to help with conflation. None of the users contacted has expressed any objection to the project as a whole.

Workflow

The intent is to produce individual OSM XML files for each of the features to be imported or conflated. These can be operated on one at a time, each forming its own changeset, and thereby keeping the changesets to a manageable size. After a small number (~10) of parcels have been processed, the import will be paused so that comments can be solicited on the mailing lists. After a few days, if no obvious problems are detected, the import may resume.

Conflation

All parcels will be compared automatically against an OSM export to detect other area features that overlap them and carry one of the tags amenity=*, boundary=national_park, boundary=protected_area, landuse=*, leisure=*, park:type=*, or tourism=*. Any such area having an identical name becomes a candidate for conflation; any such area with a different name is a possible conflict that must be reviewed and either accepted or resolved manually.
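The screening rule can be sketched as follows. This Python sketch is hypothetical: it assumes the overlapping OSM areas have already been found (for example by a spatial query in the PostGIS database mentioned earlier) and are supplied as (osm_id, tags) pairs, and it compares names exactly, as the text specifies.

```python
WATCHED_KEYS = ("amenity", "landuse", "leisure", "park:type", "tourism")
WATCHED_BOUNDARIES = ("national_park", "protected_area")


def is_watched(tags):
    """True if the area carries one of the tags listed in the rule."""
    return (any(key in tags for key in WATCHED_KEYS)
            or tags.get("boundary") in WATCHED_BOUNDARIES)


def screen_overlaps(dep_name, overlapping):
    """Split overlapping OSM areas into conflation candidates (same name)
    and possible conflicts (different or missing name)."""
    candidates, conflicts = [], []
    for osm_id, tags in overlapping:
        if not is_watched(tags):
            continue  # e.g. a plain building: outside the scope of this check
        if tags.get("name") == dep_name:
            candidates.append(osm_id)
        else:
            conflicts.append(osm_id)
    return candidates, conflicts
```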

Reversion plan

Reversion should be straightforward, since the import consists entirely of adding or modifying small sets of tagged multipolygons. The JOSM Reverter plugin is expected to be adequate in the unlikely case that a revert is required.

Quality assurance

It is hoped that the following checks:

  • have this proposal and the associated OSM file reviewed by the community before beginning imports
  • automatically check all the lands for collisions with other land uses
  • automatically check data validity using the JOSM validator
  • have the first changeset reviewed by the community before importing in bulk
  • enlist local mappers with 'skin in the game' to aid in cases where conflation is complicated

will provide enough assurance that the import will be an improvement over what is already there. The proponent is willing to impose other reasonable controls, should readers suggest them.

  1. The list is possibly incomplete, because the PostGIS database that was used to obtain it was subject to the same issue documented in this ticket. The missing features will be reviewed, and other users will be contacted as needed.