Import: NYCDEP Watershed Recreation Areas

From OpenStreetMap Wiki

About

An import of watershed protection lands owned by the city of New York, New York that are open to public recreational use.

Current status

June 11, 2016: The initial import is complete. The importer will continue to monitor the New York City website periodically for updated data.

Goals

New York City Department of Environmental Protection (NYC DEP) owns about four hundred parcels of land in the Catskill Mountains and the Croton River valley that are maintained as conservation lands for the protection of its watershed and the quality of its drinking water. Most of these lands are open to public recreation, either to all comers or with a permit that is available free of charge from NYC DEP.

Geospatial data for these parcels, together with a table of permissible uses, can be reconstructed from the data available in the file, [http://www.nyc.gov/html/dep/pdf/recreation/open_rec_areas.pdf Recreation Areas and Use Designations by County]. This project proposes to import the boundaries of these areas into OSM so that maps intended for hunters, anglers, hikers, and environmentalists can display them.

At least one reviewer has raised the concern that everything in OSM should be, at least in principle, observable with boots on the ground. I can assure reviewers that these parcels are. Along the roads and trails, they're typically posted with small signs looking like the picture in [1]. In the backcountry, they may still be signposted, but are often marked with witness trees, cairns, and survey pins, the same way as any other survey line. They are certainly recoverable, and users entering and leaving the areas by established routes are generally aware that they're doing so.

Schedule

May 27, 2016: OSM file for the entire data set constructed and validated. Formal proposal written and posted to the Wiki, and feedback solicited on imports, imports-us and talk-us.

June 4, 2016: Assuming that one round of discussion is sufficient to implement all needed changes, upload the first set of parcels and solicit review a second time.

June 11, 2016: Once again, if no major objections are raised, complete the upload of the remaining parcels.

When updated: New York City has only fairly recently released the data in a georeferenced form. I understand that the agency's plan is that the data will be periodically updated as existing sites are resurveyed and new sites are purchased and posted. Until and unless such updates begin appearing, it will be next to impossible to develop any semi-automated workflow for keeping OSM in sync with the government's files. The proponent does plan to do so as the opportunity arises.

Import data

Background

Data source site

The attribute table that describes the parcels is obtained by "screen scraping" the published list that appears on NYC DEP's web site. This list embeds links to individual maps of each parcel of public watershed land. These maps are all georeferenced PDF files. They are obtained en masse by a script that downloads from each PDF link that is embedded in the attribute table.
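
The link-harvesting step might look roughly like the following. This is a minimal sketch in Python rather than the Tcl actually used, and it assumes that the published list is available as an HTML page whose anchors point at the per-parcel PDF files; if only a PDF version of the list is at hand, a PDF library would be needed instead. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href="...pdf"> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Keep only links whose target looks like a PDF file.
        if href and href.lower().endswith(".pdf"):
            self.pdf_links.append(urljoin(self.base_url, href))


def extract_pdf_links(html_text, base_url):
    """Return the PDF links found in html_text, resolved against base_url."""
    collector = PdfLinkCollector(base_url)
    collector.feed(html_text)
    return collector.pdf_links
```

Each returned URL can then be handed to whatever download routine is in use; relative links are resolved against the address of the list page.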

Data license

Per Local Law 11 of 2012, all NYC government data is to be provided "without any registration requirement, license requirement or restrictions on their use" (23-502 d), effectively putting it in the public domain.

OSM Data Files

The entirety of the import totals about three megabytes when converted to .OSM format. While the import is in progress, it may be inspected at https://kbk.is-a-geek.net/nycdep-import/nyc_bws_rec_area.osm.

Import type

This import is planned as a one-time automated import at present. The data are static enough that manual reimport may be satisfactory; nevertheless, when a new version emerges, the proponent plans to review the feasibility of automated synchronization.
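
One way such a synchronization check might work is to compare the server's Last-Modified header for each PDF with a timestamp recorded at import time. The sketch below is an assumption about how that could be done, not a description of the existing scripts: the function names are hypothetical, and it presumes that the server reports Last-Modified and that the stored timestamp is an ISO 8601 string with a UTC offset.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from urllib.request import Request, urlopen


def remote_last_modified(url, opener=urlopen):
    """Send an HTTP HEAD request; return Last-Modified as UTC, or None."""
    request = Request(url, method="HEAD")
    with opener(request) as response:
        header = response.headers.get("Last-Modified")
    if header is None:
        return None
    stamp = parsedate_to_datetime(header)
    if stamp.tzinfo is None:
        # A "-0000" zone parses as naive; treat it as UTC.
        stamp = stamp.replace(tzinfo=timezone.utc)
    return stamp.astimezone(timezone.utc)


def needs_refresh(remote_time, stored_iso):
    """True when the remote file is newer than the stored timestamp."""
    if remote_time is None or stored_iso is None:
        return True  # nothing to compare against, so fetch to be safe
    stored_time = datetime.fromisoformat(stored_iso)
    if stored_time.tzinfo is None:
        # Assumption: timestamps without an offset are in UTC.
        stored_time = stored_time.replace(tzinfo=timezone.utc)
    return remote_time > stored_time
```

Only the parcels for which needs_refresh returns True would have to be downloaded and reprocessed.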

Data Preparation

Data reduction and simplification

There is very little redundancy between the planned import and what is already in OSM. Only two parcels out of nearly four hundred have been placed in OSM, and those two have significant discrepancies between what is mapped in OSM and what is reported by NYC DEP. (The proponent has reached out to the user who initially added the parcels, but has not yet received a reply.)

Like many boundary data sets, the data in this set are highly variable. Some of them appear to be quite precisely curated lines, with interagency coordination where the parcels adjoin other parcels of government-owned land. Others appear to be simply the GPS tracks of someone walking the parcel boundaries, including the usual amount of GPS noise.

The data are reduced by transforming them in PostGIS (as described under #Data transformation below), restoring a (mostly) consistent topology and reducing the number of nodes by nearly an order of magnitude.

Tagging plans

There are a number of competing proposals for tagging conservation lands. What is proposed here is a hybrid: include leisure=nature_reserve for legacy renderers, and include a fuller set of tags chosen from the list that supports boundary=protected_area.

A few nonstandard keys and tags are worthy of mention, since they're otherwise undocumented.

wildlife_management_unit is ascribed to New York State Department of Environmental Conservation (NYSDEC) rather than NYCDEP because that agency is responsible for wildlife management and administers the hunting regulations.

hunting and fishing already appear with yes and no values elsewhere in the database, even though they are not called out on the Wiki. No other choice for the keys seems entirely natural. trapping is an entirely new key, but nowhere else does the database appear to represent the concept.

foot=hunting and foot=fishing seem to be reasonable choices when access is restricted to those specific purposes.

permit as a value for foot appears in the database. license appears also, about the same number of times. Both are rare uses, but appear appropriate here. private seems to be too strong a term when permission is easily obtained free of charge.

The specific list to be included at present is:

Tag Comments
name=* Name of the unit, as shown in the data table, with the word 'Unit' appended.
NYSDEC:wildlife_management_unit=* Alphanumeric code identifying the wildlife management unit to which the parcel belongs. This information is useful to hunters and anglers, since it determines the seasons, bag limits, and license requirements.
NYCDEP:last_modified=* ISO8601 time string identifying the modification time of the PDF file. Included against the possibility of future automated updating.
foot=yes Indicates that the parcel is open to foot travel by the general public.
foot=permit Indicates that the parcel is open to foot travel by permit holders. The website=* link refers to a site where permit information is obtainable.
foot=hunting, foot=fishing Indicates that the parcel is open to foot travel only for the purpose of the designated activities.
fishing=yes Indicates that the parcel is open to fishing by the general public. (State laws regarding fishing licenses and catch limits must, of course, be observed.)
fishing=permit Indicates that the parcel is open to fishing by permit holders only. Once again, the website=* link refers to a site where permit information is obtainable.
fishing=no Indicates that fishing is forbidden on the parcel.
hunting=yes Indicates that the parcel is open to hunting by the general public. (State laws regarding hunting licenses, seasons, and bag limits must, of course, be observed.)
hunting=archery Indicates that hunting is permitted on the parcel, but firearms are forbidden. Large game may be taken by bow only.
hunting=permit Indicates that the parcel is open to hunting by permit holders only. Once again, the website=* link refers to a site where permit information is obtainable.
hunting=no Indicates that hunting is forbidden on the parcel.
trapping=yes Indicates that the general public is permitted to trap furbearers such as otter, beaver, fisher and marten on the parcel when otherwise in compliance with State and Federal law.
trapping=permit Indicates that the parcel is open to trapping by permit holders only. Once again, the website=* link refers to a site where permit information is obtainable.
trapping=no Indicates that setting of traps for game is forbidden on the parcel.
website:map=* URL from which the PDF map was obtained
website=* http://www.nyc.gov/html/dep/html/recreation/index.shtml
leisure=nature_reserve All of these sites are classed as 'nature reserves'.
boundary=protected_area All of these sites enjoy legal protection for watershed conservation.
protect_class=12 Meaning: resource-protected area for water.
protection_object=water Self-explanatory
protection_title=* "Watershed Recreation Area"
related_law=* "NYCDEP Rules for the Recreational Use of Water Supply Lands and Waters http://www.nyc.gov/html/dep/pdf/recrules/recrules.pdf"
operator=* "New York City Department of Environmental Protection, Bureau of Water Supply"
governance_type=* government_managed
site_ownership=* municipal
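
To make the table concrete, the following Python sketch assembles the tag set for one parcel. It is illustrative only (the real work is done by the Tcl script), and the row field names (unit_name, wmu, last_modified, map_url) are hypothetical stand-ins for whatever the attribute table actually calls them. The fixed values and the permitted values for the access keys are copied from the table above.

```python
# Tags that are identical for every parcel (values from the table above).
FIXED_TAGS = {
    "website": "http://www.nyc.gov/html/dep/html/recreation/index.shtml",
    "leisure": "nature_reserve",
    "boundary": "protected_area",
    "protect_class": "12",
    "protection_object": "water",
    "protection_title": "Watershed Recreation Area",
    "related_law": "NYCDEP Rules for the Recreational Use of Water Supply "
                   "Lands and Waters "
                   "http://www.nyc.gov/html/dep/pdf/recrules/recrules.pdf",
    "operator": "New York City Department of Environmental Protection, "
                "Bureau of Water Supply",
    "governance_type": "government_managed",
    "site_ownership": "municipal",
}

# Values each access key may take, per the table above.
ALLOWED = {
    "foot": {"yes", "permit", "hunting", "fishing"},
    "fishing": {"yes", "permit", "no"},
    "hunting": {"yes", "archery", "permit", "no"},
    "trapping": {"yes", "permit", "no"},
}


def build_tags(row):
    """Build the OSM tag dictionary for one row of the attribute table."""
    tags = dict(FIXED_TAGS)
    tags["name"] = row["unit_name"] + " Unit"
    tags["NYSDEC:wildlife_management_unit"] = row["wmu"]
    tags["NYCDEP:last_modified"] = row["last_modified"]
    tags["website:map"] = row["map_url"]
    for key, allowed in ALLOWED.items():
        value = row[key]
        if value not in allowed:
            raise ValueError(f"unexpected value {value!r} for {key}")
        tags[key] = value
    return tags
```

Rejecting unexpected values at this stage means that a change in the wording of the published list shows up as an error rather than as a silently mistagged parcel.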

Changeset Tags

The current plan is to process the parcels grouped by township (a fairly manageable amount of data) from JOSM. The comment=* and created_by=* tags will include the township name and the dedicated import user ID. The source=* tag will have the URL of the published list of parcels.

Data transformation

Getting from a pile of PDF files, even ones that are already georeferenced, to a coherent set of multipolygons is a bit of an adventure. There is a script written in the Tcl programming language that performs most of the tasks. (All scripts referenced in this proposal may be downloaded from the project repository.)

It is worth discussing the process in some technical detail, since scraping map information from ArcGIS-generated PDF files is somewhat arcane.

The general workflow is:

  1. Retrieve the published list of parcels and all the PDFs to which it links
  2. Construct an attribute table from the published list

Then, for each parcel, one at a time:

  1. The script retrieves the PDF file from the link in the attribute table. (This operation is gated by an HTTP HEAD request that bypasses retrieval if a current copy is already on the local machine.)
  2. The script constructs the tags according to the table in #Tagging Plans
  3. The script uses ogr2ogr to import the parcel outline from the PDF file into a PostGIS database table.
  4. The parcel arrives as a set of multi-linestrings that may have been split in ArcGIS. The hierarchy of this set is flattened to a set of inner and outer rings using the PostGIS functions ST_Collect, ST_CollectionHomogenize, ST_LineMerge and ST_Dump.
  5. Linestrings shorter than four points are degenerate polygons and are deleted from the table. This must be done before the next step, or the next step fails with a PostGIS error.
  6. The rings are promoted to polygons with the PostGIS function ST_MakePolygon.
  7. Some of the polygons, because of noisy GPS data, include self-intersections. The topology of the polygons is normalized by applying the PostGIS function ST_MakeValid. The result is a heterogeneous collection, which may include line segments, points, and empty objects as well as polygons. Only the multipolygons are extracted by applying the PostGIS function ST_CollectionExtract.
  8. The now-valid multipolygons are flattened back to a set of rings using the PostGIS functions ST_Dump and ST_ExteriorRing.
  9. There is now a consistent and connected set of boundaries for the parcel. The ST_Collect and ST_BuildArea functions identify inner and outer rings, and yield a single multipolygon to be imported.
  10. GPS noise is reduced by calling ST_Buffer to shrink the parcel by 2.5 metres all around, and ST_Simplify, also with 2.5 metre precision, to reduce the number of points.
  11. The parcel is added, with all tags, as a single row in a PostGIS table.

Conflation and consistency checking

The proponent has at hand a PostGIS database containing an export of the North America OSM data, updated daily from GeoFabrik. This database provides the basis for an initial quality and conflation check. The parcels are compared with other area features in OSM, after being contracted by 7.5 metres to allow for a certain amount of slop in the digitization. The following query appears to be sufficient to exclude most false positives, and to isolate the parcels from the NYS DEC Lands import. (There is a separate plan to reimport that file, and the collisions will change substantially.)

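-- List overlaps between import parcels (a) and existing OSM polygons (b).
-- The NYS column flags polygons that came from the NYS DEC Lands import.
select (defined(tags, 'NYDEC_Lands:LANDS_UID')
        or defined(tags, 'NYDEC_Lands:FACILITY')) as NYS,
       ST_Area(ST_Intersection(a.wkb_geometry, b.way)) as collision,
       a.name, b.osm_id, b.name
from nyc_bws_rec_area a
join na_osm_polygon b on ST_Overlaps(a.wkb_geometry, b.way)
-- Ignore administrative boundaries, water, wetlands and reservoirs.
where (b.boundary is null or b.boundary <> 'administrative')
and (b.natural is null or b.natural not in ('water', 'wetland'))
and (b.landuse is null or b.landuse not in ('wetland', 'reservoir'))
-- The overlap must survive contracting the parcel by 7.5 metres.
and ST_Overlaps(ST_Buffer(a.wkb_geometry, -7.5), b.way)
-- Non-NYS DEC collisions first, largest overlap first within each group.
order by nys, collision desc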
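
The attribute tests and the ordering in this query can be restated outside the database. The Python sketch below is only a paraphrase for explanation, not part of the actual workflow: it assumes each OSM polygon's tags are available as a dictionary and each result row as an (nys, collision_area, name) tuple, and the function names are hypothetical.

```python
def is_candidate_conflict(tags):
    """Mirror the attribute tests of the WHERE clause for one polygon."""
    if tags.get("boundary") == "administrative":
        return False
    if tags.get("natural") in ("water", "wetland"):
        return False
    if tags.get("landuse") in ("wetland", "reservoir"):
        return False
    return True


def from_nysdec_import(tags):
    """Mirror the NYS column: true if either NYDEC_Lands key has a value."""
    return (tags.get("NYDEC_Lands:LANDS_UID") is not None
            or tags.get("NYDEC_Lands:FACILITY") is not None)


def order_collisions(rows):
    """Sort (nys, collision_area, name) rows as the ORDER BY clause does:
    rows not from the NYS DEC import first, then by descending area."""
    return sorted(rows, key=lambda row: (row[0], -row[1]))
```

The geometric tests (ST_Overlaps before and after the 7.5 metre contraction) are deliberately left out, since they need a geometry library.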

Using the extract of 26 May 2016, this query identifies a total of ninety collisions, with only twenty-three against areas that are not already flagged for reimport. Of these:

  • Two NYC DEP parcels are already in OSM and must be conflated.
  • Two parcels overlap conflicting areas with landuse=forest. (This is an actual inconsistency, since the NYC DEP areas are not managed for timber production.) This inconsistency will most likely be resolved by replacing the offending tag with natural=wood.
  • There is a boundary inconsistency between the Cave Mountain Unit and the Ski Windham resort. This inconsistency will be left in place.
  • There is a boundary inconsistency between the Sackrider Cemetery and the Shaw Road unit. Again, this inconsistency will be left in place. It is possible that a portion of the cemetery has a dual land use and is also watershed protection land.
  • There are parcels of the Kaaterskill and Sundown Wild Forests that have no tags indicating that they participated in the NYS DEC import and are in conflict. The reimport of the NYS DEC data will have to address these. They are ignored for the present.
  • Three or four parcels contain building polygons. These are regarded as false positives.

Separately, a similar query was used to compare the areas with the latest version of the NYS DEC Lands shapefile. In the newer shapefile, the number of collisions is reduced from 68 to 25. Only ten of these exceed one hectare in extent. In all of these ten, the NYC DEP map is more plausible than the NYS DEC one, because its lines adhere quite closely to the lines of the eighteenth-century land allocations. (Cadastral research in this part of the world is greatly facilitated by a 1970-vintage map of the lots in the Catskill Park.) The plan is to ignore these inconsistencies in the import; the interagency coordination work that has been in progress for the last few years is likely to repair them.

Export to OSM format and quality control

With all this preliminary work done, the ogr2osm tool exports the parcels to OSM format. A final quality control check consists of loading them by themselves into JOSM and running the validator.

There is the expected cascade of warnings of duplicated nodes, since none of the steps above removes duplicates. After JOSM fixes the duplicates automatically, a second validation is run.
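
To illustrate what removing duplicated nodes amounts to, here is a small Python sketch. It is not JOSM's algorithm, merely an assumption-laden illustration: nodes are treated as duplicates only when their coordinates are exactly equal, the first node seen at a location is the one kept, and the data structures and function name are hypothetical.

```python
def merge_duplicate_nodes(nodes, ways):
    """Merge nodes that share exactly the same coordinates.

    nodes: {node_id: (lat, lon)}
    ways:  {way_id: [node_id, ...]}
    Returns (kept_nodes, rewritten_ways).
    """
    canonical = {}  # coordinate -> id of the first node seen there
    remap = {}      # every node id -> id of the node that survives
    for node_id, coord in nodes.items():
        remap[node_id] = canonical.setdefault(coord, node_id)

    kept_nodes = {nid: nodes[nid] for nid in canonical.values()}

    rewritten_ways = {}
    for way_id, refs in ways.items():
        new_refs = []
        for ref in refs:
            survivor = remap[ref]
            # Skip immediate repeats created by the merge.
            if not new_refs or new_refs[-1] != survivor:
                new_refs.append(survivor)
        rewritten_ways[way_id] = new_refs
    return kept_nodes, rewritten_ways
```

After the merge, two rings that previously ended at separate but coincident nodes reference one shared node, which is what the validator expects.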

This final check results in only three warnings:

  • There is a boundary inconsistency between the north and south Warner Creek Units. Inspecting the boundary manually, it appears that the intent is that it should follow the nearby stream, but neither linestring actually does so and both have large jumps, as if from a handheld GPS unit temporarily losing satellite coverage. Having neither field data nor an independent reference, the plan is simply to import the inconsistent data as it stands.
  • A couple of parcels contain multipolygons consisting of two outer rings sharing a single node. JOSM doesn't like this, and the proponent has not found an easy way to work around the issue. As they are constructed, they render acceptably in Mapnik, QGIS and GRASS, so this issue is likely a case of JOSM being overly fussy. If anyone can suggest an easy fix, these two parcels will be repaired manually.

The resulting OSM XML file may be inspected at https://kbk.is-a-geek.net/nycdep-import/nyc_bws_rec_area.osm.

Data Merge Workflow

Team Approach

With only a few hundred multipolygons to import, there's no real need to recruit a large team. The import will take place under a single dedicated account ke9tv-NYCDEP-import.

Workflow

  1. One township at a time, use ogr2osm to make an OSM file of just the parcels that lie within the township's borders. This keeps each import to a manageable size.
  2. Load the OSM file into JOSM, download the data for its bounding box from the OSM server, revalidate, and upload with the appropriate changeset tags.
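
The batching itself is simple enough to sketch in a few lines of Python. This is an illustration rather than the procedure actually used: it assumes each parcel record carries a hypothetical township field, it takes the published list of recreation areas as the source URL, and the wording of the changeset comment is invented for the example.

```python
from collections import defaultdict

# Account name and list URL are taken from this page; the comment
# wording built below is an assumption.
IMPORT_ACCOUNT = "ke9tv-NYCDEP-import"
SOURCE_URL = "http://www.nyc.gov/html/dep/pdf/recreation/open_rec_areas.pdf"


def group_by_township(parcels):
    """Return {township: [parcel, ...]}, townships in alphabetical order."""
    batches = defaultdict(list)
    for parcel in parcels:
        batches[parcel["township"]].append(parcel)
    return dict(sorted(batches.items()))


def changeset_tags(township):
    """Build the changeset tags for one township's upload."""
    return {
        "comment": "NYCDEP Watershed Recreation Areas import, "
                   f"{township} ({IMPORT_ACCOUNT})",
        "source": SOURCE_URL,
    }
```

Each batch then corresponds to one OSM file and one changeset, which keeps a possible revert confined to a single township.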

After the first township is uploaded, the process will pause for a few days to solicit feedback on the mailing lists. If no major problems are detected, the rest of the data should be uploaded in rapid succession, since very little manual work is required to process them.

Conflation

The objects that require conflation are already identified, and are few enough in number that they can be readily handled manually.

Reversion plan

Since the import consists solely of adding a set of tagged multipolygons, reversion should be equally simple. Using the JOSM Reverter plugin is expected to be adequate in the unlikely case that a revert is required.

Quality Assurance

It is hoped that the multiple checks:

  • community review of this proposal and the associated OSM file before any import begins
  • an automated check of all the lands for collisions with other land uses
  • an automated check of data validity using the JOSM validator
  • community review of the first changeset before the bulk import

will provide enough assurance that the import will be an improvement over what is already there. The proponent is willing to impose other reasonable controls, should readers suggest them.