Import: NYCDEP Watershed Recreation Areas

From OpenStreetMap Wiki

About

An import of watershed protection lands owned by the city of New York, New York that are open to public recreational use.

Current status

June 11, 2016: The initial import is complete. The importer will continue to monitor the New York City website periodically for updated data.

March 19, 2017: An update conducted today brought in 10 new areas, redrew 25, and revised tagging for changed access restrictions on 6.

July 1, 2018: Another update added a handful of new areas, and redrew/retagged several dozen.

June 16, 2019: Another update brought in a few dozen areas. Some areas have problems with the remote data source. The following exceptions were noted:

Name | Notes | Action
Airport Road | Map is GeoPDF but has topologic inconsistencies | Extracted boundary from GeoPDF and imported manually
Ashokan Brook | Listed in the table of recreation areas, but the map is 404 | Not in OSM
Ashokan North | Map is downloadable, but is not a GeoPDF | Retained out-of-date copy
Forest Road | Listed in the table of recreation areas, but the map is 404 | Not in OSM
Hubbell Hill | Listed in the table of recreation areas, but the map is 404 | Not in OSM
Shavertown Bridge | Map is GeoPDF but has topologic inconsistencies | Extracted boundary from GeoPDF and updated manually
Watson Hollow | Map is GeoPDF but has topologic inconsistencies | Extracted boundary from GeoPDF and updated manually

In addition, several 'Day Use Area' and 'Fishing Access Area' units are included in the list of recreation areas, but have 'No' listed under every permitted use. It is suspected that these areas do indeed provide public access under a different regulatory regime from the areas already imported, but no attempt has yet been made to import them.

May 10, 2020: Another update added another few dozen areas. The reimport had no trouble with the Airport Road, Ashokan Brook, Forest Road, Hubbell Hill Hollow, Shavertown Bridge or Watson Hollow units. The PDF for the Ashokan Brook unit is once again a GeoPDF, and neither its boundary nor its tagging is different from what is already in OSM. Many areas either had land added or had their geometry redrawn. Over two hundred areas had their URLs updated from http to https. The following exceptions are noted:

Name | Notes | Action
Bushkill | Map is GeoPDF but has topologic inconsistencies | Inspected manually. Once topology is corrected, no significant difference from the copy already in OSM. Existing copy was retained.
Crescent Valley | Listed in the table of recreation areas, but the map is 404 | Existing copy in OSM was retained.
East Neversink | Map is GeoPDF but has topologic inconsistencies | Inspected manually. Neither tagging nor geometry are significantly different from the copy already in OSM. Existing copy was retained.
Hickory Nut Hill | Map is GeoPDF but has topologic inconsistencies | Inspected manually. Neither tagging nor geometry are significantly different from the copy already in OSM. Existing copy was retained.

March 27, 2021: The tagging was revised: protect_class=12 was replaced with protect_class=6 and landuse=reservoir_watershed on all areas brought in by this import.

January 27, 2023: All updates since the last round were applied. The following areas had PDF maps from which geometry could not be extracted: Bushkill, Croton Falls Outlet, East Neversink, Manorkill, Mount Royal, Old Road, Sands Creek, and Speedwell Mountain. For all of them, the existing geometry in OSM was retained. Visual comparison of the NYCDEP maps with the boundaries in OSM did not reveal any glaring inconsistencies. Tagging was updated to reflect any changed access constraints and similar attributes.

Goals

New York City Department of Environmental Protection (NYC DEP) owns about four hundred parcels of land in the Catskill Mountains and the Croton River valley that are maintained as conservation lands for the protection of its watershed and the quality of its drinking water. Most of these lands are open to public recreation, either to all comers or with a permit that is available free of charge from NYC DEP.

Geospatial data for these parcels, together with a table of permissible uses, can be reconstructed from the data available in the file [http://www.nyc.gov/html/dep/pdf/recreation/open_rec_areas.pdf Recreation Areas and Use Designations by County]. This project proposes to import the boundaries of these areas into OSM so that maps intended for hunters, anglers, hikers, and environmentalists can display them.

At least one reviewer has raised the concern that everything in OSM should be, at least in principle, observable with boots on the ground. I can assure reviewers that these parcels are. Along the roads and trails, they're typically posted with small signs like the one pictured in [1]. In the backcountry, they may still be signposted, but are often marked with witness trees, cairns, and survey pins, the same way as any other survey line. They are certainly recoverable, and users entering and leaving the areas by established routes are generally aware that they're doing so.

Schedule

May 27, 2016: OSM file for the entire data set constructed and validated. Formal proposal written and posted to the Wiki, and feedback solicited on imports, imports-us and talk-us.

June 4, 2016: Assuming that one round of discussion is sufficient to implement all needed changes, upload the first set of parcels and solicit review a second time.

June 11, 2016: Once again, if no major objections are raised, complete the upload of the remaining parcels.

When updated: New York City has only fairly recently released the data in a georeferenced form. I understand that the agency's plan is that the data will be periodically updated as existing sites are resurveyed and new sites are purchased and posted. Until and unless such updates begin appearing, it will be next to impossible to develop any semi-automated workflow for keeping OSM in sync with the government's files. The proponent does plan to do so as the opportunity arises.

Import data

Background

Data source site

The attribute table that describes the parcels is obtained by "screen scraping" the published list that appears on NYC DEP's web site. This list embeds links to individual maps of each parcel of public watershed land. These maps are all georeferenced PDF files. They are obtained en masse by a script that downloads from each PDF link that is embedded in the attribute table.
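The link-harvesting step described above might be sketched in Python as follows. This is an illustration only, not the proponent's Tcl script: it assumes the published list is available as HTML, and the sample markup, the unit names, and the example.org base URL are hypothetical stand-ins.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkCollector(HTMLParser):
    """Collect the targets of <a href="..."> links that point at PDF files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".pdf"):
            # Resolve relative links against the page that contained them.
            self.pdf_urls.append(urljoin(self.base_url, href))


def extract_pdf_links(html_text, base_url):
    """Return every PDF link found in html_text, in document order."""
    collector = PdfLinkCollector(base_url)
    collector.feed(html_text)
    return collector.pdf_urls


# Hypothetical fragment standing in for the published list of parcels.
SAMPLE_HTML = """
<table>
  <tr><td>Example Unit A</td><td><a href="maps/example_a.pdf">Map</a></td></tr>
  <tr><td>Example Unit B</td><td><a href="/maps/example_b.PDF">Map</a></td></tr>
  <tr><td>Rules</td><td><a href="rules.html">Rules</a></td></tr>
</table>
"""

links = extract_pdf_links(SAMPLE_HTML, "https://example.org/recreation/list.html")
for link in links:
    print(link)
```

Each collected URL would then be handed to whatever download step is in use. Links that do not end in .pdf are ignored.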

Data license

Per Local Law 11 of 2012, all NYC government data is to be provided "without any registration requirement, license requirement or restrictions on their use" (23-502 d), effectively putting it in the public domain.

OSM Data Files

The entirety of the import totals about three megabytes when converted to .OSM format. While the import is in progress, it may be inspected at https://kbk.is-a-geek.net/nycdep-import/nyc_bws_rec_area.osm.

Import type

This import is planned as a one-time automated import at present. The data are static enough that manual reimport may be satisfactory; nevertheless, when a new version emerges, the proponent plans to review the feasibility of automated synchronization.

Data Preparation

Data reduction and simplification

There is very little redundancy between the planned import and what is already in OSM. Only two parcels out of nearly four hundred have been placed in OSM, and those two have significant discrepancies between what is mapped in OSM and what is reported by NYC DEP. (The proponent has reached out to the user who initially added the parcels, but has not yet received a reply.)

Like many boundary data sets, the data in this set are highly variable. Some of them appear to be quite precisely curated lines, with interagency coordination where the parcels adjoin other parcels of government-owned land. Others appear to be simply the GPS tracks of someone walking the parcel boundaries, including the usual amount of GPS noise.

The data are reduced by transforming them in PostGIS (as described under #Data transformation below), restoring a (mostly) consistent topology and reducing the number of nodes by nearly an order of magnitude.

Tagging plans

There are a number of competing proposals for tagging conservation lands. What is proposed here is a hybrid: include leisure=nature_reserve for legacy renderers, and include a fuller set of tags chosen from the list that support boundary=protected_area.

A few nonstandard keys and tags are worthy of mention, since they're otherwise undocumented.

wildlife_management_unit is ascribed to New York State Department of Environmental Conservation (NYSDEC) rather than NYCDEP because that agency is responsible for wildlife management and administers the hunting regulations.

hunting and fishing already appear with yes and no values elsewhere in the database, even though they are not called out on the Wiki. No other choice for the keys seems entirely natural. trapping is an entirely new key, but nowhere else does the database appear to represent the concept.

foot=hunting and foot=fishing seem to be reasonable choices when access is restricted to those specific purposes.

permit as a value for foot appears in the database. license appears also, about the same number of times. Both are rare uses, but appear appropriate here. private seems to be too strong a term when permission is easily obtained free of charge.

The specific list to be included at present is:

Tag | Comments
name=* | Name of the unit, as shown in the data table, with the word 'Unit' appended.
NYSDEC:wildlife_management_unit=* | Alphanumeric code identifying the wildlife management unit to which the parcel belongs. This information is useful to hunters and anglers, since it determines the seasons, bag limits, and license requirements.
NYCDEP:last_modified=* | ISO8601 time string identifying the modification time of the PDF file. Included against the possibility of future automated updating.
foot=yes | Indicates that the parcel is open to foot travel by the general public.
foot=permit | Indicates that the parcel is open to foot travel by permit holders. The website=* link refers to a site where permit information is obtainable.
foot=hunting, foot=fishing | Indicates that the parcel is open to foot travel only for the purpose of the designated activities.
fishing=yes | Indicates that the parcel is open to fishing by the general public. (State laws regarding fishing licenses and catch limits must, of course, be observed.)
fishing=permit | Indicates that the parcel is open to fishing by permit holders only. Once again, the website=* link refers to a site where permit information is obtainable.
fishing=no | Indicates that fishing is forbidden on the parcel.
hunting=yes | Indicates that the parcel is open to hunting by the general public. (State laws regarding hunting licenses, seasons, and bag limits must, of course, be observed.)
hunting=archery | Indicates that hunting is permitted on the parcel, but firearms are forbidden. Large game may be taken by bow only.
hunting=permit | Indicates that the parcel is open to hunting by permit holders only. Once again, the website=* link refers to a site where permit information is obtainable.
hunting=no | Indicates that hunting is forbidden on the parcel.
trapping=yes | Indicates that the general public is permitted to trap furbearers such as otter, beaver, fisher and marten on the parcel when otherwise in compliance with State and Federal law.
trapping=permit | Indicates that the parcel is open to trapping by permit holders only. Once again, the website=* link refers to a site where permit information is obtainable.
trapping=no | Indicates that setting of traps for game is forbidden on the parcel.
website:map=* | URL from which the PDF map was obtained.
website=* | http://www.nyc.gov/html/dep/html/recreation/index.shtml
landuse=reservoir_watershed | Indicates that the land is used for watershed protection.
leisure=nature_reserve | All of these sites are classed as nature reserves.
boundary=protected_area | All of these sites enjoy legal protection for watershed conservation.
protect_class=6 (formerly protect_class=12) | Meaning: resource-protected area for water.
protection_object=water | Self-explanatory.
protection_title=* | "Watershed Recreation Area"
related_law=* | "NYCDEP Rules for the Recreational Use of Water Supply Lands and Waters http://www.nyc.gov/html/dep/pdf/recrules/recrules.pdf"
operator=* | "New York City Department of Environmental Protection, Bureau of Water Supply"
governance_type=* | government_managed
site_ownership=* | municipal
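As an illustration of how the tag list could be applied to one parcel, here is a minimal Python sketch. It is not the proponent's Tcl script. The input row layout (keys such as wmu and map_url), the sample values, and the assumption that use designations have already been normalized to the OSM values listed in the table are all hypothetical.

```python
# Tags shared by every parcel, taken from the tag list in this section.
COMMON_TAGS = {
    "website": "http://www.nyc.gov/html/dep/html/recreation/index.shtml",
    "landuse": "reservoir_watershed",
    "leisure": "nature_reserve",
    "boundary": "protected_area",
    "protect_class": "6",
    "protection_object": "water",
    "protection_title": "Watershed Recreation Area",
    "related_law": ("NYCDEP Rules for the Recreational Use of Water Supply "
                    "Lands and Waters "
                    "http://www.nyc.gov/html/dep/pdf/recrules/recrules.pdf"),
    "operator": ("New York City Department of Environmental Protection, "
                 "Bureau of Water Supply"),
    "governance_type": "government_managed",
    "site_ownership": "municipal",
}

# Values that the tag list documents for each use-related key.
ALLOWED_VALUES = {
    "foot": {"yes", "permit", "hunting", "fishing"},
    "fishing": {"yes", "permit", "no"},
    "hunting": {"yes", "archery", "permit", "no"},
    "trapping": {"yes", "permit", "no"},
}


def build_tags(row):
    """Build the OSM tag dictionary for one parcel.

    `row` is a hypothetical dict holding the per-parcel attributes.
    """
    tags = dict(COMMON_TAGS)
    tags["name"] = row["name"] + " Unit"
    tags["NYSDEC:wildlife_management_unit"] = row["wmu"]
    tags["NYCDEP:last_modified"] = row["last_modified"]
    tags["website:map"] = row["map_url"]
    for key, allowed in ALLOWED_VALUES.items():
        value = row[key]
        if value not in allowed:
            raise ValueError(f"unexpected value {value!r} for {key}")
        tags[key] = value
    return tags


# Hypothetical parcel; none of these values come from the NYC DEP data.
SAMPLE_ROW = {
    "name": "Example Hollow",
    "wmu": "3A",
    "last_modified": "2016-05-01T00:00:00+00:00",
    "map_url": "https://example.org/maps/example_hollow.pdf",
    "foot": "permit",
    "fishing": "permit",
    "hunting": "archery",
    "trapping": "no",
}

tags = build_tags(SAMPLE_ROW)
for key in sorted(tags):
    print(f"{key}={tags[key]}")
```

Rejecting undocumented values up front keeps a change in the published table from silently producing tags that the list above does not describe.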

Changeset Tags

The current plan is to process the parcels grouped by township (a fairly manageable amount of data) from JOSM. The comment=* and created_by=* tags will include the township name and the dedicated import user ID. The source=* tag will have the URL of the published list of parcels.

Data transformation

Getting from a pile of PDF files, even ones that are already georeferenced, to a coherent set of multipolygons is a bit of an adventure. There is a script written in the Tcl programming language that performs most of the tasks. (All scripts referenced in this proposal may be downloaded from the project repository.)

It is worth discussing the process in some technical detail, since scraping map information from ArcGIS-generated PDF files is somewhat arcane.

The general workflow is:

  1. Retrieve the published list of parcels and all the PDFs to which it links
  2. Construct an attribute table from the published list

Then, for each parcel, one at a time:

  1. The script retrieves the PDF file from the link in the attribute table. (This operation is gated by an HTTP HEAD request that bypasses retrieval if a current copy is already on the local machine.)
  2. The script constructs the tags according to the table in #Tagging plans.
  3. The script uses ogr2ogr to import the parcel outline from the PDF file into a PostGIS database table.
  4. The parcel arrives as a set of multi-linestrings that may have been split in ArcGIS. The hierarchy of this set is flattened to a set of inner and outer rings using the PostGIS functions ST_Collect, ST_CollectionHomogenize, ST_LineMerge and ST_Dump.
  5. Linestrings shorter than four points are degenerate polygons and are deleted from the table. This must be done before the next step, or the next step fails with a PostGIS error.
  6. The rings are promoted to polygons with the PostGIS function ST_MakePolygon.
  7. Some of the polygons, because of noisy GPS data, include self-intersections. The topology of the polygons is normalized by applying the PostGIS function ST_MakeValid. The result is a heterogeneous collection, which may include line segments, points, and empty objects as well as polygons. Only the multipolygons are extracted by applying the PostGIS function ST_CollectionExtract.
  8. The now-valid multipolygons are flattened back to a set of rings using the PostGIS functions ST_Dump and ST_ExteriorRing.
  9. There is now a consistent and connected set of boundaries for the parcel. The ST_Collect and ST_BuildArea functions identify inner and outer rings, and yield a single multipolygon to be imported.
  10. GPS noise is reduced by calling ST_Buffer to shrink the parcel by 2.5 metres all around, and ST_Simplify, also with 2.5 metre precision, to reduce the number of points.
  11. The parcel is added, with all tags, as a single row in a PostGIS table.
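The freshness check that gates retrieval in step 1 can be sketched in Python. The page does not say how the script decides that the local copy is current, so this sketch assumes a comparison between the HTTP Last-Modified header and the local file's modification time. The function names are hypothetical, and no network access is performed here.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def _parse_http_date(header_value):
    """Parse an HTTP date header into a timezone-aware UTC datetime."""
    parsed = parsedate_to_datetime(header_value)
    if parsed.tzinfo is None:
        # Treat a zone-less date as UTC so comparisons are well defined.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def needs_download(last_modified_header, local_mtime):
    """Return True when the remote PDF should be fetched.

    last_modified_header: Last-Modified value from a HEAD response, or None.
    local_mtime: POSIX timestamp of the cached copy, or None if there is none.
    """
    if local_mtime is None or last_modified_header is None:
        return True
    remote = _parse_http_date(last_modified_header)
    local = datetime.fromtimestamp(local_mtime, tz=timezone.utc)
    return remote > local


def last_modified_tag(last_modified_header):
    """Format the header as an ISO 8601 string, e.g. for NYCDEP:last_modified."""
    return _parse_http_date(last_modified_header).isoformat()


header = "Wed, 01 Jun 2016 12:00:00 GMT"
older_copy = datetime(2016, 5, 1, tzinfo=timezone.utc).timestamp()
newer_copy = datetime(2016, 7, 1, tzinfo=timezone.utc).timestamp()

print(needs_download(header, older_copy))
print(needs_download(header, newer_copy))
print(last_modified_tag(header))
```

In practice the header value would come from a HEAD request, for example one built with urllib.request.Request(url, method="HEAD"). A missing header or a missing local copy falls back to downloading.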

Conflation and consistency checking

The proponent has at hand a PostGIS database containing an export of the North America OSM data, updated daily from GeoFabrik. This database provides the basis for an initial quality and conflation check. The parcels are compared with other area features in OSM, after being contracted by 7.5 metres to allow for a certain amount of slop in the digitization. The following query appears to be sufficient to exclude most false positives, and to isolate the parcels from the NYS DEC Lands import. (There is a separate plan to reimport that file, and the collisions will change substantially.)

-- NYS is true when the OSM polygon carries tags from the NYS DEC Lands import
select (defined(tags, 'NYDEC_Lands:LANDS_UID')
        or defined(tags, 'NYDEC_Lands:FACILITY')) as NYS,
       -- area of the overlap, used to rank the collisions
       ST_Area(ST_Intersection(a.wkb_geometry, b.way)) as collision,
       a.name, b.osm_id, b.name
from nyc_bws_rec_area a
join na_osm_polygon b on ST_Overlaps(a.wkb_geometry, b.way)
-- ignore administrative boundaries, water, wetlands and reservoirs
where (b.boundary is null or b.boundary <> 'administrative')
and (b.natural is null or b.natural not in ('water', 'wetland'))
and (b.landuse is null or b.landuse not in ('wetland', 'reservoir'))
-- the overlap must survive contracting the parcel by 7.5 metres
and ST_Overlaps(ST_Buffer(a.wkb_geometry, -7.5), b.way)
order by nys, collision desc

Using the extract of 26 May 2016, this query identifies a total of ninety collisions, with only twenty-three against areas that are not already flagged for reimport. Of these:

  • Two NYC DEP parcels are already in OSM and must be conflated.
  • Two parcels overlap conflicting areas with landuse=forest. (This is an actual inconsistency, since the NYC DEP areas are not managed for timber production.) This inconsistency will most likely be resolved by replacing the offending tag with natural=wood.
  • There is a boundary inconsistency between the Cave Mountain Unit and the Ski Windham resort. This inconsistency will be left in place.
  • There is a boundary inconsistency between the Sackrider Cemetery and the Shaw Road unit. Again, this inconsistency will be left in place. It is possible that a portion of the cemetery has a dual land use and is also watershed protection land.
  • There are parcels of the Kaaterskill and Sundown Wild Forests that have no tags indicating that they participated in the NYS DEC import and are in conflict. The reimport of the NYS DEC data will have to address these. They are ignored for the present.
  • Three or four parcels contain building polygons. These are regarded as false positives.

Separately, a similar query was used to compare the areas with the latest version of the NYS DEC Lands shapefile. In the newer shapefile, the number of collisions is reduced from 68 to 25. Only ten of these exceed one hectare in extent. In all of these ten, the NYC DEP map is more plausible than the NYS DEC one, because its lines adhere quite closely to the lines of the eighteenth-century land allocations. (Cadastral research in this part of the world is greatly facilitated by a 1970-vintage map of the lots in the Catskill Park.) The plan is to ignore these inconsistencies in the import; the interagency coordination work that has been in progress for the last few years is likely to repair them.

Export to OSM format and quality control

With all this preliminary work done, the ogr2osm tool exports the parcels to OSM format. A final quality control check consists of loading them by themselves into JOSM and running the validator.

There is the expected cascade of warnings of duplicated nodes, since none of the steps above removes duplicates. After JOSM fixes the duplicates automatically, a second validation is run.
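To make concrete what removing duplicated nodes involves, here is a minimal Python sketch. It is not JOSM's implementation. It assumes that duplicates are nodes with exactly equal coordinates, and that nodes and ways are held in plain dictionaries, a hypothetical in-memory layout with the negative IDs typical of not-yet-uploaded data.

```python
def merge_duplicate_nodes(nodes, ways):
    """Merge nodes that share identical coordinates.

    nodes: dict mapping node id -> (lat, lon)
    ways:  dict mapping way id -> list of node ids
    Returns (merged_nodes, rewritten_ways).
    """
    survivor_at = {}   # coordinate -> id of the node that is kept
    replacement = {}   # every node id -> id of the node that replaces it
    for node_id in sorted(nodes):
        coord = nodes[node_id]
        # The first node seen at a coordinate survives; later ones map to it.
        replacement[node_id] = survivor_at.setdefault(coord, node_id)

    merged_nodes = {nid: nodes[nid] for nid in survivor_at.values()}
    rewritten_ways = {
        way_id: [replacement[ref] for ref in refs]
        for way_id, refs in ways.items()
    }
    return merged_nodes, rewritten_ways


# Two adjoining rings that each carry their own copy of a shared corner.
nodes = {
    -1: (42.0, -74.0),
    -2: (42.0, -74.1),
    -3: (42.0, -74.0),   # same position as node -1
    -4: (42.1, -74.0),
}
ways = {
    -10: [-1, -2, -4, -1],
    -11: [-3, -4, -2, -3],
}

merged_nodes, rewritten_ways = merge_duplicate_nodes(nodes, ways)
print(sorted(merged_nodes))
print(rewritten_ways)
```

After the merge, both rings reference the same node at the shared corner, which is the condition the validator checks for.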

This final check results in only three warnings:

  • There is a boundary inconsistency between the north and south Warner Creek Units. Inspecting the boundary manually, it appears that the intent is that it should follow the nearby stream, but neither linestring actually does so and both have large jumps, as if from a handheld GPS unit temporarily losing satellite coverage. Having neither field data nor an independent reference, the plan is simply to import the inconsistent data as it stands.
  • A couple of parcels contain multipolygons consisting of two outer rings sharing a single node. JOSM doesn't like this, and the proponent has not found an easy way to work around the issue. As they are constructed, they render acceptably in Mapnik, QGIS and GRASS, so this issue is likely a case of JOSM being overly fussy. If anyone has an easy fix, these two parcels will be repaired manually.

The resulting OSM XML file may be inspected at https://kbk.is-a-geek.net/nycdep-import/nyc_bws_rec_area.osm.

Data Merge Workflow

Team Approach

With only a few hundred multipolygons to import, there's no real need to recruit a large team. The import will take place under a single dedicated account ke9tv-NYCDEP-import.

Workflow

  1. One township at a time, use ogr2osm to make an OSM file of just the parcels that lie within the township's borders. This keeps each import to a manageable size.
  2. Load the OSM file into JOSM, download the data for its bounding box from the OSM server, revalidate, and upload with the appropriate changeset tags.
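The per-township batching in step 1 could be organized along these lines. This Python sketch assumes that each parcel record already carries a township attribute, for instance from a prior spatial join. The record layout and all names are hypothetical, and the actual export is done with ogr2osm as described above.

```python
from collections import defaultdict


def batch_by_township(parcels):
    """Group parcel names by township, in a stable alphabetical order.

    parcels: iterable of dicts with hypothetical keys 'name' and 'township'.
    Returns a dict mapping township -> sorted list of parcel names.
    """
    batches = defaultdict(list)
    for parcel in parcels:
        batches[parcel["township"]].append(parcel["name"])
    return {town: sorted(names) for town, names in sorted(batches.items())}


# Hypothetical records; real parcels would come from the PostGIS table.
parcels = [
    {"name": "Example Unit C", "township": "Township B"},
    {"name": "Example Unit A", "township": "Township A"},
    {"name": "Example Unit B", "township": "Township A"},
]

batches = batch_by_township(parcels)
for township, names in batches.items():
    print(township, names)
```

Each resulting batch corresponds to one OSM file and one changeset, which keeps every upload small enough to review by hand.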

After the first township is uploaded, the process will pause for a few days to solicit feedback on the mailing lists. If no major problems are detected, the rest of the data should be uploaded in rapid succession, since very little manual work is required to process them.

Conflation

The objects that require conflation are already identified, and are few enough in number that they can be readily handled manually.

Reversion plan

Since the import consists solely of adding a set of tagged multipolygons, reversion should be equally simple. The JOSM Reverter plugin is expected to be adequate in the unlikely case that a revert is required.

Quality Assurance

It is hoped that the multiple checks:

  • community review of this proposal and the associated OSM file before imports begin
  • automated checking of all the lands for collisions with other land uses
  • automated checking of data validity using the JOSM validator
  • community review of the first changeset before importing in bulk

will provide enough assurance that the import will be an improvement over what is already there. The proponent is willing to impose other reasonable controls, should readers suggest them.