Import: NYCDEP Watershed Recreation Areas
- 1 About
- 2 Current status
- 3 Goals
- 4 Schedule
- 5 Import data
- 6 Data Preparation
- 7 Data Merge Workflow
- 8 Quality Assurance
An import of watershed protection lands owned by the city of New York, New York that are open to public recreational use.
June 11, 2016: The initial import is complete. The importer will continue to monitor the New York City website periodically for updated data.
New York City Department of Environmental Protection (NYC DEP) owns about four hundred parcels of land in the Catskill Mountains and the Croton River valley that are maintained as conservation lands for the protection of its watershed and the quality of its drinking water. Most of these lands are open to public recreation, either to all comers or with a permit that is available free of charge from NYC DEP.
Geospatial data for these parcels, together with a table of permissible uses, can be reconstructed from the data available in the file, [http://www.nyc.gov/html/dep/pdf/recreation/open_rec_areas.pdf Recreation Areas and Use Designations by County]. This project proposes to import the boundaries of these areas into OSM so that maps intended for hunters, anglers, hikers, and environmentalists can display them.
At least one reviewer has raised the concern that everything in OSM should be, at least in principle, observable with boots on the ground. I can assure reviewers that these parcels are. Along the roads and trails, they're typically posted with small signs. In the backcountry, they may still be signposted, but are often marked with witness trees, cairns, and survey pins, the same way as any other survey line. They are certainly recoverable, and users entering and leaving the areas by established routes are generally aware that they're doing so.
May 27, 2016: OSM file for the entire data set constructed and validated. Formal proposal written and posted to the Wiki, and feedback solicited on imports, imports-us and talk-us.
June 4, 2016: Assuming that one round of discussion is sufficient to implement all needed changes, upload the first set of parcels and solicit review a second time.
June 11, 2016: Once again, if no major objections are raised, complete the upload of the remaining parcels.
When updated: New York City has only fairly recently released the data in a georeferenced form. I understand that the agency's plan is that the data will be periodically updated as existing sites are resurveyed and new sites are purchased and posted. Until and unless such updates begin appearing, it will be next to impossible to develop any semi-automated workflow for keeping OSM in sync with the government's files. The proponent does plan to do so as the opportunity arises.
Data source site
The attribute table that describes the parcels is obtained by "screen scraping" the published list that appears on NYC DEP's web site. This list embeds links to individual maps of each parcel of public watershed land. These maps are all georeferenced PDF files. They are obtained en masse by a script that downloads from each PDF link that is embedded in the attribute table.
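The link-harvesting step can be illustrated with a minimal sketch. It assumes the published list is served as an HTML page whose rows contain ordinary anchor links to the parcel PDFs; the sample markup, the example.org URL and the function names are hypothetical, and the proponent's actual script is written in Tcl, not Python.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href="...pdf"> link on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".pdf"):
            # Resolve relative links against the page they were found on.
            self.links.append(urljoin(self.base_url, href))


def pdf_links(html_text, base_url):
    """Return all PDF links found in html_text, in document order."""
    collector = PdfLinkCollector(base_url)
    collector.feed(html_text)
    return collector.links


# Hypothetical page fragment, for illustration only.
SAMPLE = """
<table>
  <tr><td><a href="maps/example_unit.pdf">Example</a></td><td>Hike, Fish</td></tr>
  <tr><td><a href="/about.html">About</a></td></tr>
</table>
"""

print(pdf_links(SAMPLE, "http://example.org/recreation/list.html"))
```

Each URL returned this way would then be fetched and passed to the per-parcel processing described later in this proposal.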
Per Local Law 11 of 2012, all NYC government data is to be provided "without any registration requirement, license requirement or restrictions on their use" (23-502 d), effectively putting it in the public domain.
- Local Law 11 of 2012 establishing the public status of NYC data
- NYC Open Data Policy and Technical Standards Manual
OSM Data Files
The entirety of the import totals about three megabytes when converted to .OSM format. While the import is in progress, it may be inspected at https://kbk.is-a-geek.net/nycdep-import/nyc_bws_rec_area.osm.
This import is planned as a one-time automated import at present. The data are static enough that manual reimport may be satisfactory; nevertheless, when a new version emerges, the proponent plans to review the feasibility of automated synchronization.
Data reduction and simplification
There is very little redundancy between the planned import and what is already in OSM. Only two parcels out of nearly four hundred have been placed in OSM, and those two have significant discrepancies between what is mapped in OSM and what is reported by NYC DEP. (The proponent has reached out to the user who initially added the parcels, but has not yet received a reply.)
Like many boundary data sets, the data in this set are highly variable. Some of them appear to be quite precisely curated lines, with interagency coordination where the parcels adjoin other parcels of government-owned land. Others appear to be simply the GPS tracks of someone walking the parcel boundaries, including the usual amount of GPS noise.
The data are reduced by transforming them in PostGIS (as described under #Data transformation below), restoring a (mostly) consistent topology and reducing the number of nodes by nearly an order of magnitude.
There are a number of competing proposals for tagging conservation lands. What is proposed here is a hybrid: include leisure=nature_reserve for legacy renderers, and include a fuller set of tags chosen from the list that support boundary=protected_area.
A few nonstandard keys and tags are worthy of mention, since they're otherwise undocumented.
wildlife_management_unit is ascribed to New York State Department of Environmental Conservation (NYSDEC) rather than NYCDEP because that agency is responsible for wildlife management and administers the hunting regulations.
hunting and fishing already appear with yes and no values elsewhere in the database, even though they are not called out on the Wiki. No other choice for the keys seems entirely natural. trapping is an entirely new key, but nowhere else does the database appear to represent the concept.
foot=hunting and foot=fishing seem to be reasonable choices when access is restricted to those specific purposes.
permit as a value for foot appears in the database. license appears also, about the same number of times. Both are rare uses, but appear appropriate here. private seems to be too strong a term when permission is easily obtained free of charge.
The specific list to be included at present is:
|name=*||Name of the unit, as shown in the data table, with the word 'Unit' appended.|
|NYSDEC:wildlife_management_unit=*||Alphanumeric code identifying the wildlife management unit to which the parcel belongs. This information is useful to hunters and anglers, since it determines the seasons, bag limits, and license requirements.|
|NYCDEP:last_modified=*||ISO8601 time string identifying the modification time of the PDF file. Included against the possibility of future automated updating.|
|foot=yes||Indicates that the parcel is open to foot travel by the general public.|
|foot=permit||Indicates that the parcel is open to foot travel by permit holders. The website=* link refers to a site where permit information is obtainable.|
|foot=hunting, foot=fishing||Indicates that the parcel is open to foot travel only for the purpose of the designated activities.|
|fishing=yes||Indicates that the parcel is open to fishing by the general public. (State laws regarding fishing licenses and catch limits must, of course, be observed.)|
|fishing=permit||Indicates that the parcel is open to fishing by permit holders only. Once again, the website=* link refers to a site where permit information is obtainable.|
|fishing=no||Indicates that fishing is forbidden on the parcel.|
|hunting=yes||Indicates that the parcel is open to hunting by the general public. (State laws regarding hunting licenses, seasons, and bag limits must, of course, be observed.)|
|hunting=archery||Indicates that hunting is permitted on the parcel, but firearms are forbidden. Large game may be taken by bow only.|
|hunting=permit||Indicates that the parcel is open to hunting by permit holders only. Once again, the website=* link refers to a site where permit information is obtainable.|
|hunting=no||Indicates that hunting is forbidden on the parcel.|
|trapping=yes||Indicates that the general public is permitted to trap furbearers such as otter, beaver, fisher and marten on the parcel when otherwise in compliance with State and Federal law.|
|trapping=permit||Indicates that the parcel is open to trapping by permit holders only. Once again, the website=* link refers to a site where permit information is obtainable.|
|trapping=no||Indicates that setting of traps for game is forbidden on the parcel.|
|website:map=*||URL from which the PDF map was obtained|
|leisure=nature_reserve||All of these sites are classed as 'nature reserves'.|
|boundary=protected_area||All of these sites enjoy legal protection for watershed conservation.|
|protect_class=12||Meaning: resource-protected area for water.|
|protection_title=*||"Watershed Recreation Area"|
|related_law=*||"NYCDEP Rules for the Recreational Use of Water Supply Lands and Waters http://www.nyc.gov/html/dep/pdf/recrules/recrules.pdf"|
|operator=*||"New York City Department of Environmental Protection, Bureau of Water Supply"|
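As an illustration of how the table above might be applied, the following minimal sketch maps one row of the attribute table to a tag dictionary. The row field names (`name`, `uses`, `permit_required`, `wmu`, `map_url`, `last_modified`) are hypothetical, and the sketch deliberately leaves out the `hunting=archery`, `foot=hunting` and `foot=fishing` cases and the `related_law` tag; it is not the proponent's Tcl implementation.

```python
def build_tags(row):
    """Map one attribute-table row to OSM tags (row field names are hypothetical)."""
    tags = {
        "name": row["name"] + " Unit",
        "leisure": "nature_reserve",
        "boundary": "protected_area",
        "protect_class": "12",
        "protection_title": "Watershed Recreation Area",
        "operator": "New York City Department of Environmental Protection, "
                    "Bureau of Water Supply",
        "website:map": row["map_url"],
        "NYCDEP:last_modified": row["last_modified"],
        "NYSDEC:wildlife_management_unit": row["wmu"],
    }
    # Either everyone may use the parcel, or only permit holders may.
    access = "permit" if row["permit_required"] else "yes"
    for activity in ("fishing", "hunting", "trapping"):
        tags[activity] = access if activity in row["uses"] else "no"
    # Only general foot access is modelled here; purpose-restricted
    # access (foot=hunting, foot=fishing) is left out of this sketch.
    if "hiking" in row["uses"]:
        tags["foot"] = access
    return tags


# Hypothetical row, for illustration only.
example_row = {
    "name": "Example",
    "uses": {"hiking", "fishing", "hunting"},
    "permit_required": True,
    "wmu": "3A",
    "map_url": "http://example.org/maps/example_unit.pdf",
    "last_modified": "2016-05-25T14:30:00Z",
}
for key, value in sorted(build_tags(example_row).items()):
    print(f"{key}={value}")
```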
The current plan is to process the parcels grouped by township (a fairly manageable amount of data) from JOSM. The comment=* and created_by=* tags will include the township name and the dedicated import user ID. The source=* tag will have the URL of the published list of parcels.
Getting from a pile of PDF files, even ones that are already georeferenced, to a coherent set of multipolygons is a bit of an adventure. There is a script written in the Tcl programming language that performs most of the tasks. (All scripts referenced in this proposal may be downloaded from the project repository.)
It is worth discussing the process in some technical detail, since scraping map information from ArcGIS-generated PDF files is somewhat arcane.
The general workflow is:
- Retrieve the published list of parcels and all the PDFs to which it links
- Construct an attribute table from the published list
Then, for each parcel, one at a time:
- The script retrieves the PDF file from the link in the attribute table. (This operation is gated by an HTTP HEAD request that bypasses retrieval if a current copy is already on the local machine.)
- The script constructs the tags according to the table in #Tagging Plans
- The script uses ogr2ogr to import the parcel outline from the PDF file into a PostGIS database table.
- The parcel arrives as a set of multi-linestrings that may have been split in ArcGIS. The hierarchy of this set is flattened to a set of inner and outer rings using the PostGIS functions ST_Collect, ST_CollectionHomogenize, ST_LineMerge and ST_Dump.
- Linestrings shorter than four points are degenerate polygons and are deleted from the table. This must be done before the next step, or the next step fails with a PostGIS error.
- The rings are promoted to polygons with the PostGIS function ST_MakePolygon.
- Some of the polygons, because of noisy GPS data, include self-intersections. The topology of the polygons is normalized by applying the PostGIS function ST_MakeValid. The result is a heterogeneous collection, which may include line segments, points, and empty objects as well as polygons. Only the multipolygons are extracted by applying the PostGIS function ST_CollectionExtract.
- The now-valid multipolygons are flattened back to a set of rings using the PostGIS functions ST_Dump and ST_ExteriorRing.
- There is now a consistent and connected set of boundaries for the parcel. The ST_Collect and ST_BuildArea functions identify inner and outer rings, and yield a single multipolygon to be imported.
- GPS noise is reduced by calling ST_Buffer to shrink the parcel by 2.5 metres all around, and ST_Simplify, also with 2.5 metre precision, to reduce the number of points.
- The parcel is added, with all tags, as a single row in a PostGIS table.
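The first step of the per-parcel loop, skipping the download when the local copy is current, can be sketched as follows. This is a minimal illustration that assumes the server reports a `Last-Modified` header and that the local file's modification time is a fair proxy for when it was fetched; the function names are hypothetical and the proponent's script is written in Tcl. The same timestamp, rendered as ISO 8601, is the kind of value the `NYCDEP:last_modified` tag would carry.

```python
import os
import urllib.request
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def remote_last_modified(url):
    """Send an HTTP HEAD request; return Last-Modified as a datetime, or None."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        header = response.headers.get("Last-Modified")
    return parsedate_to_datetime(header) if header else None


def needs_download(remote_time, local_path):
    """True if there is no local copy, no remote timestamp, or the copy is stale."""
    if remote_time is None or not os.path.exists(local_path):
        return True
    local_time = datetime.fromtimestamp(os.path.getmtime(local_path),
                                        tz=timezone.utc)
    return local_time < remote_time


def iso8601(remote_time):
    """Render a timestamp as an ISO 8601 string in UTC."""
    return remote_time.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# No network access is needed to see the timestamp conversion.
stamp = parsedate_to_datetime("Wed, 25 May 2016 14:30:00 GMT")
print(iso8601(stamp))
```

A caller would invoke `remote_last_modified(url)` once per parcel and fetch the PDF only when `needs_download` returns true.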
Conflation and consistency checking
The proponent has at hand a PostGIS database containing an export of the North America OSM data, updated daily from Geofabrik. This database provides the basis for an initial quality and conflation check. The parcels are compared with other area features in OSM, after being contracted by 7.5 metres to allow for a certain amount of slop in the digitization. The following query appears to be sufficient to exclude most false positives, and to isolate the parcels from the NYS DEC Lands import. (There is a separate plan to reimport that file, and the collisions will change substantially.)
select
  (defined(tags, 'NYDEC_Lands:LANDS_UID') or defined(tags, 'NYDEC_Lands:FACILITY')) as NYS,
  ST_Area(ST_Intersection(a.wkb_geometry, b.way)) as collision,
  a.name, b.osm_id, b.name
from nyc_bws_rec_area a
join na_osm_polygon b on ST_Overlaps(a.wkb_geometry, b.way)
where (b.boundary is null or b.boundary <> 'administrative')
  and (b.natural is null or b.natural not in ('water', 'wetland'))
  and (b.landuse is null or b.landuse not in ('wetland', 'reservoir'))
  and ST_Overlaps(ST_Buffer(a.wkb_geometry, -7.5), b.way)
order by nys, collision desc
Using the extract of 26 May 2016, this query identifies a total of ninety collisions, with only twenty-three against areas that are not already flagged for reimport. Of these:
- Two NYC DEP parcels are already in OSM and must be conflated.
- Two parcels overlap conflicting areas with landuse=forest. (This is an actual inconsistency, since the NYC DEP areas are not managed for timber production.) This inconsistency will most likely be resolved by replacing the offending tag with natural=wood.
- There is a boundary inconsistency between the Cave Mountain Unit and the Ski Windham resort. This inconsistency will be left in place.
- There is a boundary inconsistency between the Sackrider Cemetery and the Shaw Road unit. Again, this inconsistency will be left in place. It is possible that a portion of the cemetery has a dual land use and is also watershed protection land.
- There are parcels of the Kaaterskill and Sundown Wild Forests that have no tags indicating that they participated in the NYS DEC import and are in conflict. The reimport of the NYS DEC data will have to address these. They are ignored for the present.
- Three or four parcels contain building polygons. These are regarded as false positives.
Separately, a similar query was used to compare the areas with the latest version of the NYS DEC Lands shapefile. In the newer shapefile, the number of collisions is reduced from 68 to 25. Only ten of these exceed one hectare in extent. In all of these ten, the NYC DEP map is more plausible than the NYS DEC one, because its lines adhere quite closely to the lines of the eighteenth-century land allocations. (Cadastral research in this part of the world is greatly facilitated by a 1970-vintage map of the lots in the Catskill Park.) The plan is to ignore these inconsistencies in the import; the interagency coordination work that has been in progress for the last few years is likely to repair them.
Export to OSM format and quality control
With all this preliminary work done, the ogr2osm tool exports the parcels to OSM format. A final quality control check consists of loading them by themselves into JOSM and running the validator.
There is the expected cascade of warnings of duplicated nodes, since none of the steps above removes duplicates. After JOSM fixes the duplicates automatically, a second validation is run.
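In this workflow JOSM performs the duplicate-node repair; the following minimal sketch only illustrates what that repair amounts to. It assumes nodes are duplicates when their coordinates are exactly equal, and the data layout (plain dictionaries keyed by the negative placeholder IDs typical of unsaved OSM files) is hypothetical.

```python
def merge_duplicate_nodes(nodes, ways):
    """Collapse nodes that share exact coordinates and re-point ways at the survivor.

    nodes: {node_id: (lat, lon)}
    ways:  {way_id: [node_id, ...]}
    """
    first_at = {}   # coordinate -> surviving node id
    remap = {}      # every node id -> surviving node id
    for node_id in sorted(nodes):
        position = nodes[node_id]
        remap[node_id] = first_at.setdefault(position, node_id)
    merged_nodes = {nid: pos for nid, pos in nodes.items() if remap[nid] == nid}
    merged_ways = {wid: [remap[n] for n in refs] for wid, refs in ways.items()}
    return merged_nodes, merged_ways


# Two rings that each created their own node at a shared corner.
nodes = {-1: (42.0, -74.0), -2: (42.0, -74.1), -3: (42.0, -74.0)}
ways = {-10: [-1, -2, -1], -11: [-3, -2, -3]}
print(merge_duplicate_nodes(nodes, ways))
```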
This final check results in only three warnings:
- There is a boundary inconsistency between the north and south Warner Creek Units. Inspecting the boundary manually, it appears that the intent is that it should follow the nearby stream, but neither linestring actually does so and both have large jumps, as if from a handheld GPS unit temporarily losing satellite coverage. Having neither field data nor an independent reference, the plan is simply to import the inconsistent data as it stands.
- A couple of parcels contain multipolygons consisting of two outer rings sharing a single node. JOSM doesn't like this, and the proponent has not found an easy way to work around the issue. As they are constructed, they render acceptably in Mapnik, QGIS and GRASS, so this issue is likely a case of JOSM being overly fussy. If anyone has an easy fix, these two parcels will be repaired manually.
The resulting OSM XML file may be inspected at https://kbk.is-a-geek.net/nycdep-import/nyc_bws_rec_area.osm.
Data Merge Workflow
With only a few hundred multipolygons to import, there's no real need to recruit a large team. The import will take place under a single dedicated account ke9tv-NYCDEP-import.
- One township at a time, use ogr2osm to make an OSM file of just the parcels that lie within the township's borders. This keeps each import to a manageable size.
- Load the OSM file into JOSM, download the data for its bounding box from the OSM server, revalidate, and upload with the appropriate changeset tags.
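The per-township batching can be sketched as below. This is a minimal illustration: the parcel records and township names are hypothetical, the wording of the changeset comment is invented, and the assumptions that `created_by` carries the import account and that `source` is the URL of the published list cited earlier are one reading of the plan, not a confirmed detail.

```python
from collections import defaultdict

IMPORT_ACCOUNT = "ke9tv-NYCDEP-import"
# Assumed to be the published list of parcels referred to in this proposal.
SOURCE_URL = "http://www.nyc.gov/html/dep/pdf/recreation/open_rec_areas.pdf"


def group_by_township(parcels):
    """Return {township: [parcel name, ...]} so each batch stays small."""
    groups = defaultdict(list)
    for parcel in parcels:
        groups[parcel["township"]].append(parcel["name"])
    return dict(groups)


def changeset_tags(township):
    """Build changeset tags for one township's upload (comment wording is invented)."""
    return {
        "comment": f"NYCDEP watershed recreation areas import: {township}",
        "created_by": IMPORT_ACCOUNT,
        "source": SOURCE_URL,
    }


# Hypothetical parcels, for illustration only.
parcels = [
    {"name": "Example Unit", "township": "Township A"},
    {"name": "Sample Unit", "township": "Township B"},
    {"name": "Demo Unit", "township": "Township A"},
]
for township, names in group_by_township(parcels).items():
    print(township, names, changeset_tags(township)["comment"])
```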
After the first township is uploaded, the process will pause for a few days to solicit feedback on the mailing lists. If no major problems are detected, the rest of the data should be uploaded in rapid succession, since very little manual work is required to process them.
The objects that require conflation are already identified, and are few enough in number that they can be readily handled manually.
Since the import consists solely of adding a set of tagged multipolygons, reversion should be equally simple. The JOSM Reverter plugin is expected to be adequate in the unlikely case that a revert is required.
It is hoped that the multiple checks:
- review of this proposal and the associated OSM file by the community before imports begin
- automatic checking of all the lands for collisions with other land uses
- automatic checking of data validity using the JOSM validator
- review of the first changeset by the community before importing in bulk
will provide enough assurance that the import will be an improvement over what is already there. The proponent is willing to impose other reasonable controls, should readers suggest them.