Import/Poland Addresses

This is an overview of the addition of address data for Poland. The data has been provided courtesy of GUGiK for use in OSM. For a list of other Esri-curated datasets that are available for mapping, please see Esri ArcGIS Datasets.

Goals

The goal of this effort is to allow contributors to use the Poland address data from GUGiK to continue expanding the coverage of addresses across the country. This effort would facilitate the task of increasing coverage while retaining the many addresses already represented in OSM.

Schedule

Data preparation was performed in January 2023. This OSM-ready data was reviewed by the Polish OSM community for its suitability to be imported through editor tools such as RapiD. The edits to OSM are to be performed incrementally by OSM contributors, performing manual imports of the data in RapiD or JOSM.

Source

The source address points were originally downloaded from GUGiK in December 2022 at the suggestion of OSM Poland contributors, in order to prepare them for addition to OSM via editor tools such as RapiD. The GUGiK geoportal describes the original dataset in detail.

The processed address points that could be added to OSM are available on ArcGIS Online (see Poland Addresses). You can use Open in Map Viewer to preview the data (click features to view their tags) or sign in to export the data for offline use.

OSM ODbL Compliance: Yes, the data is provided with explicit permission for use in OpenStreetMap.

Data Preparation

The processed address points were created using these Esri Data Processing Steps for Buildings and Addresses.

The data has initially been prepared for community review with five tags on each feature: addr:city, addr:housenumber, addr:place, addr:postcode, and addr:street. Not all features will have values for each of these tags.

The original data is in GML format. It was parsed using a custom script and loaded into a PostgreSQL + PostGIS database. OSM data was loaded into the same database using imposm3.

First, city and street names were matched against the official registry run by the Polish Statistical Office (GUS) to confirm their correctness.

The Polish community maintains a dictionary file that resolves shortened street names to their full forms (things like Blvd. -> Boulevard). It was applied to the data set, and the resulting OSM-compatible street names were used in later processing.
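As a rough illustration of this step, the sketch below expands abbreviated tokens in a street name using a small lookup table. The dictionary entries, the function name `expand_street_name`, and the token-by-token matching are assumptions made for the example; the format and contents of the community's actual dictionary file are not described here.

```python
# Hypothetical sample entries for illustration only; the real dictionary
# file maintained by the Polish community is not reproduced here.
SAMPLE_EXPANSIONS = {
    "blvd.": "Boulevard",
    "al.": "Aleja",
    "gen.": "Generała",
}


def expand_street_name(name, expansions=SAMPLE_EXPANSIONS):
    """Replace each abbreviated token with its full form, if one is known.

    Assumption: abbreviations are matched per whitespace-separated token,
    case-insensitively. Unknown tokens are kept unchanged.
    """
    return " ".join(expansions.get(token.lower(), token) for token in name.split())


print(expand_street_name("Sunset Blvd."))
print(expand_street_name("al. Gen. Andersa"))
```

A per-token lookup is the simplest possible design; a real dictionary may instead map whole street names, which avoids expanding a token that only looks like an abbreviation.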

Some addresses were rejected due to obvious issues, such as an address in a city without a street name (places legally designated as cities always use street names), or a missing or nonsensical house number (one that contains the string 'test', contains the street name, or contains a range or multiple numbers separated by a space). The government data is fairly clean nowadays, but it used to contain odd entries in the past.
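The rejection rules above could be sketched roughly as follows. This is not the script that was actually used: the function name `rejection_reason`, the `is_legal_city` flag, and the regular expressions used to detect ranges and multiple numbers are all assumptions chosen for illustration.

```python
import re
from typing import Optional


def rejection_reason(is_legal_city: bool,
                     street: Optional[str],
                     housenumber: Optional[str]) -> Optional[str]:
    """Return a short reason if the address should be rejected, else None."""
    number = (housenumber or "").strip()
    street_clean = (street or "").strip().lower()

    if not number:
        return "missing house number"
    if is_legal_city and not street_clean:
        return "city address without street name"

    lowered = number.lower()
    if "test" in lowered:
        return "house number contains 'test'"
    if street_clean and street_clean in lowered:
        return "house number contains street name"
    # Assumed pattern for a range such as "12-14" or "12A - 14".
    if re.search(r"[0-9A-Za-z]\s*-\s*\d", number):
        return "house number is a range"
    # Assumed pattern for several numbers separated by a space, e.g. "12 14".
    if re.search(r"\d\S*\s+\d", number):
        return "house number has multiple numbers"
    return None


print(rejection_reason(True, "Polna", "12A"))
print(rejection_reason(True, "Polna", "12-14"))
```

Returning a reason string rather than a boolean makes it easier to count how many addresses each rule rejects when reviewing the data.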

An address hash was created using the following formulas:

  • GOV data:
    • (lowercase city name + lowercase street name + standardized (uppercase, trimmed, backslashes converted to forward slashes, removed trailing slashes, removed dots/zeroes/slashes from the beginning) housenumber) -> md5 hash
  • OSM data:
    • (lowercase addr:city + lowercase addr:street + standardized housenumber) -> md5 hash
    • union
    • (lowercase addr:place + lowercase addr:street + standardized housenumber) -> md5 hash
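The formulas above can be sketched in Python as shown below. Several details are assumptions, because the formulas do not specify them: the order in which the standardization steps are applied, plain concatenation without a separator, UTF-8 encoding before hashing, and treating missing OSM tags as empty strings. The function names are hypothetical.

```python
import hashlib


def standardize_housenumber(raw: str) -> str:
    """Uppercase, trim, normalize slashes, and strip leading dots/zeroes/slashes."""
    value = raw.strip().upper().replace("\\", "/")
    value = value.rstrip("/")      # remove trailing slashes
    value = value.lstrip("./0")    # remove dots/zeroes/slashes from the beginning
    return value


def address_hash(place: str, street: str, housenumber: str) -> str:
    """MD5 of lowercase place + lowercase street + standardized house number."""
    key = place.lower() + street.lower() + standardize_housenumber(housenumber)
    return hashlib.md5(key.encode("utf-8")).hexdigest()


def osm_address_hashes(tags: dict) -> set:
    """Union of the hashes built from addr:city and from addr:place."""
    street = tags.get("addr:street", "")
    number = tags.get("addr:housenumber", "")
    return {
        address_hash(tags[key], street, number)
        for key in ("addr:city", "addr:place")
        if key in tags
    }


gov = address_hash("Kraków", "Długa", " 07/")
osm = osm_address_hashes(
    {"addr:city": "Kraków", "addr:street": "Długa", "addr:housenumber": "7"}
)
print(gov in osm)
```

Because the house number is standardized on both sides, " 07/" in the government data and "7" in OSM produce the same hash.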

Then addresses were filtered out if any of the following was true:

  • The hashes matched and the geometries were within 150 meters
  • The geometries were within 2 meters
  • Only the standardized housenumbers matched and the geometries were within 40 meters
  • There were duplicates (same city, street, housenumber) in the gov data (there are legitimate addresses like that, but they were filtered out here for safety)
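The four filter conditions above can be sketched as a brute-force comparison. This is only an illustration under stated assumptions: coordinates are taken to be in a projected CRS measured in meters (for example EPSG:2180), "within" is treated as inclusive, and the `Address` class and function names are hypothetical. Since the data was loaded into PostGIS, the actual processing was presumably done with spatial queries rather than a Python loop.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    city: str
    street: str
    housenumber: str  # assumed to be already standardized
    addr_hash: str
    x: float          # assumed projected coordinates in meters
    y: float


def distance_m(a: Address, b: Address) -> float:
    return math.hypot(a.x - b.x, a.y - b.y)


def is_filtered_out(gov: Address, osm_addresses, duplicate_keys) -> bool:
    if (gov.city, gov.street, gov.housenumber) in duplicate_keys:
        return True  # duplicated within the gov data
    for osm in osm_addresses:
        d = distance_m(gov, osm)
        if d <= 2:
            return True  # geometries within 2 meters
        if gov.addr_hash == osm.addr_hash and d <= 150:
            return True  # hash match within 150 meters
        if gov.housenumber == osm.housenumber and d <= 40:
            return True  # housenumber-only match within 40 meters
    return False


def select_new_addresses(gov_addresses, osm_addresses):
    counts = Counter((g.city, g.street, g.housenumber) for g in gov_addresses)
    duplicate_keys = {key for key, n in counts.items() if n > 1}
    return [g for g in gov_addresses
            if not is_filtered_out(g, osm_addresses, duplicate_keys)]


osm_points = [Address("a", "s", "1", "h1", 0.0, 0.0)]
gov_points = [
    Address("a", "s", "1", "h1", 100.0, 0.0),  # hash match within 150 m
    Address("b", "t", "9", "h2", 1.0, 0.0),    # within 2 m
    Address("c", "u", "1", "h3", 30.0, 0.0),   # same housenumber within 40 m
    Address("d", "v", "5", "h4", 500.0, 0.0),  # no conflict
    Address("e", "w", "6", "h5", 600.0, 0.0),  # duplicate in gov data
    Address("e", "w", "6", "h5", 700.0, 0.0),  # duplicate in gov data
]
kept = select_new_addresses(gov_points, osm_points)
print([g.addr_hash for g in kept])
```

The loop compares every government address with every OSM address, which is fine for a small sample but would need a spatial index for a country-sized data set.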

Data Conflation

The processed address data contains 301,951 address points, most of which do not already exist as features in OSM. The raw data that was provided to Esri for inclusion in RapiD can be downloaded from here: https://budynki.openstreetmap.org.pl/dane/addresses_for_rapid.gpkg (An automatic process that updates the file weekly can be turned on; if that would be helpful, please create an issue in https://github.com/openstreetmap-polska/gugik2osm ).

Existing address features in OSM will not be replaced. The plan is to perform the updates in phases. In the first phase, only new address points that neither conflict with existing address points nor duplicate addresses that are being added to buildings will be added. If the address is already associated with a building footprint (e.g. house) that is in OSM or available separately to add to OSM, then the address point will not be added separately. However, if the address point provides additional detail for a building footprint (e.g. apartment building), such as a unit number, then it will be added separately. In future phases, existing OSM buildings that do not have complete addresses may be updated to include additional address tags.

Data Updates

The plan is to perform the updates using RapiD and an updated Map with AI plugin for JOSM (see the Esri blog post on new tools in OSM editors for more detail). The new tools enable OSM mappers to access ArcGIS Datasets hosted in ArcGIS Online and select individual features to use while editing OSM. The mapper is able to select a feature, review and edit the feature geometry and available fields, and then save their edits.

The mapper has the benefit of using existing features that have been created by the data provider, along with their available field values that have been pre-processed by Esri, while also being able to compare that feature with existing OSM data (e.g. street names) and imagery to ensure it is accurate and consistent.

Discussion