User:Lectrician1/OSMsync

From OpenStreetMap Wiki

OSMsync is a proposed website and service that implements GIS-analysis functions, Osmsync, and OWL (OSM Watch List) to import, "link" (insert the dataset's feature IDs into their corresponding OSM features), sync (update the linked data automatically through a GIS server API, or manually), and manage open datasets in OSM.

The primary use of the service is to link open datasets hosted by governments that use ArcGIS Server or GeoServer (GeoServer support is not fully proposed yet), as they provide API endpoints that can be used to periodically update the linked data.

This helps establish OSM as a "master hub or database" (a secondary database) for all open geodata, acting much like Wikidata, where External Identifiers link Items to outside websites. This helps build linked data and the semantic web.

Features

General

A user must sign in using their OSM account in order to use any of the features of the service.

All edits to the OSM database are executed via one approved bot/import account that represents the edits of all managed datasets.

Users can propose, configure, analyze, and approve imports of datasets without ever having to use QGIS, MapRoulette, the Wiki (it could still be automatically documented there), the import mailing list, or any other challenging and time-consuming tool that importers typically have to use in order to execute the import.

It is possible to manage the entire process of an import solely through the OSMsync website.

Initial import/link/update/sync

The core user feature of OSMsync involves the initial import/link/update of data into/in OSM. After this, most data management functions are handled by the service.

Definitions of use-cases:

  • Import: A user wants to import data from a dataset where most of the data does not exist in OSM yet.
  • Link: Most of the data has already been manually mapped in OSM, however it should be linked for useful purposes.
  • Update/Sync: The OSM data was previously mapped and linked through an import and now needs to be updated and/or continually synced.

"Import" is the term used in place of all use-cases for the rest of this documentation, since all use-cases involve importing some data into OSM.

Whatever the use-case, the user must follow this process in order to import and continuously-sync the dataset with OSM.

  1. User inputs the following documentation about the import into a form:
    1. Import name
    2. Import description
    3. Proof of compatible license
    4. GeoService (preferred) or GeoJSON API endpoint
      API URIs found in an ArcGIS data portal
      GeoService is preferred because you can then access the layer JSON metadata, such as editingInfo.lastEditDate, to check if the dataset is up-to-date. Example
      Or
      GIS file (will require manual re-upload of updated dataset to re-sync features)
  2. OSMsync analyzes and finds the feature attributes of the imported dataset.
    1. If a GeoService API endpoint was provided, feature attributes can be read from the layer's JSON metadata via the fields attribute.
      Other relevant JSON attributes may be used as well, like objectIdField, to automatically indicate which dataset attribute represents the ID of the feature in the source dataset.
    2. If GeoJSON or another GIS file format was provided, parsing the provided file retrieves the dataset attributes.
  3. User inputs the descriptions and correlated OSM tags for each of the retrieved dataset attributes that will be used in the import (generates an Attribute Map).
    User has the additional options of:
    1. Excluding specific tags from being imported or overriding current tags for all or for specific cases
    2. Specifying a reformatting function that reformats the values of the original dataset to OSM-acceptable values
    3. Specifying a complex element manipulation function that can move and/or duplicate imported tags among other elements that are members of the same relation, nearby, or other Overpass-like relational functions.
  4. User chooses how they would like to import the dataset relative to current OSM data.

    Note: If the user wants to resolve all conflicts between the imported data and OSM data without the help of the service, yet still take advantage of the syncing feature, they should instead import the data manually themselves and then do a Sync import.

    Definition: Exclusion condition
    Existing OSM features that intersect, are near, are geometrically similar to, or otherwise match (by some similarity function) a dataset feature, subject to specified constraints. This is basically a QGIS "join attributes by location" or other vector function.

    For all imported elements, regardless of whether they meet the exclusion conditions, their IDs in the imported dataset should be imported into OSM for syncing and linking purposes.

    Choices:
    1. Complete import: Imports all features.
      Not recommended, since many features likely already exist in OSM and duplicates should be avoided.
    2. Exclusion import: Does not import geometries or tags of features that meet the exclusion conditions.
      For example, if a building already exists in OSM, then the building geometry and its tags are not imported into OSM.
    3. Tag import: Does not import the geometries of dataset features that meet the exclusion conditions, but does merge their tags into the OSM features that triggered the exclusion condition.
      For example, if a building already exists in OSM, the tags of the corresponding building in the dataset are merged into that building and its geometry stays the same, even if the dataset geometry differs.
    4. Geometry import: Does not import the tags of dataset features that meet the exclusion conditions, but does replace the geometry of the matching OSM features with the geometry from the dataset.
      For example, if a building already exists in OSM, the nodes that make up the building area (way) are replaced by those of the corresponding building in the imported dataset. The tags and original way ID in OSM remain.
    5. Update/sync import: The imported dataset's IDs already exist in OSM and the user chooses which features should be updated, if they should be updated at all.
    6. Custom import: A mixture of a Tag or Geometry or even an Update import programmed based on specified conditions.
    7. Reviewed import: All or some (based on conditions) dataset features are manually reviewed and/or edited by contributors on the website before their data is imported.
      For example, a Reviewed import can specify that features meeting the exclusion conditions are marked for review. If a reviewer finds that the geometry or tags of a dataset feature under review are better than the OSM data, they can choose which data to replace.
      This example/import-choice is the most-appropriate for most imports.
  5. OSMsync analyzes and configures the yet-to-be-imported data based on the selected option and conditions. This is all done server-side with GIS analysis functions like those found in QGIS. The user can also tune the results of the data until they have a dataset they are ready to import.
  6. User chooses how they would like OSMsync to automatically manage the data post-import. This includes update-check frequency (how often OSMsync should check the dataset source for updates), protected-data enforcement (if the tag or feature of a linked dataset is removed, should it be automatically re-added?), and conflict management.
    For example, if an update to the linked dataset conflicts with newer OSM data (a building is mapped in OSM and the dataset is updated later), should OSMsync send that conflict for review? Should the data from the dataset always override and replace the data in OSM? Options similar to the initial import choices are given to the user to decide upon.
    If the user imported a dataset manually, the selected post-import management plan is not enforced, but can be used as a recommendation if the dataset were to be updated again.
  7. When the user thinks the import and post-import configuration is ready, they mark it for approval on OSMsync. The community is then notified and can vote on the proposed import. The community votes on both the initial import and the post-import management plan. The post-import plan can always be changed via a succeeding proposal.
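Steps 2 to 4 of the process above can be sketched in a few functions. This is only an illustration under stated assumptions, not a proposed implementation: the tag key `ref:example_dataset`, the function names, the shape of the Attribute Map (dataset attribute mapped to an OSM key plus a reformatting function), and the rule that existing OSM values win when tags are merged are all hypothetical. Only the `fields` and `objectIdField` metadata attributes come from the proposal itself.

```python
from enum import Enum

# Hypothetical OSM key that stores the dataset's feature ID (not named in the proposal).
ID_KEY = "ref:example_dataset"


class ImportMode(Enum):
    COMPLETE = "complete"
    EXCLUSION = "exclusion"
    TAG = "tag"
    GEOMETRY = "geometry"


def discover_attributes(layer_metadata):
    """Step 2: read attribute names and the ID field from GeoService layer JSON."""
    names = [field["name"] for field in layer_metadata.get("fields", [])]
    return names, layer_metadata.get("objectIdField")


def to_osm_tags(attributes, attribute_map, id_field):
    """Step 3: apply the Attribute Map; the dataset ID is always carried over."""
    tags = {ID_KEY: str(attributes[id_field])}
    for name, (osm_key, reformat) in attribute_map.items():
        value = attributes.get(name)
        if value not in (None, ""):
            tags[osm_key] = reformat(str(value))
    return tags


def plan_feature(mode, meets_exclusion, dataset_tags, osm_tags=None):
    """Step 4: decide what gets written for one dataset feature."""
    id_only = {ID_KEY: dataset_tags[ID_KEY]}
    if mode is ImportMode.COMPLETE or not meets_exclusion:
        # No matching OSM feature (or a Complete import): create a new feature.
        return {"create_feature": True, "replace_geometry": False,
                "tags": dict(dataset_tags)}
    existing = dict(osm_tags or {})
    if mode is ImportMode.TAG:
        # Assumption: existing OSM values win on key conflicts.
        merged = {**dataset_tags, **existing, **id_only}
        return {"create_feature": False, "replace_geometry": False, "tags": merged}
    if mode is ImportMode.GEOMETRY:
        return {"create_feature": False, "replace_geometry": True,
                "tags": {**existing, **id_only}}
    # Exclusion import: only the dataset ID is added to the existing feature.
    return {"create_feature": False, "replace_geometry": False,
            "tags": {**existing, **id_only}}


metadata = {"objectIdField": "OBJECTID",
            "fields": [{"name": "OBJECTID"}, {"name": "BLDG_NAME"}, {"name": "FLOORS"}]}
names, id_field = discover_attributes(metadata)
attribute_map = {"BLDG_NAME": ("name", str.title),
                 "FLOORS": ("building:levels", lambda v: str(int(float(v))))}
row = {"OBJECTID": 17, "BLDG_NAME": "CITY HALL", "FLOORS": "3.0"}
tags = to_osm_tags(row, attribute_map, id_field)
plan = plan_feature(ImportMode.TAG, True, tags, {"building": "yes", "name": "City Hall Annex"})
```

In every branch the dataset ID ends up on the OSM feature, which matches the requirement that IDs are imported whether or not a feature meets the exclusion conditions.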

Post-import

After the initial import, OSMsync checks for updates in the source dataset at the selected update-check frequency. If there is an update, the post-import plan is followed.
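For a GeoService endpoint, the update check could compare the layer's `editingInfo.lastEditDate` with the time of the last sync. The sketch below assumes the ArcGIS convention that this value is milliseconds since the Unix epoch and that layer metadata is returned as JSON when `f=json` is appended to the layer URL. The function names, and the choice to re-sync when no edit date is reported, are illustrative assumptions.

```python
import json
from datetime import datetime, timezone
from urllib.request import urlopen


def fetch_layer_metadata(layer_url):
    """Download a layer's JSON metadata from a GeoService endpoint."""
    with urlopen(f"{layer_url}?f=json") as response:
        return json.load(response)


def last_edit_date(layer_metadata):
    """Return editingInfo.lastEditDate as an aware datetime, or None if absent."""
    millis = (layer_metadata.get("editingInfo") or {}).get("lastEditDate")
    if millis is None:
        return None
    # Assumed unit: milliseconds since the Unix epoch.
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


def needs_sync(layer_metadata, last_synced):
    """True when the source was edited after the last sync."""
    edited = last_edit_date(layer_metadata)
    # Assumption: with no edit date reported, re-check the data to be safe.
    return edited is None or edited > last_synced


sample = {"editingInfo": {"lastEditDate": 1700000000000}}
print(needs_sync(sample, datetime(2023, 1, 1, tzinfo=timezone.utc)))  # True
print(needs_sync(sample, datetime(2024, 1, 1, tzinfo=timezone.utc)))  # False
```

A dataset uploaded as a GIS file has no such metadata, which is why it needs a manual re-upload to re-sync.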

  • Osmsync is used to update tags and the OSM API is used to update geometries for already-linked features.
  • If new features are part of the update, then the post-import plan is still followed and conflicts are addressed accordingly.
  • The dataset license of the update should be rechecked to confirm that it is still compatible.
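Before the post-import plan can be applied, an update has to be split into new, changed, and removed features. One simple way to do that, assuming both the previously synced snapshot and the new one are held as dictionaries keyed by the dataset's feature ID, is a set comparison. The function name and snapshot shape are hypothetical.

```python
def diff_snapshots(previous, current):
    """Compare two snapshots, each mapping dataset feature ID -> attribute dict."""
    shared = previous.keys() & current.keys()
    return {
        # In the new snapshot only: handled like an initial import.
        "added": sorted(current.keys() - previous.keys()),
        # Gone from the source: left to the post-import plan to resolve.
        "removed": sorted(previous.keys() - current.keys()),
        # Still present but with different attributes: candidates for a tag update.
        "changed": sorted(k for k in shared if previous[k] != current[k]),
    }


old = {1: {"name": "Elm St"}, 2: {"name": "Oak St"}, 3: {"name": "Pine St"}}
new = {2: {"name": "Oak Street"}, 3: {"name": "Pine St"}, 4: {"name": "Ash St"}}
print(diff_snapshots(old, new))
# {'added': [4], 'removed': [1], 'changed': [2]}
```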

The synced dataset can be added to your watchlist via a hosted instance of OWL. This is to ensure that no links are being deleted and tagging can remain monitored.
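One check such monitoring could run is whether every linked dataset ID is still present on some OSM element. The sketch below does not use OWL itself. It assumes elements shaped like Overpass API JSON output (dictionaries with an optional `tags` mapping) and a hypothetical link key, `ref:example_dataset`.

```python
# Hypothetical OSM key that stores the dataset's feature ID.
ID_KEY = "ref:example_dataset"


def find_broken_links(linked_ids, osm_elements, id_key=ID_KEY):
    """Return the dataset IDs that no current OSM element carries any more."""
    present = {el.get("tags", {}).get(id_key) for el in osm_elements}
    return sorted(i for i in linked_ids if i not in present)


elements = [
    {"type": "way", "id": 101, "tags": {"building": "yes", ID_KEY: "17"}},
    {"type": "way", "id": 102, "tags": {"building": "yes"}},  # link tag was removed
    {"type": "node", "id": 103},  # untagged node
]
print(find_broken_links(["17", "18"], elements))  # ['18']
```

With protected-data enforcement enabled, the returned IDs would be the ones to re-add or send for review.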

If tagging conventions change, OSMsync's website can be used to propose and update the tags of a dataset.

The power of external IDs

  • A data scientist wants to use road data from an official dataset but keep the OSM roads that are not in it, without writing a query that removes all OSM roads. All they have to do is remove the roads that carry the external IDs.
  • A TIGER-style comparison layer like the one shown in iD, but for every dataset.
  • Review of OSMsync features directly in RapiD and/or iD.
  • Features that have links to multiple external databases. For example, addresses have links in city, county, state/province, and country datasets, yet these datasets aren't even linked to each other and are instead copied from each other! National governments will look to OSM for where reliable data is linked, rather than bothering to compile it themselves!
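The first use-case above comes down to a filter on one tag. A minimal sketch, assuming OSM elements shaped like Overpass API JSON output and a hypothetical link key `ref:example_roads`:

```python
# Hypothetical OSM key holding the official road dataset's feature ID.
ROAD_ID_KEY = "ref:example_roads"


def roads_missing_from_dataset(osm_elements, id_key=ROAD_ID_KEY):
    """Keep only OSM roads that are not linked to the official dataset."""
    return [
        el for el in osm_elements
        if "highway" in el.get("tags", {}) and id_key not in el["tags"]
    ]


osm = [
    {"type": "way", "id": 1, "tags": {"highway": "residential", ROAD_ID_KEY: "R-100"}},
    {"type": "way", "id": 2, "tags": {"highway": "track"}},
    {"type": "way", "id": 3, "tags": {"building": "yes"}},
]
osm_only_roads = roads_missing_from_dataset(osm)
print([el["id"] for el in osm_only_roads])  # [2]
```

The remaining roads can then be combined with the official dataset without counting any linked road twice.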

Stuff to know

  • Features in OSM do not need to have the same geometries as their linked external dataset counterparts. External datasets will often be more accurate, less accurate, or about as accurate as OSM. Road centerlines, for example, are not going to match OSM data perfectly in every comparison. However, they should have similar bounds. For example, if a road way in a dataset extends between two intersections without splitting, yet in OSM it splits at a point in the middle, then in OSM you should probably make a relation combining those 2 ways, with the ID of the external dataset as a tag on the relation. Why a relation? Software might get confused if it sees 2 ways with the same ID and might not know how to combine them. A relation of 2 ways can be treated as a MultiLineString Simple Feature, which is much easier to merge and treat as one. If there were 2 ways in the original dataset and 1 in OSM, then the way in OSM should be split at a similar location and the resulting ways should be given the IDs.
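To illustrate why a relation is easier to handle, the sketch below turns the member ways of such a relation into a GeoJSON MultiLineString and merges it into a single LineString when the parts already connect end to end. The function names are hypothetical, and the merge is deliberately naive: it assumes the members are listed in order and point in the same direction.

```python
def relation_to_multilinestring(member_ways):
    """Build a GeoJSON MultiLineString from a relation's member way coordinates."""
    return {
        "type": "MultiLineString",
        "coordinates": [[list(pt) for pt in way] for way in member_ways],
    }


def merge_if_contiguous(multiline):
    """Join the parts into one LineString if each part starts where the last ended."""
    parts = multiline["coordinates"]
    if not parts:
        return multiline
    merged = list(parts[0])
    for part in parts[1:]:
        if merged[-1] != part[0]:
            return multiline  # gap or different ordering: leave it unmerged
        merged.extend(part[1:])  # skip the shared node
    return {"type": "LineString", "coordinates": merged}


way_a = [(0.0, 0.0), (1.0, 0.0)]
way_b = [(1.0, 0.0), (2.0, 0.0)]
road = merge_if_contiguous(relation_to_multilinestring([way_a, way_b]))
print(road)
# {'type': 'LineString', 'coordinates': [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]}
```

The merged line can then be compared against the single unsplit way in the external dataset.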

Furthering OSMsync

  • Collaboration with ArcGIS Open Data Hub?
  • Encouraging data portals to update more often.
  • Notifying data portals that their data is in-sync on OSM but out-of-sync on their portal.