OSM Conflator

From OpenStreetMap Wiki

OSM Conflator is a Python script that merges a third-party dataset of points with coordinates into OpenStreetMap data. It produces an osmChange file ready to be validated and uploaded with JOSM or bulk_upload.py. The script was inspired by Osmsync, and, to repeat the warning from its page: as with any other automated edit, it is essential that people read and adhere to the Automated Edits code of conduct.

The script was made by MAPS.ME and is developed on GitHub: https://github.com/mapsme/osm_conflate

How It Works

First, it asks a profile (see below) for a dataset. In the simplest case (which does not even require you to write any code), a dataset is a JSON array of objects with id, lat, lon and tags fields. The tags field is an object with OpenStreetMap tags: amenity, name, opening_hours, ref and so on. Some of these keys are marked authoritative: when merging, their values replace the values from OSM. For example, we may be sure that parking zones are correctly numbered in the source dataset, but still allow mappers to correct opening hours.
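As an illustration of the simplest case, the sketch below loads a small JSON dataset and merges the tags of one point into an existing OSM object. The sample values, the function names load_dataset and merge_tags, and the rule that non-authoritative keys are filled in only when they are missing from OSM are assumptions made for this example, not the script's actual code.

```python
import json

# Made-up sample data: ids, coordinates and tag values are illustrative only.
SAMPLE_JSON = """
[
  {"id": 101, "lat": 55.7512, "lon": 37.6184,
   "tags": {"amenity": "vending_machine", "ref": "101", "opening_hours": "24/7"}},
  {"id": 102, "lat": 55.7520, "lon": 37.6201,
   "tags": {"amenity": "vending_machine", "ref": "102"}}
]
"""

REQUIRED_FIELDS = {'id', 'lat', 'lon', 'tags'}


def load_dataset(text):
    """Parse the JSON array and check that every point has the four fields."""
    points = json.loads(text)
    for point in points:
        missing = REQUIRED_FIELDS - point.keys()
        if missing:
            raise ValueError('Point {} lacks fields: {}'.format(
                point.get('id'), sorted(missing)))
    return points


def merge_tags(osm_tags, dataset_tags, authoritative):
    """Merge dataset tags into OSM tags (hypothetical helper).

    Authoritative keys always take the dataset value. Other keys are
    only added when OSM has no value yet (an assumption of this sketch).
    """
    merged = dict(osm_tags)
    for key, value in dataset_tags.items():
        if key in authoritative or key not in merged:
            merged[key] = value
    return merged


points = load_dataset(SAMPLE_JSON)
osm_tags = {'ref': '12', 'opening_hours': 'Mo-Fr 08:00-20:00'}
merged = merge_tags(osm_tags, points[0]['tags'], authoritative={'ref'})
print(merged)
```

Here ref is treated as authoritative, so the dataset value replaces the one in OSM, while the opening hours entered by a mapper are kept.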

Then the conflator makes a query to the Overpass API. It uses a bounding box that covers all dataset points, and a set of tags from which it builds the query. Alternatively, it can read an OSM XML file, filtered by the query tags and, if needed, more rigorously with a profile function.
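A rough sketch of how such a query could be assembled is shown below. It uses the tuple forms described for the query field under Profiles. The function names, the 120-second timeout and the output settings are assumptions of this example; the query that conflate.py actually sends may differ.

```python
def bounding_box(points):
    """Return (minlat, minlon, maxlat, maxlon) covering all dataset points."""
    lats = [p['lat'] for p in points]
    lons = [p['lon'] for p in points]
    return min(lats), min(lons), max(lats), max(lons)


def tag_filter(pair):
    """Convert one query tuple into an Overpass QL tag filter."""
    if len(pair) == 1:
        return '["{}"]'.format(pair[0])            # key is present
    key, value = pair
    if value is None:
        return '[!"{}"]'.format(key)               # key is absent
    if value.startswith('~'):
        return '["{}"~"{}"]'.format(key, value[1:])  # regular expression
    return '["{}"="{}"]'.format(key, value)        # exact value


def build_query(points, tags):
    """Build an Overpass QL query for nodes, ways and relations (a sketch)."""
    bbox = ','.join(str(x) for x in bounding_box(points))
    filters = ''.join(tag_filter(pair) for pair in tags)
    lines = ['[out:xml][timeout:120];', '(']
    for kind in ('node', 'way', 'relation'):
        lines.append('  {}{}({});'.format(kind, filters, bbox))
    lines += [');', 'out meta center;']
    return '\n'.join(lines)


# Made-up points and tags, for illustration only.
sample_points = [{'lat': 55.7, 'lon': 37.5}, {'lat': 55.8, 'lon': 37.7}]
sample_query = build_query(sample_points, [('amenity', 'vending_machine')])
print(sample_query)
```

The bounding box follows the Overpass order of south, west, north, east, which is the same as the [minlat, minlon, maxlat, maxlon] order used by the bbox profile field.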

Matching consists of several steps:

  1. If the dataset has a dataset_id, the script searches for OSM objects with a ref:dataset_id tag (e.g. ref:mos_parking=*) and updates their tags based on the dataset points with matching identifiers. Objects with obsolete identifiers are deleted.
  2. Then it finds the closest matching OSM object for each remaining dataset point. The maximum distance is specified in the profile and can vary depending on the quality of the dataset. Tags on these objects are also updated, respecting the list of authoritative keys.
  3. Unmatched dataset points are added as new nodes with a full set of tags from the profile.
  4. Remaining OSM objects are either deleted, or a few tags are added to them, like fixme=This object might have been dismantled, please check.
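The four steps above can be sketched as follows. This is a simplified illustration, not the script's implementation: it uses a brute-force nearest-neighbour search instead of a k-d tree, measures distance in plain degrees, matches greedily in dataset order, and all names (conflate, ref_key, the keys of the result dict) are made up for the example.

```python
import math


def conflate(dataset, osm_objects, ref_key, max_distance=0.001):
    """Split objects into modify / create / delete / leftover lists."""
    pending = {str(p['id']): p for p in dataset}
    result = {'modify': [], 'create': [], 'delete': [], 'leftover': []}
    without_ref = []

    # Step 1: match by the stored identifier, e.g. ref:mos_parking.
    for obj in osm_objects:
        ref = obj['tags'].get(ref_key)
        if ref is None:
            without_ref.append(obj)
        elif ref in pending:
            result['modify'].append((pending.pop(ref), obj))
        else:
            result['delete'].append(obj)  # identifier no longer in dataset

    # Step 2: closest OSM object within max_distance (in degrees).
    for point in list(pending.values()):
        best, best_dist = None, max_distance
        for obj in without_ref:
            dist = math.hypot(obj['lat'] - point['lat'],
                              obj['lon'] - point['lon'])
            if dist <= best_dist:
                best, best_dist = obj, dist
        if best is not None:
            without_ref.remove(best)
            del pending[str(point['id'])]
            result['modify'].append((point, best))

    # Step 3: the dataset points still pending become new nodes.
    result['create'] = list(pending.values())
    # Step 4: remaining OSM objects are to be deleted or retagged.
    result['leftover'] = without_ref
    return result


# Made-up data for illustration.
data = [
    {'id': 1, 'lat': 55.0, 'lon': 37.0, 'tags': {}},
    {'id': 2, 'lat': 55.01, 'lon': 37.01, 'tags': {}},
    {'id': 3, 'lat': 56.0, 'lon': 38.0, 'tags': {}},
]
osm = [
    {'id': 10, 'lat': 55.5, 'lon': 37.5, 'tags': {'ref:example': '1'}},
    {'id': 11, 'lat': 55.2, 'lon': 37.2, 'tags': {'ref:example': '99'}},
    {'id': 12, 'lat': 55.0101, 'lon': 37.0101, 'tags': {}},
    {'id': 13, 'lat': 50.0, 'lon': 30.0, 'tags': {}},
]
matched = conflate(data, osm, 'ref:example')
```

In this sample, point 1 is matched by identifier, point 2 by distance, point 3 becomes a new node, object 11 carries an obsolete identifier, and object 13 is left over.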

Finally, the conflator produces an osmChange file out of the list it prepared, and writes it to the console or to an output file.
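For readers unfamiliar with the format, here is a minimal sketch of writing an osmChange document with the Python standard library. It handles nodes only, and the function names and the generator string are assumptions of this example; the output of conflate.py itself may be structured differently.

```python
import xml.etree.ElementTree as ET


def add_node(parent, node):
    """Append a <node> element with its tags to an action block."""
    attrs = {'id': str(node['id']),
             'lat': str(node['lat']),
             'lon': str(node['lon'])}
    if 'version' in node:
        # Existing objects carry the version they were downloaded with.
        attrs['version'] = str(node['version'])
    element = ET.SubElement(parent, 'node', attrs)
    for key, value in sorted(node.get('tags', {}).items()):
        ET.SubElement(element, 'tag', k=key, v=value)
    return element


def to_osmchange(create=(), modify=(), delete=()):
    """Serialize three lists of node dicts into an osmChange string."""
    root = ET.Element('osmChange', version='0.6', generator='example sketch')
    for action, nodes in (('create', create),
                          ('modify', modify),
                          ('delete', delete)):
        if nodes:
            block = ET.SubElement(root, action)
            for node in nodes:
                add_node(block, node)
    return ET.tostring(root, encoding='unicode')


# Made-up objects; new nodes conventionally get negative placeholder ids.
new_node = {'id': -1, 'lat': 55.75, 'lon': 37.62,
            'tags': {'amenity': 'vending_machine'}}
changed = {'id': 1234, 'version': 3, 'lat': 55.76, 'lon': 37.63,
           'tags': {'amenity': 'vending_machine', 'ref': '101'}}
osc = to_osmchange(create=[new_node], modify=[changed])
print(osc)
```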

Usage

You will need Python 3. Clone the GitHub repository, install the required Python packages (kdtree and requests), and run ./conflate.py -h for an overview of the options. It is recommended to run the conflation with these options:

   ./conflate.py <profile.py> -v -o result.osc

After testing the importing process, you should follow the import guidelines. At the least, you should post to your regional forum or to the imports@ mailing list:

  • What you are planning to import.
  • Why the license of the dataset is compatible with the OSM Contributor Terms (CC0 and PD sources are okay; CC-BY and more restrictive licenses would require permission from the owner).
  • How many relevant objects there are in OSM now, how many will be added and how many will be updated (the conflator prints these numbers).
  • Link to the profile you are using, and to a sample osmChange file that it produced.
  • A date for the final import, if there are no major objections. Give it a week or two, depending on the size of the import.

Profiles

There are example profiles, some of which were used for actual imports, in the profiles directory. Do study these, starting with moscow_parkomats.py.

The conflator asks a profile for the following fields, any of which can be Python functions:

dataset
A function that should return a list of dicts with 'id', 'lat', 'lon' and 'tags' keys. It receives a file object as its first parameter: either an actual file or a wrapper around downloaded data.
download_url
When there is no source file, data is downloaded from this URL and passed to the dataset function.
query
A list of key-value pairs for building an Overpass API query. These tuples are processed like this:
  • ("key",) → ["key"]
  • ("key", None) → [!"key"]
  • ("key", "value") → ["key"="value"]
  • ("key", "~value") → ["key"~"value"]
qualifies
A function that receives a dict of tags and returns True if the OSM object with these tags should be matched to the dataset. All objects received from the Overpass API or an OSM XML file are passed through this function.
bbox
A bounding box for the Overpass API query. If True (the default), it is produced from all the source points. If False, it is the entire world. Specify a list of four numbers to use a custom bounding box: [minlat, minlon, maxlat, maxlon].
dataset_id
An identifier to construct the ref:whatever tag, which would hold the unique identifier for every matched object.
no_dataset_id
A boolean value, False by default. If True, the dataset_id is not required: the script won't store identifiers in objects, relying on geometric matching every time.
max_distance
Maximum distance in degrees for matching objects from a dataset and OSM. Default is 0.001 (roughly 110 meters).
master_tags
A list of keys whose values from the dataset should always replace the values on matched OSM objects.
source
Value of the source=* tag.
delete_unmatched
If set to True, unmatched OSM points will be deleted. Default is False: they are retagged instead.
tag_unmatched
A dict of tags to set or replace on unmatched OSM objects. Used when delete_unmatched is False or when the object is an area: areas cannot be deleted with this script.
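To show how the fields fit together, here is a sketch of what a profile file might look like. Everything in it is hypothetical: the source name, the placeholder URL, the identifier example_parking, the qualifies rule, and the input format with number, latitude, longitude and hours fields. The example profiles in the repository remain the authoritative reference.

```python
import io
import json

# All values below are hypothetical and only illustrate the fields above.
source = 'Example City Open Data'
download_url = 'https://example.com/parking.json'  # placeholder URL
dataset_id = 'example_parking'  # objects would get ref:example_parking=*
no_dataset_id = False
query = [('amenity', 'vending_machine'), ('vending', 'parking_tickets')]
bbox = True                     # derive the bounding box from the points
max_distance = 0.0005           # in degrees
master_tags = ['ref']           # dataset values always win for these keys
delete_unmatched = False
tag_unmatched = {
    'fixme': 'This object might have been dismantled, please check',
}


def qualifies(tags):
    """Skip private machines (an arbitrary rule for this example)."""
    return tags.get('access') != 'private'


def dataset(fileobj):
    """Convert the assumed source format into the list the conflator expects."""
    raw = fileobj.read()
    if isinstance(raw, bytes):
        raw = raw.decode('utf-8')
    points = []
    for record in json.loads(raw):
        tags = {
            'amenity': 'vending_machine',
            'vending': 'parking_tickets',
            'ref': str(record['number']),
        }
        if record.get('hours'):
            tags['opening_hours'] = record['hours']
        points.append({
            'id': record['number'],
            'lat': record['latitude'],
            'lon': record['longitude'],
            'tags': tags,
        })
    return points


# Quick check of the dataset function; not part of a real profile.
sample = io.StringIO(
    '[{"number": 7, "latitude": 55.75, "longitude": 37.62, "hours": "24/7"}]')
points = dataset(sample)
print(points)
```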

See Also