OSM Conflator is a Python script that merges a third-party dataset that has coordinates with OpenStreetMap data. It produces an osmChange file ready to be validated and uploaded with JOSM or bulk_upload.py. The script was inspired by Osmsync and, to repeat the warning from its page: as with any other automated edit, it is essential that people read and adhere to the Automated Edits code of conduct.
The script was made by MAPS.ME and is developed on GitHub: https://github.com/mapsme/osm_conflate
How It Works
First, it asks a profile (see below) for a dataset. In the simplest case (which does not even require you to write any code), a dataset is a JSON array of objects with id, lat, lon and tags fields. "tags" is an object with OpenStreetMap tags: amenity, name, opening_hours, ref and so on. Some of these keys are marked authoritative: when merging, their values replace the values from OSM. For example, we may be sure that parking zones are correctly numbered in the source dataset, but still allow mappers to correct opening hours.
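As an illustration of the simplest case, the following sketch builds such a JSON array and checks that every object carries the four fields. The identifiers, coordinates and tag values are invented sample data, and the validate helper is a hypothetical convenience, not part of the conflator.

```python
import json

# Hypothetical sample data: identifiers, coordinates and tags are made up.
dataset = [
    {
        "id": "1001",
        "lat": 55.7558,
        "lon": 37.6173,
        "tags": {"amenity": "parking", "ref": "1001", "opening_hours": "24/7"},
    },
    {
        "id": "1002",
        "lat": 55.7601,
        "lon": 37.6250,
        "tags": {"amenity": "parking", "ref": "1002"},
    },
]


def validate(points):
    """Raise ValueError unless every point has id, lat, lon and tags."""
    for point in points:
        missing = {"id", "lat", "lon", "tags"} - point.keys()
        if missing:
            raise ValueError(f"point {point.get('id')!r} lacks {sorted(missing)}")
        if not (-90 <= point["lat"] <= 90 and -180 <= point["lon"] <= 180):
            raise ValueError(f"point {point['id']!r} has coordinates out of range")
    return points


# Serialize the validated points as the JSON array described above.
json_text = json.dumps(validate(dataset), indent=2, ensure_ascii=False)
```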
Then the conflator makes a query to the Overpass API. It uses a bounding box that covers all dataset points, and a set of tags from which it builds the query. Alternatively, it can use an OSM XML file, filtered by the query tags and, if needed, more rigorously with a profile function.
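A rough sketch of how such a query could be assembled is shown here. It applies the tuple forms listed in the profile section and a bounding box computed from the points. The function names, the timeout value and the choice of output statement are assumptions made for illustration; the script's actual query may differ.

```python
def bounding_box(points):
    """Return (south, west, north, east) covering all dataset points."""
    lats = [p["lat"] for p in points]
    lons = [p["lon"] for p in points]
    return min(lats), min(lons), max(lats), max(lons)


def tag_filter(item):
    """Turn one query tuple into an Overpass QL tag filter."""
    key = item[0]
    if len(item) == 1:
        return f'["{key}"]'            # key present, any value
    value = item[1]
    if value is None:
        return f'[!"{key}"]'           # key must be absent
    if value.startswith("~"):
        return f'["{key}"~"{value[1:]}"]'  # regular expression match
    return f'["{key}"="{value}"]'      # exact value


def build_query(points, query_tags):
    """Build an Overpass QL query for nodes and ways inside the bbox."""
    south, west, north, east = bounding_box(points)
    filters = "".join(tag_filter(item) for item in query_tags)
    bbox = f"({south},{west},{north},{east})"
    return (
        "[out:xml][timeout:120];"
        f"(node{filters}{bbox};way{filters}{bbox};);"
        "out meta center;"
    )
```

Requesting "out meta center" is one way to get version numbers for later modification and a single coordinate for each way, which a point-based matcher needs.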
Matching consists of several steps:
- If the dataset has a dataset_id, the script searches for OSM objects with ref:dataset_id tag (e.g. ref:mos_parking=*) and updates their tags based on dataset points with matching identifiers. Objects with obsolete identifiers are deleted.
- Then it finds the closest matching OSM object for each dataset point. The maximum distance is specified in the profile and can vary depending on the dataset quality. Tags on these objects are also updated, respecting the list of authoritative keys.
- Unmatched dataset points are added as new nodes with a full set of tags from the profile.
- Remaining OSM objects are either deleted or given a few extra tags, such as fixme=This object might have been dismantled, please check.
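The four steps above can be sketched as follows. This is a simplified illustration, not the script's code: it uses a brute-force search over all pairs instead of a k-d tree, measures distance as a plain Euclidean distance in degrees, and assumes that non-authoritative dataset tags only fill in keys missing from OSM. The dict layout of OSM objects and all function names are hypothetical.

```python
import math


def merge_tags(osm_tags, point, ref_key, master_tags):
    """Merge dataset tags into OSM tags; authoritative keys overwrite."""
    tags = dict(osm_tags)
    for key, value in point["tags"].items():
        # Assumption: other keys are only added when OSM lacks them.
        if key in master_tags or key not in tags:
            tags[key] = value
    tags[ref_key] = point["id"]
    return tags


def conflate(points, osm_objects, ref_key, max_distance=0.001, master_tags=()):
    pending = {p["id"]: p for p in points}
    modify, delete, remaining = [], [], []

    # Step 1: match on the stored identifier; unknown identifiers are obsolete.
    for obj in osm_objects:
        ref = obj["tags"].get(ref_key)
        if ref is None:
            remaining.append(obj)
        elif ref in pending:
            point = pending.pop(ref)
            modify.append((obj, merge_tags(obj["tags"], point, ref_key, master_tags)))
        else:
            delete.append(obj)

    # Step 2: pair the rest by distance, closest pairs first.
    pairs = sorted(
        (math.hypot(p["lat"] - o["lat"], p["lon"] - o["lon"]), pid, idx)
        for pid, p in pending.items()
        for idx, o in enumerate(remaining)
    )
    used = set()
    for distance, pid, idx in pairs:
        if distance > max_distance:
            break
        if pid in pending and idx not in used:
            used.add(idx)
            obj = remaining[idx]
            point = pending.pop(pid)
            modify.append((obj, merge_tags(obj["tags"], point, ref_key, master_tags)))

    # Step 3: unmatched dataset points become new nodes.
    create = list(pending.values())
    # Step 4: leftover OSM objects are for the caller to delete or retag.
    leftover = [o for idx, o in enumerate(remaining) if idx not in used]
    return {"modify": modify, "create": create, "delete": delete, "leftover": leftover}
```

The brute-force pairing is quadratic in the number of objects, which is fine for a demonstration; a spatial index is the usual choice for larger datasets.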
Finally, the conflator produces an osmChange file out of the list it prepared, and writes it to the console or to an output file.
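For readers unfamiliar with the format, here is a minimal sketch of serializing a list of changes as osmChange with the standard library. It handles nodes only, and the input dict layout, the function name and the generator string are assumptions for illustration rather than the script's own serializer.

```python
import xml.etree.ElementTree as ET


def to_osmchange(create=(), modify=(), delete=()):
    """Serialize node dicts into an osmChange document (as a string)."""
    root = ET.Element("osmChange", version="0.6", generator="conflation sketch")
    for action, elements in (("create", create), ("modify", modify), ("delete", delete)):
        if not elements:
            continue
        block = ET.SubElement(root, action)
        for element in elements:
            # New nodes carry negative placeholder ids; existing ones also
            # need their current version number.
            attrs = {
                key: str(element[key])
                for key in ("id", "version", "lat", "lon")
                if key in element
            }
            node = ET.SubElement(block, "node", attrs)
            for key, value in sorted(element.get("tags", {}).items()):
                ET.SubElement(node, "tag", k=key, v=value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical usage: one new node and one deletion.
xml_text = to_osmchange(
    create=[{"id": -1, "lat": 55.75, "lon": 37.61, "tags": {"amenity": "parking"}}],
    delete=[{"id": 123, "version": 2, "lat": 1.0, "lon": 2.0}],
)
```

The resulting string can be printed to the console or written to a file and then inspected in JOSM before any upload.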
You need Python 3. Clone the GitHub repository, install the required Python packages (kdtree and requests), and run ./conflate.py -h for an overview of the options. It is recommended to run the conflation with these options:
./conflate.py <profile.py> -v -o result.osc
- What you are planning to import.
- Why the license for the dataset is compatible with the OSM Contributor Terms (CC0 and PD sources are okay; CC-BY and more restrictive licenses require permission from the owner).
- How many relevant objects there are in OSM now, how many will be altered and how many will be updated (the conflator prints these numbers).
- A link to the profile you are using, and to a sample osmChange file that it produced.
- A date for the final import, if there are no major objections. Give it a week or two, depending on the import size.
The conflator asks a profile for the following fields, any of which can be Python functions:
- A function that should return a list of dicts with 'id', 'lat', 'lon' and 'tags' keys. It receives a file object as its first parameter: either an actual file or a wrapper around downloaded data.
- When there is no source file, data is downloaded from this URL and passed to the dataset function.
- A list of key-value pairs for building an Overpass API query. These tuples are processed like this:
  - ("key",) → ["key"]
  - ("key", None) → [!"key"]
  - ("key", "value") → ["key"="value"]
  - ("key", "~value") → ["key"~"value"]
- A function that receives a dict of tags and returns True if the OSM object with these tags should be matched to the dataset. All objects received from Overpass API or OSM XML file are passed through this function.
- A bounding box for the Overpass API query. If True (the default), it is produced from all the source points. If False, it is the entire world. Specify a list of four numbers to use a custom bounding box: [minlat, minlon, maxlat, maxlon].
- An identifier to construct the ref:whatever tag, which would hold the unique identifier for every matched object.
- A boolean value, False by default. If True, the dataset_id is not required: the script won't store identifiers in objects, relying on geometric matching every time.
- Maximum distance in degrees for matching objects from a dataset and OSM. Default is 0.001 (roughly 110 meters).
- A list of keys whose dataset values should always replace the values on matched OSM objects.
- Value of the source=* tag.
- If set to True, unmatched OSM points will be deleted. Default is False: they are retagged instead.
- A dict of tags to set or replace on unmatched OSM objects. Used when delete_unmatched is False or when the object is an area: areas cannot be deleted with this script.
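To show how these fields might fit together, here is a sketch of a small profile module. Only dataset_id and delete_unmatched are named in the text above; every other variable name below is an assumption and should be checked against the sample profiles in the repository. All values, including the source name and the qualifying rule, are invented for illustration.

```python
import json

# Hypothetical profile: every value is illustrative, and variable names
# other than dataset_id and delete_unmatched are assumptions.
source = "Example Parking Authority"   # value of the source=* tag
dataset_id = "example_parking"         # objects would get ref:example_parking=*
query = [("amenity", "parking")]       # becomes ["amenity"="parking"]
bbox = True                            # derive the bounding box from the points
max_distance = 0.001                   # degrees, roughly 110 meters
master_tags = ["ref"]                  # dataset values always win for these keys
delete_unmatched = False               # retag unmatched objects instead
tag_unmatched = {
    "fixme": "This object might have been dismantled, please check.",
}


def qualifies(tags):
    """Skip private parking, which this made-up dataset does not cover."""
    return tags.get("access") != "private"


def dataset(fileobj):
    """Read a JSON array of points and normalize the field types."""
    return [
        {
            "id": str(item["id"]),
            "lat": float(item["lat"]),
            "lon": float(item["lon"]),
            "tags": dict(item["tags"]),
        }
        for item in json.load(fileobj)
    ]
```

Normalizing identifiers to strings keeps them comparable with tag values read from OSM, which are always strings.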