This is a plugin for conflating (merging) objects in JOSM. For now it focuses on one-to-one matches between POIs, such as addresses and buildings, parks, schools, bus stops, etc. It leverages the Replace Geometry command from utilsplugin2 to replace one OSM object with another, or to upgrade it.
- 1 Motivating use cases
- 2 Installation
- 3 Definitions
- 4 Usage
- 5 Implementation
- 6 Future work
- 7 Development
- 8 See also
- 9 External links
Motivating use cases
In 2009, feature data from GNIS was imported into OSM as nodes. Since then, some of these nodes have been converted to areas, moved, or even replaced. In the process the gnis:feature_id=* tag has sometimes been lost, and conflation could add back the correct ID. GNIS has also provided updates since 2009, which could be added to OSM with this tool. The conflation plugin could merge this data, with a strong weighting towards matching gnis:feature_id=* and name=*.
Replacing address nodes with building polygons
Consider replacing address nodes in OSM with high-quality building data from a local authority. If the buildings have addresses associated with them, then matches would be heavily weighted towards matching addr:housenumber=* and addr:street=*. If there is no such addressing information, the data could still be conflated by setting a threshold on the distance of perhaps 20 meters.
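The weighting described above can be sketched as a small cost function. This is only an illustration under stated assumptions, not the plugin's actual scoring: the dictionary layout, the function names, and the mismatch penalty of 1000 are hypothetical, and the 20-meter threshold is the value suggested in the text.

```python
import math

MAX_DISTANCE_M = 20.0       # distance threshold suggested in the text
ADDRESS_MISMATCH = 1000.0   # hypothetical penalty, chosen only for illustration
ADDRESS_KEYS = ("addr:housenumber", "addr:street")

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 coordinates."""
    radius = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))

def match_cost(node, building):
    """Return a cost (lower is better), or None if the pair is not a candidate."""
    d = haversine_m(node["lat"], node["lon"], building["lat"], building["lon"])
    both_addressed = all(
        k in node["tags"] and k in building["tags"] for k in ADDRESS_KEYS
    )
    if both_addressed:
        same = all(node["tags"][k] == building["tags"][k] for k in ADDRESS_KEYS)
        # Address agreement dominates; distance only breaks ties.
        return d + (0.0 if same else ADDRESS_MISMATCH)
    # No usable address information: fall back to a pure distance threshold.
    return d if d <= MAX_DISTANCE_M else None
```

With this sketch, an untagged node about 11 meters from a building is a candidate, while one about 110 meters away is rejected.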
Updating Building Footprints
Buildings in OSM have been imported or traced by hand. When an external data source is released, you may want to update each building in OSM to use the best footprint from either source.
Optimal parking space assignment
This is for fun rather than for conflating data with OSM. Imagine you have a neighborhood where you've mapped all the parking spaces and the houses. You can "conflate" the two using only distance in the cost calculation to determine the optimal assignment of houses to parking spaces. If each house gets two parking spaces, you could duplicate (copy and paste) the house nodes/polygons, and you'd get two spaces matched to each house.
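The parking example is an instance of the assignment problem. The sketch below brute-forces it for a tiny neighborhood to show the idea, including the trick of duplicating houses. It is not how the plugin computes matches; the function name and the coordinates are made up for illustration, and brute force is only practical for a handful of objects.

```python
import math
from itertools import permutations

def optimal_assignment(houses, spaces):
    """Try every way of giving each house a distinct space; keep the cheapest.

    Returns (assignment, total_distance), where assignment[i] is the index
    of the space given to houses[i].
    """
    best_cost, best = math.inf, None
    for perm in permutations(range(len(spaces)), len(houses)):
        cost = sum(math.dist(houses[i], spaces[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best = cost, perm
    return best, best_cost

# Hypothetical planar coordinates in meters.
houses = [(0, 0), (10, 0)]
spaces = [(1, 0), (9, 0), (2, 0), (8, 0)]

# Duplicate each house so that it receives two spaces.
doubled = [h for h in houses for _ in range(2)]
assignment, total = optimal_assignment(doubled, spaces)
```

Here both copies of the first house end up with the spaces at x=1 and x=2, and both copies of the second house with the spaces at x=8 and x=9.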
Installation
Follow these instructions to install the conflation plugin automatically. If you plan on matching more than 500 objects, you should use a 64-bit Java VM so that you don't run out of memory.
Definitions
- Reference: the dataset presumed to have the best spatial accuracy and attribute (tag) comprehensiveness, typically a third-party dataset such as one from a government entity
- Subject: the dataset into which the geometry and/or attributes of the reference dataset will be merged, typically an up-to-date dataset downloaded from the OSM servers
Usage
The plugin requires that the Reference dataset be in a format that JOSM can load (an OSM file). First download the area that you would like to edit, then use the File | Open command to load the external OSM file as a new layer. If you open the external file first and then download, JOSM will download the data into the existing layer, which makes selections very difficult.
- If not already shown, enable the conflation dialog from the left panel.
- Click Configure, select some objects with better geometry and/or tags, and click Freeze in the Reference panel.
- Do the same for some other objects in the Subject panel.
- Click OK.
Making the selection requires the Edit | Search window. If you are conflating ways, select only ways and no points. Conversely, if you are conflating points, make sure no ways are selected. The plugin does not currently support relations, so they should never be selected. For example, the search "building=* type:way" correctly selects buildings; the type:way term is needed to ensure that no relations or points are selected. It is not possible to simply select an area with the mouse.
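To make the effect of the type:way term concrete, here is a rough Python emulation of what a search like "building=* type:way" keeps and drops. It only mimics the filtering logic on plain dictionaries; the `select` helper and the sample objects are hypothetical and have nothing to do with JOSM's own search implementation.

```python
def select(objects, key, osm_type):
    """Keep objects of the given type that carry the given tag key with any value."""
    return [o for o in objects if o["type"] == osm_type and key in o["tags"]]

# Made-up sample data covering the cases mentioned above.
objects = [
    {"id": 1, "type": "way", "tags": {"building": "yes"}},
    {"id": 2, "type": "relation", "tags": {"building": "yes", "type": "multipolygon"}},
    {"id": 3, "type": "node", "tags": {"building": "entrance"}},
    {"id": 4, "type": "way", "tags": {"highway": "residential"}},
]

# Equivalent in spirit to the search "building=* type:way".
buildings = select(objects, "building", "way")
```

Only the first object survives: the relation and the node carry a building tag but have the wrong type, and the last way has no building tag.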
The conflation dialog has three tabs: Matches, which lists the matches found between the reference and subject layers, and Reference only and Subject only, which list the unmatched objects from each dataset.
- Double-click on a match or unmatched object to zoom to, center on, and select the object(s).
- Select one or more matches and click Conflate to execute Replace Geometry. Select one or more objects from the Reference only list to copy them to the subject layer.
Note that you can also use the same selection for both the reference and the subject; the algorithm assigns a very high cost when both objects are the same, so an object should never be matched with itself.
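The effect of that very high cost can be illustrated with a minimal sketch. The penalty value, the function names, and the sample coordinates are assumptions made for this example, and picking the cheapest partner per object is a simplification rather than the one-to-one matching the plugin performs.

```python
import math

# Hypothetical "very high" cost; the plugin's actual value is not stated here.
SAME_OBJECT_COST = 1e9

def pair_cost(ref, subj):
    """Distance-based cost, except that pairing an object with itself is penalized."""
    if ref["id"] == subj["id"]:
        return SAME_OBJECT_COST
    return math.dist(ref["xy"], subj["xy"])

def best_partner(ref, subjects):
    """Return the subject with the lowest cost for this reference object."""
    return min(subjects, key=lambda s: pair_cost(ref, s))

# Made-up planar coordinates; the same selection is used on both sides.
selection = [
    {"id": 1, "xy": (0.0, 0.0)},
    {"id": 2, "xy": (1.0, 0.0)},
    {"id": 3, "xy": (10.0, 0.0)},
]
partners = {o["id"]: best_partner(o, selection)["id"] for o in selection}
```

Without the penalty every object would pick itself at distance zero; with it, each object is paired with its nearest distinct neighbor.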
Implementation
- Reuses existing search and selection functionality for creating reference and subject selections
- Uses Java Conflation Suite (JCS) and Java Topology Suite (JTS) for calculating scores between reference and subject objects, and generating the matches
- Uses Replace Geometry from utilsplugin2 to conflate the objects
Future work
This is an incomplete and unordered list of tasks that I've thought of. If these or any others would be useful to you, please create a ticket for them.
- Allow for merging of just tags, and not geometry
- Add costs for "distance" between tags, e.g. by using Levenshtein distance (already implemented in JOSM core)
- Add thresholds and weights to individual costs (e.g. don't allow match if distance is greater than 100m)
- Allow clicking on arrow in conflation layer to select candidate match
- Allow declaring an invalid match, and using that information to regenerate the matches
- Add keyboard shortcuts to quickly go through matches, with keys for next, previous, merge, "mismatch", etc.
- Use color coding in arrows and table to indicate confidence of match
- Allow for custom cost and assignment methods
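As a rough idea of how the tag "distance" and threshold items in the list above might fit together, here is a standalone Python sketch. It does not use JOSM's own Levenshtein implementation, and the function names, the normalization by name length, the 100 m cutoff as a default, and the name weight of 50 are all assumptions made for illustration.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]

def name_cost(ref_tags, subj_tags):
    """Normalized edit distance between name=* values, from 0.0 to 1.0.

    Two objects without a name get 0.0 here, which is a simplification.
    """
    a = ref_tags.get("name", "").lower()
    b = subj_tags.get("name", "").lower()
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0

def combined_cost(distance_m, ref_tags, subj_tags,
                  max_distance_m=100.0, name_weight=50.0):
    """Weighted sum of distance and name cost; None if beyond the threshold."""
    if distance_m > max_distance_m:
        return None  # threshold: never match objects this far apart
    return distance_m + name_weight * name_cost(ref_tags, subj_tags)
```

For example, "Lincoln School" and "Lincoln Elementary School" differ by 11 edits out of 25 characters, giving a name cost of 0.44.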
- Conflation/Nodes - initial design sketch