Automated edits/TTmechanicalupdates/Fix issue with duplicated inner polygons in Canada

From OpenStreetMap Wiki

Who

TomTom team using TTmechanicalupdates bot account.

The team can be contacted at OSM@tomtom.com.

Why

Phase 2

During the first iteration of the bot it was discovered that:

  • there are cases with 3 or more duplicated ways: initially our bot handled only pairs of duplicated ways (exactly 2), so 151,339 issues were skipped in the first bot iteration;
  • the "double inner polygons" issue also comes from sources other than CanVec: 21,628 issues were skipped because their source tag did not contain CanVec in the value. (This is further explained in the 'Discussion - Phase 1' section of this document.)

These cases are nevertheless incorrect and need to be fixed. We will therefore slightly modify the code to handle such scenarios and run a second phase of the bot. We expect to solve up to 80% of the above issues automatically. (There might still be individual cases that cannot be fixed mechanically.)

Phase 1

Based on Osmose rule 1170, class 1, "Double inner polygon" (the geometry of a multipolygon inner ring is duplicated: one way is in a relation but has no tags, and another has tags but is not part of the relation), we detected approximately 570,000 such issues in Canada alone.

Verification of the data showed that these issues are mainly caused by imports of CanVec data. The CanVec page of the OSM wiki already documents the problem as the "Duplicate land features" issue. According to the OpenStreetMap wiki, if an inner way represents something in itself (e.g., a forest with a hole where the hole is a lake), then the inner way must be tagged as such.


Examples

Phase 2

Example of what will be fixed in Phase 2:

[Image: Example 2.png]

The example above shows a case with more than 2 duplicated ways. Relation 3602683 has way 268798550 as an inner ring; this way has exactly the same geometry as, and shares the same nodes with, inner ring way 268797546 of relation 3602591. A third duplicate, way 268797439, shares the same nodes but is not a member of any relation. The ways that are relation members have no tags, while the duplicating way has tags.

Example of what will not be fixed:

[Image: Example5.png]

Cases like this one won't be fixed because the two duplicating ways, 261328743 and 261328705, have different tags. It is impossible to determine automatically which of them is correct and which should be removed.

Phase 1

[Image: Canada bot examples.png]

The first example shows the relation with ID=3656347. This relation has 116 members, one of which, the way with ID=274712968 (hereinafter the Inner Ring Way), has a duplicating way ID=274712970 (hereinafter the Duplicating Way) that uses the same nodes as the Inner Ring Way but is not a member of relation ID=3656347. In addition, the Inner Ring Way has no tags, while the Duplicating Way has tags, which suggests that the Duplicating Way should be a member of relation ID=3656347.



Algorithm

Phase 2

The bot takes violations from Osmose (rule id 1170, class 1) as input data. For each violation, the data is fetched from OSM and the violation is verified again.

Violations that share way IDs (reported by Osmose as separate violations with a repeated way ID) are grouped into one changeset.
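The grouping step can be sketched as follows. This is a minimal illustration, not the bot's code: it assumes each violation is modelled as the set of way IDs Osmose reported for it, and it treats grouping as transitive (if violation A shares a way with B, and B with C, all three end up together), computed with a union-find structure. Both the data model and the transitive interpretation are assumptions; the function name `group_violations` is hypothetical.

```python
from collections import defaultdict


def group_violations(violations: list[set[int]]) -> list[list[int]]:
    """Group violation indices that share at least one way ID.

    Hypothetical sketch: each violation is the set of way IDs reported for it.
    Groups are connected components, found with union-find.
    """
    parent = list(range(len(violations)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_seen: dict[int, int] = {}  # way ID -> first violation that mentioned it
    for idx, way_ids in enumerate(violations):
        for way_id in way_ids:
            if way_id in first_seen:
                union(first_seen[way_id], idx)
            else:
                first_seen[way_id] = idx

    groups: dict[int, list[int]] = defaultdict(list)
    for idx in range(len(violations)):
        groups[find(idx)].append(idx)
    return list(groups.values())


# Way IDs from the Phase 2 and Phase 1 examples on this page:
print(group_violations([
    {268798550, 268797439},
    {268797546, 268797439},
    {274712968, 274712970},
]))
```

Here the first two violations share way 268797439 and land in one group (one changeset), while the third stays on its own: the call prints `[[0, 1], [2]]`.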

The following verifiers are executed:

  • all ways are closed,
  • all ways have the same nodes (direction of way digitization and starting node can be different),
  • the duplicating way should have tags*,
  • the duplicating way is not a member of any relation,
  • inner ring ways should have no tags,
  • each inner ring way is a member of only one relation (the one from the Osmose violation),
  • optional: the relation and the duplicating way should have a required "source" tag (e.g., "source=CanVec 8.0 - NRCan")**.

*Violations with more than one duplicating way will be skipped.

**In the second iteration, the run will be performed without a source check.
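The geometric and tag checks listed above can be sketched in Python. This is a hedged illustration under stated assumptions, not the implementation in the TomTom repository: the `Way` dataclass, the function names and the sample node lists are hypothetical, and the optional source-tag check is left out. The interesting part is `same_ring`, which accepts a different starting node and a reversed digitization direction by comparing one ring against all rotations of the other, in both directions.

```python
from dataclasses import dataclass, field


@dataclass
class Way:
    """Hypothetical in-memory model of an OSM way (not a real library type)."""
    id: int
    nodes: list[int]  # node IDs in digitization order
    tags: dict[str, str] = field(default_factory=dict)
    relations: set[int] = field(default_factory=set)  # relations it belongs to


def is_closed(way: Way) -> bool:
    # A closed ring repeats its first node at the end (at least 4 node refs).
    return len(way.nodes) >= 4 and way.nodes[0] == way.nodes[-1]


def same_ring(a: Way, b: Way) -> bool:
    """True if two closed ways visit the same nodes in the same cyclic order,
    allowing a different starting node and/or the opposite direction."""
    ra, rb = a.nodes[:-1], b.nodes[:-1]  # drop the repeated closing node
    if len(ra) != len(rb) or set(ra) != set(rb):
        return False
    n = len(ra)
    doubled = ra + ra  # every rotation of ra is a slice of length n
    return any(
        doubled[i:i + n] == candidate
        for candidate in (rb, rb[::-1])
        for i in range(n)
    )


def verify(violation_relations: set[int], inner_rings: list[Way], duplicate: Way) -> bool:
    """Apply the checks listed above to one candidate (source check omitted)."""
    if not all(is_closed(w) for w in inner_rings + [duplicate]):
        return False
    # Every inner ring must use the duplicating way's nodes.
    if not all(same_ring(w, duplicate) for w in inner_rings):
        return False
    # The duplicating way carries the tags and belongs to no relation.
    if not duplicate.tags or duplicate.relations:
        return False
    # Inner rings are untagged and sit in exactly one relation from the violation.
    return all(
        not w.tags and len(w.relations) == 1 and w.relations <= violation_relations
        for w in inner_rings
    )
```

For example, with the IDs from the Phase 1 example and made-up node lists and tags, `verify({3656347}, [Way(274712968, [1, 2, 3, 4, 1], relations={3656347})], Way(274712970, [3, 2, 1, 4, 3], tags={"natural": "water"}))` returns `True`: the duplicate starts at a different node and runs in the opposite direction, which the checks explicitly allow.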

When a violation is confirmed by the bot, data modification is performed.

Data modification is understood as:

Basic scenario:

  • copying the tags from the way that does not belong to any relation to the way that is a member of the violating relation,
  • removing the way that does not belong to any relation.

Three or more duplicated ways scenario:

  • removing the inner ring ways that belong to relations,
  • assigning the duplicating way to those relations.
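The two modification scenarios can be sketched as a function that only plans the edits for one verified violation. The operation tuples (`"set_tags"`, `"replace_member"`, `"delete_way"`), the reduced `Way` dataclass and the function name are hypothetical; the real bot talks to the OSM API, which this sketch does not model. One design detail worth noting: in the three-or-more scenario the sketch swaps each relation membership over to the duplicating way before deleting the inner ring, because a way that is still a relation member cannot be deleted.

```python
from dataclasses import dataclass, field


@dataclass
class Way:
    """Hypothetical, reduced OSM way model: only what the edits need."""
    id: int
    tags: dict[str, str] = field(default_factory=dict)
    relations: set[int] = field(default_factory=set)


def plan_edits(inner_rings: list[Way], duplicate: Way) -> list[tuple]:
    """Plan the edits for one already-verified violation.

    Basic scenario (one inner ring): copy the duplicate's tags onto the
    untagged inner ring, then delete the duplicate.
    Three or more duplicated ways: put the tagged duplicate into each
    relation in place of its inner ring, then delete the inner rings.
    """
    if len(inner_rings) == 1:
        ring = inner_rings[0]
        # The inner ring was verified to be untagged, so "set" equals "copy".
        return [("set_tags", ring.id, dict(duplicate.tags)),
                ("delete_way", duplicate.id)]
    ops: list[tuple] = []
    for ring in inner_rings:
        for rel_id in sorted(ring.relations):
            # Swap membership first; the member role ("inner") is kept.
            ops.append(("replace_member", rel_id, ring.id, duplicate.id))
        ops.append(("delete_way", ring.id))
    return ops


# Phase 2 example from this page (the tags are made up for illustration):
rings = [Way(268798550, relations={3602683}), Way(268797546, relations={3602591})]
for op in plan_edits(rings, Way(268797439, tags={"natural": "water"})):
    print(op)
```

For the Phase 2 example, the plan replaces way 268798550 with 268797439 in relation 3602683 and way 268797546 with 268797439 in relation 3602591, and deletes both former inner rings.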

Phase 1

[Image: Canada bot algorithm.png]

The bot takes violations from Osmose (rule id 1170, class 1) as input data. For each violation, the data is fetched from OSM and the violation is verified again.

The following verifiers are executed:

  • both ways are closed,
  • both ways have the same nodes (direction of way digitization and starting node can be different),
  • the duplicating way should have tags,
  • the duplicating way is not a member of any relation,
  • the inner ring way should have no tags,
  • the inner ring way is a member of only 1 relation (the one from the Osmose violation),
  • optional: the relation and the duplicating way should have a required "source" tag (e.g., "source=CanVec 8.0 - NRCan").
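The optional source verifier can be sketched as a substring test. This page says Phase 1 skipped cases whose source tag did not contain CanVec in the value, so the sketch checks for the literal substring "CanVec" on both the relation and the duplicating way. The exact matching rule used by the bot (case sensitivity, accepted spellings) is an assumption, and the function names are hypothetical.

```python
def has_canvec_source(tags: dict[str, str]) -> bool:
    # Assumption: a case-sensitive substring match on the "source" value.
    return "CanVec" in tags.get("source", "")


def passes_source_check(relation_tags: dict[str, str],
                        duplicate_tags: dict[str, str]) -> bool:
    """Phase 1 optional verifier: both objects must come from CanVec."""
    return has_canvec_source(relation_tags) and has_canvec_source(duplicate_tags)


# Source values taken from the table in the Discussion section:
for value in ["NRCan-CanVec-10.0", "CanVec 6.0 - NRCan",
              "CanVec_Import_2009", "NRCan-CanVec-10.0 + Bing aerial"]:
    print(value, has_canvec_source({"source": value}))
```

Under this rule every non-blank value in that table passes, while relations or ways without a source tag (the "(blank)" row) are rejected.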

When a violation is confirmed by the bot, data modification is performed.

Data modification is understood as:

  • copying the tags from the way that does not belong to any relation to the way that is a member of the violating relation,
  • removing the way that does not belong to any relation.

Source code on GitHub: https://github.com/tomtom-international/osm-bots/tree/main/bot-double-inner-ring

Test Run

Before running the bot on the whole of Canada, we will run the same automated updates on a smaller area. For this, we selected Southwestern Ontario, where the Osmose rule has logged 812 cases.


Bot runs

Phase 2

As in Phase 1, to avoid overloading the system, the bot will be executed in multiple iterations.

Phase 1

To make sure that the system is not overloaded, we plan to run the bot in parts based on Osmose regions. Below you can see the proposed approach:

Order | Osmose region | Count of issues from Osmose
1 | canada_ontario_southwestern_ontario (TEST RUN) | 812
2 | canada_quebec_montreal, canada_quebec_laval, canada_quebec_centre_du_quebec, canada_prince_edward_island, canada_ontario_golden_horseshoe, canada_quebec_estrie, canada_quebec_monteregie | 2176
3 | canada_quebec_chaudiere_appalaches | 1127
4 | canada_quebec_gaspesie_iles_de_la_madeleine | 2116
5 | canada_nunavut | 2561
6 | canada_quebec_bas_saint_laurent | 2662
7 | canada_ontario_eastern_ontario | 5194
8 | canada_british_columbia | 6082
9 | canada_quebec_lanaudiere | 6144
10 | canada_quebec_capitale_nationale | 6433
11 | canada_quebec_laurentides | 6806
12 | canada_ontario_northwestern_ontario | 7562
13 | canada_saskatchewan | 8798
14 | canada_quebec_outaouais | 9093
15 | canada_nova_scotia | 9133
16 | canada_yukon | 9646
17 | canada_new_brunswick | 10037
18 | canada_ontario_central_ontario | 10335
19 | canada_quebec_abitibi_temiscamingue | 11895
20 | canada_newfoundland_and_labrador | 19094
21 | canada_quebec_mauricie | 19518
22 | canada_ontario_northeastern_ontario | 23560
23 | canada_alberta | 39736
24 | canada_quebec_nord_du_quebec | 42628
25 | canada_quebec_saguenay_lac_saint_jean | 48749
26 | canada_northwest_territories | 63542
27 | canada_quebec_cote_nord | 70243
28 | canada_manitoba | 126317


Discussion

Phase 2

The initial phase of the bot was intended to fix data primarily coming from CanVec imports. The main reasons were the following:

  • cases with a "source" tag containing "CanVec" made up the majority of corrupted cases (as shown by the counts in the table in the "Phase 1" section below);
  • thanks to explanations from OSM users (https://lists.openstreetmap.org/pipermail/talk-ca/2021-December/010185.html) and information on wiki pages (https://wiki.openstreetmap.org/wiki/CanVec#Issues_found_in_OSM), the team was sure that these are indeed errors.

However, further analysis has shown that cases with other source values are also incorrect (as they are also logged in Osmose). Moreover, a great majority of them can be solved automatically as well.

Phase 1

The problem was found mainly in data coming from CanVec imports, but a sample check also showed that data without any source tag follows exactly the same pattern. Should we run the bot only on cases where the data was imported from CanVec? Or can data without any source be updated as well?

The values of the "source" tag proposed for the first iteration of the bot, with their counts:

Source of relation (tag: source=*) | Count
NRCan-CanVec-10.0 | 299568
NRCan-CanVec-8.0 | 109732
NRCan-CanVec-7.0 | 102282
CanVec 6.0 - NRCan | 53093
NRCan-CanVec-10.0 + Bing aerial | 1103
NRCan-CanVec-10.0;NRCan-CanVec-8.0 | 187
NRCan-CanVec-10.0 + Bing aerial + DigitalGlobe | 166
CanVec_Import_2009 | 145
CanVec 4.0 - NRCan | 122
NRCan-CanVec-10.0 + Bing aerial + DigitalGlobe | 115
(blank) | 3214

The announcement and discussion were started on the Canada mailing list in December 2021: https://lists.openstreetmap.org/pipermail/talk-ca/2021-December/010184.html


Opt out

To opt out of this automated update, please write an e-mail (in English) to TTmechanicalupdates@groups.tomtom.com describing which area or source version should be excluded from the update scope and why.


When

Phase 2

Runs performed between 1 Feb 2022 and 4 Feb 2022.

Phase 1

Runs and further analysis performed between 27 Dec 2021 and 12 Jan 2022.

Test run

Test run (Southwestern Ontario) done on 13 Dec 2021.

Outcome

This section will be populated as the bot runs.

General summary (phase 1 and 2)

96% of the issues logged by Osmose at the beginning of December were fixed.

Total violations | Fixed violations
572148 | 554258
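The overall figure can be checked against the per-phase summaries further down this page: the fixed counts of Phase 1 (393,411) and Phase 2 (160,847) add up exactly to the 554,258 fixed violations above. The short calculation below reproduces the sum and the fix rate from those published numbers only; the variable names are illustrative.

```python
# Figures copied from the Phase 1 and Phase 2 summary tables on this page.
TOTAL_VIOLATIONS = 572148
FIXED_PHASE_1 = 393411
FIXED_PHASE_2 = 160847

fixed_total = FIXED_PHASE_1 + FIXED_PHASE_2
fix_rate = fixed_total / TOTAL_VIOLATIONS

print(fixed_total)        # 554258
print(f"{fix_rate:.2%}")  # 96.87%
```

The exact rate is about 96.9%, which the summary sentence states as 96%.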

Phase 2 details

Scope: Canada

Start Date: 1 Feb 2022

General summary of the second iteration:

Opened changesets | Total violations | Fixed violations | Found duplicates* | Fixed duplicates | Filtered out by verifiers** | Others rejected***
4234 | 176384 | 160847 | 150012 | 139716 | 176 | 5649

Below you can see the results of the second bot iteration per region:

Run No. | Region | Opened changesets | Total violations | Fixed violations | Found duplicates* | Fixed duplicates | Filtered out by verifiers** | Others rejected***
41 | canada_ontario_southwestern_ontario | 2 | 64 | 24 | 40 | 2 | 0 | 0
49 | canada_quebec_gaspesie_iles_de_la_madeleine | 4 | 63 | 56 | 28 | 24 | 0 | 4
50 | canada_new_brunswick | 61 | 1009 | 887 | 865 | 770 | 1 | 50
51 | canada_quebec_chaudiere_appalaches | 6 | 136 | 92 | 72 | 44 | 0 | 29
52 | canada_quebec_bas_saint_laurent | 10 | 149 | 83 | 72 | 62 | 1 | 55
53 | canada_british_columbia | 5 | 241 | 223 | 32 | 0 | 0 | 34
42 | canada_quebec_montreal | 1 | 6 | 5 | 0 | 0 | 0 | 0
43 | canada_quebec_laval | 1 | 9 | 9 | 0 | 0 | 0 | 0
44 | canada_quebec_centre_du_quebec | 3 | 28 | 27 | 8 | 6 | 0 | 1
45 | canada_prince_edward_island | 2 | 36 | 11 | 24 | 4 | 0 | 25
46 | canada_ontario_golden_horseshoe | 2 | 59 | 55 | 4 | 0 | 0 | 4
47 | canada_quebec_estrie | 3 | 99 | 74 | 24 | 4 | 0 | 25
48 | canada_quebec_monteregie | 2 | 38 | 18 | 24 | 4 | 0 | 20
54 | canada_newfoundland_and_labrador | 20 | 287 | 215 | 186 | 118 | 0 | 4
55 | canada_quebec_capitale_nationale | 12 | 349 | 320 | 108 | 88 | 0 | 10
56 | canada_ontario_eastern_ontario | 5 | 370 | 43 | 322 | 14 | 0 | 43
57 | canada_yukon | 10 | 376 | 279 | 328 | 250 | 0 | 19
58 | canada_nunavut | 10 | 394 | 374 | 388 | 370 | 0 | 12
59 | canada_quebec_lanaudiere | 13 | 591 | 552 | 60 | 44 | 1 | 22
60 | canada_nova_scotia | 22 | 906 | 493 | 426 | 138 | 0 | 195
61 | canada_saskatchewan | 29 | 1229 | 866 | 890 | 798 | 1 | 270
62 | canada_ontario_central_ontario | 24 | 1444 | 297 | 1246 | 114 | 0 | 16
63 | canada_ontario_northeastern_ontario | 223 | 6815 | 5708 | 6078 | 5146 | 4 | 171
64 | canada_quebec_abitibi_temiscamingue | 44 | 2042 | 1829 | 602 | 442 | 2 | 33
65 | canada_quebec_mauricie | 47 | 2128 | 1909 | 746 | 570 | 0 | 43
66 | canada_quebec_saguenay_lac_saint_jean | 102 | 2827 | 1284 | 2570 | 1214 | 0 | 182
67 | canada_quebec_cote_nord | 127 | 5539 | 3537 | 3407 | 2138 | 0 | 1099
68 | canada_quebec_laurentides | 126 | 6211 | 5973 | 204 | 64 | 1 | 15
70 | canada_quebec_outaouais | 143 | 7130 | 6964 | 206 | 46 | 0 | 4
71 | canada_quebec_nord_du_quebec | 475 | 16157 | 15821 | 15722 | 15722 | 0 | 118
72 | canada_northwest_territories | 619 | 21573 | 17303 | 20368 | 17282 | 162 | 1202
73 | canada_alberta | 692 | 31269 | 29133 | 29106 | 28916 | 2 | 1944
74 | canada_manitoba | 1389 | 66810 | 66383 | 65856 | 65322 | 1 | 0

*Found duplicates: cases where there are more than 2 duplicated ways

**Filtered out by verifiers: cases rejected because they do not pass all the algorithm criteria

***Others rejected: other rejection reasons, usually incomplete data; generally not solvable automatically

Phase 1 details

Scope: Canada

Start Date: 27 Dec 2021

General summary of the first bot iteration:

Opened changesets | Total violations | Fixed violations | Filtered duplicates* | Filtered out by verifiers** | Others rejected***
8460 | 572148 | 393411 | 151339 | 21628 | 5770
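The four outcome columns of the Phase 1 summary account for every violation: fixed, filtered duplicates, filtered out by verifiers and others rejected add up exactly to the 572,148 total. The two "filtered" buckets are the cases described in the Phase 2 rationale (3 or more duplicated ways, and non-CanVec sources). The check below uses only the published numbers; the variable names are illustrative.

```python
# Phase 1 summary figures from the table above.
TOTAL = 572148
outcomes = {
    "fixed": 393411,
    "filtered_duplicates": 151339,   # 3 or more duplicated ways
    "filtered_by_verifiers": 21628,  # e.g. source tag without CanVec
    "others_rejected": 5770,
}

# Every violation ends up in exactly one outcome bucket.
assert sum(outcomes.values()) == TOTAL

# The two skip reasons that motivated Phase 2, combined.
skipped_for_phase_2 = outcomes["filtered_duplicates"] + outcomes["filtered_by_verifiers"]
print(skipped_for_phase_2)  # 172967
```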

Below you can see the results of the first bot iteration per region:

Run No. | Region | Opened changesets | Total violations | Fixed violations | Filtered duplicates* | Filtered out by verifiers** | Others rejected***
1 | canada_southwesternontario (test run) | 21 | 1149 | 748 | 80 | 311 | 10
2 | canada_quebec_montreal | 1 | 6 | 1 | 0 | 5 | 0
2 | canada_quebec_laval | 1 | 10 | 1 | 0 | 9 | 0
2 | canada_quebec_centre_du_quebec | 5 | 211 | 183 | 6 | 21 | 1
2 | canada_prince_edward_island | 5 | 244 | 206 | 14 | 7 | 17
2 | canada_ontario_golden_horseshoe | 8 | 360 | 301 | 4 | 55 | 0
2 | canada_quebec_estrie | 14 | 675 | 577 | 24 | 71 | 3
2 | canada_quebec_monteregie | 13 | 672 | 598 | 28 | 36 | 10
3 | canada_quebec_chaudiere_appalaches | 22 | 1127 | 991 | 68 | 48 | 20
4 | canada_quebec_gaspesie_iles_de_la_madeleine | 42 | 2116 | 2053 | 26 | 33 | 4
5 | canada_nunavut | 44 | 2560 | 2162 | 388 | 4 | 6
6 | canada_quebec_bas_saint_laurent | 52 | 2662 | 2550 | 72 | 20 | 20
7 | canada_ontario_eastern_ontario | 98 | 5194 | 4822 | 324 | 29 | 19
8 | canada_british_columbia | 245 | 6086 | 5845 | 16 | 223 | 2
9 | canada_quebec_lanaudiere | 121 | 6144 | 5551 | 60 | 511 | 22
10 | canada_quebec_capitale_nationale | 127 | 6433 | 6077 | 108 | 236 | 12
11 | canada_quebec_laurentides | 79 | 6806 | 585 | 206 | 5914 | 101
12 | canada_ontario_northwestern_ontario | 133 | 7562 | 5973 | 919 | 666 | 4
13 | canada_saskatchewan | 159 | 8798 | 7566 | 890 | 72 | 270
14 | canada_quebec_outaouais | 119 | 9093 | 1951 | 206 | 6930 | 6
15 | canada_nova_scotia | 174 | 9133 | 8170 | 460 | 368 | 135
16 | canada_yukon | 187 | 9644 | 9266 | 330 | 29 | 19
17 | canada_new_brunswick | 184 | 10033 | 8994 | 853 | 142 | 44
18 | canada_ontario_central_ontario | 181 | 10350 | 8886 | 1279 | 171 | 14
19 | canada_quebec_abitibi_temiscamingue | 220 | 11894 | 9851 | 622 | 1387 | 34
20 | canada_newfoundland_and_labrador | 379 | 19093 | 18806 | 186 | 97 | 4
21 | canada_quebec_mauricie | 376 | 19518 | 17388 | 748 | 1339 | 43
22 | canada_ontario_northeastern_ontario | 350 | 23560 | 16718 | 6090 | 581 | 171
23 | canada_alberta | 212 | 39730 | 8469 | 29106 | 211 | 1944
24 | canada_quebec_nord_du_quebec | 534 | 42627 | 26470 | 15940 | 99 | 118
25 | canada_quebec_saguenay_lac_saint_jean | 924 | 48749 | 45919 | 2574 | 74 | 182
26 | canada_northwest_territories | 864 | 63542 | 41789 | 20368 | 183 | 1202
27 | canada_quebec_cote_nord | 1356 | 70055 | 64442 | 3488 | 1545 | 580
28 | canada_manitoba | 1210 | 126312 | 59502 | 65856 | 201 | 753

*Filtered duplicates: cases where there are more than 2 duplicated ways

**Filtered out by verifiers: cases rejected because they do not pass all the algorithm criteria

***Others rejected: other rejection reasons, usually incomplete data; generally not solvable automatically

Test run details

Scope: Canada, Region: Canada_SouthWesternOntario

Date: 13 Dec 2021

Violations source date: 2021.12.14

Total Violations count: 812

Uploaded fixes: 748

Source verifier rejected: 20 (due to absence of source tag on relation and/or way)

Filtered out due to duplicated way id: 40

Incomplete data (both the inner ring way and the duplicating way had tags): 4

Total opened changesets: 21 (114877687, 114877723, 114877771, 114877830, 114880554, 114912615, 114912679, 114912722, 114912759, 114912829, 114912911, 114912962, 114913049, 114913121, 114913138, 114966317, 114966347, 114966387, 114966424, 114966446, 114966479)

Total time of run: approx. 26 minutes

Additional comment: During the first run, we encountered cases with more than 2 duplicated ways. Such cases should not be fixed automatically (the intention was to handle "double" polygons only; other cases might require manual review), so for now we have modified the code to detect and skip them.