Mechanical Edits/Mateusz Konieczny - bot account/fixing malformed surface tags

From OpenStreetMap Wiki

Page content created as advised on Automated_Edits_code_of_conduct#Document and discuss your plans.

This edit will replace malformed surface=* values with their standard equivalents.

Who

I, Mateusz Konieczny, using my bot account.

contact

Message me via OSM. I will also respond to PMs sent to the bot account, though messaging my main account is preferable, as I will get notifications in OSM editors.

English and Polish are preferred; for other languages I need to use an automatic translator.


Why

Why is it useful? It helps newbies avoid becoming confused. It protects against such values becoming established, without the drudgery that manual cleanup would require. It also makes it easier to add missing surface= values and to use OpenStreetMap data, including support in editors which explain or translate the meaning of surface values.

Why an automatic edit? I have a massive queue (thousands and tens of thousands) of automatically detectable issues which are not reported by mainstream validators and which require fixes, where each fix requires review or complete manual cleanup.

There is no point in manual drudgery here, as these values are obviously fixable.

The values listed here do NOT require manual review. If these cases turn out to be a useful signal of invalid editing, then I will keep reviewing nearby areas where the bot edited.

Yes, the bot edit WILL cause objects to be edited. Nevertheless, map data quality will improve as a result.

What

Replace the following surface tags with an automated edit:

obvious typos:

different form than standard surface value:

Polish name to English one:

English vs very close to English but actually different:

Additional enabled by https://lists.openstreetmap.org/pipermail/talk/2023-February/088076.html

see https://community.openstreetmap.org/t/surface-artificial-grass-vs-surface-artificial-turf/6295

the same or even more accurate

something went wrong with autocomplete

name in a different language, but with a clear meaning

in several languages "beton" is the word for concrete; so far I was opening notes and asking in changesets, and in every single case it was a clearly correct replacement

translating from Polish

we do not have unpaved_dirt either

typos

Additional enabled by https://lists.openstreetmap.org/pipermail/talk/2023-March/088125.html

  1. added by TendaiNkomo - see https://www.openstreetmap.org/changeset/67017223 where I tried to contact them
  1. surface=dirt would be incorrect, "dirt road" refers also to surface=compacted
  1. https://www.openstreetmap.org/changeset/48215497 https://www.openstreetmap.org/changeset/67215079 https://www.openstreetmap.org/changeset/25703937
  1. apparently autocomplete accident
  1. low use, detected via detector of values very likely to be typoed or to have a shiFt accident
  2. still verified whether indicating obvious issues
  1. surface=bamboo is not documented, this replacement is still useful
  1. also reviewed, no special comments
  1. more surface values with trailing letter/number
  2. especially 2 and q are common - misclick of tab button? Similarly 1 and 3
  3. and c - misclick of ctrl+c?
  4. and C - misclick of ctrl+c and using shift+c?
  1. I have not reviewed these values specifically - but I reviewed many other single-extra-letter cases
  2. all values here are low use, some may be used once
  3. I expect that reliability here will be the same as sample which I verified based on aerial images
  4. for obvious mistake or indicators of problems
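The notes above mention a detector for values that are probably typos (case accidents, one stray trailing character). A minimal sketch of such a check could look like the following. It is not taken from the bot's source code: the function name, the `KNOWN_VALUES` sample and the `max_extra` parameter are all assumptions made for illustration.

```python
# Hypothetical helper, NOT the detector actually used by the bot.
# KNOWN_VALUES is a small sample chosen for illustration only.
KNOWN_VALUES = {"asphalt", "concrete", "gravel", "ground", "paved", "unpaved", "sand"}


def guess_intended_value(value, known_values=KNOWN_VALUES, max_extra=1):
    """Return the known value that `value` probably stands for, or None."""
    if value in known_values:
        return None  # already a known value, nothing to fix
    if "?" in value:
        return None  # values containing "?" are skipped, as on this page
    # Case accident, for example "GRAVEL" or "Asphalt"
    if value.lower() in known_values:
        return value.lower()
    # Trailing junk, for example "asphalt2", "unpavedq" or "concrete-"
    for extra in range(1, max_extra + 1):
        candidate = value[:-extra]
        if candidate in known_values:
            return candidate
    return None
```

A result from such a helper is only a candidate for review, not a verdict: it would propose "sand" for "sandy", which this page deliberately skipped.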

Note that some values were skipped, for example:

as were some non-obvious typos:

there are also many low-use values with two extra bogus characters, for example

"would be also OK to migrate them without listing them for review here and just add them to replace list? And other similar obvious typos appearing or found in future?" caused no protests and seems to be accepted

extension proposed in Fall 2023

French from https://lists.openstreetmap.org/pipermail/talk/2023-April/088164.html review at https://forum.openstreetmap.fr/t/review-requested-before-proposing-bot-edit-for-automated-fixing-of-surface-values/18419

further obvious typos added started to be replaced in late 2023

"there are also many low-use values with one or two or three extra bogus characters, for example surface = artificial_turf22 → surface = artificial_turf

would be also OK to migrate them without listing them for review here and just add them to replace list later? And other similar obvious typos appearing or found in future?

Only low use obvious mistakes would be changed. If anyone at all will protest and I will not do this and post for review, like list here, once sufficiently many values are found."

https://lists.openstreetmap.org/pipermail/talk/2023-November/088437.html

Numbers

Large enough to make it useful to automate it.

How

state after a mechanical edit:

The changeset will be described and tagged with tags that mark it as automatic, provide a link to the discussion approving the edit, include a link promoting https://matkoniecz.github.io/OSM-wikipedia-tag-validator-reports/ etc.
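As an illustration of the changeset tagging described above, here is a minimal sketch that assembles such tags. The tag keys (`bot`, `mechanical`, `discussion_url`, `documentation_url`) and the function name are assumptions for this example and are not necessarily the ones the bot writes. The 255-character limit on tag keys and values is enforced by the OSM API.

```python
MAX_TAG_LENGTH = 255  # the OSM API rejects longer tag keys and values


def build_changeset_tags(comment, discussion_url, documentation_url):
    """Assemble changeset tags marking an edit as automated (hypothetical keys)."""
    tags = {
        "comment": comment,
        "bot": "yes",          # assumed key marking the edit as made by a bot
        "mechanical": "yes",   # assumed key marking the edit as mechanical
        "discussion_url": discussion_url,        # assumed key name
        "documentation_url": documentation_url,  # assumed key name
    }
    for key, value in tags.items():
        if len(key) > MAX_TAG_LENGTH or len(value) > MAX_TAG_LENGTH:
            raise ValueError(f"changeset tag {key!r} exceeds {MAX_TAG_LENGTH} characters")
    return tags
```

Checking the length before upload is a design choice: an overlong discussion link would otherwise only fail when the changeset is opened.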

Discussion

Discussed on the talk mailing list at https://lists.openstreetmap.org/pipermail/talk/2023-February/088048.html and https://lists.openstreetmap.org/pipermail/talk/2023-February/088076.html

Spring 2023 extension discussed at https://lists.openstreetmap.org/pipermail/talk/2023-March/088125.html

Fall 2023 extension discussed at https://community.openstreetmap.org/t/proposed-bot-edit-automatic-replacement-of-surface-values-where-it-is-safe/105361 and https://forum.openstreetmap.fr/t/review-requested-before-proposing-bot-edit-for-automated-fixing-of-surface-values/18419 and https://lists.openstreetmap.org/pipermail/talk/2023-November/088437.html

Repetition

This is a recurring edit and may be made as soon as new matching elements appear. At the moment, triggering a new edit requires human intervention, so the exact schedule is not predictable and the bot may stop running at any moment.

This can change in the future. If the bot is abandoned and does not run, feel free to ping me. If I am unable to run it any more, feel free to use my code. Note that this may require going through the bot approval process again and that the code is under a specific license.

https://codeberg.org/matkoniecz/OpenStreetMap_cleanup_scripts/src/branch/master/recurrent_bot_edits may have a more up-to-date version of the code than what is listed on this page.

Source code

GPL 3.0 licensed

from osm_bot_abstraction_layer.generic_bot_migrate_values_within_key import fix_bad_values

def key():
    return "surface"

def replacements():
    return {
        "żwirowa": "gravel", # Polish name
        "kostka": "paving_stones", # Polish name
        "gruntowa": "unpaved", # Polish
        "asfalt": "asphalt", # English vs very close to English but actually different
        "paving stones": "paving_stones", # obvious typo
        "Paving_stones": "paving_stones", # obvious typo
        "paving_stones:": "paving_stones", # obvious typo
        "wooden": "wood", # different form of the same
        "cobblestones": "cobblestone", # different form of the same

        "artificial_grass": "artificial_turf", # https://community.openstreetmap.org/t/surface-artificial-grass-vs-surface-artificial-turf/6295

        # opened notes for some of them and some turned out to not be even made of actual bark...
        "barkchips": "woodchips",
        "bark_wood": "woodchips",

        # something went wrong with autocomplete
        "as": "asphalt",
        "asp": "asphalt",
        "grav": "gravel",
        "pebb": "pebblestone",

        "asfalto": "asphalt", # seems like name in a different language, but with a clear meaning
        "beton": "concrete", # in several languages "beton" is word for concrete, so far I was opening notes and asking in changesets and in every single case it was a a clearly correct replacement
        "ziemna": "earth", # translating from Polish
        "unpaved_gravel": "gravel", # we do not have unpaved_dirt either

        # typos
        "ashpalt": "asphalt",
        "Asphalt": "asphalt",
        "ashalt": "asphalt",
        "aspahlt": "asphalt",
        "ashphalt": "asphalt",
        "paving_stone": "paving_stones",
        "Paving Stone": "paving_stones",
        "paving_stoness": "paving_stones",
        "wood_chips": "woodchips",
        "woodchip": "woodchips",
        "wood chips": "woodchips",
        "wood_chippings": "woodchips",
        "peeblestone": "pebblestone",
        "pebbles": "pebblestone",
        "pebblestones": "pebblestone",
        "pebbelstone": "pebblestone",
        "pepplestone": "pebblestone",
        "pebble": "pebblestone",
        "pavedq": "paved",
        "pavedc": "paved",
        "pavedw": "paved",
        "unapved": "unpaved",
        "groundц": "ground",
        "groundmm": "ground",
        "grround": "ground",
        "groundc": "ground",
        "gorund": "ground",
        "grounD": "ground",
        "concreate": "concrete",
        "concrete\\": "concrete",
        "gravelw": "gravel",
        "Gravel": "gravel",
        "fine gravel": "fine_gravel",
        "fine_gravelC": "fine_gravel",
        "Boardwalk": "boardwalk", # (many of them should be surface=wood, but that is a good step already)
        "Metal": "metal",
        "gras": "grass",
        "grasss": "grass",
        "concrete:plate": "concrete:plates",
        "Cobblestone:flattened": "cobblestone:flattened",
        "cobbelstone:flattened": "cobblestone:flattened",
        "cobblestone:flatten": "cobblestone:flattened",
        "cobblestone:flattended": "cobblestone:flattened",
        "cobblestone:flattend": "cobblestone:flattened",
        "cobblestone:flatened": "cobblestone:flattened",

        # tried to use them as detectors of bogus data, neither were useful at all for this purpose
        'unpaved33': 'unpaved',
        "unpaved_minor": "unpaved", # added by TendaiNkomo - see https://www.openstreetmap.org/changeset/67017223 "You added large amount of surface=unpaved_minor tags (basically all of them) Can you explain meaning of this tag? How it differs from surface=unpaved?"
        "unpaved_major": "unpaved", # is it also TendaiNkomo ? Yes - 26 out of 34
        "unsealed": "unpaved", # definitely not paved (and not even good surface=compacted), just about 150 cases worldwide
        'synthetic_grass': 'artificial_turf',
        'asphalt_no_1': 'asphalt', # https://www.openstreetmap.org/changeset/54370818 (Corban8)
        'asphalt deg 3': 'asphalt', # https://www.openstreetmap.org/changeset/44625597 (Torbat_streets added more than 121 of 140), https://www.openstreetmap.org/changeset/44625597 added some first but was reverted, adding 12 next to Torbat
        'planks': 'wood',
        'cobblestone_flattened': 'cobblestone:flattened',
        'dirt road': 'unpaved', # surface=dirt would be incorrect, "dirt road" refers also to surface=compacted
        'Hard_Court': 'hard_court', # tennis pitches
        'groun': 'ground',
        'groud': 'ground',
        'groundw': 'ground',
        'ground2': 'ground',
        'paved2': 'paved',
        'gravel2': 'gravel',
        'asphalt22': 'asphalt',
        "concrete2": "concrete",
        "unpaved3": "unpaved",
        'unpaved22': 'unpaved',
        'asphalt2': 'asphalt', # https://www.openstreetmap.org/changeset/127040059
        "compacted_gravel": "compacted", # https://www.openstreetmap.org/changeset/90182476
        'unsurfaced': 'unpaved', # https://www.openstreetmap.org/changeset/18511043
        'plank': 'wood',
        'wooden_planks': 'wood',
        'wood_chip': 'woodchips',

        # asked on 2023-03-20
        'cobbled': 'cobblestone', # https://www.openstreetmap.org/changeset/48215497 https://www.openstreetmap.org/changeset/67215079 https://www.openstreetmap.org/changeset/25703937

        # apparently autocomplete accident
        'un': 'unpaved',
        'compact': 'compacted',

        # low use, detected via detector of values very likely to be typoed or to have a shiFt accident
        # still verified whether indicating obvious issues
        'Concrete': 'concrete',
        'GRAVEL': 'gravel',
        'Compacted': 'compacted',

        'Bamboo': 'bamboo', # surface=bamboo is not documented, this replacement is still useful

        # all surface values documented as existing ones with trailing letter/number?
        # especially 2 and q are common - misclick of tab button? Similarly 1 and 3
        # and c - misclick of ctrl+c?
        # and C - misclick of ctrl+c and using shift+c?
        'unpavedc': 'unpaved',
        'grounds': 'ground',
        'gravelc': 'gravel',
        'unpaveds': 'unpaved',
        'unpaved*': 'unpaved',

        # I have not reviewed these values specifically - but I reviewed many other single-extra-letter cases
        # all values here are low use, some may be used once
        # I expect that reliability here will be the same as sample which I verified based on aerial images
        # for obvious mistake or indicators of problems
        'asphalt3': 'asphalt',
        'asphaltd': 'asphalt',
        'asphalt;': 'asphalt',
        'asphalts': 'asphalt',
        'asphaltz': 'asphalt',
        'asphaltc': 'asphalt',
        'asphaltN': 'asphalt',
        'asphaltn': 'asphalt',
        'asphaltl': 'asphalt',
        'asphalth': 'asphalt',
        'asphaltC': 'asphalt',
        'asphaltu': 'asphalt',
        'asphalt-': 'asphalt',
        'asphaltr': 'asphalt',
        'asphalt1': 'asphalt',
        'concretef': 'concrete',
        'concretev': 'concrete',
        'concrete6': 'concrete',
        'concretee': 'concrete',
        'concretew': 'concrete',
        'concretec': 'concrete',
        'concrete`': 'concrete',
        'concreteo': 'concrete',
        'concrete5': 'concrete',
        'concretex': 'concrete',
        'concreted': 'concrete',
        'concrete3': 'concrete',
        'concretem': 'concrete',
        'concrete-': 'concrete',
        'concreteŒ': 'concrete',
        # surface=sandy was skipped
        'sand1': 'sand',
        'sand]': 'sand',
        'sandw': 'sand',
        'sand`': 'sand',
        'sand-': 'sand',
        'sands': 'sand',
        'sand3': 'sand',
        'sandq': 'sand',
        'dirt+': 'dirt',
        'dirt;': 'dirt',
        'dirt-': 'dirt',
        'dirt1': 'dirt',
        'dirt2': 'dirt',
        'groundz': 'ground',
        'groundC': 'ground',
        'groundf': 'ground',
        'ground;': 'ground',
        'ground=': 'ground',
        'ground4': 'ground',
        'groundq': 'ground',
        'groundo': 'ground',
        'grounda': 'ground',
        'ground,': 'ground',
        'ground-': 'ground',
        'ground\\': 'ground',
        'paving_stones;': 'paving_stones',
        'paving_stones-': 'paving_stones',
        'paving_stones3': 'paving_stones',
        'paving_stonesq': 'paving_stones',
        'paving_stonesm': 'paving_stones',
        'grassm': 'grass',
        'grassr': 'grass',
        'grasso': 'grass',
        'grassO': 'grass',
        'grass/': 'grass',
        # surface=grassy was skipped
        'gravelv': 'gravel',
        'gravel.': 'gravel',
        'gravel+': 'gravel',
        'gravelq': 'gravel',
        'gravel-': 'gravel',
        'gravel{': 'gravel',
        'gravel1': 'gravel',
        'gravel;': 'gravel',
        'gravels': 'gravel',
        'gravel∑': 'gravel',
        # surface=gravely was excluded
        'compacted-': 'compacted',
        'compacted`': 'compacted',
        'compacted=': 'compacted',
        'compactedц': 'compacted',
        'unpavedù': 'unpaved',
        'unpaved5': 'unpaved',
        'unpaved.': 'unpaved',
        'unpaved,': 'unpaved',
        'unpavedz': 'unpaved',
        'paved`': 'paved',
        'paveds': 'paved',
        'paveda': 'paved',
        'wood3': 'wood',
        'woodw': 'wood',
        'wood=': 'wood',
        'wood2': 'wood',
        'wood1': 'wood',
        'sett7': 'sett',
        'settc': 'sett',
        'setts': 'sett',
        'settц': 'sett',
        'unpavedq': 'unpaved',
        'unpavedS': 'unpaved',
        'unpavedm': 'unpaved',
        'unpaveda': 'unpaved',
        'unpaved-': 'unpaved',
        'unpaved=': 'unpaved',
        'unpavedC': 'unpaved',
        'mudd': 'mud',
        # surface=mud? was skipped
        # all values with ? in them were skipped from detector of likely misspellings

        # there are also many low-use values with two extra bogus characters
        'concrete22': 'concrete',

        'astroturf': 'artificial_turf',
        'timber': 'wood', # probably should be 'wood', asked on 2023-03-16 and earlier, see https://www.openstreetmap.org/changeset/66866027 https://www.openstreetmap.org/changeset/126078123 https://www.openstreetmap.org/changeset/68461319 https://www.openstreetmap.org/changeset/69445813 https://www.openstreetmap.org/changeset/57280475 https://www.openstreetmap.org/changeset/126800407
        'DIRT': 'dirt',
        'paving': 'paved', # https://www.openstreetmap.org/changeset/131627421 has ready question for mappers
        "U": 'unpaved', # yes, it seems reliable (but independent rechecking of what remained is welcome)

        # French from https://lists.openstreetmap.org/pipermail/talk/2023-April/088164.html
        # review at https://forum.openstreetmap.fr/t/review-requested-before-proposing-bot-edit-for-automated-fixing-of-surface-values/18419
        'enrobé': 'asphalt',
        'béton_bitumineux': 'asphalt',
        'béton_bitumimeux': 'asphalt',
        'bitumen': 'asphalt',
        'enrobes': 'asphalt',
        'bitume': 'asphalt',
        'goudronné': 'asphalt',
        'gourdon': 'asphalt',

        'plastique': 'plastic', 

        'banc_de_sable': 'sand',

        'terre': 'earth',
        'Terre': 'earth',

        'terre_boue': 'mud',
        'terre_humide': 'mud', 
        'terre,_boue': 'mud',

        'graviers': 'gravel',

        'tere': 'earth',
        'terre2': 'earth',

        'caoutchouc': 'rubber',
        'bois': 'wood', # but see https://www.openstreetmap.org/note/3841055

        'béton_désactivé': 'concrete',


        'terre_batue': 'clay',

        'pavés': 'paved', # paving_stones or sett, impossible to guess which one

        'carrelage': 'tiles', 

        'pelouse_et_terre': 'grass;ground',
        'terre_touvenant': 'ground',
        'terre;herbe': 'ground;grass',
        'terre_naturelle,_argileuse': 'ground',

        'gravier': 'gravel',
        'gazon_synthétique': 'artificial_turf',
        'béton': 'concrete',
        'ciment': 'concrete',

        'herbe': 'grass',
        'gazon': 'grass',
        'herb': 'grass',
        'herbe_naturel': 'grass',
        'pelouse': 'grass',
        'pelouse_naturelle': 'grass',

        # invalid values to invalid values, but new ones are clearly detectable as invalid
        'terre_et_rochers': 'ground;rock',
        'béton_bois': 'concrete;wood',
        'terre,_cailloux': 'ground;gravel',
        'terre_et_herbe': 'earth;grass',
        'herbe_et_sable': 'grass;sand',
        'sable_et_terre': 'sand;earth',
        'terre_et_sable': 'earth;sand',
        'terre_cailloux': 'ground;gravel',
        'terre_goudrons': 'ground;asphalt',
        'terre_goudron': 'ground;asphalt',
        'terre_pierres': 'ground;gravel',
        'terre_et_pierre': 'ground;gravel',
        'terr_et_pierre': 'ground;gravel',
        'terre/sable': 'ground;sand',
        'terre_et_pierres': 'ground;gravel',
        'Herbe_et_cailloux': 'grass;gravel',
        'terre_et_gravier': 'ground;gravel',
        'gravillons,_béton': 'gravel;concrete',
        'graviers_et_terre': 'gravel;ground',

        'sable': 'sand',
        'Sable': 'sand',
        'cailloux': 'gravel',
        'pierre': 'gravel',
        'gravier0': 'gravel',

        # end of French

        'asphalt_on_concrete_sub-base': 'asphalt', # https://www.openstreetmap.org/changeset/30055680 - asked on 2023-03-20 (mapper is inactive, commented that they are fine with this change)

        # low use, detected via detector of values very likely to be typoed or to have a shiFt accident
        # still verified whether indicating obvious issues
        'Sand': 'sand',
        'unhewn_cobblestones': 'unhewn_cobblestone',

        # translating German
        'holz': 'wood',
        'schotter': 'gravel',

        # its Latin-script equivalent, asfalt, means asphalt
        'асфальт': 'asphalt',

        # all surface values documented as existing ones with trailing letter/number?
        # especially 2 and q are common - misclick of tab button? Similarly 1 and 3
        # and c - misclick of ctrl+c?
        # and C - misclick of ctrl+c and using shift+c?
        'concrete1': 'concrete',
        'compactedc': 'compacted',
        'concreteq': 'concrete',
        'paving_stones2': 'paving_stones',

        # what about say surface=gravel22 and similar accidents?

        'asphaltq3': 'asphalt',
        'asphaltcc': 'asphalt',
        'asphaltqq': 'asphalt',


        'unpaved--': 'unpaved',
        'unpavedMN': 'unpaved',


        # moooreee
        'a': 'asphalt', # apparently it is reliably surface=asphalt
        'asphalt:chipseal': 'chipseal',
        'pavement': 'paved',
        'ground_dirt': 'dirt',
        'tiled': 'paved', # tiles or paving_stones
        'astro_turf': 'artificial_turf',
        'wood_planks': 'wood',
        'AS': 'asphalt',
        'flooring_tiles': 'tiles',
        'lawn': 'grass',
        'Brick': 'brick',
        'Artificial Turf': 'artificial_turf',
        'ea': 'earth',
        'dit': 'dirt',
        'wooden deck': 'wood',
        'paved_minor': 'paved',
        's-and': 'sand',
        'cobble': 'cobblestone',

        # trivial typo fixes, performed a light review here
        'unpaved2': 'unpaved',
        'paving_stones4=': 'paving_stones',
        'concretefl': 'concrete',
        'asphaltww': 'asphalt',
        'grassq': 'grass',
        'unpaved∂': 'unpaved',
        'unhewn-cobblestone': 'unhewn_cobblestone',
        'unpavedŒ': 'unpaved',
        'paving_stonesv': 'paving_stones',
        'unpavedNo': 'unpaved',
        'gravel3': 'gravel',
        'sett9': 'sett',
        'concreteas': 'concrete',
        'paved3': 'paved',
        'asphaltŒ': 'asphalt',
        'asphalt_': 'asphalt',
        'asphaltw': 'asphalt',
        'asphaltq': 'asphalt',
        'grass2': 'grass',
        'paved22': 'paved',
        'dirtц': 'dirt',
        'unpaved1': 'unpaved',

        # straight to bot edit, without explicit review
        # see
        # https://wiki.openstreetmap.org/wiki/Mechanical_Edits/Mateusz_Konieczny_-_bot_account/fixing_malformed_surface_tags#further_obvious_typos_added_started_to_be_replaced_in_late_2023
        'dirtv': 'dirt',
        'dirts': 'dirt',
        'concretep': 'concrete',
        'asphalt@': 'asphalt',
        'asphalt.': 'asphalt',
        'asphalt4': 'asphalt',
        'asphalteq': 'asphalt',
        'unpavedw': 'unpaved',
        'compacteds': 'compacted',
        'ground22': 'ground',
        'asphaltπ': 'asphalt',
        'asphalt++': 'asphalt',
        'paving_stones33': 'paving_stones',
        'paving_stones22': 'paving_stones',
        'paving_stones🍿r': 'paving_stones',
        'asphalt∏': 'asphalt',
        'sand,': 'sand',
        'groundv': 'ground',
        'fine_gravelm': 'fine_gravel',
        'grassw': 'grass',
        'groundww': 'ground',
        'dirt3': 'dirt',
        'asphalti': 'asphalt',
        'asphalt+': 'asphalt',
        'asphalta': 'asphalt',
        'asphalt<': 'asphalt',
        'asphalt==': 'asphalt',
        'compacted2': 'compacted',
        'metal2': 'metal',
        'concretez': 'concrete',
        }

if __name__ == "__main__":
    fix_bad_values(
        editing_on_key = key(),
        replacement_dictionary = replacements(),
        cache_folder_filepath = '/media/mateusz/OSM_cache/osm_bot_cache',
        is_in_manual_mode=False,
        discussion_url="https://lists.openstreetmap.org/pipermail/talk/2023-February/088048.html and other see https://wiki.openstreetmap.org/wiki/Mechanical_Edits/Mateusz_Konieczny_-_bot_account/fixing_malformed_surface_tags#Discussion",
        osm_wiki_documentation_page='https://wiki.openstreetmap.org/wiki/Mechanical_Edits/Mateusz_Konieczny_-_bot_account/fixing_malformed_surface_tags',
    )
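A replacement dictionary of this size is easy to get subtly wrong, so it can help to check it before running the edit. The sketch below is not part of osm_bot_abstraction_layer; `check_replacements` and `apply_replacement` are hypothetical helpers. The first flags entries that map a value to itself or to another value that is itself scheduled for replacement; the second shows the intended effect on a single object's tags.

```python
def check_replacements(replacements):
    """Return a list of human-readable problems found in a replacement mapping."""
    problems = []
    for bad, good in replacements.items():
        if bad == good:
            problems.append(f"{bad!r} maps to itself")
        elif good in replacements:
            problems.append(f"{bad!r} maps to {good!r}, which is itself replaced")
    return problems


def apply_replacement(tags, key, replacements):
    """Return a copy of `tags` with tags[key] fixed if it is a listed bad value."""
    fixed = dict(tags)
    value = fixed.get(key)
    if value in replacements:
        fixed[key] = replacements[value]
    return fixed


if __name__ == "__main__":
    sample = {"asphalt2": "asphalt", "Gravel": "gravel"}
    print(check_replacements(sample))
    print(apply_replacement({"highway": "track", "surface": "Gravel"}, "surface", sample))
```

Note that a duplicated key in a Python dict literal cannot be detected this way, because the later entry silently overwrites the earlier one before any check runs.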

Opt-out

Please write in the bot approval thread. Note that in case of opt-out, exactly the same edit will be made manually for objects where the bot opt-out was used.

See also