Mechanical Edits/Mateusz Konieczny - bot account/fix descriptive names in several obvious cases

From OpenStreetMap Wiki

Page content created as advised on Automated Edits code of conduct#Document and discuss your plans.

Who

I, Mateusz Konieczny, using my bot account

Contact

Message me via OSM. I will also respond to PMs sent to the bot account. In both cases I will be notified about incoming PMs via email and via notifications in OSM editors.

What

Removing descriptive names where the situation is clear and they can be safely removed.

Why

Descriptive names that merely repeat tags are useless. What is worse, they may mislead new mappers into adding them as well.

Such name tags should be removed.


Numbers

Depends on how many new descriptive names appear, which in turn depends on editing activity in OSM.

How

Automatically remove obvious descriptive names (obvious cases only, not all suspect objects)

There are object types where mappers relatively often add an invalid name tag that repeats the object type, and where this is obvious enough that it can be fixed remotely.

I was doing it for some objects, and in some cases such names are often combined with very problematic tagging nearby (I can link some queries if anyone wants).

But for some objects the use of obvious descriptive names is quite popular, to the point that manual fixing cannot keep up AND it is possible to fix it with a bot edit AND other tagging in the area is typically fine. Sometimes there are clusters of other objects with descriptive names, but these can be found independently.

And yes, "Toilet" can be signed, but that does not make it a name, just like "Toilet, 2 euro fee" is not a name either. Likewise, a sign pointing toward a bunker with a "Bunker" label does not indicate that the bunker is named Bunker.

See https://community.openstreetmap.org/t/is-name-toilet-even-theorethically-valid-for-amenity-toilets/105540 where this was discussed.

See also the approved bot edit doing this for viewpoints: https://wiki.openstreetmap.org/wiki/Mechanical_Edits/Mateusz_Konieczny_-_bot_account/remove_obvious_descriptive_names_for_viewpoints_(obvious_cases_only,_not_all_suspect_objects)

I also noticed that the OsmAnd edit plugin is overrepresented in adding such bad pseudonames and tracked the problem down to bad interface design, which I reported to its authors in https://github.com/osmandapp/OsmAnd/issues/18651. So far they claim that it is not a problem. I plan to compile some statistics making clear that their editor is causing problems in this specific area.

I propose to run an automated cleanup for multiple types of objects. In each case it would also remove capitalisation variants (so not only name=toilet but also name=Toilet and name=TOILET). In each case only objects tagged as a single type would be processed. For example, amenity=toilets with name=Toilet but also waterway=waterfall would not be edited, as it has an unexpected tag.
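The capitalisation rule can be illustrated with a minimal sketch. The function name `is_descriptive_name` is hypothetical and is not taken from the bot code; it only shows one way to compare a name against a list of descriptive names while ignoring case.

```python
def is_descriptive_name(name, bad_names):
    """Return True if name equals any entry of bad_names, ignoring case."""
    normalized = name.casefold()
    return any(normalized == bad.casefold() for bad in bad_names)


# name=toilet, name=Toilet and name=TOILET are all treated the same way
for candidate in ["toilet", "Toilet", "TOILET", "Toilet, 2 euro fee"]:
    print(candidate, "->", is_descriptive_name(candidate, ["toilet", "toilets", "wc"]))
```

Only an exact match (up to capitalisation) counts here, so a longer value that merely contains the word is left alone.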


Note that this relies on the assumption that an object tagged like

- amenity=toilets
- name=Toilet

is always a case of misusing the name tag.


Objects which carry unexpected tags, tags not typical for the given object type, or note/fixme tags will be skipped. So for example

- man_made=cairn
- amenity=restaurant
- name=Cairn

will not be touched. In the theoretical case of a restaurant named "Cairn" which is also a cairn, such an object will not be modified at all.

Cases like

- tourism=viewpoint
- waterway=waterfall
- name=Viewpoint

or

- tourism=viewpoint
- name=Viewpoint
- note=Actually named "Viewpoint"

would not be edited either, as they carry extra unexpected tags.

(Though I do not expect the last case to ever be validly tagged...)

Obviously, objects with just

- name=Toilet

would not be edited in this edit (as both an object type and a name are required).

- tourism=viewpoint
- fee=yes
- name=VIEWPOINT

would be edited. The same applies to other tags that are expected attributes of viewpoints.
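The examples above can be summarised as an allowlist check. The following is only a sketch of that rule, not the bot's actual implementation: the function `can_be_edited` and the set `viewpoint_allowed` are hypothetical, and the allowed keys are picked just to reproduce the examples from this page (the key lists really used by the bot are given in its source code).

```python
def can_be_edited(tags, key, value, bad_names, allowed_keys):
    """Return True only if the object has the given type, a descriptive
    name, and no tags beyond the explicitly allowed ones."""
    if tags.get(key) != value:
        return False  # not the object type being processed
    name = tags.get("name")
    if name is None:
        return False  # nothing to remove
    if name.casefold() not in {bad.casefold() for bad in bad_names}:
        return False  # name is not on the list of descriptive names
    for other_key in tags:
        if other_key in (key, "name"):
            continue
        if other_key not in allowed_keys:
            return False  # unexpected tag, leave the object alone
    return True


# illustrative allowlist, an assumption made for this example only
viewpoint_allowed = {"fee", "ele", "direction"}
bad = ["viewpoint"]

examples = [
    {"tourism": "viewpoint", "fee": "yes", "name": "VIEWPOINT"},
    {"tourism": "viewpoint", "waterway": "waterfall", "name": "Viewpoint"},
    {"tourism": "viewpoint", "name": "Viewpoint", "note": 'Actually named "Viewpoint"'},
    {"name": "Viewpoint"},
]
for tags in examples:
    print(tags, "->", can_be_edited(tags, "tourism", "viewpoint", bad, viewpoint_allowed))
```

Using an allowlist rather than a list of forbidden tags means that any tag nobody thought about in advance blocks the edit by default.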


The bot edit would be worldwide, split into parts, with a separate run for each object type. Edits would be repeated in the future.
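As an illustration of what "a separate run for each object type" could look like, here is a sketch that builds one Overpass QL query per object type to preselect candidates. This is an assumption about a possible approach, not a description of how the bot downloads its data; the function name `overpass_query` and the timeout value are hypothetical. The query only finds objects with a matching type and name, and the check for unexpected tags would still have to happen afterwards.

```python
def overpass_query(key, value, bad_names):
    """Build an Overpass QL query for objects of one type whose name
    matches one of bad_names, ignoring case."""
    for name in bad_names:
        # names are pasted into a regular expression, so accept only
        # letters and spaces to avoid any need for escaping
        if not all(ch.isalpha() or ch == " " for ch in name):
            raise ValueError(f"unexpected character in name: {name!r}")
    pattern = "^(" + "|".join(bad_names) + ")$"
    return (
        "[out:json][timeout:300];\n"
        f'nwr["{key}"="{value}"]["name"~"{pattern}",i];\n'
        "out meta;"
    )


print(overpass_query("amenity", "toilets", ["toilet", "toilets", "wc"]))
```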

Note: as required by the Automated Edits code of conduct, the bot proposal will also be posted on the talk mailing list.

Comments are welcome, both if you see problems with this edit and if you support it (though upvoting also works, I guess).

The following types of objects will be included in this edit (the initial series was done only for viewpoints)


Bot source code

The bot uses the https://github.com/matkoniecz/osm_bot_abstraction_layer library. This code is licensed under GNU GPLv3.


import shared_cleanup_of_descriptive_names
from osm_bot_abstraction_layer.tag_knowledge import list_of_address_tags
import osm_bot_abstraction_layer.language_tag_knowledge as language_tag_knowledge
from osm_bot_abstraction_layer.utils import tag_in_wikimedia_syntax

# https://community.openstreetmap.org/t/bot-edit-proposal-automatically-remove-obvious-descriptive-names-obvious-cases-only-not-all-suspect-objects/107393

"""
Automatically remove obvious descriptive names (obvious cases only, not all suspect objects)

This is basically the same as https://community.openstreetmap.org/t/bot-edit-proposal-automatically-remove-obvious-descriptive-names-obvious-cases-only-not-all-suspect-objects/107393

Some minor feedback was taken into account.

There are object types where mappers relatively often add an invalid name tag that repeats the object type, and where this is obvious enough that it can be fixed remotely.

I was doing it for some objects, and in some cases such names are often combined with very problematic tagging nearby (I can link some queries if anyone wants).

But for some objects the use of obvious descriptive names is quite popular, to the point that manual fixing cannot keep up AND it is possible to fix it with a bot edit AND other tagging in the area is typically fine. Sometimes there are clusters of other objects with descriptive names, but these can be found independently.

And yes, "Toilet" can be signed, but that does not make it a name, just like "Toilet, 2 euro fee" is not a name either. Likewise, a sign pointing toward a bunker with a "Bunker" label does not indicate that the bunker is named Bunker.

See https://community.openstreetmap.org/t/is-name-toilet-even-theorethically-valid-for-amenity-toilets/105540 where this was discussed.

See also the approved bot edit doing this for viewpoints: https://wiki.openstreetmap.org/wiki/Mechanical_Edits/Mateusz_Konieczny_-_bot_account/remove_obvious_descriptive_names_for_viewpoints_(obvious_cases_only,_not_all_suspect_objects)

I also noticed that the OsmAnd edit plugin is overrepresented in adding such bad pseudonames and tracked the problem down to bad interface design, which I reported to its authors in https://github.com/osmandapp/OsmAnd/issues/18651. So far they claim that it is not a problem. I plan to compile some statistics making clear that their editor is causing problems in this specific area.

I propose to run an automated cleanup for multiple types of objects. In each case it would also remove capitalisation variants (so not only name=toilet but also name=Toilet and name=TOILET). In each case only objects tagged as a single type would be processed.
For example, amenity=toilets with name=Toilet but also waterway=waterfall would not be edited, as it has an unexpected tag.


Note that this relies on the assumption that an object tagged like

- amenity=toilets
- name=Toilet

is always a case of misusing the name tag.


Objects which carry unexpected tags, tags not typical for the given object type, or note/fixme tags will be skipped. So for example

- man_made=cairn
- amenity=restaurant
- name=Cairn

will not be touched. In the theoretical case of a restaurant named "Cairn" which is also a cairn, such an object will not be modified at all.

Cases like

- tourism=viewpoint
- waterway=waterfall
- name=Viewpoint 

or

- tourism=viewpoint
- name=Viewpoint
- note=Actually named "Viewpoint"

would not be edited either as they carry extra unexpected tags.

(Though I do not expect last case to be ever validly tagged...)

Obviously, objects with just

- name=Toilet

would not be edited in this edit (as both an object type and a name are required).

- tourism=viewpoint
- fee=yes
- name=VIEWPOINT

would be edited. The same applies to other tags that are expected attributes of viewpoints.


The bot edit would be worldwide, split into parts, with a separate run for each object type. Edits would be repeated in the future.

Note: as required by the Automated Edits code of conduct, the bot proposal will also be posted on the talk mailing list.

Comments are welcome, both if you see problems with this edit and if you support it (though upvoting also works, I guess).

The following types of objects will be included in this edit (the initial series was done only for viewpoints):

- waterway = waterfall with name waterfall
  - only following additional keys are allowed, presence of any other tags will block an automated edit:
    - height
    - name (tags starting from "name" repeat for each object type listed here)
    - source
    - created_by
    - layer
    - is_in
    - url
    - mapillary
    - image
    - wikimedia_commons
    - flickr
    - check_date
    - survey:date
    - source:date
    - is_in:country
    - is_in:state

- amenity = bench with name bench
  - only following additional keys are allowed, presence of any other tags will block an automated edit:
    - backrest
    - material
    - surface
    - seats
    - capacity
    - ele
    - colour
    - inscription
    - access
    - covered
    - name
    - source
    - created_by
    - layer
    - is_in
    - url
    - mapillary
    - image
    - wikimedia_commons
    - flickr
    - check_date
    - survey:date
    - source:date
    - is_in:country
    - is_in:state

- leisure = playground with name playground
  - only following additional keys are allowed, presence of any other tags will block an automated edit:
    - surface
    - access
    - max_age
    - min_age
    - playground:theme
    - name
    - source
    - created_by
    - layer
    - is_in
    - url
    - mapillary
    - image
    - wikimedia_commons
    - flickr
    - check_date
    - survey:date
    - source:date
    - is_in:country
    - is_in:state

- tourism = viewpoint with names viewpoint, punkt widokowy (“punkt widokowy” is in Polish, I am native speaker of Polish)
  - only following additional keys are allowed, presence of any other tags will block an automated edit:
    - direction
    - ele
    - wheelchair
    - opening_hours
    - area
    - name
    - source
    - created_by
    - layer
    - is_in
    - url
    - mapillary
    - image
    - wikimedia_commons
    - flickr
    - check_date
    - survey:date
    - source:date
    - is_in:country
    - is_in:state

- man_made = cairn with name cairn
  - only following additional keys are allowed, presence of any other tags will block an automated edit:
    - ele
    - name
    - source
    - created_by
    - layer
    - is_in
    - url
    - mapillary
    - image
    - wikimedia_commons
    - flickr
    - check_date
    - survey:date
    - source:date
    - is_in:country
    - is_in:state

- military = bunker with name bunker
  - only following additional keys are allowed, presence of any other tags will block an automated edit:
    - ruins
    - building
    - abandoned
    - disused
    - bunker_type
    - historic
    - name
    - source
    - created_by
    - layer
    - is_in
    - url
    - mapillary
    - image
    - wikimedia_commons
    - flickr
    - check_date
    - survey:date
    - source:date
    - is_in:country
    - is_in:state

- amenity = drinking_water with names drinking water, water, potable water
  - only following additional keys are allowed, presence of any other tags will block an automated edit:
    - ele
    - fee
    - charge
    - access
    - drinking_water
    - bottle
    - owner
    - cold_water
    - operator
    - indoor
    - covered
    - lit
    - wheelchair
    - name
    - source
    - created_by
    - layer
    - is_in
    - url
    - mapillary
    - image
    - wikimedia_commons
    - flickr
    - check_date
    - survey:date
    - source:date
    - is_in:country
    - is_in:state

- tourism = camp_site with names camp site, campsite
  - only following additional keys are allowed, presence of any other tags will block an automated edit:
    - water_source
    - capacity:tents
    - capacity:caravans
    - caravans
    - tents
    - drinking_water
    - fee
    - charge
    - payment:cash
    - payment:contactless
    - payment:credit_cards
    - power_supply
    - shower
    - toilets
    - name
    - source
    - created_by
    - layer
    - is_in
    - url
    - mapillary
    - image
    - wikimedia_commons
    - flickr
    - check_date
    - survey:date
    - source:date
    - is_in:country
    - is_in:state

- landuse = quarry with name quarry
  - only following additional keys are allowed, presence of any other tags will block an automated edit:
    - resource
    - mineral
    - name
    - source
    - created_by
    - layer
    - is_in
    - url
    - mapillary
    - image
    - wikimedia_commons
    - flickr
    - check_date
    - survey:date
    - source:date
    - is_in:country
    - is_in:state

- natural = beach with names beach, plaża (“plaża” is in Polish)
  - only following additional keys are allowed, presence of any other tags will block an automated edit:
    - surface
    - access
    - lifeguard
    - supervised
    - fee
    - charge
    - operator
    - name
    - source
    - created_by
    - layer
    - is_in
    - url
    - mapillary
    - image
    - wikimedia_commons
    - flickr
    - check_date
    - survey:date
    - source:date
    - is_in:country
    - is_in:state

- amenity = post_box with names post box, collection box, mailbox, letter box, drop box
  - only following additional keys are allowed, presence of any other tags will block an automated edit:
    - operator
    - operator:short
    - operator:type
    - operator:wikidata
    - operator:wikipedia
    - drive_through
    - collection_times
    - royal_cypher
    - royal_cypher:wikidata
    - ref
    - collection_times:signed
    - post_box:type
    - brand
    - brand:wikidata
    - brand:wikipedia
    - name
    - source
    - created_by
    - layer
    - is_in
    - url
    - mapillary
    - image
    - wikimedia_commons
    - flickr
    - check_date
    - survey:date
    - source:date
    - is_in:country
    - is_in:state

- landuse = grass with name grass
  - only following additional keys are allowed, presence of any other tags will block an automated edit:
    - name
    - source
    - created_by
    - layer
    - is_in
    - url
    - mapillary
    - image
    - wikimedia_commons
    - flickr
    - check_date
    - survey:date
    - source:date
    - is_in:country
    - is_in:state

- amenity = toilets with names toilet, toilets, toalety, toaleta, wc (toalety, toaleta - that is in Polish, not English)
  - only following additional keys are allowed, presence of any other tags will block an automated edit:
    - fee
    - charge
    - operational_status
    - operator
    - operator:type
    - wheelchair
    - check_date
    - toilets:handwashing
    - toilets:position
    - toilets:disposal
    - unisex
    - male
    - female
    - currency:RUB
    - opening_hours
    - toilets:wheelchair
    - changing_table
    - flood_prone
    - indoor
    - access
    - lit
    - toilets:access
    - toilets:num_chambers
    - source:form
    - handwashing
    - wheelchair:description
    - gender
    - level
    - supervised
    - lit
    - addr:city
    - addr:town
    - addr:place
    - addr:street
    - addr:housenumber
    - addr:postcode
    - addr:unit
    - addr:state
    - phone
    - contact:phone
    - addr:country
    - addr:suburb
    - addr:county
    - addr:district
    - addr:community
    - addr:subcounty
    - addr:village
    - addr:parish
    - addr:district
    - addr:settlement
    - addr:zone
    - addr:clan
    - addr:ward
    - addr:block
    - addr:full
    - addr:neighbourhood
    - addr:district
    - addr:subcamp
    - name
    - source
    - created_by
    - layer
    - is_in
    - url
    - mapillary
    - image
    - wikimedia_commons
    - flickr
    - check_date
    - survey:date
    - source:date
    - is_in:country
    - is_in:state

- amenity = parking with name parking
  - only following additional keys are allowed, presence of any other tags will block an automated edit:
    - parking
    - access
    - surface
    - fee
    - charge
    - hgv
    - lit
    - maxstay
    - smoothness
    - supervised
    - phone
    - website
    - capacity
    - addr:city
    - addr:town
    - addr:place
    - addr:street
    - addr:housenumber
    - addr:postcode
    - addr:unit
    - addr:state
    - phone
    - contact:phone
    - addr:country
    - addr:suburb
    - addr:county
    - addr:district
    - addr:community
    - addr:subcounty
    - addr:village
    - addr:parish
    - addr:district
    - addr:settlement
    - addr:zone
    - addr:clan
    - addr:ward
    - addr:block
    - addr:full
    - addr:neighbourhood
    - addr:district
    - addr:subcamp
    - name
    - source
    - created_by
    - layer
    - is_in
    - url
    - mapillary
    - image
    - wikimedia_commons
    - flickr
    - check_date
    - survey:date
    - source:date
    - is_in:country
    - is_in:state

"""

def data():
    """
    Return the configuration of every object type processed by this bot.

    Template for a new entry:
        {
            'key': "",
            'value': "",
            'bad_names': [""],
            'banned_keys': [],
            'allowed_keys': [],
        },
    """
    returned = [
        # key lists reviewed and ready for edits
        {
            'key': "waterway",
            'value': "waterfall",
            'bad_names': ["waterfall"],
            'banned_keys': ["tourism"],
            'allowed_keys': ["height"],
        },
        {
            'key': "amenity",
            'value': "bench",
            'bad_names': ["bench"],
            'banned_keys': [
                "acres", 'park_type', # TODO: wat, really?
            ],
            'allowed_keys': ["backrest", "material", "surface", "seats", "capacity", "ele", "colour", 'inscription', 'access', 'covered'],
        },
        {
            'key': "leisure",
            'value': "playground",
            'bad_names': ["playground"],
            'banned_keys': [],
            'allowed_keys': ["surface", "access", "max_age", "min_age", "playground:theme"],
        },
        {
            'key': "tourism",
            'value': "viewpoint",
            'bad_names': ["viewpoint", "punkt widokowy"],
            'banned_keys': ["building", "buildingpart", "building:part", "natural"],
            'allowed_keys': ["direction", "ele", "wheelchair", "opening_hours", "area"],
        },
        {
            'key': "man_made",
            'value': "cairn",
            'bad_names': ["cairn"],
            'banned_keys': ["landuse"],
            'allowed_keys': ["ele"],
        },
        {
            'key': "military",
            'value': "bunker",
            'bad_names': ["bunker"],
            'banned_keys': [],
            'allowed_keys': ["ruins", "building", "abandoned", "disused", "bunker_type", "historic"],
        },
        {
            'key': "amenity",
            'value': "drinking_water",
            'bad_names': ['drinking water', 'water', "potable water"],
            'banned_keys': [],
            'allowed_keys': ['ele', 'fee', 'charge', 'access', 'drinking_water', 'bottle', 'owner', 'cold_water', 'operator', 'indoor', 'covered', 'lit', 'wheelchair'],
        },
        {
            'key': "tourism",
            'value': "camp_site",
            'bad_names': ["camp site", "campsite"], # polish name moved to separate entry
            'banned_keys': [],
            'allowed_keys': ["water_source", "capacity:tents", "capacity:caravans", "caravans", "tents", "drinking_water", "fee", "charge", "payment:cash", "payment:contactless", "payment:credit_cards", "power_supply", "shower", "toilets"],
        },
        {
            'key': "landuse",
            'value': "quarry",
            'bad_names': ["quarry"],
            'banned_keys': [],
            'allowed_keys': ["resource", "mineral"],
        },
        {
            'key': "natural",
            'value': "beach",
            'bad_names': ["beach", "plaża"],
            'banned_keys': [],
            'allowed_keys': ["surface", "access", "lifeguard", "supervised", "fee", "charge", "operator"],
        },
        {
            'key': "amenity",
            'value': "post_box",
            # https://osmus.slack.com/archives/C01BU04GW7L/p1703093450986319
            'bad_names': ['post box', 'collection box', 'mailbox', 'letter box', 'drop box'],
            'banned_keys': [],
            'allowed_keys': ['operator', 'operator:short', 'operator:type', 'operator:wikidata', 'operator:wikipedia', 'drive_through', 'collection_times', 'royal_cypher', 'royal_cypher:wikidata', 'ref', 'collection_times:signed',
            'post_box:type', 'brand', 'brand:wikidata', 'brand:wikipedia'],
        },
        {
            'key': "landuse",
            'value': "grass",
            'bad_names': ["grass"], # Polish name rare enough to not justify bot edit
            'banned_keys': [],
            'allowed_keys': [],
        },
        {
            'key': "amenity",
            'value': "toilets",
            'bad_names': ["toilet", "toilets", "toalety", "toaleta", "wc"],
            'banned_keys': ["building", "buildingpart", "building:part",
                "media:camera_device_number",
                "watsan:cleaner", "watsan:handwashing", "watsan:hand_washing", "watsan:toilet_type", "watsan:type", "watsan:operational_status", "watsan:operator", # watsan spam: see https://www.openstreetmap.org/changeset/16045188
                "recycled", # https://www.openstreetmap.org/changeset/128584200 https://www.openstreetmap.org/changeset/128419912 - revert time? TODO - pinged DWG already
            ],
            'allowed_keys': ["fee", "charge", "operational_status", "operator", "operator:type", "wheelchair", "check_date", "toilets:handwashing", "toilets:position", "toilets:disposal", "unisex", "male", "female", "currency:RUB", "opening_hours", "toilets:wheelchair", "changing_table", "flood_prone", "indoor", "access", "lit", "toilets:access", "toilets:num_chambers", "source:form", "handwashing", 'wheelchair:description', "gender",
            "level", "supervised", "lit"] + list_of_address_tags(),
        },
        {
            'key': "amenity",
            'value': "parking",
            'bad_names': ["parking"],
            'banned_keys': [],
            'allowed_keys': ["parking", "access", "surface", "fee", "charge", "hgv", "lit", "maxstay", "smoothness", "supervised", "phone", "website", "capacity"] + list_of_address_tags(),
        },
    ]
    return returned

def is_in_manual_mode():
    return False

def function_maker(data):
    # wrap a value in a zero-argument function that returns it
    def gen():
        return data
    return gen

discussion_url = "https://community.openstreetmap.org/t/bot-edit-proposal-automatically-remove-obvious-descriptive-names-obvious-cases-only-not-all-suspect-objects/107393/1 and https://lists.openstreetmap.org/pipermail/talk/2023-December/088474.html"
osm_wiki_documentation_page = None
shared_ban = ["fixme", "note", "description", "name:ar", "name:de", "name:kn", "name:en", "name:sw", 'short_name',
"wpt_symbol", "wpt_description", "PFM:garmin_type", 'sym', # TODO WTF is that? GPX import debris? Automate asking changeset comments I guess? See https://www.openstreetmap.org/changeset/3529361
"Note", # TODO really ?
'facility_name', 'facility_type', # TODO Argh?
] + list(language_tag_knowledge.name_keys())
shared_allowed = ["name", "source", "created_by", "layer", "is_in", "url", "mapillary", "image", "wikimedia_commons", "flickr", "check_date", "survey:date", "source:date", "is_in:country", "is_in:state"]

# print the processed object types and their allowed keys as a plain text list
for entry in data():
    names = "with names" + " " + ", ".join(entry['bad_names'])
    if len(entry['bad_names']) == 1:
        names = "with name " + entry['bad_names'][0]
    print("-", entry['key'], "=", entry['value'], names)
    print("  - only following additional keys are allowed, presence of any other tags will block an automated edit:")
    for key in entry['allowed_keys'] + shared_allowed:
        print("    -", key)
    print("")

# print the same list in wiki syntax
for entry in data():
    names = "with names" + " " + ", ".join(entry['bad_names'])
    if len(entry['bad_names']) == 1:
        names = "with name " + entry['bad_names'][0]
    print("*", tag_in_wikimedia_syntax(entry['key'], entry['value']), names)
    print("** only following additional keys are allowed, presence of any other tags will block an automated edit:")
    for key in entry['allowed_keys'] + shared_allowed:
        print("***", tag_in_wikimedia_syntax(key, ""))
    print("")

# run the cleanup separately for each object type
for entry in data():
    shared_cleanup_of_descriptive_names.cleanup(
        function_maker(entry['key']),
        function_maker(entry['value']),
        function_maker(discussion_url),
        function_maker(osm_wiki_documentation_page),
        function_maker(shared_allowed + entry['allowed_keys']),
        function_maker(shared_ban + entry['banned_keys']),
        function_maker(entry['bad_names']),
        function_maker(is_in_manual_mode()),
    )
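
The `function_maker` helper in the script above turns a plain value into a zero-argument function returning that value. Presumably `cleanup` expects callables rather than values, but this is an assumption, as `shared_cleanup_of_descriptive_names` is not listed on this page. A minimal standalone illustration of the pattern:

```python
def function_maker(data):
    # each call creates a new function that remembers its own value
    def gen():
        return data
    return gen


getters = [function_maker(value) for value in ["waterfall", "bench"]]
print([getter() for getter in getters])

# for comparison: lambdas created in a loop share the loop variable,
# so every one of them returns the last value
shared = [lambda: value for value in ["waterfall", "bench"]]
print([getter() for getter in shared])
```

The first line printed lists both values, while the second repeats the last one, which is why a helper like this is a safe way to create such functions inside a loop.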

Discussion

https://community.openstreetmap.org/t/bot-edit-proposal-automatically-remove-obvious-descriptive-names-obvious-cases-only-not-all-suspect-objects/107393/1 and https://lists.openstreetmap.org/pipermail/talk/2023-December/088474.html

Repetition

This is a recurring edit and may be made as soon as new matching elements appear. At the moment triggering a new edit requires human intervention, so the exact schedule is not predictable and the bot may stop running at any moment.

This can change in the future. If the bot is abandoned and does not run, feel free to ping me. If I am unable to run it any more, feel free to use my code. Note that this may require going through the bot approval process again and that the code is under a specific license.

https://codeberg.org/matkoniecz/OpenStreetMap_cleanup_scripts/src/branch/master/recurrent_bot_edits may have a more up-to-date version of the code than what is listed on this page.

Opt-out

Please write on the community discussion forum.

Note that in case of an opt-out exactly the same edit will be made manually.

See also

(edit list)