Mechanical Edits/Mateusz Konieczny - bot account/remove not needed GNIS tags

From OpenStreetMap Wiki

This page was created as advised in Automated_Edits_code_of_conduct#Document_and_discuss_your_plans.

Who

I, Mateusz Konieczny, using my bot account.

Contact

Send a message via OSM. I will also respond to PMs sent to the bot account. In both cases I am notified about incoming PMs by email and through notifications in OSM editors.

What

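Remove the import metadata tags listed in tags_for_removal() in the bot source code below, such as gnis:created=*, gnis:county_name=* and gnis:import_uuid=*.

Also remove gnis:name=* and NHD:GNIS_Name=* when the value is repeated in one of the regular name tags (for example name=* or alt_name=*).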
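A minimal sketch of the duplicate-name rule, written as a standalone function so it runs without osm_bot_abstraction_layer. NAME_KEYS below is a hypothetical short list standing in for language_tag_knowledge.name_keys(), and drop_duplicated_gnis_names is an illustrative name, not part of the bot or the library:

```python
# Hypothetical, hand-picked subset of regular name keys. The bot itself uses
# language_tag_knowledge.name_keys() from osm_bot_abstraction_layer, which
# returns a much longer list.
NAME_KEYS = ["name", "alt_name", "old_name", "official_name", "short_name", "name:en"]

# GNIS name keys that are dropped when they only repeat an existing name.
GNIS_NAME_KEYS = ["gnis:name", "NHD:GNIS_Name"]


def drop_duplicated_gnis_names(tags, name_keys=NAME_KEYS):
    """Return a copy of tags without GNIS name keys whose value repeats a name tag."""
    result = dict(tags)
    for key in GNIS_NAME_KEYS:
        if key not in result:
            continue
        # Exact string comparison: "Bear Crk" does not match "Bear Creek".
        if any(result.get(name_key) == result[key] for name_key in name_keys):
            del result[key]
    return result


tags = {"name": "Bear Creek", "gnis:name": "Bear Creek", "NHD:GNIS_Name": "Bear Crk"}
print(drop_duplicated_gnis_names(tags))
# {'name': 'Bear Creek', 'NHD:GNIS_Name': 'Bear Crk'}
```

As in the bot's edit_element(), a GNIS name that differs from every name tag is left untouched.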

Why

These keys were imported and are not actually needed or useful. They only confuse people who edit OSM data or write software to process it, especially people writing software that mappers use to edit OSM.

I encountered them while editing and was confused about how to handle them.

Removing them requires making some edits, but it will make OSM data slightly easier to edit and reduce future confusion.

It will also make it less likely that future imports add unnecessary tags.

Numbers

Depends on how many new matches appear, which in turn depends on editing activity in OSM.
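One way to get a rough current count is to ask Overpass for totals rather than for the objects themselves. The sketch below builds such a query with a hypothetical count_query() helper, using Overpass QL's out count; the key list and the Vermont wikidata ID (Q16551, taken from usa_states() in the bot code) are only examples. For gnis:name and NHD:GNIS_Name the result is an upper bound, because whether the value duplicates a name tag is only checked later, when each element is processed:

```python
def count_query(keys, area_wikidata=None, timeout=500):
    """Build an Overpass QL query counting objects that carry any of the given keys.

    If area_wikidata is given, the search is limited to the area whose
    wikidata tag matches, in the same way as query_by_area() in the bot code.
    """
    lines = ["[out:json][timeout:%d];" % timeout]
    area_filter = ""
    if area_wikidata is not None:
        lines.append("area['wikidata'='%s']->.searchArea;" % area_wikidata)
        area_filter = "(area.searchArea)"
    lines.append("(")
    for key in keys:
        lines.append('nwr%s["%s"];' % (area_filter, key))
    lines.append(");")
    # "out count" returns totals instead of the matching objects.
    lines.append("out count;")
    return "\n".join(lines)


print(count_query(["gnis:created", "gnis:name"], area_wikidata="Q16551"))
```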

How

Each changeset contains a single element or a group of nearby elements, to avoid edits spanning large areas (this is impossible when the edited object itself spans a very large area).
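The grouping is done inside osm_bot_abstraction_layer, and the sketch below is not its actual method. It only illustrates the idea under simple assumptions: each element is reduced to a latitude/longitude bounding box, "close" means the combined box of a group stays within a fixed number of degrees, and groups are filled greedily. BBox, group_nearby and the 0.5 degree default are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class BBox:
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float

    def union(self, other):
        """Smallest box covering both self and other."""
        return BBox(
            min(self.min_lat, other.min_lat),
            min(self.min_lon, other.min_lon),
            max(self.max_lat, other.max_lat),
            max(self.max_lon, other.max_lon),
        )

    def span(self):
        """Larger of the latitude and longitude extents, in degrees."""
        return max(self.max_lat - self.min_lat, self.max_lon - self.min_lon)


def group_nearby(boxes, max_span_deg=0.5):
    """Greedily assign element bounding boxes to groups (one group per changeset).

    An element joins the first group whose combined box stays within
    max_span_deg; otherwise it starts a new group. An element larger than
    the limit on its own still ends up alone in its own group.
    """
    groups = []  # list of (combined box, list of element indices)
    for index, box in enumerate(boxes):
        for position, (combined, members) in enumerate(groups):
            merged = combined.union(box)
            if merged.span() <= max_span_deg:
                groups[position] = (merged, members + [index])
                break
        else:
            groups.append((box, [index]))
    return [members for _, members in groups]


boxes = [
    BBox(40.00, -74.00, 40.01, -73.99),  # two small objects close together
    BBox(40.02, -74.01, 40.03, -74.00),
    BBox(34.00, -118.00, 34.01, -117.99),  # far away from the first two
    BBox(30.00, -100.00, 45.00, -80.00),  # huge object, cannot be made small
]
print(group_nearby(boxes))  # [[0, 1], [2], [3]]
```

Measuring distance in degrees is crude (a degree of longitude shrinks towards the poles), but it is enough to show why an object that is itself very large always ends up in a changeset covering a large area.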

Each changeset will be described and tagged with tags that mark it as automatic, link to this documentation page, etc.

Editing is limited to objects in the USA.

State before a mechanical edit:


State after a mechanical edit:
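As a hypothetical illustration with made-up tag values (not a real OSM object), the cleanup of import metadata turns the first tag set below into the second. REMOVED_KEYS is a small subset of tags_for_removal() from the bot source code, and remove_import_metadata is an illustrative helper rather than the bot's own function:

```python
# Hypothetical subset of tags_for_removal() from the bot source code.
REMOVED_KEYS = {"gnis:created", "gnis:county_name", "gnis:state_id", "gnis:import_uuid"}


def remove_import_metadata(tags):
    """Return a copy of tags without the listed import metadata keys."""
    return {key: value for key, value in tags.items() if key not in REMOVED_KEYS}


# Made-up example object; gnis:feature_id is on the keep list and survives.
before = {
    "natural": "peak",
    "name": "Example Peak",
    "gnis:feature_id": "1234567",
    "gnis:created": "01/19/1981",
    "gnis:county_name": "Example",
    "gnis:state_id": "50",
}
after = remove_import_metadata(before)
print(after)  # {'natural': 'peak', 'name': 'Example Peak', 'gnis:feature_id': '1234567'}
```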


Bot source code

The bot uses the https://github.com/matkoniecz/osm_bot_abstraction_layer library. This code is licensed under GNU GPLv3.

from osm_bot_abstraction_layer.generic_bot_retagging import run_simple_retagging_task
import osm_bot_abstraction_layer.overpass_downloader as overpass_downloader
import osm_bot_abstraction_layer.world_data as world_data
import osm_bot_abstraction_layer.language_tag_knowledge as language_tag_knowledge
import time
import shared_generate_import_tag_cleanup_wiki_page

# https://community.openstreetmap.org/t/mass-remove-gnis-created-and-similar-tags/107018



def tags_for_removal():
    return [
        # new ones

        # many cases
        "gnis:import_uuid",

        # just few cases
        "gnis:import_id",
        "gnis:feature",

        "gnis:reviewed",
        "gnis:review",
        "gnis:edited",
        "gnis:created",
        "gnis:date_created",
        "gnis:created_1",
        "gnis:created_date",
        "historic:gnis:created",
        "gnis:updated",
        "gnis:date_edited",
        "gnis:cre",
        "gnis:County",
        "gnis:County_num",
        "gnis:county_id",
        "gnis:county_name",
        "gnis:state_alpha",
        "gnis:state_id",
        "gnis:ST_num",
        "gnis:ST_alpha",
        "gnis:ST_alph",
        "gnis:state",
        "gnis:county",
        "gnis:Cell",
        "gnis:Class",
        "gnis:class",
        "gnis:feature_type",
    ]


def tags_for_keeping():
    return [
        "gnis:ele",

        "gnis:ftype", # "Unrelated to GNIS, this is actually an NHD Feature Type (nhd:ftype)" - https://wiki.openstreetmap.org/wiki/Key:gnis:ftype
        "gnis:fcode", # "Unrelated to GNIS, this is actually an NHD Feature Code (nhd:fcode)" https://wiki.openstreetmap.org/wiki/Key:gnis:fcode

        "alt_name:gnis:feature_id", # https://overpass-turbo.eu/s/1JzS
        "alt_gnis:feature_id", # https://www.openstreetmap.org/relation/7132203 https://www.openstreetmap.org/relation/274921
        "gnis:id_2", "gnis:id_1", # why does https://www.openstreetmap.org/node/150952282 have both? I will create a note if no one investigates this
        "gnis:feature_id_1",
        "gnis:feature_id_2",
        "gnis:feature_id_alt",
        "gnis:feature_id",
        "gnis:feature_id2",
        'alt:gnis:feature_id',
        "gnis:feature_id:old",
        "gnis:old_feature_id",
        "gnis:feature_id:duplicate",
        "disused:gnis:feature_id",

        "source:gnis",

        # lifecycle
        "historic:gnis:id",
        "old_gnis:feature_id",
        "demolished:gnis:feature_id", # not a fan of it, but...
        "historic:gnis:feature_id", # not a fan of it, but...
    ]


def duplication_of_name_tags():
    return [
        "gnis:name",
        "NHD:GNIS_Name", # process with NHD tags?
    ]


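# running count of removed tags across all processed elements
# (a "global" statement is only needed inside functions that assign to it)
total_count = 0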


def edit_element(tags):
    global total_count
    for key in tags.keys():
        if "gnis" in key.lower():
            if key in tags_for_removal() + tags_for_keeping() + duplication_of_name_tags():
                continue
            print("apparently new gnis related key: <" + key + ">")
            print(tags_for_removal() + tags_for_keeping() + duplication_of_name_tags())
    for key in tags_for_removal():
        if key in tags:
            total_count += 1
            tags.pop(key, None)
    for key in duplication_of_name_tags():
        if key in tags:
            duplicates = False
            for name_key in language_tag_knowledge.name_keys():
                if name_key in tags:
                    if tags[key] == tags[name_key]:
                        if name_key not in ["name", "alt_name", "old_name", "name:en"]:
                            print(key, "matches", name_key, "both have", tags[key])
                        duplicates = True
            if duplicates:
                total_count += 1
                tags.pop(key, None)
            #else:
            #    print(key, "=", tags[key], "has no matches")
            #
            # https://www.openstreetmap.org/way/33285610 tags on ways forming water area, duplicating tags on https://www.openstreetmap.org/relation/115054
            # this appears to be a fairly widespread import problem, repeated elsewhere
            # I can list likely candidates if anyone is interested in cleanup
    return tags


def query():
    returned = """
[out:xml][timeout:500];
(
"""
    for key in tags_for_removal() + duplication_of_name_tags():
        returned += 'nwr["' + key + '"];\n'
    returned += """);
out meta;
>;
out meta qt;"""
    return returned


def query_by_area(wikidata):
    returned = """
[out:xml][timeout:500];
area['wikidata'='""" + wikidata + """']->.searchArea;
(
"""
    for key in tags_for_removal() + duplication_of_name_tags():
        returned += 'nwr(area.searchArea)["' + key + '"];\n'
    returned += """);
out meta;
>;
out meta qt;"""
    return returned


def edit_part(query_provided):
    run_simple_retagging_task(
        max_count_of_elements_in_one_changeset=5000,
        objects_to_consider_query=query_provided,
        cache_folder_filepath='/media/mateusz/OSM_cache/osm_bot_cache',
        is_in_manual_mode=True,
        changeset_comment='remove unnecessary gnis tags that got imported',
        discussion_url="https://community.openstreetmap.org/t/mass-remove-gnis-created-and-similar-tags/107018",
        osm_wiki_documentation_page="https://wiki.openstreetmap.org/wiki/Mechanical_Edits/Mateusz_Konieczny_-_bot_account/remove_not_needed_GNIS_tags",
        edit_element_function=edit_element,
    )


def usa_states():
    return [
        {'name': 'Vermont', 'wikidata': 'Q16551'},
        {'name': 'Massachusetts', 'wikidata': 'Q771'},
        {'name': 'New York', 'wikidata': 'Q1384'},
        {'name': 'Maine', 'wikidata': 'Q724'},
        {'name': 'New Hampshire', 'wikidata': 'Q759'},
        {'name': 'Texas', 'wikidata': 'Q1439'},
        {'name': 'Illinois', 'wikidata': 'Q1204'},
        {'name': 'Missouri', 'wikidata': 'Q1581'},
        {'name': 'Kansas', 'wikidata': 'Q1558'},
        {'name': 'Oklahoma', 'wikidata': 'Q1649'},
        {'name': 'Arkansas', 'wikidata': 'Q1612'},
        {'name': 'Nebraska', 'wikidata': 'Q1553'},
        {'name': 'Iowa', 'wikidata': 'Q1546'},
        {'name': 'South Dakota', 'wikidata': 'Q1211'},
        {'name': 'North Dakota', 'wikidata': 'Q1207'},
        {'name': 'Kentucky', 'wikidata': 'Q1603'},
        {'name': 'Indiana', 'wikidata': 'Q1415'},
        {'name': 'Tennessee', 'wikidata': 'Q1509'},
        {'name': 'Mississippi', 'wikidata': 'Q1494'},
        {'name': 'Alabama', 'wikidata': 'Q173'},
        {'name': 'Georgia', 'wikidata': 'Q1428'},
        {'name': 'Colorado', 'wikidata': 'Q1261'},
        {'name': 'Wyoming', 'wikidata': 'Q1214'},
        {'name': 'Utah', 'wikidata': 'Q829'},
        {'name': 'New Mexico', 'wikidata': 'Q1522'},
        {'name': 'Arizona', 'wikidata': 'Q816'},
        {'name': 'Florida', 'wikidata': 'Q812'},
        {'name': 'Ohio', 'wikidata': 'Q1397'},
        {'name': 'West Virginia', 'wikidata': 'Q1371'},
        {'name': 'District of Columbia', 'wikidata': 'Q3551781'},
        {'name': 'Pennsylvania', 'wikidata': 'Q1400'},
        {'name': 'Delaware', 'wikidata': 'Q1393'},
        {'name': 'Maryland', 'wikidata': 'Q1391'},
        {'name': 'Montana', 'wikidata': 'Q1212'},
        {'name': 'Idaho', 'wikidata': 'Q1221'},
        {'name': 'Wisconsin', 'wikidata': 'Q1537'},
        {'name': 'Minnesota', 'wikidata': 'Q1527'},
        {'name': 'Nevada', 'wikidata': 'Q1227'},
        {'name': 'California', 'wikidata': 'Q99'},
        {'name': 'Oregon', 'wikidata': 'Q824'},
        {'name': 'Washington', 'wikidata': 'Q1223'},
        {'name': 'Michigan', 'wikidata': 'Q1166'},
        {'name': 'Connecticut', 'wikidata': 'Q779'},
        {'name': 'Hawaii', 'wikidata': 'Q782'},
        {'name': 'South Carolina', 'wikidata': 'Q1456'},
        {'name': 'Virginia', 'wikidata': 'Q1370'},
        {'name': 'North Carolina', 'wikidata': 'Q1454'},
        {'name': 'Louisiana', 'wikidata': 'Q1588'},
        {'name': 'New Jersey', 'wikidata': 'Q1408'},
        {'name': 'United States Virgin Islands', 'wikidata': 'Q11703'},
        {'name': 'Guam', 'wikidata': 'Q16635'},
        {'name': 'Northern Mariana Islands', 'wikidata': 'Q16644'},
        {'name': 'Rhode Island', 'wikidata': 'Q1387'},
        {'name': 'Alaska', 'wikidata': 'Q797'},
        {'name': 'American Samoa', 'wikidata': 'Q16641'},
        {'name': 'Puerto Rico', 'wikidata': 'Q1183'},
    ]


def storage_path(filename):
    return '/media/mateusz/OSM_cache/osm_bot_cache/' + filename


def main():
    global total_count
    print(shared_generate_import_tag_cleanup_wiki_page.page_content(tags_for_removal(), "https://community.openstreetmap.org/t/mass-remove-gnis-created-and-similar-tags/107018", "USA"))
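    # code, admin_level_of_split and storage_file are only needed by the
    # commented-out world_data loop; the hardcoded usa_states() list is used instead
    code = "US"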
    admin_level_of_split = 4
    storage_file = storage_path(code + "_area_divisions.osm")
    #for area_data in world_data.list_of_area_divisions_data(code, admin_level_of_split, ["name", "wikidata"], storage_file):
    for area_data in usa_states():
        edit_part(query_by_area(area_data["wikidata"]))
        print("total_count", total_count)


if __name__ == "__main__":
    main()

Discussion

https://community.openstreetmap.org/t/mass-remove-gnis-created-and-similar-tags/107018

Repetition

This is a recurring edit and may be made as soon as new matching elements appear. At the moment, triggering a new edit requires human intervention, so the exact schedule is not predictable and the bot may stop running at any moment.

This may change in the future. If the bot is abandoned and does not run, feel free to ping me. If I am unable to run it anymore, feel free to use my code. Note that this may require going through the bot approval process again, and that the code is under a specific license.

https://codeberg.org/matkoniecz/OpenStreetMap_cleanup_scripts/src/branch/master/recurrent_bot_edits may have a more up-to-date version of the code than the one listed on this page.

Opt-out

Please write in the forum topic where this edit was discussed.

Note that in case of an opt-out, exactly the same edit will be made manually.