Mechanical Edits/Mateusz Konieczny - bot account/import websites in Poland from ATP
This page describes the import of website=* tags for various POIs in Poland.
Goals
To add website=* where missing or imprecise.
Part of multiple ATP-based imports I am running.
To provide unique POI identification, preparing the ground for importing more data.
Schedule
Depends on the availability of my free hobby time.
Import Data
Background
- Note: if some links are broken check https://status.codeberg.eu/status/codeberg and https://www.githubstatus.com/
- Data source site: https://matkoniecz.codeberg.page/improving_openstreetmap_using_alltheplaces_dataset/import_possibilities_website_tag_media_expert_pl.html https://matkoniecz.codeberg.page/improving_openstreetmap_using_alltheplaces_dataset/import_possibilities_website_tag_rossmann_pl.html https://matkoniecz.codeberg.page/improving_openstreetmap_using_alltheplaces_dataset/import_possibilities_website_tag_lewiatan_pl.html and other similar listings from https://matkoniecz.codeberg.page/improving_openstreetmap_using_alltheplaces_dataset/import_possibilities.html
- produced by https://www.alltheplaces.xyz/ via https://matkoniecz.codeberg.page/improving_openstreetmap_using_alltheplaces_dataset/ from the ATP dataset, which is built from first-party data of various companies (see the analysis in the https://community.openstreetmap.org/t/what-you-think-about-importing-opening-hours-data-from-alltheplaces/120608/70 thread)
- Data license: see below
- Type of license (if applicable): see below
- Link to permission (if required): https://osmfoundation.org/wiki/Licensing_Working_Group/Minutes/2023-08-14#Ticket#2023081110000064_%E2%80%94_First_party_websites_as_sources
- OSM attribution (if required): not required
- ODbL Compliance verified: yes
Import Type
Recurring import done with automated scripts
Data Preparation
Data about their POIs is published by various companies.
This data is crawled and republished by https://github.com/alltheplaces/alltheplaces from the public website(s).
ATP data and OSM data are then processed, validated and compared by https://codeberg.org/matkoniecz/list_how_openstreetmap_can_be_improved_with_alltheplaces_data
Processing includes, but is not limited to:
- matching ATP and OSM POIs
- skipping ATP POIs not matched well to any OSM POIs
- skipping ATP POIs matched to multiple OSM POIs
- skipping cases where matched ATP and OSM entries are conflicting on important aspects
- skipping cases where OSM has a specific website tag, but including cases where the OSM website=* links the main page instead of a page specific to the POI
See
- https://codeberg.org/matkoniecz/list_how_openstreetmap_can_be_improved_with_alltheplaces_data/src/branch/master/test_processing.py
- https://codeberg.org/matkoniecz/list_how_openstreetmap_can_be_improved_with_alltheplaces_data/src/branch/master/test_matching_logic.py
for tests that also document considered cases and behaviour.
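Below is a minimal, hedged sketch of the skip rules listed above, using simplified dictionaries for ATP and OSM POIs. The function and field names (for example brand_main_page) are illustrative assumptions only; the actual logic lives in the repository and tests linked above.

```python
def is_brand_main_page(url, brand_main_page):
    # simplified: treat trailing-slash variants as the same page
    return url.rstrip("/") == brand_main_page.rstrip("/")

def conflicting_on_important_tags(atp_poi, osm_poi):
    # illustrative check only; the real code compares more aspects than the name
    atp_name = atp_poi.get("name")
    osm_name = osm_poi["tags"].get("name")
    return atp_name is not None and osm_name is not None and atp_name != osm_name

def importable_website(atp_poi, matched_osm_pois):
    """Return the website value to import, or None if the case should be skipped."""
    if len(matched_osm_pois) != 1:
        # skip ATP POIs not matched to any OSM POI, or matched to multiple OSM POIs
        return None
    osm_poi = matched_osm_pois[0]
    if conflicting_on_important_tags(atp_poi, osm_poi):
        # skip cases where matched ATP and OSM entries conflict on important aspects
        return None
    osm_website = osm_poi["tags"].get("website")
    if osm_website is not None and not is_brand_main_page(osm_website, atp_poi["brand_main_page"]):
        # skip cases where OSM already has a specific website tag
        return None
    # this includes cases where OSM website=* links the brand main page
    # instead of a page specific to this POI
    return atp_poi["website"]
```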
Tagging Plans
The website=* value from ATP goes into the OSM website=* tag.
Changeset Tags
- discussion_before_edits=*
- mechanical=yes
- osm_wiki_documentation_page=* linking to this page
- created_by_library=https://github.com/matkoniecz/osm_bot_abstraction_layer
- created_by=osmapi/4.2.0 (or similar)
- cases_where_human_help_is_required=https://matkoniecz.github.io/OSM-wikipedia-tag-validator-reports/
- bot=yes
- import=yes
- source=processed ATP data, based on first-party brand data (see the https://community.openstreetmap.org/t/what-you-think-about-importing-opening-hours-data-from-alltheplaces/120608/70 thread)
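As a hedged illustration only, these tags could be assembled as a Python dictionary along the following lines; the exact values, in particular the wiki page URL, are placeholders, and the real script builds them via osm_bot_abstraction_layer.

```python
# illustrative sketch only - the real changeset tags are set by the import script
changeset_tags = {
    "discussion_before_edits": "https://community.openstreetmap.org/t/wiecej-importow-tagow-website/132084",
    "mechanical": "yes",
    "osm_wiki_documentation_page": "<URL of this wiki page>",
    "created_by_library": "https://github.com/matkoniecz/osm_bot_abstraction_layer",
    "created_by": "osmapi/4.2.0",  # or similar, depending on the installed version
    "cases_where_human_help_is_required": "https://matkoniecz.github.io/OSM-wikipedia-tag-validator-reports/",
    "bot": "yes",
    "import": "yes",
    "source": "processed ATP data, based on first-party brand data",
}
```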
Data Merge Workflow
Team Approach
I am doing this import myself, but:
- it is using ATP that has many contributors
- help is welcome, especially in identifying other ATP spiders that have pure first-party data
- for example, ATP may in theory include spiders that pull data from Google Maps; such data would be ineligible for import, so it is necessary to check how the data is actually sourced
- see https://github.com/alltheplaces/alltheplaces/issues/8790
- see https://community.openstreetmap.org/t/what-you-think-about-importing-opening-hours-data-from-alltheplaces/120608/59 or https://community.openstreetmap.org/t/what-you-think-about-importing-opening-hours-data-from-alltheplaces/120608/70 for the kind of analysis that would be helpful; feel free to do such analysis and post it in that thread (or elsewhere, and let me know about it)
- if someone is an ATP contributor, fixing problems in the emitted data would unlock the use of some spiders
- review of the data for quality, copyright status and other traps would be helpful
Workflow
The import will be done by executing a previously prepared script. Edits will be monitored and a sample of edited objects will be checked in an attempt to detect any previously missed problems and bugs.
A separate changeset will be used for each POI.
In case of bad, broken or otherwise problematic data, such edits will be reverted. I have experience with reverting my own automated edits, though it has not been needed often. I will be using a separate account to make such cleanup easier, if it is ever needed.
Edits will be made using the Mateusz Konieczny - bot account - ATP import account.
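For illustration, a minimal sketch of such a per-POI upload loop using the osmapi library could look as follows. The planned_edits list, the example node id and the authentication setup are placeholders, not the actual import script.

```python
import osmapi

api = osmapi.OsmApi()  # authentication setup omitted; OAuth 2 is required on osm.org

# minimal stand-in for the full set of changeset tags listed above
changeset_tags = {"bot": "yes", "import": "yes", "mechanical": "yes"}

# each planned edit: OSM node id and the website value taken from processed ATP data
planned_edits = [
    {"osm_node_id": 123456789, "website": "https://www.example-brand.pl/sklep/krakow-1"},
]

for edit in planned_edits:
    node = api.NodeGet(edit["osm_node_id"])
    node["tag"]["website"] = edit["website"]
    # a separate changeset for each POI keeps reverting a single object easy
    api.ChangesetCreate(changeset_tags)
    api.NodeUpdate(node)
    api.ChangesetClose()
```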
Conflation
Done with custom software residing at https://codeberg.org/matkoniecz/list_how_openstreetmap_can_be_improved_with_alltheplaces_data
This software is intended to enable processing of ATP shop-type POI data in general.
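A heavily simplified sketch of the matching step is shown below, assuming plain dictionaries and an arbitrary distance threshold; both are illustrative assumptions, and the real conflation code in the repository above is more involved.

```python
from math import radians, cos, sqrt

def distance_in_meters(lat1, lon1, lat2, lon2):
    # equirectangular approximation, good enough for sub-kilometer distances
    x = radians(lon2 - lon1) * cos(radians((lat1 + lat2) / 2))
    y = radians(lat2 - lat1)
    return 6_371_000 * sqrt(x * x + y * y)

def find_matches(atp_poi, osm_pois, max_distance_in_meters=300):
    """Return OSM POIs close enough to the ATP POI and with a matching brand/name."""
    matches = []
    for osm_poi in osm_pois:
        if distance_in_meters(atp_poi["lat"], atp_poi["lon"],
                              osm_poi["lat"], osm_poi["lon"]) > max_distance_in_meters:
            continue
        osm_name = (osm_poi["tags"].get("brand") or osm_poi["tags"].get("name") or "").lower()
        if atp_poi["brand"].lower() in osm_name:
            matches.append(osm_poi)
    return matches
```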
QA
Samples of the data were inspected manually.
The data was also reviewed by a variety of automated QA checks; see the scripts in https://codeberg.org/matkoniecz/list_how_openstreetmap_can_be_improved_with_alltheplaces_data . Some of the problems found were reported back to the All the Places project in the form of issues and/or patches.
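As one hedged example of the kind of automated check involved, a script might verify that a candidate website value is an absolute URL and not just a repeat of the brand main page. The function name and rules here are illustrative, not the actual checks from the repository.

```python
from urllib.parse import urlparse

def website_value_looks_valid(website, brand_main_page):
    parsed = urlparse(website)
    # must be an absolute http(s) URL
    if parsed.scheme not in ("http", "https") or parsed.netloc == "":
        return False
    # must not simply repeat the brand main page (a POI-specific page is expected)
    if website.rstrip("/") == brand_main_page.rstrip("/"):
        return False
    return True

print(website_value_looks_valid("https://www.example-brand.pl/sklep/krakow-1",
                                "https://www.example-brand.pl/"))  # True
print(website_value_looks_valid("https://www.example-brand.pl/",
                                "https://www.example-brand.pl/"))  # False
```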
Discussion
The post to the community forum can be found at https://community.openstreetmap.org/t/wiecej-importow-tagow-website/132084
It was also posted to the imports mailing list at https://lists.openstreetmap.org/pipermail/imports/2025-July/thread.html (sent in July 2025, it has not appeared on the list yet - I see that https://lists.openstreetmap.org/listinfo/imports is marked as retired).
Conflict of interest info
I received grant funding for making software that processed ATP data.
Time for making the import itself was deliberately not included in the grant, to reduce the conflict of interest.
Not doing the import at all would not block the grant itself (again, it was set up this way to reduce the conflict of interest).
I am not doing this import because the funder requires me to do so; rather, I obtained funding to make this kind of import possible.