Mechanical Edits/Mateusz Konieczny - bot account/import websites in Poland from ATP


This page describes the import of website=* tags for various POIs in Poland.

Goals

To add website=* where missing or imprecise.

This is one of several ATP-based imports I am running.

To provide unique POI identification, preparing the ground for importing more data.

Schedule

Depends on the availability of my free hobby time.

Import Data

Background

Note: if some links are broken, check https://status.codeberg.eu/status/codeberg and https://www.githubstatus.com/


Data source site: per-brand listings such as
https://matkoniecz.codeberg.page/improving_openstreetmap_using_alltheplaces_dataset/import_possibilities_website_tag_media_expert_pl.html
https://matkoniecz.codeberg.page/improving_openstreetmap_using_alltheplaces_dataset/import_possibilities_website_tag_rossmann_pl.html
https://matkoniecz.codeberg.page/improving_openstreetmap_using_alltheplaces_dataset/import_possibilities_website_tag_lewiatan_pl.html
and other similar ones listed at https://matkoniecz.codeberg.page/improving_openstreetmap_using_alltheplaces_dataset/import_possibilities.html
These listings are produced by https://www.alltheplaces.xyz/ via https://matkoniecz.codeberg.page/improving_openstreetmap_using_alltheplaces_dataset/ from the ATP dataset, which is built from first-party websites of various companies (see the analysis in the https://community.openstreetmap.org/t/what-you-think-about-importing-opening-hours-data-from-alltheplaces/120608/70 thread)
Data license: see below
Type of license (if applicable): see below
Link to permission (if required): https://osmfoundation.org/wiki/Licensing_Working_Group/Minutes/2023-08-14#Ticket#2023081110000064_%E2%80%94_First_party_websites_as_sources
OSM attribution (if required): not required
ODbL Compliance verified: yes

Import Type

Recurring import done with automated scripts

Data Preparation

Data is published by various companies about their POIs.

The data is crawled and published by https://github.com/alltheplaces/alltheplaces from public website(s).

ATP data and OSM data are then processed, validated and compared by https://codeberg.org/matkoniecz/list_how_openstreetmap_can_be_improved_with_alltheplaces_data

Processing includes, but is not limited to:

  • matching ATP and OSM POIs
  • skipping ATP POIs not matched well to any OSM POIs
  • skipping ATP POIs matched to multiple OSM POIs
  • skipping cases where matched ATP and OSM entries are conflicting on important aspects
  • skipping cases where OSM has specific website tags, but including cases where OSM website=* links main page instead of a specific POI

See

for tests that also document considered cases and behaviour.
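
The first four skipping rules listed above can be illustrated with a minimal sketch. This is not the code of the actual tool: the `Match` structure, the `MAX_DISTANCE_M` threshold and the `IMPORTANT_KEYS` list are assumptions made up for this example, and the real software may define a "good match" and an "important aspect" differently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Match:
    """One candidate ATP-to-OSM pairing (hypothetical structure)."""
    atp_id: str
    osm_id: str
    distance_m: float
    atp_tags: dict
    osm_tags: dict


# Assumed values for illustration only, not the ones used by the real tool.
MAX_DISTANCE_M = 100.0
IMPORTANT_KEYS = ("brand", "shop", "amenity")


def conflicts(match):
    """True if both sides set an important key to different values."""
    for key in IMPORTANT_KEYS:
        atp_value = match.atp_tags.get(key)
        osm_value = match.osm_tags.get(key)
        if atp_value is not None and osm_value is not None and atp_value != osm_value:
            return True
    return False


def select_safe_matches(candidates):
    """Keep only close, unambiguous, non-conflicting pairings."""
    # Pairings that are too far apart count as "not matched well".
    close = [m for m in candidates if m.distance_m <= MAX_DISTANCE_M]
    by_atp = defaultdict(list)
    for m in close:
        by_atp[m.atp_id].append(m)
    # ATP POIs with several OSM candidates are ambiguous and are skipped.
    unique = [matches[0] for matches in by_atp.values() if len(matches) == 1]
    return [m for m in unique if not conflicts(m)]
```

The sketch deliberately errs on the side of skipping: any POI that is far away, ambiguous or conflicting is left for manual review rather than edited.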

Tagging Plans

The website=* value from ATP goes into the OSM website=* tag.
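
A possible way to decide whether an existing OSM value should be kept or replaced is sketched below. It is an assumption-laden illustration, not the actual implementation: the helper names (`is_main_page`, `host`, `proposed_website`) are hypothetical, and "main page" is approximated here as a URL with no path, query or fragment on the same host as the ATP link.

```python
from urllib.parse import urlsplit


def is_main_page(url):
    """True if the URL has no path beyond '/', no query and no fragment."""
    parts = urlsplit(url)
    return parts.path in ("", "/") and not parts.query and not parts.fragment


def host(url):
    """Lower-cased host name without a leading 'www.'."""
    name = (urlsplit(url).hostname or "").lower()
    return name[4:] if name.startswith("www.") else name


def proposed_website(osm_tags, atp_website):
    """Return the value to write into website=*, or None to skip the POI."""
    current = osm_tags.get("website")
    if current is None:
        return atp_website  # tag is missing: add it
    if current == atp_website:
        return None  # nothing to change
    same_site = host(current) == host(atp_website)
    if is_main_page(current) and same_site and not is_main_page(atp_website):
        return atp_website  # generic main page: replace with the POI-specific link
    return None  # OSM already has a specific (or unrelated) link: leave it alone
```

With this sketch, a POI tagged with only the brand's main page would receive the POI-specific link, while a POI that already carries a specific or unrelated link would be left untouched.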

Changeset Tags

Data Merge Workflow

Team Approach

I am doing this import myself but

Workflow

The import will be done by executing a previously prepared script. Edits will be monitored, and a sample of edited objects will be checked in an attempt to detect any previously missed problems and bugs.

A separate changeset will be used for each POI.

In case of bad, broken or otherwise problematic data, such edits will be reverted. I have experience with reverting my own automated edits, though it was not needed often. I will be using a separate account to make such cleanup easier, if it is ever needed.

Edits will be made using the Mateusz Konieczny - bot account - ATP import account

Conflation

Done with custom software residing at https://codeberg.org/matkoniecz/list_how_openstreetmap_can_be_improved_with_alltheplaces_data

This software is intended to enable processing of ATP shop-type POI data in general.

QA

Samples of the data were inspected manually.

Data was also reviewed by a variety of automated QA checks, see the scripts in https://codeberg.org/matkoniecz/list_how_openstreetmap_can_be_improved_with_alltheplaces_data. Problems found were reported back to the All the Places project in the form of issues and/or patches.

Discussion

The post to the community forum can be found at https://community.openstreetmap.org/t/wiecej-importow-tagow-website/132084

It was also posted to the imports mailing list at https://lists.openstreetmap.org/pipermail/imports/2025-July/thread.html (sent in July 2025, it has not appeared on the list yet - I see that https://lists.openstreetmap.org/listinfo/imports is marked as retired)

Conflict of interest info

I received grant funding for making software that processes ATP data.

Time for carrying out the import itself was deliberately not included in the grant, to reduce conflict of interest.

Not doing the import at all would not block the grant itself (again, it was set up this way to reduce conflict of interest).

I am not doing this import because the funder requires me to do so. Rather, I obtained funding to make this kind of import possible.