Automated edits/butikbot


NOTE: Since alltheplaces.xyz already exists, this bot will not be put into operation. The focus will instead be on importing All the Places data for Sweden.

Butikbot is an OpenStreetMap bot that continuously crawls store websites and updates OSM based on the crawled information. The bot focuses on Sweden.

For discussions about the bot, use this forum thread. To get in contact with the maintainer, email "bot name"@grenfeldt.dev.

Version 1.0: Opening hours and website link

Motivation

When I want to go to a store but don't know where the closest one is, I open the map app on my phone and search for the store brand. Then I look at the closest result and try to see whether it is open. Often the opening hours are outdated, or the store has irregular hours due to holidays. In those cases it is helpful if the store's web page is linked directly from the map, so you can quickly get to the source of truth.

This information (up-to-date opening hours and links to the web page of each specific store) is rarely present in OSM. Keeping it current is a task well suited to a bot. Up-to-date information makes the map more convenient to use in large cities and reduces how often I need to switch to other maps.

Algorithm description

Butikbot crawls a brand's website to collect information on all stores of that brand. Then, for every store:

  1. Determine the location of the store using a self-hosted Nominatim instance.
  2. If there is not exactly one match for the store address, give up.
  3. Use a heuristic to check that the matched node or way is a store of that brand, for example by checking that `name` or `brand` matches the store brand. Otherwise, give up.
  4. Overwrite the website=* tag with the specific link for the store.
  5. Overwrite the opening_hours=* tag based on the crawled information. (Feedback wanted: Should the bot take any existing value into account here? If so, how?)
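The per-store steps above could be sketched roughly as follows. This is a minimal illustration, not the bot's actual code: the `CrawledStore` fields, the local Nominatim address, and the brand heuristic are all assumptions. Step 1 only builds a structured Nominatim search URL (the `street`, `city`, `postalcode`, `countrycodes` and `format` parameters are part of the Nominatim search API); the results are assumed to have been fetched and enriched with each element's current OSM tags before `plan_update` is called.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode

# Hypothetical address of the self-hosted Nominatim instance.
NOMINATIM_SEARCH = "http://localhost:8080/search"


@dataclass
class CrawledStore:
    """One store as parsed from a brand website (hypothetical fields)."""
    brand: str
    street: str
    city: str
    postcode: str
    website: str
    opening_hours: str


def nominatim_search_url(store: CrawledStore) -> str:
    """Step 1: structured Nominatim search for the store's address.

    No `limit` is set, so several candidates can come back and step 2
    can detect an ambiguous address.
    """
    params = {
        "street": store.street,
        "city": store.city,
        "postalcode": store.postcode,
        "countrycodes": "se",
        "format": "jsonv2",
    }
    return f"{NOMINATIM_SEARCH}?{urlencode(params)}"


def looks_like_brand_store(tags: dict, brand: str) -> bool:
    """Step 3, one possible heuristic: brand=* equals the brand,
    or name=* starts with it (e.g. "ICA Nära Storgatan")."""
    wanted = brand.casefold()
    return (tags.get("brand", "").casefold() == wanted
            or tags.get("name", "").casefold().startswith(wanted))


def plan_update(store: CrawledStore, matches: list) -> Optional[dict]:
    """Steps 2-5: return the element and tags to write, or None to give up.

    `matches` is assumed to be the Nominatim results, each with the
    element's current OSM tags added under the key "tags".
    """
    if len(matches) != 1:                 # step 2: need exactly one match
        return None
    element = matches[0]
    if not looks_like_brand_store(element.get("tags", {}), store.brand):
        return None                       # step 3: not the expected store
    return {
        "osm_type": element["osm_type"],
        "osm_id": element["osm_id"],
        "tags": {
            "website": store.website,              # step 4
            "opening_hours": store.opening_hours,  # step 5
        },
    }
```

Giving up whenever anything is ambiguous keeps the bot conservative: a skipped store costs nothing, while a wrong match would overwrite tags on an unrelated object.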

This is repeated daily or weekly for every store brand. OSM is only updated when a store's information has actually changed, which should be rare.
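A minimal sketch of the "only update when something changed" rule, assuming the element's current tags have already been fetched from OSM. Treating whitespace-only differences (such as `;` versus `; ` in opening_hours=*) as "no change" is an added assumption, meant to avoid edits that change nothing for map users.

```python
import re


def _normalise(value: str) -> str:
    """Collapse runs of whitespace and normalise spacing around ';'."""
    return re.sub(r"\s*;\s*", "; ", " ".join(value.split()))


def changed_tags(current: dict, desired: dict) -> dict:
    """Return only the desired tags whose value differs from the current one.

    An empty result means nothing should be uploaded for this store.
    """
    changes = {}
    for key, new_value in desired.items():
        old_value = current.get(key)
        if old_value is None or _normalise(old_value) != _normalise(new_value):
            changes[key] = new_value
    return changes
```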

One way to structure the changesets is to group all changes for a given brand into one changeset. Other suggestions are welcome!
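Grouping by brand could look like the sketch below. The shape of each planned edit (keys `brand`, `osm_type`, `osm_id`, `changes`) and the `comment` and `created_by` values are hypothetical; `bot=yes` is the conventional changeset tag for automated edits. Stores with no changes are dropped, so a brand with nothing to update produces no changeset at all.

```python
from collections import defaultdict


def changesets_by_brand(edits: list) -> list:
    """Group planned edits so that each brand gets one changeset.

    Each edit is a dict with hypothetical keys "brand", "osm_type",
    "osm_id" and "changes" (the tags to write).
    """
    grouped = defaultdict(list)
    for edit in edits:
        if edit["changes"]:  # skip stores where nothing changed
            grouped[edit["brand"]].append(edit)
    return [
        {
            "tags": {
                "comment": f"Update website and opening hours of {brand} stores",
                "created_by": "butikbot",
                "bot": "yes",
            },
            "edits": brand_edits,
        }
        for brand, brand_edits in sorted(grouped.items())
    ]
```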

To start with, the bot will only crawl large store brands that are likely to keep their opening hours up to date on their websites, for example grocery stores (ICA, Coop, etc.), clothing stores, electronics stores and other retail chains.

Since the brand websites are parsed very strictly, any change to the page structure will make the parser fail, and the parsing code must then be updated. This ensures that no garbage data is entered into the map and that changes to a website's structure are noticed.
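As an illustration of strict parsing, the sketch below converts a week of crawled hours into an opening_hours=* value and raises an error on anything it does not recognise instead of guessing. The input format (seven entries, Monday first, each either `HH:MM-HH:MM` or the word `closed`) is an assumption; real store pages will differ and each brand would need its own parser.

```python
import re

DAYS = ["Mo", "Tu", "We", "Th", "Fr", "Sa", "Su"]
# A time between 00:00 and 23:59, or exactly 24:00.
_TIME = r"(?:(?:[01]\d|2[0-3]):[0-5]\d|24:00)"
_RANGE = re.compile(rf"({_TIME})-({_TIME})")


class UnexpectedFormat(ValueError):
    """Raised instead of guessing, so a changed page breaks loudly."""


def parse_day(text: str) -> str:
    """Parse one day's entry (assumed format) into opening_hours syntax."""
    text = text.strip()
    if text == "closed":
        return "off"
    match = _RANGE.fullmatch(text)
    if match is None:
        raise UnexpectedFormat(f"unrecognised opening hours: {text!r}")
    return f"{match.group(1)}-{match.group(2)}"


def to_opening_hours(week: list) -> str:
    """Convert seven daily entries (Monday first) to an opening_hours=* value,
    merging runs of consecutive days that share the same hours."""
    if len(week) != 7:
        raise UnexpectedFormat(f"expected 7 days, got {len(week)}")
    hours = [parse_day(day) for day in week]
    rules = []
    start = 0
    for i in range(1, 8):
        # Close the current run at the end of the week or when the hours change.
        if i == 7 or hours[i] != hours[start]:
            days = DAYS[start] if i - 1 == start else f"{DAYS[start]}-{DAYS[i - 1]}"
            rules.append(f"{days} {hours[start]}")
            start = i
    return "; ".join(rules)
```

For example, five weekdays of `08:00-20:00`, Saturday `10:00-16:00` and a closed Sunday become `Mo-Fr 08:00-20:00; Sa 10:00-16:00; Su off`, while an entry such as `8-22` raises `UnexpectedFormat`.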

Community opinion

The main place of discussion for this version of the bot is this forum thread.

I have also announced the bot and the forum thread on the Swedish mailing list, in the Swedish Telegram chat, and (accidentally, thinking it was Swedish-only) on the IRC channel.

Future plans

  • Perhaps add more information crawled from the website, for example contact details such as email address and phone number (if they are store-specific).
  • It would be nice to add the stores of a brand that are missing from OSM, or to create some sort of note at approximately the right location so that someone can map them.
  • It would be nice to mark stores that have moved or been closed.
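For the contact-details idea, one possible way to decide whether a value is store-specific is sketched below: a phone number or email address shared by several stores of the same brand is assumed to be a central customer-service contact and is skipped. This heuristic and the input shape (a mapping from a hypothetical store identifier to the crawled value) are assumptions, not part of the current plan.

```python
from collections import Counter


def store_specific(values_by_store: dict) -> dict:
    """Keep only contact values (e.g. phone numbers) used by exactly one store.

    A value shared by several stores of a brand is assumed to be a central
    contact rather than the store's own. Missing values are dropped.
    """
    counts = Counter(v for v in values_by_store.values() if v)
    return {store: v for store, v in values_by_store.items() if v and counts[v] == 1}
```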