User:SwiftFast/scripts/brand

This script uses the templates in this page, and applies them to branches/brands in Israel, ensuring they are consistent. SwiftFast_bot periodically scans POIs in Israel, and changes them to fit the templates.

This script is part of SwiftFast_bot. (Click for contact details, opt-out, other scripts, bot overview, etc.).

Overview

The goal

  • The aim is consistency across the branches of multi-branch companies, which has several advantages.
  • It ensures all branches have useful tags, like website and multilingual names.
  • It ensures uniform data, making future bulk edits, statistics, and automatic analysis easier.
  • Future updates can be easily applied to all branches via the same scripts.
  • See the initial discussion thread for a longer discussion.

Desired result

  • name, name:he, name:ar, name:ru, name:en, website, brand for all major branches according to the templates.
  • Remove the operator tag if it duplicates the brand. Keep it if it is useful.
  • For companies with variant sub-brands (e.g. "Shufersal Sheli"), the name tags are modified to reflect the variant, while the brand tag remains the main brand (e.g. name="Shufersal Sheli" brand=shufersal), so that the POI stays linked to the brand.
  • In cases where the original name tag is useful (not very common), keep it as is, distinct from brand.

Changesets

History

  • #49881487 - 28/06/2017 - 2 changes: Brands/Branches consistency.
  • #49799877 - 24/06/2017: Brand consistency for supermarkets and pharmacies
  • #49789569 - 24/06/2017: Added wiki tags to banks and fuel stations.
  • #49533757 - 14/06/2017: Brands/Branches consistency.
  • #49026835 - 27/05/2017: Branch consistency - Added some Russian and Arabic names.
  • #48920913 - 23/05/2017: Fixed typo in Hapoalim Businuss branches, and fixed operator tag removal bug.
  • #48695896 - 15/05/2017: Added Russian names to amenity=bank.
  • #48642259 - 13/05/2017: Removed redundant tags and duplicate amenities for amenity=fuel. (one-time bugfix)
  • #48614920 - 12/05/2017 - 719 changes: Brand tagging consistency for amenity=fuel.
  • #48603734 - 11/05/2017 - 345 changes: Automated edit: Consistency for amenity=bank.
  • #48536339 - 09/05/2017 - 57 changes: Consistency for Bank Hapoalim and Bank Leumi Branches.

Future runs

This script is periodic. Runs will be frequent until all existing branches become consistent. Then, it can be run roughly monthly.

Changeset tags

Changesets prior to 6/2017 didn't use consistent tags. All future changesets will be tagged as follows.

bot=yes

description=https://wiki.openstreetmap.org/wiki/User:SwiftFast/scripts/brand

used_script=SwiftFast_brand

used_template=Historic link to the relevant version of the templates page
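
As an illustration, the tags above could be assembled by a small helper before a changeset is opened. This is only a sketch, not the bot's code: the function name changeset_tags and its template_revision_url parameter are hypothetical, and only the keys and the fixed values come from the list above.

```python
WIKI_PAGE = "https://wiki.openstreetmap.org/wiki/User:SwiftFast/scripts/brand"


def changeset_tags(template_revision_url: str) -> dict[str, str]:
    """Build the changeset tags listed above.

    template_revision_url is a hypothetical parameter: the caller passes the
    historic link to the revision of the templates page that was used.
    """
    return {
        "bot": "yes",
        "description": WIKI_PAGE,
        "used_script": "SwiftFast_brand",
        "used_template": template_revision_url,
    }
```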

The algorithm

The words in bold are functions that are precisely defined below, but knowing their exact meaning isn't required to understand the algorithm. All values are normalized before they are compared: whitespace, dashes, and upper/lowercase do not affect comparisons.
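
The normalization rule can be sketched as follows. The function names normalize and same are hypothetical, and treating "dashes" as the ASCII hyphen only is an assumption; the real script may cover more characters.

```python
import re


def normalize(value: str) -> str:
    """Drop whitespace and dashes, then fold case."""
    # Assumption: "dashes" means the ASCII hyphen-minus only.
    return re.sub(r"[\s\-]+", "", value).casefold()


def same(a: str, b: str) -> bool:
    """Compare two tag values the way the algorithm compares them."""
    return normalize(a) == normalize(b)
```

With this sketch, same("Bank-Hapoalim", "bank hapoalim") holds, while same("Sonol", "Paz") does not.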

For each POI:

  • Match it to a template. At this point, the exceptions listed below may cause the POI to be skipped.
  • Apply all the template's tags to the POI (name and name:lang tags get special treatment, described below).
  • If a POI matches a variant, apply the variant's tags, too. This will overwrite some of the tags applied by the template.
  • Remove all brand:lang tags from the POI.
  • For each operator or operator:lang tag, keep it if it is useful (and record that in the logs for manual inspection); remove it otherwise.
  • For each name or name:lang tag, keep the original if it is useful, which is quite rare (and record that in the logs for manual inspection); apply the template's value otherwise.
  • Remove contact:website (in favor of website).
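
The steps above, for a POI that has already been matched and not skipped, could look roughly like this. It is a sketch under stated assumptions, not the bot's source: apply_template, with_langs, and LANGS are hypothetical names, the language list is assumed, the template and variant are assumed to hold only tags that should be written (no matching-only fields such as alt_find), and the "useful" test is passed in as a function.

```python
from typing import Callable, Optional

Tags = dict[str, str]

# Assumption: the languages the templates cover.
LANGS = ("he", "en", "ar", "ru")


def with_langs(key: str) -> list[str]:
    """Return key plus key:lang for every covered language."""
    return [key] + [f"{key}:{lang}" for lang in LANGS]


def apply_template(
    poi: Tags,
    template: Tags,
    variant: Optional[Tags],
    useful: Callable[[str], bool],
) -> tuple[Tags, list[str]]:
    """Return the new tags and log lines meant for manual inspection."""
    original = dict(poi)
    tags = dict(poi)
    log: list[str] = []

    # Template tags first, then variant tags, which overwrite some of them.
    tags.update(template)
    if variant:
        tags.update(variant)

    # Remove all brand:lang tags.
    for lang in LANGS:
        tags.pop(f"brand:{lang}", None)

    # Keep operator tags only when they carry extra information.
    for key in with_langs("operator"):
        if key in tags:
            if useful(tags[key]):
                log.append(f"kept {key}={tags[key]}")
            else:
                del tags[key]

    # Restore an original name when it is useful; otherwise the
    # template (or variant) value applied above stands.
    for key in with_langs("name"):
        if key in original and useful(original[key]):
            tags[key] = original[key]
            log.append(f"kept {key}={original[key]}")

    # website is preferred over contact:website.
    tags.pop("contact:website", None)
    return tags, log
```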

Exceptions

  • A POI is skipped (without warning) if it matches no templates. All further skips emit warnings for later manual inspection.
  • If a POI matches 2 templates, it is skipped.
  • If a POI has an amenity or shop tag, but it does not match the template's amenity or shop tag, it is skipped.
  • POIs with a highway tag are ignored and no warning is emitted. (This catches cases in which, e.g., a bus stop carries the name of a brand it is next to.)
  • If a POI's amenity=* or shop=* does not already equal the matched brand's amenity or shop tag, the POI is left alone and a warning is logged for manual inspection.
  • If a POI matches two variants, or a variant and an unrelated main brand, it is skipped.
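
One way to picture these rules is a single check that returns whether to skip and which warning, if any, to log. The sketch below is illustrative only: skip_decision is a hypothetical name, the lists of matched templates and variants are assumed to be computed elsewhere, and it assumes a variant names its main brand in its own brand tag.

```python
from typing import Optional

Tags = dict[str, str]


def skip_decision(
    poi: Tags, templates: list[Tags], variants: list[Tags]
) -> tuple[bool, Optional[str]]:
    """Return (skip, warning); warning is None for silent outcomes."""
    # Silent skips: highway features and POIs that match nothing.
    if "highway" in poi or not templates:
        return True, None
    if len(templates) > 1:
        return True, "matches more than one template"
    if len(variants) > 1:
        return True, "matches more than one variant"
    template = templates[0]
    # Assumption: a variant carries its main brand in its brand tag.
    if variants and variants[0].get("brand") != template.get("brand"):
        return True, "variant belongs to a different main brand"
    # A POI without amenity/shop passes; one with a different value does not.
    for key in ("amenity", "shop"):
        if key in poi and poi[key] != template.get(key):
            return True, f"{key}={poi[key]} differs from the template"
    return False, None
```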

Function definitions

There is a match function which can match either a POI (based on some of its tags' values) or a single tag's value to one of the templates. Variant matching works just like regular matching, but against the variants, and it takes precedence.

The matching function - tag

If a tag's value, or any of the words in that value, equals the value of any of the following tags of a template, then it matches that template: "name:he", "name:en", "name:ar", "name:ru", "brand", or alt_find.

The program notifies me of "fuzzy matching" (a match based on part of the value rather than the whole string) so that I can check it manually.
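
A minimal sketch of the tag matching rule, including the exact/fuzzy distinction, might look like this. The names match_value and normalize are hypothetical, a template is assumed to be a flat mapping of string values (so alt_find is treated as a single string), and "dashes" is assumed to mean the ASCII hyphen.

```python
import re
from typing import Optional

# Template tags a value is compared against (from the rule above).
TEMPLATE_KEYS = ("name:he", "name:en", "name:ar", "name:ru", "brand", "alt_find")


def normalize(value: str) -> str:
    # Comparisons ignore whitespace, dashes, and case.
    return re.sub(r"[\s\-]+", "", value).casefold()


def match_value(value: str, template: dict[str, str]) -> Optional[str]:
    """Return "exact", "fuzzy", or None for one tag value and one template."""
    targets = {normalize(template[k]) for k in TEMPLATE_KEYS if k in template}
    targets.discard("")  # an empty template value should never match
    if normalize(value) in targets:
        return "exact"
    # A single word of the value is enough, but such matches get flagged.
    if any(normalize(word) in targets for word in value.split()):
        return "fuzzy"
    return None
```

Returning the kind of match, rather than a plain boolean, lets the caller log fuzzy matches for manual review.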

The matching function - POI

If any of the following tags matches a template (according to the tag matching rules), then the POI carrying it matches that template: "name", "name:he", "name:en", "name:ar", "brand", "operator", "branch", "operator:he", "operator:en", "operator:ar", "brand:en", "brand:he", "brand:ar".

The program notifies me of "fuzzy matching" (a match based on part of the value rather than the whole string) so that I can check it manually.
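
POI matching can then be sketched as a loop over the listed POI tags. Everything named here (matching_templates, value_matches, the template ids, and the shape of the templates mapping) is hypothetical; the two key lists are taken from the rules above.

```python
import re

TEMPLATE_KEYS = ("name:he", "name:en", "name:ar", "name:ru", "brand", "alt_find")
POI_KEYS = (
    "name", "name:he", "name:en", "name:ar", "brand", "operator", "branch",
    "operator:he", "operator:en", "operator:ar", "brand:en", "brand:he", "brand:ar",
)


def normalize(value: str) -> str:
    # Comparisons ignore whitespace, dashes, and case.
    return re.sub(r"[\s\-]+", "", value).casefold()


def value_matches(value: str, template: dict[str, str]) -> bool:
    """True if the whole value, or any word in it, equals a template value."""
    targets = {normalize(template[k]) for k in TEMPLATE_KEYS if k in template}
    targets.discard("")
    return any(normalize(part) in targets for part in [value, *value.split()])


def matching_templates(
    poi: dict[str, str], templates: dict[str, dict[str, str]]
) -> list[str]:
    """Return the ids of all templates that the POI matches."""
    return [
        template_id
        for template_id, template in templates.items()
        if any(key in poi and value_matches(poi[key], template) for key in POI_KEYS)
    ]
```

Returning every matching id, instead of the first one, makes it easy to detect the case where a POI matches two templates and must be skipped.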

The "useful" function

A POI's tag is said to be useful if it doesn't match a template while its POI matches one (that is, it conveys some extra information). Examples:

  • In a POI that matches sonol, operator=sonol isn't useful because it matches "sonol", too.
  • In a POI that matches sonol, operator=Ahmed is useful because it doesn't match a template.
  • In a POI that matches sonol, name="Gadi's quality Fuel" is useful because it doesn't match a template.

(If a tag matches a different template than its POI does, the exceptions catch it.)
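
The three examples can be reproduced with a short sketch. It assumes the POI has already been matched to a template, and it uses hypothetical names (is_useful, value_matches) and a hypothetical one-tag template for sonol.

```python
import re

TEMPLATE_KEYS = ("name:he", "name:en", "name:ar", "name:ru", "brand", "alt_find")


def normalize(value: str) -> str:
    # Comparisons ignore whitespace, dashes, and case.
    return re.sub(r"[\s\-]+", "", value).casefold()


def value_matches(value: str, template: dict[str, str]) -> bool:
    """True if the whole value, or any word in it, equals a template value."""
    targets = {normalize(template[k]) for k in TEMPLATE_KEYS if k in template}
    targets.discard("")
    return any(normalize(part) in targets for part in [value, *value.split()])


def is_useful(value: str, templates: list[dict[str, str]]) -> bool:
    """A tag value on a matched POI is useful if it matches no template."""
    return not any(value_matches(value, t) for t in templates)


templates = [{"brand": "sonol"}]  # hypothetical template data
for value in ("sonol", "Ahmed", "Gadi's quality Fuel"):
    print(value, "->", "useful" if is_useful(value, templates) else "not useful")
```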

Source code

todo