User:SafwatHalaby/scripts/brand

From OpenStreetMap Wiki

This script uses the templates in this page, and applies them to branches/brands in Israel, ensuring they are consistent. It periodically scans POIs in Israel, and changes them to fit the templates.


This script is part of my scripting project. See the project page for contact details, opt-out instructions, other scripts, and an overview.

Overview

The goal

  • The aim is consistency for multi-branch companies. Consistency has many advantages.
  • It ensures all branches have useful tags, like website and multilingual names.
  • It ensures uniform data, which makes future bulk edits, statistics, and automatic analysis easier.
  • Future updates can be easily applied to all branches via the same scripts.
  • See the initial discussion thread for a longer discussion.

Desired result

  • name, name:he, name:ar, name:ru, name:en, website, and brand are set on all major branches according to the templates.
  • Remove the operator tag if it duplicates the brand. Keep it if it is useful.
  • For companies with variant sub-brands (e.g. "Shufersal Sheli"), the name tags reflect the variant while the brand tag remains the main brand (e.g. name="Shufersal Sheli" brand=shufersal), so that the POI stays linked to the brand.
  • In cases where the original name tag is useful (not very common), keep it as is, distinct from brand.

Changesets

History

  • #52437938 - 28/09/2017 - 5 changes: Eden Teva Market Wikidata. (changeset accidentally untagged)
  • #52437285 - 28/09/2017 - 15 changes: Brand tagging consistency and wikidata>brand:wikidata updates. (changeset accidentally untagged)
  • #52346846 - 25/09/2017 - 89 changes: Brand tagging consistency for McDonald's.
  • #52162459 - 19/09/2017 - 1474 changes: Brands/Branches consistency and wikidata/wikipedia fixes. (See also #52345502 and #52345661, and see #51350214 for an explanation of this).
  • #52162366 - 19/09/2017 - Removed operator:wikidata and operator:wikipedia (will be followed up with a fix)
  • #49881487 - 28/06/2017 - 2 changes: Brands/Branches consistency.
  • #49799877 - 24/06/2017: Brand consistency for supermarkets and pharmacies
  • #49789569 - 24/06/2017: Added wiki tags to banks and fuel stations.
  • #49533757 - 14/06/2017: Brands/Branches consistency.
  • #49026835 - 27/05/2017: Branch consistency - Added some Russian and Arabic names.
  • #48920913 - 23/05/2017: Fixed typo in Hapoalim Businuss branches, and fixed operator tag removal bug.
  • #48695896 - 15/05/2017: Added Russian names to amenity=bank.
  • #48642259 - 13/05/2017: Removed redundant tags and duplicate amenities for amenity=fuel. (one-time bugfix)
  • #48614920 - 12/05/2017 - 719 changes: Brand tagging consistency for amenity=fuel.
  • #48603734 - 11/05/2017 - 345 changes: Automated edit: Consistency for amenity=bank.
  • #48536339 - 09/05/2017 - 57 changes: Consistency for Bank Hapoalim and Bank Leumi Branches.

Future runs

This script is periodic. Runs will be frequent until all existing branches become consistent. Then, it can be run roughly monthly.

Changeset tags

bot=yes

description=https://wiki.openstreetmap.org/wiki/User:SafwatHalaby/scripts/brand

used_script=brand

used_template=Historic link to the relevant version of the templates page

The algorithm

The words in bold are functions that are precisely defined below, but knowing their exact meaning isn't required to understand the algorithm. All comparisons are normalized first: whitespace, dashes, and upper/lowercase do not affect comparisons.
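The normalization step might look like the following minimal Python sketch. The function name `normalize` and the exact set of dash characters are assumptions made for illustration; the script's real implementation is not shown on this page.

```python
import re

# Assumption: "dashes" covers the ASCII hyphen, the Unicode dashes
# U+2010..U+2015, and the Hebrew maqaf (U+05BE).
_IGNORED = re.compile(r"[\s\-\u2010-\u2015\u05be]+")


def normalize(value: str) -> str:
    """Return a comparison key that ignores whitespace, dashes, and case."""
    return _IGNORED.sub("", value).casefold()


# Both spellings reduce to the same key, so they compare equal.
print(normalize("Hello-World") == normalize("hello world"))  # prints True
```

Comparing normalized keys rather than raw strings means a template needs only one spelling of each name.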

For each POI:

  • Match it to a template. At this point, the exceptions listed below may cause the POI to be skipped.
  • Apply all the template's tags to the POI (name and name:lang get the special treatment described below).
  • If the POI matches a variant, apply the variant's tags too. This overwrites some of the tags applied by the template.
  • Remove all brand:lang tags from the POI.
  • Remove each operator or operator:lang tag. If the tag is useful, the removal is saved in the logs for manual inspection.
  • For each changed name or name:lang tag, if the old value is useful, the change is saved in the logs for manual inspection.
  • Remove contact:website (in favor of website).
  • If the POI is in a special list, some tags specific to that particular POI are applied; these may override both the template and the variant tags (e.g. to revive operator or to apply a name specific to a single POI). Currently no POI has needed this and the list is unimplemented, but it will be added if needed.
  • Any tags not mentioned in the templates e.g. opening_hours, are never touched.
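The per-POI steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the script's actual code: tags are plain dictionaries, `apply_template`, `is_lang_key`, and `LANGS` are hypothetical names, the "useful" check is passed in as a callable, and "lang" is assumed to mean one of the four languages named on this page (so a key such as brand:wikidata is left alone). The special per-POI list is omitted because it is unimplemented.

```python
from typing import Callable, Dict, List, Optional, Tuple

Tags = Dict[str, str]
LANGS = ("he", "ar", "ru", "en")  # assumption: the languages named on this page


def is_lang_key(key: str, base: str) -> bool:
    """True for keys such as brand:he or operator:en, but not brand:wikidata."""
    return key in {f"{base}:{lang}" for lang in LANGS}


def apply_template(
    poi: Tags,
    template: Tags,
    variant: Optional[Tags] = None,
    is_useful: Callable[[str], bool] = lambda value: False,
) -> Tuple[Tags, List[str]]:
    """Return the updated tags plus log lines for manual inspection."""
    result = dict(poi)
    log: List[str] = []

    # Template tags first; variant tags overwrite them where both exist.
    new_tags = {**template, **(variant or {})}
    for key, value in new_tags.items():
        old = result.get(key)
        is_name = key == "name" or is_lang_key(key, "name")
        if is_name and old is not None and old != value and is_useful(old):
            log.append(f"changed {key}: {old!r} -> {value!r}")
        result[key] = value

    for key in list(result):
        if is_lang_key(key, "brand"):
            del result[key]
        elif key == "operator" or is_lang_key(key, "operator"):
            if is_useful(result[key]):
                log.append(f"removed {key}={result[key]!r}")
            del result[key]

    # contact:website is dropped in favor of website.
    result.pop("contact:website", None)
    return result, log


poi = {"name": "Gadi's quality Fuel", "operator": "sonol", "opening_hours": "24/7"}
template = {"name": "Sonol", "brand": "Sonol"}  # hypothetical template values
tags, log = apply_template(poi, template, is_useful=lambda v: v.lower() != "sonol")
print(tags)
print(log)
```

Tags that the template does not mention, such as opening_hours in the example, pass through unchanged.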

Exceptions

  • POIs with a highway tag are ignored and no warning is emitted. (This catches some cases in which e.g. a bus stop has the name of a brand it is next to.)
  • A POI is skipped without warning if it matches no templates. All further skips emit warnings for later manual inspection.
  • If a POI matches 2 templates, it is skipped.
  • If a POI's amenity=* or shop=* does not already equal the matched brand's amenity or shop tag, the POI is left alone.
  • If a POI matches two variants, or a variant and an unrelated main brand, it is skipped.
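The exception rules above could be checked roughly as in this Python sketch. The function name `check_exceptions`, the return shape, and the warning texts are hypothetical. It also assumes that matching has already produced the lists of matched templates and variants, and that each variant records its main brand under the `brand` key.

```python
from typing import Dict, List, Optional, Tuple

Tags = Dict[str, str]


def check_exceptions(
    poi: Tags, templates: List[Tags], variants: List[Tags]
) -> Tuple[bool, Optional[str]]:
    """Return (skip, warning); warning is None for silent skips and non-skips."""
    if "highway" in poi:
        return True, None  # ignored silently
    if not templates:
        return True, None  # no match: skipped silently
    if len(templates) > 1:
        return True, "matches more than one template"
    template = templates[0]
    for key in ("amenity", "shop"):
        if key in template and poi.get(key) != template[key]:
            return True, f"{key} differs from the template"
    if len(variants) > 1:
        return True, "matches more than one variant"
    # Assumption: each variant records its main brand under "brand".
    if variants and variants[0].get("brand") != template.get("brand"):
        return True, "variant belongs to an unrelated brand"
    return False, None


sonol = {"brand": "Sonol", "amenity": "fuel"}  # hypothetical template
print(check_exceptions({"highway": "bus_stop", "name": "Sonol"}, [sonol], []))
print(check_exceptions({"amenity": "cafe", "name": "Sonol"}, [sonol], []))
print(check_exceptions({"amenity": "fuel", "name": "Sonol"}, [sonol], []))
```

The two silent checks come first so that only the later, more surprising skips produce warnings.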

Function definitions

There is a match function that can match either a POI (based on some of its tags' values) or a single tag's value to one of the templates. Variant matching works just like regular matching, but against the variants, and it takes precedence.

The matching function - tag

If a tag's value, or any of the words in that value, equals the value of any of the following tags of a template, then it matches that template: "name:he", "name:en", "name:ar", "name:ru", "brand", or alt_find. The string comparisons are normalized: whitespace, dashes, and letter case do not matter (e.g. the templates don't need both "helloWorld" and "hello world"; one is sufficient).

The program notifies me of "fuzzy matching" (a match based on part of the string rather than the whole string) so that I can check it manually.
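A minimal Python sketch of tag matching, including the fuzzy flag, is given here for illustration. The names `match_value` and `MATCH_KEYS` are hypothetical, `alt_find` is assumed to hold a single string, and the normalization is simplified to whitespace, ASCII hyphens, and case.

```python
import re
from typing import Dict, Optional

MATCH_KEYS = ("name:he", "name:en", "name:ar", "name:ru", "brand", "alt_find")


def normalize(value: str) -> str:
    """Lowercase and drop whitespace and ASCII hyphens (a simplification)."""
    return re.sub(r"[\s\-]+", "", value).casefold()


def match_value(value: str, template: Dict[str, str]) -> Optional[str]:
    """Return "exact", "fuzzy", or None for one tag value against one template."""
    targets = {normalize(template[k]) for k in MATCH_KEYS if k in template}
    targets.discard("")  # an empty template value must never match
    if normalize(value) in targets:
        return "exact"
    # A single matching word counts as a fuzzy match, to be reviewed manually.
    if any(normalize(word) in targets for word in value.split()):
        return "fuzzy"
    return None


template = {"brand": "Sonol", "alt_find": "hello world"}  # hypothetical template
print(match_value("helloWorld", template))   # prints exact
print(match_value("Sonol Haifa", template))  # prints fuzzy
print(match_value("Paz", template))          # prints None
```

Returning the kind of match, rather than a plain boolean, lets the caller log fuzzy matches for review while treating exact matches as safe.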

The matching function - POI

A POI matches a template if any of the following tags on the POI matches that template (according to the tag matching rules): "name", "name:he", "name:en", "name:ar", "brand", "operator", "branch", "operator:he", "operator:en", "operator:ar", "brand:en", "brand:he", "brand:ar".
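POI matching can be sketched on top of any tag matcher. In the Python illustration below, `matching_templates` and `POI_MATCH_KEYS` are hypothetical names, and the tag matcher is passed in as the `value_matches` callable; the example uses a simple stand-in that only compares against the template's brand.

```python
from typing import Callable, Dict, List

Tags = Dict[str, str]

# The POI tags that take part in matching, as listed above.
POI_MATCH_KEYS = (
    "name", "name:he", "name:en", "name:ar", "brand", "operator", "branch",
    "operator:he", "operator:en", "operator:ar", "brand:en", "brand:he", "brand:ar",
)


def matching_templates(
    poi: Tags,
    templates: Dict[str, Tags],
    value_matches: Callable[[str, Tags], bool],
) -> List[str]:
    """Return the ids of all templates matched by at least one listed POI tag."""
    return [
        template_id
        for template_id, template in templates.items()
        if any(
            key in poi and value_matches(poi[key], template)
            for key in POI_MATCH_KEYS
        )
    ]


def same_brand(value: str, template: Tags) -> bool:
    """Stand-in tag matcher: compares only against the template's brand."""
    return value.casefold() == template["brand"].casefold()


templates = {"sonol": {"brand": "Sonol"}, "paz": {"brand": "Paz"}}  # hypothetical
print(matching_templates({"operator": "SONOL"}, templates, same_brand))
print(matching_templates({"name": "Paz", "operator": "Sonol"}, templates, same_brand))
```

Returning every matched template, instead of only the first, is what allows the "matches 2 templates" exception to be detected.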

The "useful" function

Currently this function is only used to reduce logging noise by suppressing non-useful messages. A POI's tag is said to be useful if it doesn't match a template while its POI matches one, meaning that the tag conveys some extra information. Examples:

  • In a POI that matches sonol, operator=sonol isn't useful because it also matches "sonol" (so removing that operator is not logged).
  • In a POI that matches sonol, operator=Ahmed is useful because it doesn't match a template (removing it is logged and I may cancel the removal before upload).
  • In a POI that matches sonol, name="Gadi's quality Fuel" is useful because it doesn't match a template (changing that name is logged and I may add it back).

If a tag matches a different template from the one its POI matches, the exceptions catch it.
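The definition can be expressed as a small Python predicate. This is only a sketch: `is_useful` and its parameters are hypothetical names, and the tag matcher is again passed in as a callable, with a brand-only stand-in used in the example.

```python
from typing import Callable, Dict, Iterable

Tags = Dict[str, str]


def is_useful(
    value: str,
    poi_matches_template: bool,
    templates: Iterable[Tags],
    value_matches: Callable[[str, Tags], bool],
) -> bool:
    """A tag value is useful if its POI matches a template but the value matches none."""
    if not poi_matches_template:
        return False
    return not any(value_matches(value, template) for template in templates)


def same_brand(value: str, template: Tags) -> bool:
    """Stand-in tag matcher: compares only against the template's brand."""
    return value.casefold() == template["brand"].casefold()


templates = [{"brand": "sonol"}]  # hypothetical template list
print(is_useful("sonol", True, templates, same_brand))                # prints False
print(is_useful("Ahmed", True, templates, same_brand))                # prints True
print(is_useful("Gadi's quality Fuel", True, templates, same_brand))  # prints True
```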

Previously, the script did not remove "useful" tags, but it turned out that these tags do need removal most of the time.