Semi-colon value separator

From OpenStreetMap Wiki

We use the semi-colon (the ; character) as a value separator in our tag values in some situations, but avoid it in others. It can be necessary when a single element needs to take multiple values for the same key.

Current applications (OSM "data consumers") can handle such semicolon-separated values (abbreviated "CSV" on this page) without problems, as long as they are used appropriately. Older software from the "early days" of OSM had more problems. When semi-colon separation is used in tags where it is not expected, software might handle the value in unintended ways, such as treating the whole string as one value or considering only the first part of the list.

Examples for established uses

  • Sections of a road that are designated multiple references, e.g. ref=B500;B550 for a road signposted as both B500 and B550. You would only do this if the identical section of road carries both ref values. However, if there is any point on this road section where the ref changes from one to the other, then you would place a node and split the way at that point.
  • Complex values that evidently cannot be represented using subkeys (notably when they are unordered lists of items) may use semicolons, e.g.:
    • opening_hours=Tu-Fr 08:00-18:00;Mo 09:00-18:00;Sa 09:00-12:00;closed Aug
    • turn=* lanes on roads can have several turn directions for the same lane, e.g. turn:lanes=left;through|through|through;right
  • For additional, descriptive tags, there is often no better way to tag diverse properties and combinations of them.
    Semicolons are in wide use for these 'detail' tags where several values are common, e.g. cuisine=german;italian;mexican.
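The turn:lanes example above nests two separators: the vertical bar divides the lanes, and the semicolon divides the turn directions within one lane. A minimal sketch of how a data consumer might unpack such a value is shown below. The function name parse_turn_lanes is an assumption made for this illustration and is not taken from any existing tool.

```python
def parse_turn_lanes(value: str) -> list[list[str]]:
    """Split a turn:lanes value into lanes ('|'), then into turn directions (';')."""
    return [
        [turn.strip() for turn in lane.split(";")]
        for lane in value.split("|")
    ]


lanes = parse_turn_lanes("left;through|through|through;right")
# One inner list per lane, in the order the lanes appear in the tag value.
print(lanes)
```

Splitting on the outer separator first keeps the lane order intact, so each inner list can be matched to its lane by position.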

When NOT to use

On important "top-level" tags that define what an element is, avoid ;-separated values whenever possible. Examples are highway=*, amenity=*, leisure=*, landuse=*, and natural=coastline.

Don't use them in your mapping, and don't propose them on the wiki if there are better ways of representing things. This is because use of semi-colons as value separators is contrary to the aim of keeping it simple both for data contributors (mappers) and data users. For the sake of new contributors and anyone trying to use the data (people building software for rendering, searching, "find my nearest cafe" mobile apps, etc) we should keep at least basic data directly usable.

In situations where you have multiple values, there are normally a couple of alternative approaches:

  • Choose one of the values: Take the overriding "primary" value, and go with that. Example: You're mapping something which is a cafe but also a bar. It's much more helpful to just pick amenity=cafe or amenity=bar (look at the cafe/bar, and make a choice: Is it primarily a cafe, or primarily a bar?) It is not a good idea to map it as amenity=cafe;bar.
  • Split the element: Separate things out into distinct features to allow them to be tagged separately with normal tags. Example: You're mapping a library which has a cafe inside it. Place a node for the cafe, and then represent the library (a larger building) either as an area or as a separate node. It is not a good idea to map it as amenity=library;cafe.

In both examples, if you use ; in the amenity value, then that isn't going to show up in a "find my nearest cafe" mobile app any time soon. Even though it is entirely possible for systems to parse the value, and split it by the ; character, almost all existing systems don't.
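The difference between an exact match and a split-aware match can be sketched as follows. This is only an illustration of the problem, not the logic of any particular app; the helper names split_values, is_cafe_naive and is_cafe_tolerant are made up for this example.

```python
def split_values(value: str) -> list[str]:
    """Split a tag value on ';' and drop surrounding spaces and empty parts."""
    return [part.strip() for part in value.split(";") if part.strip()]


def is_cafe_naive(tags: dict[str, str]) -> bool:
    # Exact comparison: what a simple "find my nearest cafe" query would do.
    return tags.get("amenity") == "cafe"


def is_cafe_tolerant(tags: dict[str, str]) -> bool:
    # Split-aware comparison: possible, but it has to be written deliberately.
    return "cafe" in split_values(tags.get("amenity", ""))


tags = {"amenity": "cafe;bar"}
print(is_cafe_naive(tags), is_cafe_tolerant(tags))
```

A consumer that uses only the exact comparison will not find the feature tagged amenity=cafe;bar, which is why a single primary value is preferred on top-level tags.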

Syntax details

Space character padding

Normally, the delimiter is a semicolon without a space before or after, for example, ref=B500;B550. However, the opening hours and conditional restrictions syntaxes require a space after the semicolon.

An older tagging style placed a space character after each of the ; characters in other contexts for human-readability, for example, ref=B500; B550. Potlatch automatically introduces a ; when merging two ways. [1][2][3] However, this usage became insignificant by 2013. [4] iD automatically removes a space after a semicolon, except in keys that require it. This is currently an inconsistency between JOSM and Potlatch (both versions) in their approach to automatic value separating. [ clarification needed ]
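A cleanup step that removes the older space padding could look like the sketch below. The rule for which keys keep their spaces (opening_hours and keys ending in :conditional) is a simplified assumption for this example, not a complete list, and the function names are hypothetical. The sketch also ignores the ;; escape sequence.

```python
import re


def keeps_space(key: str) -> bool:
    # Assumed, non-exhaustive rule: these syntaxes use a space after ';'.
    return key == "opening_hours" or key.endswith(":conditional")


def normalize_padding(key: str, value: str) -> str:
    """Remove spaces around ';' unless the key's syntax relies on them."""
    if keeps_space(key):
        return value
    return re.sub(r"\s*;\s*", ";", value).strip()


print(normalize_padding("ref", "B500; B550"))
print(normalize_padding("opening_hours", "Mo 09:00-18:00; Sa 09:00-12:00"))
```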

Escaping with ';;'

The semicolon was chosen as a delimiter because it rarely occurs within a name or keyword, so data consumers can split the tag on semicolons with more confidence than with any other delimiter. Nonetheless, in case an individual value within a list of values needs to contain a semicolon, you can escape it by entering two consecutive semicolons: ;;. As of January 2023, this escape sequence occurs only 36 times globally, of which this creatively named café is likely the only feature whose name is intentionally escaped. OpenStreetMap Americana is one of the few data consumers that understand the escaped semicolon.
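A splitter that honours the ;; escape has to look one character ahead before it treats a semicolon as a delimiter. The following is a minimal sketch under that reading of the rule; split_escaped is a hypothetical name, the sample name is invented, and this is not the code used by OpenStreetMap Americana.

```python
def split_escaped(value: str) -> list[str]:
    """Split on single ';' and turn each ';;' into a literal ';'."""
    items: list[str] = []
    current: list[str] = []
    i = 0
    while i < len(value):
        if value[i] == ";":
            if i + 1 < len(value) and value[i + 1] == ";":
                # Escaped semicolon: keep one literal ';' in the current item.
                current.append(";")
                i += 2
                continue
            # Single semicolon: the current item ends here.
            items.append("".join(current))
            current = []
        else:
            current.append(value[i])
        i += 1
    items.append("".join(current))
    return items


print(split_escaped("B500;B550"))
print(split_escaped("Tom;;Jerry;Annex"))
```

One consequence of this escape rule is that an empty item between two delimiters cannot be told apart from an escaped semicolon, so a consumer that supports ;; cannot also support empty list items.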

Older separators

Prior to community consensus on the use of the semi-colon (;), several other characters were suggested to separate values. These included: "/" (solidus), " " (space), "-" (hyphen), and "#" (number sign). The semicolon is now widely accepted as the character to use, and is supported by Potlatch and JOSM. Older variants can now be replaced.

Software support

Supporting CSV lists in software is not complicated. It mostly requires some text processing with substrings and regular expressions, which are available in every programming language. However, it needs to be implemented proactively by the developer, so it can only be expected where semicolon-separated values are reasonably expected.
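As an illustration of the regular-expression approach, the sketch below tests whether one value occurs as a whole item in a semicolon-separated list. The function name list_contains is an assumption for this example. Anchoring the pattern to the start, the end, or a neighbouring semicolon prevents a partial match such as B500 inside B5000.

```python
import re


def list_contains(value: str, wanted: str) -> bool:
    """Return True if `wanted` is a complete item of the ';'-separated value."""
    # re.escape guards against regex metacharacters in the searched item.
    pattern = rf"(?:^|;)\s*{re.escape(wanted)}\s*(?:;|$)"
    return re.search(pattern, value) is not None


print(list_contains("B500;B550", "B550"))
print(list_contains("B5000;B550", "B500"))
```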

Data consumers

Geocoders

  • Nominatim splits name tags by semicolons when indexing features. [5]

Query tools

  • The current Overpass Query Language supports CSVs in tag values by
  • Sophox splits most keys' values by semicolons. Each value in a semicolon delimited list results in a separate triple (with the value as an osmd:-prefixed object).
  • (Historic) XAPI (retired in 2017, development ceased in 2012) apparently did not support regular expressions and substrings, causing users difficulties handling CSVs in the past.

Renderers

  • Mapbox Streets replaces ; with a spaced em dash ( — ) in any name=* or name:*=* tag. [6] For primary keys such as amenity=* or shop=*, it considers only the portion up to the first semicolon and drops the rest.
  • (Historic) MapQuest Open used to interpret ref=* by placing each semicolon-delimited value on a separate shield (however, free access to the open tiles was discontinued in 2016).
  • OpenStreetMap Americana replaces each semicolon in name=* with a bullet (•) or newline, depending on the type of feature, and unescapes escaped semicolons. [7] OSM Americana does not need to split ref=*, because it renders route shields based on route relations; a route concurrency is represented by multiple relations.
  • OpenStreetMap Carto as the style for the general map focuses on primary tags which rarely have CSVs. For the road shields generated from the ref=* tag, the values are pre-processed in SQL, replacing semicolons with a newline character, so that the individual refs show in separate lines on the shield. It renders semicolons in name=* verbatim. [8]
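Two of the strategies described in this list, keeping only the first value of a primary key and placing each ref on its own line, can be sketched in a few lines. This is a Python illustration of the idea only; the renderers named above have their own implementations (OpenStreetMap Carto does this step in SQL), and the function names here are hypothetical.

```python
def first_value(value: str) -> str:
    """Keep only the portion before the first semicolon."""
    return value.split(";", 1)[0].strip()


def shield_lines(ref: str) -> str:
    """Put each semicolon-delimited ref on its own line of a shield label."""
    return "\n".join(part.strip() for part in ref.split(";"))


print(first_value("cafe;bar"))
print(shield_lines("B500;B550"))
```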

Routers and navigation applications

  • OsmAnd supports CSV lists correctly in the following examples:
    • In the map view, it alternates the different ref=* values on road shields.
    • For refs in the "current road" widget and navigation instructions, it replaces the semicolon with a comma and space for good readability.
    • It passes this comma on to the text-to-speech engine so that multiple refs are spoken as separate items.
    • It parses complex opening hours, presents them in a convenient form and calculates from the current time if the facility is open or closed.
    • It parses the turn:lanes and presents them in graphical form; while navigating it highlights the lane to choose.
    • It nicely reformats the cuisine of a restaurant with multiple values, e.g. showing cuisine=german;italian;mexican as "German • Italian • Mexican".
  • The Mapbox Directions API returns text and voice instructions that include the first or most relevant road name, ref, and destination and omit the rest from the sentence for brevity. The omitted names, refs, and destinations remain in other fields.
  • GraphHopper replaces semicolons in name=* with commas when generating turn-by-turn instructions. [9]
  • Valhalla replaces semicolons in name=* with slashes when generating turn-by-turn instructions. [10]

Editors

Editors for OpenStreetMap data need a process to handle different values of the same key, when two or more objects are merged.

  • iD prevents you in some cases from merging two elements with different values for a key (e.g. for highways). For some other tags it merges them using the semi-colon (e.g. leisure=park and leisure=water_park). Various fields allow the user to construct a semicolon-delimited value list graphically using autocompleting text fields.
  • JOSM presents the user with a modal warning box that values are conflicting, followed by a dialogue to resolve the conflict by choosing a particular value. Only when the user explicitly chooses to keep "all" tag values are they merged into a semicolon-delimited value list. [11]
  • Potlatch 1 (maintained until 2010), Potlatch 2 (maintained until 2011), and Potlatch 3 all join tag values with semicolons when merging ways that have tags with the same keys. In most cases this creates invalid tagging and needs to be manually replaced by a single, valid value.
  • Vespucci 0.9.8 and above supports editing semicolon-delimited value lists. [12][13]
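The "keep all values" behaviour described above amounts to joining the distinct values of both elements with semicolons. A minimal sketch of that operation follows; merge_tag_values is a hypothetical helper and does not reproduce the code of any of the editors listed.

```python
def merge_tag_values(a: str, b: str) -> str:
    """Join the distinct items of two tag values into one ';'-separated list."""
    merged: list[str] = []
    for part in a.split(";") + b.split(";"):
        part = part.strip()
        # Skip empty items and items already present, keeping first-seen order.
        if part and part not in merged:
            merged.append(part)
    return ";".join(merged)


print(merge_tag_values("park", "water_park"))
print(merge_tag_values("B500;B550", "B550;B3"))
```

Whether the merged result is valid tagging still depends on the key, which is why editors ask the user or refuse the merge for some tags.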

Alternatives

If you're proposing a new scheme which would seem to require splitting values with semicolons, consider converting it to multiple tags with yes/no values.

Simple "yes/no" tags

Most "properties" or "attributes" of features are described with a simple key, without namespacing:

  • lit=yes/no - specifies whether a street or parking lot is lit at night
  • oneway=yes/no - added to highway=* to specify whether a highway is one-way
  • drive_through=yes/no - specifies whether a feature such as a bank or restaurant offers drive-through service

Namespaced tags

It can be helpful to use a namespace if the property or attribute needs to be specifically related to a single feature. However, this isn't always necessary.

For example, a hypothetical scheme for describing the books and items a library offers could be expressed as:

amenity=library
library:stock=books;newspapers;recorded_music

But it's probably better to rewrite the scheme to express the concepts as:

amenity=library
library:stock:books=yes
library:stock:newspapers=yes
library:stock:recorded_music=yes

payment=* and fuel=* are good examples of this second approach. Boolean-valued tags such as these can be extended with extra values later on if necessary, or even sub-namespaced meaningfully.
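The rewrite from a semicolon list to namespaced yes tags is mechanical, as the following sketch shows for the hypothetical library:stock scheme above. The function name expand_to_yes_tags is an assumption for this example.

```python
def expand_to_yes_tags(tags: dict[str, str], key: str) -> dict[str, str]:
    """Replace key=a;b;c with key:a=yes, key:b=yes, key:c=yes."""
    result = {k: v for k, v in tags.items() if k != key}
    for item in tags.get(key, "").split(";"):
        item = item.strip()
        if item:
            result[f"{key}:{item}"] = "yes"
    return result


tags = {
    "amenity": "library",
    "library:stock": "books;newspapers;recorded_music",
}
print(expand_to_yes_tags(tags, "library:stock"))
```

With one tag per item, a data consumer can test for a single property with a plain key lookup instead of parsing a list.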

Relations

Relations inherently support many-to-many relationships. For example, a highway=* way can be a member of multiple type=route relations. Each of these relations can store information about the respective routes in structured format, avoiding the need to devise a nested syntax for this information within the highway=* way's semicolon-delimited ref=* tag.

Other uses of semicolons

Occasionally, semicolons are used for purposes other than delimiting the values in a list:
