Overpass API/Overpass QL

From OpenStreetMap Wiki

Overview

Overpass QL is the second query language for the Overpass API and was designed as an alternative to Overpass XML. It has a C-style syntax: the whole query source code is divided into statements, and every statement ends with a semicolon. It has imperative semantics: the statements are processed one after another, and each statement changes the execution state according to its semantics.

The execution state consists of the default set, potentially other named sets, and, for block statements, a stack. A set can contain any number of nodes, ways, relations and areas, including elements of mixed types. Sets are created as result sets of statements and are read by subsequent statements as input. Unless you specify a named set as input or result, all input is implicitly read from, and all results are written to, the default set named _ (a single underscore). Names for sets may consist of letters, digits and the underscore but must not start with a digit. Once a new result is (implicitly or explicitly) assigned to an existing set, its previous contents are replaced and are no longer available. Sets always have global visibility.
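As a mental model only (this is not how the Overpass API is implemented), the execution state can be pictured as a dictionary of named sets in which statements read one set and overwrite another. The class and method names below are hypothetical, and treating an unassigned set as empty is an assumption of this toy model.

```python
# Toy model of the Overpass QL execution state (hypothetical, for illustration).
# Elements are identified by (type, id) tuples, e.g. ("node", 1).
Element = tuple[str, int]


class ExecutionState:
    def __init__(self) -> None:
        # The default set "_" always exists and starts empty.
        self.sets: dict[str, frozenset[Element]] = {"_": frozenset()}

    def read(self, name: str = "_") -> frozenset[Element]:
        # In this model, reading a set that was never assigned yields an empty set.
        return self.sets.get(name, frozenset())

    def write(self, result: set[Element], name: str = "_") -> None:
        # Assigning a result replaces the previous contents entirely.
        self.sets[name] = frozenset(result)


state = ExecutionState()
state.write({("node", 1), ("way", 2)})   # like a statement writing to "_"
state.write({("node", 3)}, name="a")     # like a statement ending in ->.a
state.write({("node", 4)})               # replaces "_"; set "a" is unaffected
print(sorted(state.read()), sorted(state.read("a")))
```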

There are several different types of statement. You almost always need the print statement, which is called an action, because it has an effect outside the execution state (the output). The other statements are grouped into

  • Standalone queries: These are complete statements on their own.
  • Filters: They are always part of a query statement and contain the actual selection conditions.
  • Block statements: They group statements and enable disjunctions as well as loops.
  • Settings: Things like output format that can be set once at the beginning.

Sets

Overpass QL can work with sets. By default, everything is read from and sent to the default set "_".

To send something to a different set, use the "->" syntax. For example

  (node[name="Foo"];)->.a;

will store all nodes with name=Foo in set "a".

To select something from a set, append ".a" to the command:

  node.a[amenity=foo];

will return all nodes in the set "a" that have the tag amenity=foo.

Block statements

Union

The union block statement is written as a pair of parentheses. Inside the union, any sequence of statements can be placed, including nested union and foreach statements.

  (statement_1; statement_2;)[->.result_set];

It takes no input set. It produces a result set. Its result set is the union of the result sets of all sub-statements, regardless of whether a sub-statement has a redirected result set or not.

Example:

  (node[name="Foo"];way[name="Foo"];);

This collects in the first statement all nodes that have a name tag "Foo" and in the second statement all ways that have a name tag "Foo". After the union statement, the result set is the union of the result sets of both statements.

The result set of the union statement can be redirected with the usual postfix notation:

Example:

  (node[name="Foo"];way[name="Foo"];)->.a;

Same as the preceding example, but the result is written into the variable a.

Difference

The difference block statement is written as a pair of parentheses. Inside the difference statement, exactly two statements must be placed, and between them a minus sign.

  (statement_1; - statement_2;)[->.result_set];

It takes no input set. It produces a result set. Its result set contains all elements that are result of the first sub-statement and not contained in the result of the second sub-statement.

Example:

  (node[name="Foo"]; - node(50.0,7.0,51.0,8.0););

This collects all nodes that have a name tag "Foo" but are not inside the given bounding box.

The result set of the difference statement can be redirected with the usual postfix notation:

Example:

  (node[name="Foo"]; - node(50.0,7.0,51.0,8.0);)->.a;

Same as the preceding example, but the result is written into the variable a.
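Both the union and the difference block statements behave like ordinary set operations on element identities. A minimal Python sketch, assuming elements are identified by (type, id) tuples and using hypothetical result sets, shows that a union removes duplicates while keeping a node and a way with the same numeric id apart, and that a difference keeps only elements of the first result:

```python
Element = tuple[str, int]  # (type, id): ("node", 7) and ("way", 7) are different elements


def union(*results: set[Element]) -> set[Element]:
    """Union block: combine the result sets of all sub-statements."""
    combined: set[Element] = set()
    for result in results:
        combined |= result
    return combined


def difference(first: set[Element], second: set[Element]) -> set[Element]:
    """Difference block: elements of the first result that are not in the second."""
    return first - second


# Hypothetical results of node[name="Foo"]; and way[name="Foo"];
nodes_named_foo = {("node", 7), ("node", 8)}
ways_named_foo = {("way", 7)}
# Hypothetical result of node(50.0,7.0,51.0,8.0);
nodes_in_bbox = {("node", 8), ("node", 9)}

print(sorted(union(nodes_named_foo, ways_named_foo)))
print(sorted(difference(nodes_named_foo, nodes_in_bbox)))
```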

For-each loop (foreach)

The foreach block statement is written as the keyword foreach, followed by a pair of parentheses. Inside these parentheses, any sequence of statements can be placed, including nested union and foreach statements.

It takes an input set. It produces no result set. The foreach statement loops over the content of the input set, once for every element in the input set.

Example:

  way[name="Foo"];
  foreach(
    (
      ._;
      >;
    );
    out;
  );

For each way that has a name tag with value "Foo", this prints the nodes that belong to this way immediately followed by the way itself. In detail, the result set of way[name="Foo"] is taken as input set. Then, for each element in this input set, the loop body is executed once. Inside the loop body, the union of the element and its nodes is taken, and this union is printed. Note that each subset printed in one iteration is independent of the subsets printed in other iterations, so the global output may contain duplicate objects (the out statement within the loop does not compute a union across iterations).

The input set of the foreach statement can be taken from a variable with the usual postfix notation:

Example:

  foreach.a(...);

This loops over the content of set a instead of the default set "_".

The name of the variable to put the loop element into can also be chosen by adding a postfix immediately before the opening parenthesis.

Example:

  foreach->.b(...);

This puts the element to loop over into the variable b. Without it, the foreach statement puts each element into the default set "_", which is why ._ refers to the current element in the first example. Example with both the input set and the loop set changed:

  foreach.a->.b(...);

Standalone queries

Item

The item standalone query consists only of an input set prefix.

It takes the input set specified by its prefix and reproduces it as its result set. This is particularly useful in union statements.

The most common usage is the usage with the default input set:

  ._;

But of course other sets are possible too:

  .a;

The item statement can also be used as filter.

Recurse up (<)

The recurse up standalone query is written as a single less-than sign (<).

It takes an input set. It produces a result set. Its result set consists of all ways that have a node from the input set as a member, all relations that have a node or way from the input set as a member, and all relations that have a way from the result set as a member.

Example:

  <;

The input set of the recurse up statement can be chosen with the usual prefix notation:

  .a <;

The result set of the recurse up statement can be redirected with the usual postfix notation:

  < ->.b;

Of course, you can also change both:

  .a < ->.b;
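The definition of recurse up can be sketched in Python over hypothetical membership data (a dictionary from each way or relation to its members). The sketch follows the three parts of the definition above and, as described, does not follow relation-in-relation membership; that is what << adds. All names are illustrative.

```python
Element = tuple[str, int]

# Hypothetical membership data: parent element -> its members.
members: dict[Element, list[Element]] = {
    ("way", 10): [("node", 1), ("node", 2)],
    ("relation", 100): [("way", 10)],
    ("relation", 101): [("node", 3)],
    ("relation", 200): [("relation", 100)],  # reached by '<<', not by '<'
}


def parents(children: set[Element], parent_type: str) -> set[Element]:
    """All elements of parent_type that have at least one member in children."""
    return {
        parent
        for parent, parent_members in members.items()
        if parent[0] == parent_type and any(m in children for m in parent_members)
    }


def recurse_up(input_set: set[Element]) -> set[Element]:
    """Sketch of '<' following the definition in the text."""
    nodes = {e for e in input_set if e[0] == "node"}
    ways_in_input = {e for e in input_set if e[0] == "way"}
    ways = parents(nodes, "way")  # ways that contain an input node
    # Relations with a node or way from the input, or a way found above.
    relations = parents(nodes | ways_in_input | ways, "relation")
    return ways | relations


print(sorted(recurse_up({("node", 1), ("node", 3)})))
```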

Recurse up relations (<<)

The recurse up relations standalone query has a similar syntax to the recurse up query and differs only in two aspects:

  • It is written as a double less-than sign (<<).
  • It also recursively returns all relations that have a relation appearing in the input set as a member.

In particular, you can change the input and/or result set with the same notation as for the recurse up standalone query.

Precisely, the recurse up relations standalone query returns the transitive and reflexive closure of membership backwards.

Example:

  <<;

Recurse down (>)

The recurse down standalone query has a similar syntax to the recurse up query and differs only in two aspects: it is written as a greater-than sign (>), and it returns the node members of all ways from the input set, the way and node members of all relations from the input set, and the node members of all ways that are in the result set.

In particular, you can change the input and/or result set with the same notation as for the recurse up standalone query.

Example:

  >;

Recurse down relations (>>)

The recurse down relations standalone query has a similar syntax to the recurse down query and differs only in two aspects:

  • It is written as a double greater-than sign (>>).
  • It also recursively returns all relations that are members in a relation appearing in the input set.

In particular, you can change the input and/or result set with the same notation as for the recurse down standalone query.

Precisely, the recurse down relations standalone query returns the transitive and reflexive closure of membership.

Example:

  >>;
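Following the definition above (the transitive and reflexive closure of membership), >> can be sketched as a graph walk over hypothetical membership data. The visited set also guards against relations that directly or indirectly contain each other. Walking the same data in the opposite direction, from members to parents, gives the closure described for <<. Names and data are illustrative only.

```python
Element = tuple[str, int]

# Hypothetical membership data: parent element -> its members.
members: dict[Element, list[Element]] = {
    ("relation", 200): [("relation", 100)],
    ("relation", 100): [("way", 10), ("node", 3)],
    ("way", 10): [("node", 1), ("node", 2)],
    ("relation", 300): [("relation", 301)],
    ("relation", 301): [("relation", 300)],  # a membership cycle
}


def recurse_down_relations(input_set: set[Element]) -> set[Element]:
    """Sketch of '>>': reflexive-transitive closure of membership."""
    result = set(input_set)  # reflexive: the input elements are included
    stack = list(input_set)
    while stack:
        element = stack.pop()
        for member in members.get(element, []):
            if member not in result:  # visit each element once, even in cycles
                result.add(member)
                stack.append(member)
    return result


print(sorted(recurse_down_relations({("relation", 200)})))
```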

Query for areas (is_in)

The standalone query is_in returns the areas that cover the given coordinates (when specified) or one or more nodes from the input set (when no coordinates are specified).

It takes either an input set or a coordinate. It produces a result set. The results are all areas that contain at least one node from the input set or the specified coordinates.

  [.input_set] is_in[->.result_set];
  is_in(latitude,longitude)[->.result_set];

In its shortest form, it takes its input set as the coordinates to search for. Example:

  is_in;

The input set can be chosen with the usual prefix notation:

  .a is_in;

The result set can be redirected with the usual postfix notation:

  is_in->.b;

Of course, you can also change both:

  .a is_in->.b;

Instead of taking existing nodes you can also specify coordinates with two floating point numbers, divided by a comma. They are interpreted as latitude, longitude. In this case, the input set is ignored. Example:

  is_in(50.7,7.2);

Also in this variant, the result set can be redirected with the usual postfix notation:

  is_in(50.7,7.2)->.b;

Filters

The most important statement is the query statement. This is not a single statement but rather consists of one of the type specifiers node, way or relation (or shorthand rel), followed by one or more filters. The result set is the set of all elements that match the conditions of all the filters.

Example:

  node[name="Foo"];

Here, node is the type specifier, [name="Foo"] is the filter and the semicolon ends the statement.

The query statement has a result set that can be changed with the usual postfix notation.

  node[name="Foo"]->.a;

Individual filters may in addition have input sets, which can be changed within the filter. See the description of the respective filter for details.

By tag (has-kv)

The has-kv filter selects all elements that have, or do not have, a tag with a certain value. It supports the basic OSM types node, way, and relation as well as the extended type area.

It has no input set. As for all filters, the result set is specified by the whole statement, not the individual filter.

All variants consist of an opening bracket, then a string literal in single or double quotes. Then the variants differ. All variants end with a closing bracket. If the string literal consists only of letters, the quotes can be omitted.

Equals (=, !=)

The most common variant selects all elements where the tag with the given key has a specific value. After the key literal, this variant contains an equals sign and a second literal containing the value. Examples, all equivalent:

  node["name"="Foo"];
  node[name=Foo];
  node['name'="Foo"];
  node[name="Foo"];
  node["name"='Foo'];

If the value contains a digit, whitespace or any other character that is not a letter, you need single or double quotes:

  node["name"="Foo Street"];
  node["name"='Foo Street'];
  node[name="Foo Street"];

Exists

The second variant selects all elements that have a tag with a certain key and an arbitrary value. It contains nothing between the key literal and the closing bracket:

  node["name"];
  node['name'];
  node[name];

Matches regular expression (~, !~)

The third variant selects all elements that have a tag with a certain key and a value that matches a regular expression. After the key literal, it contains a tilde and then a second literal with the regular expression to search for:

  node["name"~"^Foo$"];    /* finds exactly "Foo" */
  node["name"~"^Foo"];     /* finds anything that starts with "Foo" */
  node["name"~"Foo$"];     /* finds anything that ends with "Foo" */
  node["name"~"Foo"];      /* finds anything that contains the substring "Foo" */
  node["name"~"."];        /* finds any non-empty value; in practice the same as the Exists variant */

Please note that in QL you need to escape backslashes: ["name"~"^St\."] results in the regular expression ^St. (which finds every name starting with "St" followed by any character), while ["name"~"^St\\."] produces the most likely intended regular expression ^St\. (which finds every name starting with "St."). This is due to the C escaping rules and doesn't apply to the XML syntax.
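Client code that builds queries has to apply the same escaping. A minimal Python sketch with a hypothetical helper name turns an intended regular expression into a double-quoted QL string literal by escaping backslashes and double quotes. Python's re module is used only to illustrate the difference between the two patterns; the server uses its own regular expression engine, whose details may differ.

```python
import re


def ql_string(text: str) -> str:
    """Quote text as a double-quoted QL string literal (hypothetical helper).

    Backslashes are doubled first, then double quotes are escaped,
    following C-style escaping rules.
    """
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


intended = r"^St\."  # the regular expression we actually want
print(f'node["name"~{ql_string(intended)}];')  # node["name"~"^St\\."];

# Unescaped, the pattern degrades to ^St. where '.' matches any character.
print(bool(re.search(r"^St.", "Stage")))   # True
print(bool(re.search(r"^St\.", "Stage")))  # False: needs a literal dot
```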

You can also search case insensitively:

  node["name"~"^Foo$",i];    /* finds "foo", "FOO", "fOo", "Foo" etc. */

Both the exact-value and the regular-expression variants can be negated. They then select exactly the elements that have a tag with the given key but no matching value, plus the elements that don't have a tag with the given key at all:

  node["name"!="Foo"];
  node["name"!~"Foo"];
  node["name"!~"Foo",i];
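The semantics of the has-kv variants, in particular that the negated forms also match elements without the key, can be summarized as a small Python predicate over a tag dictionary. The function name and operator strings are hypothetical, and Python's re module stands in for the server's regular expression engine.

```python
import re


def has_kv(tags: dict[str, str], key: str, op: str,
           value: str = "", ignore_case: bool = False) -> bool:
    """Evaluate one has-kv filter against an element's tags (sketch).

    op is one of "exists", "=", "!=", "~", "!~".
    """
    if op == "exists":
        return key in tags
    if op == "=":
        return tags.get(key) == value
    if op == "~":
        flags = re.IGNORECASE if ignore_case else 0
        return key in tags and re.search(value, tags[key], flags) is not None
    # Negations: no matching value, or no tag with this key at all.
    if op == "!=":
        return not has_kv(tags, key, "=", value)
    if op == "!~":
        return not has_kv(tags, key, "~", value, ignore_case)
    raise ValueError(f"unknown operator: {op}")


print(has_kv({"amenity": "cafe"}, "name", "!=", "Foo"))  # True: no name tag
```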

Bounding box

The bbox-query filter selects all elements within a certain bounding box.

It has no input set. As for all filters, the result set is specified by the whole statement, not the individual filter.

  (south,west,north,east)

It consists of an opening parenthesis, followed by four floating point numbers separated by commas. The filter ends with a closing parenthesis.

The floating point numbers give the limits of the bounding box: The first is the southern limit or minimum latitude. The second is the western limit, usually the minimum longitude. The third is the northern limit or maximum latitude. The last is the eastern limit, usually the maximum longitude. If the second argument is bigger than the fourth argument, the bounding box crosses the longitude of 180 degrees.

Example:

  node(50.6,7.0,50.8,7.3);
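The following Python sketch shows how the four limits, including the case where the box crosses 180 degrees of longitude, translate into a point test. The function name is hypothetical, and treating points on the border as inside is an assumption of this sketch.

```python
def in_bbox(lat: float, lon: float,
            south: float, west: float, north: float, east: float) -> bool:
    """Point-in-bbox test using the (south,west,north,east) order above."""
    if not south <= lat <= north:
        return False
    if west <= east:
        return west <= lon <= east
    # west > east: the box crosses the 180 degree meridian, so it covers
    # the longitudes [west, 180] and [-180, east].
    return lon >= west or lon <= east


print(in_bbox(50.7, 7.15, 50.6, 7.0, 50.8, 7.3))  # True: Bonn lies inside
```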

Recurse (n, w, r, bn, bw, br)

The recurse filter selects all elements that are members of an element from the input set or have an element of the input set as member, depending on the given parameter.

The input set can be changed with an adapted prefix notation. As for all filters, the result set is specified by the whole statement, not the individual filter.

It consists of an opening parenthesis. Then follows one of the symbols: w (forward from ways), r (forward from relations), bn (backward from nodes), bw (backward from ways), or br (backward from relations). Then follows an optional input set declaration. The filter ends with a closing parenthesis.

Examples with default input set:

  node(w);        // select child nodes from all ways of the input set
  node(r);        // select node members of relations of the input set
  way(bn);        // select parent ways for all nodes from the input set
  way(r);         // select way members of relations from the input set
  rel(bn);        // select relations that have node members from the input set
  rel(bw);        // select relations that have way members from the input set
  rel(r);         // select all members of type relation from all relations of the input set
  rel(br);        // select all parent relations of all relations from the input set

Example with modified input set:

  node(w.foo);

You can also restrict the recurse to a specific role. Just add a colon and then the name of the role before the closing parenthesis.

Examples with default input set:

  node(r:"role");        // select node members with role "role" of relations of the input set
  way(r:"role");         // select way members with role "role" of relations from the input set
  rel(bn:"role");        // select relations that have node members from the input set with role "role"
  rel(bw:"role");        // select relations that have way members from the input set with role "role"
  rel(r:"role");         // select relation members with role "role" of relations of the input set
  rel(br:"role");        // select parent relations in which relations from the input set have role "role"

Example with modified input set:

  node(r.foo:"role");

And you can also search explicitly for empty roles:

  node(r:"");
  node(r.foo:"");

By input set

The "item" filter selects all elements from its input set.

As for all filters, the result set is specified by the whole statement, not the individual filter.

It consists of a dot, followed by the name of the input set.

Examples: The default set

  node._;

and a named set

  node.a;

By element id

The id-query filter selects the element of the given type with the given id. Besides the OSM data types node, way, and relation, it also supports the type area.

It has no input set. As for all filters, the result set is specified by the whole statement, not the individual filter.

It consists of an opening parenthesis, followed by a positive integer. The filter ends with a closing parenthesis.

Examples:

  node(1);
  way(1);
  rel(1);
  area(1);

By convention, the area id can be calculated from an existing OSM way by adding 2400000000 to its OSM id, or, in the case of a relation, by adding 3600000000. Note that area creation is subject to some extraction rules, so not all ways and relations have an area counterpart.
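The convention is simple arithmetic. A Python sketch with hypothetical names; a computed id only says which area would correspond to the way or relation, not that such an area exists.

```python
WAY_AREA_OFFSET = 2400000000
RELATION_AREA_OFFSET = 3600000000


def area_id(osm_type: str, osm_id: int) -> int:
    """Area id for a way or relation by the id convention described above."""
    offsets = {"way": WAY_AREA_OFFSET, "relation": RELATION_AREA_OFFSET}
    if osm_type not in offsets:
        raise ValueError("only ways and relations have area counterparts")
    return offsets[osm_type] + osm_id


print(f"area({area_id('way', 1)});")  # area(2400000001);
```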

Relative to other elements (around)

The around filter selects all elements within a certain radius around the elements in the input set. If you provide coordinates, then these coordinates are used instead of the input set.

The input set can be changed with an adapted prefix notation. As for all filters, the result set is specified by the whole statement, not the individual filter.

It consists of an opening parenthesis, followed by the keyword around, an optional input set declaration, a colon, and a single floating point number that denotes the radius in meters. The filter either ends with a closing parenthesis or continues with a comma and two comma-separated floating point numbers for latitude and longitude, followed by a closing parenthesis.

  (around[.input_set]:radius)
  (around:radius,latitude,longitude)

Examples:

  node(around:100.0);
  way(around:100.0);
  rel(around:100.0);

Example with modified input set:

  node(around.a:100.0);

Examples with coordinates:

  node(around:100.0,50.7,7.1);
  way(around:100.0,50.7,7.1);
  rel(around:100.0,50.7,7.1);
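To illustrate what "within a certain radius" means for point-like elements, here is a Python sketch that filters hypothetical node coordinates by distance. The exact distance computation used by the server is not described on this page; this sketch substitutes the haversine formula on a spherical Earth, and the radius constant and function names are assumptions.

```python
import math

# Mean Earth radius in meters; an assumption of this sketch, not a server constant.
EARTH_RADIUS_M = 6371000.0


def distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine great-circle distance in meters on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # min() guards against rounding pushing the argument slightly above 1.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def around(points: list[tuple[float, float]], radius_m: float,
           lat: float, lon: float) -> list[tuple[float, float]]:
    """Keep the (lat, lon) points within radius_m of the given centre."""
    return [p for p in points if distance_m(p[0], p[1], lat, lon) <= radius_m]


# Roughly node(around:100.0,50.7,7.1); applied to two hypothetical nodes.
print(around([(50.7, 7.1), (50.71, 7.1)], 100.0, 50.7, 7.1))
```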

By polygon (poly)

The polygon filter selects all elements of the chosen type inside the given polygon.

It has no input set. As for all filters, the result set is specified by the whole statement, not the individual filter.

It consists of an opening parenthesis. Then follows the keyword poly. Then follows a string containing an even number of floating point numbers, divided only by whitespace. Each pair of floating point numbers represents a coordinate, in order latitude, then longitude. The filter ends with a closing parenthesis.

  (poly:"latitude_1 longitude_1 latitude_2 longitude_2 latitude_3 longitude_3 …")

An example (a triangle near Bonn, Germany):

  node(poly:"50.7 7.1 50.7 7.2 50.75 7.15");
  way(poly:"50.7 7.1 50.7 7.2 50.75 7.15");
  rel(poly:"50.7 7.1 50.7 7.2 50.75 7.15");
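When the polygon comes from client code, the filter string can be generated from a list of (latitude, longitude) pairs. A minimal Python sketch with a hypothetical helper name; the minimum of three vertices is a sanity check added by this sketch.

```python
def poly_filter(vertices: list[tuple[float, float]]) -> str:
    """Build a (poly:"...") filter from (lat, lon) pairs (hypothetical helper)."""
    if len(vertices) < 3:
        # Sanity check added by this sketch: fewer points enclose no area.
        raise ValueError("a polygon needs at least three vertices")
    numbers = " ".join(f"{lat} {lon}" for lat, lon in vertices)
    return f'(poly:"{numbers}")'


triangle = [(50.7, 7.1), (50.7, 7.2), (50.75, 7.15)]
print("node" + poly_filter(triangle) + ";")  # node(poly:"50.7 7.1 50.7 7.2 50.75 7.15");
```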

newer

The newer filter selects all elements that have been changed since the given date. As opposed to other filters, this filter cannot be used alone. If the underlying database instance supports attic data, then "changed" is probably a better choice than "newer".

It has no input set. As for all filters, the result set is specified by the whole statement, not the individual filter.

It consists of an opening parenthesis. Then follows a date specification. Please note that this date specification cannot be abbreviated and has to be put in single or double quotes. The filter ends with a closing parenthesis.

Example:

  node._(newer:"2012-09-14T07:00:00Z");

This finds all nodes in the given input set that have changed since 14 Sep 2012, 07:00 UTC.

By date of change (changed)

The changed filter selects all elements that have been changed between the two given dates. If only one date is given, the second is assumed to be the front date of the database (its most recent state). If only one date is given and the query is run with the current timestamp, then it behaves exactly like "newer" with two exceptions: first, it is faster; second, it can also stand as the only filter.

It has no input set. As for all filters, the result set is specified by the whole statement, not the individual filter.

It consists of an opening parenthesis. Then follows a date specification. Please note that this date specification cannot be abbreviated and has to be put in single or double quotes. Then can follow a comma and a second date specification. The filter ends with a closing parenthesis.

Example: All changes between the given date and now

  node._(changed:"2012-09-14T07:00:00Z");

Example: All changes between the two given dates

  node._(changed:"2012-09-14T07:00:00Z","2012-09-14T07:01:00Z");
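Since the date specification cannot be abbreviated, it can help to generate it. A minimal Python sketch with a hypothetical helper name that formats a timezone-aware datetime as the full, quoted UTC form used in the examples above:

```python
from datetime import datetime, timezone


def ql_date(moment: datetime) -> str:
    """Full, quoted UTC date specification, e.g. "2012-09-14T07:00:00Z".

    Expects a timezone-aware datetime; it is converted to UTC first.
    """
    utc = moment.astimezone(timezone.utc)
    return '"' + utc.strftime("%Y-%m-%dT%H:%M:%SZ") + '"'


since = datetime(2012, 9, 14, 7, 0, tzinfo=timezone.utc)
until = datetime(2012, 9, 14, 7, 1, tzinfo=timezone.utc)
print(f"node._(changed:{ql_date(since)},{ql_date(until)});")
```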

By user (user, uid)

The user filter selects all elements that have been last touched by the specified user.

It has no input set. As for all filters, the result set is specified by the whole statement, not the individual filter.

It consists of an opening parenthesis. Then follows either the keyword user, a colon and a string literal denoting the user name to search for, or the keyword uid, a colon and the id of the user to search for. The filter ends with a closing parenthesis.

Example:

  node(user:"Steve");
  node(uid:1);

By area (area)

The area filter selects all elements of the chosen type that are inside the given area. Please note with regard to attic data that areas always represent current data.

The input set can be changed with an adapted prefix notation. As for all filters, the result set is specified by the whole statement, not the individual filter.

It consists of an opening parenthesis. Then follows the keyword area. Then can follow a colon and a non-negative integer. The filter ends with a closing parenthesis.

Nodes are found if they are properly inside or on the border of the area. Ways are found if at least one of their points (including points on a segment) is properly inside the area. A way ending on the border and not otherwise crossing the area is not found. Relations are found if at least one of their members is properly inside the area.

If the area filter is given without an integer, the areas from the input set are used. Example:

  node(area);
  way(area);
  rel(area);

The example with modified input set:

  node(area.a);
  way(area.a);
  rel(area.a);

If an integer is added, the input set is ignored and instead the area that has the given integer as id is taken.

  node(area:2400000001);
  way(area:2400000001);
  rel(area:2400000001);

By convention, the area id can be calculated from an existing OSM way by adding 2400000000 to its OSM id, or, in the case of a relation, by adding 3600000000. Note that area creation is subject to some extraction rules, so not all ways and relations have an area counterpart.

Area pivot (pivot)

The pivot filter selects the element of the chosen type that defines the outline of the given area.

The input set can be changed with an adapted prefix notation. As for all filters, the result set is specified by the whole statement, not the individual filter.

It consists of an opening parenthesis. Then follows the keyword pivot. The filter ends with a closing parenthesis.

For each area in the input set, the statement finds the element that the area has been generated from, which is either a multipolygon relation or a way.

Examples:

  way(pivot);
  rel(pivot);

The example with modified input set:

  way(pivot.a);
  rel(pivot.a);

Actions

There is currently only one action. This action prints out the content of its input set.

Print (out)

The out action can be configured with an arbitrary number of parameters that are appended, separated by whitespace, between the word out and the semicolon.

The out action takes an input set. It doesn't return a result set. The input set can be changed by prepending the variable name.

Allowed values, in any order, are:

  • One of the following for the degree of verbosity; default is body:
    • ids: Print only the ids of the elements.
    • skel: Print also the information necessary for geometry. This adds coordinates for nodes and member ids for ways and relations.
    • body: Print all information necessary to use the data. This adds tags for all elements and roles for relation members.
    • tags: Print only ids and tags for each element, without coordinates or members.
    • meta: Print everything known about the elements. In addition to body, this includes for all elements the version, changeset id, timestamp and the data of the user that last touched the object.
  • One of the following modifiers for derived information:
    • bbox: Adds the bounding box of each element to the element. For nodes this is equivalent to geom. For ways it is the enclosing bounding box of all nodes. For relations it is the enclosing bounding box of all node and way members; relations as members have no effect.
    • center: Adds the center of the above-mentioned bounding box to ways and relations.
    • geom: Adds the full geometry to each object. This adds coordinates to each node, to each node member of a way or relation, and it adds a sequence of "nd" members with coordinates to all relations.
      The attribute geom can be followed by a bounding box in the format "(south,west,north,east)". In this case only coordinates that are inside the bounding box are produced. For way segments, the first coordinate outside the bounding box is also produced to allow for properly formed segments.
  • One of the following for the sort order; default is asc:
    • asc: Sort by object id.
    • qt: Sort by quadtile index; this is roughly geographical and significantly faster than sorting by ids.
  • A non-negative integer for the maximum number of elements to print. Default is no limit.

Example:

  out;

Print the elements without meta information.

Example:

  out meta;

Print the elements with meta information.

Example:

  out 99;

Print at most 99 elements.

Example:

  out meta qt 1000000;

Print up to 1,000,000 elements, ordered by location, with meta data.

Example:

  .a out;

Reads from variable a the data to output.

Settings

timeout

The timeout setting has one parameter, a non-negative integer. Default value is 180.

This parameter indicates the maximum allowed runtime for the query in seconds, as expected by the user. If the query runs longer than this time, the server may abort it with a timeout. A second effect is that the higher this value, the more likely the server is to reject the query before executing it.

So, if you send a really complex, big query, prefix it with a higher value, e.g., "3600" for an hour. Also make sure that your client is patient enough not to abort due to a timeout of its own.

Example:

  [timeout:180]

Memory limit (maxsize)

The maxsize setting has one parameter, a non-negative integer. Default value is 536870912.

This parameter indicates the maximum allowed memory for the query in bytes of RAM on the server, as expected by the user. If the query needs more RAM than this value, the server may abort it with a memory exhaustion error. A second effect is that the higher this value, the more likely the server is to reject the query before executing it.

So, if you send a really complex big query, prefix it with a higher value; e.g., "1073741824" for a gigabyte.

Example:

  [maxsize:1073741824]

Output (out)

The out setting can take one of four values; the default is xml:

  • xml
  • json
  • custom
  • popup

The values custom and popup require further configuration. Please see details in the output formats documentation.

Example:

  [out:json]

Global bounding box (bbox)

The bbox setting can define a bounding box that is then implicitly added to all queries (unless they specify a different explicit bbox).

The bounding box is written in order southern lat, western lon, northern lat, eastern lon (which is the standard order).

  [bbox:south,west,north,east]

Example:

  [bbox:50.6,7.0,50.8,7.3]

Enforces a bounding box roughly around the German city Bonn, which is at 50.7 degrees latitude, 7.15 degrees longitude.

If a query is URL-encoded as the value of the data= parameter, the bounding box can also be appended as a separate bbox= parameter. In that case the order is lon-lat (west,south,east,north), which is the common order for OpenLayers and other frameworks.

Complete Example:

  /api/interpreter?data=[bbox];node[amenity=post_box];out;&bbox=7.0,50.6,7.3,50.8

This finds all post boxes roughly in Bonn, Germany.
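A Python sketch of building such a request URL with the standard library. The function name is hypothetical, and the host is left out as in the example above. urlencode percent-encodes the query (brackets, semicolons and the = sign), whereas the example above shows it unencoded; parsing the result back shows the lon-lat order of the separate parameter.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def interpreter_url(base: str, query: str, south: float, west: float,
                    north: float, east: float) -> str:
    """URL with the query in data= and a separate bbox= parameter.

    The separate parameter uses lon-lat order (west,south,east,north),
    unlike the [bbox:south,west,north,east] setting inside the query.
    """
    bbox = f"{west},{south},{east},{north}"
    return base + "?" + urlencode({"data": query, "bbox": bbox})


url = interpreter_url("/api/interpreter", "[bbox];node[amenity=post_box];out;",
                      50.6, 7.0, 50.8, 7.3)
params = parse_qs(urlsplit(url).query)
print(params["bbox"])  # ['7.0,50.6,7.3,50.8']
```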

Attic data ("date")

The date setting lets the database answer a query based on a database state in the past. This is useful for example to reconstruct data that has been vandalised.

It consists of the identifier "date", followed by a colon and then a date specification.

Example:

  [date:"2012-09-14T15:00:00Z"]

This processes the rest of the query as if it were posed on 14th September 2012 at 15:00.

Delta between two dates ("diff")

The diff setting lets the database determine the difference between the results of a query at two different points in time. This is useful, for example, to compute deltas for database extracts.

It consists of the identifier "diff", followed by a colon, then a date specification, and optionally a comma and a second date specification. If only one date specification is given, then the second is assumed to be the current state.

Example:

  [diff:"2012-09-14T15:00:00Z"]

This processes the rest of the query as if it were posed on 14th September 2012 at 15:00, then processes the same query with current data and finally outputs the difference between the two results.

  [diff:"2012-09-14T15:00:00Z","2012-09-21T15:00:00Z"]

Does basically the same, but compares the state of 14th September with the state of 21st September.

Augmented Delta between two dates ("adiff")

The adiff setting does basically the same as "diff", but for all elements that are not contained in the newer result, it indicates what happened to them.

If an element has been deleted, then its last deletion date is printed and the indication "visible=false". If an element has changed such that it no longer matches the query then its last change date is printed and the indication "visible=true".

Special syntax

Comments

The query language allows comments in the same style as C source code:

  out; // A single line comment
  /* Comments starting with slash asterisk must always be closed with an asterisk slash. */
  /* But they can span
         multiple lines. */

Escaping

The following C-style escape sequences are recognized:

  • \n: escapes a newline (line feed)
  • \t: escapes a tab
  • \", \': escape the respective quotation mark
  • \\: escapes the backslash
  • \u#### (the hash characters stand for four hexadecimal digits): escapes the respective Unicode UTF-16 code unit, see Unicode escape sequences.
    Note that the database encodes characters in UTF-8: characters in the 7-bit US-ASCII subset (U+0000..U+007F) take 1 byte, all others take more. All characters that are assigned a Unicode scalar value in the 17 standard planes are encoded as UTF-8.
    However, this syntax directly supports only characters in the BMP, excluding surrogates, which are not Unicode characters and have no valid UTF-8 encoding even though they have a 16-bit code point. Non-ASCII characters in the BMP are encoded in UTF-8 with 2 bytes (range U+0080..U+07FF) or 3 bytes (range U+0800..U+FFFF, minus the surrogates in U+D800..U+DFFF).
    Unicode characters outside the BMP can be represented in UTF-16 as a pair of surrogates. Only valid pairs (a high surrogate in U+D800..U+DBFF immediately followed by a low surrogate in U+DC00..U+DFFF) are convertible to UTF-8 and can be escaped as \uD###\uD### (the result of escaping invalid pairs or unpaired surrogates is undefined). These valid escaped pairs are converted to 4-byte UTF-8 sequences (supplementary planes 1 to 16). Characters in planes 15 and 16 are assigned only for private use and are not useful in OSM data, as they are not interoperable.
  • There is currently no support for the common escaping syntax \U000##### used in modern C to represent a code point in any of the 17 valid Unicode planes (excluding surrogates), nor for arbitrary 8-bit bytes with the common escaping syntax \x## (defined in C independently of the encoding used). Where possible, avoid escaping when it is not needed and use valid UTF-8 directly in requests.
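The surrogate-pair arithmetic can be sketched in Python. The helper below (hypothetical name) rewrites non-ASCII characters as \u#### escapes, producing a surrogate pair for characters outside the BMP. It assumes the parser accepts upper-case hexadecimal digits and leaves ASCII untouched (quotes and backslashes still need their own escapes). As recommended above, sending plain UTF-8 is preferable when possible.

```python
def escape_non_ascii(text: str) -> str:
    """Rewrite non-ASCII characters as \\uXXXX UTF-16 code unit escapes."""
    parts: list[str] = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            parts.append(ch)  # ASCII is left as-is
        elif cp <= 0xFFFF:
            # BMP character: one code unit. (A lone surrogate would also land
            # here; escaping it gives undefined results on the server.)
            parts.append(f"\\u{cp:04X}")
        else:
            # Outside the BMP: split into a high and a low surrogate.
            offset = cp - 0x10000
            high = 0xD800 + (offset >> 10)
            low = 0xDC00 + (offset & 0x3FF)
            parts.append(f"\\u{high:04X}\\u{low:04X}")
    return "".join(parts)


print(escape_non_ascii("Straße"))          # Stra\u00DFe
print(escape_non_ascii("\U0001F600"))      # \uD83D\uDE00
print(len("\U0010FFFD".encode("utf-8")))   # 4 bytes, even in plane 16
```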