Talk:Overpass API/Overpass QL
- The old posts can be found on Talk:Overpass API/OverpassQL
- 1 Attic data ("date")
- 2 Key/value matches regular expression (~"key regex"~"value regex")
- 3 Closed Way
- 4 Broken discussion link
- 5 Valid hours, minutes, and seconds in date() and is_date()
- 6 Reverting and accepting changes
- 7 Regular expression dialect
- 8 How to filter by area ("all of X but only if the surface of X is > 5sqm")
Attic data ("date")
Example: The relation of Lake Nasser with the ID 280282. The changeset for the relation's state contained in the first ODbL planet file was opened on 2012-06-02T13:23:36Z and closed on 2012-06-02T13:25:49Z. So querying for the relation for 2012-06-02T13:23:00Z returns nothing, while a query for 2012-06-02T13:26:00Z returns the relation as of that date. (Why a query for 2012-06-02T13:23:40Z doesn't work but one for 2012-06-02T13:24:36Z does, although the change was not complete at either time, and what the query will return, is something a more knowledgeable person may explain.) Malenki (talk)
- A changeset is not a "db transaction", where updates only become visible to the rest of the world once you close it. In fact, any change you upload becomes immediately visible to the rest of the world, and you can still add further changes to the same changeset later on! For Overpass API, only the object timestamp is relevant. The changeset open/close timestamps can be ignored for the purpose of finding out whether an object is returned or not. Mmd (talk) 17:52, 16 February 2015 (UTC)
If your query asks for an object in a state prior to what this planet file contains, the API will return nothing.
- It would return the state contained in the first planet. Due to a DB inconsistency, overpass-api.de will indeed return nothing at this time, but as soon as the DB is rebuilt, you will get the state as in the first planet again (tested on my local instance). Mmd (talk) 18:36, 16 February 2015 (UTC)
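For readers who want to reproduce the comparison above, here is a minimal sketch that builds the two attic queries as text. It assumes the `[date:"..."]` global setting of Overpass QL; the helper name `attic_query` is hypothetical and the sketch only produces query strings, it does not contact any server.

```python
def attic_query(relation_id: int, timestamp: str) -> str:
    """Build an Overpass QL query for one relation as of `timestamp`.

    `attic_query` is a hypothetical helper for illustration only.
    """
    # The [date:"..."] setting asks for the database state at that time;
    # per the reply above, only the object timestamp decides what is returned.
    return f'[date:"{timestamp}"];relation({relation_id});out meta;'


# The two timestamps discussed in this thread for Lake Nasser (ID 280282).
before = attic_query(280282, "2012-06-02T13:23:00Z")
after = attic_query(280282, "2012-06-02T13:26:00Z")
print(before)
print(after)
```

Pasting each printed line into an Overpass instance with attic data should show the difference described above, subject to the DB inconsistency mentioned in the reply.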
Key/value matches regular expression (~"key regex"~"value regex")
Is it possible to simplify the syntax for filters that use regexps for keys, but only equality for the value?
A filter such as:
could be more simply written as
without having to escape some characters in the value and add the extra "^" and "$" anchors. I think that Overpass could perform this transformation to the equivalent regexp itself.
Also, there should exist a way to query names by language (using a simpler mechanism, comparable to the lang() selector in CSS, i.e. using BCP47 language code resolution rules). The query above would become:
Or even more simply (if we don't care about "alt_name", "old_name", "loc_name" and so on, which this simpler form will not match):
And Overpass would transform the key regexp to match the ":languagecode" key suffix.
Overpass would still infer the regexp, and there is still no support for excluding elements that match the specified name. Effectively, the "!=" comparator is still not supported, but it could be supported using an internally generated difference of two sets, or a union of queries for each key value, depending on statistics:
When only one or a few selective keys match, using a union may be a lot faster than using a difference between two large sets; it will always be faster if the input set has only one key matching the key regexp, or even no key at all, in which case the output set will immediately be empty. The Overpass query engine can easily generate, for each input set, an index of all keys that are effectively used in it, possibly even with a counter for each one in order to evaluate some selectivity independently of their values.
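Until such a shorthand exists, the equality-to-regexp transformation proposed earlier in this post can be done on the client side. The following is only a sketch: the helper names and the example key regexp `^name(:.*)?$` are hypothetical, the escaped character set is the list of POSIX extended regular expression metacharacters, and it assumes that backslashes and double quotes must themselves be escaped inside an Overpass QL string.

```python
# Metacharacters of POSIX extended regular expressions outside brackets.
ERE_SPECIAL = set('.[\\()*+?{|^$')


def ere_escape(literal: str) -> str:
    """Escape a literal so that it matches itself in a POSIX ERE."""
    return ''.join('\\' + ch if ch in ERE_SPECIAL else ch for ch in literal)


def ql_quote(text: str) -> str:
    """Quote text as a double-quoted Overpass QL string (assumed escaping rules)."""
    return '"' + text.replace('\\', '\\\\').replace('"', '\\"') + '"'


def key_regex_value_equals(key_regex: str, value: str) -> str:
    """Build a [~"key regex"~"value regex"] filter that matches `value` exactly."""
    value_regex = '^' + ere_escape(value) + '$'
    return '[~' + ql_quote(key_regex) + '~' + ql_quote(value_regex) + ']'


# Hypothetical example: any name or name:<lang> key equal to "St. Mary".
print(key_regex_value_equals('^name(:.*)?$', 'St. Mary'))
```

The dot in the value is escaped once for the regexp and the resulting backslash is escaped again for the QL string, which is exactly the bookkeeping the proposed shorthand would spare the user.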
- Please read this GitHub ticket on regular expression support. Some of the points you're proposing are already mentioned there but got postponed. There's no more recent status on this; it simply hasn't been implemented due to other priorities, lack of time and/or resources, etc. I would suggest creating additional GitHub tickets for everything that is not covered in that ticket. To be honest, the :lang("en") looks like a very special case and is not very likely to be adopted at all. Just give it a try on GitHub. Mmd (talk) 18:14, 14 June 2015 (UTC)
- The lang() feature is well documented for use in XML and CSS, and is very convenient for localisation (this includes running Overpass QL to return data shown to users on the map, or when clicking on a marker to show details, without having a long list of less relevant languages for translations). It handles correct fallbacks using BCP 47 resolution rules without having to build complex regexps for them, and in case of multiple matches it uses the most specific one (it acts just like a max() aggregator, returning a single value that matches the maximum specificity of the selector). It also greatly reduces the volume of data returned and loaded over the network, to be parsed by the client in a more memory-limited environment such as 32-bit browsers or smartphones. It also saves storage resources on the server for the temporary results, which can be discarded much sooner because clients load the data faster. -- Verdy_p (talk) 12:57, 5 March 2017 (UTC)
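To make the fallback behaviour described above concrete, here is a client-side sketch of a simplified language lookup over an element's tags. It is not an Overpass feature: `localized_name` is a hypothetical helper, the truncation rule is a simplified form of the BCP 47 / RFC 4647 "lookup" scheme (it compares subtags case-sensitively and ignores the special handling of single-letter subtags), and the sample tags are made up.

```python
from typing import Optional


def localized_name(tags: dict, lang: str) -> Optional[str]:
    """Return the most specific name:<lang> value, falling back to plain name."""
    subtags = lang.split('-')
    while subtags:
        # Try the most specific key first, e.g. name:de-CH, then name:de.
        key = 'name:' + '-'.join(subtags)
        if key in tags:
            return tags[key]
        subtags.pop()
    # No localized variant matched: fall back to the untagged name, if any.
    return tags.get('name')


# Made-up example tags for illustration.
tags = {'name': 'Genève', 'name:de': 'Genf', 'name:en': 'Geneva'}
print(localized_name(tags, 'de-CH'))
print(localized_name(tags, 'it'))
```

Only one value per element is returned, which is the "maximum specificity" selection the comment above asks for.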
Broken discussion link
There is a link to http://permalink.gmane.org/gmane.comp.gis.openstreetmap.overpass/237 about memory usage. Unfortunately, the link (Gmane) is broken. Does anybody know another place to read it? --naoliv (talk) 10:50, 31 January 2017 (UTC)
- http://gis.19327.n8.nabble.com/Overpass-API-Development-f5839267.html, http://listes.openstreetmap.fr/wws/arc/overpass Mmd (talk) 12:51, 31 January 2017 (UTC)
- Unfortunately, the links above point to the full archive. As a result, you don't know what to do if you experience error messages like "runtime error: Query run out of memory using about 2048 MB of RAM." If there is a link to the specific thread, I would gladly include a summary here. Zstadler (talk) 06:19, 24 March 2017 (UTC)
- Rephrased for clarity Zstadler (talk) 09:53, 16 October 2017 (UTC)
Valid hours, minutes, and seconds in date() and is_date()
The Date Check and Normalizer section says:
- The hour, if present, must be less or equal to 60,
- The minutes and seconds, if present, must be less or equal to 60.
Can someone confirm that this should be changed to:
- The hour, if present, must be less than 24.
- The minutes and seconds, if present, must be less than 60.
- Actually the seconds may be equal to 60 (or even 61, very rarely) when these are "leap seconds", which may be added in the last minute of the last day of June or December in the UTC calendar, and consequently in all timezones aligned with a static offset from UTC, including daylight saving time. (It is also possible that these same days may exceptionally lack second number 59, or even 58, because one or two seconds may be subtracted on these dates too.) This is done to maintain the effective average duration of days, so that the observed zenith of the sun remains within at most 1 or 2 seconds of midday for the next 6 months.
- As the rotation of the Earth is not regular, and its rotation period is also increasing (very slowly), this is unavoidable: it occurs because the official duration of the second is no longer defined as a constant 1/86400 fraction of the observed solar day on Earth, but as a constant number of oscillations in coordinated reference atomic clocks. For physics it is now extremely important to have very precise measurements of the second, independently of the gross irregularities of the Earth's rotation.
- After some major earthquakes, there is a sudden variation in the duration of the day and slight changes in the position of the rotation axis. Other bodies (notably the tidal effect of the Moon and major eruptions from the Sun) also affect the Earth's rotation, as do passages of the Earth through clouds of dust or meteorite impacts (notably in August). This also results in slight changes in the duration of the day and a deceleration of the Earth's rotation (so days gradually become longer over time, but after some natural events the rotation may accelerate again and we could have shorter days).
- So a UTC time "23:59:60.00" is valid on 30 June and 31 December and is one second (sometimes 2 seconds!) before "00:00:00.00" on the next Gregorian day...
- If necessary, the UTC standards body may need to add or subtract leap seconds more often (e.g. also at the end of March or September), but given the current margins (1 or 2 seconds of difference between the UTC clock and the observed solar clock), this should be extremely exceptional. These adjustments are announced several months before they occur (you cannot predict when these adjustments will be made). -- Verdy_p (talk) 21:11, 16 October 2017 (UTC)
- I think the documentation exactly describes the implementation (year should be >= 1000, rather than > 1000). In case you wonder, here's the respective link to the implementation: https://github.com/drolbr/Overpass-API/blob/350cf560956c669171b949b162f239b55936e1a8/src/overpass_api/statements/unary_operators.cc#L283 Mmd (talk) 09:41, 17 October 2017 (UTC)
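For comparison, the stricter bounds proposed in this thread can be written down as a small check. This is a sketch of the proposal only, not of what the linked Overpass implementation does: the function name `proposed_is_date`, the accepted input shape (`YYYY[-MM[-DD[Thh[:mm[:ss]][Z]]]]`), and the lower bounds of 1 for month and day are assumptions, and a seconds value of 60 is allowed to cover the leap second case raised above.

```python
import re

# Assumed input shape: YYYY[-MM[-DD[Thh[:mm[:ss]][Z]]]]
_DATE_RE = re.compile(
    r'^(\d{4})(?:-(\d{2})(?:-(\d{2})(?:T(\d{2})(?::(\d{2})(?::(\d{2}))?)?Z?)?)?)?$'
)


def proposed_is_date(text: str) -> bool:
    """Check a date string against the bounds proposed in this thread."""
    match = _DATE_RE.match(text)
    if match is None:
        return False
    year, month, day, hour, minute, second = (
        None if g is None else int(g) for g in match.groups()
    )
    if year < 1000:
        return False
    if month is not None and not 1 <= month <= 12:
        return False
    if day is not None and not 1 <= day <= 31:
        return False
    if hour is not None and hour >= 24:  # proposed: less than 24
        return False
    if minute is not None and minute >= 60:  # proposed: less than 60
        return False
    if second is not None and second > 60:  # 60 allowed for a leap second
        return False
    return True


print(proposed_is_date("2016-12-31T23:59:60Z"))
print(proposed_is_date("2012-06-02T24:00:00Z"))
```

Whether the documentation or the code should move towards these bounds is exactly the open question of this section.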
- @Zstadler: Unfortunately, some of your rephrasing introduced semantic changes and as such no longer correctly represents how a specific function operates. Also, I wouldn't recommend changing the names of some functions (e.g. Union vs. Unique), as this will create inconsistencies between the code and the documentation. I already started fixing them / reverting them to the original documentation. Please note that the part you were editing is automatically generated from the source code. BTW: It would have been much easier to work on the spelling first and make much smaller changes. Mmd (talk) 12:31, 17 October 2017 (UTC)
- While a few people read or write the code, many more people read the documentation and use Overpass QL. The use of "Union" for the u() aggregator is therefore misleading many more people. In the usual sense, a "union" contains many different items, while the u() aggregator produces a single result. Moreover, it returns the diagnostic text "< multiple values found >" when any of its arguments are different. I also doubt the code developers can be confused by the documentation... Zstadler (talk) 17:10, 17 October 2017 (UTC)
- I think it would make more sense to propose this change to Roland, and see if he's willing to change the terminology (=code + wiki), rather than just go ahead and do those changes in the wiki. It's not meant to be restrictive, but rather to avoid confusion in bug reports and conversations further down the road, as devs probably won't have an idea what "unique" refers to in the first place and then need to first dig through all the changes people made to the wiki. Mmd (talk) 17:38, 17 October 2017 (UTC)
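The behaviour of the u() aggregator that this naming discussion is about can be modelled in a few lines. This is a sketch based on the description above, not the Overpass source: the function name `u_aggregate` is hypothetical, and returning an empty string for an empty input is an assumption.

```python
MULTIPLE = "< multiple values found >"


def u_aggregate(values) -> str:
    """Model of u(): the single common value, or a diagnostic text."""
    distinct = set(values)
    if not distinct:
        return ""  # assumption: no values yields an empty string
    if len(distinct) == 1:
        return next(iter(distinct))
    # Any disagreement between the arguments yields the diagnostic text.
    return MULTIPLE


print(u_aggregate(["primary", "primary"]))
print(u_aggregate(["primary", "secondary"]))
```

The result is always one string, never a collection, which is the point made above about the name "union".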
Reverting and accepting changes
- Sure: It really only affects the part you were editing, namely chapters 8 + 9. All other chapters don't include automatically generated parts. As an example, please see https://github.com/drolbr/Overpass-API/blob/master/src/overpass_api/statements/aggregators.h . I don't know exactly which script is used by Roland, best is to ask him. Mmd (talk) 12:58, 17 October 2017 (UTC)
I'm not that strict on changes. I do understand what Mmd says and agree with the objectives. It is probably a less-than-optimal idea to change documentation without changing the code, because the code may have error messages relying on the notions in the documentation. Or the notion may have already been used elsewhere in the wiki.
However, I will not steamroll the content of this page (not even chapter 8 + 9). When there is time for the next release then I will do a diff on the content to the content copied from the source code and merge back to the source code the changes I have found.
In the given case, I suggest to
- leave a comment in the wiki: "Suggestion: rename the aggregator union to unique"
- write a hint in the clear text like "not to be confused with the statement union"
Thank you for the suggested change, it is by the way a good idea. -- drolbr
- I guess this means I've done something potentially problematic, because I've gone to trouble to make a lot of changes in the attempt to make this language guide more understandable, and didn't realize that apparently parts of it *are* automatically generated.
- I assumed I was in the clear because I read [] which says "Everything that doesn't change code. Examples are a better documentation." If automatically generated however, this isn't true, because the source of Overpass API would have to be changed.
- Anyway, I received a note just now about changing the actual terminology on this page, which is understandable given the depth to which I'm editing this. I realize that making these changes can be disruptive in the sense that I've read the Overpass API QL source code to some extent now, and yes, I realize that all "things" in the QL source tree that I would call a "command" are in the "statements" directory. Anyway, that said, the terminology changes I'm proposing are the following:
- A "statement" is a complete interpreted language structure that begins with one or more commands, and ends with a semicolon. This was already established when I started editing. That said...
- "Standalone query" and "statement" are also used interchangeably to refer to what I would call individual commands to execute inside a statement. IMO a "command" and a "statement" are not the same thing, and mean two different things. Whether or not the source code itself refers to these things as one term or another, a new user to Overpass QL will likely expect and feel most comfortable using terminology they've already seen in other languages.
- Although I hadn't changed all of them I might, I'd like to give an example of the tag query filter, which is referred to as "has-kv". As far as I can tell, has-kv will never appear in Overpass QL interpreted code. It's a reference to what's inside the source code of Overpass API. Similarly, another example I found in here is 'print' when referring to 'out'. (Why does this matter?) A language guide for Overpass QL should not make a user care what functions exist, or what those functions are named, inside the source code that interprets the language, because it can be difficult enough to pick up the language itself to use it - thus the existence of the language guide.
- The thrust of all of this is, I guess what I've been trying to do here is make this Overpass QL language specification into a proper language guide (which is what other parts of the Wiki call it); a document that serves to instruct a new user to the language on its syntax and functions, using industry standard terms for its pieces.
- I've been asked to revert changes that convert the phrases "standalone query" and "statements" when referring to commands, to "commands", and I'll do so right now; that would be my most recent change.
- That being said...there is one phrase used in interpreted languages to refer to a key word used to execute a task inside a statement, and that's a "command". I don't know of another one. In this case, "standalone query" is used as a synonym, and with new (commands) such as 'make' and 'convert' appearing, it seems that these aren't going to be "just queries" anymore in any case. I used the word "command" because as far as I know, it's the only right word to use. A statement is simply something else.
- Your edits are welcome, not problematic. Automatic does not mean that I replace content without warning. The precise process is as follows: Before each release with a new version number (about once to twice per year) I diff the content of this page to the content that would result from evaluating the source code. This results in one large merge from wiki to the source code, like this one (look for the file print.h as an example), followed by one merge back from source code to the wiki.
- The rationale is that this page is the specification for what the code should do. It does make sense to have the specification alongside the code that is expected to fulfill that specification. Thus, I welcome better wording here. What I do not want is a reorganization of the page. The text should define expected output and accepted input statement-per-statement. --Preceding unsigned comment added by Roland.olbricht (talk • contribs) 05:03, 9 July 2018
- (Moved the above only to keep wiki discussion threading consistent) I think I understand a little where you are coming from about page reorganization now. I have done some page section reorganization at this point, as you've probably noticed, but the reasons I've done it are primarily for the benefit of a new user to the language. What my goal has been is to introduce concepts beginning with 'what kind of language it is and what the basic structure is' then 'its pieces, in the order in which a user of Overpass QL is likely to encounter it and want to try it themselves.' That's why the first thing I started with is sets: 'what is this funny .notation everywhere?'. Then global settings, that come at the beginning of code anyway and modify all default settings that follow; then the simplest ideas like unions and probably a query, but I hadn't gotten that far yet. If left on my own, I would probably also add an annotated 'hello world' program early.
- The idea is that I'm trying to let the document serve both the purpose of being a technical reference, and an instructional manual. Probably a few additional readers would want to chime in to say whether I'm succeeding in the change I've made so far...?
- For the wording issue around command: It does make sense to have a term subsuming block statement, statement, criterion, and evaluator; these are the building blocks. Feel free to refer to them as command. However, I do not agree that command is the one and only notion that may fit. It may be preferable to keep the notion reserved for other purposes. It simply evolved to be statement for historic reasons, and there has been no pressure to change it so far.
- I'm highly opposed to having homonyms. Most problematic right now is query: it is both the query statement and in use for a full script. That said, I wonder whether we should not save a very generic notion like command for later use. Given that subsuming the building blocks is not of high priority, I would like to use a less generic notion for it. I suggest building block, but I am open to other notions. I would prefer if the concept of setting would fit under that umbrella as well.
- On the subject of being beginner-friendly: this is a larger subject, and I would like to have it discussed in a separate section. There have been multiple attempts, like LearnOSM, examples, talks, the blog, and so on, to be beginner-friendly. None of them got cheered on, so there is definitely room for improvement. But this specification has a very specific purpose: defining what the software is supposed to do. It should not be unnecessarily beginner-unfriendly, but I do want to stay focused on that purpose for this document. --drolbr
- If it's okay, since most of the first section is edited already, I'll try to add a kind of simple 'hello world' like program at the beginning, just to get a reader acquainted, but then point them at other pages like the 'QL by example' one if they want more than that.
- Now that https://github.com/openstreetmap/operations/issues/223 is fixed, this is fully editable in Visual mode, which is a big help. Skybunny (talk) 12:34, 25 July 2018 (UTC)
Regular expression dialect
- Currently, POSIX extended regular expressions are being used by default, see https://www.regular-expressions.info/posix.html Mmd (talk) 12:15, 20 October 2017 (UTC)
- Thanks! It seems like POSIX extended regular expressions would not allow the use of Unicode hex values instead of the characters themselves... Zstadler (talk) 14:03, 20 October 2017 (UTC)