Talk:Proposed features/Date namespace


Range separator

For ranges, the wiki says:
When only ranges of years are specified (no month or other details) a single hyphen may be used where the standard expects a double hyphen.

But ISO 8601 says:
Of these, the first three require two values separated by an interval designator which is usually a solidus or forward slash "/".

--Pyrog (talk) 12:08, 20 December 2014 (UTC)

ISO 8601 allows either "/" or "--" (according to Wikipedia). Prevalent usage in OpenStreetMap is something like "name:1994-2001".
Hence the current "compromise". Because the range is part of the key name, I think "/" in date ranges would be awkward, and as few special characters as possible should be used. In particular, we should resist something like the syntax mess of key:start_date.
  • single hyphen for "year-year" style ranges (because of our own backward compatibility)
  • double hyphen for "yyyy-mm-dd--yyyy-mm-dd" style ranges
RicoZ (talk) 12:45, 20 December 2014 (UTC)
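
A minimal sketch of how a data consumer might recognise both separators described above. The regular expressions and the function name split_dated_key are assumptions made for illustration; they are not part of the proposal.

```python
import re

# Assumption: a date is yyyy, yyyy-mm or yyyy-mm-dd.
DATE = r"\d{4}(?:-\d{2}(?:-\d{2})?)?"
# Single hyphen: only allowed between plain years (either side may be open).
YEAR_RANGE = re.compile(r"^(?P<key>.+):(?P<start>\d{4})?-(?P<end>\d{4})?$")
# Double hyphen: allowed between dates of any precision.
FULL_RANGE = re.compile(rf"^(?P<key>.+):(?P<start>{DATE})?--(?P<end>{DATE})?$")


def split_dated_key(key):
    """Return (base_key, start, end), or None if there is no date range suffix."""
    for pattern in (FULL_RANGE, YEAR_RANGE):
        m = pattern.match(key)
        # Require at least one of the two dates, so "name:-" is rejected.
        if m and (m.group("start") or m.group("end")):
            return m.group("key"), m.group("start"), m.group("end")
    return None


print(split_dated_key("name:1994-2001"))         # ('name', '1994', '2001')
print(split_dated_key("name:1965--1971-12-18"))  # ('name', '1965', '1971-12-18')
print(split_dated_key("name:de"))                # None
```

Language suffixes such as name:de fall through both patterns, so they are not mistaken for date ranges.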

Disadvantages and interpretation

Copying the section that needs clarification below so it can be discussed - the article page is not meant for discussion.

  • it is impossible to link multiple properties with same period in time without losing consistency
name:1800-1900 = School 1
amenity:1800-1900 = school
Now your software needs to check whether the date ranges are the same. What should it do if the ranges are different?
  1. name:1800-1900 (1a) = School 1 (2a) + amenity:1800-1900 (3a) = school (4a)
  2. name:1799-1900 (1b) = School 1 (2b) + amenity:1800-1900 (3b) = school (4b)
  3. name:1799-1899 (1c) = School 1 (2c) + amenity:1800-1900 (3c) = school (4c)
  4. name:1796-1850 (1d) = School 1 (2d) + amenity:1800-1900 (3d) = school (4d)
How should data consumers use this data?
Should they treat the keys (1) as different values?
Which values in (1) are the same? Some? None?
How should the values (3) be used with respect to (1)? Somehow? Not at all?
What should software conclude if the name time range (1) lies outside the main tag time range (3)? Does the name (2) belong to some other feature for the period (1), or is it undefined, or was the object unnamed?
If the example above is not clear to you, try to repeat your answers for data with a granularity of a day instead of a year. The answers will be less obvious. Please define the rules in the proposal or somewhere on the wiki.
  • this namespace schema is only about metadata. It is impossible to say what the geometry of the object was at moment X. OSM history shows the history in OSM, not the history of the real object.
name:1796-1850 = School will tell you only about name, not type of object or position/geometry over time
amenity:1800-1900 = school will tell you class of the object, not position/geometry over time

Can you please expand the example with verbose values instead of "(1a)" or "(3d)" and clearly separate keys/values from the example numbering? I mean, why would you tag "amenity:1800-1900 (3a) = school (4a)"? It is "amenity:1800-1900 = school".

  • as of linking (logically associating) multiple tags with periods of time/dateranges:
    • if the dates are exactly the same and on the same OSM object you may assume that they are logically linked.
This is a wild guess. You cannot say anything even if the date ranges are equal. The granularity of the dates is always unknown to you.
ok, point taken. For many apps this won't matter - if you want a rendering what something looked like in January 1813 just apply all tags which are valid in January 1813. If you need to prove that some features existed over identical timespans you need something else. RicoZ (talk) 13:36, 23 January 2015 (UTC)
    • if mappers use 1799/1800 inconsistently - bad luck. The various properties of a single object do not always change at the same time so this flexibility may be good or bad.
You cannot say anything about the data at all. If there is name:1796-1850=ZYX, how can you say whether there was a name for it in 1795? How can you say whether it was unnamed? What does it mean if an object has amenity:1800-1900 = school but name:1796-1850? What should you do about the name during 1796-1800? What should you do about 1850-1900?
Lack of data or erroneous tagging is not an inherent problem of this data model. You may assume erroneous tagging and not display anything or incomplete data and render usable data for points in history where sufficient data does exist. RicoZ (talk) 13:36, 23 January 2015 (UTC)
  • geometry: it is possible to have different geometries over time. Draw the geometry of a "building:1929-1964=house" and another geometry tagged with "building:1965-=house", and in principle you can record every detail of the location as it changed over time. It is more difficult in the case that the geometry remains the same and only a part (like garage) of the building is added - the same overlapping geometries which are generally a problem in OSM. For objects with geometry changing over time I would prefer actually drawing overlapping geometries instead of any kind of multipolygon - because that makes it much easier to split/store/recombine objects between historic and main databases.
So? Where was it stated that you should use different geometries?
Might add it, though this is not a guide on historical mapping. RicoZ (talk) 13:36, 23 January 2015 (UTC)
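
The "just apply all tags which are valid at a given time" approach mentioned above could be sketched as follows. This is only an illustration under stated assumptions: it handles year-only ranges with a single hyphen, treats both ends as inclusive, and the function name tags_valid_in_year is made up for this example.

```python
import re

# Assumption: only "key:yyyy-yyyy" style suffixes, either side may be open.
SUFFIX = re.compile(r"^(?P<key>.+):(?P<start>\d{4})?-(?P<end>\d{4})?$")


def tags_valid_in_year(tags, year):
    """Return {base_key: value} for every dated tag whose range covers `year`."""
    valid = {}
    for key, value in tags.items():
        m = SUFFIX.match(key)
        if not m:
            continue  # plain tag or some other suffix (e.g. a language code)
        start = int(m.group("start")) if m.group("start") else None
        end = int(m.group("end")) if m.group("end") else None
        if start is None and end is None:
            continue
        # Assumption: both ends are inclusive at year granularity.
        if (start is None or start <= year) and (end is None or year <= end):
            valid[m.group("key")] = value
    return valid


tags = {"name:1796-1850": "School 1", "amenity:1800-1900": "school"}
print(tags_valid_in_year(tags, 1798))  # {'name': 'School 1'}
print(tags_valid_in_year(tags, 1825))  # {'name': 'School 1', 'amenity': 'school'}
print(tags_valid_in_year(tags, 1875))  # {'amenity': 'school'}
```

For 1798 the sketch yields a name without a feature class, and for 1875 a feature class without a name. Whether a consumer renders such partial results or discards them is exactly the open question raised in this section.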

One idea that crossed my mind was to use

  1. "tagname:period:daterange" or
  2. "tagname:period:period_identificator111"+"period:period_identificator111=daterange"

- instead of current "tagname:daterange". It would be much easier to search for ":period:" in tagnames and the use of period_identificators would make it easier to link features together - if indeed mappers would use it consistently. Not sure if I should push this - (1) is just syntactic sugar and (2) might be more appropriately modeled by relations. RicoZ (talk) 13:04, 22 January 2015 (UTC)

It should be clearly stated that, without an explicit indicator, the information that two time ranges are the same cannot be recovered from "equal" time ranges alone. Time is continuous; you can only specify it up to some precision. Xxzme (talk) 06:36, 23 January 2015 (UTC)
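
To illustrate the precision point: a date written with reduced precision can be read as an interval rather than an instant. The sketch below assumes yyyy, yyyy-mm and yyyy-mm-dd strings; the helper name to_interval is hypothetical.

```python
import calendar
from datetime import date


def to_interval(text):
    """Return the (earliest, latest) calendar day covered by a date string."""
    parts = [int(p) for p in text.split("-")]
    year = parts[0]
    if len(parts) == 1:  # yyyy: the whole year
        return date(year, 1, 1), date(year, 12, 31)
    month = parts[1]
    if len(parts) == 2:  # yyyy-mm: the whole month
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    day = date(year, month, parts[2])  # yyyy-mm-dd: a single day
    return day, day


print(to_interval("1800"))        # 1 January to 31 December 1800
print(to_interval("1971-12-18"))  # a single day
```

Two keys that both end in "1800" therefore only say that something changed at some point within the same year, not that the changes coincided.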

This is a proposal

This is a proposal and should be treated as such. I don't see a healthy discussion about implementation and tagging. The concept was added in 2009 as far as I can see. Perhaps getting a statistic (one try) would give a better overview. I propose to add a Template:Proposal_Page template.--Jojo4u (talk) 20:00, 4 August 2015 (UTC)

I tried to get an estimate of usage by doing an Overpass query and post-processing the results; there seem to be on the order of a few hundred uses - I do not recall the details anymore. It is quite hard to get more precise results because often multiple tags with several date namespace suffixes apply to one object. I also did not find any really good way to do the query.
I don't think adding a proposal page at this point makes much sense; it will never be used widely unless historical mapping adopts it for their database, and then it is their stuff. OTOH some examples like Adolf Hitler Straße or Istanbul exist which are prominent enough and changed names often enough to warrant the existence and documentation of this. RicoZ (talk) 09:26, 5 August 2015 (UTC)
A good reason to move this to the proposal namespace (not add a separate proposal page) would be to prevent the impression that this is somehow recommended tagging. --Tordanik 11:11, 14 August 2015 (UTC)
There is so much else purely tentative content in the "main namespace" that has never been approved or recommended that overcrowding the proposal namespace with things which aren't even proposed may be worse. RicoZ (talk) 15:38, 17 August 2015 (UTC)
This should not hinder the movement of this page. We can hunt them down one by one.--Jojo4u (talk) 23:18, 22 May 2016 (UTC)
done--Jojo4u (talk) 23:23, 22 May 2016 (UTC)

Detection/Syntax

There should be no key for which statistics via Overpass or via taginfo are not possible. Perhaps it would be best to be more verbose, e.g. name:date(1965--1971-12-18) = school or name:date[1965--1971-12-18] = school. Taginfo has some key character statistics to play with: http://taginfo.openstreetmap.org/reports/characters_in_keys#rest --Jojo4u (talk) 13:41, 22 October 2016 (UTC)

The []-brackets are also used by traffic_sign=* in the value: Where the traffic sign requires a value, you can supply it after the ID using brackets [value]. The value may contain a dot . as decimal separator and a minus - for negative values.--Jojo4u (talk) 13:41, 22 October 2016 (UTC)

I like the idea to make it easier to parse/regex, noticed that similar ideas are already in use like name:his:1965--1971-12-18 = school and similar ones. But then.. the currently described syntax was also in use long before I wrote this page so I am not sure what the best way forward is. RicoZ (talk) 21:11, 9 January 2017 (UTC)
The age of a scheme cannot be more important than its consistency. We already have several examples of getting rid of technically wrong schemes. --BushmanK (talk) 22:42, 9 January 2017 (UTC)
Not sure about the brackets, having them in a value might be a huge difference from having them in the key name and having special regex chars in this place might cause a lot of damage to older software instead of helping? Haven't ever seen anything similar except one example in http://taginfo.openstreetmap.org/reports/characters_in_keys#problem. RicoZ (talk) 20:57, 14 January 2017 (UTC)

Historical routes

Would the mapping of historical (train) routes on way 6676219 (based on existing and visible signs) be correct?

This is not a namespace.

This kind of syntax resembles a namespace (XML namespace) because it uses a colon as a delimiter; however, it is just a variable suffix. A namespace is not just a suffix - it is a reference to a group of entities (in OSM, keys). For example, the language suffix for name keys is a namespace, because it refers to a group of keys with values in a specific language. Here, by contrast, the date range suffix is itself a variable value. So it would be better to keep the description technically correct and to stop referring to this syntax as one that uses the namespace concept. --BushmanK (talk) 17:29, 17 December 2016 (UTC)

It is awful from the point of view of querying and data processing

I'm perfectly aware of the "any tags you like" core principle, but there is a requirement for these tags to be usable. I also know that this scheme is already supported by some services (townlands.ie), but I wonder how much technically redundant preprocessing they have to do to make it usable. Here is what I'm talking about. Normally, every OSM key represents a specific property, for example one of the names. The value is self-explanatory: it contains the value of that property. In this case, however, both the key and the value contain a value. Existing applications should indeed not be confused by it, but only because they simply ignore the unknown tag. Many of them are designed to use keys only as a source of information about which property a particular tag represents. Keys are usually seen as members of a finite set. This scheme turns that set into an infinite one and forces applications to do what they haven't done before: parse and analyze keys instead of just comparing them with a known set. Another problem is that it is impossible to query for every key with a date range without using regular expressions. That dramatically affects performance and increases the computing power required compared to directly queryable keys. I mean, to process a single tag with a date range, we have to do (roughly) the following:

  1. query all keys ending with the pattern "colon, none or (less than five digits or ISO 8601 date), one or two dashes, none or (less than five digits or ISO 8601 date)"
  2. for each key, extract its name and store it in a variable; extract the start and end dates from the string and store each in its own numeric variable; extract the tag value
  3. cross-reference the preprocessed keys with a set of known ones
  4. deal with the date ranges (for example, query them to find the most recent values of keys)
  5. finally, use keys and values as usual

Looks complicated, doesn't it? And what happens if someone uses the date range suffix for everything? Imagine an object with name:1953--=Springfield, name:1800--1953=Central town and without name=Springfield. It is completely correct from the point of view of this scheme, but to put "Springfield" on a map, we have to do all the processing above. Who is talking about backward compatibility here?
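
For reference, the five steps above can be sketched in Python roughly as follows, using the Springfield example. This is a hedged illustration: the regular expression, the function name most_recent_values, and the rule "the range with the latest start date wins" are assumptions, and dates are compared as plain strings.

```python
import re

# Assumption: a date is yyyy, yyyy-mm or yyyy-mm-dd.
DATE = r"\d{4}(?:-\d{2}(?:-\d{2})?)?"
DATED_KEY = re.compile(rf"^(?P<key>.+):(?P<start>{DATE})?--?(?P<end>{DATE})?$")


def most_recent_values(tags, known_keys):
    """Return {base_key: value}, taking the range with the latest start date."""
    latest = {}  # base key -> (start date string, value)
    for key, value in tags.items():
        m = DATED_KEY.match(key)                           # step 1: pattern match
        if not m:
            continue
        base, start, end = m.group("key", "start", "end")  # step 2: extract parts
        if start is None and end is None:
            continue
        if base not in known_keys:                         # step 3: known keys only
            continue
        start = start or ""                                # open start sorts first
        if base not in latest or start > latest[base][0]:  # step 4: most recent
            latest[base] = (start, value)
    return {base: value for base, (_, value) in latest.items()}  # step 5


tags = {"name:1953--": "Springfield", "name:1800--1953": "Central town"}
print(most_recent_values(tags, {"name"}))  # {'name': 'Springfield'}
```

The sketch comes to roughly twenty lines, which gives both sides of this discussion a concrete size to argue about.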

Actually, the only advantage of this scheme is that anyone can just easily type this manually.

To avoid accusations of not giving an alternative, I could say that a relation for the date range could be close to perfect in terms of querying and usage. For example:

type=daterange
for_key=name
start_date=1800
end_date=1953
value=Central town

An object with any properties (keys) affected by a date range can be a member of any number of type=daterange relations. for_key= refers to the key affected by this range. value represents the value of the tag that is effective for the indicated time period. The date tags are self-explanatory. Other extensions can be provided, such as date keys in strftime format.

  • With this syntax, affected keys are not modified.
  • It is possible to ignore date range relations by not querying them.
  • It is possible to filter them within a query (comparing numeric date values with something is a far less expensive operation than evaluating regular expressions).
  • This scheme is extendable.
  • Yes, relations are a bit more complicated to edit, but that shouldn't be a problem for people smart enough to work with historical data, right?
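
A minimal sketch of how a consumer might evaluate such relations, assuming the relation tags are already available as plain dictionaries and that start_date and end_date hold years only, as in the example above. The function name values_at and the inclusive ends are assumptions.

```python
def values_at(relations, key, year):
    """Return the values of `key` that are effective in `year`.

    `relations` is a list of tag dictionaries, one per type=daterange
    relation that the object is a member of (hypothetical data layout).
    """
    found = []
    for tags in relations:
        if tags.get("type") != "daterange" or tags.get("for_key") != key:
            continue
        start = int(tags["start_date"]) if "start_date" in tags else None
        end = int(tags["end_date"]) if "end_date" in tags else None
        # Plain numeric comparison; both ends are treated as inclusive.
        if (start is None or start <= year) and (end is None or year <= end):
            found.append(tags.get("value"))
    return found


relations = [
    {"type": "daterange", "for_key": "name",
     "start_date": "1800", "end_date": "1953", "value": "Central town"},
    {"type": "daterange", "for_key": "name",
     "start_date": "1953", "value": "Springfield"},
]
print(values_at(relations, "name", 1900))  # ['Central town']
print(values_at(relations, "name", 1953))  # ['Central town', 'Springfield']
```

The boundary year 1953 matches both relations, so the granularity question raised earlier on this page applies to this scheme as well.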

So, practically, the current proposal only makes working with the data harder. Anyone who wants to argue: start by writing an Overpass Turbo query that selects all names of all objects for a certain date. --BushmanK (talk) 18:29, 17 December 2016 (UTC)

I'm the developer of Townlands.ie, which implements this, and I really can't find these complaints persuasive. This system is straightforward and natural for humans to read (and machines to parse): "name-from-X-to-Y-was-Z". Your main complaint is "I can't use Overpass, and now there is a non-fixed set of tags".
(i) Despite what BushmanK says, it's not hard to implement, only 30 lines of python. The Key:opening_hours syntax cannot be parsed with a regex. If you're worried about "computationally expensive parsing", that ship sailed long ago. If you're consuming OSM data and don't want to parse these tags, then just ignore them! Just like if you don't care about multilingual names, you don't have to parse out the keys which vary by language, dialect, and writing system.
(ii) "Can't query with Overpass" So? You can't do routing with Overpass. Or check opening hours. Or do geocoding. Can you check a polygon for validity with Overpass? The standard for OSM isn't "can be queried with Overpass". This tag shouldn't be dismissed because it's not overpass-able.
(iii) "Let's use relations" Relations are not categories and neither are they "tags". You're suggesting creating one relation, with many tags, and having one object be a member of that relation. To replace a simple, human and machine readable, tag. That's a silly suggestion. Your suggestion cannot handle dates, or ambiguous dates, so you cannot 'just do a numeric comparison'.
(iv) You suggest relations should be usable, but you forget who will be interested in historic names of things: historians, librarians, genealogy researchers, humanities scholars. These people are usually less technical than, well, technical people. So yes, we should have something that's easy to use, and easy to type in manually.
Rorym (talk) 12:48, 20 December 2016 (UTC)
30 lines of code is about ten times more than normal. And that is just for names, while the idea of this proposal doesn't seem to be limited to them. So, if the date range suffix is expected to be applied to names only, parsing could be simplified in exactly the manner I mentioned in my diary entry on this topic: by using a qualifier, such as name:daterange_*=*. Even this would allow easier syntax handling. Your argument about routing is fallacious. Overpass API is a querying tool, and it is normally expected to be able to query OSM keys (not values, since we have the opening hours syntax) without any additional pre- or post-processing. Routing is not a querying problem, so this argument is irrelevant. I mentioned relations only as an example of a non-exceptional syntax. But since you prefer to pick on that, I'd like to mention that my example has nothing to do with relations as categories - it represents a complex property of a single object (so it cannot, by any means, be a category). Another fallacious argument. Calling a suggestion "silly" because I'm using a well-formed relation as an example is a typical attempt to dismiss an argument without any credible argumentation (which is demagoguery), because if you are right, then any other scheme that uses relations in OSM is also "silly". So, your arguments are either subjective or fallacious. I have nothing more to add. --BushmanK (talk) 20:16, 30 December 2016 (UTC)
"30 lines of code is about ten times more than normal." You're saying you can do it in 3 lines? Go on. Show me how to implement your proposed alternative format. I don't think you can add that functionality in <30 lines of code. Rorym (talk) 15:51, 10 January 2017 (UTC)
@BushmanK: if you're interested in using dates for overpass queries, you should probably join the discussion on the Overpass API dev ML fairly quickly, as your specific use case around this Date namespace thing doesn't seem to be covered yet, see http://gis.19327.n8.nabble.com/Date-ranges-with-Overpass-API-td5886864.html and http://listes.openstreetmap.fr/wws/info/overpass Mmd (talk) 12:37, 22 December 2016 (UTC)
@Mmd, I'm guessing, it's up to Overpass developers - to implement something for this particular case or to ignore it. My point here is that current form of the proposed syntax is exceptional, while fundamental incompatibility of Overpass API with it is just a practical example of how it could make working with this scheme harder for everyone who'd want to query this sort of data.--BushmanK (talk) 20:22, 30 December 2016 (UTC)
There are other ideas for a syntax that might be easier on machines, see also Talk:Proposed_features/Date_namespace#Detection.2FSyntax. But nobody is forced to parse this - every object with a name should have a plain "name" tag, while name:daterange is optional and should be used in addition to the plain name, even if it means the current name is written twice. RicoZ (talk) 21:17, 9 January 2017 (UTC)
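
The fallback rule described above lends itself to a simple consistency check. The following sketch flags base keys that have an open-ended date range but no plain tag. The pattern and the function name missing_fallbacks are assumptions for illustration.

```python
import re

# Assumption: a date is yyyy, yyyy-mm or yyyy-mm-dd.
DATE = r"\d{4}(?:-\d{2}(?:-\d{2})?)?"
# Open-ended range: a start date followed by one or two hyphens and nothing else.
OPEN_ENDED = re.compile(rf"^(?P<key>.+):{DATE}--?$")


def missing_fallbacks(tags):
    """Return base keys that have "key:start-" but lack the plain "key" tag."""
    missing = set()
    for key in tags:
        m = OPEN_ENDED.match(key)
        if m and m.group("key") not in tags:
            missing.add(m.group("key"))
    return sorted(missing)


tags = {"name:1953--": "Springfield", "name:1800--1953": "Central town"}
print(missing_fallbacks(tags))                              # ['name']
print(missing_fallbacks({**tags, "name": "Springfield"}))   # []
```

An editor or validator could run a check like this so that applications which ignore date ranges still see the current value.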
The thing is, limiting date range usage to names only doesn't make sense in terms of data architecture. --BushmanK (talk) 22:39, 9 January 2017 (UTC)
"name" was just an example here; the intended meaning was that if "tagXXX:date-" (meaning until now) is used, then "tagXXX" should be provided as well, as a fallback for those apps that don't evaluate the date range. Regardless of the date range syntax we might agree upon, this kind of fallback will still be necessary. I will add this to the description. RicoZ (talk) 20:28, 14 January 2017 (UTC)