Talk:64-bit Identifiers


Osmosis

About Osmosis: I have no problems cutting files using polygons. Please be more specific about "filters", the osmosis version, ... wambacher 16:00, 6 February 2013

Did you try with big node ids? See this message: osmosis will fail when it encounters an id too big for int32. --Zverik (talk) 18:55, 6 February 2013 (UTC)
I did - like wambacher - try with Frederik's 64-bit test data set. Polygon filtering (--bp) went just fine, no problems there. Is the code you're pointing to still being used? Or does the size of an int (which that piece of code casts to) depend on the Java VM? --Oli-Wan (talk)
In Java the sizes of int (32-bit) and long (64-bit) are fixed. To get 64 bits, a long must be used. If the source uses int, then it's broken. See the JVM specification [1] --Stephankn (talk) 21:21, 6 February 2013 (UTC)
Thanks Stephankn; as a C++ user I normally expect integers to have varying widths on every platform...
I just searched the code for uses of the LongAsInt code pointed to by Zverik. In EntityBuilder.java and CommonEntityData.java it operates only on changeset IDs, which will not cause trouble in the near future. The code in BitSetIdTracker.java and ListIdTracker.java, however, seems problematic, as it contains various ints and also uses the LongAsInt code. Are those IdTracker implementations actually being used, perhaps only when certain operations are carried out? As said above, polygon filtering worked fine for me.
Having taken another look at the code, it appears to me that DynamicIdTracker is always used whenever an IdTracker is needed. If that impression is correct, then yes, the two other IdTrackers are broken (or soon will be), but osmosis will continue working as it does not use those trackers. --Oli-Wan (talk)
DynamicIdTracker is just a wrapper class for BitSetIdTracker and ListIdTracker. But it puzzles me that osmosis works on the test data. LongAsInt is a possible problem, and as long as it exists and is being used, we can't be sure osmosis will work fine with big ids, even if it passes some tests. --Zverik (talk) 05:13, 7 February 2013 (UTC)
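
(To illustrate the failure mode discussed above: a minimal sketch of a LongAsInt-style narrowing cast, not osmosis's actual code. Casting a 64-bit id above 2^31-1 to a Java int wraps around silently, so any int-based tracker corrupts big node ids.)

 public class NarrowingDemo {
     public static void main(String[] args) {
         long bigNodeId = 2_147_483_648L; // 2^31, one past Integer.MAX_VALUE
         int narrowed = (int) bigNodeId;  // silent wrap-around, no exception in Java
         System.out.println(bigNodeId);   // prints 2147483648
         System.out.println(narrowed);    // prints -2147483648: the id is corrupted
     }
 }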
I think the puzzle has now been solved by Brett. If I understand correctly, osmosis 0.41 is mostly 2^31-proof, the only exceptions being the options --used-node and --used-way. I suggest rewriting the remark on the wiki page correspondingly, e.g.

Osmosis: Recent versions (in particular, the current version 0.41) will work in most cases, with the following exception: the options --used-node or --used-way will, in general, break, unless the additional option idTracker=Dynamic is supplied. Alternatively, use the GitHub code as of October 26, 2012, or later (or version 0.42, when released), which is expected to be completely 2^31-proof (and uses the dynamic IdTracker by default).

--Oli-Wan (talk) 11:05, 7 February 2013 (UTC)

How about limits for relation members

We've seen some people importing relations with far more than 2^15 members (i.e. exceeding the positive 16-bit limit) and even 2^31 members (i.e. exceeding the positive 32-bit limit, for collecting DMT elevation nodes in Brazil). The latter import, in Brazil, caused major problems in rendering tools.

We need to document that the 16-bit limit (strongly suggested for modelling data) may be inappropriate, but we need to discuss how to handle these giant relations (in my opinion the count should still stay well below the 32-bit limit): the OSM API does not restrict such imports. However, I don't think it's in our interest to support them. 32-bit may be acceptable in an interim period (i.e. tools should be prepared to handle 32-bit counts without failing, even if in the end they choose to ignore these members, or keep only their relation members and way members).

The OSM API should now check that the 2^31-1 limit is not exceeded and reject the data as invalid, forcing users to remodel their collections into more maintainable objects. Moreover, all relations with more than 2^15-1 members are suspect and should be flagged in QA tools, even if they are valid.

There's a limit on the number of nodes per way (it must not exceed a signed 16-bit integer), but for relations we need better enforcement (at least of the 32-bit limit, and later, once the giant relations have been restructured, by no longer accepting relations with more than 2^15-1 members).
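
(A minimal sketch of the kind of server-side check proposed here; the constants and method names are illustrative, not actual OSM API code.)

 public class RelationLimits {
     // Hypothetical limits, following the proposal above; not real API constants.
     static final long SUSPECT_LIMIT = (1L << 15) - 1; // 2^15-1: suspect, flag in QA tools
     static final long HARD_LIMIT = (1L << 31) - 1;    // 2^31-1: reject as invalid

     static void checkMemberCount(long memberCount) {
         if (memberCount > HARD_LIMIT)
             throw new IllegalArgumentException("relation exceeds 2^31-1 members: invalid");
         if (memberCount > SUSPECT_LIMIT)
             System.err.println("warning: more than 2^15-1 members; flag for QA review");
     }

     public static void main(String[] args) {
         checkMemberCount(40_000); // prints the QA warning, but is still accepted
     }
 }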

A 64-bit integer may be used in implementations (if they feel this makes their code easier to write, notably on 64-bit native architectures in C/C++), but the extra bits will only be zeroes.

Note that giant relations with more than 2^15-1 members cannot come from normal editors, only from import tools, and such imports normally need discussion and approval: had that happened here, we would not have seen such failures blocking imports of minute diffs in various tools (including osmosis and osm2pgsql, for renderers or QA analysers).

Now we've got this data in Brazil: what should we do with it? Redact it to remove it from the database, and then force importers to prepare their datasets better by structuring them properly?

Limits on servers should be reported by the OSM API, enforced and monitored. If these limits are ever being approached, we should follow their progression and reasonably estimate when they will be reached, so that we can instruct others to prepare their software (this requires a margin of at least two years).

For now only node ids have reached the 32-bit limit, and we want to use 64 bits for them. Next it will be the turn of way ids. We're far from reaching this level for relation ids. But in all cases, we should not need more than 2^15-1 nodes per way, nor more than 2^15-1 members per relation. And no more than 2^15-1 objects per changeset.

Some limits also exist on the number of tags, and on the maximum lengths of tag keys, tag values, and other metadata (notably edit comments in changesets, which are probably too restricted).

The 32-bit limit on user ids will not be reached for a very long time. But this limit too should be reported by the API, via the "capabilities" query.
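
(For reference, the API already exposes some limits through that query; a minimal Java sketch of fetching it, assuming the standard /api/capabilities endpoint and leaving the XML parsing out.)

 import java.net.URI;
 import java.net.http.HttpClient;
 import java.net.http.HttpRequest;
 import java.net.http.HttpResponse;

 public class Capabilities {
     public static void main(String[] args) throws Exception {
         // The capabilities document is XML listing server limits,
         // e.g. the maximum number of nodes per way.
         HttpRequest req = HttpRequest.newBuilder(
                 URI.create("https://api.openstreetmap.org/api/capabilities")).build();
         HttpResponse<String> resp = HttpClient.newHttpClient()
                 .send(req, HttpResponse.BodyHandlers.ofString());
         System.out.println(resp.body());
     }
 }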

There may also be a limit on the number of versions: it's quite possible for a large object to reach 2^15-1 versions (caused by adding/removing members, notably in ways, moving nodes, splitting/merging ways, editing tags), so version numbers should be 32-bit. (If this level is ever reached, the object history will anyway be extremely hard to follow: it could be time to create a separate new object and leave the old one in limbo with all its many deleted versions.)

Other limits may exist on the number of changesets per user (we are already far above the 16-bit limit, and some "users", notably bots such as the redaction bot, may already have exceeded the 32-bit limit).

As soon as these limits are reported by the OSM API, we must state that all editors and import tools MUST respect them; sooner or later these limits will also be enforced by the OSM API on data submission (this will cause the API to return errors, and such errors must be documented so that tools know what to do when they occur).

I'm not sure that we need to keep a complete history of objects in the database beyond some age: old versions could be archived, and possibly all version numbers renumbered, looked up not by exact value but through a 16-bit truncation mask: we should not keep more than one version with the same 16-bit truncated version number, even if we do not renumber any version.
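
(A sketch of the truncation idea, purely hypothetical since no such scheme exists in the API: two version numbers collide under a 16-bit mask exactly when they differ only above bit 15.)

 public class VersionMask {
     // Keep only the low 16 bits of a version number (hypothetical archival scheme).
     static int truncate16(long version) {
         return (int) (version & 0xFFFFL);
     }

     public static void main(String[] args) {
         // Versions 3 and 65539 (= 3 + 2^16) collide under the mask,
         // so at most one of them would be kept in the archive.
         System.out.println(truncate16(3L));     // 3
         System.out.println(truncate16(65539L)); // 3
     }
 }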

We need a long-term plan on how to manage these limits: for me, 64-bit numbers are only useful for object ids (node, way, relation) and changeset ids, and possibly user ids. Everything else should remain within positive 16-bit integers, even if implementations internally use 64-bit integers for everything.

Verdy_p (talk) 22:57, 27 March 2017 (UTC)