Talk:OSM Mobile Binary Format

Endianess? --Skinkie 09:14, 8 August 2008 (UTC)

Expandable

The format should be expandable. So I suggest a TIFF-like format. That means, the binary data is divided into sections which roughtly have this format:

  id
  length
  data

id could be something like the 4-character tiff id, or an unsigned word, or whatever. It doesn't really matter. Length is needed so that an application that doesn't understand (or care) for this data block can skip it. And data is the actual data.

The id would stand for different things, e.g.

point data
way data
area data
POI data
restriction data

Indexes

For spatial data, we should think about using an index, e.g. Quad-Tree, Kd-Tree, or something similar.

Application Specific Structures

I would propose the following; pretty C specific format. It is based on the assumption mmaping is available for files, and the OS in efficient in doing so. In principle basic structures are defined:

struct {
  lat: integer;
  lon: integer;
  name: pointer; /* offset */
}

char array names;

In the above example all points can be serialized into one or two files and do very efficient lookups. There is completely no overhead. All information is required. Indices can be made over positional structures in order to look them up efficiently.

So you propose some [lat|lon|name-id]? I guess that is for POIs with names? The feature of a table of names referenced by their number would be indeed nice as names (or other strings) are repeated often. --MarcusWolschon 09:37, 8 August 2008 (UTC)

It would be even better if the name pointers are shared for *any* duplicate names, hence that would make it non-editable (duh!) and highly efficient. Name-id is maybe a bit vague, because technically you don't need an id, but for converting back and forth you will need to maintain information, thus an id. A pointer could be just an index to another array, and in other languages you would call it an offset :) It might be just \0 terminated, or as many databases do, prefix with the length of the field. --Skinkie 12:02, 8 August 2008 (UTC)

Protocol buffers?

Binary data storage/transmission format, with optional fields, parseable without needing to know every field, extensible without breaking old code, etc...

http://code.google.com/apis/protocolbuffers/docs/tutorials.html

I think this is too generic for us. --MarcusWolschon 12:34, 8 August 2008 (UTC)

I mean to define an OSM format with, it would still require discussing what format of data to put in it, then just helps with compacting the data structure and making it extensible. Ojw 17:40, 8 August 2008 (UTC)

We need not only compact the data but also index it while taking special care of topology (connected roads, nodes being included in shaped areas) and names. This can quickly become very non-generic but efficient. Since the basic data-primitives are known and never change I too would like to be as generic as possible with the name-value -pairs on ways and nodes. I would hgowever not like to be generic where we never anticipate change (e.g. allo to introduce new data-primitives next to ways, nodes and relations). --MarcusWolschon 06:24, 11 August 2008 (UTC)

Open Mobile Map Format?

I am working on binary data storage format for mobile device which I name it as OpenMobileMap format --vgps 17:38, 16 March 2009 (UTC)

http://www.openmobilemap.org/20090316/data-structure-design-for-vector-map

Why 8 bit for a dirIndicat? --MarcusWolschon 12:20, 16 March 2009 (UTC)

dirIndicator = 0 means this is two ways road, = 1 means this is one way road. I don't want to use boolean data type so I use 1 byte (8bit) for dirIndicator. I just updated the data structure to add more detail, you can take a look at the above link --vgps 09:24, 17 March 2009 (UTC)

You seem to support only one, fixed and pre-computed metric. I guess for a restricted mobile app a possible compromize to size. --MarcusWolschon 12:20, 16 March 2009 (UTC)

The data structure is special design for mobile device with limited screen size, memory and CPU therefore it much be in integer format and only support one metric.

I do not see any kind of index in your structure. Having variable sized records this means to scan throu maybe a few GB on a slow SD-card to load a single polyline. Without fixed size records you cannot skip so even ordering all your elements in Z-order or a similar room-filling-curve will not help. --MarcusWolschon 12:20, 16 March 2009 (UTC)

index to specific country/city/point/polyline/polygon/routing node object will be index/pointer of one-dimension array of country/city/point/polyline/polygon/routing node. So I leave it for developer to store this data structure in memory card and implement which way they want to retrieve from memory card. In the worst case, if developer do not know how to implement index, they can just write one object in one file. (Best case is to group number of objects into one file). --vgps 09:34, 17 March 2009 (UTC)

If you leave the index to be application-specific then multiple applications will need to generate their own index (thus you cannot just start the application with any map in your format given removing the advantage of a common format) and if the file is later modified by one application this will trash the index of all others. --MarcusWolschon 08:01, 17 March 2009 (UTC)

Required Tags

What tags do we need to import into our format? (please add)

way (routable)

highway
name(required, may be empty)

way (geographic feature)

landuse

way (administrative feature)

name
boundary

node

ele (optional)?

relation

everything for house-numbers

Coordinates

How to represent coordinates?

mapping lat-lon to (int32) -INT_MAX/2 - INT_MAX/2 ?
mapping lat-lon to (uint32) 0 - INT_MAX ?

I prefer the signed version as we do have negative degrees on this planet and thus it's easier to understand. --MarcusWolschon 10:12, 8 August 2008 (UTC)

For the longitude, we should use the "Daggett angle representation", where 360 degrees = INT_MAX. Because computers use 2's compliment aritmetic, the binary representations of -180 to 0 signed are the same as the representations of 180 to 360 unsigned. For example, 355 degrees has the same representation as -5 degrees, which is convenient because in longitude terms, they are the same place. If you just add the integers and ignore overflows, the right thing always happens. If you tell the complier that the ints are unsigned, I believe it will ignore the overflows, and do the right thing. If you tell the compiler the ints are signed, it's likely to raise overflow exceptions or errors.

A second convenient property of the Daggett angle representation for us is that the higher bits correspond to the x tile coordinate on the slippy map. If I were designing the format, I'd project the latitude with a simple spherical mercator projection, so that it also corresponded to the tiles in the slippy map, and I'd make the protocol fetch data in areas equivalent to the slippy map tiles. --Rjmunro 15:20, 18 August 2008 (UTC)

a) What integer-length do you suggest? 32bit-unsigned, 16bit-unsigned, 16bit-signed,...?

b) I don't think this works well once you try to use the coordinates in common calculations (distance, projection, great-circle-distance, ...) that need degrees or radiants.

--MarcusWolschon 11:09, 19 August 2008 (UTC)

a) 16 bit is useless - only gives 610m accuracy. 32 bit gives better than 1cm accuracy at the equator, more at higher lats. I've tried hard to explain that being singed or unsigned makes no difference. Unsigned has the advantage of usually do the right thing and not causing (or just swallowing) overflows. Signed has the advantage of being closer to what we use in normal life, but the binary is the same for both representations. E.g. with 8 bit, it doesn't matter weather 11000000 is 192 or -64 - they correspond to an angle of 270 degress and -90 degrees respectively, which is the same angle. (convert by multiplying by 360 and dividing by 2^8 i.e. 256)

b)It is a standard way to measure angles with integer arithmetic. There should be libraries that work with it. If you want to convert to floating point, just divide by (2^N)/360 for degrees, or (2^N/2pi) for radians.

--Rjmunro 13:40, 21 August 2008 (UTC)

Text-Search

How can we index names and maybe even adresses for efficient searching? --MarcusWolschon 10:32, 8 August 2008 (UTC)

The specification might require different file formats for storing node data und text data (like street and city names). Since the user will want to search for objects on the map based on text information, a tree-like structure for text data might be appropriate. --Nitegate

I remind that the names are also required for rendering. --MarcusWolschon 10:35, 8 August 2008 (UTC)

Yes, that's right, but storing text values together with node values will make them impossible to find. I assume that most cities have the same street names, so storing all street names in a separate file (tree-like format maybe) and assigning an ID to every street would solve the problem. In your node data you simply store the ID of the street name and not the text. --Nitegate

Splitting large maps

Split the map into pieces of one tenth of a degree (0.1) in both directions. Create new nodes for ways that cross tile borders. Store with each tile the top left latitude and longitude coordinates together with the file offset position of the tiles around this tile. This approach let’s you easily find your position on the map and allows easy panning and zooming. --Nitegate

I would prefer to split by tileid. This alignes much nicer with the pre-rendered tiles for programs that do not render themself. The tile-id=file-name already identifies the coordinates of it's corners and the content is small enough for a granularity. I suggest not to add new nodes but to have a reference to the tile with the next/prev node on tile-borders. This way you know that the way extends bejond the current tile and where to look. --MarcusWolschon 11:27, 8 August 2008 (UTC)

Splitting by slippy map tilenames seems as convenient as any other scheme. It's used in tile data server (not just for generating images, but for vector-data programs and routing programs too) Ojw 12:16, 8 August 2008 (UTC)

I think we have to forget the tiles that are used in the rendered map. Splitting the map based on latitudes and longitudes would make it easy to find a (binary-)tile in a large file. First you would jump from tile to tile in the latitude direction and then in the longitude direction (or vice versa). We have to assume that we will have to render the map ourselves on the mobile device and not use pre-rendered tiles. --Nitegate

I can live with any of these two. I agree that the usual case wil be local rendering but I also agree with Ojw; we use tiles in many other places to do this and it would thus be consistent. Both take nearly no time to compute. --MarcusWolschon 12:37, 8 August 2008 (UTC)