User:Stephankn/knowledgebase

From OpenStreetMap Wiki
Jump to navigation Jump to search

Some useful scripts and snippets others might consider useful as well

apply partial diff

I use the following to apply minutely diffs to an extract.

osmosis -q --rri --bc --simc --bc --write-xml-change "-" | osm2pgsql -s -a -b "97.33,5.6,105.66,20.47" -U osm -d osm -e 15 -o expire.list -

grouping/clustering PostgreSQL query results

I had the problem to display POI icons on a map where there are too many POIs in a place to place them without overlapping. So I needed a clustering strategy.

As the database was running on a different machine than the web server I looked for a way to do the clustering directly on the database server. I hoped that PostgreSQL would contain already some functions helping me to cluster it. Was either not the case or I did not find them.

Due to the high amount of data I still wanted to do the clustering on the server side. So I came up with my own way of clustering.

A basic requirement for the clustering was to know the number of elements grouped together to be able to give this as a feedback to the user.

Strategy 0: random placement

No clustering is done at all but a ramdom ordering/limiting statement appended to the query. This distributes the icons evenly on the map.

SELECT st_astext(way) FROM planet_point ORDER BY random() LIMIT 500

Good: Very fast
Bad: Icons do overlap, no clustering information available, no deterministic placement of icons

Strategy 1: grid layout

As PostgreSQL is able to do grouping based on string the simplest strategy was to round the coordinate of each element to the nearest node on a grid.
Then using the GROUP BY clause to generate the groups and counting the elements in each group.

To group by a 500 unit cluster it would be this query:

SELECT st_astext(st_makepoint(round(st_x(way)/500)*500,round(st_y(way)/500)*500)) as cluster FROM planet_point GROUP BY cluster

As the width of the grid relates to the zoom level it has to be adjusted based on the current map resolution.

Good: The icons do never overlap as long as the grid is wide enough
Bad: It is visually very obvious that the icons are grouped to a grid.

Strategy 2: modified grid

Drawing explaining clustering problem

To compensate the bad visual experience of strategy 1 I tried various ways to improve it. To get rid of the grid-look I use the st_centroid() of the elements.

This introduces the chance of overlapping icons. The diagram to the right shows the reason:
Some features get rounded to different nodes on the grid. So they do not seem to overlap. But by using its position (the centroid of a single node is its position) to place the icon they result in being close to each other.
Depending on their distance they will overlap more or less. Also using floor instead of round will not solve the problem as this is related to the placement of the icon and not to the clustering.

A possible way out would be for example to use as the icon position the middle between the grid-node position and the calculated centroid. As long as the grid is at least 50% larger than the icon this will eliminate overlap by still preserving a position of the icons as close to the real position as possible.

SELECT st_astext(st_makepoint((st_x(point)+st_x(cluster))/2, (st_y(point)+st_y(cluster))/2)) FROM
  SELECT st_centroid(st_collect(way)) AS point, st_astext(st_makepoint(round(st_x(way)/500)*500,round(st_y(way)/500)*500)) as cluster FROM planet_point GROUP BY cluster) as t

For non-square grid the algorithm needs to be adjusted. In case the icons for clustered POIs are in a different size than regular icons the grid needs to be adjusted based on the largest icon possible. In mixed situations when there are clustered an non-clustered elements visible the same time the map coverage might look "thin".

Strategy 3: clustering in PHP

Do server-side clustering. As PostgreSQL did not cluster well enough the server will do clustering inside a script. It does so by first checking whether the bounding box of a icon would result in an overlap with existing icons.

  • If yes it adds the new icon as a cluster to the existing one.
  • If not it will be stored and the bounding box remembered.

The good thing is that icons can get as close together as their size permits. The position of the first icon is exact, the following icons in a cluster get rounded to that position.
Bad thing is that the processing happens on the web server. So the data needs to be transferred from the database to the web server.

Maybe it's possible to implement this clustering mechanism as a pgsql function then the transfer would not be needed. For the data I was using this was not needed.

Demonstration

I used strategy 2 for the smoking map on Wikimedia toolserver. For the cuisine map the strategy 3 is used. Source code is available and linked on the index page of my toolserver account.

Counting the different values of a tag

Do you want to know how many different values for cuisine exist? Easy with HSTORE. It's poor mans tagwatch:

SELECT count(*) as c, cuisine FROM (
  SELECT lower(tags->'cuisine') as cuisine FROM planet_point WHERE tags ? 'cuisine'
  UNION ALL
  SELECT lower(tags->'cuisine') as cuisine FROM planet_polygon WHERE tags ? 'cuisine'
) as t GROUP BY cuisine ORDER BY c DESC;

Cleanup of ways outside the bounding box

Importing data using a bbox with osm2pgsql leaves ways and relations in the tables needed for the slim mode. I thought about adjusting the code of osm2pgsql to include a check. Until done with the following query the ways and relations get cleaned up. Ways near the border of the bbox might not represent the actual DB as there is the possibility to miss some changes. Cleaning up relations does not distinguish between ways and nodes, so some relations left.

DELETE FROM planet_osm_ways AS w WHERE 0 = (SELECT count(1) FROM planet_osm_nodes AS n WHERE n.id = ANY(w.nodes));

DELETE FROM planet_osm_rels AS r WHERE 
  0=(SELECT count(1) FROM planet_osm_nodes AS n WHERE n.id = ANY(r.parts)) 
AND
  0=(SELECT count(1) FROM planet_osm_ways AS w WHERE w.id = ANY(r.parts));
REINDEX TABLE planet_osm_ways;
REINDEX TABLE planet_osm_rels;
VACUUM FULL;

REINDEX and VACUUM FULL will finally reduce the database size.

calculating metatile path

#define METATILE (8)

#include <stdio.h>

int xyz_to_meta(int x, int y, int z)
{
    unsigned char i, hash[5], offset, mask;

    // Each meta tile winds up in its own file, with several in each leaf directory
    // the .meta tile name is beasd on the sub-tile at (0,0)
    mask = METATILE - 1;
    offset = (x & mask) * METATILE + (y & mask);
    x &= ~mask;
    y &= ~mask;

    for (i=0; i<5; i++) {
        hash[i] = ((x & 0x0f) << 4) | (y & 0x0f);
        x >>= 4;
        y >>= 4;
    }
    printf("%d/%u/%u/%u/%u/%u.meta", z, hash[4], hash[3], hash[2], hash[1], hash[0]);
    return offset;
}

main () {
  xyz_to_meta(203181, 117122, 18); 
}

Bilingual map rendering

Bilingual names rendered for Khon Kaen, Thailand

I modified the rendering to render two languages at the same time. It looks similar to the way that's common with other maps like google. You can get some sources from my download space: http://downloads.osm-tools.org/bilingual/

A nice example of the rendering is Khon Kaen shown to the right. A up-to-date map of Thailand featuring Bilingual/English or Thai labels is available on my server: http://thaimap.osm-tools.org/