Zenbu
Zenbu is a business listings website containing data for companies operating in New Zealand. The data on the site is user-generated and released under a license compatible with OSM (Creative Commons Attribution) [1]. As of 2008-01, the site has over 40,000 POIs.
This page is dedicated to facilitating the sharing of data between Zenbu and OSM (a two-way process).
Data Attribution
A method for attributing the data back to Zenbu needs to be developed.
The license the Zenbu data is released under states that "You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work)."
A method of attribution must therefore be agreed which the operators of the Zenbu website are happy with.
Possible methods include:
- Osmarender will print the name of the person who last edited a given way/node next to it at high zoom levels. This could display 'zenbu.co.nz' for all data originating from the site.
  - Drawback: editing the node in any way will remove the attribution and replace it with the name of the editor, which is not acceptable.
- Adding a new tag to nodes, named 'attribution', and forcing the contents of this tag to display alongside the icon. This tag would be locked after creation and editable only by an administrator or other named person.
- Whenever a Zenbu-originating icon is on screen, attribution could be displayed at the bottom right, in the same way that Google Maps credits the various mapping companies (Navteq, etc.) for different areas of the earth.
zenbu tags --> OSM tags
The POIs in Zenbu have tags to describe what they represent (police station, takeaway, library, etc.).
These need to have corresponding categories devised in OSM, and a table produced which maps the Zenbu tags to the OSM tags. When the periodic import happens, this table will drive the change of tag names.
- It appears that Zenbu does not use a controlled, site-wide, consistent tag scheme for businesses. Users can fill in the tag field with whatever they deem suitable, resulting in a lot (how many?) of different tags. The tags used on Zenbu will therefore probably change over time, so the mapping table will also need to be updated regularly.
- Several hundred POIs do not have tags at all. These probably need to be either:
  - ignored
  - updated in Zenbu before being exported to OSM
  - modified at some intermediary stage, either manually or possibly using a script to guess at what they do from their name (e.g. 'Gilbert's cyclery' should probably be a cycle shop)
  - imported into OSM, tagged as being incomplete and manually updated gradually (and possibly exported to Zenbu)
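The mapping table and the name-guessing fallback described above could be sketched roughly as follows. This is only an illustration: the entries in `ZENBU_TO_OSM` and `NAME_HINTS` are assumed examples rather than the real Zenbu tag list, and the function name `translate` is hypothetical. Anything that is guessed or left unmatched is flagged for manual review.

```python
# Illustrative mapping table: Zenbu free-text tag -> (OSM key, OSM value).
# These entries are assumptions for the sketch, not the actual Zenbu tag list.
ZENBU_TO_OSM = {
    "police station": ("amenity", "police"),
    "takeaway": ("amenity", "fast_food"),
    "library": ("amenity", "library"),
}

# Illustrative keyword hints used to guess a category from the POI name
# when the Zenbu tag is missing or unknown (also assumptions).
NAME_HINTS = {
    "cyclery": ("shop", "bicycle"),
    "library": ("amenity", "library"),
}


def translate(zenbu_tag, name):
    """Return (osm_tags, needs_review) for one POI."""
    key = (zenbu_tag or "").strip().lower()
    if key in ZENBU_TO_OSM:
        k, v = ZENBU_TO_OSM[key]
        return {k: v, "name": name}, False
    # No usable tag: guess from the name and flag the result for checking.
    lowered = name.lower()
    for word, (k, v) in NAME_HINTS.items():
        if word in lowered:
            return {k: v, "name": name}, True
    # Nothing matched: keep only the name and flag as incomplete.
    return {"name": name}, True
```

Because Zenbu tags are free text, the lookup normalises case and whitespace before matching; new tags that appear over time would simply fall through to the review path until the table is updated.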
to do
Ascertain categories used in Zenbu
Develop corresponding tags in OSM
Create a table to tie the two sets of tags together (assuming they are not identically named)
Zenbu tags - a list of all tags used in Zenbu, with their equivalent OSM tags and keys
additional tags
We may need to add additional tags to keep track of the data and clear up mistakes later.
These may include:
- zenbu_id - the unique identifier used by Zenbu, to let us keep track of data points which change and to remove the possibility of duplicates
- zenbu_verified - a tag initially set to 'no', later changed to 'yes' once the POI has been manually checked as correct
- batch index - each batch of imports will have a unique identifier, so that if a whole batch job goes wrong its contents can all be deleted or updated in one go
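As a rough sketch of how these bookkeeping tags might be attached, and how zenbu_id could be used to avoid duplicates: the key name `zenbu_batch`, the shape of the `poi` dictionaries and both function names are assumptions made for this example only.

```python
def build_tags(poi, batch_id):
    """Copy a POI's OSM tags and add the bookkeeping tags."""
    tags = dict(poi["tags"])
    tags["zenbu_id"] = str(poi["id"])
    tags["zenbu_verified"] = "no"    # set to 'yes' after a manual check
    tags["zenbu_batch"] = batch_id   # hypothetical key name for the batch index
    return tags


def skip_duplicates(pois, already_imported):
    """Drop POIs whose zenbu_id has been imported before or repeats in this run."""
    seen = set(already_imported)
    fresh = []
    for poi in pois:
        zid = str(poi["id"])
        if zid not in seen:
            seen.add(zid)
            fresh.append(poi)
    return fresh
```

With a batch identifier on every node, a failed batch can later be found by searching for that one tag value.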
geocoding
The Zenbu data incorporates latitude and longitude values for each POI. These coordinates were arrived at by one of three methods:
- from GPS tracklogs of Zenbu users
- the address is added to Zenbu and the coordinates derived by geocoding against the LINZ database (which we are free to use in OSM)
- the address is added to Zenbu and the coordinates derived by geocoding against Google Maps
The third of these options gives data which is derived from a non-free source, i.e. it is incompatible with OSM's license. This represents a significant proportion of the data, which thus has to be re-geocoded, either manually or using the LINZ database.
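If the export records how each coordinate was obtained, separating the usable POIs from those needing re-geocoding could look something like the sketch below. Note that the `geocode_source` field and its values ("gps", "linz", "google") are assumptions; it is not confirmed here that the Zenbu export carries this information.

```python
# Sources whose coordinates are considered compatible with OSM's license.
# The field name and values are hypothetical.
FREE_SOURCES = {"gps", "linz"}


def split_by_source(pois):
    """Return (usable, needs_regeocoding) lists of POIs."""
    usable, needs_regeocoding = [], []
    for poi in pois:
        if poi.get("geocode_source") in FREE_SOURCES:
            usable.append(poi)
        else:
            # Google-derived or unknown origin: treat conservatively.
            needs_regeocoding.append(poi)
    return usable, needs_regeocoding
```

POIs with a missing or unrecognised source are deliberately put in the re-geocoding list, so nothing of uncertain origin is imported by accident.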
software for importing
The Zenbu data is released as KML, GPX and CSV.
The data could therefore conceivably be imported with JOSM. What are the practical limitations on the amount of data that JOSM can handle in one hit?
If JOSM is not suitable, a custom script may need to be developed, along similar lines to the ones used for AND and TIGER.
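One possible starting point for such a script is to convert the CSV export into an .osm XML file that can be opened and reviewed in JOSM before upload. The sketch below assumes CSV columns named `id`, `name`, `lat` and `lon` (the real Zenbu column names may differ), uses negative node ids as placeholders for objects that do not yet exist on the server, and takes the API version as a parameter since the right value depends on the server in use.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_osm(csv_text, api_version="0.6"):
    """Convert CSV text (assumed columns: id,name,lat,lon) to OSM XML."""
    root = ET.Element("osm", version=api_version,
                      generator="zenbu-import-sketch")
    reader = csv.DictReader(io.StringIO(csv_text))
    for n, row in enumerate(reader, start=1):
        # Negative ids mark nodes as new (not yet uploaded).
        node = ET.SubElement(root, "node", id=str(-n),
                             lat=row["lat"], lon=row["lon"])
        ET.SubElement(node, "tag", k="name", v=row["name"])
        ET.SubElement(node, "tag", k="zenbu_id", v=row["id"])
    return ET.tostring(root, encoding="unicode")
```

For example, `csv_to_osm(open("zenbu.csv", encoding="utf-8").read())` would return the XML as a string, where `zenbu.csv` is a placeholder file name.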
the import process
Learning from TIGER and the aborted 2005 import, it would be sensible to break the data into sections and import them gradually, looking for errors that may crop up as we go.
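Splitting the data into sections could be as simple as the following sketch, which yields numbered batches of a fixed size so that each one can be imported and checked before the next. The function name and the choice of fixed-size batches (rather than, say, batches by region) are assumptions for illustration.

```python
def batches(pois, size):
    """Yield (batch_number, list_of_pois) pairs of at most `size` POIs each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(pois), size):
        yield start // size + 1, pois[start:start + size]
```

The batch number produced here could double as the batch index tag, so that a problem found in one section can be traced back to exactly the nodes it created.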