TIGER fixup/node tags


The TIGER 2005 import created a huge number of nodes (~170 million), each carrying four superfluous tags. These tags bloated the database and negatively affected users. All of them were removed by January 2010.

Details

An example node has the following tags:

source = tiger_import_dch_v0.6_20070813
tiger:county = St. Louis, MO
tiger:tlid = 100111260:100111261:100111155:100111159
tiger:upload_uuid = bulk_upload.pl-6143e1a9-589d-43a0-9248-e95658773ef4

This same information is repeated on the ways.

source: The name of the script that carried out the upload. It's really not important any more.
tiger:county: Can be worked out from the TIGER county borders, or from the ways attached to the node. It's not important to have on every node.
tiger:tlid: The combination of all the tlids ("TIGER/Line ID") of the ways attached to the node. Since the reference ids belong to the ways, there's no point in keeping them on the nodes too.
tiger:upload_uuid: A reference id for the script and the time it was run. Again, not important to have on every node.
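
As a rough illustration of what dropping these four tags from a node amounts to, here is a minimal Python sketch. It is not the script that was actually used for the cleanup; the function name `strip_tiger_node_tags` and the rule of only dropping `source` when its value starts with `tiger_import` are assumptions made for this example.

```python
# Keys the TIGER 2005 import put on every node (taken from the example above).
SUPERFLUOUS_NODE_KEYS = frozenset({
    "source",
    "tiger:county",
    "tiger:tlid",
    "tiger:upload_uuid",
})


def strip_tiger_node_tags(tags):
    """Return a copy of a node's tags without the import-only keys.

    Assumption for this sketch: "source" is dropped only when its value
    starts with "tiger_import", so a source tag set by a mapper is kept.
    """
    cleaned = {}
    for key, value in tags.items():
        if key == "source" and not value.startswith("tiger_import"):
            cleaned[key] = value  # a mapper's own source tag survives
        elif key not in SUPERFLUOUS_NODE_KEYS:
            cleaned[key] = value  # any other tag is left untouched
    return cleaned


example_node_tags = {
    "source": "tiger_import_dch_v0.6_20070813",
    "tiger:county": "St. Louis, MO",
    "tiger:tlid": "100111260:100111261:100111155:100111159",
    "tiger:upload_uuid": "bulk_upload.pl-6143e1a9-589d-43a0-9248-e95658773ef4",
}
print(strip_tiger_node_tags(example_node_tags))
```

For the example node above, nothing is left once the four tags are gone; a node that also carries a real tag such as highway = traffic_signals would keep that tag.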

Effects

So if it's not important, why worry about it? Here are a few reasons.

  • Every editor, new and old, needs to know to ignore them. When you delete a way, an editor should also delete the nodes of that way that are no longer needed. To recognise such nodes as effectively untagged, every editor has to be hard-coded to ignore these tags (although, come to think of it, I don't think JOSM does this).
  • Applications need to be adapted to ignore them, e.g. [1].
  • Every time someone in the US makes a map call, all that data gets downloaded and parsed to no effect.
  • The same goes for generating, downloading and parsing the planet.
  • And so on.
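
The editor problem in the first point can be sketched as follows. This is a hypothetical illustration in Python, not code from any real editor: the names `is_effectively_untagged` and `nodes_to_delete_with_way`, and the plain-dict data model, are assumptions. For simplicity it ignores `source` regardless of its value.

```python
# Keys an editor would have to hard-code as "not real content".
IGNORED_KEYS = frozenset({
    "source",
    "tiger:county",
    "tiger:tlid",
    "tiger:upload_uuid",
})


def is_effectively_untagged(tags, ignored_keys=IGNORED_KEYS):
    """True if the node carries nothing except ignorable tags."""
    return all(key in ignored_keys for key in tags)


def nodes_to_delete_with_way(way_node_ids, node_tags, way_count):
    """List the nodes that can be removed together with a deleted way.

    way_node_ids: node ids of the way being deleted (may repeat if closed)
    node_tags:    dict mapping node id -> dict of that node's tags
    way_count:    dict mapping node id -> number of ways using the node,
                  including the way being deleted
    """
    deletable = []
    seen = set()
    for node_id in way_node_ids:
        if node_id in seen:
            continue  # closed ways list their first node twice
        seen.add(node_id)
        shared = way_count.get(node_id, 0) > 1
        if not shared and is_effectively_untagged(node_tags.get(node_id, {})):
            deletable.append(node_id)
    return deletable
```

Without the ignore list, every TIGER node would look tagged, so an editor that only removes untagged nodes would leave all of them behind as orphans when their way is deleted.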

TIGER node tags make up 85.43% of all node tags and take up:

  • 12.97% of the bzipped planet size (805 MB).
  • 34.68% of the uncompressed planet size.
  • 42.20% of the lines in the planet.
  • 31.51% of the parsing time of the planet (based on xmllint --stream).

These figures are based on planet data as of 2009-06-17.
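
A figure like the share of node tags can be approximated on a small OSM XML extract with a streaming parse. The sketch below is not the method used to produce the numbers above (that method is not described here); the function name `tiger_node_tag_share` and the rule for recognising an import tag are assumptions. It only counts tags and does not measure file size, line count or parsing time.

```python
import xml.etree.ElementTree as ET

TIGER_NODE_KEYS = frozenset({"tiger:county", "tiger:tlid", "tiger:upload_uuid"})


def is_import_tag(key, value):
    """Assumed rule: the three tiger:* keys, or a source written by the import."""
    return key in TIGER_NODE_KEYS or (
        key == "source" and value.startswith("tiger_import")
    )


def tiger_node_tag_share(osm_xml):
    """Return the fraction of node tags that are TIGER import tags.

    osm_xml is a file name or a binary file object containing OSM XML.
    Tags on ways and relations are not counted.
    """
    total = 0
    from_import = 0
    for _event, elem in ET.iterparse(osm_xml, events=("end",)):
        if elem.tag == "node":
            for tag in elem.iter("tag"):
                total += 1
                if is_import_tag(tag.get("k", ""), tag.get("v", "")):
                    from_import += 1
        if elem.tag in ("node", "way", "relation"):
            elem.clear()  # drop children we no longer need
    return from_import / total if total else 0.0
```

Calling `tiger_node_tag_share("extract.osm")` on a hypothetical extract file returns a value between 0 and 1. A planet-sized file would need a more memory-careful approach than this sketch, since the emptied elements still accumulate under the root.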

Removal

Between September 2009 and January 2010, Frederik Ramm used his woodpeck_fixbot account to remove all of these superfluous tags. See his fixbot log and OSM-Dev post from 21 July 2009 for more details.