Talk:TIGER fixup/250 cities

From OpenStreetMap Wiki

Discuss TIGER fixup/250 cities

Any features that editors could implement?

Are there any features that editors could implement which would help this? --Richard 13:24, 17 July 2009 (UTC)

A common cause of routing failures on the interstates is duplicate nodes. I guess Potlatch doesn't flag duplicate nodes as a problem particularly. Neither does JOSM, except as a validator plugin rule. My WayDownloaderPlugin also makes fixing these quick and easy. A problem with this, though, is that many duplicate nodes are part of a larger mess along a county border (TIGER fixup#Connectivity along county borders), where it might not actually be terribly helpful to quickly fix one dupe node on an interstate whilst ignoring all the other duplicate nodes. Often these form part of an entire duplicated way running along the county border, so that should really all be tackled at once. To help with that, I was pondering a new feature of my plugin and/or a new fixable validator plugin rule to detect the case where two ways sit exactly on top of each other, with every node a duplicate.
I'm thinking in terms of rapid automated/semi-automated fixup, but I know Matt & Andy have been tackling the interstates very fastidiously, actually eyeballing the aerial imagery and fixing cases where, for example, these county roads are not even really connected to the interstate.
I got frustrated with doing this with JOSM because the WMS plugin can be very slow to fetch the imagery, so it's not great for skipping along an interstate to look at new areas all the time. Potlatch seems to fetch Yahoo imagery a lot faster in general.
-- Harry Wood 12:08, 18 July 2009 (UTC)
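For illustration only, here is a minimal Python sketch of the kind of check described above: grouping nodes that share exactly the same coordinates, and spotting two ways that sit exactly on top of each other with every node a duplicate. It is not code from WayDownloaderPlugin or the JOSM validator. The function names and the plain-dict input format are assumptions made for this example, and it assumes the duplicates have bit-identical coordinates (as import duplicates typically do).

```python
from collections import defaultdict


def find_duplicate_nodes(nodes):
    """Group node ids that share exactly the same (lat, lon).

    nodes: dict mapping node id -> (lat, lon)  (hypothetical input format).
    Returns a sorted list of id lists, one per coordinate used by 2+ nodes.
    """
    by_coord = defaultdict(list)
    for node_id, coord in nodes.items():
        by_coord[coord].append(node_id)
    return sorted(sorted(ids) for ids in by_coord.values() if len(ids) > 1)


def find_stacked_ways(nodes, ways):
    """Find pairs of ways that trace identical coordinates without sharing nodes.

    ways: dict mapping way id -> ordered list of node ids.
    Two ways match when their coordinate sequences are equal in either
    direction and they have no node id in common, i.e. every node of one
    way is a duplicate of a node in the other.
    """
    by_shape = defaultdict(list)
    for way_id, node_ids in ways.items():
        coords = tuple(nodes[n] for n in node_ids)
        # Normalise direction so a reversed copy gets the same key.
        key = min(coords, coords[::-1])
        by_shape[key].append(way_id)

    pairs = []
    for way_ids in by_shape.values():
        way_ids.sort()
        for i, a in enumerate(way_ids):
            for b in way_ids[i + 1:]:
                if set(ways[a]).isdisjoint(ways[b]):
                    pairs.append((a, b))
    return sorted(pairs)
```

A pair reported by find_stacked_ways is only a candidate for merging; whether the two ways should really be one still needs a human look at the tags and the imagery.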
Indicating oneway roads on the main view - it's a pain to click on every motorway_link, check that it has oneway=yes, then check that it points in the right direction. Gravitystorm 11:10, 21 July 2009 (UTC)

Success really isn't

Routing from Elk Grove, CA to Akron, OH claims to succeed. But it needlessly takes you through Southern CA adding ~250 miles to the trip [1]! It should just hop on 80E and go.

So, while this one appears green, I'd suggest that it actually failed. What about putting a ceiling on some of these routes? Bron 06:31, 22 July 2009 (UTC)

Didn't start this but the idea seems to be that
  1. likely routing to some or several intermediate destinations on that better route isn't working yet - once it is, the detour will be avoided
  2. Once most routes are working, it could be possible to compare the route lengths to the bee line lengths and to routes to adjoining destinations to find the detours.
In your example, the problem seemed to be on the county border between Sweetwater and Carbon - duplicate nodes as left by the import. The carriageways would still need to be separated to parallel ways, though. Alv 08:09, 22 July 2009 (UTC)
Yeah. We know green doesn't mean perfect routing (See TIGER fixup/250 cities#FAQ) On the plus side, it should be relatively quick and easy to get this whole grid turning green right? That's the first aim, to create a sort of "minimal" network between the cities. That's a good step, because once you can route between cities, it's easier to see where the routing is broken. We are also pondering enhancements to the analysis, to pick up on where routes are wildly wrong. -- Harry Wood 08:57, 22 July 2009 (UTC)

Routing to wrong city...

The routing to Mesquite, Texas is going to the wrong city. We should be routing to the suburb of Dallas (32.77, -97.00), not the ghost town in west Texas (32.87, -101.63). FYI. 25or6to4 18:41, 22 July 2009 (UTC)

Fixed, will be correct in the next update. I've also upgraded the Mesquite node in OSM from hamlet to city. If you spot any more like this then put them here. Gravitystorm 11:19, 23 July 2009 (UTC)
St. Paul, MN is in the wrong spot as well. 25or6to4 18:06, 28 July 2009 (UTC)

Making this simpler

I had a look at this last night and have a number of observations:

  • The munged HTML file is huge and it's very difficult to get an overview (I wanted cities which were all red). It would be much easier if presented as a graphic as well (I appreciate that an OpenLayers app is too much for now). A GPX of the routes would be helpful to generate something like this: could it be munged in the same way as the HTML?
  • There is in effect a huge amount of duplicate data in the matrix. Concentrating on a graph based on near-neighbours would focus the work directly where returns are highest.
  • I looked at Albuquerque NM, and all the connectivity problems I encountered were duplicate (or unconnected) nodes on county borders.
  • Building route relations on the non-connected Interstates might be a quicker way to find the gaps than an initial step by step search. But, see next comment.
  • I don't understand how to build, order and use the ordering feature of relations, and there is little help in the wiki. Can someone who does understand them write something (preferably a how-to) on editing the order of members in a relation?
  • It's a real pain that county boundaries disappear at higher zoom levels!
  • Beware link roads connected to the end of one way, but not to the main way (i.e., check the end nodes of both ways).
  • CloudMade routing does not give an indication of how current the planet data is. This is important for debugging!

All in all there's a lot of work here and this is just a tiny part of the interconnectivity problem. A good example of why OSM works: I had previously had no idea how ropey the data for the US is. SK53 09:05, 23 July 2009 (UTC)

Wow, lots of stuff in there, let me answer some of it. The HTML is huge but you can zoom out in Firefox to get an overview. It'll be much more obvious in a week or two when all the sub-groups get connected, since then a town will either be mostly connected or not at all. Focussing on high returns will get mappers stumbling over one another - we can all spread out and work in parallel. I had a look around Albuquerque and there's also a lot of stuff that needs dual-tracking and the alignment fixing too. Route relations are completely irrelevant to this, trust me. ACK on county boundaries disappearing and CloudMade not showing update days - it's generally updated either Friday or Monday following the planet dump. Gravitystorm 11:25, 23 July 2009 (UTC)
Albuquerque was so horrible I thought I'd just try and fix connectivity rather than anything else. Complex interchanges etc. need a bit more thought and effort. As I-25 is a road I know, I thought I'd follow that, and it's got non-connected nodes at county boundaries all the way to Wyoming. I thought that putting the relation in would enable me to use relation browsing and analysis tools to get to these locations more rapidly, but I ran into a whole set of other issues (like ordering, which I need to know about some time). Now looking at Cape Cod, and the duplicate node boundary problem applies at the township level. SK53 12:58, 23 July 2009 (UTC)
I wouldn't worry about the ordering of relations at the moment as nothing takes it into account yet. The only editor that allows you to change the order relatively easily is JOSM, in fairly recent releases. The TIGER fixup is pretty complex, hence why you need humans rather than scripts. Smsm1 11:15, 24 July 2009 (UTC)

Also applies to MassGIS import. SK53 09:11, 23 July 2009 (UTC)

Yep Gravitystorm 11:25, 23 July 2009 (UTC)


When and how often will this list be updated? 25or6to4 18:06, 28 July 2009 (UTC)

It'll get updated by me, Matt and Harry, whenever the CloudMade routing service is updated. The scripts are running now - they take several hours to complete but we should have everything done by around 11am BST tomorrow. I'm hoping they'll be more prompt next week. Gravitystorm 19:33, 28 July 2009 (UTC)
Ok thanks. Did they get the other incorrect city changed (St. Paul, MN)? 25or6to4 20:11, 28 July 2009 (UTC)
Any chance of an ETA for this week's planet data? SK53 13:37, 30 July 2009 (UTC)
Usually, changes made before the Wednesday planet.osm dump will show up within the CloudMade routing the following Friday, but this can overrun due to technical difficulties. So the ETA is some time tomorrow -- Harry Wood 13:58, 30 July 2009 (UTC)

What Next?

As of 28-Jul-2009 this is now over 90% interconnected, although with plenty of strange routes. Here are the obvious ways to extend this approach:

  • Some kind of simple visualisation is required to help locate the most egregious routing errors, so that they can be fixed rapidly.
  • A few cities remain unconnected, but the most obvious problems are in the Massachusetts area. Here duplicate nodes occur at the borders of cities/towns rather than counties. Many roads with separated ways are tagged with oneway but the direction of both carriageways is often the same (e.g., Highway 128).
  • Despite this progress, the ability to interconnect and route between major centres of population masks huge deficiencies in the TIGER data. Using geonames data I have found all US counties with a centre of population over 5000, chosen the largest city in each such county and then found the 5 nearest neighbours (using Postgres 8.4 with window functions). At the moment I have used this to generate a .gpx file with just under 8000 routes. A simple-minded visualisation is shown at the right. If this data could be used against CloudMade's routing engine, it would be possible to identify routeable and non-routeable combinations. The data set can later be extended to use population centres of 1000 and over, or just all US population centres.
    [Figure: US counties with at least one city > 5000 population connected to near neighbours]

Will possibly add more on my user page. GPX file available by email to SK53_osm at SK53 19:06, 4 August 2009 (UTC)
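As a rough illustration of the selection described above (largest city per county, then the 5 nearest neighbours), here is a small Python sketch. It is not the Postgres query that was actually used. The record layout (name, county, population, coord) and the function names are assumptions made for this example, and the brute-force neighbour search is only meant for a few thousand places.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def largest_city_per_county(cities, min_population=5000):
    """Keep the most populous city of each county with population over the limit.

    cities: list of dicts with keys "name", "county", "population", "coord"
    (a hypothetical layout, not the geonames file format).
    """
    best = {}
    for city in cities:
        if city["population"] <= min_population:
            continue
        current = best.get(city["county"])
        if current is None or city["population"] > current["population"]:
            best[city["county"]] = city
    return list(best.values())


def nearest_neighbour_pairs(places, k=5):
    """Return (origin name, neighbour name) for each place's k nearest places."""
    pairs = []
    for origin in places:
        others = [p for p in places if p is not origin]
        others.sort(key=lambda p: haversine_km(origin["coord"], p["coord"]))
        pairs.extend((origin["name"], p["name"]) for p in others[:k])
    return pairs
```

Each resulting pair is one route to request from a routing engine; pairs appear in both directions when two places are mutual near neighbours.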

  • What about repeating the same algorithm, but for the smallest 250 villages/towns? Or for four smallest of each state to make them evenly spaced out. Connecting remote parts to each other is IMO likely to connect other locations nearby, too. Alv 15:54, 17 August 2009 (UTC)
  • Yes, as long as you can choose a list of places with co-ordinates, it's possible to generate the relevant .gpx file. I just haven't got round to following through the next step. I only chose cities with population over 5000 because it was quite a small download from geonames. My ideal would be to cover every county in the lower 48, using say the county seat. This would ensure the elimination of many of the unroutable issues, which in turn might then make the routing oddities stand out more (funny oneways, ramps not connected properly ...). SK53 15:53, 20 August 2009 (UTC)
  • Connecting all counties would be a good goal for the next step. As we already have a basic interstate network, it would be sufficient to connect each county seat with Washington, D.C. Having achieved this, basic routing is enabled for all streets in the U.S. As a further step, we should try to enable foot and bicycle routing, which requires connecting many rural roads. --FK270673 17:09, 20 August 2009 (UTC)
  • The idea of nearest neighbour interconnectivity is that it's much easier to identify what to fix. With some of the 250 city routes it took ages of walking along with the CM router to tie down the issue: and I think you'd get the same with connect everything to Washington. You'd also get the problem of steering everyone interested to potentially fixing the same problem. I've worked along the borders of the counties of NW Colorado JUST fixing the duplicate nodes, and it appears that routing works pretty well from any random point to any other random point in this group of counties (and those over the border in Utah and Wyoming). By fixing along a county line it should fix cycle and (within reason) walking routing, whereas just fixing the interstates will still leave funny routes particularly when the interstate is the only way to move between adjacent counties. SK53 16:47, 24 August 2009 (UTC)
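Given a list of places with co-ordinates, writing the .gpx file itself is straightforward. The following Python sketch shows one way to do it with the standard library, writing each origin/destination pair as a GPX 1.1 route (rte) with two route points. The layout of the file mentioned above is not described here, so this structure, the function name and the creator string are all assumptions.

```python
import xml.etree.ElementTree as ET

GPX_NS = "http://www.topografix.com/GPX/1/1"


def routes_to_gpx(routes, creator="nearest-neighbour-sketch"):
    """Serialise routes as a GPX 1.1 document and return it as a string.

    routes: iterable of (name, [(lat, lon), ...]) tuples
    (a hypothetical input format chosen for this example).
    """
    # Emit GPX as the default namespace instead of an "ns0:" prefix.
    ET.register_namespace("", GPX_NS)
    gpx = ET.Element(f"{{{GPX_NS}}}gpx", {"version": "1.1", "creator": creator})
    for name, points in routes:
        rte = ET.SubElement(gpx, f"{{{GPX_NS}}}rte")
        ET.SubElement(rte, f"{{{GPX_NS}}}name").text = name
        for lat, lon in points:
            ET.SubElement(
                rte,
                f"{{{GPX_NS}}}rtept",
                {"lat": f"{lat:.6f}", "lon": f"{lon:.6f}"},
            )
    return ET.tostring(gpx, encoding="unicode")
```

The same function works for any place list (county seats, smallest towns per state), since it only needs names and co-ordinates.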

With there being just a few (8 as I type) dupe nodes remaining in the continental US... What is next? While I agree that routing from county seats would be good; I believe we should finish out the interstates by confirming reverse routing sanity. There should be few times when the routing between A and B has a difference of more than the lesser of 5% or 25km when compared to the routing from B and A. Mythdraug 16:29, 15 September 2009 (UTC)
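The reverse-routing check suggested above could be sketched in Python roughly as follows. The comment does not say what the 5% is measured against, so this sketch assumes it is 5% of the shorter of the two route lengths; the function name and parameters are likewise made up for this example.

```python
def is_asymmetric(forward_km, reverse_km, max_fraction=0.05, max_km=25.0):
    """Flag an A->B / B->A pair whose lengths differ suspiciously.

    The allowed difference is the lesser of max_fraction of the shorter
    route (an assumption, see above) and max_km.
    """
    shorter = min(forward_km, reverse_km)
    tolerance = min(max_fraction * shorter, max_km)
    return abs(forward_km - reverse_km) > tolerance
```

For example, a 100 km route whose reverse is 106 km would be flagged (tolerance 5 km), and so would a 1000 km route whose reverse is 1030 km (tolerance capped at 25 km).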

As you'll have seen, we've set up some details on TIGER fixup/250 cities/duplicate nodes including looking at duplicate nodes on other types of highways (lots) -- Harry Wood 18:54, 10 October 2009 (UTC)

What about making this a world-wide project? Connecting all the capital cities to start with...? GercoKees 13:37, 10 October 2009 (UTC)

In general routing is only a widescale problem within the messed up U.S TIGER data. You can of course test out routing between any other cities on You'll find it's a lot better elsewhere (assuming you're looking somewhere where we have data) with only the occasional connectivity glitches -- Harry Wood 18:54, 10 October 2009 (UTC)
I have been correcting some duplicate nodes in China and found really a lot of errors. This made me think that making this a world-wide project is a good idea. When testing, for example, this route, one sees that there is still a lot of work to do. The problem there is a lot of unconnected motorway_links, duplicate ways and so on. GercoKees 16:08, 11 October 2009 (UTC)
But your example would work except it fails to find a road close enough to the 'B' destination (zoom in) I'm sure there's one or two other problems with routing there, but it's nothing like the totally broken routing we saw in the TIGER data. -- Harry Wood 18:34, 11 October 2009 (UTC)

Tortuous Routes

Just curious, what is being used as the definition for this? Looking at the grid, my guess is that it is the routes in yellow. But what metric is being used to make that determination? Is it that the route is xx% longer than the straight-line distance? One thought I have had when looking at the interstates is to check for instances of the distance for the route from A-B being different from the distance from B-A. While I am relatively new to OSM, and just found this cleanup project late last week, I've found a number of instances where the reason for odd routing is that the dual carriageway is a single line with oneway=yes set. Mythdraug 19:13, 17 August 2009 (UTC)

Yup, that's just one of the problems. The most usual is that there is not a dual carriageway and the nodes are not joined at county lines. Harry has produced a nice map of the ones still to be fixed, but I've gone and lost the link! When a single way exists for a dual carriageway it's a bit tedious to fix the whole thing, especially if you want to keep it useful whilst fixing it. I usually leave one of the carriageways without a oneway tag until I've done the freeway from one end of the county to the next, and also use little stubs to connect the new way to the old way so that routing still works in both directions. SK53 15:53, 20 August 2009 (UTC)
Yeah, I've been practicing converting a single way interstate to dual out near Albq, NM. My approach to keeping it routable has been to remove the oneway tag. Then as I add the second way, I'll add it back in. If I need to end an editing session early; I'll snip the original way, add oneway back to the dual side and join both paths to the single !oneway. Advantage of practicing in NM is I have few junctions to deal with and long sections of straight road.  :) Mythdraug 16:08, 20 August 2009 (UTC)
I've linked the duplicate nodes map on there now; we need to rewrite this page a bit to reflect next steps. You guessed it though. We're now showing yellow grid results where the route distance is more than 50% longer than the straight-line distance. I'm going to modify my routability map to show some routes in yellow (since it's uselessly showing everything green at the current time!). The 50% figure is based on testing it out in well-connected Europe. It's just a crude measure really, but it gives us some fun targets to work towards. Obviously what really needs to happen is TIGER fix-up EVERYWHERE! (with particular chaos to sort out along county lines) but let's not boggle our minds with the magnitude of the problem :-) We're also pondering visualization of progress sorting out carriageways and oneways. -- Harry Wood 18:28, 20 August 2009 (UTC)
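To make the yellow rule concrete, here is a small Python sketch of the 50% test. It is not the script that generates the grid. It assumes the straight-line distance is a great-circle (haversine) distance and that a failed route is represented as None and shown red; the function names are invented for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def classify_route(origin, destination, route_km, max_ratio=1.5):
    """Return "red", "yellow" or "green" for one grid cell.

    route_km is the routed distance, or None when no route was found
    (an assumed convention). max_ratio=1.5 encodes "more than 50% longer
    than the straight-line distance".
    """
    if route_km is None:
        return "red"
    if route_km > max_ratio * haversine_km(origin, destination):
        return "yellow"
    return "green"
```

Because the threshold is a plain ratio, it is easy to tighten later by passing a smaller max_ratio once most routes are sane.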

Taking it to Brazil

Hi there! We are discussing having a project like this in Brazil. Is it possible to have a version of the script that generates the table of connections? Thanks! vgeorge 23:45, 4 November 2009 (UTC)