Talk:Global Statistical Speed Matrix


Storage capacity

Even if you were to take, say, 7 bits for speed (values 0-127), 5 bits for the number of GPX points and 14 bits for elevation, and managed to compress it all down to 5% of the uncompressed size, there would be (for 24*7 time slots and global coverage) over 6 terabytes of data, even if you additionally managed to exclude the seas covering 70% of the globe. Alv 09:37, 17 June 2009 (UTC)
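
A rough sketch of the arithmetic behind that figure, assuming 1-arc-second cells, the 26-bit record, 24*7 time slots, land covering 30% of the globe and the 5% compression ratio mentioned above; all constants come from this thread, not from measurements:

 # Back-of-envelope storage estimate for a global 1-arc-second speed grid.
 ARCSEC_LON = 360 * 3600        # grid columns around the globe
 ARCSEC_LAT = 180 * 3600        # grid rows from pole to pole
 LAND_FRACTION = 0.30           # seas (~70% of the globe) excluded
 TIME_SLOTS = 24 * 7            # hour-of-day x day-of-week
 BITS_PER_RECORD = 7 + 5 + 14   # speed + gpx point count + elevation
 COMPRESSION = 0.05             # compressed down to 5% of the raw size
 
 cells = ARCSEC_LON * ARCSEC_LAT * LAND_FRACTION
 raw_bytes = cells * TIME_SLOTS * BITS_PER_RECORD / 8
 print(raw_bytes * COMPRESSION / 1e12)   # ~6.9 TB, i.e. "over 6 terabytes"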

That would be a problem only if roads were everywhere on the planet. But the majority of the continental area has no roads, so no data records would be addressed in those places, which - I believe - gives us a lot of savings. But a few simulations need to be done in order to prove this theory. DrJolo 11:37, 17 June 2009 (UTC)
There is still no problem even if I am wrong about my theory. In that case segmentation into continents would be a must. DrJolo 11:56, 17 June 2009 (UTC)
As an example, using an estimated average width and total length of highways for Finland, they seem to cover roughly 1% of the total land area. The percentage of occupied tiles would likely be somewhat higher. This was in part taken into account in that 5% figure above. If the 95% compression ratio were possible even after dropping the empty records, it'd still be 60 gigabytes. My guess would be less compression (60% -> 480 GB), but compared to the uncompressed Planet.osm of 150 GB it could then be roughly of the same size. Others would love to hear the results of your simulations/feasibility tests. Alv 12:07, 17 June 2009 (UTC)
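
Continuing the same sketch with the ~1% road-coverage guess extrapolated globally (an assumption, as noted above), the two compression figures land in the same ballpark as the 60 GB and 480 GB estimates:

 # Only cells actually touched by roads get a record.
 raw_bytes = 360*3600 * 180*3600 * 0.30 * (24*7) * 26 / 8   # all land cells, all slots
 occupied = raw_bytes * 0.01          # ~1% of land cells carry roads (guess)
 print(occupied * 0.05 / 1e9)         # ~69 GB if the 95% compression still holds
 print(occupied * 0.40 / 1e9)         # ~550 GB at a more cautious 60% saving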
There are a few ways to cope with the storage capacity issue: segmentation into continents; variable cell width/height, starting from 10 seconds on the equator (instead of the 1 second suggested before), though I am not sure whether adding so much complexity would be a big benefit to the routing process; and finally, hope that hard-drive capacity will grow faster than the amount of data uploaded. ;-) DrJolo 00:44, 18 June 2009 (UTC)
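
One possible reading of the variable cell size idea, sketched under the assumption that cells keep a fixed north-south height of 10 arc-seconds and are widened east-west by 1/cos(latitude) so that every cell covers roughly the same ground area; the function name and indexing scheme are illustrative only:

 import math
 
 BASE_ARCSEC = 10.0   # proposed cell size at the equator, in arc-seconds
 
 def cell_index(lat_deg, lon_deg):
     """Map a coordinate to a (row, column) cell; cells get wider
     towards the poles so their ground width stays roughly constant."""
     lat_step = BASE_ARCSEC / 3600.0                                   # degrees N-S
     lon_step = lat_step / max(math.cos(math.radians(lat_deg)), 1e-6)  # degrees E-W
     row = int((lat_deg + 90.0) // lat_step)
     col = int((lon_deg + 180.0) // lon_step)
     return row, col
 
 # cell_index(60.17, 24.94) lands in a cell about twice as wide as one on the equator.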

Potential issues

Interesting concept. I wonder how places like this will work out: http://www.openstreetmap.org/?lat=51.07582&lon=-0.2035&zoom=17&layers=B000FTF , where you have roads with totally different speeds running very close together. Also in Tokyo, where you have motorways running on top of tertiary roads? Obviously one has to make sure that only car tracks are included. Also, how can you tell the difference between mappers pulling over to decide where to map next and being stuck in a traffic jam? Will GPX files need to be made with certain "rules" in mind? Daveemtb 12:28, 17 June 2009 (UTC)

It will work out everywhere, but in cells like those you mentioned above, GS^2M taken alone will be extremely inaccurate. I am rather an optimist and IMHO it will not be a big problem, especially if your trip is more than 10 km. Perhaps a variable cell size will solve this issue, but that is a story of the far, far future. Taking the average speed for the road type together with GS^2M could be a solution as well. Mappers pulling over are not a problem as long as you take only NMEA/GPX points with a non-zero speed value to contribute to the average values. The simplicity rule says that you take NMEA/GPX files as they are - there should be no need to make them with certain "rules" in mind. I believe in the rules of statistics; they will take effect sooner or later. DrJolo 23:13, 18 June 2009 (UTC)
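
A minimal sketch of that non-zero-speed rule, assuming the track has already been parsed into (lat, lon, speed) tuples; the function name and the optional jitter threshold are illustrative, not part of the proposal:

 def average_moving_speed(points, min_speed=0.0):
     """Average speed over track points, ignoring points where the
     receiver reported no movement (e.g. a mapper pulled over).
     points: iterable of (lat, lon, speed_in_m_per_s) tuples.
     A small positive min_speed could also filter out GPS jitter."""
     moving = [speed for _, _, speed in points if speed > min_speed]
     return sum(moving) / len(moving) if moving else None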

Speeds by day/time

I'm hugely in favor of a project like this, but I think it's only valuable if it knows average speeds organized by time of day (for rush hour), possibly by day of week (businesses are closed on weekends), and maybe, if it's doable, by something even larger to account for holidays (everyone travels on Thanksgiving). But how would we store these without making the dataset expand to an unmanageable size? I have a couple ideas, but has anyone else thought about this at all? --BigPeteB 15:20, 19 July 2010 (UTC)
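
For what it's worth, a minimal sketch of the hour-of-day by day-of-week bucketing that the 24*7 time slots above imply, with a single extra bucket for holidays as one possible answer; the holiday set and the extra bucket are assumptions, not a settled design:

 from datetime import datetime
 
 SLOTS = 24 * 7   # one record per (day-of-week, hour-of-day) pair
 
 def time_slot(ts, holidays=frozenset()):
     """Return the bucket a GPS sample falls into: 0..167 for ordinary
     weekday/hour combinations, 168 for dates listed as holidays."""
     if ts.date() in holidays:
         return SLOTS
     return ts.weekday() * 24 + ts.hour
 
 # time_slot(datetime(2010, 7, 19, 15, 20)) -> the Monday 15:00 bucket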