Global Statistical Speed Matrix

From OpenStreetMap Wiki
Jump to navigation Jump to search

GS^2M is a concept of a database containing data stored in records they can be addressed with integer values taken from geographic coordinates with easy formulas. It is meant to be a layer totally separated from OSM vector data never linked with existing ways and nodes, used only for queries during routing process when searching for the fastest way to get to destination.

In first thoughts each record contains information about quasi-square cell (1 second of width and 1 second of length on the map). Each second of latitude is always about 31 meters on the earth surface. Each second of longitude is about 31 meters only near the equator and the more far from equator we are, the less each second of longitude is to finally meet zero on poles. Because of that reason it is necessary to make few tweaks to make database more memory-efficent without losing any important data necessary in routing process. This will be described later what we need to do in order to have 31 meters as an average cell width.

Requirements

First of all, to make it widely usable, GS^2M should meet few requirements like:

Decentralization

  1. everybody can have her/his own GS^2M database created from her/his own NMEA/GPX tracks
  2. database can be merged easily with, for example, my friends' databases without losing any necessary information
  3. then our databases can be uploaded to a public server in order to make this database Global

Privacy

  1. nobody needs to know where I parked my car 3 days ago from 10pm to 7am ;-), but you may need to know that usually because of traffic from 8am to 11am GMT maximum possible speed is 40 km/h when you travel by M1 from Swords to Dublin (fortunately it is not so bad when you travel in opposite direction ;-) that time)

Simplicity

  1. easy segmentation (for example into continents, countries, etc. whatever is necessary to cope with limited memory devices)
  2. easy updating process by merging main public database with recent NMEA/GPX points processed into small update-databases
  3. easy upgrade to variable cell size addressing

Story begins

Basically Global Statistical Speed Matrix is 2-dimensional matrix containing records stored in 360x60x60 columns and 180x60x60 rows. Each record contains data from NMEA/GPX points localized inside a quasi-square cell with 1 longitude-second of width and 1 latitude-second of length.

One record can contain data such as:

  • average speed (actually multiplicative inversed speed value which will be our metric, measured in time units, used to find the fastest way to destination). To make it even better, all speed data in each record can be stored hour after hour in a sub-matrix with 24x7 size in order not to loose important information about daily and weekly periods in traffic.
  • average altitude above a sea level in order to exclude NMEA/GPX points collected when travelling with a plane
  • number of NMEA/GPX points included in these average values
  • heading, in order to have separated speed data about going forward and backward when passing that cell. On weekdays there is a huge traffic in the morning on ways they lead from country side to city center, because majority of people go to work that time. But you can easily travel during that time in opposite directions. Analogically there is a huge traffic on those opposite directions in the afternoon when people come back from work, etc.

Cell adressing

There is no cell addressed with [zero,zero].

Rows

Height of each row on the map is about 31 meters.

Rows are addressed with integer numbers:

  • from -324000 to -1 on southern hemishere
  • from 1 to 324000 on northern hemisphere

Columns

In brief columns are addressed analogically:

  • from -648000 to -1 on western hemishere
  • from 1 to 648000 on eastern hemisphere

But in order to maintain 31 meters as an average width of a column on the map, little bit of complexity is necessary. Starting from North Pole to Equator there will be 13 sections with different number of columns:

rowscolsmax & min latitude valuesnumber of cells
1490°00'00" >= φ > 89°59'59" 4
42089°59'59" >= φ > 89°59'55" 80
2210089°59'55" >= φ > 89°59'33" 2 200
10550089°59'33" >= φ > 89°57'48" 52 500
213150089°57'48" >= φ > 89°54'15" 319 500
742450089°54'15" >= φ > 89°41'53" 3 339 000
21231350089°41'53" >= φ > 89°06'30" 28 660 500
64754050089°06'30" >= φ > 87°18'35" 262 237 500
64338100087°18'35" >= φ > 85°31'22" 521 073 000
1952416200085°31'22" >= φ > 80°05'58" 3 162 888 000
3330232400080°05'58" >= φ > 70°50'56" 10 789 848 000
8303764800070°50'56" >= φ > 47°46'59" 53 807 976 000
172019129600047°46'59" >= φ > 00°00'00" 222 936 624 000

So there will be 291 513 020 284 cells to cover the northern hemisphere, so in total we need 583 026 040 568 cells to cover the Earth this way.

Data processing

Average values

Only NMEA/GPX points with non-zero speed value are included to process the average values in a cell. New n+1 average value is counted iterationally, so there is no need to know n points they were counted before in that cell.

av_new = av_old * n / (n+1)  +  value_new / (n+1);
n++;

However using only simple arithmetic mean formula can cause one issue, that in fact that will be weighted mean value. Tracks with lower speed values will contribute more to the final average value, because there are more NMEA/GPX points left in the same cell, when you travel slower. Knowing that, final formula should look like:

av_new = av_old * n / (n+speed_new/weight)  +  value_new / (n+speed_new/weight);
n = n+speed_new/weight;

where, for example, weight=130.

Heading

Each cell contains average values and n-values for going forward and backward separately.

24x7

Knowing daily and weekly traffic periods might be very valuable in routing process. Nobody would mind to know that if she/he started a trip one hour earlier, she/he would reach her/his destination 5 hours earlier.