Algorithms for QA
This page collects ideas for calculating a reliability index for every object in OpenStreetMap (and possibly for each of its tags), based on the mapping experience of the person who added it.
Algorithm for experienced mappers reliability index
- time since signup (measured in days; more than 400 days is required to score over 50)
- how many edits per year
- edit days per year
- additions per year
- modifications per year
- deletions per year
- how many types of objects are edited (buildings, routes, public transport, shops, opening hours)
- specialization (the most frequently edited object type, e.g. building=)
More complex and expensive
- analyzing whether the things they add or modify are kept or changed by subsequent mappers. The measure is the number of days an edit remains unchanged: over 50 is good, over 100 is better, over 200 is really good, and an edit made more than 400 days ago that is still unchanged is fantastic. Changes made more than 500 days after the edit are discarded from the score, because they most probably reflect a change in the environment (like the Martijn-example, a demolished building or a rebuilt road) rather than a problem with the data entered.
- This could also be done per field (e.g. this mapper does reliable work on buildings, this mapper is an expert on outdoor routes but does poor work in cities, or is an expert on railways, etc.)
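The survival thresholds above could be turned into a number along these lines. This is only a minimal sketch: the function name `survival_points` and the point values (25, 50, 75, 100) are placeholders invented for illustration, and only the day thresholds come from the text. It also assumes that "discarded" means a change made more than 500 days later is ignored, so the edit is treated as still unchanged.

```python
from typing import Optional

# Day thresholds come from the text; the point values are made-up placeholders.
THRESHOLDS = [(200, 75), (100, 50), (50, 25)]


def survival_points(edit_age_days: int, changed_after_days: Optional[int]) -> int:
    """Score a single edit by how long it survived.

    edit_age_days: days since the mapper made the edit.
    changed_after_days: days until another mapper changed it,
        or None if it is still unchanged.
    """
    # Assumption: a change more than 500 days after the edit is ignored,
    # so the edit counts as still unchanged.
    if changed_after_days is not None and changed_after_days > 500:
        changed_after_days = None

    if changed_after_days is None:
        if edit_age_days > 400:
            return 100  # unchanged for more than 400 days
        survived = edit_age_days
    else:
        survived = changed_after_days

    # Walk the thresholds from highest to lowest and return the first match.
    for min_days, points in THRESHOLDS:
        if survived > min_days:
            return points
    return 0
```

A per-mapper value could then be the average of these points over all of the mapper's edits, although how to combine them with the other criteria is still open.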
If the time since signup is under 400 days, cap the score at 50.
The result of the algorithm is an integer between 0 and 100, where 100 is the most experienced. Test the algorithm on known good mappers like someoneelse, ...
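The cap and the 0 to 100 range could be applied as a final step, whatever formula ends up producing the raw score. A minimal sketch, assuming a hypothetical `raw_score` that has already been computed from the criteria above (how those criteria are weighted is not decided on this page):

```python
def mapper_reliability(raw_score: float, days_since_signup: int) -> int:
    """Turn a raw score into the final integer index between 0 and 100."""
    # Clamp to the 0..100 range and convert to an integer.
    score = max(0, min(100, round(raw_score)))
    # Mappers signed up for less than 400 days cannot score over 50.
    # Exactly 400 days is treated as uncapped here, following the
    # "under 400" wording; that boundary is an assumption.
    if days_since_signup < 400:
        score = min(score, 50)
    return score
```

For example, `mapper_reliability(80, 100)` gives 50 because of the cap, while `mapper_reliability(80, 900)` gives 80.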
Algorithm for QA score on nodes
- add together the reliability indexes of the mappers responsible for each change and divide by the number of changes.
Algorithm for QA score on ways
- add together the score on each node and divide by the number of nodes.
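Both the node score and the way score are plain averages, so they can be sketched together. The function names and the input shape (a list of mapper reliability indexes per node, one entry per change) are assumptions made for this example.

```python
from typing import Sequence


def average(values: Sequence[float]) -> float:
    """Arithmetic mean; an empty input has no defined score."""
    if not values:
        raise ValueError("cannot score an object without any data")
    return sum(values) / len(values)


def node_score(mapper_indexes: Sequence[float]) -> float:
    """Average reliability index over every change made to the node."""
    return average(mapper_indexes)


def way_score(node_scores: Sequence[float]) -> float:
    """Average of the scores of the nodes that make up the way."""
    return average(node_scores)


# A way with two nodes: the first was changed by mappers with indexes
# 80 and 40, the second was only ever touched by a mapper with index 100.
way = [[80, 40], [100]]
print(way_score([node_score(changes) for changes in way]))  # 80.0
```

Note that a way score built this way ignores who edited the way itself (its tags and node order); only the node histories count.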
Algorithm for QA score on relations
- member score: add together the scores of the members and divide by the total number of members.
- integrity score: is the relation geographically correct? Use the keep right database for this.
- total score: calculate a total score from the member score, the tag score and the integrity score. The integrity score weighs 80% and the other two 10% each.
TODO: contact the keep right authors to find out whether an API for their database exists.
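The 80/10/10 weighting can be written down directly. This sketch assumes all three inputs are already on the same 0 to 100 scale; how the integrity score would actually be derived from keep right data is not specified here, so it is simply passed in.

```python
def relation_score(member_score: float, tag_score: float,
                   integrity_score: float) -> float:
    """Weighted total: integrity 80%, member score 10%, tag score 10%."""
    # Integer weights divided once at the end avoid rounding noise.
    return (80 * integrity_score + 10 * member_score + 10 * tag_score) / 100


# Members average 50, tags are complete (100), integrity is 90.
print(relation_score(member_score=50, tag_score=100, integrity_score=90))  # 87.0
```

With this weighting a relation with perfect members and tags but an integrity score of 0 can never score above 20.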
Calculating a tag score
- Does the relation have a type?
- Does the type (e.g. water=lake) usually have a name, and does the relation have one?
- add more, inspired by the keep right rules
Relation types that usually have a name
water=lake amenity=* ...
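The two tag checks listed so far could be sketched as follows. Several things here are assumptions: that "has a type" means the relation carries a `type` key, that each applicable check counts equally, and that the score is the percentage of checks passed. The list of usually-named types contains only the two entries given above and would need to be extended.

```python
from typing import Dict

# Taken from the list above; "*" matches any value for that key.
USUALLY_NAMED = [("water", "lake"), ("amenity", "*")]


def usually_named(tags: Dict[str, str]) -> bool:
    """True if any tag matches a type that usually has a name."""
    return any(
        key in tags and value in ("*", tags[key])
        for key, value in USUALLY_NAMED
    )


def tag_score(tags: Dict[str, str]) -> float:
    """Percentage of applicable tag checks that the relation passes."""
    checks = ["type" in tags]  # check 1: does the relation have a type?
    if usually_named(tags):
        checks.append("name" in tags)  # check 2: expected name is present
    return 100 * sum(checks) / len(checks)
```

For example, a relation tagged `type=multipolygon` and `water=lake` without a `name` passes one of two checks and scores 50.0.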