The goal is to import the opening hours for all French post offices from the datanova.laposte.fr data source.
Only post offices without opening hours information will be modified; any existing information will be preserved. This is probably the right thing to do, even though manually entered information might be much older than the up-to-date data from laposte.fr itself.
The data source provides a CSV file with all opening hours for the next 3 months (day by day) for all post offices in France. This means it has more than 1.6 million lines.
Example (there are 90 lines about that same post office, one per day):
(The three "11:45" values are the latest times for dropping off letters, parcels, etc. This might be useful in a future import, but for now it is ignored.)
What's great is that the first column is the ID of the post office which is already used in OSM to identify post offices, in the ref:FR:LaPoste=* attribute.
What's not great is that it contains hours for every single date, not rules like "every Saturday, from 09:15 to 12:00". See below.
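As a rough illustration of the first processing step, the sketch below groups the CSV rows by their first column, the post office ID. The function name, the `;` delimiter and the presence of a header row are assumptions made for this example; only the fact that the ID is in the first column comes from the description above.

```python
import csv
from collections import defaultdict


def group_rows_by_office(lines, delimiter=";", has_header=True):
    """Group CSV rows by their first column (the post office ID).

    `lines` is any iterable of text lines, e.g. an open file.
    The delimiter and header handling are assumptions, not the documented format.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    if has_header:
        next(reader, None)  # skip the column names
    rows_by_office = defaultdict(list)
    for row in reader:
        if row:  # ignore blank lines
            rows_by_office[row[0]].append(row[1:])
    return rows_by_office
```

Reading the file line by line this way keeps one list of daily rows per post office, which is the shape needed to look for weekly patterns afterwards.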
ODbL Compliance verified: YES
The datanova data is ODbL licensed.
The concept is to run a set of scripts that create opening hours rules from the above, set them on the corresponding OSM objects, prepare changesets in chunks of 100 (geographically close together), and upload them.
- Only set the opening hours on post offices that don't have any
- ... or which still have the hours set by the script in the past
- See Talk:Import/FrenchPostOfficeOpeningHours for discussion about what to do about existing entries that don't agree with the datanova data. For now they are kept untouched.
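The selection rules above can be sketched as a small decision function. This is only an illustration of the policy as described here, not the code from the repository; the function name, its parameters and the returned labels are hypothetical.

```python
from typing import Optional


def decide_action(osm_hours: Optional[str],
                  last_imported: Optional[str],
                  datanova_hours: str) -> str:
    """Decide what to do with one post office.

    osm_hours      -- current opening_hours value in OSM (None or '' if absent)
    last_imported  -- value this import wrote previously (None if never imported)
    datanova_hours -- rule computed from the datanova data
    """
    if not osm_hours:
        return "set"        # no opening hours yet: add them
    if osm_hours == datanova_hours:
        return "unchanged"  # nothing to do
    if last_imported is not None and osm_hours == last_imported:
        return "update"     # still the value set by the script: refresh it
    return "skip"           # edited by someone else and disagrees: keep untouched
```

Comparing against the previously imported value is what makes it possible to refresh the script's own edits without overwriting manual ones.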
See the README on github for the full details about the algorithms.
Given that the CSV file contains opening hours for every single day, a script is needed to identify patterns and reverse-engineer the rules from the data.
For instance, based on the 95 lines that are about the 15830A post office, the script outputs this one line:
15830A|AVIGNON REPUBLIQUE|Mo,We-Fr 08:30-12:30,13:30-18:00;Tu 08:30-12:30,14:00-18:00;Sa 09:15-12:00;PH off
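To give an idea of how such a rule can be derived, here is a deliberately simplified sketch: for each weekday it keeps the most frequent hours seen in the daily data, then merges weekdays that share the same hours. It is not the algorithm described in the README, it ignores public holidays (so it never emits `PH off`), and the function names and the input format (a mapping from date to an hours string, empty when closed) are assumptions.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta

DAYS = ["Mo", "Tu", "We", "Th", "Fr", "Sa", "Su"]


def format_days(indexes):
    """Collapse sorted weekday indexes into OSM syntax: [0, 2, 3, 4] gives 'Mo,We-Fr'."""
    parts = []
    start = prev = indexes[0]
    for i in indexes[1:] + [None]:  # None flushes the last run
        if i is not None and i == prev + 1:
            prev = i
            continue
        parts.append(DAYS[start] if start == prev else f"{DAYS[start]}-{DAYS[prev]}")
        start = prev = i
    return ",".join(parts)


def infer_rules(daily_hours):
    """daily_hours maps datetime.date to an hours string ('' when closed)."""
    per_weekday = defaultdict(Counter)
    for day, hours in daily_hours.items():
        per_weekday[day.weekday()][hours] += 1

    by_hours = defaultdict(list)
    for weekday in sorted(per_weekday):
        hours, _count = per_weekday[weekday].most_common(1)[0]
        if hours:  # weekdays that are usually closed are simply omitted
            by_hours[hours].append(weekday)

    rules = sorted(by_hours.items(), key=lambda item: item[1][0])
    return ";".join(f"{format_days(days)} {hours}" for hours, days in rules)


# Made-up sample: four identical weeks plus a one-off closure on a Wednesday.
weekly = {0: "08:30-12:30,13:30-18:00", 1: "08:30-12:30,14:00-18:00",
          2: "08:30-12:30,13:30-18:00", 3: "08:30-12:30,13:30-18:00",
          4: "08:30-12:30,13:30-18:00", 5: "09:15-12:00", 6: ""}
monday = date(2020, 11, 2)
sample = {monday + timedelta(days=i): weekly[i % 7] for i in range(28)}
sample[date(2020, 11, 11)] = ""
print(infer_rules(sample))
```

Taking the majority value per weekday means a single exceptional closure does not change the resulting rule; a real implementation also has to decide when the data is too irregular to summarise, which is presumably what the "unresolved rules" count refers to.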
Current status (statistics)
datanova: 17364 post offices: 17044 with resolved rules, 320 with unresolved rules.
OSM data: 12620 post offices with ref:FR:LaPoste
10031 opening hours set by this import. 1056 disagreements (skipped). 61 not in datanova (wrong ref?), 230 not ready (unresolved rules)
Log / Schedule
- 2020-10-30: Dfaure started this import idea -- with thanks to JL Zimmerman for the information about the data source.
- 2020-10-31: Dfaure wrote the initial script to parse opening hours and convert to OSM syntax.
- 2020-11-07: Dfaure created this wiki page and started the discussion on the talk-fr list.
- 2020-11-11: github repo created, JOSM upload procedure tested on a single modification
- 2020-11-12: Entries added to Contributors#France, Import/Catalogue. More discussions on talk-fr, more work on the scripts.
- 2020-11-13: davidfaure_bot OSM user created, and Automated edits/davidfaure bot.
- 2020-11-21: Implemented automated upload and update mechanism with local storage for future comparison.
- 2020-11-28: More tags in changesets (including an upstream contribution to osm-bulk-upload), imports for PH=off, then the rest of the imports. Notified talk-fr with a small report
- Add support for collection times (present in the CSV files)
- Heure_limite_dépôt_Courrier => collection_times
- Heure_limite_dépôt_Chrono => collection_times:chronopost (non-standard, needs to be added to the documentation)
- Heure_limite_dépôt_Colis => collection_times:parcel (non-standard, needs to be added to the documentation)
Data Import Workflow
See https://github.com/dfaure/DataNovaImportScripts for the details.
Each changeset will contain:
- comment=Import des opening_hours sur les bureaux de poste n'en ayant pas
- source=datanova.laposte.fr, 2020-11-21 (day of import)
- created_by=DataNovaImportScripts 1.0.0
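For illustration, these tags could be assembled as follows. The function name and its parameters are hypothetical; the tag keys and values are the ones listed above, with the date filled in from the day of the import.

```python
from datetime import date


def changeset_tags(import_day: date, version: str = "1.0.0") -> dict:
    """Build the tag set attached to every changeset of the import."""
    return {
        "comment": "Import des opening_hours sur les bureaux de poste n'en ayant pas",
        "source": f"datanova.laposte.fr, {import_day.isoformat()}",
        "created_by": f"DataNovaImportScripts {version}",
    }
```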
Every changeset groups updates for a number of geographically close post offices that need an update. Until October 2021, the script grouped all changes within a 50000 km² area. Following feedback from other contributors, this was reduced to 5000 km² (the area calculation is very approximate).
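One simple way to obtain groups of roughly that size is to bucket post offices into square grid cells of about 5000 km². The sketch below does this with a crude degrees-to-kilometres conversion; it is an assumption about how such grouping could work, not the method used by the scripts, and the function name and tuple layout are made up for the example.

```python
import math
from collections import defaultdict

KM_PER_DEGREE = 111.32  # approximate length of one degree of latitude


def group_by_area(offices, max_area_km2=5000.0):
    """Bucket (ref, lat, lon) tuples into square cells of roughly max_area_km2.

    The projection is very approximate, which is acceptable for grouping purposes.
    """
    side_km = math.sqrt(max_area_km2)
    groups = defaultdict(list)
    for ref, lat, lon in offices:
        y_km = lat * KM_PER_DEGREE
        x_km = lon * KM_PER_DEGREE * math.cos(math.radians(lat))
        cell = (math.floor(x_km / side_km), math.floor(y_km / side_km))
        groups[cell].append(ref)
    return list(groups.values())
```

A fixed grid can split two neighbouring post offices that happen to sit on either side of a cell boundary, so it only guarantees an upper bound on the extent of each changeset, not an optimal clustering.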
Since the data source is automatically updated every day, and covers the next 3 months, the ultimate goal would be a fully automated import every week (to detect changes in opening_hours, and new post offices).
For now, however, I am running the update manually about once a month.