Import/FrenchPostOfficeOpeningHours

From OpenStreetMap Wiki

Goals

The goal is to import the opening hours for all French post offices from the datanova.laposte.fr data source.

Only post offices without opening hours information will be modified; any existing information will be preserved. This is probably the right choice, even though manually entered information may be much older than the up-to-date data published by La Poste itself.

Import Data

Data Description

The data source provides a CSV file with all opening hours for the next 3 months (day by day) for all post offices in France. This means it has more than 1.6 million lines.

Example (there are 90 lines about that same post office, one per day):

15830A;AVIGNON REPUBLIQUE;2021-01-23;09:15-12:00;;;;;;;11:45;11:45;11:45;;09:00-12:00;;;;;;;

(The three "11:45" values are the latest drop-off times for letters, parcels, etc. They may be useful in a future import, but for now they are ignored.)

Conveniently, the first column is the post office ID, which OSM already uses to identify post offices in the ref:FR:LaPoste=* tag.

Less conveniently, the file lists hours for every single date rather than rules such as "every Saturday, from 09:15 to 12:00". See Data Preparation below.
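Before any rules can be derived, the daily rows have to be grouped per post office. The sketch below is a minimal illustration, not the project's script: the function name is hypothetical, and the column layout (ID, name, date, then opening intervals in columns 3 to 9, before the first drop-off time) is an assumption inferred from the single example row above.

```python
import csv
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple

def group_by_office(lines: Iterable[str]) -> Tuple[Dict[str, str], Dict[str, List[Tuple[date, str]]]]:
    """Group the per-day CSV rows by post office ID (ref:FR:LaPoste).

    Assumed layout (inferred from the example row): id;name;date;
    then opening intervals in columns 3-9, empty when unused.
    """
    names: Dict[str, str] = {}
    days: Dict[str, List[Tuple[date, str]]] = defaultdict(list)
    for row in csv.reader(lines, delimiter=";"):
        if len(row) < 10:
            continue  # blank or truncated line
        try:
            day = date.fromisoformat(row[2])
        except ValueError:
            continue  # e.g. a header row
        ref, name = row[0], row[1]
        intervals = [cell for cell in row[3:10] if cell]
        names[ref] = name
        # An empty string means "closed that day".
        days[ref].append((day, ",".join(intervals)))
    return names, days
```

For the AVIGNON REPUBLIQUE row, this yields the pair (2021-01-23, "09:15-12:00") under the key 15830A; the drop-off times are ignored, as described above.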

License

ODbL Compliance verified: YES
The datanova data is ODbL licensed.

Import Type

The plan is to run a set of scripts that derive opening hours rules from the data above, apply them to the corresponding OSM objects, prepare changesets in chunks of 100 geographically close objects, and upload them.
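How the scripts actually group objects geographically is not described on this page. One possible approach, shown here only as an assumption, is to sort objects along a Z-order (Morton) curve so that nearby coordinates get nearby keys, then cut the sorted list into chunks of 100. The function names and the (lat, lon) input format are hypothetical.

```python
from typing import Dict, List, Tuple

def morton_key(lat: float, lon: float, bits: int = 16) -> int:
    """Interleave quantized lon/lat bits (Z-order) so nearby points get nearby keys."""
    x = int((lon + 180.0) / 360.0 * ((1 << bits) - 1))
    y = int((lat + 90.0) / 180.0 * ((1 << bits) - 1))
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # even bits: longitude
        key |= ((y >> i) & 1) << (2 * i + 1)  # odd bits: latitude
    return key

def geographic_chunks(objects: Dict[str, Tuple[float, float]], size: int = 100) -> List[List[str]]:
    """Split object IDs (mapped to (lat, lon)) into chunks of at most `size`, ordered along the Z-order curve."""
    ordered = sorted(objects, key=lambda oid: morton_key(*objects[oid]))
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]
```

A Z-order curve keeps most neighbours together but occasionally splits a cluster across a chunk boundary; for grouping upload batches, that trade-off is usually acceptable.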

Import Strategies

  • Only set the opening hours on post offices that don't have any
  • ... or that still have the hours set by the script in a previous run
  • See Talk:Import/FrenchPostOfficeOpeningHours for the discussion of what to do with existing entries that disagree with the datanova data. For now they are left untouched.
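These strategies amount to a per-object decision whose outcomes match the categories in the statistics below. The following is a hypothetical sketch, not the script's actual logic: it assumes the value the bot uploaded in a previous run is kept in local storage (passed here as `bot_hours`) and that "only missing 'PH off'" means the datanova rule equals the OSM value plus a trailing `;PH off`.

```python
from typing import Optional

def classify(osm_hours: Optional[str], bot_hours: Optional[str],
             datanova_hours: Optional[str], resolved: bool = True) -> str:
    """Decide what to do with one post office (hypothetical category names)."""
    if datanova_hours is None:
        return "not in datanova"        # possibly a wrong ref
    if not resolved:
        return "not ready"              # rules could not be derived
    if not osm_hours:
        return "set"                    # empty in OSM: safe to fill in
    if osm_hours == datanova_hours:
        return "agreement"
    if bot_hours is not None:
        # The bot set this value in an earlier run.
        if osm_hours == bot_hours:
            return "update"             # untouched since then: refresh it
        return "skipped (modified by a human)"
    if osm_hours + ";PH off" == datanova_hours:
        return "add PH off"
    return "disagreement"               # kept untouched for now
```

The order of the checks matters: a value previously set by the bot is refreshed, while any hand-edited value is never overwritten.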

See the README on GitHub for full details about the algorithms.

Data Preparation

Given that the CSV file contains opening hours for every single day, a script is needed to identify patterns and reverse-engineer the rules from the data.

For instance, based on the 95 lines that are about the 15830A post office, the script outputs this one line:

15830A|AVIGNON REPUBLIQUE|Mo,We-Fr 08:30-12:30,13:30-18:00;Tu 08:30-12:30,14:00-18:00;Sa 09:15-12:00;PH off
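The README describes the real algorithm; the sketch below is only a simplified illustration of the idea. It assumes that, for each weekday, the most frequent hours over the period are the regular schedule (so one-off exceptions such as holidays are outvoted), then merges weekdays with identical hours into opening_hours syntax. Public holiday detection (the `PH off` part) is left out, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple

DAYS = ["Mo", "Tu", "We", "Th", "Fr", "Sa", "Su"]

def format_days(weekdays: List[int]) -> str:
    """Compress sorted weekday indexes into opening_hours syntax, e.g. [0, 2, 3, 4] -> 'Mo,We-Fr'."""
    parts: List[str] = []
    start = prev = weekdays[0]
    for wd in weekdays[1:] + [None]:
        if wd is not None and wd == prev + 1:
            prev = wd  # still inside a run of consecutive days
            continue
        if start == prev:
            parts.append(DAYS[start])
        elif prev == start + 1:
            parts.append(f"{DAYS[start]},{DAYS[prev]}")
        else:
            parts.append(f"{DAYS[start]}-{DAYS[prev]}")
        if wd is not None:
            start = prev = wd
    return ",".join(parts)

def weekly_rule(daily: Iterable[Tuple[date, str]]) -> str:
    """Derive a weekly rule from (date, hours) pairs, where '' means closed."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for day, hours in daily:
        counts[day.weekday()][hours] += 1  # Monday == 0
    # The most frequent value per weekday wins, so one-off exceptions are outvoted.
    usual = {wd: c.most_common(1)[0][0] for wd, c in counts.items()}
    groups: Dict[str, List[int]] = {}
    for wd in sorted(usual):
        if usual[wd]:  # skip weekdays that are usually closed
            groups.setdefault(usual[wd], []).append(wd)
    # Dicts keep insertion order, so groups come out ordered by their first weekday.
    return ";".join(f"{format_days(wds)} {hours}" for hours, wds in groups.items())
```

Given 13 weeks of synthetic daily rows following the AVIGNON REPUBLIQUE schedule (with one Monday closed), it returns the rule above without the trailing `;PH off`.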

Current status (statistics)

datanova: 17345 post offices (16832 with resolved rules, 513 with unresolved rules).

OSM data: 11965 post offices with ref:FR:LaPoste

  • 9546 set because empty in OSM
  • 1 to be updated
  • 183 only missing 'PH off'
  • 1057 disagreements (skipped)
  • 4 agreements
  • 0 skipped because modified by a human
  • 799 not in datanova (wrong ref?)
  • 361 not ready (unresolved rules)

9730 objects modified in total (9546 set + 1 updated + 183 'PH off' added)

1538 file(s) created, to upload changes to 9730 object(s)

You can view the diff of the OSM XML here (before it is split into separate changesets).

Log / Schedule

  1. 2020-10-30: Dfaure started this import idea -- with thanks to JL Zimmerman for the information about the data source.
  2. 2020-10-31: Dfaure wrote the initial script to parse opening hours and convert to OSM syntax.
  3. 2020-11-07: Dfaure created this wiki page and started the discussion on the talk-fr list.
  4. 2020-11-11: GitHub repository created; JOSM upload procedure tested on a single modification.
  5. 2020-11-12: Entries added to Contributors#France, Import/Catalogue. More discussions on talk-fr, more work on the scripts.
  6. 2020-11-13: davidfaure_bot OSM user and the Automated edits/davidfaure bot wiki page created.
  7. 2020-11-21: Implemented automated upload and update mechanism with local storage for future comparison.
  8. 2020-11-28: More tags added to changesets (including an upstream contribution to osm-bulk-upload); uploaded the 'PH off' additions first, then the rest of the import. Sent a short report to talk-fr.

Next steps

  • Add support for collection times (present in the CSV files)
    • Heure_limite_dépôt_Courrier => collection_times
    • Heure_limite_dépôt_Chrono => collection_times:chronopost (non standard, needs to be added to the documentation)
    • Heure_limite_dépôt_Colis => collection_times:parcel (non standard, needs to be added to the documentation)
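The planned column-to-key mapping can be kept as a small lookup table. This is a minimal sketch assuming the CSV header uses exactly the column names listed above and that rows are read with `csv.DictReader(f, delimiter=";")`; the function name is hypothetical. Each value is still a per-day time, so it would need the same rule derivation as opening_hours before it becomes a valid collection_times value such as "Mo-Fr 11:45".

```python
from typing import Dict

# Planned mapping from CSV columns to OSM keys (the last two keys are non-standard).
COLLECTION_COLUMNS: Dict[str, str] = {
    "Heure_limite_dépôt_Courrier": "collection_times",
    "Heure_limite_dépôt_Chrono": "collection_times:chronopost",
    "Heure_limite_dépôt_Colis": "collection_times:parcel",
}

def collection_limits(row: Dict[str, str]) -> Dict[str, str]:
    """Map one daily CSV row (e.g. from csv.DictReader) to {osm_key: 'HH:MM'}, skipping empty cells."""
    return {key: row[column] for column, key in COLLECTION_COLUMNS.items() if row.get(column)}
```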

Data Import Workflow

See https://github.com/dfaure/DataNovaImportScripts for the details.

Each changeset will contain

Updates

Since the data source is updated automatically every day and covers the next 3 months, the ultimate goal would be a fully automated weekly import that detects changes in opening_hours as well as new post offices.
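A weekly run would first need to find what changed since the previous run. A minimal sketch, assuming each run saves a {ref: opening_hours} snapshot of the derived rules (the snapshot format and function name are hypothetical):

```python
from typing import Dict, List, Tuple

def diff_snapshots(previous: Dict[str, str],
                   current: Dict[str, str]) -> Tuple[List[str], List[str], List[str]]:
    """Compare two {ref: opening_hours} snapshots from successive runs.

    Returns (new_refs, changed_refs, removed_refs), each sorted.
    """
    new = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    changed = sorted(ref for ref in set(current) & set(previous)
                     if current[ref] != previous[ref])
    return new, changed, removed
```

New refs would be candidates for the "set" path, and changed refs would only be uploaded if the OSM value still matches what the bot set previously.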