|Rebuild objects||Ruby on Rails||Matt Amos||-||Expected 23rd Mar|
|Test harness for rebuild rules||Ruby||Matt Amos||Frederik Ramm, Dermot McNally, Richard Fairhurst||Expected 23rd Mar|
|API supporting redaction||Ruby on Rails||Matt Amos||-||Needed 26-29 Mar|
|Editor testing||varied||Editor maintainers||-||Needed 30 Mar|
|Maintenance of "suspect objects" list||WTFE||Frederik Ramm?||-||Needed 25 Mar|
|Test run||dev server, new API||Matt Amos||-||Planned 24-25th Mar|
|Frozen list of exceptional changesets||from wiki||Frederik Ramm||-||Before 25 Mar|
|Final CC Planet file||as usual||?||-||N/A|
|Start of read-only API mode||API server||?||-||Planned 27 Mar|
|Go-live (data)||API server||?||-||Planned 27-30 Mar|
|Go-live (API changes)||API server||?||-||Planned 27-31 Mar|
|Start of read-write API mode||API server||?||-||ASAP 27-30 Mar|
- Use the WTFE tools to check that data in your area which you expect to be treated as clean under the "exceptional changesets" rule is in fact shown as clean. Anything still showing as dirty must be flagged to Frederik Ramm and Simon Poole.
- Install the API server environment on ramoth (it is hoped to exploit the licence change to migrate onto ramoth, and this may also facilitate some of the tests).
This is a Ruby toolset containing object representations of all OSM object types, with methods to migrate them from their CC incarnations to ODbL versions, performing any required edits, deletions and/or redaction of historical versions.
Required by: Test suite (needs object interface), Redaction of production DB (needs full implementation)
The most critical and hard to validate aspect of the rebuild is the correct application of the rules. A flawed ruleset could allow non-ODbL-clean data to endure or cause perfectly valid data to be removed. Because of this, prior to the final rebuild of the production database, we wish to develop the rebuild code in a test-driven fashion. The tests of the rebuild rules are therefore broken out into a separate task.
Tests are written in Ruby, but are still quite intelligible to non-Ruby coders. They manipulate the same Ruby Rebuild Objects as the rebuild itself will. Each test defines the edit history of a single OSM object (node, way or relation), calculates the rebuild actions that will be applied by the Rebuild logic and tests whether the resulting actions are those expected.
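The shape of such a test can be sketched as follows. This is a minimal illustration only: `Version`, `rebuild_actions` and the deliberately naive rule inside it are hypothetical stand-ins, not the real Rebuild Objects or the real ruleset, which is considerably more involved.

```ruby
# Hypothetical stand-in for one version in an object's edit history.
Version = Struct.new(:number, :user, :tags)

# Toy rule (an assumption for illustration): hide every version written
# by a mapper who has not agreed, and take no action on anything else.
def rebuild_actions(history, agreed_users)
  history.reject { |v| agreed_users.include?(v.user) }
         .map { |v| [:hide, v.number] }
end

# Edit history of a single way: alice has agreed, bob has not.
history = [
  Version.new(1, "alice", { "highway" => "residential" }),
  Version.new(2, "bob",   { "highway" => "residential", "name" => "Main St" }),
  Version.new(3, "alice", { "highway" => "residential", "name" => "Main Street" }),
]

# Compare the calculated actions with the expected ones.
actions  = rebuild_actions(history, ["alice"])
expected = [[:hide, 2]]
raise "unexpected actions: #{actions.inspect}" unless actions == expected
```

A real test case would also have to decide what happens to version 3, whose `name` tag builds on bob's edit; pinning down that kind of question is what the test suite is for.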
Having a comprehensive suite of tests is currently the single highest priority in the rebuild project. Tests are welcome from all comers - if you cannot provide a test case in Ruby code, please write your test case as best you can in prose or pseudo-code and post it to the rebuild list. The tests can be executed locally with very few prerequisites - no OSM rails port installation is required. Please see the code for more details.
Not used for the actual rebuild, but the test suite must be stable and complete before actual redaction can safely commence.
Required for: Redaction in production
A post-rebuild database will contain, at least initially, a mixture of ODbL-clean content and non-clean content marked as "hidden". This will require that any API operations that access historic versions of objects change their behaviour to correctly suppress redacted data.
High impact. The updated API will support suppression of redacted changes as indicated in the revised schema. As such, the updated code will depend on the necessary DB schema migration. The revised API code can be safely deployed prior to actual redaction and put live in a single step, although the database changes involved may incur some downtime.
Requires: Knowledge of final DB schema changes and representation of redacted objects.
Required by: Redaction of production data (or, if the existing API code will safely ignore redactions, can wait until ODbL declaration)
Since API changes are to be made, the most important OSM editors should be tested for non-breakage after the API code is deemed stable and before it is deployed to production. Any issues ought to be confined to functionality that interacts with object history, with revert plugins and undelete support particularly at risk.
The API changes are being developed in such a way that no change in editor behaviour should be required. API calls dealing with historical versions will return data in exactly the same format as before, but with troublesome content obscured and replaced with generic placeholders. Similarly, no new API version will be declared unless a compelling reason to do so can be identified.
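The "same format, placeholder content" idea might look roughly like this. It is a sketch under assumptions: the `:redacted` flag, the field names and the placeholder value are all hypothetical, since the final schema and the representation of redacted objects are not specified here.

```ruby
# Assumed placeholder value; the real one is not defined in this plan.
PLACEHOLDER_USER = "redacted"

# Return a historical version unchanged if it is clean. Otherwise return
# a copy with the same keys, with the content swapped for placeholders.
def present_version(version)
  return version unless version[:redacted]
  version.merge(user: PLACEHOLDER_USER, tags: {})
end

clean  = { id: 1, version: 1, user: "alice", tags: { "name" => "A" }, redacted: false }
hidden = { id: 1, version: 2, user: "bob",   tags: { "name" => "B" }, redacted: true }

shown = present_version(hidden)
```

Because the set of keys is unchanged, a client that only parses the structure should not notice any difference in format.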
Independent of most of the process, but has to be right once the new API code is live.
Requires: Revised API deployed to a test instance
Required by: Deployment of revised API to production
Maintenance of the "Suspect Object" list
Processing each OSM object in place will consume time and resources. However, the vast majority of objects in the database are known to be clean, and the rebuild process will leave such objects untouched. This allows an optimisation: instead of processing every object, knowing that most will require no action, we intend to process only those objects that are deemed "suspect", that is, those having at least one non-agreeing mapper in their history.
It is hoped that the suspect objects list can be derived from existing WTFE logic, though it should take a more conservative view than WTFE. Only objects with agreeing mappers throughout their history should be excluded from the list.
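The conservative selection described above can be sketched as a simple filter. The input shape (object id mapped to the list of users who edited it) and the name `suspect_ids` are assumptions for illustration; how WTFE actually exposes this information is not covered here.

```ruby
require "set"

# An object is excluded only if every mapper in its history has agreed;
# anything else is treated as suspect.
def suspect_ids(histories, agreed_users)
  histories.reject { |_id, editors| editors.all? { |u| agreed_users.include?(u) } }
           .keys
end

histories = {
  101 => ["alice"],           # clean throughout
  102 => ["alice", "bob"],    # one non-agreeing mapper
  103 => ["bob"],             # only non-agreeing mappers
}
agreed = Set["alice"]

suspects = suspect_ids(histories, agreed)
```

With this input only object 101 is skipped; 102 and 103 would be passed to the rebuild.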
Requires: Source dataset from which to extract
Required by: Redaction of production DB
Once the test harness is considered comprehensive enough to warrant it, the rebuild code can be deployed to a test instance of the API database, currently most likely to be hosted on the dev server. This can be seeded with a subset of the OSM database in an interesting area. In-place conversion can then be run against some or all of the test database, with the resulting "cleaned" data examined to test that the logic has been applied as expected.
This is a control gate before the production DB is touched
Requires: Completed test suite
Required by: Redaction of production data
An exceptional changeset is one of the following:
- One that will be considered ODbL-clean although the mapper has not agreed to the licence change (for use in cases where there are grounds for overruling the mapper's normal preference, often with the specific consent of the mapper).
- One that will not be considered ODbL-clean even though the mapper has agreed to the licence change (for use in cases where it is known that the changesets contain non-ODbL-safe data).
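The two kinds of exception amount to overrides on the mapper's agreement status. A minimal sketch, assuming the frozen list is available as two sets of changeset ids (`force_clean` and `force_dirty` are hypothetical names, and the precedence when an id appears in both is an arbitrary choice made here):

```ruby
require "set"

# Decide whether a changeset counts as ODbL-clean.
# Exceptions take priority over the mapper's agreement status.
def changeset_clean?(changeset_id, mapper_agreed, force_clean, force_dirty)
  return false if force_dirty.include?(changeset_id) # known non-ODbL-safe data
  return true  if force_clean.include?(changeset_id) # overruled in favour of clean
  mapper_agreed
end

force_clean = Set[5001]
force_dirty = Set[7002]

changeset_clean?(5001, false, force_clean, force_dirty) # exception: treated as clean
changeset_clean?(7002, true,  force_clean, force_dirty) # exception: treated as dirty
changeset_clean?(9000, true,  force_clean, force_dirty) # no exception: follows the mapper
```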
New information: In the specific case of Poland it seems that we may be receiving details of ODbL-clean data at object level (sub-changeset) as a consequence of the way data imports from UMP data are being relicensed at the granularity of individual UMP contributors. If we are to support this, and the benefit is significant, the exceptional changeset support will need to be extended to cover this case. It may be appropriate to split this to a separate task.
A gate prior to production redaction
Requires: Final decisions by community on exceptional changesets and (for Poland) single objects
Required by: Redaction of production DB
Last CC Planet file
Prior to any automated data removal, with the actual date dependent on the expected running time of the redaction process, the last CC Planet File will be generated. This will be made available for download, possibly shortly after the actual rebuild has taken place.
None, as daily planets are generated anyway. LWG will declare the latest "useful" planet file to be the last CC planet.
The chosen in-place modification of the DB allows, in theory, for redaction to take place against a running database. Similarly, it is expected that both the existing and the updated API code will behave gracefully with an updated database, other than the fact that the existing code will be unable to filter non-ODbL-clean data. This allows the flexibility to redact the database before deploying the API updates as long as the data set is not declared to be under ODbL until the API changes are made.
However, for reasons of speed, it is proposed to disable API writes during the redaction process. Again, to boost speed, redaction itself is also likely to occur using a private interface to the database rather than going through the API.
The API will be held read-only for the duration of the redaction process. It is expected (though not required) that the updated API code will have been deployed by the time read-write mode is reinstated.
Once the tools are complete and deemed to function correctly and stably, they can be deployed to the production API server and the required DB migrations performed.
Once the code is deployed, it is possible to commence redaction on all objects not known to be already clean.
Once the database contains ODbL-clean data, we will wish to switch the attribution of the tiles we serve (Mapnik layer). This in turn requires a reimport of rendering data and a flush of tiles, in addition to a new coastline run. Downstream users of our tiles and others involved in the attribution of Mapnik tiles (OpenLayers developers, for example) must also be informed.
The rebuild process will touch objects in the OSM production database, some of them in such a way that data will be removed (in accordance with the tested criteria). This section considers the scope for error and the options to recover from any such errors.
Incorrect data criteria applied
This can happen in one of two directions - the deletion of clean data or the failure to delete problem data. Since the methodology will not destructively edit any existing versions of an object (all changes applying instead to the current version), any object may be reprocessed if such an error is identified, if required using improved selection criteria or perhaps on the basis of a changed decision for exceptional treatment of a changeset or single object.
This approach does have the weakness that conflicts (similar to normal edit conflicts) could arise if such flawed redaction is noticed after a large passage of time. For that reason, vigilance in the early stages is urged, including spot checks during the read-only phase.
Redacting DB proves very slow
This would prolong the read-only phase. No actual data would be damaged, but the impact on mappers would be unfortunate. More on this after the tests yield some benchmarks.
Changesets (or objects) requiring exceptional handling are discovered late
Every effort should be expended to avoid this. It will be possible, though inconvenient, to reprocess objects later discovered not to have received exceptional handling when they should have. In the case of smaller data sets, you can expect to do the resolution work yourself if the administrative burden is not warranted.
For larger data sets, reprocessing may be considered, but this will likely require either additional downtime or the extension of the tools to support live redaction. In addition, the comments above about a risk of edit conflicts will also apply.
There are no promises that this remedy will ever be considered, so proceed on the assumption that you have one chance only to get your exceptional handling list right first time round.