OSMFJ/License/Rebuild Plan

From OpenStreetMap Wiki

The original text is here

Background

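We are working towards the licence change on 1 April 2012. Interested people are tracking progress on the rebuild mailing list. The list is also intended to coordinate the necessary coding, system administration and communication efforts. Participants in this list will be referred to as the "rebuild team". Anyone who is interested may join the team.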

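The "database rebuild" covers the steps needed to convert the contents of our map database (including object history) into a form in which we can be sure that all published data can be licensed under ODbL. The criteria under which CC data will be retained during the rebuild (known as WTFE; translator's note: the antonym of ASAP) are now very well defined apart from a few borderline cases, and many tools exist to show which objects are not ODbL-clean. A suite of tests is the means by which the remaining borderline cases will be identified and addressed.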

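There is consensus within the rebuild team on an approach to the rebuild that modifies the database in place, chiefly by adding new fields that suppress publication to users of those historical versions of objects that are not compatible with the new licence. This approach was originally proposed by Matt Amos, who has coded the body of its implementation. This includes a test harness for stress-testing the various culling criteria for map data.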

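This document was written to describe the detailed tasks required for a successful rebuild, based on the WTFE criteria and Matt's in-place methodology. It also documents the rebuild team's shared understanding of the tasks that must be carried out, in order to encourage both constructive criticism and practical suggestions for moving the work forward.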

Tasks

The aim of this table is to break down the functionality of the rebuild tools as far as possible, so some of the units named below are parts of a larger body of work.

Title | Technology | Lead | Contributors | Status
Rebuild Objects | Ruby on Rails | Matt Amos | - | Expected 23rd Mar
Test harness for rebuild rules | Ruby | Matt Amos | Frederik Ramm, Dermot McNally, Richard Fairhurst | Expected 23rd Mar
API support for redaction | Ruby on Rails | Matt Amos | - | Needed 26-29 Mar
Editor testing | varied | Editor maintainers | - | Needed 30 Mar
Maintaining the "Suspect Object" list | WTFE | Frederik Ramm? | - | Needed 25 Mar
Test run | dev server, new API | Matt Amos | - | Planned 24-25th Mar
Frozen list of exceptional changesets | from wiki | Frederik Ramm | - | Before 25 Mar
Last CC Planet file | as usual | ? | - | N/A
Start of read-only API mode | API server | ? | - | Planned 27 Mar
Production deployment (data) | API server | ? | - | Planned 27-30 Mar
Production deployment (API changes) | API server | ? | - | Planned 27-31 Mar
Start of read-write API mode | API server | ? | - | ASAP 27-30 Mar


Help wanted

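[Draft] List of tasks that need your participation:

  • Sanity-checking the existing test cases and adding to them (explanation)
  • Editor testing against the revised API
  • Setting up an API environment and a test DB for redaction testing
  • Verifying the before and after states of selected objects in the test DB to confirm correct application of the redaction rules.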
  • Using WTFE tools, checking data in your area expected to be treated as clean using the "exceptional changesets" rule. Any still showing as dirty must be flagged to Frederik Ramm and Simon Poole.
  • Installing the API server environment on ramoth (it is hoped to exploit the licence change to migrate onto ramoth and this may also facilitate some of the tests)

Rebuild Objects

This is a Ruby toolset containing object representations of all OSM object types, with methods to migrate them from their CC incarnations to ODbL versions, performing any required edits, deletions and/or redaction of historical versions.

Production impact

Requires: Nothing

Required by: Test suite (needs object interface), Redaction of production DB (needs full implementation)

Test harness for rebuild rules

The most critical and hard to validate aspect of the rebuild is the correct application of the rules. A flawed ruleset could allow non-ODbL-clean data to endure or cause perfectly valid data to be removed. Because of this, prior to the final rebuild of the production database, we wish to develop the rebuild code in a test-driven fashion. The tests of the rebuild rules are therefore broken out into a separate task.

Tests are written in Ruby, but are still quite intelligible to non-Ruby coders. They manipulate the same Ruby Rebuild Objects as the rebuild itself will. Each test defines the edit history of a single OSM object (node, way or relation), calculates the rebuild actions that will be applied by the Rebuild logic and tests whether the resulting actions are those expected.
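As a rough illustration of the shape such a test might take, here is a minimal sketch. The `Version` struct, the `rebuild_actions` method and the rule it applies are hypothetical stand-ins invented for this example; they are not the actual rebuild classes or the actual WTFE criteria.

```ruby
require "set"

# Hypothetical, simplified stand-in for one version of a rebuild object;
# the real rebuild classes are not shown in this document.
Version = Struct.new(:number, :user, :tags)

# Toy rule, assumed for illustration only: redact every version whose
# author has not agreed to the new licence.
def rebuild_actions(history, agreed_users)
  history.reject { |v| agreed_users.include?(v.user) }
         .map { |v| [:redact, v.number] }
end

# Edit history of a single way; "bob" has not agreed.
agreed  = Set["alice", "carol"]
history = [
  Version.new(1, "alice", { "highway" => "residential" }),
  Version.new(2, "bob",   { "highway" => "residential", "name" => "Main St" }),
  Version.new(3, "carol", { "highway" => "residential", "name" => "Main Street" })
]

# The test itself: compare the calculated actions with the expected ones.
expected = [[:redact, 2]]
actual   = rebuild_actions(history, agreed)
raise "unexpected actions: #{actual.inspect}" unless actual == expected
```

The toy rule deliberately ignores the harder question of content that a later, agreeing mapper carries forward from a non-agreeing one. Borderline cases of that kind are what the real test suite is meant to pin down.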

Having a comprehensive suite of tests is currently the single highest priority in the rebuild project. Tests are welcome from all comers - if you cannot provide a test case in Ruby code, please write your test case as best you can in prose or pseudo-code and post it to the rebuild list. The tests can be executed locally with very few prerequisites - no OSM rails port installation is required. Please see the code for more details.

Production impact

Not used for actual rebuild, but test suite must be stable and complete before we can safely commence actual redaction.

Requires: Nothing

Required for: Redaction in production

API support for redaction

A post-rebuild database will contain, at least initially, a mixture of ODbL-clean content and non-clean content marked as "hidden". This will require that any API operations that access historic versions of objects change their behaviour to correctly suppress redacted data.

Production impact

High impact. The updated API will support suppression of redacted changes as indicated in the revised schema. As such, the updated code will depend on the necessary DB schema migration. The revised API code can be safely deployed prior to actual redaction and put live in a single step, although the database changes involved may incur some downtime.

Requires: Knowledge of final DB schema changes and representation of redacted objects.

Required by: Redaction of production data (or, if the existing API code will safely ignore redactions, can wait until ODbL declaration)

Editor testing

Since API changes are to be made, the most important OSM editors should be tested for non-breakage after the API code is deemed stable and before it is deployed to production. Any issues ought to be confined to functionality that interacts with object history, with revert plugins and undelete support particularly at risk.

The API changes are being developed in such a way that no change in editor behaviour should be required. API calls dealing with historical versions will return data in exactly the same format as before, but with troublesome content obscured and replaced with generic placeholders. Similarly, no new API version will be declared unless a compelling reason to do so can be identified.
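The idea of keeping the response format unchanged while obscuring redacted content could be sketched as follows. The hash layout, the `:hidden` flag and the placeholder value are assumptions made for this illustration. The final schema and the real representation of redacted objects are not defined in this document.

```ruby
# Hypothetical placeholder; the real API may use something different.
REDACTED_TAGS = { "redacted" => "yes" }.freeze

# Return a version in the same shape as before, replacing the content of
# redacted (hidden) versions with the generic placeholder.
def public_view(version)
  return version unless version[:hidden]
  version.merge(tags: REDACTED_TAGS)
end

# Assumed in-memory form of an object's history.
history = [
  { id: 1, version: 1, tags: { "name" => "Old name" }, hidden: true },
  { id: 1, version: 2, tags: { "name" => "New name" }, hidden: false }
]

visible = history.map { |v| public_view(v) }
```

Because every returned record has the same keys as before, a client that only reads the existing fields should not need to change, which is the property the editor tests are meant to confirm.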

Production impact

Independent of most of the process, but has to be right once the new API code is live.

Requires: Revised API deployed to a test instance

Required by: Deployment of revised API to production

Maintaining the "Suspect Object" list

Processing each OSM object in place will consume time and resources. However, the vast majority of objects in the database are known to be clean. The rebuild process will leave such a clean object untouched, which allows an optimisation. Instead of processing every object, knowing that most will require no action, we intend to process only those objects that are deemed "suspect", that is, those having at least one non-agreeing mapper in their history.

It is hoped that the suspect objects list can be derived from existing WTFE logic, though it should take a more conservative view than WTFE. Only objects with agreeing mappers throughout their history should be excluded from the list.
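The conservative selection described above can be sketched in a few lines. The input format (object identifier mapped to the list of users who edited it) and the method name `suspect_objects` are assumptions for illustration; the real list is expected to come from WTFE logic rather than from code like this.

```ruby
require "set"

# Keep every object whose history contains at least one mapper who has
# not agreed. Only objects edited exclusively by agreeing mappers are
# excluded from the result.
def suspect_objects(histories, agreed_users)
  histories.reject { |_id, users| users.all? { |u| agreed_users.include?(u) } }
           .keys
end

# Hypothetical input: object identifier => users in its edit history.
histories = {
  "node/1"     => ["alice"],
  "way/2"      => ["alice", "bob"],
  "relation/3" => ["bob"]
}

suspects = suspect_objects(histories, Set["alice"])
```

In this example only `node/1` is excluded, because it is the only object whose entire history comes from agreeing mappers.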

Production impact

Requires: Source dataset from which to extract

Required by: Redaction of production DB

Test run

Once the test harness is considered comprehensive enough to warrant it, the rebuild code can be deployed to a test instance of the API database, currently most likely to be hosted on the dev server. This can be seeded with a subset of the OSM database in an interesting area. In-place conversion can then be run against some or all of the test database, with the resulting "cleaned" data examined to test that the logic has been applied as expected.

Production impact

This is a control gate before the production DB is touched

Requires: Completed test suite

Required by: Redaction of production data

Frozen list of exceptional changesets

An exceptional changeset is one of the following:

  • One that will be considered ODbL-clean although the mapper has not agreed to the licence change (for use in cases where there are grounds for overruling the mapper's normal preference, often with the specific consent of the mapper).
  • One that will not be considered ODbL-clean even though the mapper has agreed to the licence change (for use in cases where it is known that the changesets contain non-ODbL-safe data).
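The two kinds of exception above amount to overrides that take precedence over the mapper's own agreement status. A minimal sketch, assuming the frozen list is available as two sets of changeset IDs (the names `force_clean` and `force_dirty` are invented for this example):

```ruby
require "set"

# Decide whether a changeset counts as ODbL-clean. The exceptional
# lists are consulted first; otherwise the mapper's agreement decides.
def changeset_clean?(changeset_id, user, agreed_users:, force_clean:, force_dirty:)
  return true  if force_clean.include?(changeset_id)
  return false if force_dirty.include?(changeset_id)
  agreed_users.include?(user)
end

# Hypothetical data for illustration.
rules = {
  agreed_users: Set["alice"],
  force_clean:  Set[1001], # treated as clean although the mapper has not agreed
  force_dirty:  Set[2002]  # treated as not clean although the mapper has agreed
}

changeset_clean?(1001, "bob", **rules)   # clean by exception
changeset_clean?(2002, "alice", **rules) # not clean by exception
```

Object-level exceptions, such as those discussed for Poland, would need a further lookup keyed on the object rather than the changeset; this sketch does not cover them.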

More information

New information: In the specific case of Poland it seems that we may be receiving details of ODbL-clean data at object level (sub-changeset) as a consequence of the way data imports from UMP data are being relicensed at the granularity of individual UMP contributors. If we are to support this, and the benefit is significant, the exceptional changeset support will need to be extended to cover this case. It may be appropriate to split this to a separate task.

Production impact

A gate prior to production redaction

Requires: Final decisions by community on exceptional changesets and (for Poland) single objects

Required by: Redaction of production DB

Last CC Planet file

Prior to any automated data removal, with the actual date dependent on the expected running time of the redaction process, the last CC Planet File will be generated. This will be made available for download, possibly shortly after the actual rebuild has taken place.

Production impact

None, as daily planets are generated anyway. LWG will declare the latest "useful" planet file to be the last CC planet.

Read-only phase

The chosen in-place modification of the DB allows, in theory, for redaction to take place against a running database. Similarly, it is expected that both the existing and the updated API code will behave gracefully with an updated database, other than the fact that the existing code will be unable to filter non-ODbL-clean data. This allows the flexibility to redact the database before deploying the API updates as long as the data set is not declared to be under ODbL until the API changes are made.

However, for reasons of speed, it is proposed to disable API writes during the redaction process. Again, to boost speed, redaction itself is also likely to occur using a private interface to the database rather than going through the API.

The API will be held read-only for the duration of the redaction process. It is expected (though not required) that the updated API code will have been deployed by the time read-write mode is reinstated.

Production deployment (data)

Once the tools are complete and deemed to function correctly and stably, they can be deployed to the production API server and the required DB migrations performed.

Once the code is deployed, it is possible to commence redaction on all objects not known to be already clean.

Production deployment (tile server)

Once the database contains ODbL-clean data, we will wish to switch attribution of the tiles we serve (Mapnik layer), requiring in turn a reimportation of rendering data and flushing of tiles, in addition to a new coastline run. Downstream users of our tiles and others involved in attribution of Mapnik tiles (Openlayers devs...) must also be informed.

Risks

The rebuild process will touch objects in the OSM production database, some of them in such a way that data will be removed (in accordance with the tested criteria). This section considers the scope for error and the options to recover from any such errors.

Incorrect data criteria applied

This can happen in one of two directions - the deletion of clean data or the failure to delete problem data. Since the methodology will not destructively edit any existing versions of an object (all changes applying instead to the current version), any object may be reprocessed if such an error is identified, if required using improved selection criteria or perhaps on the basis of a changed decision for exceptional treatment of a changeset or single object.

This approach does have the weakness that conflicts (similar to normal edit conflicts) could arise if such flawed redaction is noticed after a large passage of time. For that reason, vigilance in the early stages is urged, including spot checks during the read-only phase.

Redacting DB proves very slow

This would prolong the read-only phase. No actual data would be damaged, but the impact on mappers would be unfortunate. More on this after the tests yield some benchmarks.

Changesets (or objects) requiring exceptional handling are discovered late

Every effort should be expended to avoid this. It will be possible, though inconvenient, to reprocess objects later discovered not to have received exceptional handling when they should have. In the case of smaller data sets you can expect to do the resolution work yourself if the administrative burden is not warranted.

For larger data sets, reprocessing may be considered, but this will likely require either additional downtime or the extension of the tools to support live redaction. In addition, the comments above about a risk of edit conflicts will also apply.

There are no promises that this remedy will ever be considered, so proceed on the assumption that you have one chance only to get your exceptional handling list right first time round.