Toronto/Import/AddressPoints
Status
- Stage: Proposal — feedback window open
- Last revised: 2026-05-13
- Contact: toronto@comentality.com
- OSM import account: skfd imports — dedicated to this import; the maintainer's personal OSM account (skfd) is not used for any upload from this tooling.
- Discussion: OSM Community Forum thread, the announcement venue under the current OSM Import Guidelines. Thread tagged import; this proposal was posted there with the wiki-page link on 2026-05-01.
- Tooling: toronto-2-address-import (this repo) + toronto-addresses-import (upstream scraper)
- Pilot evidence: changeset 182585291 (2026-05-13), Phase 1 pilot, tile high-park-swansea-sw-se: 252 source candidates (176 uploaded, 72 skipped as ranges, 4 manually rejected). Read-only review-UI snapshot at run 1298.
- Upload manifest: uploads/all.csv, a cumulative CSV with columns (address_point_id, address_full, osm_node_id, changeset_id) covering every uploaded item. Per-tile files sit alongside it: high-park-swansea-sw-se.csv.
- Sample dev-sandbox upload with errors: changeset 616162, one upload generated by the import tool against master.apis.dev.openstreetmap.org, as evidence that the upload mechanics work end-to-end.
- Sample of manual revert: changeset 617602
- Sample of good changeset: changeset 617603
- Live status: tracked in the project README.
This proposal targets the OSM Import Guidelines. All upload activity to date has used the OSM dev sandbox; no production edits will be made until this proposal has cleared the customary 14-day feedback window on the OSM Community Forum (the announcement venue under the current Import Guidelines — the imports@openstreetmap.org mailing list has been deprecated) and on this wiki page.
Summary
One-time, human-reviewed import of missing civic address points from the City of Toronto's "Address Points (Municipal) – Toronto One Address Repository" open dataset into OpenStreetMap, conflated against a fresh OSM snapshot so that only addresses OSM does not already have are created. Every run is reviewed in a local web UI before it leaves the machine; every upload is a distinct, tagged changeset; every action (automatic or manual) is written to an append-only audit log.
Geographic scope: the City of Toronto, Ontario — OSM admin boundary relation 324211, covering the post-1998 amalgamated city (former municipalities of Toronto, East York, Etobicoke, North York, Scarborough, and York). The 1,297 review tiles named in § Schedule partition this boundary.
Scope numbers (active source snapshot #28, 2026-04-18):
| Address class | Active rows | Disposition |
|---|---|---|
| Land | 479,966 | Candidate for import as a pure address node. |
| Structure | 28,031 | Candidate for import as a pure address node. |
| Structure Entrance | 14,354 | Candidate for import as a pure address node. The source's "door" semantics are not preserved on upload; see § Tagging plan. |
| Land Entrance | 573 | Excluded from this import (driveway/gate concept, not an address). |
| Total considered | 522,351 | Land + Structure + Structure Entrance, before conflation. |
| Total excluded upfront | 573 | Land Entrance. |
Expected output after conflation is materially smaller than the above — any address OSM already carries is dropped at conflation time, not uploaded.
Goals and non-goals
Goals
- Raise OSM's civic-address coverage in the City of Toronto (former municipalities of Toronto, East York, Etobicoke, North York, Scarborough, and York) to match the City's authoritative address roster for addresses OSM is missing today.
- Do so without creating duplicates of addresses already mapped in OSM, and without stamping over existing address data.
- Preserve a per-candidate audit trail (source row → verdict → reviewer decision → changeset id → resulting OSM id) for post-hoc inspection by any OSM contributor.
Non-goals (explicitly out of scope for this import)
- No deletions. If OSM has an address that the City snapshot does not, we do not flag, propose, or remove it. Rationale in § Deferred work.
- No mutation of existing OSM objects. This import creates new nodes only. Any future postcode or tag-enrichment work on matched OSM nodes will be a separate proposal with its own review. A design sketch exists in future-work/postcode-enrichment.md but is not part of this import.
- No addr:interpolation way cleanup, even where per-address points now cover the same segment as an existing interpolation way. Rationale in § Deferred work.
- No polygons. We do not add addr:* tags to existing building=* ways/relations, and we do not create new buildings. Both are out of scope for this tool and this proposal.
- No geometry editing of any existing object.
- No Land Entrance rows: the source models driveway/gate entry points, and OSM's closest concept is barrier=gate, not an address. Excluded at ingest.
Schedule
Contacts and reviewer roster
- Primary maintainer / first-line reviewer: toronto@comentality.com
- OSM import account: skfd imports — a dedicated account created solely for this import, per the OSM Import Guidelines recommendation. The maintainer's personal OSM account (skfd) is not used for any upload from this tooling.
- Additional reviewers: named on this wiki page before Phase 1 begins.
- Joining the roster: local Toronto mappers contact the email above.
- Upload rule: no run is uploaded without at least one named reviewer's approval in the web UI.
Phased roll-out
Each phase is shippable and independently reversible. Dates are earliest-start, pending community review and feedback incorporation.
- Phase 0: community review. Create the proposal page on the OSM wiki at Toronto/Import/AddressPoints, add a link to it from the Import/Catalogue index, and post the announcement on the existing OSM Community Forum discussion thread (tagged import). Under the current OSM Import Guidelines, the Community Forum has supplanted the deprecated imports@openstreetmap.org mailing list as the import-announcement venue. The Guidelines require a minimum 14-day feedback window, measured from the wiki-page publication / forum-announcement post. Incorporate feedback, revise.
- Phase 1: pilot (1 tile). Completed 2026-05-13. Tile high-park-swansea-sw-se is a depth-2 quadrant of the High Park-Swansea neighbourhood, bbox (43.633436, -79.480592, 43.639157, -79.469502), with 252 source candidates pre-conflation. 176 were uploaded as changeset 182585291 (source snapshot #42), 72 were skipped (range housenumbers), and 4 were manually rejected. The maintainer carried out an end-to-end human review: the 8 candidates flagged by any check received explicit operator decisions in the review UI (4 approve, 4 reject), and the AUTO_APPROVED queue was surveyed by eye before upload was triggered. Hold for one week for community response before Phase 2.
- Phase 2: ward-level rollout. Proceed ward by ward, working through the 1,297 tiles one at a time. Retain manual approval of a random sample (≥5%) even for auto-approvable items.
- Phase 3: remaining tiles. Same cadence, same review gating.
- Phase 4: closeout. Final reconciliation: re-fetch the bbox and publish a post-import report (counts, rejection reasons, outstanding REVIEW_DEFERRED items).
Upload rate target is ≤1 changeset per minute by manual cadence (the upload.changesets_per_minute config value is advisory — uploads are operator-triggered per run, not throttled by the tool). One run uploads as one changeset, and the tile design above keeps each run at 250–750 source addresses, so changeset size is bounded by the tile rather than a separate cap. The practical schedule is dominated by human review of the queue, not by upload throughput.
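The schedule figures can be cross-checked with a little arithmetic. The sketch below assumes, as stated above, one changeset per run and one run per tile. It is a back-of-envelope check, not part of the tooling.

```python
# Back-of-envelope check of the schedule figures quoted above.
# Assumption: one run per tile and one changeset per run, so the number of
# changesets equals the number of tiles.
SOURCE_ROWS = 522_351   # Land + Structure + Structure Entrance, snapshot #28
TILES = 1_297
CHANGESETS_PER_MINUTE = 1.0  # advisory ceiling; uploads are operator-triggered


def mean_rows_per_tile(rows: int, tiles: int) -> float:
    """Average pre-conflation source rows per review tile."""
    return rows / tiles


def min_upload_hours(tiles: int, per_minute: float) -> float:
    """Lower bound on pure upload wall-clock time at the target cadence."""
    return tiles / per_minute / 60


if __name__ == "__main__":
    print(f"{mean_rows_per_tile(SOURCE_ROWS, TILES):.1f} rows/tile")
    print(f"{min_upload_hours(TILES, CHANGESETS_PER_MINUTE):.1f} h of uploads")
```

On snapshot #28 this works out to about 403 source rows per tile on average, inside the 250–750 band. It also gives a floor of roughly 21.6 hours of pure upload time at one changeset per minute, which is small next to the human review effort.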
Import data
Source
- Dataset: "Address Points (Municipal) – Toronto One Address Repository", published by the City of Toronto.
- Portal: open.toronto.ca/dataset/address-points-municipal-toronto-one-address-repository
- Consumption path: the City portal feed is scraped and normalised into a SQLite DB by the sibling project toronto-addresses-import. This import consumes that SQLite DB read-only. See SOURCE_DATA.md in the tooling repo for the exact schema and the fields we rely on.
Licence
- Upstream licence: Open Government Licence – Toronto.
- ODbL compatibility: compatible. OGL-Toronto is a permissive attribution licence modelled on the Canadian federal OGL, with no share-alike and no non-commercial clauses. Attribution is satisfied by the source=City of Toronto Open Data tag on both the uploaded node and its containing changeset.
Type and volume
- Point data only. No polygons, no lines.
- Per-address civic points with housenumber, street, municipality, ward, lat/lon, and a class descriptor (Land / Structure / Structure Entrance / Land Entrance).
- Post-conflation volume will be substantially smaller than 522k — the actual count depends on OSM's current Toronto address coverage at the moment each tile is run.
Freshness
- OSM source for conflation: Geofabrik Ontario PBF, refreshed via t2/osm_refresh.py.
- Cadence: re-pulled before each run, so the "OSM already has this address" signal is always based on a fresh snapshot.
- Staleness rule: if more than 24 h elapse between conflation and upload, re-fetch and re-conflate before opening changesets.
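The staleness rule reduces to a single timestamp comparison. The following is a minimal sketch, assuming conflation times are stored as timezone-aware UTC datetimes. The helper name is hypothetical and not taken from the tooling repo.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical helper for the 24 h staleness rule; not the tooling's own code.
MAX_CONFLATION_AGE = timedelta(hours=24)


def needs_reconflation(conflated_at: datetime,
                       now: Optional[datetime] = None,
                       max_age: timedelta = MAX_CONFLATION_AGE) -> bool:
    """True when more than max_age has passed since conflation, meaning the
    OSM snapshot must be re-fetched and the tile re-conflated before upload."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - conflated_at > max_age
```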
Tagging plan
Per-node tags written
All uploaded elements are nodes. No ways, no relations. The tag set is uniform across all address classes:
| Tag | Source | Notes |
|---|---|---|
| addr:housenumber | address_number | Copied verbatim after trim. Letter and fraction suffixes (46A, 710 1/2) are preserved. |
| addr:street | linear_name_full | Short suffix and trailing direction expanded to the OSM full form at ingest (Amelia St → Amelia Street, Bloor St W → Bloor Street West); the proper-noun part is copied verbatim, including a leading "St" standing for "Saint" (St Clair Ave E → St Clair Avenue East). A standalone Mc followed by a surname token is glued (Mc Caul St → McCaul Street), matching OSM Toronto's convention. See STREET_SUFFIX_EXPAND / expand_street_name() in t2/conflate.py. The short-form normalisation described in § Conflation is used only for matching, never for the written tag. |
| source | static | City of Toronto Open Data. Set on the node and on the containing changeset. |
| addr:postcode | enrichment | Written only when a same-address POI in the OSM snapshot already carries one; never invented, never extrapolated. We adopt the postcode from the nearest same-address POI when present; absent that, we emit no postcode. Details in § Conflation. |
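The addr:street rules in the table can be illustrated with a hypothetical reimplementation of expand_street_name(). The suffix table here is a small illustrative subset, not STREET_SUFFIX_EXPAND itself, and the real function in t2/conflate.py may differ in detail.

```python
# Illustrative subset only; the real table is STREET_SUFFIX_EXPAND.
SUFFIXES = {"St": "Street", "Ave": "Avenue", "Rd": "Road", "Cres": "Crescent",
            "Dr": "Drive", "Blvd": "Boulevard", "Grv": "Grove"}
DIRECTIONS = {"N": "North", "S": "South", "E": "East", "W": "West"}


def expand_street_name(name: str) -> str:
    """Hypothetical sketch of the documented expansion rules."""
    tokens = name.split()
    # Glue a standalone "Mc" onto the following surname token.
    glued, i = [], 0
    while i < len(tokens):
        if tokens[i] == "Mc" and i + 1 < len(tokens):
            glued.append("Mc" + tokens[i + 1])
            i += 2
        else:
            glued.append(tokens[i])
            i += 1
    tokens = glued
    # Expand a trailing direction, if any.
    direction = ""
    if len(tokens) > 1 and tokens[-1] in DIRECTIONS:
        direction = DIRECTIONS[tokens.pop()]
    # Expand only the final suffix token; earlier tokens (such as a leading
    # "St" meaning "Saint") belong to the proper noun and are left alone.
    if len(tokens) > 1 and tokens[-1] in SUFFIXES:
        tokens[-1] = SUFFIXES[tokens[-1]]
    if direction:
        tokens.append(direction)
    return " ".join(tokens)
```

Expanding only the last token before the direction is what keeps "St Clair Ave E" from becoming "Street Clair…".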
The tag set is uniform across address classes — every uploaded node is a pure address regardless of whether the source row was a parcel, a building centroid, or a door. Structure Entrance rows in particular are not uploaded with entrance=yes: the City source coordinate sits at or near a door, but standalone (un-snapped, not a member of any building way), so the door semantics aren't reliably representable in OSM without the building-way edits that this create-only import excludes (see § Non-goals). The source's class is preserved in our internal audit trail but not emitted into OSM.
Fields deliberately not emitted:
- addr:housename, addr:unit, addr:flats, addr:block: the source does not carry these in a reliable form.
- addr:city: the source's municipality_name reflects the pre-amalgamation former municipalities (Toronto, East York, Etobicoke, North York, Scarborough, York), which are historical rather than current civic entities, and a uniform addr:city=Toronto adds no information that the bbox doesn't already imply. The pre-amalgamation municipality is kept in the internal audit trail. See § The addr:city question below.
- addr:country, addr:province, addr:state: omitted per OSM Canadian convention.
- addr:neighbourhood, addr:suburb, addr:ward: the source's ward_name and neighbourhood overlays are better modelled as OSM admin polygons than as per-node tags.
- ref, name, place: none apply.
- Any toronto:* / t2:* custom namespace: rejected on principle.
The addr:city question
The source carries a municipality_name column that reflects the pre-amalgamation former municipalities (Toronto, East York, Etobicoke, North York, Scarborough, York). These are historical, not current, civic entities: the City of Toronto has been a single city since 1998. We do not emit addr:city at all, because a uniform Toronto value is redundant given the bbox and the former-municipality string is historical rather than the current civic name. The pre-amalgamation municipality is preserved in our internal audit trail (used for intra-source duplicate disambiguation; see § Conflation) but not emitted into OSM.
Per-class tagging matrix
| Class | addr:housenumber | addr:street | source | addr:postcode |
|---|---|---|---|---|
| Land | yes | yes | yes | if colocated POI has one |
| Structure | yes | yes | yes | if colocated POI has one |
| Structure Entrance | yes | yes | yes | if colocated POI has one |
| Land Entrance | excluded upfront | | | |
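Because the tag set is uniform across classes, building a node's tags needs only a class check for the upfront exclusion. This sketch uses hypothetical names. It assumes addr:street has already been expanded to the OSM full form and that any postcode has already been adopted from a colocated POI.

```python
from typing import Optional

IMPORTED_CLASSES = {"Land", "Structure", "Structure Entrance"}


def build_node_tags(address_class: str, address_number: str,
                    street_full: str, postcode: Optional[str] = None) -> dict:
    """Hypothetical sketch: the same four-tag shape for every imported class."""
    if address_class not in IMPORTED_CLASSES:
        raise ValueError(f"{address_class!r} is excluded upfront")
    tags = {
        "addr:housenumber": address_number.strip(),
        "addr:street": street_full,
        "source": "City of Toronto Open Data",
    }
    if postcode:  # only when adopted from a colocated POI; never invented
        tags["addr:postcode"] = postcode
    # Deliberately no entrance=*, addr:city, or custom-namespace tags.
    return tags
```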
Changeset tags
Each changeset opened against the OSM API carries:
| Tag | Value |
|---|---|
| comment | Toronto Open Data address import, run=<run_name> (template in config.toml) |
| source | City of Toronto Open Data |
| import | yes |
| bot | no |
| created_by | t2-address-import |
| import:client_token | Random per-run UUID, used only to make retries after a network failure idempotent: before opening a changeset, the client looks for an open changeset carrying this token, so a dropped connection never results in two parallel uploads of the same run. |
| import_plan | https://wiki.openstreetmap.org/wiki/Toronto/Import/AddressPoints |
The import=yes / bot=no combination matches the OSM Wiki's guidance: these are one-shot, human-reviewed imports, not ongoing automated edits.
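The following sketch shows how the tags above could be assembled and how the retry-time token lookup could work. It assumes open changesets have already been fetched and parsed into dicts, for example from the API's changeset query filtered to open changesets. Neither helper is taken from t2/osm_client.py.

```python
import uuid

COMMENT_TEMPLATE = "Toronto Open Data address import, run={run_name}"


def changeset_tags(run_name: str, client_token: str) -> dict:
    """Changeset tag set as listed in the table above."""
    return {
        "comment": COMMENT_TEMPLATE.format(run_name=run_name),
        "source": "City of Toronto Open Data",
        "import": "yes",
        "bot": "no",
        "created_by": "t2-address-import",
        "import:client_token": client_token,
        "import_plan": "https://wiki.openstreetmap.org/wiki/Toronto/Import/AddressPoints",
    }


def find_open_changeset(changesets: list, client_token: str):
    """Return the id of an open changeset carrying this token, else None.
    Each entry is assumed to look like {"id": ..., "open": bool, "tags": {...}}."""
    for cs in changesets:
        if cs.get("open") and cs.get("tags", {}).get("import:client_token") == client_token:
            return cs["id"]
    return None


# Generated once per run and reused on every retry of that run.
token = str(uuid.uuid4())
```

In this sketch the token must be reused on every retry of the same run. A fresh token per attempt would defeat the lookup and allow a second changeset to open.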
Conflation
Algorithm
For each source address we look at the OSM snapshot and classify:
- MATCH: OSM has a pure-address node or polygon with the same normalised housenumber and street, within 15 m of the source point.
- MATCH_FAR: same housenumber/street found 15–100 m away. Surfaced to the human reviewer; never auto-approved.
- MISSING: no OSM address with the same housenumber/street within 100 m. Candidate for upload.
- SKIPPED: housenumber is a range (100–110 Main St) or contains digit-confusable letters (I, O, Q). Not imported by default; the reviewer may opt in per item.
Search radii are set in config.toml under [conflation]: match_radius_m = 100, match_near_m = 15.
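The four verdicts can be sketched as a single function. The sketch makes several assumptions: the OSM index already contains only valid match targets (see Match targets), keyed by normalised (housenumber, street); polygons are represented by their centroids; and distance is great-circle (haversine). The SKIPPED pattern is a guess at "range or digit-confusable letter", not the tooling's own regex.

```python
import math
import re

MATCH_NEAR_M = 15      # config.toml [conflation] match_near_m
MATCH_RADIUS_M = 100   # config.toml [conflation] match_radius_m
# Hypothetical pattern: a digit-to-digit range (optionally lettered) or I/O/Q.
RANGE_OR_CONFUSABLE = re.compile(r"\d[A-Z]?\s*[-–]\s*\d|[IOQ]")


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def classify(cand, osm_index):
    """cand: (housenumber, street_key, lat, lon); osm_index maps
    (housenumber, street_key) -> list of (lat, lon) points or centroids."""
    number, street, lat, lon = cand
    if RANGE_OR_CONFUSABLE.search(number):
        return "SKIPPED"
    points = osm_index.get((number, street), [])
    nearest = min((haversine_m(lat, lon, p[0], p[1]) for p in points), default=None)
    if nearest is None or nearest > MATCH_RADIUS_M:
        return "MISSING"
    return "MATCH" if nearest <= MATCH_NEAR_M else "MATCH_FAR"
```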
Street-name normalisation (STREET → ST, AVENUE → AVE, NORTH → N, etc.) collapses both short and long forms to a single canonical short and is used only for matching — so an OSM neighbour spelled Avenue and a source row spelled Ave still match. For the written addr:street tag we go the other direction: at ingest, expand_street_name() rewrites the source's short suffix and trailing direction to the OSM full form (Foo Ave W → Foo Avenue West). Proper-noun casing is preserved, including a leading "St" standing for "Saint" (St Clair Ave E → St Clair Avenue East, not Street Clair…).
The normaliser cannot bridge proper-noun spelling differences (space-vs-no-space splits like source Deane Field Crescent vs OSM Deanefield Crescent), nor outright suffix mistakes where the source has the wrong street type. For each confirmed case we add the source string to STREET_NAME_OVERRIDES in t2/conflate.py, applied at ingest so the candidate's matching key and the eventual addr:street upload tag both carry the OSM name local mappers already use. Current entries: Deane Field Cres → Deanefield Cres, Golfcrest Rd → Golf Crest Rd, Forest View Rd → Forestview Rd, Greenhouse Rd → Green House Rd, Posthorn Grv → Post Horn Grv, and the suffix correction Kathleen Ave → Kathleen Cres. The nearby_street_mismatch check (see § Workflow and QA) surfaces fresh candidates for inclusion.
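Here is a minimal sketch of the two steps in the preceding paragraphs: an ingest-time override lookup, followed by the matching-only collapse to a canonical short form. The abbreviation table is a small illustrative subset, and the override dict copies three of the entries listed above. The real tables live in t2/conflate.py.

```python
# Illustrative subset of the long-form -> short-form collapse used for matching.
CANONICAL = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "CRESCENT": "CRES",
             "GROVE": "GRV", "NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W"}
# Three of the STREET_NAME_OVERRIDES entries listed above.
STREET_NAME_OVERRIDES = {
    "Deane Field Cres": "Deanefield Cres",
    "Golfcrest Rd": "Golf Crest Rd",
    "Kathleen Ave": "Kathleen Cres",
}


def apply_override(source_street: str) -> str:
    """Applied at ingest, so both the match key and the upload tag change."""
    return STREET_NAME_OVERRIDES.get(source_street, source_street)


def match_key(street: str) -> str:
    """Canonical short form, used only for conflation matching."""
    return " ".join(CANONICAL.get(t, t) for t in street.upper().split())
```

Because both sides pass through match_key, an OSM "Foo Avenue West" and a source "Foo Ave W" meet at the same key. The override is what bridges the proper-noun split that the collapse alone cannot.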
Match targets
Two kinds of OSM feature are valid match targets:
- Pure address nodes: nodes with addr:housenumber that do not carry POI keys (amenity, shop, office, tourism, leisure, craft, healthcare, building, plus their disused:* / was:* variants). Nodes additionally tagged entrance=* (≈675 across Toronto) count as pure-address match targets: their address is canonical, and the entrance tag just records that the point sits on a door rather than the parcel centre.
- Polygons with address tags: ways and relations carrying addr:housenumber, including address-bearing buildings. Polygon centroids are used for the distance calculation.
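The match-target rule can be expressed as a small predicate. In this sketch the POI key list is copied from the bullet above. The function names are hypothetical; the tooling keeps its list as POI_TAG_KEYS in t2/conflate.py.

```python
POI_KEYS = ("amenity", "shop", "office", "tourism", "leisure", "craft",
            "healthcare", "building")


def is_poi(tags: dict) -> bool:
    """True if any POI key, or its disused:/was: lifecycle variant, is present."""
    return any(k in tags or f"disused:{k}" in tags or f"was:{k}" in tags
               for k in POI_KEYS)


def is_match_target(kind: str, tags: dict) -> bool:
    """kind is 'node', 'way' or 'relation'."""
    if "addr:housenumber" not in tags:
        return False
    if kind == "node":
        return not is_poi(tags)  # entrance=* nodes still qualify
    return True  # address-bearing ways/relations, including buildings
```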
POI nodes (amenity/shop/etc.) with addr:* tags are explicitly not match targets. Their address is a courtesy annotation, and the canonical address point is typically absent. When a MISSING candidate is colocated with such a POI, the review UI acknowledges it with a pill and, if the POI carries addr:postcode, adopts that postcode onto the proposed new node. This is the only case where we draw tag data from an existing OSM object, and it is additive (never overwriting).
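The following sketches postcode adoption. It assumes the caller has already selected POIs that are colocated with the candidate and share its housenumber/street key; the colocation radius is not stated here. Where several POIs qualify, the sketch takes the nearest one that actually carries addr:postcode. That reading of "nearest same-address POI" is an assumption.

```python
import math


def adopt_postcode(cand_lat, cand_lon, same_address_pois):
    """same_address_pois: list of (lat, lon, tags). Returns the postcode of the
    nearest POI carrying addr:postcode, or None. Never invents a value."""
    best = None
    for lat, lon, tags in same_address_pois:
        pc = tags.get("addr:postcode")
        if not pc:
            continue
        # Equirectangular approximation; fine for ranking nearby points.
        dx = math.radians(lon - cand_lon) * math.cos(math.radians(cand_lat))
        dy = math.radians(lat - cand_lat)
        d = math.hypot(dx, dy)
        if best is None or d < best[0]:
            best = (d, pc)
    return best[1] if best else None
```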
Nodes dropped from match index
Nodes referenced by an addr:interpolation way are excluded from the index. They are endpoints of a range declaration, not standalone addresses, and treating them as match targets would spuriously suppress candidates that fall between them.
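The exclusion is a set difference applied before indexing. This is a minimal sketch, assuming nodes and ways have already been parsed from the PBF into plain Python structures. The function is hypothetical and omits the POI filter. It indexes by the literal (addr:housenumber, addr:street) pair, so a node with no street lands under an empty street key.

```python
def build_match_index(nodes, ways):
    """nodes: {node_id: tags}; ways: iterable of (tags, [node_ids]).
    Drops nodes referenced by any addr:interpolation way before indexing."""
    interpolation_members = {
        nid for tags, refs in ways if "addr:interpolation" in tags for nid in refs
    }
    index = {}
    for nid, tags in nodes.items():
        if nid in interpolation_members or "addr:housenumber" not in tags:
            continue
        key = (tags["addr:housenumber"], tags.get("addr:street", ""))
        index.setdefault(key, []).append(nid)
    return index
```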
Municipality disambiguation
Toronto absorbed five adjacent municipalities in 1998. Street names recur across the old boundaries: 48 Victor Ave, for example, exists as a distinct civic address in more than one former municipality. Our intra-city duplicate check therefore uses (address_full, municipality_name) as the identity key, not address_full alone. This only affects checks and dedup; the written tags do not carry the former municipality (see § The addr:city question).
Colocated duplicates within the source
- Shape: a non-Land row (Structure or Structure Entrance) sharing (address_full, municipality_name) with a Land row in the same run. There are ~289 such rows city-wide on snapshot #28 (276 Structure, 13 Structure Entrance). Land Entrance is excluded at ingest (see § Non-goals) and so does not reach this pass.
- Behaviour: the dedup pass in conflation skips the non-Land row whenever a same-key Land sibling exists; the Land row is treated as the canonical record and is the only one that proceeds to review and upload. The check is purely key-based, with no distance threshold, because within one former municipality the source treats one address_full as one civic address (the municipality component of the key handles cross-municipality string collisions like 48 Victor Ave).
- Tiebreak rationale: Land is the parcel-level "this lot has this address" point and maps cleanly to a standalone OSM address node. Non-Land classes (building centroid, door) are dropped only when a same-key Land sibling exists; otherwise they flow through normally and carry a unique address.
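The dedup pass is a key lookup with no geometry involved. This sketch assumes rows arrive as dicts carrying the source's address_full, municipality_name and address_class fields. The function name and the sample municipality values in the test are illustrative.

```python
def dedup_colocated(rows):
    """Drop non-Land rows whenever a Land row shares the same
    (address_full, municipality_name) key; no distance threshold."""
    land_keys = {(r["address_full"], r["municipality_name"])
                 for r in rows if r["address_class"] == "Land"}
    kept, skipped = [], []
    for r in rows:
        key = (r["address_full"], r["municipality_name"])
        if r["address_class"] != "Land" and key in land_keys:
            skipped.append(r)   # same-key Land sibling is canonical
        else:
            kept.append(r)      # Land rows, and non-Land rows with a unique key
    return kept, skipped
```

Including municipality_name in the key keeps a Structure row in one former municipality from being swallowed by a same-string Land row in another.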
Acknowledged duplicate-creation paths, deferred to a future phase
Three known OSM data shapes can cause this import to create a colocated duplicate of an address OSM already carries, because they're not representable in the current single-value match key:
- addr:interpolation endpoints. Interpolation-way member nodes are dropped from the match index (they're endpoint-of-range declarations, not standalone addresses). A City candidate whose housenumber happens to coincide with one of those endpoint numbers will therefore be classified MISSING and uploaded, creating a node that duplicates the address the interpolation endpoint already asserts. By construction, the same effect applies to every real City address that falls between the endpoints: the whole premise of the interpolation-replacement phase is that per-address points are better than a synthesised range.
- Multi-value addr:housenumber on a single OSM node. Canonical OSM uses ; to separate multiple values, but the Toronto OSM extract additionally contains ,-separated lists and N-M-style ranges packed into a single tag (see t2/multi_addresses.py). The match key is the literal string, so addr:housenumber=100;102;104 does not match a City candidate for 100, and we'd upload a colocated duplicate for every sub-number the multi-value tag subsumes.
- OSM buildings with addr:housenumber but no street anchor. ~1,580 elements in the Toronto bbox (1,576 with no addr:street/addr:place/addr:housename, almost all of them building polygons, plus 4 ways carrying only addr:housename) are indexed under an empty street, so no City candidate can match them. A City candidate at 123 Foo St sitting on top of one of these buildings is classified MISSING and uploaded. Cleanup is per-element local work (the correct street name needs human/imagery confirmation) and is deferred to a post-import MapRoulette challenge documented in future-work/no-anchor-osm-buildings.md.
Disposition for this import:
- Accept the transient duplication.
- No algorithmic split at conflation time.
- No new reviewer check.
- Cleanup handled in the follow-up proposals listed under § Deferred work — those proposals enumerate these objects in place, cross-check them against newly-uploaded per-address points, and retire or normalise them as appropriate.
- Handling any of these shapes now would mean editing existing OSM objects (different review bar, different rollback story), which is out of scope per § Non-goals.
Workflow and QA
Pipeline stages
Each candidate advances through a deterministic sequence of stages, persisted in the local DB:
INGESTED → CONFLATED → REVIEW_PENDING → APPROVED → UPLOADED
↘ REJECTED / DEFERRED
↘ SKIPPED (range, MATCH, colocated dup)
Each stage is resumable — killing the process mid-run and restarting is safe. Re-running a stage skips work already done.
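The diagram does not show exactly where the two branches attach. The sketch below takes one plausible reading: REJECTED/DEFERRED leave REVIEW_PENDING, and SKIPPED leaves CONFLATED. It leaves out the reviewer opt-in path for SKIPPED items. It shows how a guarded transition makes a rerun of an already-completed stage a no-op. Both the table and the helper are hypothetical.

```python
# One reading of the stage diagram; branch attachment points are assumptions.
TRANSITIONS = {
    "INGESTED": {"CONFLATED"},
    "CONFLATED": {"REVIEW_PENDING", "SKIPPED"},
    "REVIEW_PENDING": {"APPROVED", "REJECTED", "DEFERRED"},
    "APPROVED": {"UPLOADED"},
}


def advance(current: str, target: str) -> str:
    """Return the new stage. Re-applying the stage a candidate is already in
    is a no-op (so reruns skip done work); any other jump is rejected."""
    if current == target:
        return current
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {target}")
    return target
```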
Checks
Seven automated checks, each enabled in config.toml with a configurable severity of info, warn, or block:
| Check | Purpose |
|---|---|
| match_far | Matched housenumber/street is 15–100 m away: it could be the same point, or a different building. Always reviewed. |
| suffix_range | Housenumber is a range (100-110) or contains digit-confusable letters. Blocks auto-approval. |
| city_duplicate | Another candidate in the same run is within a few metres and has the same housenumber. |
| intra_source_duplicate | Duplicate within the source dataset before conflation. |
| missing_sample | Every Nth MISSING candidate is force-reviewed even if it has no other flags. Provides ongoing validation that the auto-approval bar is well calibrated. |
| nearby_street_mismatch | A MISSING candidate has an OSM address within ~20 m with the same housenumber but a different street name, likely a street-name spelling variant (e.g. source Deane Field Crescent vs OSM Deanefield Crescent) that the normaliser can't bridge. |
| potential_amenity | Matched OSM node carries non-address tags (name, ref, entrance, etc.), a hint that the POI filter may need to grow. Not a block; feeds iteration on the match-target rules. |
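Two of the checks are simple enough to sketch. This hypothetical version assumes candidate and OSM coordinates have already been projected to a local metric grid (x, y in metres) and that street keys are in the normalised matching form. The value of N for missing_sample lives in config and is not stated here, so the sketch takes it as a parameter.

```python
import math


def missing_sample(missing_ordinal, every_n):
    """Force review of every Nth MISSING candidate (1-based ordinal in the run)."""
    return missing_ordinal % every_n == 0


def nearby_street_mismatch(cand, osm_addresses, radius_m=20.0):
    """cand and osm_addresses entries are (housenumber, street_key, x_m, y_m).
    Returns the differing OSM street names found within radius_m; the check
    fires when the list is non-empty."""
    number, street, x, y = cand
    return sorted({s for n, s, ox, oy in osm_addresses
                   if n == number and s != street
                   and math.hypot(ox - x, oy - y) <= radius_m})
```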
Review queue
- MISSING candidates with no flags raised by the checks enter the review queue as AUTO_APPROVED; the reviewer's action is a one-click acknowledge-or-reject rather than a full review. During Phases 1 and 2 we will hold even AUTO_APPROVED items for a human click on a random sample (≥5%).
- MATCH candidates bypass upload entirely (they are not in scope: OSM already has the address).
- Every other state (MATCH_FAR, MISSING with any flag, SKIPPED) requires explicit reviewer action: Approve / Reject / Defer.
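One way to guarantee the ≥5% floor, rather than hitting it only on average, is to draw ceil(5% × n) items per run. This is a minimal sketch, assuming candidates are identified by string ids. Seeding the generator with a hypothetical per-run seed makes the draw reproducible, so it could be recorded alongside the audit log.

```python
import math
import random


def sample_for_manual_click(candidate_ids, rate=0.05, seed=None):
    """Return ceil(rate * n) ids, so the held fraction is never below rate."""
    ids = sorted(candidate_ids)          # order-independent draw
    k = math.ceil(rate * len(ids))
    return set(random.Random(seed).sample(ids, k))
```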
The review UI is a local Flask app (python run.py, http://localhost:5000/). It is not exposed to the public internet. Reviewers are the named people listed on this wiki page.
Audit log
- Scope: every automatic classification, every reviewer decision, every changeset open/upload/close.
- Key and durability: keyed by candidate id; append-only; survives reruns.
- Publication: this wiki page links per-run audit dumps; any OSM contributor can reconstruct what happened for any uploaded node.
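The log's storage format is not specified here. The sketch below assumes JSON Lines purely for illustration; the real tooling may well keep the log in its SQLite DB. It shows the two properties listed above: records are only ever appended, and a candidate's full trail can be replayed by id.

```python
import json
import time
from pathlib import Path


def append_audit(log_path, candidate_id, event, actor, **detail):
    """Append one JSON line; existing lines are never rewritten."""
    record = {"ts": time.time(), "candidate_id": candidate_id,
              "event": event, "actor": actor, **detail}
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


def history(log_path, candidate_id):
    """Reconstruct one candidate's trail by replaying the log in order."""
    with Path(log_path).open(encoding="utf-8") as fh:
        return [r for r in map(json.loads, fh) if r["candidate_id"] == candidate_id]
```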
Post-upload reconciliation
- Id mapping: OSM API returns local-id → OSM-id pairs per upload; stored in the audit log.
- Per-tile publication: every export run writes an upload manifest (a CSV with columns address_point_id, address_full, osm_node_id, changeset_id) at <deploy>/uploads/<tile_id>.csv for each completed tile, plus a cumulative <deploy>/uploads/all.csv covering every uploaded item across all tiles. This wiki page links to the cumulative file. Reviewers can audit any uploaded node within hours of the changeset closing, with no need to wait for a phase boundary. This serves the same purpose as the Montréal Adresses ponctuelles reconciliation table, on a tile-grained cadence that fits the slower per-tile rollout.
- Dispute traceability: any uploaded node can be traced back to its source row, conflation verdict, reviewer decision, and changeset id in a single query.
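The id mapping comes from the diffResult document that the OSM API 0.6 returns for a changeset upload. In that document, each created element carries old_id (the negative placeholder from the osmChange) and new_id. The following sketches turning that into manifest rows, assuming the uploader kept a hypothetical placeholder-to-source map. The column order follows the manifest described above.

```python
import csv
import io
import xml.etree.ElementTree as ET

MANIFEST_COLUMNS = ["address_point_id", "address_full", "osm_node_id", "changeset_id"]


def manifest_rows(diff_xml, placeholders, changeset_id):
    """placeholders maps the negative placeholder id used in the osmChange to
    (address_point_id, address_full)."""
    root = ET.fromstring(diff_xml)
    rows = []
    for el in root.iter("node"):
        address_point_id, address_full = placeholders[int(el.get("old_id"))]
        rows.append((address_point_id, address_full, int(el.get("new_id")), changeset_id))
    return rows


def manifest_csv(rows):
    """Render rows as manifest CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(MANIFEST_COLUMNS)
    writer.writerows(rows)
    return buf.getvalue()
```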
Deferred work (not part of this import)
These are documented so no reviewer has to ask whether we forgot them. Each would be proposed separately if and when we pursue it.
Deleting OSM addresses absent from the City source
- What it is: remove OSM addresses the City snapshot doesn't have.
- Why deferred: absence in the source is a weaker signal than presence — the feed has refresh lag, neighbourhoods with acknowledged coverage gaps, and retired-address lifecycle states that aren't cleanly separable from "never existed." Deletion on that signal alone would destroy real addresses on weaker evidence than we accept for additions.
- What a future proposal needs: its own review queue (the verdicts here don't fit); a street-level cross-check to suppress the common "Toronto's feed is missing a whole street" case; strictly human approval — no automation.
Replacing addr:interpolation ways with per-address points
- What it is: retire addr:interpolation ways where the City snapshot now provides real per-address points along the same segment.
- Why deferred: replacement is a bulk structural edit, not an address addition, with different changeset hygiene and a different review bar.
- What a future proposal needs: cross-validation that every housenumber implied by the interpolation's range has a colocated City point; careful handling of tags the way carries on behalf of its endpoints (addr:street, addr:postcode); resolution of any colocated duplicates this import created against interpolation endpoints (see § Conflation).
Normalising multi-value addr:housenumber nodes
- What it is: normalise OSM nodes that pack multiple street numbers into one addr:housenumber tag (;-delimited, ,-delimited, or N-M ranges), either by splitting them into per-number nodes or by retiring them in favour of newly-uploaded per-address points.
- Why deferred: this edits existing objects, needs per-case review, and belongs in a mutation-capable pipeline rather than this create-only one.
- What a future proposal needs: the enumeration source already exists (t2/multi_addresses.py, surfaced on /osm/multi); a cross-check against newly-uploaded per-address points; resolution of any colocated duplicates this import created against multi-value nodes subsuming a City housenumber (see § Conflation).
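For illustration only, here is a hypothetical splitter in the spirit of the enumeration described above. The actual parsing rules in t2/multi_addresses.py are not reproduced here. The sketch also shows why the literal-string match key misses these nodes: 100;102;104 only "contains" 102 after splitting.

```python
import re

RANGE_RE = re.compile(r"^(\d+)\s*-\s*(\d+)$")


def split_housenumber(value):
    """Split a packed addr:housenumber on ; or , . N-M ranges stay as one
    part, since how to step through them is itself a review question."""
    return [p.strip() for p in re.split(r"[;,]", value) if p.strip()]


def is_multi_value(value):
    parts = split_housenumber(value)
    return len(parts) > 1 or any(RANGE_RE.match(p) for p in parts)


def subsumes(value, housenumber):
    """True if a packed tag lists this housenumber, which is the case the
    single-string match key cannot see."""
    return housenumber in split_housenumber(value)
```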
Adding addr:street to OSM buildings tagged with only a housenumber
- What it is: ~1,580 OSM elements in the Toronto bbox carry addr:housenumber with no addr:street/addr:place/addr:housename. They cannot be matched by this import and are an acknowledged duplicate-creation source.
- Why deferred: the fix is per-element local work, not algorithmic. Picking the right street requires imagery or local knowledge for each building. A pre-import sweep gains us nothing predictable; post-import, our own upload manifest becomes a high-confidence street hint for the colocated cases.
- What a future proposal needs: a one-shot enumeration script that emits a GeoJSON feed (cross-referencing our upload manifest and the City source for street hints), a MapRoulette challenge seeded from it, and, once mappers have added the missing addr:street tags, a reconciliation pass to merge any per-address node we uploaded into the now-anchored building. Sketch in future-work/no-anchor-osm-buildings.md.
Mutation of matched OSM nodes (e.g. postcode enrichment)
- What it is: add addr:postcode (or other tags) to existing matched OSM address nodes, e.g. by copying from a same-address POI nearby.
- Why deferred: this import is additive-creation-only; mixing <modify> into the same changeset flow expands the blast radius and changes the review bar.
- What a future proposal needs: sketched in future-work/postcode-enrichment.md. It calls for version-checked writes, a separate changeset tagged import:kind=postcode_enrichment, and human approval per proposed enrichment (no auto-approve).
Risks and mitigations
| Risk | Mitigation |
|---|---|
| Duplicate creation against an OSM address we didn't see. | Conflation against a fresh Geofabrik snapshot; 100 m search radius with normalised housenumber/street; re-fetch + re-conflate if conflation-to-upload lag exceeds 24 h; post-upload reconciliation so any duplicate raised by the community can be traced to its source row and corrected. |
| Incorrect street name (pre/post-amalgamation rename, City typo, spelling variant). | match_far catches nearby points whose street name normalises the same; nearby_street_mismatch catches nearby points whose street name normalises differently (the Deane Field / Deanefield shape; see § Conflation normalisation). city_duplicate surfaces remaining cases by colocation. The reviewer can defer; street-name renames without a corresponding OSM change are escalated to local mappers, not force-pushed. |
| Rate pressure on OSM API / planet feed. | Operator-triggered cadence targeting ≤1 changeset/min; one run per changeset, with each run sized to a 250–750-address tile (see § Schedule); pilot tile first; no parallel uploaders from this tool. |
| Scripted or bulk auto-approval drift. | The missing_sample check force-reviews every Nth MISSING; Phases 1 and 2 hold ≥5% of AUTO_APPROVED items for a manual click; reviewer actions and their actors are in the audit log. |
| Mid-upload crash leaving orphan changesets. | import:client_token tag on each changeset; on retry the client searches open changesets for the token before opening a new one (t2/osm_client.py). Changesets are explicitly closed after upload. |
| POI filter too narrow: a matched "pure address" node actually represents a shop. | The potential_amenity check surfaces these as severity=info; the reviewer can defer; POI_TAG_KEYS in t2/conflate.py is iterated as we find cases. |
| Community unaware of ongoing import. | This wiki page kept updated with per-run progress; changeset comments include run name; contact email published. |
Revert plan
Every run is uploaded as a single changeset tagged import=yes, created_by=t2-address-import, and import:client_token=<uuid>. The revert surface is therefore per-changeset, which makes routine rollback straightforward.
- Routine revert (one bad changeset). A changeset later identified as problematic is reverted using the JOSM Reverter plugin. The changeset id is available in the audit log for every uploaded candidate, and the published OSM-ID list (see § Post-upload reconciliation) groups candidates by changeset for quick lookup.
- Systemic issue mid-import. If the community flags a class of problems — a tag error, a conflation false-positive pattern, a streetname misspelling that slipped through — uploads pause immediately. The pipeline is fixed, conflation is re-run on the affected tiles, and resumption only happens after a fresh human review pass. Any already-uploaded runs produced by the defective code are reverted before we resume.
- Post-import community revert. We do not contest good-faith reverts by local mappers. Any revert we discover is recorded in our audit log; where the underlying candidate is still valid, it is re-enqueued for explicit human re-review rather than automatically re-uploaded.
- Freeze trigger. If a participant in the OSM Community Forum import thread, a Toronto local-community channel, or a reviewer with commit rights on this page files a stop-work request, we freeze within one business day and do not resume until the concern is addressed and acknowledged on this page.
We do not rely on <delete> osmChange blocks to roll ourselves back — a hand-edit made on top of one of our uploads between upload and revert could otherwise be clobbered. The JOSM Reverter plugin handles that case correctly (it builds a conflict-aware inverse changeset), so it stays authoritative.
Open questions for the community
- Changeset comment template. The current format is Toronto Open Data address import, run=<run_name>. Is there any information we should add or remove?
- Post-import monitoring. How long after the final upload should we commit to watching for community-raised issues? Proposing 90 days.
- Empty lots and recently demolished buildings. Some source rows describe addresses where no building presently stands — empty lots awaiting construction, or recent demolitions where the City feed hasn't yet retired the record. Distinguishing a real current civic address from stale source data here requires local knowledge. Preferred handling: upload all such rows (the address is a civic record regardless of whether a structure stands), skip those without a visible building on recent imagery, or route them through per-tile review with a local mapper?
- Address ranges (4611-4619 Steeles Ave W). The source has 1,639 active rows where lo_num ≠ hi_num, plus 49 lettered ranges (49A-59A, 361A-415A…361J-415J) where the same letter sits on both endpoints. There is no parity flag and no enumerated unit list: the source stores only the two endpoints and a single (latitude, longitude). The current pipeline SKIPs every range row. Reviewers can opt in per item, in which case the verbatim string is uploaded as addr:housenumber=4611-4619 on a single node at the source's coordinate. There are three options:
  - (a) keep skipping by default (current behaviour);
  - (b) upload the verbatim range string on the single source-provided coordinate;
  - (c) expand into one node per implied housenumber (e.g. {4611, 4613, 4615, 4617, 4619}), with coordinates synthesised, since the source gives only one point per range.
  Two facts bear on (c). First, 98.7% of range rows have matching parity on lo_num/hi_num, so a step-2 expansion is well defined, but 22 rows are cluster-style sequential numbering (1-96 Red Cedarway, the eleven Cantle Path blocks) where the implied step is 1. Second, 49 rows are lettered subdivisions of larger complexes where multiple parallel rows share the same numeric span. Which of (a), (b), and (c) do you prefer? Option (c) would also need a convention for coordinate placement (single point with all nodes stacked, jittered, or interpolated along the centreline).
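The housenumber enumeration in option (c), though not its coordinate placement, can be sketched from the parity facts above. This is a heuristic under stated assumptions: same-parity endpoints step by 2, mixed-parity endpoints (the cluster-style rows) step by 1, and lettered ranges are out of scope for the sketch.

```python
def implied_housenumbers(lo, hi):
    """Hypothetical option (c) enumeration for numeric lo_num/hi_num pairs."""
    if hi < lo:
        raise ValueError("hi_num must not be below lo_num")
    step = 2 if lo % 2 == hi % 2 else 1
    return list(range(lo, hi + 1, step))
```

For 4611-4619 this yields the five odd numbers listed in option (c), while 1-96 Red Cedarway yields all 96 numbers.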
Answers to each will be incorporated into this page and, where they change pipeline behaviour, into config.toml and the relevant code.
References
OSM process and policy
- OSM Import Guidelines
- OSM imports catalogue
- OSM contributor terms
- JOSM Reverter plugin (routine-revert tool named in § Revert plan)
Source data
This import's tooling
- toronto-2-address-import — review UI, conflation, uploader
- toronto-addresses-import — upstream City scraper (City feed → SQLite)
- Internal terminology (Candidate / Verdict / Status / Stage): README.md § Terminology in the tooling repo
- Source-side facts verified against snapshot #28: SOURCE_DATA.md in the tooling repo
Benchmark proposals used while drafting
These are the prior OSM import proposals this document was compared against. Section coverage and convention choices (changeset tagging, publication of created OSM ids, revert plan wording) draw on all three.