Alameda County Address Import


Goals

Import Alameda County, California, address points and merge them with the existing OSM data.

Schedule

The plan is to finish the initial import in September 2020.

The code can be re-run periodically afterwards to add addresses to newly mapped buildings.

Import Data

Data source site: https://data.acgov.org/datasets/86b6da3837a34f10b8493ea0d22f517a?showData=true
Type of license: Public Domain

Imported tags

  • addr:unit
  • addr:housenumber
  • addr:street

Detailed overview of the import procedure

The current import matches address points from https://data.acgov.org/datasets/ to OSM buildings (ways and relations) and adds tags addr:street, addr:housenumber and addr:unit to buildings (multiple values of tags are separated with “;”). In the current import I don’t modify any existing tags or geometries – I only add missing addr:* tags to ways and relations of buildings.

The import is implemented in Python 3, so the libraries mentioned below are Python 3 libraries.

The imported address points are usually located outside the outline of the OSM building but inside the corresponding parcel. Hence, a simple spatial join between address points and buildings will not provide enough matches. To increase the number of address-building matches I use a parcel layer to match addresses to parcels and parcels to buildings.

Match between buildings, parcels and address points


Inputs

  • address points
  • parcel polygons
  • OSM buildings (downloaded by the code using the osmnx library)
  • OSM street network (downloaded by the code using the osmnx library)


Pre-processing

Address points:

  1. Expand abbreviations like “St”, “Ave”, “Ln” into “Street”, “Avenue”, “Lane” using the official Postal Service standard suffix abbreviations crosswalk (as in Appendix C here).
  2. Convert from CAPITAL CASE to Title Case using the titlecase library (this library correctly handles double capitalization: it converts “MCGILL STREET” to “McGill Street” rather than to “Mcgill Street”, as a simple str.title() call would).
  3. Correct some names that are different in the OSM street network and address points using the street images from Mapillary/Bing. E.g., address points may have “Bay Shore Boulevard”, while the street signs and the name of the street in OSM say “Bayshore Boulevard”. Address points without a nearby matching OSM street will be excluded from import and should be processed later manually as they are likely to contain errors.
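
The address pre-processing steps above can be sketched roughly as follows. This is a minimal illustration, not the import code itself: the SUFFIXES and NAME_FIXES tables and the normalize_street helper are hypothetical, the suffix table is a four-entry excerpt rather than the full Postal Service crosswalk, and str.title() is used only as a standard-library stand-in for the titlecase library (so “MCGILL ST” would come out as “Mcgill Street” here).

```python
# Hypothetical excerpt of a suffix crosswalk; the real import uses the full
# Postal Service table, which is not reproduced here.
SUFFIXES = {"ST": "STREET", "AVE": "AVENUE", "LN": "LANE", "BLVD": "BOULEVARD"}

# Hypothetical manual corrections of the kind described in step 3.
NAME_FIXES = {"Bay Shore Boulevard": "Bayshore Boulevard"}


def normalize_street(raw: str) -> str:
    """Expand a trailing suffix abbreviation, title-case, then apply fixes."""
    words = raw.strip().upper().split()
    if words and words[-1] in SUFFIXES:
        # Only the last word is treated as a suffix, so the leading "ST" in
        # "ST MARYS AVE" is left untouched.
        words[-1] = SUFFIXES[words[-1]]
    # str.title() is a simplification; it does not handle names like "McGill".
    name = " ".join(words).title()
    return NAME_FIXES.get(name, name)


print(normalize_street("HIGH ST"))
print(normalize_street("BAY SHORE BLVD"))
```

Restricting the expansion to the last word is a design choice of this sketch; it avoids turning a leading “St” (Saint) into “Street”.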

Parcels:

  1. Filter active parcels.
  2. Among active ones, keep parcels whose overlap with each other is below 20 square meters (I checked manually that no address points fall within these small overlaps; hence, no address will be matched to two different parcels simultaneously).


Matching 1 – address points to parcels (many-to-1)

  1. Spatially match address points to parcels. Most address points are located at centroids of parcels, so the geopandas.GeoDataFrame.sjoin() command is sufficient for matching them to parcels. Some address points are located outside any parcel, yet they can be matched visually to a corresponding parcel without ambiguity. To match such “outside” points I do the matching iteratively by generating buffers of the sizes [0, 0.01, 0.02, …, 3.0] meters around unmatched address points and intersecting the corresponding buffers with parcels. If a buffer intersects more than one parcel, then the point is excluded from the import and should be processed manually later.
    An example of address points that require buffers in order to be matched to parcels
  2. Check that all points that are matched to a given parcel have the same street name. If there are points with several street names matched to a parcel, then exclude all such points and the parcel from the import. E.g., if there are two points – say, “1 High Street” and “1 Low Street” – that are matched to a parcel, then it will be problematic to combine these points into addr:* tags for a building (a possible solution for importing addresses with different street names for the same building could be using the addrN scheme, but, as far as I know, it is not widely used/supported). By contrast, if a parcel has “1 High Street” and “2 High Street”, then they can be combined into addr:street=High Street and addr:housenumber=1;2 for a single building.
  3. Keep addr:unit tag if needed. If addr:unit helps precisely identify the building then it is imported; if addr:unit is redundant then it is omitted from the import. In other words, if an address with [addr:street = X & addr:housenumber = Y & addr:unit = Z] is matched to one building, while the address with only [addr:street = X & addr:housenumber = Y] corresponds to multiple buildings then addr:unit = Z is indispensable and, hence, it is imported; otherwise it is omitted.
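
The street-name check and the combination of house numbers in step 2 could look roughly like the sketch below. The function name combine_by_parcel and the (parcel_id, street, housenumber) row layout are assumptions made for illustration; the actual import works on GeoDataFrames.

```python
from collections import defaultdict


def combine_by_parcel(matches):
    """Group (parcel_id, street, housenumber) rows into addr:* tags per parcel.

    Returns (tags_by_parcel, excluded_parcels). Parcels whose points carry
    more than one street name are excluded, as described in step 2.
    """
    by_parcel = defaultdict(list)
    for parcel_id, street, housenumber in matches:
        by_parcel[parcel_id].append((street, housenumber))

    tags, excluded = {}, set()
    for parcel_id, rows in by_parcel.items():
        streets = {street for street, _ in rows}
        if len(streets) > 1:
            excluded.add(parcel_id)
            continue
        # Keep house numbers in input order, without repeats.
        numbers = []
        for _, number in rows:
            if number not in numbers:
                numbers.append(number)
        tags[parcel_id] = {
            "addr:street": streets.pop(),
            "addr:housenumber": ";".join(numbers),
        }
    return tags, excluded


rows = [
    ("p1", "High Street", "1"),
    ("p1", "High Street", "2"),
    ("p2", "High Street", "3"),
    ("p2", "Low Street", "3"),
]
tags, excluded = combine_by_parcel(rows)
print(tags, excluded)
```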


Matching 2 – parcels to buildings (1-to-1)

  1. Intersect the buildings and parcels layers and compute the areas of the resulting intersections. Consider a building as belonging to a given parcel if more than 70% of the building’s area lies within that parcel.
  2. If the first step matches more than one building to a given parcel, then, among the buildings that share the parcel, drop those whose area falls below the lowest 1% threshold of the county’s building-area distribution; this excludes the smallest buildings (like garages) from matching. E.g., in the picture below two buildings share the same parcel, yet it is more plausible that the larger building with the more complex geometry should be assigned the address that is matched to the parcel. If, after the smallest 5% of buildings are dropped, some parcels still have more than one matched building, then such parcels will be omitted from the import, as they need extra manual effort for parcel-building matching.
    Multiple buildings within a parcel
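
The two rules above can be sketched in plain Python once the intersection areas are known. Everything here is illustrative: the function name, the input layout (precomputed areas instead of geometries), and the nearest-rank way of computing the small-building cutoff are assumptions, and the cutoff quantile is left as a parameter.

```python
from collections import defaultdict


def match_parcels_to_buildings(overlaps, building_areas, min_share=0.7,
                               small_quantile=0.01):
    """Return parcel_id -> building_id for unambiguous matches.

    overlaps: iterable of (building_id, parcel_id, intersection_area).
    building_areas: dict building_id -> full footprint area.
    """
    if not building_areas:
        return {}
    # Area below which a building counts as "small" (nearest-rank quantile).
    ranked = sorted(building_areas.values())
    cutoff = ranked[int(small_quantile * len(ranked))]

    # Rule 1: a building belongs to a parcel if more than min_share of its
    # footprint lies inside that parcel.
    candidates = defaultdict(list)
    for building_id, parcel_id, inter_area in overlaps:
        if inter_area / building_areas[building_id] > min_share:
            candidates[parcel_id].append(building_id)

    # Rule 2: on shared parcels drop small buildings; parcels that remain
    # ambiguous are left out for manual processing.
    result = {}
    for parcel_id, buildings in candidates.items():
        if len(buildings) > 1:
            buildings = [b for b in buildings if building_areas[b] >= cutoff]
        if len(buildings) == 1:
            result[parcel_id] = buildings[0]
    return result


areas = {"house": 120, "garage": 20, "b3": 200, "b4": 150, "b5": 180, "b6": 190}
overlaps = [("house", "p1", 120), ("garage", "p1", 20), ("b3", "p2", 100),
            ("b4", "p3", 150), ("b5", "p4", 180), ("b6", "p4", 190)]
print(match_parcels_to_buildings(overlaps, areas, small_quantile=0.25))
```

In the toy data the quantile is raised to 0.25 only so that the cutoff has a visible effect on six buildings; with a county-sized data set a much smaller value would be used.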

Matching 3 – merge the “address-to-parcel” and “parcel-to-building” concordances

  1. Merge the “address-to-parcel” and “parcel-to-building” concordances using parcel IDs (blklot).
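
As a rough illustration, if both concordances are held as dictionaries keyed by parcel ID, the merge is an inner join on that key. The function name and the dictionary layout are assumptions; with DataFrames the same step would be a merge on the parcel ID column.

```python
def merge_concordances(tags_by_parcel, building_by_parcel):
    """Inner-join two parcel-keyed dicts into building_id -> addr:* tags."""
    return {
        building_by_parcel[parcel_id]: tags
        for parcel_id, tags in tags_by_parcel.items()
        if parcel_id in building_by_parcel  # parcels without a building drop out
    }


tags_by_parcel = {
    "p1": {"addr:street": "High Street", "addr:housenumber": "1;2"},
    "p9": {"addr:street": "Low Street", "addr:housenumber": "7"},
}
building_by_parcel = {"p1": "house"}
print(merge_concordances(tags_by_parcel, building_by_parcel))
```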


Post-processing

  1. Download all addresses currently assigned to OSM buildings, combine them with the “address-building” layer generated above and check whether the same address would be assigned to more than one building. This step identifies both the existing duplicated addresses in OSM and the address points whose import would create such duplicates. In the latter case, exclude the “problematic” points (those that would create duplicates) from the import.
  2. Download the OSM data for SF using geofabrik.de. Use osmfilter to generate an osm-file with buildings that don’t have addr:street or addr:housenumber tags. Parse the resulting osm-file with xml.etree.ElementTree. Iterate over the elements of the resulting XML-tree (buildings) and assign addr:street, addr:housenumber and addr:unit (if needed) to buildings with corresponding OSM-id (ways and relations). Save the modified osm-file.
  3. Open the resulting osm-file in JOSM and upload (using the import-specific account).
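
A minimal sketch of the duplicate check and the tagging step with xml.etree.ElementTree is shown below. The function names, the (element type, id) keys and the inline OSM snippet are assumptions for illustration; the sketch only adds addr:* tags that are missing and never changes existing tags, in line with the import rules above.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def address_key(tags):
    return (tags.get("addr:street"), tags.get("addr:housenumber"),
            tags.get("addr:unit"))


def drop_duplicates(new_tags, existing_addresses):
    """Keep only new addresses that occur once across new and existing data.

    new_tags: dict (element_type, osm_id) -> dict of addr:* tags.
    existing_addresses: iterable of (street, housenumber, unit) already in OSM.
    """
    counts = Counter(existing_addresses)
    counts.update(address_key(t) for t in new_tags.values())
    return {k: t for k, t in new_tags.items() if counts[address_key(t)] == 1}


def add_address_tags(osm_xml, new_tags):
    """Return OSM XML with missing addr:* tags added to matching buildings."""
    root = ET.fromstring(osm_xml)
    for element in root:
        if element.tag not in ("way", "relation"):
            continue
        tags = new_tags.get((element.tag, element.get("id")))
        if not tags:
            continue
        present = {t.get("k") for t in element.findall("tag")}
        for key, value in tags.items():
            if key not in present:  # never overwrite an existing tag
                ET.SubElement(element, "tag", k=key, v=value)
    return ET.tostring(root, encoding="unicode")


osm = ('<osm version="0.6">'
       '<way id="1"><tag k="building" v="yes"/></way>'
       '<way id="2"><tag k="building" v="yes"/></way>'
       '</osm>')
new = {
    ("way", "1"): {"addr:street": "High Street", "addr:housenumber": "1;2"},
    ("way", "2"): {"addr:street": "High Street", "addr:housenumber": "5"},
}
existing = [("High Street", "5", None)]  # already on another OSM building
print(add_address_tags(osm, drop_duplicates(new, existing)))
```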

Code and inputs

Code and inputs are located here.

Contacts

Please feel free to contact Yury Yatsynovich with any questions.