Import/Colorado Addresses

The Colorado Addresses Import is an import of address points covering the state of Colorado in the United States.

Goals

  • Import addresses from the statewide Colorado address dataset into OpenStreetMap, preferring more localized datasets where they are available. If you know of a local dataset that is Public Domain, or whose owner will permit its use in OpenStreetMap (e.g., through a waiver), it still needs to go through the import process, but I am open to using it in this import in place of the statewide data.

Current Status

Import is in planning


Schedule

  • January, 2020 - Checked license of data (Public Domain/permission obtained)
  • January, 2020 - Evaluate the data quality
  • January, 2020 - Upload data to an OSM-like server for use with MapWithAI (JOSM plugin)
  • January, 2020 - Begin the import

Outreach, QA, Review & Feedback


How to Respond

I'll be watching for feedback on:

  • the discussion side of this page
  • the email thread

Data Source

Statewide

Website: https://data.colorado.gov/, specifically: https://data.colorado.gov/State/Statewide-Aggregate-Addresses-in-Colorado-2019-Pub/n7je-akky
Data license: Public Domain
Type of license (if applicable): Public Domain
OSM attribution (if required): http://wiki.openstreetmap.org/wiki/Contributors#Colorado
ODbL Compliance verified: yes

Mesa County

Website: https://emap.mesacounty.us/DownloadData/ (search for E911 address points)
Data license: TODO check -- I have previously gotten permission to use any of their data in OSM
Type of license (if applicable): TODO check
OSM attribution (if required): http://wiki.openstreetmap.org/wiki/Contributors#Colorado
ODbL Compliance verified: yes

Permission: https://github.com/osmlab/editor-layer-index/blob/gh-pages/sources/north-america/us/co/Mesa_County_Data.pdf

Data Processing Plan

All data manipulation is performed with ogr2osm.

Translating Tags

See Scripts.
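
As a quick orientation, the column-to-tag mapping applied by the Colorado script can be summarised as in the sketch below. The column names are copied from that script; `COLORADO_COLUMN_TO_TAG` and `columns_for` are hypothetical names used only for illustration and do not exist in the import code.

```python
# Column names come from the Colorado translation script; the dictionary and
# helper are hypothetical and only summarise which columns feed which tag.
COLORADO_COLUMN_TO_TAG = {
    "AddrNum": "addr:housenumber",
    "NumSuf": "addr:housenumber",  # appended to the number after a space
    "UnitNumber": "addr:unit",
    "PreDir": "addr:street",       # street parts are joined in this order
    "PreType": "addr:street",
    "StreetName": "addr:street",
    "PostType": "addr:street",
    "PostDir": "addr:street",
    "PlaceName": "addr:city",
    "STATE": "addr:state",
    "Zipcode": "addr:postcode",
}


def columns_for(tag):
    """Return the source columns that contribute to one OSM tag, in order."""
    return [col for col, osm_tag in COLORADO_COLUMN_TO_TAG.items() if osm_tag == tag]


print(columns_for("addr:street"))
```

The County column is declared in the script but is not passed to the parser, so it is left out here.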

Conflating into OSM

The translated files will be served with https://gitlab.com/smocktaylor/serve_osm_files for use with the JOSM MapWithAI plugin.

Current issues:

  • No conflation takes place on the server side, so the service keeps serving every address in an area even after those addresses have been added to OSM. This can produce duplicates; JOSM's address checker should help catch them.
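
To illustrate the kind of check that is currently missing on the server, here is a minimal sketch of tag-based duplicate filtering. It is an assumption about how such a filter could look, not part of serve_osm_files or MapWithAI: `address_key` and `drop_existing` are hypothetical names, and the sketch compares tags only, ignoring geometry.

```python
def address_key(tags):
    """Build a comparison key from the tags that identify an address."""
    keys = ("addr:housenumber", "addr:unit", "addr:street", "addr:postcode")
    # Collapse whitespace and ignore case so trivial differences still match.
    return tuple(" ".join(tags.get(k, "").split()).casefold() for k in keys)


def drop_existing(candidates, existing):
    """Return the candidate addresses whose key is not already in OSM."""
    seen = {address_key(tags) for tags in existing}
    return [tags for tags in candidates if address_key(tags) not in seen]


# Illustrative values only.
existing = [{"addr:housenumber": "1525", "addr:street": "Lola Court"}]
candidates = [
    {"addr:housenumber": "1525", "addr:street": "lola  court"},
    {"addr:housenumber": "1674", "addr:street": "Myers Lane"},
]
remaining = drop_existing(candidates, existing)
```

A real implementation would probably also need a distance threshold, since the same house number and street name can occur in more than one city.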

Manual Review

Same as with the MapWithAI road/building datasets.

ChangeSet Tags (MAY CHANGE -- DEPENDS ON TOOL, source shouldn't change, however)

Sample Data

Initial Data

Post Conflation Data -- note 1525 Lola Court and 1674 Myers Lane

Risks & Known Issues

TODO

Scripts

Colorado

"""
A translation function for Colorado Public Domain address data in ogr2osm

"""
import generic_addresses

PREFIX_DIR = "PreDir"
PREFIX_TYPE = "PreType"
ADDR_HOUSENUMBER = "AddrNum"
ADDR_HOUSENUMBER_SUFFIX = "NumSuf"
STREET_NAME = "StreetName"
STREET_TYPE = "PostType"
POSTFIX_DIR = "PostDir"
ZIPCODE = "Zipcode"
STATE = "STATE"
CITY = "PlaceName"
COUNTY = "County"
UNIT = "UnitNumber"


def filterTags(attrs):
    if not attrs:
        return
    tags = generic_addresses.parseAddressTags(
        attrs,
        prefix_dir=PREFIX_DIR,
        prefix_type=PREFIX_TYPE,
        addr_housenumber=ADDR_HOUSENUMBER,
        addr_housenumber_suffix=ADDR_HOUSENUMBER_SUFFIX,
        unit=UNIT,
        street_name=STREET_NAME,
        street_type=STREET_TYPE,
        postfix_dir=POSTFIX_DIR,
        city=CITY,
        state=STATE,
        zipcode=ZIPCODE,
    )
    generic_addresses.removeEmpty(tags)
    return tags

Mesa County Specific

Running the following script on the January 2020 data set produced this output: https://drive.google.com/open?id=1Sg2mRCED9PGpPjp3vQCK38JPf8In8xXY.

Please note that (a) all abbreviations should be expanded, (b) roads named "E Road", "N Road", and so on really do exist, and (c) streets and addresses really do have "1/2", "3/4", and other fractions in them.

"""
A translation function for Mesa County E911 address data in ogr2osm

"""
import generic_addresses

PREFIX_DIR = "PREFIX_DIR"
PREFIX_TYPE = "PREFIX_TYP"
ADDR_HOUSENUMBER = "HOUSE_NUMB"
ADDR_HOUSENUMBER_SUFFIX = "HOUSE_SUFF"
STREET_NAME = "STREET_NAM"
STREET_TYPE = "STREET_TYP"
POSTFIX_DIR = "SUFFIX_DIR"
ZIPCODE = "ZIP"
STATE = "STATE"
CITY = "CITY"
COUNTY = None
UNIT = "UNIT"


def filterTags(attrs):
    if not attrs:
        return
    tags = generic_addresses.parseAddressTags(
        attrs,
        prefix_dir=PREFIX_DIR,
        prefix_type=PREFIX_TYPE,
        addr_housenumber=ADDR_HOUSENUMBER,
        addr_housenumber_suffix=ADDR_HOUSENUMBER_SUFFIX,
        unit=UNIT,
        street_name=STREET_NAME,
        street_type=STREET_TYPE,
        postfix_dir=POSTFIX_DIR,
        city=CITY,
        state=STATE,
        zipcode=ZIPCODE,
    )
    generic_addresses.removeEmpty(tags)
    return tags
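
The Mesa County notes above (letter-named roads and fractions) are the reason the generic script avoids expanding one- and two-letter street names. The sketch below shows that rule in isolation. It is a simplified illustration, not the code used in the import: `LETTER_ROAD`, `is_letter_road` and `expand_street_name` are hypothetical names, and the pattern follows the generic script's assumption that letter names can be "A" or "AA" but not "AAA".

```python
import re

# Hypothetical helper, not part of the import scripts. One or two letters,
# optionally followed by a fraction, are treated as a literal road name.
LETTER_ROAD = re.compile(r"[A-Za-z]{1,2}( [0-9]+/[0-9]+)?")

DIRECTIONS = {"N": "North", "S": "South", "E": "East", "W": "West"}


def is_letter_road(street_name):
    """True for names such as "E" or "D 1/2" that must not be expanded."""
    return LETTER_ROAD.fullmatch(street_name.strip()) is not None


def expand_street_name(street_name):
    """Expand direction abbreviations unless the name is a letter road."""
    if is_letter_road(street_name):
        return street_name.strip()
    return " ".join(DIRECTIONS.get(part, part) for part in street_name.split())
```

With this rule a street name of "E" stays "E" (so it becomes "E Road" once the street type is appended), while "N Main" is expanded to "North Main".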

Generic Script

Save this as generic_addresses.py in the ogr2osm translations directory. (It should probably be added to ogr2osm proper.)

"""
A generic address parsing function
"""
import re


def parseAddressTags(
    attrs,
    prefix_dir=None,
    prefix_type=None,
    addr_housenumber=None,
    addr_housenumber_suffix=None,
    unit=None,
    street_name=None,
    street_type=None,
    postfix_dir=None,
    city=None,
    state=None,
    zipcode=None,
):
    tags = {}
    if city and city in attrs and attrs[city].strip():
        tags["addr:city"] = attrs[city].title()
    # COMB_HOUSE is HOUSE_NUMB + " " + HOUSE_SUFF
    if addr_housenumber and addr_housenumber in attrs and attrs[addr_housenumber].strip():
        addr = []
        addr.append(attrs[addr_housenumber])
        if addr_housenumber_suffix and addr_housenumber_suffix in attrs and attrs[addr_housenumber_suffix].strip():
            addr.append(attrs[addr_housenumber_suffix])
        tags["addr:housenumber"] = " ".join(addr)
    if unit and unit in attrs and attrs[unit].strip():
        tags["addr:unit"] = attrs[unit]
    if state and state in attrs and attrs[state].strip():
        tags["addr:state"] = attrs[state]
    if street_name and street_name in attrs and attrs[street_name].strip():
        address = []
        if prefix_dir and prefix_dir in attrs and attrs[prefix_dir].strip():
            address.append(translateName(attrs[prefix_dir]))
        if prefix_type and prefix_type in attrs and attrs[prefix_type].strip():
            address.append(translateName(attrs[prefix_type]))
        address.append(streetNameCasing(attrs[street_name]))
        if street_type and street_type in attrs and attrs[street_type].strip():
            address.append(translateName(attrs[street_type]))
        if postfix_dir and postfix_dir in attrs and attrs[postfix_dir].strip():
            address.append(translateName(attrs[postfix_dir]))
        tags["addr:street"] = (" ".join(address)).strip()
    if zipcode and zipcode in attrs and attrs[zipcode].strip():
        tags["addr:postcode"] = attrs[zipcode]
    return tags


def streetNameCasing(name):
    if len(name) <= 2 or (
        len(name.split(" ")) == 2
        and len(name.split(" ")[0]) <= 2
        and isDigitOrFraction(name.split(" ")[1])
    ):
        return name.strip()
    return mc_name(numberCasingInName(translateName(name, letterNames=True))).replace(
        " And ", " and "
    ).strip()

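def mc_name(name):
    """Capitalize the letter after a leading "Mc" or "Mac" ("Mcdonald" -> "McDonald")."""
    # Note: the "Mac" rule also hits ordinary words such as "Mack" ("MacK").
    returnWord = []
    for word in name.split(" "):
        # Length guards keep a bare "Mc" or "Mac" from raising IndexError.
        if word.startswith("Mc") and len(word) > 2:
            word = word[:2] + word[2].upper() + word[3:]
        elif word.startswith("Mac") and len(word) > 3:
            word = word[:3] + word[3].upper() + word[4:]
        returnWord.append(word)
    return " ".join(returnWord)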

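def isDigitOrFraction(word):
    # fullmatch requires the whole word to be digits and slashes ("25", "1/2");
    # match() alone would also accept words that merely start with a digit.
    return re.fullmatch(r"[0-9/]+", word) is not None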

def numberCasingInName(name):
    cased = ""
    number = False
    for char in name:
        if number:
            cased += char.lower()
        else:
            cased += char
        # Remember whether this character was a digit for the next iteration.
        number = char.isdigit()
    return cased


def removeEmpty(tags):
    """
    Remove empty tags
    """
    toRemove = []
    for i in tags:
        if not tags[i].strip():
            toRemove.append(i)
    for i in toRemove:
        del tags[i]


def translateName(rawname, letterNames=False):
    """
    A general purpose name expander.
    """
    suffixlookup = {
        "Ave": "Avenue",
        "Blvd": "Boulevard",
        "Cir": "Circle",
        "Cl": "Close",
        "Conn": "Connector",
        "Cres": "Crescent",
        "Crt": "Court",
        "Ct": "Court",
        "Div": "Diversion",
        "Dr": "Drive",
        "E": "East",
        "Gr": "Grove",
        "Hwy": "Highway",
        "Lane": "Lane",
        "Ln": "Lane",
        "Lndg": "Landing",
        "N": "North",
        "Pkwy": "Parkway",
        "Pl": "Place",
        "Pt": "Point",
        "Rd": "Road",
        "Rwy": "Railway",
        "S": "South",
        "Sq": "Square",
        "St": "Street",
        "Sw": "South West",
        "Trl": "Trail",
        "W": "West",
    }
    # Assume letter names can have A and AA, but not AAA
    if letterNames:
        toDelete = []
        for entry in suffixlookup:
            if len(entry) <= 2:
                toDelete.append(entry)
        for entry in toDelete:
            del suffixlookup[entry]

    newName = ""
    for partName in rawname.title().split():
        newName = newName + " " + suffixlookup.get(partName, partName)

    return newName.strip()

Further Reading

A similar import in MA: Import/Catalogue/MassGIS Addresses