Address Data Import for Athens-Clarke County






Address Data Import for Athens-Clarke County is an import of the Athens-Clarke County Open Data address point dataset, covering Athens-Clarke County, Georgia, United States. As of June 26, 2022, the import is in the last stages of Q&A.

Goals

The goal of this import is to augment existing address data in Athens-Clarke County.

Schedule

Planning for the import began in the winter of 2021, and the import is scheduled to be completed around the end of July 2022.

Import Data

GitHub Project Page

The GitHub project page for the address import is located here.

Background


Data source site: https://data-athensclarke.opendata.arcgis.com/
Data license: the license is stated on the ACC OpenData dataset page (https://data-athensclarke.opendata.arcgis.com/maps/AthensClarke::acc-address-point/about); the exact license terms are at https://creativecommons.org/licenses/by/4.0/
Type of license (if applicable): CC BY 4.0.
Link to permission (if required): e.g. link to mail list reference url - http://lists.openstreetmap.org/pipermail/imports/2012-December/001617.html

OSM attribution (if required): http://wiki.openstreetmap.org/wiki/Contributors#yourdataprovider
ODbL Compliance verified: Yes.

OSM Data Files

The .osm file for import is here.

Import Type

This is a one-time import and will be done semi-automatically using QGIS, RStudio, and JOSM. The addresses of Athens will be added semi-mechanically in JOSM with its OpenData and Conflation plugins. This will take time, but it allows the addresses to be mapped carefully and consistently.

Communication

Emails for Receiving Appropriate Permission and Licensing

Ian Van Giesen <ianvangiesen@gmail.com> Sat, Sep 19, 2020 at 12:03 PM
To: joseph.dangelo@accgov.com

Hello Joseph Angelo,

Hope all is well with you.

My name is Ian Van Giesen and I am a current resident and student in Athens. I am also an avid mapper of Athens-Clarke County through OpenStreetMap (OSM), a project to create and distribute free geographic data for people across the globe. To my knowledge it is the largest open-source, volunteer-managed alternative to other large GIS and map providers in the world. I have spent a good deal of time the past 5 months greatly improving the quality of the map around the Athens-Clarke County area, having added several thousand new building outlines that have enhanced the quality of the map around the county.

I am emailing you because I am at a point now where I am interested in adding addresses to all the buildings and houses that I have added in. I recently became aware of the ACC Open Data portal which appears to provide a great deal of useful information, including address points. As far as I can tell, the ACC Address Point data set would be perfect for importing en masse all the address points needed to get a majority of addresses added into OSM.

My question for you is whether or not there are any copyright restrictions on the data or if the data is in the public domain.

In any case I look forward to your response to this inquiry.

Thank you very much,

Ian

Joseph D'Angelo <Joseph.D'Angelo@accgov.com> Mon, Sep 21, 2020 at 8:29 AM
To: Ian Van Giesen <ianvangiesen@gmail.com>

Good morning Ian,


Thank you very much for your inquiry, your request, and most of all your contributions to OSM. We strive to offer as many up-to-date, clean, and accurate open datasets as possible for just the kind of purpose you’re describing. Please feel free to use any information you find on the portal in any way you see fit. I am glad you asked!


Also available via the portal are all known building footprints. They typically run about a month behind new construction, and like the addresses, are available to all.


Sincerely,


Joseph D’Angelo

Emails to Import Listserv for Communicating Intent

Ian Van Giesen <ianvangiesen@gmail.com> Wed, May 18, 2022 at 9:34 AM
To: imports <imports@openstreetmap.org>

Good morning all,

I am sending this email to an import I began organizing a year and a half ago, that I am finally getting back to.

I have the data filtered, transformed and ready to import (as described on the import wiki page here: https://wiki.openstreetmap.org/wiki/Address_Data_Import_for_Athens-Clarke_County#Import_Data). I will add the .osm later today to the wiki page once I have access to my computer again.

If anyone has any questions or comments please let me know. I will wait about a week or two, before I start in earnest.

Thanks everyone for their time and effort.

Best,

Ian

Data Preparation

Data Reduction & Simplification

To reduce the amount of data I need to import, I will filter and transform the data to remove records that are not active, as well as records that would require manual checking during conflation to confirm they are reasonable addresses to add.

As there are a number of existing addresses in Athens-Clarke County, I will use the conflation plugin in JOSM to speed up the import process for nodes that do not match any other existing address tags in the county. The conflation plugin will also help me avoid adding addresses that already exist.

Tagging Plans

To map source attributes to OSM tags, I have removed all tags that do not have common counterparts in the OSM community in my area. For example, I decided to use addr:unit instead of other possibilities (e.g. addr:flats) to describe how unit numbers will be mapped, as it seemed more customary and appropriate in Georgia and in the U.S.

I plan on only having the necessary addr:*=* key-value pairs present in the OSM file before upload.


I will use the following tags for the addresses I import:

addr:housenumber=*

addr:unit=*

addr:street=*

addr:postcode=*

addr:city=*

ref:athensclarkeaddress=*

Changeset Tags

I will use the following to track the changesets associated with this import:

I will also be importing under the account and username IanVG_Import, to keep the changesets in a separate account.

The link to this wiki page will also be added under source, in order to link curious and interested mappers to this page.

Data Transformation

Brief overview of my data transformation process

  1. Download .csv file from ACC Open Data website.
  2. Transform and filter data in RStudio using an R notebook script.
  3. Load into JOSM and filter out key-value pairs with *="NA" values.
  4. In JOSM use Overpass-Turbo to download all objects in Athens-Clarke County.
  5. Filter out objects that do not have an addr:*=* key in them.
  6. Use the conflation plugin in JOSM to search for matching addr: tags and conflate data that is more complete and/or accurate.
    1. This process will also add the ref:athensclarkeaddress ID data to addresses, which could make future imports easier.

The transformation I am performing on the .csv file containing the address data includes:

DELETING THE FOLLOWING COLUMNS:

  • OBJECTID
  • ParcelID
  • FullAdd
  • HouseNum
  • HouseNumEx
  • PreDir
  • PostDir
  • FullStreetName
  • UnitType
  • Building
  • POSTALCITY
  • State
  • AddClass
  • AddType
  • AddStart
  • AddEnd
  • EditDate
  • JOINID
  • GlobalID

THEREBY KEEPING THE FOLLOWING COLUMNS:

  • X
  • Y
  • AddressID
  • FullHouseNum
  • FullHouse
  • StreetName
  • StreetType
  • Unit
  • Floor
  • City
  • Zipcode
  • Status
  • Anomoly
  • Comments

FILTERING THE FOLLOWING FIELDS:

  1. Include only 'Active' from the Status column and delete the rest.
    • Still waiting for response on the meaning of 'Retired', 'Reserved', and 'Potential'
    • 2214 rows do not have the 'Active' status.
  2. Potentially only including blanks from 'Anomoly' column
    • For now I am including only rows where the 'Anomoly' column is blank or NA (just ""). Later, I will go back and individually add in the addresses that have a value in the 'Anomoly' column.
    • 55 rows are not NA or blank.
    • Saving rows with anomoly values in ACC_Address_Other.csv
  3. Potentially only including blanks from 'Comments' column
    • I deleted all entries (rows) with any comments. I may go back in later to update these addresses. Some of the comments indicated that the address was soon to change or that it needed to be merged with another.
    • 4990 rows with comments.
    • Saving rows with comments in ACC_Address_Other.csv
  4. Street type: "er" for 250 Oglethorpe Er, not sure what this means, I deleted it
    • 1 row with 'er.'
    • Saving rows with "er" values in ACC_Address_Other.csv
  5. Filter out FullHouseNum with values of 0.
    • 693 rows with 0.
    • Saving rows with 0 in ACC_Address_Other.csv
  6. The filtered csv had 332 addresses that had a duplicate. I used the distinct() function to remove one address from each of those duplicate addresses (thereby removing 332 addresses).
    • Saving the duplicate addresses as well in ACC_Address_Dup.csv
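
The filtering steps above could be sketched in Python roughly as follows. The actual work was done in an R notebook, so this is only an illustration: the function names `is_blank` and `split_rows` are hypothetical, the exact spelling of cell values (such as 'Active' or 'Er') is assumed, and the definition of a duplicate (same house number, street name, street type, unit, and zipcode) is an assumption rather than what distinct() was actually given. The column names are the ones listed above.

```python
def is_blank(value):
    """Treat None, empty strings and the literal 'NA' as blank."""
    return value is None or value.strip() in ("", "NA")


def split_rows(rows):
    """Split rows into (kept, other, duplicates) following steps 1-6.

    `rows` is a list of dicts keyed by the CSV column names, with
    string values (as produced by csv.DictReader).
    """
    kept, other, duplicates = [], [], []
    seen = set()
    for row in rows:
        # Step 1: rows that are not 'Active' are dropped entirely.
        if row.get("Status") != "Active":
            continue
        # Steps 2-5: rows that need manual review are set aside.
        needs_review = (
            not is_blank(row.get("Anomoly"))
            or not is_blank(row.get("Comments"))
            or (row.get("StreetType") or "").strip().lower() == "er"
            or (row.get("FullHouseNum") or "").strip() == "0"
        )
        if needs_review:
            other.append(row)
            continue
        # Step 6: keep the first copy of each address (assumed key).
        key = (row.get("FullHouseNum"), row.get("StreetName"),
               row.get("StreetType"), row.get("Unit"), row.get("Zipcode"))
        if key in seen:
            duplicates.append(row)
        else:
            seen.add(key)
            kept.append(row)
    return kept, other, duplicates
```

In this sketch, `other` and `duplicates` play the role of ACC_Address_Other.csv and ACC_Address_Dup.csv, so nothing that was set aside is lost.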

TRANSFORMING THE FOLLOWING COLUMN NAMES:

  • AddressID -> ref:athensclarkeaddress
    • Depends, likely do not include if mechanical update process is used. Waiting for clarification on guidance on this issue.
  • HouseNumber -> addr:housenumber
  • StreetName -> addr:street
  • Unit -> addr:unit
  • Floor -> level
  • City -> addr:city
  • Zipcode -> addr:postcode
  • Changing the Floor column to "level", as the floor is not officially part of the complete address for the data points.
    • Changed BASEMENT to -1 (as per the level page).
    • Subtracted one from each level row with a numerical value (e.g. changed all the 1's to 0's).
      • This is in accordance with the level page: -1 is for basements, 0 for ground level, and 1 for the first level above ground level.
    • Deleted the field (but not the row) for the ".." level value.
    • Changed the one BOTTOM value to -1.
    • Changed the 396 "GROUND" values to 0.
    • Delete rows with the values "D" and "X" and "SIDE"
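
The Floor-to-level rules above can be written down as one small function. This is a minimal Python sketch, not the R code that was actually used; the name `normalize_level`, the `DROP_ROW` sentinel, and the handling of values not mentioned in the list (the tag is simply omitted) are assumptions.

```python
DROP_ROW = object()  # sentinel: the whole row should be removed


def normalize_level(floor):
    """Map a raw Floor value to an OSM level string.

    Returns a string for the level tag, None when the tag should be
    omitted but the row kept, or DROP_ROW when the row should be removed.
    """
    value = (floor or "").strip().upper()
    if value in ("", "NA", ".."):
        return None
    if value in ("BASEMENT", "BOTTOM"):
        return "-1"
    if value == "GROUND":
        return "0"
    if value in ("D", "X", "SIDE"):
        return DROP_ROW
    if value.isdecimal():
        # Source floors count from 1; OSM levels count from 0.
        return str(int(value) - 1)
    return None  # values not covered by the rules: omit the tag (assumption)
```

Keeping "omit the tag" and "drop the row" as separate return values mirrors the distinction the list makes between the ".." value and the "D", "X", and "SIDE" values.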

IMPORT .CSV INTO JOSM:

  • I will use the West Georgia NAD-1984 projection as specified in the original file on the ACC OpenData website.

FILTERING THE FOLLOWING IN JOSM:

  • Filter out addr:unit=NA
  • Filter out level=NA
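
As an alternative illustration of the NA filtering, the placeholder values could be dropped at the moment the .osm file is written, so they never reach JOSM. The sketch below is hypothetical and is not part of the workflow described here: `build_osm` and the generator string are made-up names, and the coordinates are assumed to have been converted to WGS84 latitude and longitude beforehand.

```python
import xml.etree.ElementTree as ET


def build_osm(points):
    """Build an OSM XML document from (lat, lon, tags) tuples.

    Tags whose value is blank or "NA" are left out, mirroring the JOSM
    filtering step. Coordinates are assumed to be WGS84 already.
    """
    root = ET.Element("osm", version="0.6", generator="acc-address-sketch")
    for index, (lat, lon, tags) in enumerate(points, start=1):
        node = ET.SubElement(
            root, "node",
            id=str(-index),  # negative id: object not yet on the server
            lat=f"{lat:.7f}", lon=f"{lon:.7f}",
        )
        for key, value in sorted(tags.items()):
            if value is None or str(value).strip() in ("", "NA"):
                continue  # skip placeholder values such as addr:unit=NA
            ET.SubElement(node, "tag", k=key, v=str(value))
    return ET.tostring(root, encoding="unicode")
```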

Data Transformation Results

Post a link to your OSM XML files.

Data Merge Workflow

Team Approach

I'll be working solo on this project. I am reaching out to others regarding this import, but likely only IanVG will be involved. I have reached out to Randal Hale about bringing him back onto the project; he was the one who initiated the data import originally.


I will create a new account, IanVG_Imports, to facilitate the import.

Workflow

  • Load shapefile into JOSM using the opendata plugin.
  • Use the automatic download-as-I-move feature to download areas of interest
  • Use Overpass Query wizard to run:
    • "addr:housenumber" = * OR building=* in "Athens-Clarke County"
  • Use filter tool to filter out everything except:
    • Buildings
    • Nodes and ways with addresses

  • In addition filter out "type=relation" kinds of objects as the conflation plugin cannot handle them
  • Use the conflation plugin (as specified below) to conflate the right points into the map.
    • Use: Disambiguating Method with Centroid Distance set to 10.0
  • Manually go about importing and merging (when possible) the address nodes with building ways
  • Merge without Warning: addr:city, addr:postcode, addr:street, level, ref:athensclarkeaddress
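
To give a feel for what matching by centroid distance means, here is a rough Python sketch that pairs an address point with the nearest building whose centroid lies within a threshold. It is not the conflation plugin's algorithm: the function names are hypothetical, the 10.0 threshold is assumed to be in metres, and the centroid is approximated by averaging the vertices of the building outline.

```python
import math

EARTH_RADIUS_M = 6371000.0


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def centroid(vertices):
    """Average of the vertex coordinates (closing node not repeated)."""
    lats = [lat for lat, _ in vertices]
    lons = [lon for _, lon in vertices]
    return sum(lats) / len(lats), sum(lons) / len(lons)


def nearest_building(point, buildings, max_distance_m=10.0):
    """Return the id of the closest building within range, or None."""
    best_id, best_dist = None, max_distance_m
    for building_id, vertices in buildings.items():
        c_lat, c_lon = centroid(vertices)
        dist = distance_m(point[0], point[1], c_lat, c_lon)
        if dist <= best_dist:
            best_id, best_dist = building_id, dist
    return best_id
```

Address points for which this returns None would be the ones left for manual placement.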

Conflation

Semi-manual process in JOSM using the Conflation plugin. Because some plugin updates had not yet been released through JOSM (here), I downloaded a custom .jar file of the conflation plugin. I will upload this to the GitHub page for this project (here). I will use the conflation plugin with the following settings applied:


tags: addr:housenumber, addr:unit, addr:flats, addr:city, addr:postcode, ref:athensclarkeaddress

Select option to merge all tags.

The conflation plugin will confirm that no existing ways or nodes already have identical address data.
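
The idea of checking for identical address data can be illustrated outside JOSM with a short Python sketch. This is not how the plugin works internally; `address_key` and `new_addresses` are hypothetical helpers, and comparing only house number, street, and unit (case-insensitively, with whitespace collapsed) is an assumption.

```python
def address_key(tags):
    """Normalised (housenumber, street, unit) tuple for comparison."""
    return tuple(
        " ".join((tags.get(key) or "").lower().split())
        for key in ("addr:housenumber", "addr:street", "addr:unit")
    )


def new_addresses(candidates, existing):
    """Return candidate tag dicts whose address is not already mapped."""
    known = {address_key(tags) for tags in existing}
    return [tags for tags in candidates if address_key(tags) not in known]
```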

There are a few dozen relations in Athens that have address data associated with them. Those will be handled on a case-by-case basis.



This query works as expected! Remember to set the bbox using the Nominatim search for a bounding box around an area. Make sure to select the Athens-Clarke County administrative boundary.

[out:xml][timeout:90][bbox:{{bbox}}];
(
  nw[~"^addr:.*"~"."];
);
(._;>;);
out meta;
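
Note that {{bbox}} is an Overpass Turbo shortcut; if the same query is sent to the Overpass API from a script, the bounding box has to be written out in the order south, west, north, east. A minimal Python sketch of that substitution follows. The names `QUERY_TEMPLATE` and `build_query` are hypothetical, and any coordinates passed in are the caller's own.

```python
QUERY_TEMPLATE = """[out:xml][timeout:90][bbox:{south},{west},{north},{east}];
(
  nw[~"^addr:.*"~"."];
);
(._;>;);
out meta;
"""


def build_query(south, west, north, east):
    """Fill the bounding box into the address query.

    The Overpass API expects the order south, west, north, east.
    """
    if south >= north:
        raise ValueError("south must be smaller than north")
    return QUERY_TEMPLATE.format(south=south, west=west, north=north, east=east)
```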

Make sure to unselect the untagged nodes using the search function (see picture for the search query to do this). I also had trouble using the conflation plugin, and it turned out that I had duplicate nodes in the subject layer (i.e. in OSM). I will need to fix these with the validator before I can import.
[Images: Conflation Plugin Example Level 3.JPG; ACC Address Import Search Settings.png; Filter Plugin Setting.jpg]

QA

No QA plan.

See also

The email to the Imports mailing list was sent on 2022-05-18 and can be found in the archives of the mailing list at [1].

Failed Queries and Lessons Learned

In QGIS

Previously (sometime in 2021) I used QGIS to do some filtering and transformation of the data, but I found it to be less replicable and more time-consuming than simply writing a script in RStudio. I include this section as a lesson learned.


The following two Overpass queries do not work as expected.

[out:xml][timeout:90][bbox:{{bbox}}];
(
  nw[~"addr"~"in"]/*athens-clarke*//*county*/;
);
(._;>;);
out meta;

[out:xml][timeout:90];
{{geocodeArea:Athens-Clarke County}}->.searchArea;
(
  nw["addr:housenumber"](area.searchArea);
  nw["building"](area.searchArea);
);
(._;>;);
out meta;


Old Building Footprint Import Discussion

Location: Athens-Clarke County, Georgia, USA

About: We received a data donation from the GIS department of Athens-Clarke County. Out of all the data, the most useful for our efforts is the building footprint data. That data layer appears to contain about 55,688 polygons. Some are multi-polygons and will have to be dealt with individually.


Plan: We consider this import to be large (at least for us).

1. Segment the county into a grid. Since we want to do a neighborhood at a time we are using the census to create a "fluid" grid.

PDF Showing Grid

2. Data will be separated into approximately 75 projects, since 75 grids were created. That gives us a manageable upload size, since the census will have smaller grids in more densely populated areas. Larger grids will be in the rural areas.

3. We plan on using the open data tool with JOSM. It gives us a chance for validation checks before an upload. We also plan on testing with ogr2osm.

TAGS that will be used: building=yes and source=ACCGIS

If the opportunity arises we may do some classification by census block into houses, commercial, residential, shed, etc.