User:Lonvia/GSoC 2016 Nominatim Projects

From OpenStreetMap Wiki

Dear GSoC student,

you have shown some interest in the geocoding project proposals for this year's Google Summer of Code. This page will give you a few pointers on where to start getting acquainted with the project.

Preparation for all Nominatim projects

Get familiar with OpenStreetMap

You are hopefully already familiar with the OpenStreetMap project and maybe have even contributed to the map. If not, this is the time to do so. Go through the beginners' pages on this wiki, create an account and start mapping.

As this project is about geocoding, you should in particular familiarize yourself with how addresses are mapped. Spend a few hours collecting house numbers in your neighborhood and enter them into OSM. Now play around with the search function on the main page. Find out which searches find your newly entered data and which don't. Bonus points if you can point out an example that does not work as you expect and you can explain why (for example by pointing to the relevant open issue on GitHub).
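If you would rather try your searches from a script than from the web page, a minimal sketch could look like the following. It only builds request URLs for the public Nominatim search endpoint and does not send anything. The function name build_search_url and the example queries are made up for illustration; replace the queries with the addresses you actually mapped.

```python
from urllib.parse import urlencode

# Search endpoint of the public Nominatim instance. Once your own
# installation is running, point this at that server instead.
NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_search_url(query, base=NOMINATIM_SEARCH):
    """Return a search URL asking for JSON results with address details.

    build_search_url is a hypothetical helper, not part of Nominatim.
    """
    params = {"q": query, "format": "json", "addressdetails": 1}
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder queries: try several spellings of the same address and
    # compare which ones find your newly entered house numbers.
    for query in ["12 Example Street, Exampletown",
                  "Example Street 12, Exampletown"]:
        print(build_search_url(query))
```

Open the printed URLs in a browser and compare the results for the different spellings. Please keep the number of requests to the public server small.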

Dive into the Nominatim code

Get the Nominatim source code and install your own database using an excerpt of one of the US states. You can get the OSM data from http://download.geofabrik.de/north-america.html. The Nominatim/Installation instructions are fairly detailed and should not pose many problems for anybody with a minimum knowledge of Unix system administration. If you find the process too difficult to go through on your own, then this project might not be for you.
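To find the extract for a state, the sketch below derives a download URL from the state name. It assumes that the per-state files follow the pattern north-america/us/<state>-latest.osm.pbf, so please check the result against the links on the Geofabrik download page before relying on it. The helper name state_extract_url is made up for illustration.

```python
# Assumed base path for the US state extracts on the Geofabrik server.
GEOFABRIK_US = "http://download.geofabrik.de/north-america/us"


def state_extract_url(state):
    """Build the assumed download URL for one US state extract.

    Hypothetical helper: lowercases the state name and replaces spaces
    with hyphens, e.g. "Rhode Island" -> "rhode-island".
    """
    slug = state.strip().lower().replace(" ", "-")
    return f"{GEOFABRIK_US}/{slug}-latest.osm.pbf"


if __name__ == "__main__":
    # A small state keeps the import time short.
    print(state_extract_url("Rhode Island"))
```

Picking one of the smaller states keeps the first import short, which is helpful while you are still finding your way around the installation.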

Once the installation is working, get the US Tiger house number data for 'your' US state and add it to your database. Instructions for doing this are on the installation page as well.

Got everything up and running? Congratulations. It's time to learn more about the project of your choice.

About the OpenAddresses project

The starting point for this project is also the US Tiger data, as Nominatim already implements an import process for this external data. Have a look at the documentation for Tiger data to understand its format. Then look at the import scripts, namely utils/tigerAddressImport.py, and try to understand which data is actually used by Nominatim. After that, familiarize yourself with the data of the OpenAddresses project and see what similarities and differences you can find. This should give you a better idea of the amount of work required for this project. If you get stuck in your research, feel free to join the https://lists.openstreetmap.org/listinfo/geocoding mailing list and we can discuss further details.
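A quick way to get a feeling for the OpenAddresses data is to load one of its CSV files and group the house numbers by street. The sketch below is only a starting point and not how Nominatim does or should import the data. It assumes the column layout LON, LAT, NUMBER, STREET, UNIT, CITY, DISTRICT, REGION, POSTCODE, ID, HASH, which you should verify against the files you download. The sample rows and the function name numbers_by_street are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows in the assumed OpenAddresses column layout.
# Replace this with an open file handle on a real download.
SAMPLE = """LON,LAT,NUMBER,STREET,UNIT,CITY,DISTRICT,REGION,POSTCODE,ID,HASH
-75.001,39.001,12,Example St,,Exampletown,,XX,00001,,a1
-75.002,39.002,14,Example St,,Exampletown,,XX,00001,,a2
-75.010,39.010,3,Sample Ave,,Exampletown,,XX,00001,,a3
"""


def numbers_by_street(csv_file):
    """Group house numbers by (street, postcode).

    Hypothetical helper: rows without a house number or a street name
    are skipped, because they cannot be matched to an address.
    """
    streets = defaultdict(list)
    for row in csv.DictReader(csv_file):
        number = row["NUMBER"].strip()
        street = row["STREET"].strip()
        if number and street:
            key = (street, row["POSTCODE"].strip())
            streets[key].append(number)
    return dict(streets)


if __name__ == "__main__":
    for (street, postcode), numbers in numbers_by_street(io.StringIO(SAMPLE)).items():
        print(street, postcode, numbers)
```

Once you have the numbers grouped like this, compare them with what tigerAddressImport.py extracts from the Tiger files. The differences you find are a good topic for your application.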

I hope this helps. Looking forward to your application.