Osmosis/Development

From OpenStreetMap Wiki

Introduction

This page contains various pieces of information on osmosis development.

Project Structure

The information about the project structure is no longer up to date: see the article from 2010 [1].

Source Code Repository

The source code of osmosis is hosted on GitHub: https://github.com/openstreetmap/osmosis.

For greatest visibility, pull requests should be made to this repository.

Top Level Directories

/ Contains top level project files including eclipse project, ant scripts, licence information, readme, etc.

/build Temporary directory created and used by the build process.

/src Contains the main application java code and resources.

/test Contains jUnit tests and supporting resources.

/bin Osmosis launch scripts. There are scripts for multiple environments, and for multiple osmosis applications.

/config Build related configuration files.

/dist Temporary directory created by the build process and used to store final distribution artifacts.

/doc All included documentation resides here. This includes the automatically generated javadoc.

/ivy A semi-permanent directory initially created by the build process to store the ivy dependency manager.

/lib All java library dependencies are stored here. Note that these are populated using the ivy dependency management tool.

/repo Any java libraries not available in public repositories. These are copied into the lib directory by the build process.

/script Various scripts included with the distribution. These primarily include database schema creation scripts.

Key Java Packages

Within the src directory, all code is organised into java packages. The key common packages are outlined in the table below.

Package Name Purpose
org.openstreetmap.osmosis All code is namespaced into this package. No code is directly stored at this level.
org.openstreetmap.osmosis.core The majority of code is nested within the core package. Code that is used by the standard osmosis command line tool, including all standard task implementations, is grouped within this package. Most code is grouped into sub-packages within this package. The top level package only contains the main Osmosis entry point and other key classes.
org.openstreetmap.osmosis.core.cli The command line parser for the standard osmosis application.
org.openstreetmap.osmosis.core.domain The osmosis data model. Data types such as Node, Way and Relation are contained in this package.
org.openstreetmap.osmosis.core.container Osmosis data model wrapper classes that are used when passing data through the osmosis pipeline. The main reason for their existence is to allow type specific code to be invoked without requiring the use of instanceof type functionality.
org.openstreetmap.osmosis.core.plugin Osmosis external plugin support.
org.openstreetmap.osmosis.core.task Most osmosis functionality is grouped into tasks that perform a single operation on osmosis data. For example, the --read-xml task reads data from an XML file and the --write-mysql task imports data into a MySQL database. This package doesn't contain task implementations; rather, it defines the interfaces that each task must implement. Some tasks consume data (and have Sink in their name), others produce data (and have Source in their name), and others are various combinations of both. Tasks may support different types of data such as standard entities, changes, and datasets.
org.openstreetmap.osmosis.core.pipeline The osmosis pipeline is implemented in this package. It provides task type specific managers which integrate each task type into the pipeline. These managers are created for each task interface in the task package that requires pipeline support.
org.openstreetmap.osmosis.core.store Osmosis provides a custom form of serialisation in order to achieve satisfactory performance when dealing with temporary data. All support for this storage mechanism is provided in this package.

Other packages typically contain task implementations. Tasks that are logically related (e.g. MySQL tasks) are grouped into a package. They rely on many or most of the packages outlined above, but are usually independent of task implementations from other packages.
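The container and task descriptions above can be illustrated with a small, self-contained sketch. All names below (PipelineSketch, Container, Processor, Sink, CountingSink) are hypothetical stand-ins written for this example and are not the actual Osmosis classes or signatures. The sketch only shows the general idea: a container dispatches itself to type-specific code so that a task never needs instanceof, and a sink-style task consumes whatever the pipeline passes to it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; these are NOT the real Osmosis classes.
final class PipelineSketch {

    // Minimal stand-ins for data model types.
    static final class Node { final long id; Node(long id) { this.id = id; } }
    static final class Way { final long id; Way(long id) { this.id = id; } }

    // Type-specific callbacks, one overload per container type.
    interface Processor {
        void process(NodeContainer container);
        void process(WayContainer container);
    }

    // A container wraps one entity and knows how to dispatch itself.
    interface Container {
        void process(Processor processor);
    }

    static final class NodeContainer implements Container {
        final Node node;
        NodeContainer(Node node) { this.node = node; }
        @Override public void process(Processor processor) { processor.process(this); }
    }

    static final class WayContainer implements Container {
        final Way way;
        WayContainer(Way way) { this.way = way; }
        @Override public void process(Processor processor) { processor.process(this); }
    }

    // A sink-style task interface: it only consumes data.
    interface Sink {
        void process(Container container);
        void complete();
    }

    // A sink that counts entities by type without using instanceof.
    static final class CountingSink implements Sink, Processor {
        int nodes;
        int ways;
        boolean completed;

        @Override public void process(Container container) { container.process(this); }
        @Override public void process(NodeContainer container) { nodes++; }
        @Override public void process(WayContainer container) { ways++; }
        @Override public void complete() { completed = true; }
    }

    // Plays the role of a source-style task feeding a sink.
    static CountingSink runExample() {
        List<Container> data = new ArrayList<>();
        data.add(new NodeContainer(new Node(1)));
        data.add(new NodeContainer(new Node(2)));
        data.add(new WayContainer(new Way(10)));

        CountingSink counter = new CountingSink();
        Sink sink = counter; // the producer only sees the Sink interface
        for (Container container : data) {
            sink.process(container);
        }
        sink.complete();
        return counter;
    }

    public static void main(String[] args) {
        CountingSink result = runExample();
        System.out.println("nodes=" + result.nodes + ", ways=" + result.ways);
    }
}
```

The producer in runExample depends only on the Sink interface, which mirrors the idea that task implementations are coupled to the task interfaces rather than to each other.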

Development Environment

Eclipse Setup

Install the latest Eclipse from eclipse.org. Any eclipse bundle with Java development support will be sufficient, but I use the Eclipse IDE for Java EE Developers. The smaller Eclipse IDE for Java Developers should also be fine. The latest version at the time of writing is version 4.2.

Install Git support. This is available via standard Eclipse repositories.

Install the 5.x version of the Eclipse Checkstyle Plugin via the Eclipse marketplace.

Import the build_support/osmosis_formatting.xml Eclipse code formatting file into Eclipse by opening Window->Preferences, selecting Java->Code Style->Formatter, and importing the file from there.

Using a standalone Git client, check out the Osmosis source to a working folder, run the "ant build" command to ensure all jar files are available, then import all projects into Eclipse.

Database Setup

Two PostgreSQL databases are required for full unit testing: an apidb database and a pgsql "snapshot" database.

Pgsql Database

  • Create a new PostgreSQL database called pgosmsnap06_test. Set the database owner to be a user called "osm" with a password of "password", or with the authorisation mechanism set to "trust".
  • Enable the hstore and postgis extensions in the database. On current versions of PostgreSQL there should be no need to run SQL scripts manually; enabling the extensions should be sufficient.
  • Enable the pl/pgsql language extension if it isn't already enabled.
  • Run the following scripts to create the schema.
    • package/script/pgsnapshot_schema_0.6.sql
    • package/script/pgsnapshot_schema_0.6_action.sql
    • package/script/pgsnapshot_schema_0.6_bbox.sql
    • package/script/pgsnapshot_schema_0.6_linestring.sql

Apidb Database

  • Create a new PostgreSQL database called api06_test. Set the database owner to be a user called "osm" with a password of "password", or with the authorisation mechanism set to "trust".
  • Enable the pl/pgsql language.
  • Run the following scripts to create the schema.
    • package/script/contrib/apidb_0.6.sql - The main schema creation script. This has been taken from a database built using the ruby rake command.
    • package/script/contrib/apidb_0.6_osmosis_xid_indexing.sql - This enables additional indexing required for efficient replication processing.

Software Packaging

Creating a Release

Before you start, ensure that your ~/.gradle/gradle.properties file is correctly configured. See the gradle.properties in the Osmosis source tree for more information. Specifically, the following properties need to be set (Note that they can also be specified by Gradle -P command line options if required):

signing.keyId={GPG key ID}
signing.secretKeyRingFile={absolute path to ~/.gnupg/secring.gpg}
signing.password={GPG key passphrase}
osmosisSigningEnabled=true
sonatypeUsername={username}
sonatypePassword={password}

Configure the Git user.signingkey setting with the ID of your GPG key so that signed tags can be created.

Follow These Steps

  1. Ensure that osmosis/src/dist/changes.txt is up to date with key changes since the last release.
  2. Ensure that the full test suite is passing within the docker-based dev environment (takes about 5 minutes):
    • ./docker.sh ./gradlew clean build
  3. Create a new signed tag and push it to the openstreetmap/osmosis GitHub repository:
    • git tag -s {major}.{minor}.{patch}
    • git push --tags
  4. Perform a release build (takes about 6 minutes):
    • ./docker.sh ./gradlew -PosmosisBuildType=RELEASE clean build
  5. Upload the two package project distribution files in osmosis/build/distributions to a new release on GitHub.
  6. Upload the artifacts to the OSS Sonatype repository (takes about 5 minutes):
    • ./docker.sh ./gradlew -PosmosisBuildType=RELEASE publish
  7. Log into https://oss.sonatype.org, open Staging Repositories, and find the auto-created staging repository. Close it, then Release it. Refer to the OSS Sonatype Usage Guide for more information.
  8. Ensure the wiki is up-to-date with the latest version.
  9. Send an email to the Osmosis development list.

Version Management

The version numbering scheme of osmosis is 0.a.b, where "a" is the major version and "b" is the minor version. The minor number "b" is optional and is only used when multiple releases are made to fix issues with a single major release.
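As a rough illustration of the 0.a.b scheme described above, the following sketch parses a version string into its major and minor parts. VersionSketch and parse are hypothetical names invented for this example and are not part of Osmosis; reporting a missing minor number as 0 is also an assumption made here for simplicity.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of Osmosis.
final class VersionSketch {

    // A literal "0.", the major number, then an optional ".minor" suffix.
    private static final Pattern VERSION = Pattern.compile("0\\.(\\d+)(?:\\.(\\d+))?");

    /**
     * Returns {major, minor}. The minor number is reported as 0 when it is
     * omitted (an assumption of this sketch).
     */
    static int[] parse(String version) {
        Matcher matcher = VERSION.matcher(version);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Not a 0.a.b version: " + version);
        }
        int major = Integer.parseInt(matcher.group(1));
        int minor = matcher.group(2) == null ? 0 : Integer.parseInt(matcher.group(2));
        return new int[] {major, minor};
    }

    public static void main(String[] args) {
        int[] parsed = parse("0.48.3");
        System.out.println("major=" + parsed[0] + ", minor=" + parsed[1]);
    }
}
```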

All releases are tagged in git. The point at which a major release is made is currently ad-hoc and chosen when one or more significant enhancements have been made and stabilised.

Coding Guidelines

Formatting

Osmosis code formatting is close to the Eclipse defaults. The main exceptions are some minor changes to add double blank lines at several points in the class file.

Follow the instructions at Osmosis/Development#Eclipse_Setup to configure Eclipse.

Checkstyle

config/osmosis_checks.xml contains the current checkstyle rules file. Only a few basic checkstyle rules are being enforced at this point due to the time required to update existing code to pass. This will be further refined over time.

Checkstyle is enforced in the standard ant build. Follow the instructions at Osmosis/Development#Eclipse_Setup to configure Eclipse.

Logging

Configuration

Osmosis uses the JDK logging facilities. This decision was made originally to reduce dependencies on external libraries such as log4j. As osmosis evolves, the library requirements are increasing, and tools such as ivy are now used to manage dependencies, so this is less of a concern than it once was. At this point, however, there is no reason to change.

The logging configuration is set up programmatically on application startup. If osmosis is used as a library, the command line launcher can be bypassed and an external logging configuration used.
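For readers unfamiliar with configuring JDK logging in code, here is a minimal sketch of one way to do it. It is not the actual Osmosis startup code; LoggingConfigSketch and configure are hypothetical names, and replacing all root handlers with a single console handler is an assumption chosen to keep the example short.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch; not the actual Osmosis startup code.
final class LoggingConfigSketch {

    /** Replaces the root logger's handlers with one console handler at the given level. */
    static Logger configure(Level level) {
        Logger root = Logger.getLogger(""); // the empty name identifies the root logger

        // Remove whatever handlers the default configuration installed.
        for (Handler handler : root.getHandlers()) {
            root.removeHandler(handler);
        }

        // Both the handler and the logger filter by level, so set both.
        ConsoleHandler console = new ConsoleHandler();
        console.setLevel(level);
        root.addHandler(console);
        root.setLevel(level);
        return root;
    }

    public static void main(String[] args) {
        configure(Level.FINE);
        Logger.getLogger(LoggingConfigSketch.class.getName()).fine("Logging configured.");
    }
}
```

An application embedding osmosis as a library could skip a step like this entirely and supply its own configuration instead, for example through the standard java.util.logging.config.file system property.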

Coding

Logger instances should be statically created at the top of source files and should look similar to the following code snippet.

private static final Logger log = Logger.getLogger(Osmosis.class.getName());

The name of the logger should always be the name of the class containing the logger.

The JDK logging provides a number of logging levels. These should be used as follows:

  • SEVERE - Used for application errors. Typically this should not be used, but an exception thrown instead. Exception logging should typically be performed at thread entry points.
  • WARNING - This should almost never be used in practice. Err on the side of caution and throw an exception, even if the error condition is recoverable.
  • INFO - Used to log information about major application events such as startup and shutdown. Information messages should not be used during normal application execution because an application logging at INFO level should be silent for the majority of application execution.
  • FINE/FINER/FINEST - The majority of application logging should use one of these logging levels. To provide some guidance on their use, FINE should be used within task implementation during major events such as start and stop, FINER should be used most of the time, and FINEST should be used for detailed logging that may produce high volumes of output.
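The level guidance above might look as follows in a task-like class. This is only a sketch: LoggingLevelsSketch and processAll are hypothetical names, and the rule that negative ids are invalid exists purely to demonstrate throwing an exception instead of logging a warning.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch of the level guidance; not taken from the Osmosis source.
final class LoggingLevelsSketch {

    // The logger is named after the class that contains it.
    private static final Logger log = Logger.getLogger(LoggingLevelsSketch.class.getName());

    static int processAll(long[] ids) {
        log.fine("Task starting."); // FINE: major task events such as start and stop
        int processed = 0;
        for (long id : ids) {
            if (id < 0) {
                // Throw rather than log a WARNING, even if the condition is recoverable.
                throw new IllegalArgumentException("Negative id: " + id);
            }
            if (log.isLoggable(Level.FINEST)) {
                log.finest("Processing id " + id); // FINEST: high-volume detail
            }
            processed++;
        }
        log.finer("Processed " + processed + " ids."); // FINER: general logging
        log.fine("Task complete.");
        return processed;
    }

    public static void main(String[] args) {
        // Exceptions are logged once, at the thread entry point.
        try {
            processAll(new long[] {1, 2, 3});
        } catch (RuntimeException e) {
            log.log(Level.SEVERE, "Execution aborted.", e);
        }
    }
}
```

With the default JDK configuration only INFO and above reach the console, so this program prints nothing on a successful run, which matches the expectation that an application logging at INFO level is mostly silent.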

Design Decisions

Entity Mutability

As of version 0.31, all entities are mutable. This change was made to simplify development and avoid the need to clone objects whenever changes are made.

Tag Storage

Tags are currently stored as a Collection of Tag objects on the Entity class. A collection is used because tags are an unordered collection. Internally the tags are stored within an ArrayList which is typically the most efficient collection type.

In 0.6, each tag is unique within its entity. For example, a highway tag can only occur once on a way. This would allow tags to be stored within a Map. This will not be done due to the overhead of calculating hashCode values for every key prior to storing in the map. This overhead can be incurred as required through the use of utility methods that return tags within a map.
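The trade-off described above can be sketched in a few lines. The classes below (TagStorageSketch, Tag, Entity) and the tagsAsMap utility are hypothetical simplifications written for this example and do not reproduce the real Osmosis classes; they only show tags held in an ArrayList-backed collection, with a map built on demand.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; simplified stand-ins for the real classes.
final class TagStorageSketch {

    static final class Tag {
        final String key;
        final String value;
        Tag(String key, String value) { this.key = key; this.value = value; }
    }

    static final class Entity {
        // Appending to a list never hashes the key.
        private final Collection<Tag> tags = new ArrayList<>();
        Collection<Tag> getTags() { return tags; }
    }

    // Utility that pays the hashing cost only when a caller needs keyed lookup.
    static Map<String, String> tagsAsMap(Collection<Tag> tags) {
        Map<String, String> map = new HashMap<>();
        for (Tag tag : tags) {
            map.put(tag.key, tag.value);
        }
        return map;
    }

    static Entity exampleWay() {
        Entity way = new Entity();
        way.getTags().add(new Tag("highway", "residential"));
        way.getTags().add(new Tag("name", "Main Street"));
        return way;
    }

    public static void main(String[] args) {
        Map<String, String> map = tagsAsMap(exampleWay().getTags());
        System.out.println("highway=" + map.get("highway"));
    }
}
```

Tasks that simply pass entities through the pipeline iterate the collection and never build the map, so only tasks that look tags up by key incur the extra cost.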

Entity Minimal Data

Entities do not contain optional attributes; they only contain the essential attributes that are always populated. This keeps the osmosis core simple and easy to understand. Adding optional attributes adds considerable complexity to task implementations and affects reliability.

It is possible to extend the entity classes within task implementations, but core osmosis tasks won't support the additional attributes and may not propagate them through the pipeline.
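The following sketch shows why an extended attribute may be lost. Every name in it (EntityExtensionSketch, Node, ElevatedNode, shiftNorth) is hypothetical and heavily simplified; shiftNorth stands in for any core-style task that rebuilds an entity from its essential attributes only.

```java
// Hypothetical sketch; the classes are simplified stand-ins, not the Osmosis data model.
final class EntityExtensionSketch {

    // Only the essential, always-populated attributes.
    static class Node {
        private final long id;
        private final double latitude;
        private final double longitude;

        Node(long id, double latitude, double longitude) {
            this.id = id;
            this.latitude = latitude;
            this.longitude = longitude;
        }

        long getId() { return id; }
        double getLatitude() { return latitude; }
        double getLongitude() { return longitude; }
    }

    // A task-specific extension carrying one additional attribute.
    static final class ElevatedNode extends Node {
        private final double elevation;

        ElevatedNode(long id, double latitude, double longitude, double elevation) {
            super(id, latitude, longitude);
            this.elevation = elevation;
        }

        double getElevation() { return elevation; }
    }

    // A core-style task that only knows about Node: the result is a plain Node,
    // so any additional attribute on the input is silently dropped.
    static Node shiftNorth(Node node, double degrees) {
        return new Node(node.getId(), node.getLatitude() + degrees, node.getLongitude());
    }

    public static void main(String[] args) {
        Node shifted = shiftNorth(new ElevatedNode(1, 10.0, 20.0, 350.0), 0.5);
        System.out.println("still elevated: " + (shifted instanceof ElevatedNode));
    }
}
```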