upload.py

From OpenStreetMap Wiki


upload.py and its associated scripts are a set of tools for performing bulk imports for API v0.6. The utils are lower level than the Bulk_upload.pl and bulk_upload.py scripts and, as such, give the user more control over the different parts of the process. They are best used from inside a bash script.

It is currently available in SVN. The other options for importing data are Bulk_upload.pl, bulk_upload.py and a PHP version thereof, which are generally easier to use. If you don't need fine-grained control, use them instead.

If a network error occurs during an import, upload.py and friends allow the process to be resumed from a given point without creating unwanted duplicates and without reverting more than the last changes made. This avoids polluting the database with unwanted objects (which, even when deleted, remain available in the edit history).

The bulk_upload_sax.py described at the bottom of the bulk_upload.py page still needs to be used for datasets that are too big for the amount of available memory.

To check the scripts out of subversion, use

svn co http://svn.openstreetmap.org/applications/utils/import/bulkupload/

The scripts and their usage are described below.


Usage

osm2change.py

(This script is for python version 3. For python version 2, use osm2change-python2.py instead.)

./osm2change.py osm-file.osm

This converts an OSM file with some or all objects modified / added / removed, such as saved by JOSM, into an OsmChange changeset file. The resulting file doesn't contain unmodified objects - only the changes. This kind of file can be operated on by the other scripts below.
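
As a rough illustration of this conversion (not the actual osm2change.py code), the sketch below sorts the elements of a JOSM-style .osm file into create, modify and delete sections. It assumes the JOSM conventions that new objects carry negative ids and that changed objects carry an action attribute of modify or delete. The function name osm_to_change is hypothetical, and the sketch keeps the file order inside each section rather than reordering by object type.

```python
import copy
import xml.etree.ElementTree as ET


def osm_to_change(osm_xml):
    """Return an OsmChange document holding only the changed elements."""
    root = ET.fromstring(osm_xml)
    change = ET.Element("osmChange", version="0.6", generator="sketch")
    sections = {name: ET.SubElement(change, name)
                for name in ("create", "modify", "delete")}

    for elem in root:
        if elem.tag not in ("node", "way", "relation"):
            continue  # skip <bounds> and similar non-object elements
        action = elem.get("action")
        if int(elem.get("id")) < 0:
            # Assumption: negative ids mark objects that do not exist on the server yet
            if action == "delete":
                continue  # created and deleted locally, nothing to upload
            target = "create"
        elif action in ("modify", "delete"):
            target = action
        else:
            continue  # unmodified object, left out of the result
        clean = copy.deepcopy(elem)
        clean.attrib.pop("action", None)  # the action attribute is editor-only
        sections[target].append(clean)

    return ET.tostring(change, encoding="unicode")
```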

upload.py

(This script is for python version 3. For python version 2, use upload-python2.py instead.)

./upload.py [-u username-or-email] [-p password] [-c yes] [-m changeset-comment] [-s changeset-id] [-n] [-t] files.osc ...

This script in the most basic form takes an .osc (OsmChange) file and uploads it to OpenStreetMap, asking the user for any necessary information that has not been provided on the command line using the optional switches. It will create a new changeset and close it at the end if an open changeset is not specified (with -s).

Note that files are not split automatically like with bulk_upload.py, so API v0.6's 50000-element limit applies. Even with fewer elements, it's good for stability to not upload more than about 1000-5000 elements in one call to the script. (bulk_upload.py sets this limit at 1000 elements, but the splitting is automatic).
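
Since the splitting is left to the user, a quick pre-flight count can help decide whether a file should go through split.py first. The following is a minimal sketch, assuming that every node, way and relation inside the create, modify and delete blocks counts as one element. The helper names count_elements and needs_split are hypothetical, and the default limit of 5000 is simply the upper end of the range suggested above.

```python
import xml.etree.ElementTree as ET

OBJECT_TAGS = ("node", "way", "relation")


def count_elements(osc_xml):
    """Count the nodes, ways and relations in an OsmChange document."""
    root = ET.fromstring(osc_xml)
    return sum(1 for block in root for elem in block if elem.tag in OBJECT_TAGS)


def needs_split(osc_xml, limit=5000):
    """Return True when the file holds more elements than the chosen limit."""
    return count_elements(osc_xml) > limit
```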

After a successful upload, a .diff.xml is created containing the server's response, which is useful for continuing a multi-part upload or making further edits to your original .osm file. See next section for how this can be used. The format of the response is described at API v0.6.
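
To show what the response contains, here is a small sketch that reads a .diff.xml document into a lookup table. It relies on the API v0.6 diffResult format, in which each child element carries an old_id attribute and, unless the object was deleted, new_id and new_version attributes. The function name read_diff_result is hypothetical.

```python
import xml.etree.ElementTree as ET


def read_diff_result(diff_xml):
    """Map (type, old_id) to (new_id, new_version); both are None for deletions."""
    mapping = {}
    for elem in ET.fromstring(diff_xml):
        new_id = elem.get("new_id")
        new_version = elem.get("new_version")
        mapping[(elem.tag, int(elem.get("old_id")))] = (
            int(new_id) if new_id is not None else None,
            int(new_version) if new_version is not None else None,
        )
    return mapping
```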

-u and -p let you specify your OpenStreetMap user name and password for authentication (necessary for an automated / scripted process). If they are not provided, you will be asked for them after starting the script.

-m lets you specify the changeset comment if you don't want to be asked for it at run-time. Instead of providing it on the command line or typing it in, you can provide a file with the same name as your .osc file but ending in .comment instead. For example, name your OsmChange file roads.osc and name the comment file roads.comment. The file should be a text file containing nothing but the comment text. Please provide meaningful comments for your changes. Setting other changeset tags is also good practice - see more below.

Specify -n to cause the script to open a new changeset and exit without uploading anything. It will then print the changeset Id, which can be used on subsequent calls to the script. Even though nothing is uploaded, an .osc file still needs to be given and the .comment file, if present, will be used to set the changeset comment.

-s is then used to pass the Id of an open changeset to which changes are uploaded.

-c yes lets you confirm the upload if you don't want to be asked to type yes before starting the upload (necessary for an automated / scripted process).

-t (not available in upload-python2.py) tells the script to automatically retry the upload after the API returns a conflict, skipping the conflicting element. Currently only the 409 Conflict: Version mismatch type of conflict is handled. The IDs of conflicting elements are printed out for manual resolution but not stored anywhere, so it's a good idea to capture the script's output.

close.py

(This script is for python version 3)

./close.py Id

This closes an open changeset such as one opened with upload.py -n ....

set-changeset-tag.py

(This script is for python version 2)

./set-changeset-tag.py Id key1 value1 key2 value2 ...

This is used to set tags on an open changeset. The only other option for adding tags to your import is to manually edit them in the upload.py script. This script downloads all tags for the changeset, adds to them the new tags provided or overwrites their values with the new values, and writes them back to OpenStreetMap. The comment tag can be overwritten too.

split.py

(This script is for python version 3)

./split.py file.osc [number-of-pieces]

This splits an .osc file in pieces of equal size (default 2 pieces). These files can then be uploaded to OpenStreetMap taking into account the .diff.xml files (see section below). If there's a file named <file>.comment, .comment files will be created for each part, with ", part N/M" appended.
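
The idea of splitting can be sketched as follows. This is only an illustration, assuming that "equal size" means an equal number of objects and that the original order of changes is preserved. How split.py itself balances the pieces and names its output files is not shown here, and the function name split_change is hypothetical.

```python
import xml.etree.ElementTree as ET


def split_change(osc_xml, pieces=2):
    """Split an OsmChange document into `pieces` documents of near-equal size."""
    root = ET.fromstring(osc_xml)
    # Flatten to (action, element) pairs, preserving the original order
    items = [(block.tag, elem) for block in root for elem in block]
    base, extra = divmod(len(items), pieces)

    parts, start = [], 0
    for index in range(pieces):
        size = base + (1 if index < extra else 0)  # spread the remainder
        part = ET.Element("osmChange", dict(root.attrib))
        for action, elem in items[start:start + size]:
            # Open a new create/modify/delete block whenever the action changes
            if len(part) == 0 or part[-1].tag != action:
                ET.SubElement(part, action)
            part[-1].append(elem)
        parts.append(ET.tostring(part, encoding="unicode"))
        start += size
    return parts
```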

diffpatch.py

(This script is for python version 2)

./diffpatch.py file.diff.xml file.osc

This patches an OsmChange file with changes that resulted from an upload. For example, every part of a multi-part upload needs to be patched with the results of uploading the previous parts of the same split .osc file. This is because when objects are created, new Ids are assigned to them by the OpenStreetMap API in place of the id placeholders. Then if one part of the upload created nodes and these nodes are part of a way or a relation that is in another part of the multi-part upload, the id placeholder previously assigned to that node needs to be replaced with the new Id given to it by the API. See section below about how to handle multi-part uploads.

The resulting patched file is written as file.osc.diffed
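
The placeholder replacement described above can be sketched like this. It is a simplified illustration rather than the diffpatch.py implementation: it assumes the API v0.6 diffResult attributes old_id, new_id and new_version, rewrites object ids, way node references and relation member references, and leaves deleted objects untouched. The function name patch_change is hypothetical.

```python
import xml.etree.ElementTree as ET


def patch_change(osc_xml, diff_xml):
    """Apply the id and version changes from a diffResult to an OsmChange file."""
    # (type, old_id) -> (new_id, new_version), all kept as strings
    mapping = {(e.tag, e.get("old_id")): (e.get("new_id"), e.get("new_version"))
               for e in ET.fromstring(diff_xml)}

    def new_id(kind, old):
        hit = mapping.get((kind, old))
        return hit[0] if hit and hit[0] is not None else old

    root = ET.fromstring(osc_xml)
    for block in root:
        for elem in block:
            hit = mapping.get((elem.tag, elem.get("id")))
            if hit and hit[0] is not None and hit[1] is not None:
                elem.set("id", hit[0])
                elem.set("version", hit[1])
            for nd in elem.findall("nd"):  # node references inside ways
                nd.set("ref", new_id("node", nd.get("ref")))
            for member in elem.findall("member"):  # relation members
                member.set("ref", new_id(member.get("type"), member.get("ref")))
    return ET.tostring(root, encoding="unicode")
```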

osmpatch.py

Performs the same operation as diffpatch.py on a .osm file, so that it can be, for example, edited using JOSM after the changes in it have been uploaded.

change2diff.py

This converts a changeset file obtained from the OpenStreetMap server to a .diff.xml that could have resulted from uploading that changeset with upload.py. This is useful in the rare case where a network error occurs after uploading but before an API response is received. Such situations are possible and unavoidable because of the way the API calls are designed. The API normally will not commit the changes you uploaded if you disconnected (e.g. due to network error) before all of the data has been uploaded. However, it will proceed to commit the changes if you uploaded 100% of the data and are waiting for a response (the .diff.xml file) and disconnect at that point. These cases are difficult to handle because the waiting period is often many times longer than that taken to upload the data to the server. THIS SCRIPT IS NOT UNIVERSAL and will not always work - knowledge of the way the API works is necessary.

When everything else fails, revert only the last part of the upload and continue, beginning with that last part again.

smarter-sort.py

This script re-orders changes in an OsmChange file so that they appear a little more human-like and minimize the probability of conflicts during a multi-part upload.

Depending on the origin of your .osc files, the order may already be as needed. Files produced by osm2change.py, however, have a suboptimal ordering, with creations first, modifications next and deletions last. Inside creations, relations come after ways, and ways come after nodes. Inside deletions, the order is reversed. If you then split such a file, the first couple of parts will have only nodes, and the ways or relations connecting them will come later and in bulk. As a result, if another user deletes one of these new nodes before you upload the ways that use it, a conflict will occur. This is entirely realistic, for example, if a newbie user blindly follows the advice of the JOSM validator to delete what it thinks are untagged/unconnected nodes. Many other similar situations are possible.

When you use this script before splitting the big changeset, the most independent changes will appear first. A dependency tree is calculated for creations, modifications and deletions. Additionally, each deletion operation must depend on any modification and deletion that appears before it in the original file, because there may exist a dependency that is not implied by the contents of the file alone. For example, when a node is removed from a way or relation, the way or relation must first be modified to not include the given node, and the actual deletion of the node must come after it. But since only the new version of the way or relation is present in an .osm or .osc file, the dependency cannot be inferred from the contents of the file.
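
The creation part of this reordering can be illustrated with a depth-first walk that emits each new way or relation immediately after the new objects it refers to. This is only a sketch of the general idea, not the algorithm used by smarter-sort.py: it handles a single create block and ignores the extra constraints on modifications and deletions described above. The function names references and order_creations are hypothetical.

```python
import xml.etree.ElementTree as ET


def references(elem):
    """Keys of the objects that a way or relation refers to."""
    refs = [("node", nd.get("ref")) for nd in elem.findall("nd")]
    refs += [(m.get("type"), m.get("ref")) for m in elem.findall("member")]
    return refs


def order_creations(create_block):
    """Return the block's elements so that each follows the objects it needs."""
    by_key = {(e.tag, e.get("id")): e for e in create_block}
    ordered, done = [], set()

    def emit(key):
        if key in done or key not in by_key:
            return  # already placed, or an existing object outside this file
        done.add(key)  # mark first so circular relation membership cannot loop
        for dep in references(by_key[key]):
            emit(dep)
        ordered.append(by_key[key])

    # Visit ways and relations first so their nodes are pulled in next to them;
    # nodes that nothing refers to end up last
    for elem in sorted(create_block, key=lambda e: e.tag == "node"):
        emit((elem.tag, elem.get("id")))
    return ordered
```

With this ordering, a split that falls between two ways leaves each way in the same part as most of its nodes, which shortens the window in which freshly uploaded nodes sit unconnected in the database.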


Multi-part uploads

Here's an example bash script that does a multi-part upload of a file named roads.osc. It can be used as a base for a script for an individual import. Errors should be handled manually.

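#!/bin/bash

# For example: ./batch.sh roads "This is the import of all roads in Antarctica"

# Require exactly two arguments: the file name and the changeset comment
[ $# -eq 2 ] || exit 1

input=${1%.osc}
comment=$2

# Credentials passed to every upload.py call; an array keeps each word separate
ident=(-u joe.mapper@gmail.com -p secret)
parts=50

# split.py derives a .comment file for each part from this one
echo "$comment" > "$input.comment"

./split.py "$input.osc" "$parts" || exit 1

# Open a new changeset without uploading anything and capture its Id
chgset=$(./upload.py "${ident[@]}" -c yes -n "$input.osc")
[ -z "$chgset" ] && exit 1

./set-changeset-tag.py "$chgset" import yes reviewed yes source "Antarctic Bike Club"

for num in $(seq 1 "$parts"); do
        ./upload.py "${ident[@]}" -c yes -s "$chgset" "$input-part$num.osc" || exit 1

        # Patch this part and all remaining parts with the Ids the API just assigned
        for rnum in $(seq "$num" "$parts"); do
                ./diffpatch.py "$input-part$num.diff.xml" "$input-part$rnum.osc" || exit 1
                mv "$input-part$rnum.osc.diffed" "$input-part$rnum.osc" || exit 1
        done
done

# Optional: ./close.py "$chgset"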