Draft/Edit Policy

From OpenStreetMap Wiki
That's a very difficult thing to separate, as there appears to be much disagreement (even within DWG) about what the current practices should be. I'd suggest just moving to the topic of a new policy. I think there's enough momentum to support it right now. Jeffmeyer 00:52, 27 December 2012 (UTC)
  • It is not focused on one particular type of edit; it gathers common practices and policy for automated, manual, import, or any other type of edit (while still concentrating on automated edits)
We should add a definition of "mass edits" - Jeffmeyer 00:52, 27 December 2012 (UTC)
"automated edits" is defined below, I prefere the "automated edits" wording as "mass" isn't really/only what we are targetting here. sletuffe 11:03, 30 December 2012 (UTC)
  • Its goal is also to explain why some practices are recommended and others are mandatory, and to avoid an "us against them" situation by explaining why we (the DWG) do things this way.
  • As much as we can, we should explain things with examples and in a manner everyone can understand. Trying to write a "law style" text that covers every corner case will lead people to look for flaws and start arguing over details.
All text above isn't meant to stay, it's here just for the draft period or should be moved down there sletuffe 11:51, 30 December 2012 (UTC)

The Data Working Group

The Data Working Group (data@osmfoundation.org) is authorised by the Foundation to deal with accusations of copyright infringement, bad imports, and serious Disputes and Vandalism.

Its members are volunteers trying to help protect good data from other contributors or from mass edits gone wrong, especially contributors who have the technical knowledge to modify, change, or delete things in bulk. That is why the DWG asks anyone planning to edit in an automated way to read these recommendations before proceeding.

Types of edits

Because not all edits have the same potential for creating problems, policies can be loosened or tightened depending on the case.

Rather than discussing types of data first, maybe it would help to describe sources of data & then the type of edit:
    • Sources of data: visual observation, GPS tracks, aerial imagery tracing, map tracing, external data sets, OSM itself
    • Types of edits: recording of visual observations, importing of GPS tracks, tracing of aerial imagery or maps, importing of external data sets, corrections of OSM errors
Jeffmeyer 00:52, 27 December 2012 (UTC)

Reviewed edits

I'd recommend changing the title of this section - "Low potential of problems" is not a "Type of edit" Jeffmeyer 00:52, 27 December 2012 (UTC)
What about this ? sletuffe 11:14, 30 December 2012 (UTC)
What I refer to as "problems" has been seen as unclear, so I need to find a way to express which problems I mean. This is not about tagging errors (like unconnected streets or intersecting buildings); it is about a mass of any single problem. Two intersecting buildings are not what I want to call a problem in this context, because anyone can fix that easily (just delete the two buildings). But hundreds of those problems are harder for everyone to fix, and that is what I refer to as "problems", because it was an automated edit in the first place. sletuffe 11:17, 19 December 2012 (UTC)
I'm thinking about adding something about the fact that the problems come from there being no easy (one-click) tool to revert any number of problematic edits, like the one that exists in Wikipedia for instance. This means that a "power user" who adds tons of bad data can only be reverted by other "power users". It prevents other people who know the place, and know it is a bad edit, from dealing with the "problems" themselves. That is why automated edits can become potential problems: they shift the balance of power toward those who are technically more able. sletuffe 11:39, 19 December 2012 (UTC)

Manual edits

These are the least potentially problematic edits, because they are mostly done at small* scale. Beyond the common recommendations ("don't delete valuable data," "avoid disputes and discuss," and "don't use copyrighted data as a source"), nothing special applies to them. (This is a sweeping statement. Jeffmeyer 00:52, 27 December 2012 (UTC))

* (What is "small"? Look at some of the manual bing edits done - they can be quite large. Jeffmeyer 00:52, 27 December 2012 (UTC))
What about "mostly done in small scale" ? What I want to do is define "manual", as opposed to "automated". A better option might be to state that "manual" edits are all edits that are not "automated" ? sletuffe

Manual edits are what contributors do most of the time* when they interpret and trace aerial images, import and convert their own GPX tracks, and turn their own ground knowledge into data and/or vector drawings.

* (Do we have any statistics about this? Jeffmeyer 00:52, 27 December 2012 (UTC))
I don't. That's wet finger guess, but if I watch [1] for a few minutes, that's what I see. sletuffe 11:14, 30 December 2012 (UTC)

Automated edit of previously reviewed data

When you convert a tag on all the objects you previously edited (because you changed your mind), import data that you created elsewhere, or upload data you created with a helper tool (like lakewalker), the edits are not very problematic because you don't change others' data.

Same as manual ones.

I think this should be dropped since it is a niche case, and also it is untrue that you automatically have the right to make automated edits if they only apply to "your" data - you could still mess this up technically so at least some of the guidelines apply. --Frederik Ramm 22:02, 18 December 2012 (UTC)
+0.5. What I wanted to do is avoid people arguing about what is or isn't an automated edit by defining what "automated" means in this context, and explaining that those automated edits do not need a complex policy. But you are right, they might still, rarely, go wrong technically. sletuffe 11:21, 19 December 2012 (UTC)
As a mapper, I would prefer the focus on others' data to remain in place in some manner. I don't need anybody's permission to tag something as a=b, so why should I need it to retag these objects to a:b=yes? I could have done so right away with a bit of foresight, to the exact same ultimate result. And I could have botched it then, too. --Tordanik 14:42, 2 January 2013 (UTC)

Automated edits

These edits involve neither verification of each object against some other source nor individual control of each object. (They can be imports, or changing/adding/deleting/copying tags or modifying shapes on multiple objects in one go.)

The software you plan to use for those edits is irrelevant. Whether it is JOSM or your own program, the edits are classified as "automated edits".

These edits are beyond what a human can reasonably check or review; what gets edited obeys predefined rules that you have set.

A note on imports: imports are vector data that you obtain from elsewhere and convert in order to put it into OSM. Read the imports page in addition to the following recommended/mandatory practices.

Why are they always vector data? Why not just use the strict definition of import - it is a transfer of a data set from one format to another. Jeffmeyer 00:52, 27 December 2012 (UTC)
Both my definition and yours are close; I said "vector" to make it clearer. When you say "data" in the context of OSM, it will be vector data as opposed to raster data. Your definition is much more general than mine, which is adapted to OSM. "A transfer of a data set from one format to another": if I just convert a shapefile to OSM format, that isn't an import if I don't upload it to OSM. sletuffe 11:19, 30 December 2012 (UTC)
Unclear what the type 1/type 2 distinction is good for. Description difficult to understand. --Frederik Ramm 22:04, 18 December 2012 (UTC)
Later I reuse the short description "below ~100" to make the distinction and apply tolerance. But I'm unhappy with such a definite description as "100", because the type of edit I call "type 1" needs a more complex explanation of what I meant. You once said that if someone were to import 50 buildings in his own town, no one would complain, nor would the DWG force him to create a second account, because a low quantity is human-reviewable. Even if that is still an automated edit, policy or recommendations will be relaxed so as not to bother those users too much. (But I admit it is poorly expressed and worded) sletuffe 11:27, 19 December 2012 (UTC)
Any use of numbers in the definition of an import is almost certain to end in frustration. A quick review of a recent week of top OSM contributors (https://docs.google.com/a/gwhat.org/spreadsheet/ccc?key=0As7xcVwkvHhtdHJoRzVtbzVNMmNvSjR1LWFQQm9SZ2c) shows that many are not what we would consider to be imports, such as Bing tracing, conversion of shoreline and border data, etc., HOT projects, and others. These include updates of over 5K / per day for entire weeks and months at a time. Jeffmeyer 00:52, 27 December 2012 (UTC)
This is a bot, right? Jeffmeyer 00:52, 27 December 2012 (UTC)
Could also be Overpass query + Josm for mass-retagging, not sure everyone would call this a Bot --Yvecai 11:43, 29 December 2012 (UTC)
+1 The tool used for the edit shouldn't be relevant, what is done is. (A large mess can easily be done with JOSM) sletuffe 11:23, 30 December 2012 (UTC)

Mandatory requirements

Not complying with these rules is one of the reasons that could lead to your account being temporarily blocked by the DWG.

Human Requirements

Discuss your plans

If you plan to make an automated edit, outline it beforehand and discuss it on a suitable mailing list (a regional, national or international list depending on the scope of your planned change). We do not require or recommend a formal vote, but if there is significant objection to your plan - and even minorities may be significant! - then change it or drop it altogether.

If you find that your plan is widely accepted but there are a few people opposing it, then try to get them on board by offering to make an exception for their area or for any objects last edited by them. OpenStreetMap can handle a bit of diversity; we will rather keep those people happy and contributing than overrule them and give them a reason to leave.

When your bot has a new feature, execute only a small number of edits, and get feedback on this feature before going further.

OpenStreetMap is very much built on consensus. A majority of voices on a mailing list does not give you the right to do whatever you please to the data created by the minority. Also, things that you may read in the Wiki are not a carte blanche for you to change everything so that it fits the Wiki "rules". Individual mappers have every right to tag things differently from what is stated in the Wiki, and it is not OK for anybody to turn the suggestions contained in the Wiki into strict rules that are applied automatically.

Also remember to take your time: give other contributors time to learn about your plans and argue about them. This is very useful for pinpointing things that might go wrong later.

Note that for a low number of locally limited changes it might be acceptable not to contact anyone, but doing so is still highly recommended. Not only will you get feedback, it is also less frustrating for others than discovering what you have done after you actually did it.

Technical requirements

Use a dedicated user account for imports

Create a new user for the import. You must not use your standard OSM user account. The user page for the account should be used to collate data relating to the source and contact details for the import. Furthermore, it means that attribution can often be carried in the account's display name, or in the account's user page, which is better than putting it as a tag, as the user's editing history is a permanent record of the source and doesn't interfere with tags or increase the size of the database as much. For these reasons, creating a dedicated user account is preferable to using a source=* tag. For distributed/community imports, have each person make their own import account, for example "your osm user name"_import. It is not required that each import be done under the same user account.

  • It is currently acceptable to skip this mandatory requirement when the number of objects added or modified is low (humanly reviewable in an acceptable time)

A few explanations of why

Recommended practices

Using the term "Recommended" is too vague & doesn't help clarify disputes, questions, or conflicts. Jeffmeyer 00:52, 27 December 2012 (UTC)
I'm not native speaker, what should we use ? sletuffe 11:25, 30 December 2012 (UTC)

Human recommended practices

Checklist before you run an automated edit

Please keep in mind that the current OSM database represents an extremely large amount of work by the volunteers of the OSM community. Because automated edits usually involve large amounts of data being modified or added, whether through an automated process by 1 person or a carefully curated process by a team of people, there's an increased risk of larger-scale damage to the database. Hence, it is critical that all automated edits are approached with caution and the proper amount of planning.

Here's a quick checklist to get you started on the right path. If you don't have the skills to follow that checklist, you had better not start your automated edits unless you get help from other people willing to back you up.

  1. Gain familiarity with the basics of OpenStreetMap, including editing, such as adding details of your neighbourhood. Consider following the beginners' guide.
  2. If you are going to import data, read Import and register your permissions and project by adding a line to the table at Import/Catalogue.
  3. Write a plan for your automated edit. Create a wiki page outlining the details of your plan or send an email to the appropriate forum/mailing list, see below. This plan should include information such as how to convert imported data to OSM XML, a sample of the edits you are going to make, how to divide up the work, how to handle conflation, revert plans, changeset size policies, and plans for quality assurance.
  4. Discuss your plan. This is important: many future errors can be avoided by talking with other people who might think otherwise. Email the OSM community and/or the local community concerned by your automated edits to notify them of your plans, including a link to your wiki page. You can do this with an email to imports@openstreetmap.org, talk-(your country)@openstreetmap.org, and the OSM group specific to the area directly impacted by the edits. This will help you benefit from past experience: others may already have reviewed the data you're considering for import, or may not be willing to accept the automated edit you are planning. Check for local user groups, local chapters, and country-specific mailing lists.
  5. Once consensus about your automated edit has been reached, follow your plan.
  6. Track your progress.
  7. Provide updates to the community on your efforts.
  8. If something goes wrong, warn the community and act according to your revert plan (you have one, right?).
  9. Let everyone know when you're done.
Stuff below here is not a sub-point of "imports", rather applies to all kinds of mechanical/automated edits. --Frederik Ramm 21:41, 18 December 2012 (UTC)
The checklist was copied as is, but I think it needs to be rewritten in order to move the import-specific requirements to the imports page and keep only those valid for all automated edits (automated imports included). sletuffe 11:33, 19 December 2012 (UTC)

Be cautious!

While the OSM community does encourage the ordinary user to "be bold" and just try something out or do it halfway if he is not sure, this does not apply to automated edits. Automated edits will often change a large number of objects in a large area and affect the work of many other mappers. They should be planned diligently and executed in a professional manner. If you are unable or unwilling to adhere to this code of conduct, then please consider not making an automated edit and ask someone else to do it for you.

Respect the work of others
Don't think this can be dropped; to many people it is not obvious; some would even think that official government data is certainly worth more than what some poor guy has added by hand! --Frederik Ramm 21:41, 18 December 2012 (UTC)
All right, maybe this isn't obvious to everyone, but it is true for every edit, be it automated or manual. Maybe we could move this up or elsewhere, into a list like "don't use copyrighted data as source", "don't change others' work without carefully thinking about what they've done, or ask them in case of doubt" ? sletuffe 11:33, 19 December 2012 (UTC)

Mappers are what makes OSM work. They are the ones spending countless hours out in the open collecting data, or hammering their notes into an editor and uploading data to OpenStreetMap. They take pride in their work and often have a sense of ownership for their contribution. If your script changes something that is obviously wrong, e.g. a typo where someone wrote "hihgway" instead of "highway", then it surely does them a service. However, if you start judging their work and modifying it according to what you think is right then you might actually break something that someone has put in there on purpose. Be very careful with such edits and ask yourself: "Am I absolutely sure that I'm not cluelessly interfering with something that is well thought out?"

Beware that even if 99% of your edits fix genuine typos, 1% of "false positives" (edits that harm the work of others) can make your bot unpopular. If your bot cannot be sure that it is making an obvious correction, consider contributing to quality assurance tools instead. What your bot finds could be really useful for flagging potential errors for examination by someone with local knowledge.
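
One way to keep false positives down is to let the script fix only an explicit whitelist of unambiguous typos and hand everything else to a human. The following is a minimal sketch of that idea, not an existing tool: the function name, the input format, and the `SUSPICIOUS_KEYS` set are assumptions made for illustration. Only the "hihgway" example comes from the text above.

```python
# Hypothetical whitelist: only exact, unambiguous key typos are auto-fixed.
KNOWN_KEY_TYPOS = {"hihgway": "highway"}

# Hypothetical set of keys that merely look unusual; a human should decide.
SUSPICIOUS_KEYS = {"higway", "highway_1"}


def triage(objects):
    """Split objects into safe automatic fixes and items needing human review.

    `objects` is a list of dicts such as
    {"id": 1, "tags": {"hihgway": "residential"}} (an assumed format).
    """
    fixes, review = [], []
    for obj in objects:
        tags = obj["tags"]
        for key in tags:
            if key in KNOWN_KEY_TYPOS:
                correct = KNOWN_KEY_TYPOS[key]
                if correct in tags:
                    # Typo and correct key both exist: not obvious, ask a human.
                    review.append((obj["id"], key))
                else:
                    fixes.append((obj["id"], key, correct))
            elif key in SUSPICIOUS_KEYS:
                review.append((obj["id"], key))
    return fixes, review
```

The `review` list is the part that could feed a quality assurance report instead of being edited automatically.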

Your script does not have local knowledge. It wasn't there with a GPS and a pair of eyes. Neither were you. If you believe that nonetheless you know better than the mapper who has contributed the data, the mapper might well feel a little bit offended by "some guy with a script lording it over" - especially if the mapper is outgunned because he isn't also a programmer and he cannot make changes the way you can. Use tact and sensitivity when dealing with the work of others.

Execute your plans with caution

If you run a script against the database, make sure to minimize the risk of accidentally overwriting something that has just been modified by someone else (i.e. do not simply use yesterday's planet file as a basis for uploading changed objects, as this would break changes made in the meantime). Also make sure to keep all the data you need in case you have to revert your change when something goes awry. (If you do not feel up to the task of reverting everything you have done, then don't start making changes.)
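
OSM objects carry a version number, so a script can compare the version it originally read with the version currently in the database before touching an object. The sketch below assumes the script recorded those versions and has some way to look up the current ones; the function name and both dictionaries are hypothetical, and fetching the current versions is deliberately left out.

```python
def split_stale(planned_edits, current_versions):
    """Separate edits that are still safe from those based on outdated data.

    planned_edits: dict mapping object id -> version the script read.
    current_versions: dict mapping object id -> version in the database now
                      (how this is fetched is outside this sketch).
    """
    safe, stale = [], []
    for obj_id, seen_version in sorted(planned_edits.items()):
        current = current_versions.get(obj_id)
        if current == seen_version:
            safe.append(obj_id)
        else:
            # Changed or deleted by someone else since the script read it.
            stale.append(obj_id)
    return safe, stale
```

Objects that end up in `stale` should be re-read and re-evaluated rather than overwritten.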

If you plan to make a very large number of edits (in the six-digit range or more), it may make sense to double-check with the admins (try IRC) - ask whether there is something else going on that you might interfere with, or check the Munin graphs to find out at which time the servers are not busy.

Make sure that there is some way of identifying that a certain change has been made by your script. You could create a special user account for the script, or you could add a "source", "created_by", or "note" tag or something.

Plan your changesets sensibly. If your bot creates one changeset for each edit, that becomes extremely hard to read for people. If your bot creates one changeset for a bunch of changes covering the whole planet, that, too, becomes hard to read. Changes grouped into small regions are easiest to digest for human mappers (e.g. "fixed highway tags in Orange County"). Choose good changeset comments - "Some fixes" is not a good comment!
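
As an illustration of grouping, here is a rough sketch that buckets edits by region and then splits each region into bounded chunks, each with a descriptive comment. The function name, the input format, and the `max_size` default are assumptions for this example only; how an object is assigned to a region is not covered.

```python
from collections import defaultdict


def plan_changesets(edits, max_size=500):
    """Group edits by region, then split each region into bounded chunks.

    edits: list of (region_name, object_id) tuples (an assumed format).
    max_size: hypothetical upper bound on edits per changeset.
    Returns a list of (comment, [object_ids]) pairs.
    """
    by_region = defaultdict(list)
    for region, obj_id in edits:
        by_region[region].append(obj_id)

    plan = []
    for region in sorted(by_region):
        ids = by_region[region]
        for start in range(0, len(ids), max_size):
            chunk = ids[start:start + max_size]
            plan.append((f"Fixed highway tags in {region}", chunk))
    return plan
```

Each entry in the returned plan corresponds to one changeset, neither one per edit nor one for the whole planet.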

Always remember that local knowledge beats a couch potato from central command any time!

Avoid edit warring

If someone should revert your changes, do not start a tit-for-tat reversal of edits. This only annoys everyone and blows the database history out of proportion. Seek a dialogue with the people involved, and get someone else to mediate if you cannot find a solution.

This applies to everyone who edits OpenStreetMap, including people who are "combating" automated editing.

Document what you have done, or are doing

Make sure that people who check an object's history later on can find out about your script and what it has done, exactly. Document the exact scope of the change, the geographic area where you applied it, the date and time it was run, and the number of objects affected. If you run your script on a regular basis, or even continuously, then document exactly what algorithms you are using and what triggers your script. Suitable places to document your bot might be

  • The page Automated Edits/Log
  • The www.openstreetmap.org user page for your bot account
  • A wiki page that has the name of your bot
  • or any other place if you link there from your changeset comment.

The community expects that you make this documentation available in English or in the national languages of all countries where you edit something. So if your bot runs only in Spain, then it is ok to document it in Spanish. If it runs in Spain and France and Germany then you either have to document it in Spanish, French, and German; or you can opt to write the documentation in English which we presume enough people will understand.

Problems, Complaints

It is always possible that people will be unhappy with the edit, even after extensive discussion. So be ready for this, and handle all user complaints seriously and politely. If you have followed this policy then this means your account will not be blocked right away when someone complains, but you might still have to change or stop what you're doing if people dislike your actions and / or their side-effects.

Your edit may be reverted even if you have followed this policy; this doesn't guarantee your edit will be accepted.

The Data Working Group will, on suspicion of mechanical edits not following this policy, either block the account immediately or send out a warning message (depending on how intense the editing activity is). All mechanical edits not following this policy are liable to being quickly reverted when they are discovered.

Technical recommendations

Be able to revert your modification

In case something goes wrong, don't just hope for the best. Be prepared: make sure that you, with the help of someone you talked to before your automated edit, have the technical resources to handle a revert. If you don't know how to do a revert, and don't know anyone who can help you with it, then it is a good idea to find that person first, or not to do any automated edit.
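
Part of being prepared is keeping the pre-edit state of everything the script touches. The sketch below shows one possible way to do that, under the assumption that objects are available as simple dictionaries; the function names and the data format are hypothetical. A real revert also has to cope with edits made by others after yours, which this sketch does not attempt.

```python
import json


def snapshot(objects):
    """Serialize the pre-edit state of each object so it can be restored later."""
    state = {
        str(o["id"]): {"version": o["version"], "tags": dict(o["tags"])}
        for o in objects
    }
    return json.dumps(state, sort_keys=True)


def tags_to_restore(snapshot_json, edited_objects):
    """Return {id: original_tags} for objects whose tags differ from the snapshot."""
    before = json.loads(snapshot_json)
    restore = {}
    for o in edited_objects:
        original = before[str(o["id"])]["tags"]
        if o["tags"] != original:
            restore[o["id"]] = original
    return restore
```

Writing the snapshot to a file before the first upload means the information needed for a revert exists even if the script crashes halfway.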

Add tags to your changesets

It is recommended that you add tags to describe your changesets. In future operations, this helps others know where the edits came from and why you made them, and, in case of problems, lets your edits be reverted based on those tags. Try not to mix other types of edits into the same changesets, as they would be reverted with the rest if something goes wrong.

  • a bot=yes tag is very useful in helping others review and watch these potentially risky types of edits
  • a web=http://url tag is a nice way to link to a wiki page or mailing list archive explaining why you made the edit, how, and where one can learn more or contact you.
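
For illustration, the sketch below builds a changeset description carrying these tags as XML, in the shape that, to the best of our understanding, the OSM editing API (version 0.6) expects when a changeset is created. The function name and the URL are placeholders, and the "comment" tag is an assumption added alongside the bot and web tags suggested above. Nothing is sent to any server.

```python
import xml.etree.ElementTree as ET


def changeset_payload(tags):
    """Build an <osm><changeset>...</changeset></osm> document from a tag dict."""
    osm = ET.Element("osm")
    changeset = ET.SubElement(osm, "changeset")
    for key, value in tags.items():
        ET.SubElement(changeset, "tag", k=key, v=value)
    return ET.tostring(osm, encoding="unicode")


tags = {
    "comment": "Fixed highway tags in Orange County",
    "bot": "yes",
    # Placeholder URL; point it at your own documentation page.
    "web": "https://example.org/my-bot-documentation",
}
payload = changeset_payload(tags)
```

Keeping the tags in one dictionary makes it easy to apply exactly the same set to every changeset of a run, which is what makes a later tag-based revert practical.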