From OpenStreetMap Wiki

Discuss Import/Guidelines

Declaration of consent

If people who are grabbing data and running imports are following these instructions properly (Import/Guidelines#Make sure data license is ok) then we won't have a problem, but we should perhaps formalise the process a little more, so that the Foundation has something in writing from people who have supposedly agreed to give us their data. Something along the lines of Wikipedia's 'Declaration of consent' form.

-- Harry Wood 16:09, 2 February 2009 (UTC)

Some interesting examples of this already on the wiki: Category:Authorisation_of_Use -- Harry Wood 02:15, 9 May 2011 (BST)

Are imports bad?

The first wording seems to me a bit too negative and more expressing personal taste. So I've rewritten it in a hopefully more neutral way, keeping the original idea "community and publicity is fundamental", and hoping not to rebuff potential collaborators. We should reach a consensus on this point before more editing of this section on the main page.

Many imports have been done and there is no reason to belittle those contributions. For instance the Coastline import is great and quite complementary to mapping parties (anyone wants to walk along the cliffs of the whole country to replace the coastline import? - just kidding). I think it is fine if people work on importing data if it is their preferred way of contributing. And some of them will get addicted and become regular, outdoor mappers!

Of course not everything should be imported. Importing is a community work too that can foster collaboration. Jrouquie 15:46, 10 August 2009 (UTC)

Are imports bad? Actually there is some evidence to suggest that the answer to that is "yes". I don't think they are always bad, but clearly they are not as obviously good as a lot of people seem to think they are.
When you say "And some of them will get addicted and become regular, outdoor mappers" your thinking is all back-to-front. We're talking about somebody who has the audacity to be running scripts plonking lots of data into OpenStreetMap. I expect this person to have a full and complete understanding of how the OpenStreetMap community works, how we build maps (the community way), what kind of accuracy we work towards, and what kind of work would be involved within editing software to fix up data after an import. As far as I'm concerned, anyone who is not an addicted regular outdoor mapper is absolutely not welcome to run import scripts, and this page should make that clear.
Don't get me wrong. Clearly somebody who comes along with great technical skills can contribute a lot to the project, and obviously we shouldn't turn them away. But running import scripts... that requires a good understanding of more than just how to use our API. If the text of this page puts somebody off running import scripts, then maybe we didn't want them doing it anyway.
-- Harry Wood 15:10, 19 September 2009 (UTC)

source tag

I dislike the recommendation of adding a source= tag at each node of a way during imports. This adds a lot of overhead to the data. I would find it better to recommend uploading imports with a separate user account dedicated to that import. The source of any given data would still be clearly identifiable, the resulting data in OSM would be smaller. source= tags on ways are reasonable and appreciated.

User:Bass 20:45, 20 September 2009

Unfortunately, you can't create more than one user account with one e-mail address. -- DENelson83 05:15, 5 June 2012 (BST)
The middle-way is that the Key:source tag should go on the changeset. That should happen anyway in fact. Source tag on individual objects is debatable. It adds a lot of overhead as you say.
And then we have the recommendation to create a new account for imports. Maybe this should only apply to quite sizeable imports. If it's small enough that you'd be manually checking over all the data afterwards, then a new account could be overkill. Folks who are doing sizeable imports should be few and far between really. They should be people who are very tech-savvy. Maybe it's a bit of an assumption, but often the same kind of people would also have access to a number of different email addresses (or a domain name with infinite email addresses). Certainly such people should be in communication with other tech-savvy folks in OSM, so this email restriction is surely surmountable.
...but yes, the recommendation to create a new account is somewhat troubled by this restriction. Of course it's mainly a restriction designed to stop forgetful people accidentally registering twice, when they should be keeping one account and getting a password reminder. I suppose we could allow import-type people to override the restriction somehow (a little link saying "No really, create another account with the same email"). But then another feature idea is to have a special account type for bots. If that was an explicit feature added to the rails app, it could eliminate the problem.
-- Harry Wood 12:53, 6 June 2012 (BST)
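As an illustration of the "source tag on the changeset" idea discussed above, here is a minimal sketch of what a changeset body carrying the source once, rather than on every node, might look like. It only builds the XML; it does not talk to any server. The function name and the tag values are hypothetical and chosen purely for the example.

```python
import xml.etree.ElementTree as ET


def build_changeset_payload(tags):
    """Build an XML body for a changeset that carries the given tags.

    The source is recorded once here instead of being repeated
    on every imported object.
    """
    osm = ET.Element("osm")
    changeset = ET.SubElement(osm, "changeset")
    for key, value in tags.items():
        ET.SubElement(changeset, "tag", k=key, v=value)
    return ET.tostring(osm, encoding="unicode")


# Hypothetical values, for illustration only.
payload = build_changeset_payload({
    "comment": "Example import of bus stops (hypothetical)",
    "source": "Example Agency open data (hypothetical)",
})
print(payload)
```

With this approach the individual objects need no source= tag at all; whether that is acceptable for a given data provider is a licensing question the sketch does not answer.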

reviewed tag

I'd like to suggest that all bulk import data (maybe not nodes...) have a reviewed=no tag, possibly prefixed with the source, and have JOSM and Potlatch display these differently (as with tiger:reviewed=no). It's impossible for the uploader to review all the data they are uploading and inevitably there is erroneous or outdated data. I find this functionality (with TIGER data) very useful because I can quickly identify regions which need attention and focus on those. -Oleklorenc 21:31, 12 November 2009 (UTC)

I don't find it useful. In fact I fix up quite a lot of TIGER data and don't bother changing/deleting the reviewed tag, thus making it less useful for everyone else. Why do I do this? Because a lot of TIGER fix-up is mind-numbingly tedious, but can be done quite quickly. The last thing I would want to do is add several extra clicks to the process (deleting tags) for every element I work on. In the case of TIGER it seems like just another waste of database space. When creating the TIGER Edited Map Matt found it much more useful to just look at when the data was last edited.
Mind you other people feel differently about it, and I guess other types of import are different too. We have a review tag on the NaPTAN/Import (just UK bus stops) which seems more reasonable for example. Even then though, I'd hate to think new users might be put off editing by the perception that there are complicated review processes to be followed. If a bus stop is in the wrong place, move it to the right place. That's it! You've made a useful contribution!
-- Harry Wood 11:32, 13 November 2009 (UTC)
I agree that we don't want to have a complicated review process for editing data, but I don't find selecting a few ways I've edited and deleting the reviewed tag that tedious. And, of course, those who don't want to bother with it don't.
Maybe it's just me and my OCD, but I actually enjoy reviewing TIGER data against GPS tracks and aerial imagery and removing the yellow border. I feel that the real value of the reviewed=no tag is identifying data that was not drawn by someone with specific knowledge about the data they were creating (I don't mean to put down the GIS departments that created the data...), be it from GPS tracks, imagery, etc., but came from an unreviewed bulk import. When I look for the next area of the map to work on, I assume data without the yellow border was consciously created by someone with some amount of accuracy, and I focus my attention on areas with lots of yellow borders.
The reason for bringing this topic up on the talk page is to get a sense of how different people feel about and use the reviewed tag. But the "This page has been accessed 62 times" at the bottom of this page makes me feel that this won't get much exposure... -Oleklorenc 23:23, 13 November 2009 (UTC)
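To make the "find regions which need attention" use of the reviewed tag concrete, here is a small sketch that lists the ways in an OSM XML extract still carrying tiger:reviewed=no. The sample data and the function name are made up for the example; a real extract would be far larger and would normally be read from a file.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written sample; IDs and tags are invented for illustration.
SAMPLE = """<osm version="0.6">
  <way id="1">
    <nd ref="10"/><nd ref="11"/>
    <tag k="highway" v="residential"/>
    <tag k="tiger:reviewed" v="no"/>
  </way>
  <way id="2">
    <nd ref="12"/><nd ref="13"/>
    <tag k="highway" v="residential"/>
  </way>
</osm>"""


def unreviewed_way_ids(osm_xml, key="tiger:reviewed"):
    """Return the IDs of ways whose review tag is still set to 'no'."""
    root = ET.fromstring(osm_xml)
    ids = []
    for way in root.iter("way"):
        for tag in way.findall("tag"):
            if tag.get("k") == key and tag.get("v") == "no":
                ids.append(way.get("id"))
                break  # one matching tag is enough for this way
    return ids


print(unreviewed_way_ids(SAMPLE))
```

As noted in the discussion, the tag only helps if editors actually remove it after fixing the data, so a list like this is a hint about where to look rather than a reliable measure of quality.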

dedicated account

In November 2011 the recommendation for a separate account was changed from "consider creating" to a requirement, "create". Who decided this? What was the process that led to this major change? Do we need a separate account for each dataset imported? What's the benefit when source=* is required by the original data provider? What is the benefit when hundreds or thousands of contributors are importing subsets of a larger dataset? Cquest 19:07, 16 September 2012 (BST)

I'm in favor of changing the wiki text back from "mandatory" to "recommended" regarding a new dedicated account sletuffe 13:40, 18 September 2012 (BST)
After reading several emails from the talk mailing list at [1], what I understand is that this mandatory dedicated account requirement was added with fully automated, bot-based imports in mind, in order to help isolate bad imports (in addition to easier source identification/factorisation). And I think it is a safe barrier to keep, in order to give DWG/OSMF the right to enforce this. But other semi-manual integrations/imports born from a (local) community-based consensus with other rules should be allowed to override those general rules. (IMHO, the French cadastre semi-manual integration case falls into this category).
So, let's move on. Here is my proposal for a new text replacing the Use a dedicated user account section:
== Most imports need a dedicated user account ==
For all fully automated imports, you must create a new user for the import and not use your standard OSM user account. The reasons are: ease of control, identification, information about the source, and the ability to block the account as fast as possible in case something goes wrong. Furthermore, it means that attribution can often be carried in the account's display name, or on the account's user page, which is better than putting it in a tag, as the user's editing history is a permanent record of the source and doesn't interfere with tags or increase the size of the database as much. For these reasons, creating a dedicated user account is preferable to using a source=* tag.
The Data Working Group, the volunteers in charge of detecting broken data imports, may decide to block your account if you do not respect that rule.
In the case of regionally limited imports (inside a country), it is highly recommended to get in touch with the local community to discuss your planned import and ask them whether you should, shouldn't or must use a dedicated account. In some cases other written guidelines may exist for specific kinds of regional imports, which you must follow. You should then add to your changeset comments a word or link explaining which guideline you are following, so that people monitoring edits can understand what kind of import you are doing.
Feel free to propose changes to my text, the goals being:
* Don't make a dedicated account mandatory in every case
* Give local communities more power and more trust for handling some of their imports
* Make it clear that those rules could be enforced
sletuffe 15:05, 19 September 2012 (BST)
There is nothing special about an import where the importer integrates with existing data - in fact that is the expected course of action Pnorman 21:23, 19 September 2012 (BST)
I'm sorry, I don't get what you mean. Could my proposed change be understood as a separation between "imports that integrate with existing data" and "imports that don't"? That wasn't my intention. sletuffe 10:33, 20 September 2012 (BST)
Well, it sounds like you're trying to distinguish some imports where the data is integrated with existing data. There's nothing unusual about those imports, all imports should be integrated with existing data. Pnorman 11:35, 21 September 2012 (BST)
This isn't my intention at all. What sentence in my change makes you think so? What I'm trying to distinguish is "automated or large scale (objects in more than one country) imports" from imports "limited to one country for which there was a local consensus and an import rule". Of course, what I have in mind as a concrete example is the French cadastre building import, for which a "more than guideline" is here: WikiProject_France/Cadastre/Import_semi-automatique_des_bâtiments sletuffe 11:51, 21 September 2012 (BST)
The French cadastre buildings that you use as an example are really no different than CanVec or typical imports like other European cadastre. They differ from TIGER, but TIGER was in many ways an example of how not to do an import. I'm not aware of any recent imports that ran with as little user intervention as TIGER and no one in recent memory has proposed an import where existing data contributed by users is deleted without review Pnorman 05:27, 22 September 2012 (BST)
I fail to understand what connection there is between my proposed change and "imports where data contributed by users is deleted without review". I'm not referring to those, and those do not exist anymore (as far as you can remember). Or are you implying that the French cadastre building import is one of those? If yes, then that's not true: our guidelines don't say so, and if someone does that, it's because he found that no useful data had been added to the previous import and decided it was faster to clean and upload again. That is something we still strongly discourage unless the contributor knows why he is doing it and has done all the required checks that it is not destroying any information. sletuffe 13:05, 24 September 2012 (BST)
I didn't say there was a connection between your proposed change and "imports where data contributed by users is deleted without review". In fact, I said that CanVec, French cadastre and other European cadastre differ from TIGER. CanVec, French cadastre and other European cadastre are all large scale imports that this documentation is clearly targeted at Pnorman 05:58, 25 September 2012 (BST)

Revisions underway

The DWG is looking at how to best combine the policies on imports, mechanical edits and bots into one bulk edit policy. There is nothing to show anyone at this time. Pnorman 11:37, 21 September 2012 (BST)

Is it open to suggestions "before" it becomes final ? sletuffe 11:54, 21 September 2012 (BST)
Suggestions are welcome. The main difficulty is breaking down imports between different types, e.g. traditional large imports that do not involve verification of each object with imagery or surveys (e.g. PGS, CanVec, TIGER, French cadastral, CORINE) and other types of imports (e.g. cyclestreets). Another problem is how to handle ongoing imports where the source data changes (e.g. CanVec), and how to make sure that the changes are reviewed.
Some of the interesting scenarios that have come up recently:
- The Czech community used a bot to fix problems with the borders in the area. They also used external sources of data in this process. Is this an import? Mechanical edit? Automated edit?
- I surveyed a new interchange but didn't get a usable GPS trace for all of the overpasses. I had a data source that agreed with my survey and was more complete, so I used it instead. I spent a significant amount of time making sure all the relations and tagging not in the source (e.g. bike access) were carried over. Is this an import? In this case it was CanVec so I could handle it either as an import or not, but I have used other legally compatible sources that haven't gone through any import consultation.
- There is a threshold somewhere between using (j)xapi to identify problems and fix them vs. performing a mechanical edit with (j)xapi and JOSM. Where is it?
- People frequently upload >25k objects in one changeset with JOSM. This is a bad practice because changesets this large cannot generally be retrieved from the API with a /download call. Although a long-term solution likely involves reducing the maximum changeset size or a cgimap version of changeset/#/download, should these difficult to handle changesets be allowed?
- Aside from downloading a single changeset, people also upload >25k objects at once, sometimes >100k objects. This is a problem because a connection interruption at the right time can lead to a massive number of stray nodes and/or a significant amount of work to avoid duplicate objects; however there are some cases (mainly mechanical edits simplifying ways, changing tagging or deleting objects) where this can safely be done. What rules should there be around this?
I have thoughts around some of these questions but don't have answers yet. Pnorman 05:23, 22 September 2012 (BST)
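One way to picture the alternative to a single >25k-object upload is splitting the work into smaller batches, one per changeset. The sketch below only does the splitting; the function name is hypothetical, and the batch size of 10000 is an arbitrary illustrative value, not a limit taken from this discussion or from the API.

```python
def split_into_batches(objects, max_per_changeset):
    """Split a list of objects into consecutive batches of bounded size."""
    if max_per_changeset < 1:
        raise ValueError("max_per_changeset must be at least 1")
    return [
        objects[start:start + max_per_changeset]
        for start in range(0, len(objects), max_per_changeset)
    ]


# 25000 placeholder objects stand in for the nodes and ways of an import.
pending = list(range(25000))
batches = split_into_batches(pending, max_per_changeset=10000)  # assumed size
print([len(batch) for batch in batches])
```

A naive split like this ignores dependencies between objects: a way refers to its nodes, so the nodes have to be uploaded before (or together with) the way. A real tool would have to order or group the objects accordingly, which this sketch does not attempt.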

Authorization / contributor tracking

Not sure if this is totally comparable, but something worth evaluating for tracking authorizations to use external data in OSM:

Jeffmeyer 23:42, 7 December 2012 (UTC)

The DWG uses OTRS for internal ticket-tracking and it's honestly a horrible pain and overkill. I doubt it'd be good for tracking legal permissions. Of course it's not like our current system is any better. Pnorman 23:37, 15 December 2012 (UTC)

Merging policy, guidelines, code of conduct about all types of mechanical edits

These 3 pages Import/Guidelines, Mechanical Edit Policy and Automated Edits code of conduct have lots in common, merging them somehow would help in understanding better what needs to be done in most cases of automatic/mechanical edit types. sletuffe 13:16, 14 December 2012 (UTC)

I'm not sure that imports==mechanical edits. We're doing a pretty extensive import in Seattle that I don't expect to be mechanical. Jeffmeyer 18:56, 15 December 2012 (UTC)
My idea is only to merge what is similar to avoid spreading over several wiki pages and try to make things clearer about what are current soft and hard rules. If there exists a non automated import (given that we define what is "automated"), then that "guideline" will not apply, and only the import section need to be read. As I understand it, the Import/Guidelines page is intended for automated import and not for someone hand re-drawing osm data by copying some vector data + comparing to other sources. But I think those cases are rare, most "import" in real life will be automated. sletuffe 10:12, 17 December 2012 (UTC)
Note that a related discussion is currently taking place at the DWG, maybe we could open that discussion onto the imports@ mailing list. sletuffe 10:13, 17 December 2012 (UTC)
I have an issue with merging the content of these pages insofar as one of them, the Mechanical Edit Policy, is much broader and stricter than the other (community created) pages, and I'm worried that the DWG will claim the right to decide the content of the merged page. I believe that many of the specific rules that the DWG has made up (requiring real names, imposing bureaucracy on commonplace edits such as typo fixes using JOSM's search features, and so on) are not well thought out, and there hasn't been much response to my comments regarding the issue on the talk page. On any other page, I would have long since implemented the changes, but with an "official OSMF policy" it's not so easy, is it? --Tordanik 15:40, 17 December 2012 (UTC)
I do agree with what you said, but the fact is that even if the DWG does not "claim the right to decide the content of the merged page", a page that doesn't represent what is done isn't helpful. The DWG is already imposing some policy by blocking users. Not writing down what the current practice is isn't helpful at all, because we don't know what is "allowed" and what is not. For your information, being a member of that group, I can tell you that a document is in preparation, and because of this attitude of refusing wiki changes, that policy will be written on a separate medium, inaccessible to ordinary wiki users, with less consensus. And, IMHO, this is worse, and it would be better to work all together to make a better OSM.
That includes, as a first step: writing a page describing what the DWG currently does, trying to avoid conflicting pages, and writing down which practices are recommended and which are mandatory according to that group. Once written down, here on the wiki, as a base for discussion, we could then start making it better, according to what contributors would like, and according to the DWG's experience of what has gone wrong in the past. sletuffe 16:14, 17 December 2012 (UTC)
I have one problem with your suggestion: The step "start making it better" will require the DWG to change their policies based on community feedback. This hasn't happened so far, despite the fact that there already exists a text authored by DWG members on this wiki that could serve as a base for discussion. Can you promise that this will change? --Tordanik 14:38, 18 December 2012 (UTC)
IMHO "starting to make it better" doesn't necessarily mean changing it. It can become better by just being "clearer".
And no, I can't promise that it will change, but 1) yes, I'd like it to, but I'm not alone on board, and 2) I can still promise that it will be clearer if we make it clearer ;-)
What I think, though, is that if you don't describe what is, it will be even harder to change it. (How can we change something we don't know everything about?) sletuffe 15:17, 18 December 2012 (UTC)
Here is an early draft Draft/Edit Policy sletuffe 17:16, 18 December 2012 (UTC)

For the record, I have recently merged the 'Mechanical edits policy' into Automated Edits code of conduct having discussed the change on the relevant talk pages over a period of time. As you will see from that article, imports are defined as a class of automated edit, and in my view this type of edit does justify a self-standing article. PeterIto (talk)

Proposal to move article to 'Import guidelines'

Would the title not read better as 'Import guidelines' than 'Import/Guidelines'? This would make it more consistent with other articles, most of which are now in the main namespace. If so then it would be very easy to move it. Thoughts? PeterIto (talk) 20:13, 31 May 2015 (UTC)

Given that there are other pages which are sub-pages of imports, it's probably best where it is. It's also currently in the main namespace. In any case, a change should probably be discussed on the imports@ list, where more people will read it. Pnorman (talk) 20:34, 4 June 2015 (UTC)