User:Tuxayo/Automated edits code of conduct and DWG: Mailing list discussion summary and proposals

Introduction

This page started as an attempt to summarize some of the points discussed on the mailing list and to make proposals for progressing on some of the issues raised. Hopefully, that way, the discussion won't be forgotten and buried in the mailing list.

It got heavily mixed with my own view of the issues, so many things here are subjective and opinionated.

The discussion started here: https://lists.openstreetmap.org/pipermail/talk/2016-July/076291.html

List of messages in chronological order: https://lists.openstreetmap.org/pipermail/talk/2016-July/subject.html

Objectives

Everyone wants long-term data quality. The disagreements are about which trade-offs to make on automated edits.

Strictness of the rules

They are guidelines and shouldn't be enforced strictly as if they were laws; that would create more issues than it solves.

This is less clear for the "discuss your edit" point, due to its importance. It could be enforced like a rule (see proposal 4), but that's mostly my opinion.

Effectiveness of automated edits

An interesting point has been made by Nicolás Alvarez: for some automated edits, such as fixing typos or leading and trailing spaces, doing the fixes manually allows one to find and fix a significant number of other issues. The principle is to use automatically detectable errors to spot related errors that are more difficult to find.

That might not apply to, for example, reformatting phone numbers or adding https:// or http:// to URLs. [a]

So for the first type of automated edits, it might be more effective to use tools like MapRoulette to fix both the main (automatically detectable) errors and the related (not automatically detectable, or not easily) errors, while keeping a reasonable efficiency in fixing these issues. That would be a good balance between DB quality and time spent. It would also make it possible to work on errors that cannot be detected without false positives, and thus to tackle more complicated ones.

In any case, the second type of automated edits [a] is still relevant, so accepting automated edits and doing them well is still necessary.
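
To make the distinction between the two types concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the function names, the tag dictionary and the choice of https as the default scheme are hypothetical and do not come from the mailing list discussion or from any existing OSM tool. The first function only flags objects for manual review (first type); the second one rewrites a value mechanically (second type [a]).

```python
import re

# Hypothetical helper: first type of error. It only *detects* values with
# leading or trailing whitespace so that a human can review the whole
# object (e.g. through a MapRoulette-style task) and spot related issues.
def find_whitespace_issues(tags):
    """Return the keys whose value has leading or trailing whitespace."""
    return [key for key, value in tags.items() if value != value.strip()]

# Hypothetical helper: second type of edit [a]. A purely mechanical
# rewrite that is unlikely to reveal other errors on the object.
# Assumption: https is used as the default scheme; a real edit would need
# to check which scheme the site actually supports.
def add_url_scheme(value, default_scheme="https"):
    """Prepend a scheme to a URL value when it has none."""
    if re.match(r"^[a-zA-Z][a-zA-Z0-9+.-]*://", value):
        return value
    return f"{default_scheme}://{value}"

# Example tags (made up for illustration).
tags = {"name": " Cafe Central", "website": "example.org"}
to_review = find_whitespace_issues(tags)
fixed_website = add_url_scheme(tags["website"])
```

With the example tags above, `to_review` contains only `name`, which would be handed to a human, while `website` is rewritten without any review of the rest of the object.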

Issues

The DWG is often the last stage of the AE issue handling process, so they are forced to either:

  1. Drop the case and leave the errors/data loss in place
  2. Handle it even when there is not enough time to do so without causing issues

Depending on the moment or the person, there is not enough time to systematically:

  • Do an adequate study of the changeset to avoid collateral damage when reverting
  • Dialogue with the contributor to fully understand their motivations and mistakes
  • Dialogue with the contributor to explain why, for example, discussing the edit in advance is important, since someone optimistic and inexperienced might underestimate the risk of damage.

The consequences are:

  • Disputes
  • Controversial reverts
  • Questioning of the DWG's legitimacy
  • Time loss
    • For the contributor, who has put hours of work into executing their problematic AE (or other edits, in case of collateral damage) and who might often give up when they could have fixed the AE and still been able to help clean the database.
    • For the DWG member handling the case, when endless arguments happen after an overly controversial decision.
  • Motivation loss
    • For the contributor
    • For the DWG member, who spends a lot of their time in arguments/disputes and has to act like a policeman.

Few people are in the DWG, so time is extremely limited. This is due to the blocking power that comes with being in the DWG, which is not necessary to handle AE issues. Someone else could handle them and report to the DWG if blocking is needed.

Moderation transparency issues

One can't see all the AE cases handled by the DWG so:

  • We only see the cases that are controversial enough for someone to "publicize" them. So the rest of the community can't judge the DWG's overall work and only sees the problems.
  • We can't see how much work is done, which would show how hard it is to handle all the cases correctly.
  • It's harder to understand the AECoC, as it's the result of years of handling these issues: one must see a lot of cases to get the big picture, not just the controversially handled ones.
    • As a consequence, it seems much harder to make proposals or edit the AECoC without being in the DWG, as one would lack many elements.

Proposals

These are some independent ideas.

1. Have a Wiki page that lists all the automated edits

Contributors doing an AE should add a link to their changesets (or to the related ML/forum discussion or wiki page). But anyone finding an AE could add it there.

It would make the current practices and tools used for AEs visible, which would be useful for someone preparing an AE (in addition to the AECoC and other wiki pages) as well as for people wanting to review automated edits.

2. Have a Wiki page that lists all the automated edit issues

This would include all the DWG's interventions, but not only those.

It could be used to publicly report issues without sending them to the DWG. It would then be an intermediate layer allowing other contributors to handle these issues and share the work with the DWG.

3. A public mailing list dedicated to AE issues to offload the DWG

3.1 Motivations

Because DWG membership implies moderation rights, the responsibility that comes with it restricts the number of possible members. This prevents some contributors who have the time, willingness and skills from participating, even though almost all of the tasks in the AECoC issue handling process can be done without moderation rights.

AE issues could be reported on this list so that anyone could contribute to handling them.

Hopefully this will lower the workload of DWG members and allow more time to be spent on each case.

The DWG would no longer be the only group handling AE issues. They are already the last step of the process so at least we could try to not make them the first.

That would also bring the benefits of proposal 2.

3.2 Steps to set this up
  • Draft guidelines for handling automated edit issues, to encourage contributors interested in QA to try the "meta QA" level.
  • Decide the scope: it could include generic bad edit issues because:
    • A part of the usual process is the same: if one sends a changeset comment or a private message to a contributor about a bad changeset, one could forget about it and it could remain unfixed. Having a place where one could send an email saying «this contributor mapped a lot of dubious stuff, I posted a comment to ask for clarification/fix» would help keep track of that, while also giving the community an overall view of the state of these issues.
    • In the case where one took the time to contact a contributor about issues, doesn't get a response, and the situation needs a revert, but one doesn't have the time/knowledge/confidence to perform it, having a place to ask someone else to perform the revert or to confirm that it's justified would be nice.
    • I guess the DWG must also receive many requests like:
      • A contributor doesn't respond to various issues and a revert is needed. (If the contributor is active, blocking could be needed, and calling the DWG is necessary in that case.)
      • Someone who fears forgetting to follow up on the issue, or who doesn't want to bother contacting the other contributor, directly reports the issue to the DWG.
    • It's very likely that the people interested in helping handle these more generic issues overlap a lot with those interested in automated edit issues. If the needs above justify creating a similar structure, then it would be simpler to have one with a broader scope. However, we might want to keep the scope narrow if it's too complicated to find a consensus about a broader one.
  • Find how to redirect to the list all (or a fraction) of the automated edit issues that are currently reported to the DWG. The list would effectively be a proxy for that kind of work and would hopefully absorb a significant fraction of it.
  • Find a relevant list name: it depends on the scope and should be future-proof. Remember that the wiki pages about "mechanical edits" were renamed "automated edits". The name could change again, and "automated edits" might mislead about the scope (the intuitive definition might suggest that it's only for bots and that mass search and replace is excluded), so it must be chosen with care.
  • Submit for comments and approval
  • Ask the appropriate people to create the list

4. Ask a contributor who didn't discuss an automated edit to do it before any other contribution

  • It will save time for the contributors handling AE issues so they won't have to argue about the content of the AE.
  • It will be a more efficient way to educate inexperienced contributors doing AEs and to help them integrate into the community.
  • Contributors handling AE issues don't always have the knowledge in the domain covered by the AE.
  • A refusal to discuss the AE with the community, even after the edit, would give greater legitimacy to a revert.
  • Not reaching consensus after discussing with the community would also give more legitimacy to a revert.

In other words, it will relieve the contributor handling the AE issue in terms of time and responsibility: less time spent, less arguing about the AE, fewer controversies about reverts.

Caveat: this can't be done if the decision to revert must be taken very quickly (the passing of time quickly makes reverting hard).