User:Tuxayo/Mailing list discussion summary: Automated edits code of conduct and DWG

From OpenStreetMap Wiki

Introduction

This page is an attempt to summarize the points that can help build concrete proposals for some of the issues discussed.

It is also mixed with my own view of the issues, so not everything comes from the discussion.

This is a work in progress. Expect errors, biases and inconsistencies.

The discussion started here: https://lists.openstreetmap.org/pipermail/talk/2016-July/076291.html

List of messages in chronological order: https://lists.openstreetmap.org/pipermail/talk/2016-July/subject.html

Objectives

In the end, everyone wants long-term data quality. The disagreement is about where to draw the lines regarding automated edits.

Strictness of the rules

The rules are guidelines and shouldn't be enforced strictly; that would create more issues than it solves.

It's less clear for the "discuss your edit" point, due to its importance. It could be enforced like a rule (see proposal 4), but that's mostly my opinion.

Effectiveness of automated edits

Nicolás Alvarez made an interesting point: for a significant share of automated edits, such as fixing typos or leading and trailing spaces, doing the fix manually makes it possible to find and fix a significant number of other issues. The principle is to use automatically detectable errors to spot related errors that are harder to find.

That might not apply to reformatting phone numbers or adding https:// or http:// to URLs, for example.

So for the first class of automated edits, it might be more effective to use tools like MapRoulette to fix the main (automatically detectable) errors and related errors while keeping a reasonable efficiency in fixing these issues. That would be a good balance between DB quality and time spent.

Anyway, the second class of automated edits is still relevant so accepting automated edits and doing them well is still necessary.
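To make the two classes more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the function names are hypothetical, tags are represented as a plain dictionary, and https is an arbitrary default scheme. The first function only flags objects for a manual look (first class); the second performs a purely mechanical normalization (second class).

```python
from typing import Dict, List


def find_whitespace_issues(tags: Dict[str, str]) -> List[str]:
    """Return the keys whose value has leading or trailing whitespace.

    First class: the result is a list of things to review by hand,
    since the surrounding data may contain related errors.
    """
    return [key for key, value in tags.items() if value != value.strip()]


def add_missing_scheme(url: str, default_scheme: str = "https") -> str:
    """Prefix a URL with a scheme when it has none.

    Second class: a mechanical change where a manual look is unlikely
    to reveal related errors. The default scheme is an assumption.
    """
    stripped = url.strip()
    if not stripped or "://" in stripped:
        return stripped
    return f"{default_scheme}://{stripped}"


# Hypothetical tags of a single object
tags = {"name": " Cafe Central", "website": "example.org", "amenity": "cafe"}
flagged_keys = find_whitespace_issues(tags)
fixed_website = add_missing_scheme(tags["website"])
```

The flagged keys would feed a review queue (for example a MapRoulette challenge), while the URL fix could be applied directly once it has been discussed.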

Issues

The DWG doesn't have time to

  • Examine enough individual changesets among a big series (70+) of problematic AEs to prevent collateral damage when mass reverting the changesets.
    • Here "enough" means that if the AE changesets don't explicitly state that they are AEs, one must at least check the number and nature of the objects and the nature of the changes to see whether each one is an AE.

So the problem lies more in enforcing the guidelines than in the guidelines themselves.

TODO

  • They are often the last stage of the AE issue handling process, so they are forced to either:
  1. Drop the case and leave errors/data loss
  2. Handle it even when there is no time to do so without causing issues

Depending on the moment or the person, there is not enough time to systematically:

  • Do an adequate study of the changeset to avoid collateral damage when reverting
  • Dialogue with the contributor to fully understand their motivations and mistakes
  • Dialogue with the contributor to explain why e.g. not discussing in advance is important
  • Issues
    • Disputes
    • Controversial reverts
    • Legitimacy issues (without incidents, these questions wouldn't arise)
    • Time loss
      • For the contributor, who has put hours of work into a problematic AE (sometimes it is not even problematic, in the case of collateral damage) and will often give up when they could have fixed the AE and kept helping to clean the database.
      • For the DWG member handling the case
    • Motivation loss
      • For the contributor
      • For the DWG members, who spend a lot of time policing and handling arguments/disputes.
    • Few people are in the DWG, so time is extremely limited
      • This is due to the blocking power that comes with DWG membership, which is not necessary to handle AE issues. Someone else could handle them and report to the DWG if blocking is needed.
    • One can't see all the cases handled by the DWG
      • We only see the cases that are controversial enough for someone to "publicize" them, so the rest of the community can't judge the DWG's overall work.
      • Being able to see all the cases would show how much work is done and who is doing it, which would help confirm whether the volunteers are overworked and have to rush cases.
      • Being able to see all the cases would help in understanding the AECoC better, as it is the result of years of handling these issues.
      • The same reasons could be valid for all the automated edits.

Proposals

1. Have a Wiki page that lists all the automated edits

2. Have a Wiki page that lists all the automated edits issues

This includes all the DWG's interventions, but is not limited to them.

3. A public mailing list dedicated to AE issues to offload DWG

Because DWG membership implies moderation rights, the responsibility that comes with it restricts the number of possible members. This prevents some contributors who have the time, will and skills to do 90% of the tasks in the AECoC issue handling process (the rest requires moderation rights) from participating.

AE issues could be reported on this list so that anyone could contribute to handling them.

Hopefully this will lower the workload of DWG members and allow more time to be spent on each case.

The DWG would no longer be almost the only group handling AE issues. It is already the last step of the process, so we could at least try not to make it the first as well.

That would also have the benefits of proposal 2.

Steps to set this up
  • Draft guidelines to handle automated edit issues to encourage contributors interested in QA to try the "meta QA" level
  • Decide the scope: It could include generic bad edits issues because:
    • If one sends a changeset comment or a private message to a contributor about a bad changeset, one could forget about it and it could remain unfixed. Having a place where one could send an email saying «this contributor mapped a lot of dubious stuff, I posted a comment to ask for clarification/fix» would help keep track of that, while also giving the community an overall picture of these issues.
    • Sometimes one takes the time to contact a contributor about issues, doesn't get a response, and the situation needs a revert, but one doesn't have the time/knowledge/confidence to perform it. Having a place to ask someone else to perform the revert, or to confirm that it is justified, would be nice.
    • I guess the DWG must also receive many requests like:
      • A contributor doesn't respond to various issues and a revert is needed. (If the contributor is still active, a block could be needed, so calling the DWG is necessary in that case.)
      • Someone who fears forgetting to follow up on the case, or who doesn't want to bother contacting the other contributor, reports the case directly to the DWG.
    • It's very likely that the people interested in helping handle these more generic issues overlap a lot with those interested in automated edit issues. If the needs above would justify creating a similar structure, it would be simpler to have a single one with a broader scope. However, we might want to keep the scope narrow if a broader one makes it too complicated to reach a consensus on setting this up.
  • Find how to redirect to the list all (or a fraction) of the automated edit issues that are currently reported to the DWG. The list would effectively be a proxy for that kind of work and would hopefully absorb a significant fraction of it.
  • Find a relevant list name: it depends on the scope and should be future proof. Remember that the wiki pages about mechanical edits were renamed to automated edits. Since the name could change again, and "automated edits" might mislead about the scope (the intuitive definition might suggest that it is only for bots and that mass search and replace is excluded), it must be chosen with care.
  • Submit for comments and approval
  • Ask the appropriate persons to create the list

4. Ask a contributor who didn't discuss an automated edit to do it before any other contribution

So the DWG member/contributor handling the issue won't have to spend time arguing with the contributor who did the AE.

  • It will save time for the contributors handling AE issues
  • Contributors handling AE issues don't always have the knowledge in the domain covered by the AE
  • A refusal to discuss the AE with the community even post edit would give a greater legitimacy to revert
  • Not reaching consensus after discussing with the community will give more legitimacy to revert

In other words, it relieves the contributor handling the AE issue of both time and responsibility: less time spent, less arguing about the AE, fewer controversies about reverts.