Import/Catalogue/Brazil IBGE Subnormal Agglomerates

From OpenStreetMap Wiki

Published data source

The Brazilian Institute of Geography and Statistics (IBGE) provides a collection of census sectors corresponding to cluttered, dense, poor settlements (generally illegal or only recently regularized), most of which lack basic urban utilities and essential services. The description is almost always equivalent to slums or other very poor settlements notable enough to have a name used by the press and by locals in day-to-day conversation. These areas were previously called "special interest zones" and have been called "subnormal agglomerates" since the 2010 census. Wikipedia currently has a Portuguese article on this here. Most Brazilian cities in OpenStreetMap currently do not contain this information (Rio de Janeiro is perhaps a notable exception). Mapping them is also useful because crime levels are typically higher within and near these settlements, with a few exceptions.

Legality

IBGE has not provided a clear license statement, but informal communications have been recorded showing that IBGE's data is indeed considered "public domain" [1] [2] [3].

Planning

The data to be uploaded will mostly consist of simple areas with a landuse=residential tag. This has been the common practice in Rio de Janeiro and has met no objections elsewhere so far. It was also recommended in this answer, which received some approval and no objections.

The contours of IBGE's census sectors do not correspond exactly to slum borders, so further improvements by the local mapping community will be encouraged by adding a fixme=* tag. The import process will focus primarily on fixing obvious problems with the data. The process was first described in a forum thread (in Portuguese), which is intended for contact and discussion with local mappers and for other updates.

Because IBGE's license requires attribution, all polygons will carry a source=IBGE tag. The tag would be added in any case, since it helps people judge how reliable the data is.

Regarding import guidelines:

  • The import process was entirely conceived in contact with the Brazilian community
  • IBGE-specific tags (such as census sector ID) will not be imported because it is "difficult to understand how to manage when modifying (e.g. splitting, merging) objects"; in fact, some manual work during import involves merging some of these areas to improve the data
  • Conflation will not be required because this kind of data is missing everywhere, except for:
  • Quality assurance will be encouraged by adding a fixme=* tag as described previously
  • The special user ftrebienimports will be used for this procedure

Execution

The first part of the process consists of converting IBGE's KMZ files into OSM files using JOSM's OpenData plugin. This produces various untagged nodes and polygons, nodes with a name=* tag (those are intended for graphical labelling of corresponding polygons), and multipolygon relations containing IBGE-specific tags and normally only 1 member consisting of 1 polygon with "outer" role. Rarely, the relation can have more than one polygon when "inner" roles are necessary.
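Before running the conversion, it can help to check which names a KMZ file contains. A KMZ file is a ZIP archive holding a KML document, so it can be inspected with the Python standard library alone. The sketch below is only an illustration and does not replace the OpenData plugin; the function name `placemark_names` is hypothetical, and it assumes the names of interest are stored as `<name>` children of `<Placemark>` elements, which may not hold for every IBGE file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def placemark_names(kmz):
    """Return the <name> text of every Placemark in the first .kml of a KMZ.

    `kmz` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(kmz) as archive:
        kml_files = [n for n in archive.namelist() if n.lower().endswith(".kml")]
        if not kml_files:
            return []
        root = ET.fromstring(archive.read(kml_files[0]))

    names = []
    # "{*}" matches any XML namespace (Python 3.8+), so the KML version
    # declared by the file does not matter.
    for placemark in root.findall(".//{*}Placemark"):
        name = placemark.find("{*}name")
        if name is not None and name.text:
            names.append(name.text.strip())
    return names
```

Comparing this list with the names present after conversion is one way to spot label nodes or polygons that were lost or duplicated.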

This data can be simplified to reduce the amount of data to be uploaded and maintained. A short custom Python script was developed (and will soon be publicly available) that does the following:

  • copies the name=* tag from the multipolygon relation to its only member
  • removes label nodes when it is safe to do so (when their name matched the name in the multipolygon exactly)
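The original script is not reproduced here. As a rough sketch of the two steps above, the following code works on an OSM XML tree loaded with `xml.etree.ElementTree`. All function names are hypothetical. Two points are assumptions rather than documented behaviour: a relation is dropped once its name has been copied to its single member, and a label node is treated as "safe" to remove only when exactly one relation carried the same name.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def get_tag(elem, key):
    """Return the value of tag `key` on an OSM element, or None."""
    for tag in elem.findall("tag"):
        if tag.get("k") == key:
            return tag.get("v")
    return None


def simplify(root):
    """Copy names from single-member multipolygons to their way,
    then remove label nodes whose name matches exactly."""
    ways = {w.get("id"): w for w in root.findall("way")}
    used_nodes = {nd.get("ref") for w in ways.values() for nd in w.findall("nd")}
    moved = Counter()  # how many relations carried each name

    for rel in root.findall("relation"):
        members = rel.findall("member")
        name = get_tag(rel, "name")
        if name is None or len(members) != 1:
            continue  # leftover multipolygon: handled manually later
        member = members[0]
        way = ways.get(member.get("ref"))
        if member.get("type") != "way" or way is None:
            continue
        if get_tag(way, "name") is None:
            ET.SubElement(way, "tag", k="name", v=name)
        root.remove(rel)  # assumption: the relation is no longer needed
        moved[name] += 1

    for node in root.findall("node"):
        name = get_tag(node, "name")
        # Only free-standing nodes whose name was copied exactly once are removed.
        if name is not None and moved[name] == 1 and node.get("id") not in used_nodes:
            root.remove(node)
    return root
```

Relations with more than one member and label nodes with duplicated names are left in place on purpose, so they can be reviewed by hand.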

Because these polygons will be mapped using the tag landuse=residential (already used in Rio de Janeiro, Belo Horizonte and Porto Alegre), the script also adds a prefix to the name ("Vila") to help users distinguish these areas from regular residential areas such as private residential condominiums. Special care is taken when IBGE's data already includes that prefix or several others considered equivalent.

Update: After some careful analysis, it was decided not to prepend a prefix when the name given by IBGE starts with: vila, núcleo, comunidade, bairro, invasão, loteamento, região, conjunto, ressaca, baixada, igarapé, aglomerado, agrovila, assentamento, ocupação, condomínio (in Distrito Federal), expansão, morro (in Rio de Janeiro, as suggested by the community).
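The prefix rule above can be sketched as follows. This is an illustration under stated assumptions, not the script actually used: the function name `add_prefix`, the `state` parameter and the two-letter state codes are hypothetical, and "starts with" is interpreted here as a case-insensitive match on the first whole word of the name.

```python
# Prefixes that make an extra "Vila" unnecessary everywhere.
EXEMPT_PREFIXES = {
    "vila", "núcleo", "comunidade", "bairro", "invasão", "loteamento",
    "região", "conjunto", "ressaca", "baixada", "igarapé", "aglomerado",
    "agrovila", "assentamento", "ocupação", "expansão",
}

# Prefixes exempt only in one state (codes are an assumption of this sketch).
REGIONAL_PREFIXES = {"condomínio": "DF", "morro": "RJ"}


def add_prefix(name, state=None):
    """Prepend "Vila" unless the name already starts with an accepted prefix."""
    words = name.split()
    if not words:
        return name
    first = words[0].lower()
    regional_state = REGIONAL_PREFIXES.get(first)
    if first in EXEMPT_PREFIXES:
        return name
    if regional_state is not None and regional_state == state:
        return name
    return "Vila " + name
```

For example, `add_prefix("Morro Azul", state="RJ")` leaves the name unchanged, while the same name in another state becomes "Vila Morro Azul".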

The last part of the process includes manual improvements before submission. Some of them improve on the results of the script, others improve IBGE's source data. Suggested steps include:

  1. Handling leftover multipolygons (often those having members with an "inner" role): one must decide whether to keep the inner area by checking imagery
  2. Handling leftover label nodes (normally left when more than one area has the same name, but possibly also due to an error in the data or in the conversion): they can normally be removed, and they sometimes help find nearby polygons that can be merged after checking imagery
  3. Adding the following tags to polygons and multipolygons: landuse=residential, source=IBGE and fixme=Ajustar contorno à área onde predominam moradias vizinhas. (meaning "Adjust contour to the area where neighbouring dwellings predominate.")
  4. Handling objects with Roman numerals in their name (used by IBGE when dividing a single settlement into multiple census sectors, useless for the end user): when the areas are close enough they can be merged, and the Roman numerals can be removed in most cases
  5. Handling objects with "/" or " ou " in their name (used by IBGE to represent alternative names): the second name must be put in an alt_name=* tag
  6. Handling objects with "." in their name (used by IBGE for some abbreviations): the abbreviation normally should be expanded if it is a word, or the "." characters should be removed if it is an acronym and its letters should be all caps
  7. Handling objects with " e " or " - ": this is complicated and may be delegated to local collaborators; one must figure out if these separators are concatenating alternative names (alt_name=* should be used for that), if they indicate that multiple agglomerates were merged by IBGE for some reason (requires local knowledge, and if true, the name after the separator must also include the "Vila" prefix for clarity, and the separator must be " - " instead of " e "), or if it is just a specifier used internally by IBGE and not used in practice (can be removed from the name)
  8. General review of all names, making obvious corrections where it appears necessary
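Steps 3 to 5 are mechanical enough to be partly assisted by code, although the steps above describe them as manual work. The sketch below is an assumption-laden illustration: the helper names are hypothetical, Roman numerals are assumed to appear as the last word of the name, and the numeral stripping is only a suggestion for human review, because a real name such as one ending in "XII" would be shortened incorrectly.

```python
import re

# Tags from step 3, applied to every imported polygon or multipolygon.
IMPORT_TAGS = {
    "landuse": "residential",
    "source": "IBGE",
    "fixme": "Ajustar contorno à área onde predominam moradias vizinhas.",
}

# Step 5: "/" or " ou " separates alternative names.
ALT_SEPARATOR = re.compile(r"\s*/\s*|\s+ou\s+")

# Step 4: a trailing Roman numeral (assumed position, see lead-in).
ROMAN_SUFFIX = re.compile(r"\s+[IVX]+$")


def split_alt_name(name):
    """Return (name, alt_name); alt_name is None when there is no separator."""
    parts = ALT_SEPARATOR.split(name, maxsplit=1)
    if len(parts) == 2 and parts[0] and parts[1]:
        return parts[0], parts[1]
    return name, None


def strip_roman_suffix(name):
    """Suggest a name without its trailing Roman numeral (review by hand)."""
    return ROMAN_SUFFIX.sub("", name)


def build_tags(raw_name):
    """Combine the fixed import tags with name and, if present, alt_name."""
    name, alt_name = split_alt_name(raw_name)
    tags = dict(IMPORT_TAGS, name=name)
    if alt_name is not None:
        tags["alt_name"] = alt_name
    return tags
```

Merging nearby areas, expanding abbreviations and interpreting " e " or " - " still depend on imagery and local knowledge, so they are left out of the sketch.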

The layer can then be pasted into any downloaded layer in JOSM (to properly convert coordinates), which can then be uploaded. A suggested upload comment is "Importação de aglomerados subnormais do IBGE." (meaning "Data import of subnormal agglomerates from IBGE.")

Improvements to be done by local mappers

The necessary instructions for local mappers have been left in the fixme=* tag of every imported area. Almost all of them simply tell the mapper to adjust the borders to the contour of poor dwellings.

Progress

The import process is complete and the following sets have been imported: Acre, Alagoas, Amapá, Amazonas, Bahia, Ceará, Distrito Federal, Espírito Santo, Goiás, Maranhão, Mato Grosso, Mato Grosso do Sul, Minas Gerais, Pará, Paraíba, Paraná, Pernambuco, Piauí, Rio de Janeiro, Rio Grande do Norte, Rio Grande do Sul, Rondônia, Roraima, Santa Catarina, São Paulo, Sergipe, Tocantins