WikiProject Corine Land Cover/Corine Data Import

From OpenStreetMap Wiki

The goal of this page is to explain how to import the Corine Land Cover dataset, starting from the shapefile and ending with a final .osm file.

Import of Corine Land Cover shapefile

It is assumed that a database named corine has already been created in PostgreSQL/PostGIS.

The first step is to import the shapefile using the basic tool that PostGIS provides:

shp2pgsql -s 4326 -c -g the_geom -I -S -W UTF-8 -N skip corine.shp > corine.sql
psql -h localhost -U user -p 5432 < corine.sql


psql will prompt you for the password, since it is read from the tty. In my case, I modified pg_hba.conf on my test platform to use trust authentication, so no password is needed. This is not advisable on a server, but it is convenient for testing.
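For reference, a trust entry in pg_hba.conf looks roughly like the fragment below (the file location and the surrounding lines depend on your distribution; treat this as an illustrative sketch, and reload PostgreSQL after editing):

```
# pg_hba.conf fragment: allow local connections without a password (test machine only!)
# TYPE  DATABASE  USER  ADDRESS       METHOD
local   all       all                 trust
host    all       all   127.0.0.1/32  trust
```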

You can safely delete the corine.sql file, which is likely to be big, once you have imported it.

As usual on Unix, you can replace the two commands above with a pipe between them to avoid the disk access and the big intermediate SQL file:

shp2pgsql -s 4326 -c -g the_geom -I -S -W UTF-8 -N skip corine.shp | psql -h localhost -U user -p 5432

(not tested)

Import of the OSM data for the area you are interested in

You can download from either the Geofabrik or the Cloudmade site the OSM extract corresponding to the country you want to process. This file then needs to be imported with osm2pgsql.

osm2pgsql -c -d corine -l -s -U user -H localhost -P 5432 france.osm.bz2

The slim parameter (-s) is usually needed when you don't have enough memory to hold the whole import in RAM.


Executing the queries that will generate the necessary tables

The system needs to run queries like the following. The example shows what was done for the forests:

-- Creation of the forest table
DROP TABLE IF EXISTS forest;
CREATE TABLE forest
(
   "CLC:id" character varying(18), 
   "CLC:year" integer, 
   "CLC:code" character varying(3), 
   source character varying(200), 
   landuse character varying(200), 
   "natural" character varying(200), 
   wood character varying(200),
   the_geom geometry,
   CONSTRAINT enforce_dims_the_geom CHECK (ndims(the_geom) = 2),
   CONSTRAINT enforce_geotype_the_geom CHECK (geometrytype(the_geom) = 'POLYGON'::text OR the_geom IS NULL),
   CONSTRAINT enforce_srid_the_geom CHECK (srid(the_geom) = 4326)
) WITH (OIDS=FALSE)
;

-- creation of the Forest index
CREATE INDEX forest_the_geom_gist
  ON forest
  USING gist
  (the_geom);

-- Creation of the forest surface table
DROP TABLE IF EXISTS forestsurface;
CREATE TABLE forestsurface
(
  id character varying(18) NOT NULL,
  surfaceoverlap double precision,
  CONSTRAINT fsid_pkey PRIMARY KEY (id)
)
WITH (OIDS=FALSE);
ALTER TABLE forestsurface OWNER TO postgres;

-- Filling the database
INSERT INTO forest
	(	"CLC:id",
		"CLC:year",
		"CLC:code",
		source,
		landuse,
		wood,
		the_geom
	)
	SELECT	c.id,
		2006 AS "CLC:year",
		c.code_06 AS "CLC:code",
		'Union européenne – SOeS, CORINE Land Cover, 2006.' AS source,
		'forest' AS landuse,
		(	CASE
				WHEN c.code_06 = '311' THEN 'deciduous'
				WHEN c.code_06 = '312' THEN 'coniferous'
				WHEN c.code_06 = '313' THEN 'mixed'
				ELSE 'mixed'
			END
		) AS wood,
		c.the_geom
	FROM	corine AS c
	WHERE	c.code_06 IN ('311', '312', '313');


-- We create a temporary table holding, for each CLC id, the percentage of its surface overlapped by existing OSM polygons
DROP TABLE IF EXISTS forestid;
CREATE TEMPORARY TABLE forestid (id character varying(18), percentsurface double precision, CONSTRAINT id_pkey PRIMARY KEY (id)) WITH (OIDS=FALSE);

INSERT INTO forestid
	(	id,
		percentsurface
	)
	SELECT	DISTINCT f."CLC:id",
		MAX(ST_AREA(ST_Intersection(pop.way, f.the_geom)) * 100) / ST_Area(f.the_geom)
	FROM	planet_osm_polygon AS pop
		INNER JOIN
		forest AS f
			ON ST_intersects(pop.way, f.the_geom) = true
	WHERE	pop.landuse IS NOT NULL
		OR pop."natural" IS NOT NULL
		OR pop.wood IS NOT NULL
	GROUP BY	f."CLC:id", f.the_geom
	ORDER BY	f."CLC:id";

-- We are inserting into forestsurface
INSERT INTO	forestsurface
	(	id,
		surfaceoverlap
	)
	SELECT	fi.id,
		fi.percentsurface
	FROM	forestid AS fi
	UNION
	SELECT	f."CLC:id",
		0.0
	FROM	forest AS f
	WHERE	f."CLC:id" NOT IN
		(	SELECT	fi.id
			FROM	forestid AS fi
		);

-- We are dropping the temporary table
DROP TABLE IF EXISTS forestid;


Exporting the table to a 0.6 osm file

The next step is to make use of polyshp2osm.py. I modified the Python script so that it runs (there is a kill switch), removed all fixed tags except created_by, and removed the namespace handling from the Python code so that it keeps the OSM data. Also, this script produces 0.5 OSM files, so we need to convert the output to the 0.6 format.

pgsql2shp -h localhost -u postgres -g the_geom -k -f nooverlapforest.shp corine "SELECT f.* FROM forest AS f WHERE f.\"CLC:id\" IN (SELECT id FROM forestsurface WHERE surfaceoverlap = 0)"
pgsql2shp -h localhost -u postgres -g the_geom -k -f smalloverlapforest.shp corine "SELECT f.* FROM forest AS f WHERE f.\"CLC:id\" IN (SELECT id FROM forestsurface WHERE surfaceoverlap >= 5)"
pgsql2shp -h localhost -u postgres -g the_geom -k -f overlapforest.shp corine "SELECT f.* FROM forest AS f WHERE f.\"CLC:id\" IN (SELECT id FROM forestsurface WHERE surfaceoverlap > 0 AND surfaceoverlap < 5)"
python polyshp2osm.py -o 10000000 -l poly_output nooverlapforest.shp
mv poly_output.1.osm nooverlapforest-nosort.osm
python polyshp2osm.py -o 10000000 -l poly_output smalloverlapforest.shp
mv poly_output.1.osm smalloverlapforest-nosort.osm
python polyshp2osm.py -o 10000000 -l poly_output overlapforest.shp
mv poly_output.1.osm overlapforest-nosort.osm
osmosis --read-xml file="nooverlapforest-nosort.osm" enableDateParsing="false" --migrate --sort-0.6 --write-xml-0.6 file="nooverlapforest.osm"
osmosis --read-xml file="smalloverlapforest-nosort.osm" enableDateParsing="false" --migrate --sort-0.6 --write-xml-0.6 file="smalloverlapforest.osm"
osmosis --read-xml file="overlapforest-nosort.osm" enableDateParsing="false" --migrate --sort-0.6 --write-xml-0.6 file="overlapforest.osm"
bzip2 -9 nooverlapforest.osm
bzip2 -9 smalloverlapforest.osm
bzip2 -9 overlapforest.osm

In this case, I am separating the data into 3 categories: no overlap, small overlap (between 0 and 5 percent), and larger overlap. Be careful with the value of the -o parameter: if it is too small, the script will produce multiple files. I have assumed the case where only one file is produced per run.

Merging all categories of files

Merging all the XML files into one is required for the next step, which removes duplicate nodes. At the moment this step is done manually: the first two lines:

 <?xml version='1.0' encoding='UTF-8'?>
 <osm version='0.6' generator="polyshp2osm">

and last line:

 </osm>

are removed (except for the first file and the last file, where the header and the footer respectively are preserved), and then the resulting files are concatenated together.

It is important to enforce the node id ordering (the different .osm files use a different node id offset for each file in the previous step above).
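The manual merge can be sketched with a few sed commands (an illustrative sketch, not the original procedure; the tiny stand-in files below are fabricated so the commands can be tried anywhere, while in practice the file names come from the previous step):

```shell
# Fabricate three tiny stand-in .osm files (2-line header, one node, closing tag)
printf "<?xml version='1.0' encoding='UTF-8'?>\n<osm version='0.6' generator=\"polyshp2osm\">\n<node id='1' lat='0.0' lon='0.0' />\n</osm>\n" > nooverlapforest.osm
printf "<?xml version='1.0' encoding='UTF-8'?>\n<osm version='0.6' generator=\"polyshp2osm\">\n<node id='2' lat='1.0' lon='1.0' />\n</osm>\n" > smalloverlapforest.osm
printf "<?xml version='1.0' encoding='UTF-8'?>\n<osm version='0.6' generator=\"polyshp2osm\">\n<node id='3' lat='2.0' lon='2.0' />\n</osm>\n" > overlapforest.osm

# First file: keep the header, drop the closing </osm>
sed '$d' nooverlapforest.osm                  >  merged.osm
# Middle file(s): drop the two header lines and the closing </osm>
sed -e '1,2d' -e '$d' smalloverlapforest.osm  >> merged.osm
# Last file: drop the two header lines, keep the closing </osm>
sed '1,2d' overlapforest.osm                  >> merged.osm
```

The result has exactly one XML header and one closing </osm> tag, with all the elements in between.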

Removing duplicate nodes

This step removes nodes that use the exact same lat/lon coordinates within the file. A small Java program builds a list of nodes with their coordinates and replaces each duplicate node id with the id of its first occurrence.

Usage:

 java -Xmx1300m Merge <osm_file>

This will create a new file <osm_file>.new, which can finally be renamed to "clcf2006.osm".

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.HashMap;

// Source by Pieren, July 2009
// Licence GPL
// Usage: Merge <source_file.osm>
public class Merge {
    public final int HASHMAP_INIT_SIZE = 12000000;
    HashMap<LatLon, Integer> nodeList = new HashMap<LatLon, Integer>(HASHMAP_INIT_SIZE, 0.9f);
    HashMap<Integer, Integer> replaceList = new HashMap<Integer, Integer>();

    class LatLon {
        double lon;
        double lat;
        LatLon(double lat, double lon) {
            this.lat = lat;
            this.lon = lon;
        }
        LatLon() {
        }
        // inspired by Point2D class
        public boolean equals(Object obj) {
            if (obj instanceof LatLon ) {
                LatLon ll = (LatLon) obj;
                return lat == ll.lat && lon == ll.lon;
            }
            return super.equals(obj);
        }
        // inspired by Point2D class
        public int hashCode() {
            long bits = Double.doubleToLongBits(lat);
            bits ^= Double.doubleToLongBits(lon) * 31;
            return ((int) bits) ^ ((int) (bits >> 32));
        }
        public String toString() {
            return "lon:"+this.lon+", lat:"+this.lat;
        }
    }
    
    /**
     * @param args
     */
    public static void main(String[] args) {
        if (args.length != 1) {
            System.out.println("Usage: Merge <source_file.osm>");
            System.exit(1);
        }
        System.out.println("Now : " + now());
        Merge me = new Merge();
        int j = 0;
        
        try {
            File file_src = new File(args[0]);
            File file_dst = new File(args[0]+".new");
            FileReader fr = new FileReader(file_src);
            FileWriter fw = new FileWriter(file_dst);
            BufferedReader in = new BufferedReader(fr);
            BufferedWriter out = new BufferedWriter(fw);
            String line;
            while ((line = in.readLine()) != null) {
                boolean copyLine = true;
                j++;
                if (j%100000 == 0)
                    System.out.println(now()+ " "+j);
                if (line.indexOf("<node id=") >= 0) {
                    if (me.checkNode(line) != 0)
                        copyLine = false;
                } else if (line.indexOf("<nd ref=") >= 0) {
                    line = me.checkNd(line);
                }
                if (copyLine) {
                    out.write(line);
                    out.newLine();
                }
            }
            in.close();
            out.close();
            fr.close();
            fw.close();
            System.out.println("Finished at "+now()+" total nodes:"+me.nodeList.size()+", duplicates:"+me.replaceList.size());
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println("Now : " + now());
    }
    
    private int checkNode(String line) {
        try {
            LatLon ll = new LatLon();
            int new_id = parseNode(line, ll);
            Integer old_id = nodeList.get(ll);
            if (old_id != null) {
                if (!replaceList.containsKey(new_id))
                    replaceList.put(new_id, old_id);
                return old_id;
            } else
                nodeList.put(ll, new_id);
        } catch (Exception e) {
            System.out.println("unable to parse \"node id\" line:"+line);
            System.exit(1);
        }
        return 0;
    }
    
    private String checkNd(String line) {
        try {
            int new_id = parseNd(line);
            Integer old_id = replaceList.get(new_id);
            if (old_id != null) {
                //System.out.println(new_id+" to be replaced by "+old_id);
                line = "<nd ref='"+old_id+"' />";
            }
        } catch (Exception e) {
            System.out.println("unable to parse \"nd ref\" line:"+line);
            System.exit(1);
        }
        return line;
    }

    private int parseNode(String line, LatLon latlon) {
        String[] str = line.split("'");
        int id = Integer.parseInt(str[1]);
        latlon.lon = Double.parseDouble(str[3]);
        latlon.lat = Double.parseDouble(str[5]);
        return id;
    }

    private int parseNd(String line) {
        String[] str = line.split("'");
        return Integer.parseInt(str[1]);
    }
    
    public static String now() {
        Calendar cal = Calendar.getInstance();
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss:SSS");
        return sdf.format(cal.getTime());
      }
    
}
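As a final sanity check (a hypothetical one-liner, not part of the original workflow), you can list coordinate pairs that still appear on more than one node line. It assumes the attributes appear as lat='…' lon='…' in that order, so adjust the pattern if your file differs; a stand-in file is fabricated here so the check can be tried anywhere:

```shell
# Fabricate a tiny stand-in result file
printf "<node id='1' lat='1.0' lon='2.0' />\n<node id='2' lat='3.0' lon='4.0' />\n" > clcf2006.osm
# Print every duplicated lat/lon pair; empty output means no duplicates remain
grep -o "lat='[^']*' lon='[^']*'" clcf2006.osm | sort | uniq -d
```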