User:LA2/Diary for Q2 2007
LA2's OpenStreetMap Diary for the 2nd Quarter of 2007
May 28, 2007: Google TechTalk video presentation by a person from Microsoft: What to do with Thousands of GPS Tracks? There is a reference to OpenStreetMap about three quarters of the way through the presentation.
May 24, 2007: All this focus on OSM isn't good for me. I should try to spend time on other projects instead.
May 22, 2007: No matter how cleverly I design my bounding boxes, our coastline data source appears to have its own interruptions at 60N and 18E.
May 20, 2007: I work with ojw to modify the tiles@home client (me) and server (ojw), so that metadata only has to be updated once for each tileset, rather than once for each little tile. However, at the end of the day ojw starts some MySQL operation (rename column, I think) that takes considerable time. Ojw has to go to bed, work tomorrow, and can only return to this tomorrow night. And no-one else can replace him. Meanwhile, the entire tiles@home system is out of order. Can I express my frustration over this? Anybody can have limited time, but limited time together with "owning" an important part of the project is not a good combination. Perhaps I should abandon OSM for yet another half year, like I did October-March? But indeed, all I have to abandon is OJW and his little personal hobby project, the t@h server running on "dev". I think I can do without that. The picture below is his version of what I've been working on the last months:
May 17, 2007: The API server ("www") was running fine most of the day, except for an occasional lock-up where user processes took 196 % of the dual-CPU and everything else stalled. The graph below shows how cool the CPU used to be two weeks ago, when API 0.3 was running.
May 16, 2007: Starting at 06:30 GMT the "www" machine (API server) becomes unresponsive and the dual CPU spends 180 percent in iowait (of the 200 percent available). This looks very similar to Friday May 11. When Nick Hill becomes aware of the situation, he restarts the API process just before 11:00 GMT and all is back to normal. But it cost us over 4 hours of downtime. As soon as the API server ("www") wakes up, so does the t@h MySQL job on "dev" that appears to be that machine's cause of iowait.
My coastline import script now calls itself "coastbot" and at the end of each run outputs a statistics report, like this:
Created 522 nodes, 521 segments, 18 ways, 14 closed loops; skipped over 145 nodes because they were straight in line. Made 1206 calls in 9.0 mins = 133.8 TPM, had 6 errors (0.495 %); response time min/avg/max = 156.0/430.9/25975.0 ms.
A simple "ping" (ICMP roundtrip without any TCP, HTTP or MySQL overhead) from my home computer in Sweden to wiki.openstreetmap.org in London takes 114 milliseconds, so network roundtrip makes up 3/4 of the minimum and 1/4 of the average response time. The overall throughput I see is in the range 100-250 TPM (transactions per minute). Since the script runs a single thread, this corresponds to 240 to 600 milliseconds per API call. Between API calls, the script spends very little time in local processing such as arithmetic and file I/O.
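The arithmetic behind these figures can be checked with a few lines of Python (not the Perl of the import script). This is only a sketch; the function name ms_per_call is made up for illustration, and the input numbers are the ones quoted above.

```python
def ms_per_call(tpm):
    """Milliseconds per API call for a single-threaded client at `tpm` transactions per minute."""
    return 60_000 / tpm


PING_MS = 114.0          # ICMP roundtrip Sweden-London, from the text
MIN_RESPONSE_MS = 156.0  # minimum response time from the coastbot report
AVG_RESPONSE_MS = 430.9  # average response time from the coastbot report

# Share of the response time that is pure network roundtrip.
print(f"ping / min response: {PING_MS / MIN_RESPONSE_MS:.2f}")  # about 3/4
print(f"ping / avg response: {PING_MS / AVG_RESPONSE_MS:.2f}")  # about 1/4

# Throughput range converted to time per call.
for tpm in (100, 250):
    print(f"{tpm} TPM = {ms_per_call(tpm):.0f} ms per call")
```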
May 15, 2007: The "dev" server is heavily loaded by tiles@home, various experiments and possibly a "broken" (?) disk. Having tiles@home running on this server is bad and wrong. Tiles@home should have its own server. Is anybody acting? Apparently everybody is on a silent holiday. People are cracking jokes about France being flooded from global warming. Meanwhile I'm told that OpenLayers now only loads tiles from one server at a time, but when I go to www.informationfreeway.org, my browser is still waiting for data to arrive from the various cache proxies. Everything is falling apart, but at least I'm importing coastlines again, this time around Umeå in northern Sweden. Most important, though: I have plenty to complain about, and that's my main contribution to this project after all.
Continue import at -s 73977 -r 64.00 65.12 15.00 22.00 22
May 14, 2007: This Google TechTalk video with Mary Poppendieck from December 15, 2006 on Competing on the Basis of Speed is something that the designers of the tiles@home queueing system should watch, especially the part of "honest queues" towards the end of the talk.
Continue import at shape 75456.
May 13, 2007: New "404" tiles are introduced in tiles@home for rectangular areas that have no database contents. Did I edit the archipelago outside Vaasa, Finland, or the streets of St. Petersburg, Russia, to see this result? I guess not. So what am I to do? Maybe add some fake islands and streets in the empty tiles?
May 11, 2007: I'm in shape 87183 of St. Petersburg's coastline at lunch time when the server becomes unavailable. I later restart at that shape, which might have been doubly imported. Next restart at shape 105646.
May 10, 2007: The import of western Estonia's coastline (Saaremaa, Hiiumaa) is complete and now being rendered in tiles@home. The fix-up after import was a minimal effort thanks to my improvements in the import script.
May 9, 2007: Continue Estonia's coastline import at shape 106303. The picture below shows the current state of mapping.
May 8, 2007: During continued import of western Estonia coastlines, the creation of the following objects failed because the server didn't respond:
<way id="0"><seg id="24846585"/><seg id="24846584"/><seg id="24846583"/><seg id="24846582"/><tag k="natural" v="coastline"/><tag k="created_by" v="almien_LA2_coastlines"/></way>
<node id="0" lon="21.926700" lat="58.368770"><tag k="created_by" v="almien_LA2_coastlines"/></node>
<segment id="0" from="0" to="28854058"><tag k="created_by" v="almien_LA2_coastlines"/></segment>
<segment id="0" from="28854071" to="0"><tag k="created_by" v="almien_LA2_coastlines"/></segment>
<node id="0" lon="23.075070" lat="58.379760"><tag k="created_by" v="almien_LA2_coastlines"/></node>
--Found and fixed
<segment id="0" from="0" to="28854240"><tag k="created_by" v="almien_LA2_coastlines"/></segment>
<segment id="0" from="28854241" to="0"><tag k="created_by" v="almien_LA2_coastlines"/></segment>
<segment id="0" from="28854539" to="28854538"><tag k="created_by" v="almien_LA2_coastlines"/></segment>
<node id="0" lon="21.975480" lat="58.344480"><tag k="created_by" v="almien_LA2_coastlines"/></node>
<segment id="0" from="0" to="28854923"><tag k="created_by" v="almien_LA2_coastlines"/></segment>
<segment id="0" from="28854924" to="0"><tag k="created_by" v="almien_LA2_coastlines"/></segment>
These are rather trivial errors, and I implement a retry loop in the script. Continue from shape 106034.
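A retry loop of this kind might look roughly like the Python sketch below. It is not the code in the Perl script: the function name call_with_retry, the three attempts and the growing delay are all assumptions chosen for illustration, and `request` stands in for whatever performs one API call.

```python
import time


def call_with_retry(request, attempts=3, delay_seconds=2.0, sleep=time.sleep):
    """Call request() until it succeeds or `attempts` tries have failed.

    `request` is a hypothetical zero-argument function that performs one API
    call and raises OSError (for example ConnectionError) when the server
    does not respond. The attempt count and delay are assumed values.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return request()
        except OSError as error:
            last_error = error
            if attempt < attempts:
                # Wait a little longer after each failure before retrying.
                sleep(delay_seconds * attempt)
    # Every attempt failed: give up and let the caller see the last error.
    raise last_error
```

Retrying is only safe when a failed call is known not to have created the object; if the server created it but the reply was lost, a retry would produce a duplicate.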
May 7, 2007: As the server hesitatingly comes back to life, I have to struggle with an incompatibility between libcurl and the lighttpd server (last week, the server was Apache). The solution is to suppress the "Expect: 100-continue" HTTP header, like this:
$curl->setopt(CURLOPT_HTTPHEADER, ["Expect: "]);
After this I start to import coastlines for western Estonia. The server is still exceptionally slow, but the upload script should be able to survive this. However, at shape 70441 I get an "HTTP status code 0; Empty reply from server" for a node at lon="22.447680" lat="58.959140". Then everything grinds to a halt.
May 6, 2007: While the servers are unavailable, I improve the coast_upload.pl script a little:
- It used to be that every island and lake was created with a duplicate node where the shape started and ended. I had earlier fixed this by remembering the starting point and reusing the existing node for the last segment of the shape, if it returned to the same point, closing the loop.
- I had also earlier implemented a function that determines if two line segments are straight in line, and then avoids creating the node in the middle.
- I have adapted the script to OSM Protocol Version 0.4 by changing id="0" to "create" and version /0.3/ to /0.4/.
- After much trouble, I have finally succeeded in silencing the HTTP headers that the WWW::Curl::Easy subroutine perform() wanted to print on STDOUT by setting CURLOPT_HEADERFUNCTION to a callback function that does nothing. This was not easy to get right, but this mailing list posting from July 2003 by Cris (Bailif, I suppose) helped me.
- The script now has a primitive progress indicator: A rolling counter on STDERR prints the shape numbers. The script's input is an ESRI Shapefile and the naive implementation always reads through all shapes in linear sequence. The Shapefile for region 22 (the Baltic Sea) contains 129756 shapes. I just print the shape number and not a percentage, because a percentage in a progress indicator should be the percentage of the time estimated to complete the job, and I have no way to estimate the time. Skipping over shapes outside the bounding box is so much faster than performing database uploads.
- At the end, the script reports better statistics on the number of created nodes, segments, ways and the number of in-line nodes it avoided creating.
- The new -n option (--nothing or --no-upload) does everything except the upload. This is useful for testing the script to see how much work there is to do within the given bounding box.
- The new -r option (--reverse) reverses all line segments and ways. This is useful in an archipelago, dominated by islands. With the -r option the islands don't have to be "turned around". Instead all lakes and the mainland coast will have to be turned around. Well, I guess you can't eat your cake and have it too. We have no algorithm or information to determine what is an island and what is a lake.
- You don't want to run an upload that takes several hours. Either the server, your own computer, or the network could experience problems in between. If you need to interrupt it and later restart it, you don't want any duplicates to be imported. So far, the only solution has been to specify smaller bounding boxes. Imported coastlines will be broken where the bounding box is split and need to be repaired manually. In an archipelago this is a great pain. So I implemented a limit of 30 minutes or 400 imported shapes, whichever comes first. When this limit is reached, the script halts and reports how it can be restarted from that point. This allows large archipelagos, such as Sweden's and Finland's, to be imported with a large bounding box and many restarts. For example, Åland contains 76159 nodes, 75712 segments, and 4800 ways (at least one way for every island). And this is after 14777 in-line nodes were skipped. Considering that each of these objects requires one REST (HTTP/XML/RPC) call, that is 156671 calls. If the server is able to handle 10 calls per second, this is still more than four hours.
- The new -s option (--start NUMBER) allows the script to be restarted from a specified shape number. The default is 1, as that is the number of the first shape.
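The in-line test mentioned in the list above is not spelled out in this diary. One common way to do it is a cross-product check, sketched here in Python rather than the script's Perl. The function names and the tolerance are assumptions, and the check does not verify that the middle point actually lies between its neighbours.

```python
def is_in_line(a, b, c, tolerance=1e-9):
    """Return True if b lies on the straight line through a and c.

    Points are (lon, lat) pairs. The test is the cross product of (b - a)
    and (c - a); the tolerance is an assumed value.
    """
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return abs(cross) <= tolerance


def skip_in_line_nodes(points, tolerance=1e-9):
    """Drop interior points that are straight in line with their neighbours.

    Returns the kept points and the number of skipped points. The first and
    last point are always kept, so a closed loop stays closed.
    """
    if len(points) < 3:
        return list(points), 0
    kept = [points[0]]
    skipped = 0
    for index in range(2, len(points)):
        middle = points[index - 1]
        if is_in_line(kept[-1], middle, points[index], tolerance):
            skipped += 1
        else:
            kept.append(middle)
    kept.append(points[-1])
    return kept, skipped


# A closed shape with two redundant points, one on each straight edge.
shape = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (0, 0)]
kept, skipped = skip_in_line_nodes(shape)
print(kept, skipped)
```

Because the shape ends on its starting point, an uploader can reuse the node created for the first point when it reaches the last one, which is how the duplicate start/end node is avoided.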
Some remaining issues:
- I still can't interrupt the script. Should I set up an interrupt handler that invokes the restart point?
- What should the script do if it receives an HTTP error? Currently it continues and treats errors as if all was OK, just printing the HTTP error response code on STDOUT. This is easily tested now that the server is disabled and returns HTTP 301.
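One possible answer to the second question is to treat an HTTP error like the batch limit: stop and report the shape number to restart from. The Python sketch below shows that idea only. It is not what the script currently does, the names run_batch and upload are hypothetical, and it ignores the bounding box test.

```python
import time


def run_batch(shapes, upload, start=1, max_shapes=400, max_seconds=30 * 60,
              clock=time.monotonic):
    """Upload shapes numbered from 1, beginning at shape number `start`.

    `upload` is a hypothetical function that uploads one shape and returns
    an HTTP status code. Returns the shape number to restart from, or None
    when every shape has been processed.
    """
    began = clock()
    imported = 0
    for number, shape in enumerate(shapes, start=1):
        if number < start:
            continue  # skipping is cheap compared with uploading
        if imported >= max_shapes or clock() - began >= max_seconds:
            return number  # limit reached: restart here next time
        status = upload(shape)
        if not 200 <= status < 300:
            # Stop at the first error instead of carrying on as if all was OK.
            # Part of this shape may already be uploaded, so a restart here
            # can still leave duplicates that need manual cleanup.
            return number
        imported += 1
    return None
```

A caller would print the returned number as the value for the next run's -s option.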
May 5, 2007: Blue ocean tiles in tiles@home made the "dev" server's disk (hdc1) fill up, and everything stopped on Thursday May 3rd. On Friday a "planned" maintenance break of three hours is announced (11:00 - 14:00 GMT) with 90 minutes' notice, but lasts until after midnight. Now that everything is back in operation, the server stats diagrams disclose not only a new disk (hdd1, only 1% full) for "dev", but also a whole new server called "db2". It appears from the Munin server statistics graphs that "db2" has 4 GB RAM and that "db" now has 8 GB of RAM. Later on Saturday it is announced that this is the weekend for the "Rails upgrade" and that the site will be unavailable Saturday and Sunday.
While fixing up the coastline, I find tracklogs, also leading north and east. The whole coastline from Lübeck to Pärnu is fixed up, as well as Båstad to Oskarshamn and Gotland and Öland. I have made a plan for which remaining bounding boxes to import for the Baltic Sea on the page for Almien coastlines (PGS).
May 2, 2007: I import coastlines for Gotland, Gotska sandön, Latvia north of 57° N and Estonia's south coast west of 23.8° E (around Pärnu). I fix up the coast of Gotland, Gotska sandön and between Riga and Klaipeda.
April 30, 2007: Would it be useful to make a JOSM plugin for the coastline import? That would require a Java library to read the ESRI Shapefile format. Here are some available alternatives:
- Geotools is available under LGPL: Shapefile plugin
- NVision's JShapefile is a commercial try & buy API: com.nvs.shapefile
- BBN's OpenMap is an open source JavaBeans implementation, available under a license that isn't GPL: com.bbn.openmap.layer.shape
- org.deegree_impl.io.shpapi.ShapeFile appears to be rather old (1999-2001) and rather primitive.
April 26, 2007: The new tiles@home version (Glencoe, see April 24) that renders water as blue (everything on the right hand side of ways tagged with natural=coastline) is the cause of much frustration. A couple of fascinating bugs in the algorithm are pointed out. A whole new system for covering "ocean tiles" is designed. All coastlines must be pointing in the right direction, and any gaps must be closed. I also remove tags from nodes and line segments that the import process has added, because they are marked as errors by the maplint layer of t@h. First I start with Öland, but the size of that project and the closeness to the complex archipelago of the mainland make me give up. Instead I start with Bornholm and Falsterbo, and continue from there east and north along Skåne's coast that has long straight coastlines with only a few islands.
April 24, 2007: I compile a list of the 87 biggest cities in Africa with names, country, coordinates and population. All have a population (city proper, not metro area) of 478 thousand or more. This list is now found in my /Gazetteer. I also upload these 87 city names to the OSM map by first creating an OSM file with negative IDs and action=modify, opening this file in JOSM, and clicking "upload". In some cases, but most probably fewer than 10, I might have created duplicates of already existing city names.
With the new version (Glencoe) of the tiles@home client, showing water as blue on the right-hand side of coastlines, I turn around all line segments around Öland and close some gaps in the coastline. The part of the island north of 57° N is missing, so I temporarily introduce a shortcut coastline there. The real nightmare is to check and fix the direction of line segments in the Swedish archipelagos and Norwegian fjords.
April 22, 2007: Of all places, I find some long tracks in Almaty, the biggest city in Kazakhstan. I draw the first roads there. I also add city names to the 30 biggest cities in Russia, all having more than 500 thousand inhabitants.
April 17, 2007: Now I need some better software to support my mapping. I currently use Beeline GPS to record the NMEA data. Apparently I can upload NMEA files from the PDA's wifi, but are they received and converted to GPX by OSM or should I set up my own website running Gpsbabel on the server side? How can I record street names while riding my bike? This PDA has a built-in microphone, but no outlet for connecting a headset or microphone, only an audio output. Perhaps I could use a bluetooth headset? Look at audio mapping. Can I edit the map from the PDA?
When in Lund, Sweden, I almost always stay at the youth hostel "STF Vandrarhem Tåget", shown on the map below. It's unfortunate that Osmarender lets the parking symbol shine so much brighter than the hostel symbol.
April 16, 2007: Here's my first attempt at mounting my PDA/GPS (Mio P550) for mapping from my bike. Click on the images for larger versions. The bike shakes a lot, and I don't want to expose the fragile PDA to this.
I went to Biltema to buy the following parts:
- Article 27-953. Bicycle basket, child size, SEK 16.90
- Article 80-406. Lead accumulator 12 volt 2.6 Ah, SEK 89.90
- Article 37-707. Motorcycle battery charger 6/12 volt, SEK 79.90
- Article 80-502. Battery cord 30 cm with lighter outlet, SEK 22.90
- Article 35-758. "Flatstifthylsa" (Female Faston/Blade connectors, Flachsteckerhülse) 4.8 mm, 5 red, 5 black, SEK 11.90
That's a total of SEK 221.50 = EUR 24.60. I also bought 2 metres of black elastic string from a sewing shop.
The PDA is powered with 5 VDC through the USB connector. In theory I could do fine with a 6 volt motorcycle battery, but I already own a power cord that has a lighter plug for 12 volt, so that made the choice easy. I don't know, but I guess the lighter cord uses a linear regulator that burns 7 volts into heat. That's no big deal. This battery weighs 1.08 kg and I could easily carry two of them.
While charging, the battery poles have 14.8 volts, which immediately adjusts to 13.2 volts when the charger is removed. When the PDA is connected and switched off, it draws 380 mA. When the PDA is turned on with no programs running, it draws only 180 mA. With programs running it draws between 230 and 270 mA. Note that this is PDA+cord at 12 volts. According to the battery's data sheet, it can deliver 260 mA for 10 hours, which is of course exactly 2.6 Ah. The lighter-USB cord on its own without the PDA draws 22 mA (this includes a green LED). At that rate the battery would be drained in close to 120 hours, or about five days.
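As a rough check of these battery figures, here is a small Python sketch. It assumes the 2.6 Ah capacity holds at every current, which is a simplification: the data sheet figure is only stated for the 10-hour rate. The function name runtime_hours is made up for illustration.

```python
def runtime_hours(capacity_ah, draw_ma):
    """Ideal runtime in hours, assuming the capacity does not depend on the draw."""
    return capacity_ah * 1000 / draw_ma


CAPACITY_AH = 2.6  # lead accumulator from the parts list

# Currents measured at 12 volts, as listed in the text.
measurements = [
    ("data sheet rate", 260),
    ("PDA on, no programs", 180),
    ("programs running, low", 230),
    ("programs running, high", 270),
    ("cord only, no PDA", 22),
]

for label, draw_ma in measurements:
    print(f"{label}: {draw_ma} mA, about {runtime_hours(CAPACITY_AH, draw_ma):.1f} h")
```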
The lighter outlet comes with big battery clamps that I don't need. Instead I need something that fits the battery's poles. Instead of the standard dimension 6.3 mm (1/4 inch), these batteries have 4.8 mm male Faston/blade connectors (Flachstecker, flatstift). It turns out there are gold plated females (Flachsteckerhülse, flatstifthylsa) for audio gear.
Here's the battery cord with the new connectors mounted.
And this is how the battery and cord fit in the bottom of the basket under some elastic string. For this first round, I'm not encapsulating the battery.
On top of the battery I intend to mount the PDA in a little box of its own that hangs in a suspension of elastic string. I found this box in my kitchen and made some holes for the cords and strings. When the USB power cord is connected to the short side of the PDA, the unit becomes too long to be mounted in portrait mode in this child-sized bicycle basket. However, the PDA easily flips to landscape mode, so that's how I will have to use it.
Here are all the parts, spread over the table.
And the same mounted in the bicycle basket, in a suspension of elastic string. On the top left is the external antenna that I normally put on my car's roof. I discovered it will need some better fastening, perhaps adhesive tape.
Just hang the basket on the bike's handle bar, and off you go. When you park, just lift it off and bring it with you. Nothing is mounted on the bike.
April 15, 2007: Instead of visiting the developer workshop in Essen, Germany, I map parts of southern Sweden. I also start experimenting with mounting my PDA/GPS on a bicycle. The bike shakes a lot and this is a very fragile unit, so some special suspension is probably needed. On Sunday afternoon, the DB+API servers are upgraded and nobody can fix them until Steve+Nick are back in London.
April 12, 2007: How to extend the battery life on your Mio P550, a PDA with built-in GPS and Windows Mobile 5:
- You will need to configure ActiveSync to synchronize with the server manually (by default it synchronizes every 10 minutes).
- Open Programs/ActiveSync.
- Choose Menu / Add Server source (we need to create a dummy server connection to enable the Schedule option).
- Type "dummy" as the server address and press Next, then type anything (e.g. "a") as username, password and domain, press Next, and then press Finish in the next window.
- Now open Menu / Schedule and choose "Manually" in both combo-boxes.
- Now open Menu / Options and delete Exchange Server.
- ActiveSync should now be configured not to wake up from suspend mode.
Below is an Open Street Map of Europe, based on the Osmarender layer from Tiles@home. The grid shows zoom=6 tiles. The red dashes mark an area in Ukraine, Belarus, Poland and southern Russia touching 11 such tiles from Smolensk to Odessa, from Kharkiv to Szczecin, that was all white a week ago. (The motorway from Odessa to Kiev and Vitebsk already existed, having 25 km long segments, but was redrawn in finer detail by me.) There are still more tracklogs to be drawn in major cities (Kiev, Odessa, Minsk, Warsaw), in Russia (around Novgorod), Finland and Estonia, but Lithuania and the Kaliningrad area are still completely blank.
April 8, 2007: I'm now making progress also in southern Russia and Poland. I think I've drawn roads for all available tracks in the Ukraine and Belarus. But at the same time, I'm leaving some areas behind. The biggest job is probably St. Petersburg, so I could save that for last, and work from the periphery towards this center. To do:
- Road south from Zaporizhia
- Roads east and west from Odessa
- Road north from Chernihiv (towards Velikie Luki?)
- Railroad and road south from Velikie Luki (towards Minsk? or Kiev?)
- Road south-west of Warsaw
- Road west of Poznan (towards Frankfurt/Oder and Berlin)
- Road north-east from Frankfurt/Oder into Poland
- Road south from Minsk (towards Brest?)
- Roads east from Pskov (towards Novgorod) (and north from Pskov?)
- Roads and railways east, north and south from Novgorod
- St. Petersburg, roads in all directions: Narva (E20), Pushkin (E95), Novgorod (E105), Viborg/Helsinki (E18)
- Roads around lake Ladoga and on the Karelian Isthmus
- Roads in Estonia and Finland
- Roads on Iceland
- Roads along Sweden's east coast
April 4, 2007: Below is an Open Street Map of Europe, based on the Osmarender layer from Tiles@home. The grid shows zoom=6 tiles. The red dashes mark an area in Russia covering 17 such tiles from Murmansk to Moscow, from Pskov to Arkhangelsk, that was all white two weeks ago. Last spring, I found tracklogs in this area and started to draw line segments using the Java applet, but I never added ways. Now I have added ways, using JOSM and the Tiles@home renderer, and found more tracklogs. There is still more to draw in this area, but to its south is even more white space in Poland and Ukraine.