From OpenStreetMap Wiki

How can we make OSM work better?

OSM is a highly database-intensive project. One technical barrier to performance is hard drive latency. Its impact can be reduced by fitting plenty of memory to the database server: the more memory we have, the more likely it is that the requested data will already be in low-latency main memory rather than having to be fetched from the hard drive. As the price of flash memory continues to drop and its density increases, we could consider using flash for main storage some time in the next few years. Being solid state, it removes the mechanical latency barrier of hard drives.

In the short to medium term, our API server needs a little more memory, and our database server needs as much memory as we can get. There are also many non-hardware methods we can investigate to improve system performance. For example, we could map lat/lon to a single integer tile column in the database and then select on tile rather than on lat/lon. A many-to-many table mapping objects to tiles might also simplify select queries on OSM data and improve their accuracy, at the expense of increased insert complexity.
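As a rough illustration of the tile idea, the sketch below packs lat/lon into one integer tile number, stores it in an indexed column, and keeps a many-to-many table of ways to tiles. Everything in it is an assumption made for the example rather than the OSM schema: the plain lat/lon grid, the `ZOOM` resolution, the table and function names, and the use of an in-memory SQLite database. Ways are listed only under tiles that contain one of their nodes, which is a simplification.

```python
import sqlite3

ZOOM = 16          # assumed grid resolution: 2**16 x 2**16 cells over the globe
N = 1 << ZOOM

def tile_xy(lat, lon):
    """Grid cell (x, y) containing a point; y counts down from the north pole."""
    x = min(N - 1, int((lon + 180.0) / 360.0 * N))
    y = min(N - 1, int((90.0 - lat) / 180.0 * N))
    return x, y

def tile_id(lat, lon):
    """Pack the cell into one integer, suitable for a single indexed column."""
    x, y = tile_xy(lat, lon)
    return y * N + x

def tiles_in_bbox(min_lat, min_lon, max_lat, max_lon):
    """Tile ids touched by a bounding box (assumes no antimeridian crossing)."""
    x0, y0 = tile_xy(max_lat, min_lon)   # top-left cell
    x1, y1 = tile_xy(min_lat, max_lon)   # bottom-right cell
    return [y * N + x for y in range(y0, y1 + 1) for x in range(x0, x1 + 1)]

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, lat REAL, lon REAL, tile INTEGER);
    CREATE INDEX nodes_tile ON nodes (tile);
    -- many-to-many: a way is listed under every tile holding one of its nodes
    CREATE TABLE way_tiles (way_id INTEGER, tile INTEGER, PRIMARY KEY (tile, way_id));
""")

def insert_node(node_id, lat, lon):
    db.execute("INSERT INTO nodes VALUES (?, ?, ?, ?)",
               (node_id, lat, lon, tile_id(lat, lon)))

def insert_way(way_id, node_ids):
    # The extra insert work mentioned above: one row per distinct tile.
    marks = ",".join("?" * len(node_ids))
    tiles = db.execute(
        f"SELECT DISTINCT tile FROM nodes WHERE id IN ({marks})", node_ids
    ).fetchall()
    db.executemany("INSERT INTO way_tiles VALUES (?, ?)",
                   [(way_id, tile) for (tile,) in tiles])

def nodes_in_bbox(min_lat, min_lon, max_lat, max_lon):
    tiles = tiles_in_bbox(min_lat, min_lon, max_lat, max_lon)
    marks = ",".join("?" * len(tiles))
    rows = db.execute(
        f"SELECT id, lat, lon FROM nodes WHERE tile IN ({marks})", tiles)
    # The tile match is coarse, so finish with an exact lat/lon filter.
    return sorted(i for i, lat, lon in rows
                  if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon)

def ways_in_bbox(min_lat, min_lon, max_lat, max_lon):
    tiles = tiles_in_bbox(min_lat, min_lon, max_lat, max_lon)
    marks = ",".join("?" * len(tiles))
    rows = db.execute(
        f"SELECT DISTINCT way_id FROM way_tiles WHERE tile IN ({marks})", tiles)
    return sorted(way_id for (way_id,) in rows)

# Made-up sample data: two nodes close together, one far away.
insert_node(1, 51.500, -0.120)
insert_node(2, 48.850, 2.350)
insert_node(3, 51.501, -0.121)
insert_way(10, [1, 3])
insert_way(11, [2])

print(nodes_in_bbox(51.49, -0.13, 51.51, -0.11))
print(ways_in_bbox(51.49, -0.13, 51.51, -0.11))
```

A large bounding box produces a long `IN (...)` list, so a real implementation would probably select on tile ranges instead. The sketch only aims to show the select-by-tile shape and where the extra insert cost comes from.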

Current plans are to build a new master DB server, then move the current DB server to a slave role. The slave DB server would be used for roles that are less sensitive to responsiveness, such as serving larger OSM data sets and GPX tracks. The main database would be reserved for time-sensitive roles: delivering smaller chunks of OSM data to the applet and JOSM, and receiving OSM updates.
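The split of roles described above could be expressed as a small routing rule. This is only a sketch under stated assumptions: the server names, the request kinds and the bounding-box cutoff that separates "smaller chunks" from "larger data sets" are all hypothetical and are not taken from any existing OSM code.

```python
from dataclasses import dataclass

MASTER = "master-db"          # hypothetical server names
SLAVE = "slave-db"
MAX_SMALL_BBOX_DEG2 = 0.25    # assumed cutoff, in square degrees

@dataclass
class Request:
    kind: str                 # "map", "gpx" or "upload"
    bbox_area: float = 0.0    # only meaningful for "map" requests

def route(req):
    """Pick the database server for a request, following the planned roles."""
    if req.kind == "upload":
        return MASTER         # updates always go to the master
    if req.kind == "gpx":
        return SLAVE          # GPX tracks are not time-sensitive
    if req.kind == "map":
        # Small areas (applet, JOSM) stay on the master; big extracts move off it.
        return MASTER if req.bbox_area <= MAX_SMALL_BBOX_DEG2 else SLAVE
    raise ValueError(f"unknown request kind: {req.kind!r}")

print(route(Request("map", bbox_area=0.01)))
print(route(Request("map", bbox_area=4.0)))
print(route(Request("gpx")))
print(route(Request("upload")))
```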

What do we need?

 Dual-core Athlon 64, socket AM2.
 Motherboard: not yet selected.
 8 GB of DDR2 DIMMs (4×2 GB).
 4×76 GB high-performance 10–15k RPM hard drives.
 Purchased LSI Logic Mylex AcceleRAID 170 U160 SCSI RAID controller on eBay for £27. Tests in multi-drive mode so far disappointing.
   May buy/try a different controller (such as an Areca SATA RAID controller ?? ;D ).
 1 GB for API server.
 Good-performance drive and controller for tiles@home on dev.


If you would like to donate hardware or money to update our servers, please contact one of our sysadmins. It is probably best to contact me, Nick Hill, directly, as I buy the server hardware. Donations can be made by SWIFT and IBAN.

Recent Received Donations

 24/02/07 23:54 £100 Anonymous (NH funds)
 26/02/07 01:43 £450 Anonymous (money from original commitment to buy memory) (NH funds)
 26/02/07 £100 Anonymous (in central funds as of 26 Feb)
 26/02/07 £10 Anonymous (in central funds as of 26 Feb)
 28/02/07 £50 RalfZ (PayPal to NH)
 01/03/07 £30 Higgy (cheque to NH)
 02/03/07 £100 Anonymous (PayPal to NH)
 02/03/07 £60.36 (orig 90 EUR) Anonymous (IBAN)
 03/03/07 £500.00 Anonymous (BACS, NH funds)
 05/03/07 £19.38 (orig 50 EUR; £10 forex dealing fee applied, plus EUR 5.16 deducted by sending bank) Anonymous (SWIFT)
 09/03/07 £200.00 Anonymous (additional from original commitment to buy memory) (PayPal, NH funds)

Above Donations Spent

 1509.74 Gross Balance NH funds 28/3/07
 -32.85 Paypal fees to 8/3/07
 1476.89 Nett balance NH funds 8/3/07
 -615.41 4×2 GB DDR2 DIMMs, IT247 order no. 537-SO30149710, ETA 15/3/07
  -40.00 160 GB HDD supplied 2nd March for tiles@home on dev
  -27.00 Mylex AcceleRAID 160
 -420.72 Mainboard, processor, 430 W Hiper PSU, memory for old DB (exchanging DIMMs with the API machine: API to 2 GB, old DB to 4 GB), SATA HDD for DB boot. Ebuyer order no. 8285479
  -27.95 Used dual-channel Adaptec 3200S RAID card, possibly to use in place of the Mylex (further tests to be performed).
 -250.00 Four used (1200 h) 10K RPM U320 SCSI 79 GB hard drives. High-performance Compaq. Bonnie++ shows 478 seeks/sec on one drive.
  -16.00 Four SCA to UW SCSI adapters
  -10.00 4 drop UW SCSI cable
 102.66 in NH funds as of 28/3/07
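The figures above can be re-added with a short script. It is a sketch, not an official account: it assumes that every donation listed earlier except the two marked as being in central funds went into NH funds, including the IBAN and SWIFT ones, which are not labelled either way. Amounts are held as `Decimal` so that pence add up exactly.

```python
from decimal import Decimal

def total(amounts):
    """Sum a sequence of amounts given as strings, exactly."""
    return sum((Decimal(a) for a in amounts), Decimal("0"))

# Donations assumed to be in NH funds (the two central-funds entries are excluded).
donations_to_nh = ["100", "450", "50", "30", "100",
                   "60.36", "500.00", "19.38", "200.00"]
paypal_fees = Decimal("32.85")

purchases = {
    "4x2 GB DDR2 DIMMs": "615.41",
    "160 GB HDD for tiles@home": "40.00",
    "Mylex RAID controller": "27.00",
    "Mainboard, CPU, PSU, memory, SATA HDD": "420.72",
    "Adaptec 3200S RAID card": "27.95",
    "Four 10K RPM SCSI drives": "250.00",
    "Four SCA to UW SCSI adapters": "16.00",
    "4-drop UW SCSI cable": "10.00",
}

gross = total(donations_to_nh)
net = gross - paypal_fees
spent = total(purchases.values())

print("gross balance:        ", gross)
print("net of PayPal fees:   ", net)
print("total purchases:      ", spent)
print("gross minus purchases:", gross - spent)
print("net minus purchases:  ", net - spent)
```

On these assumptions the donations sum to the stated gross balance of 1509.74 and the purchases to 1407.08. The stated remainder of 102.66 equals the gross balance minus purchases; deducting the 32.85 PayPal fees as well would leave 69.81. The page does not say how the fees were covered.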

Discussion on Hardware

sxpert wrote:

   Areca SATA II controllers are probably a better choice, with respect to disk prices.
   Also, more drives are possibly more interesting than faster drives with respect to speed (more independent heads == lower latency).

That is certainly what I would have expected. However, according to a review you pointed me to, the graphs suggest that to get twice the performance of one drive from a RAID array, you need 4 or 5 drives. In effect, a law of diminishing returns.

You must also bear in mind the very high cost per port of the Areca SATA RAID controllers, which likely exceeds the price of the attached drives.

This suggests that, if financially accessible, fewer of the highest-performance drives should yield the highest overall performance.

Notwithstanding my disappointing experience so far with SATA, I will continue to test SATA implementations. As previously stated:

I have had disappointing experiences with SATA. For example, the tile server has a SATA drive and tilegen has PATA. PATA gave more than twice the performance of SATA in a recent test rebuilding the T@H database, even though tilegen has half the memory. Also, the SATA drive, when connected to the controller on tilegen, kept getting IO-APIC errors and was unusable. IMO, the SATA implementations, although theoretically good, are in practice a mess. The specification is too lax, and driver writers, controller designers and hard drive manufacturers can and do each leave out parts of the design specification through laziness or cheapness. Given that so much effort needs to be invested to ensure each element in the SATA chain has been properly implemented, and given that many drives are totally incompatible with certain SATA controllers, I am convinced from hands-on experience that SCSI is still far better. NickH