YaCy

From OpenStreetMap Wiki

A P2P-based search engine in which every instance is a crawler that fetches and indexes web pages.

OSM on YaCy

Cooperation between both projects can be a huge success:

  • OSM is a perfect georeference for web pages, especially for deep links
  • A search engine is a useful application for all links embedded in OSM and for presenting the map

Webmap

OSM in the YaCy search results

OSM is already integrated as a small world map. If the search matches an OpenGeoDB dataset, a map of the place is shown as well, similar to Google. To use this in your own search engine, see the tutorial on how to integrate OpenStreetMap in YaCy.

Links

Unfortunately, YaCy currently has no native interface for this, so a plain list of all web links in OSM has to be extracted by hand:


osmosis --read-pbf germany.osm.pbf --tee --wk keyList=url,website --un --write-xml linksw.osm --nk keyList=url,website --write-xml linksn.osm

egrep "url|website" linksn.osm>links.txt
egrep "url|website" linksw.osm>>links.txt

This may take some time (about 10 minutes for Germany). Next, cut the URLs out of the OSM XML key/value pairs. An ugly hack is simply to replace each of the following strings with an empty string:

"/>
    <tag k="url" v="
    <tag k="website" v="
    <tag k="contact:website" v="
    <tag k="url:en" v="

At present, YaCy seems unable to use these lists directly, so we have to create simple HTML link lists and import them as bookmarks. Although this would be easy to do with a script, I do it manually for testing:

split --line-bytes=500k -d links.txt linksX
-Open each file with LibreOffice Writer
-Select all and apply Default Formatting ("Standardformatierung" in the German UI)
-Go to the menu Format > AutoCorrect > Apply
-Save as .html
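
The manual steps above could be scripted roughly as follows. This is a sketch under assumptions, not a tested import path: the helper name `make_linklist` and the page title are made up, and whether YaCy's bookmark import accepts exactly this markup has not been verified here. The input is expected to contain one URL per line, still XML-escaped as taken from the OSM file:

```shell
# make_linklist (hypothetical helper, not part of YaCy): read one URL per
# line on stdin and write a minimal HTML page of links to stdout.
# Assumption: URLs are still XML-escaped as in the OSM file (e.g. &amp;),
# which is also valid inside an HTML attribute, so nothing is re-escaped.
make_linklist() {
    printf '%s\n' '<html><head><title>OSM links</title></head><body>'
    # Trim surrounding whitespace, drop empty lines, then wrap each URL in
    # an anchor. "&" in the sed replacement stands for the whole matched line.
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' \
        -e 's|.*|<a href="&">&</a><br>|'
    printf '%s\n' '</body></html>'
}

# Usage sketch: split on line boundaries so no URL is cut in half,
# then convert every chunk to its own HTML file.
# split --line-bytes=500k -d links.txt linksX
# for f in linksX[0-9][0-9]; do make_linklist < "$f" > "$f.html"; done
```

Splitting with `--line-bytes` instead of `--bytes` keeps each chunk at or below the size limit while ending every file on a complete line.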

GeoReferences for pages