This article or section may contain out-of-date information. The information is no longer correct, or no longer has relevance.
If you know about the current state of affairs, please help keep everyone informed by updating this information.
YaCy is a P2P-based search engine in which every instance is a crawler that fetches and indexes web pages.
OSM on YaCy
Cooperation between the two projects could benefit both:
- OSM is an ideal georeference for web pages, especially for deep links
- A search engine is a useful application for all links embedded in OSM and for presenting the map
OSM is already integrated in YaCy as a small world map, and if a search matches an OpenGeoDB dataset, a map of the place is shown, similar to Google. To use this in your own search engine, see the tutorial on how to integrate OpenStreetMap in YaCy.
Unfortunately, YaCy currently has no native interface for this, so a plain list of all web links in OSM has to be extracted by hand. The commands below currently cover only url=* and website=*, although OSM has more link keys:
osmosis --read-pbf germany.osm.pbf --tee --wk keyList=url,website --un --write-xml linksw.osm --nk keyList=url,website --write-xml linksn.osm
egrep "url|website" linksn.osm > links.txt
egrep "url|website" linksw.osm >> links.txt
This may take some time (about 10 minutes for Germany). Next, the URLs have to be cut out of the OSM XML key/value pairs. An ugly hack is simply to replace each of the following strings with an empty string:
"/>
<tag k="url" v="
<tag k="website" v="
<tag k="contact:website" v="
<tag k="url:en" v="
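Instead of replacing the strings by hand, the same extraction can be sketched in Python. This is a minimal sketch, not part of the original workflow: it assumes that each line of links.txt holds one `<tag k="..." v="..."/>` element as written by Osmosis, and the names `LINK_KEYS`, `TAG_RE` and `extract_urls` are made up for illustration. It also unescapes XML entities such as `&amp;`, which the manual replacement leaves in the URLs.

```python
import html
import re

# Keys taken from the replacement list above; extend the tuple for other link keys.
LINK_KEYS = ("url", "website", "contact:website", "url:en")

# Matches one OSM XML tag element, e.g. <tag k="website" v="http://example.org"/>
TAG_RE = re.compile(r'<tag k="(?P<key>[^"]+)" v="(?P<value>[^"]*)"\s*/>')


def extract_urls(lines, keys=LINK_KEYS):
    """Yield the value of every <tag .../> line whose key is listed in `keys`."""
    for line in lines:
        match = TAG_RE.search(line)
        if match and match.group("key") in keys:
            # The file is XML, so "&" is stored as "&amp;" and has to be unescaped.
            yield html.unescape(match.group("value")).strip()
```

Calling `extract_urls(open("links.txt", encoding="utf-8"))` would then yield one plain URL per matching line; lines that egrep matched for other reasons (for example a name containing "url") are skipped.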
At present, YaCy seems unable to use these lists directly, so we have to create simple HTML link lists and import them as bookmarks. Although this would be easy to do with a script, I do it manually for testing:
split --bytes=500k -d links.txt linksX
- Open each file with LibreOffice Writer
- Select all and apply Default Formatting ("Standardformatierung" in the German UI)
- Go to the menu Format -> AutoCorrect -> Apply
- Save as .html
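The manual LibreOffice steps could be scripted roughly as follows. This is only a sketch under assumptions: it takes a list of plain URLs as input and assumes that a bare HTML page of `<a href>` anchors is enough for the bookmark import described above, which has not been verified against YaCy. The function names and the chunk size are hypothetical.

```python
import html


def chunked(items, size):
    """Split a list into consecutive pieces of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def render_link_list(urls, title="OSM links"):
    """Return a minimal HTML page containing one anchor per URL."""
    lines = [
        "<!DOCTYPE html>",
        "<html>",
        '<head><meta charset="utf-8"><title>' + html.escape(title) + "</title></head>",
        "<body>",
        "<ul>",
    ]
    for url in urls:
        escaped = html.escape(url)  # escapes &, <, > and quotes
        lines.append('<li><a href="' + escaped + '">' + escaped + "</a></li>")
    lines += ["</ul>", "</body>", "</html>"]
    return "\n".join(lines) + "\n"


def write_link_lists(urls, prefix="linksX", per_file=5000):
    """Write the URLs to prefix00.html, prefix01.html, ... and return the file names."""
    names = []
    for index, part in enumerate(chunked(list(urls), per_file)):
        name = f"{prefix}{index:02d}.html"
        with open(name, "w", encoding="utf-8") as out:
            out.write(render_link_list(part, title=name))
        names.append(name)
    return names
```

Splitting by number of links rather than by bytes, as `split --bytes` does, avoids cutting a URL in half at a file boundary.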