Since this was written (in 1999), swish-e has added all the features found here, and a bunch more, to its base distribution.
This document is here for historical reasons only.
[ Skip to the code section of this document if you don't want to read the discussion part. ]
After hacking together a little CGI interface to swish-e -- http://www.lhsc.on.ca/cgibin/search -- it wasn't long before I wanted to start adding features like result filtering, result paging, and "start of document" contents to the search output. Adding these things to the C source for swish-e itself sounded like digging a large support hole for myself, so I decided to take the easiest route (at the expense of some speed) by modifying the swish-e spider and my own swish-e CGI front end.
The result filtering and result paging features are relatively simple additions to the CGI front end, a small Perl program.
The "start of document" contents feature, which I refer to as "document abstracting", is arrived at by doing two things:
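To give a feel for how small these two features are, here is a minimal sketch in shell. It is not the Perl front end described here; it only assumes that the search results arrive as one result per line on standard input, in ranked order. The function names page_results and filter_results are made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not taken from the original
# Perl front end. Both read one search result per line from standard input.

# page_results START COUNT
# Print COUNT result lines, beginning at line START (1-based).
page_results() {
    start=$1
    count=$2
    end=$((start + count - 1))
    sed -n "${start},${end}p"
}

# filter_results PATTERN
# Keep only the result lines that match the regular expression PATTERN.
filter_results() {
    grep -- "$1"
}
```

A front end along these lines would filter first and page second, for example `filter_results '^/news/' | page_results 11 10` to show the second page of ten hits under a hypothetical /news/ path.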
Get swish-e running and building indexes using the HTTP method before making any changes. That way you know that everything is working before you start modifying it.
Here's how I update the index and abstract database for my site every night.
Notice how both swish-e and the spider create files whose names end in "working", so they can be renamed here.
(From a shell script launched from the web server user's crontab)
# Create swish-e index:
#
/opt/lhsc/www/swish/swish-e -i http://www.lhsc.on.ca/ \
-S http \
-c /www/database/swish-e/http.conf \
-f /www/database/swish-e/lhsc.index-working -v 0
# Move the finished abstract database and index into place:
#
mv /www/database/swish-e/abstract.gdbm.working \
/www/database/swish-e/abstract.gdbm
mv /www/database/swish-e/lhsc.index-working \
/www/database/swish-e/lhsc.index
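One refinement worth considering, offered as a sketch rather than as what the nightly script actually did: rename a working file only when it exists and is non-empty, so that a failed indexing run does not replace last night's good index with nothing. The promote function below is a hypothetical helper, not part of swish-e.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not part of swish-e or the original
# nightly script.

# promote WORKING FINAL
# Rename WORKING to FINAL only if WORKING exists and is not empty.
# Otherwise leave FINAL untouched, complain on stderr, and return 1.
promote() {
    working=$1
    final=$2
    if [ -s "$working" ]; then
        mv "$working" "$final"
    else
        echo "promote: $working is missing or empty, keeping $final" >&2
        return 1
    fi
}

# Intended use, with the paths from the script above:
#   promote /www/database/swish-e/abstract.gdbm.working \
#           /www/database/swish-e/abstract.gdbm
#   promote /www/database/swish-e/lhsc.index-working \
#           /www/database/swish-e/lhsc.index
```

Because cron normally mails anything written to stderr to the crontab's owner, the complaint from a failed run would show up the next morning without any extra plumbing.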
Comments and questions should go to steve.vanderburg@lhsc.on.ca.