Introduction

Copyright (C) 2004-2007 Mikio Hirabayashi
Last Update: Tue, 06 Mar 2007 12:05:18 +0900

Table of Contents

  1. Introduction
  2. Installation
  3. Deployment
  4. Complement

Introduction

Hyper Estraier is a full-text search system. It lets you search a large collection of documents for those that contain specified words. If you run a web site, it is useful as your own search engine for the pages in your site. It is also useful as a search utility for mailboxes and file servers.

The characteristics of Hyper Estraier are the following.

Hyper Estraier has two aspects. One is a library for building full-text search systems: an API (application programming interface) is provided for programmers. It enables you to embed advanced full-text search functions into your applications.

The other is an application of that API: a command and a CGI script are provided. With them, you can build a typical full-text search system without any programming.

This document describes how to build a full-text search system with the command and the CGI script, taking a search system for a web site as the example. Let's start by learning the command, and then move on to the API.


Installation

This section describes how to install Hyper Estraier with the source package. As for a binary package, see its installation manual.

Preparation

Hyper Estraier is available on UNIX-like systems and the Windows NT series. At least the following environments are supported.

GCC 2.95 or later and make are required to build Hyper Estraier from the source package. They are installed by default on Linux, FreeBSD, and so on.

Hyper Estraier depends on the following libraries; install them beforehand.

It is also recommended to build QDBM with zlib enabled (./configure --enable-zlib) so that the index of Hyper Estraier becomes smaller. Note that QDBM 1.8.74 and earlier are not supported.
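
Before building, you may want to confirm that the installed QDBM is newer than 1.8.74. The following is a sketch with a hypothetical helper, `qdbm_version_ok'; the usage line assumes that QDBM installed its pkg-config file `qdbm.pc', so that `pkg-config --modversion qdbm' prints the version.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the dotted version in $1 is newer
# than 1.8.74, since QDBM 1.8.74 and earlier are not supported.
qdbm_version_ok() {
  [ -n "$1" ] || return 1          # no version string: treat as unsupported
  printf '%s\n' "$1" | awk -F. '{
    split("1.8.74", min, ".")
    # compare major, minor and patch numbers from left to right;
    # a missing field counts as 0
    for (i = 1; i <= 3; i++) {
      if (($i + 0) > (min[i] + 0)) exit 0
      if (($i + 0) < (min[i] + 0)) exit 1
    }
    exit 1                         # exactly 1.8.74: not supported
  }'
}

# Usage (assumes qdbm.pc is on the pkg-config search path):
#   qdbm_version_ok "$(pkg-config --modversion qdbm)" || echo "QDBM is too old"
```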

Installation

After extracting the archive of Hyper Estraier, change the current working directory to the generated directory and perform the installation.

Run the configuration script.

./configure

Build programs.

make

Perform the self-diagnostic test.

make check

Install programs. This operation must be carried out by the root user.

make install

Result

When these steps finish, the following files will have been installed.

/usr/local/include/estraier.h
/usr/local/include/estmtdb.h
/usr/local/include/estnode.h
/usr/local/lib/libestraier.a
/usr/local/lib/libestraier.so.8.38.0
/usr/local/lib/libestraier.so.8
/usr/local/lib/libestraier.so
/usr/local/lib/pkgconfig/hyperestraier.pc
/usr/local/bin/estcmd
/usr/local/bin/estmttest
/usr/local/bin/estmaster
/usr/local/bin/estbutler
/usr/local/bin/estcall
/usr/local/bin/estwaver
/usr/local/bin/estload
/usr/local/bin/estconfig
/usr/local/bin/estwolefind
/usr/local/libexec/estseek.cgi
/usr/local/libexec/estfraud.cgi
/usr/local/libexec/estproxy.cgi
/usr/local/libexec/estscout.cgi
/usr/local/libexec/estsupt.cgi
/usr/local/share/hyperestraier/estseek.conf
/usr/local/share/hyperestraier/estseek.tmpl
/usr/local/share/hyperestraier/estseek.top
/usr/local/share/hyperestraier/estseek.help
/usr/local/share/hyperestraier/estfraud.conf
/usr/local/share/hyperestraier/estproxy.conf
/usr/local/share/hyperestraier/estscout.conf
/usr/local/share/hyperestraier/estsupt.conf
/usr/local/share/hyperestraier/estresult.dtd
/usr/local/share/hyperestraier/estraier.idl
/usr/local/share/hyperestraier/locale/...
/usr/local/share/hyperestraier/filter/...
/usr/local/share/hyperestraier/increm/...
/usr/local/share/hyperestraier/doc/...
/usr/local/man/man1/...
/usr/local/man/man3/...
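
The installed `hyperestraier.pc' lets pkg-config supply compiler and linker flags to programs that use the API. A minimal sketch follows; `build_with_estraier' is a hypothetical helper, and it assumes pkg-config is available and can find `/usr/local/lib/pkgconfig' (set PKG_CONFIG_PATH if it cannot). The PKG_CONFIG and CC variables follow the usual build conventions and can be overridden.

```shell
#!/bin/sh
# Hypothetical helper: compile one C source file against Hyper Estraier.
# $1 = source file, $2 = output program.
build_with_estraier() {
  # pkg-config reads hyperestraier.pc to get the include and library flags
  flags=$(${PKG_CONFIG:-pkg-config} --cflags --libs hyperestraier) || return 1
  # $flags is left unquoted on purpose so that it splits into separate words
  ${CC:-cc} -o "$2" "$1" $flags
}

# Usage:
#   PKG_CONFIG_PATH=/usr/local/lib/pkgconfig build_with_estraier search.c search
```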

Mac OS X, HP-UX, and Windows

On Mac OS X, run `make mac' instead of `make', `make check-mac' instead of `make check', and `make install-mac' instead of `make install'. `libestraier.dylib' and so on are created instead of `libestraier.so' and so on.

On HP-UX, run `make hpux' instead of `make', `make check-hpux' instead of `make check', and `make install-hpux' instead of `make install'. `libestraier.sl' is created instead of `libestraier.so' and so on.

On Windows, the Cygwin environment is required for building. Moreover, the MinGW versions of zlib, libiconv, regex, QDBM, and Pthreads are required. With these in place, run `make win'. No installation command is provided for Windows.
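
The choice of targets can be scripted. In the sketch below, the helper names are hypothetical, and it assumes that `uname -s' prints `Darwin' on Mac OS X and `HP-UX' on HP-UX. Windows is left out because it only has `make win'. The function only prints the commands, so you can review them before running them.

```shell
#!/bin/sh
# Hypothetical helper: map an OS name (as printed by `uname -s`) to the
# platform-specific make target, or print nothing for the default targets.
platform_target() {
  case "$1" in
    Darwin) echo mac ;;
    HP-UX)  echo hpux ;;
    *)      echo "" ;;
  esac
}

# Print the build, test and install commands for the given OS name.
build_commands() {
  t=$(platform_target "$1")
  if [ -n "$t" ]; then
    printf 'make %s\nmake check-%s\nmake install-%s\n' "$t" "$t" "$t"
  else
    printf 'make\nmake check\nmake install\n'
  fi
}

# Usage: build_commands "$(uname -s)"
# Review the output, then pipe it to sh; the install step still needs root.
```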

Options of configure

The following options can be specified with `./configure'.


Deployment

This section describes how to create an index and deploy the CGI script.

Administration Command

A database called an inverted index is used to search documents quickly. Therefore, you must create an index containing the target documents before you can search them.
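
The benefit of an inverted index can be seen in a toy version built from standard tools: it maps each word to the files that contain it, so a query looks up one word instead of reading every document. This only illustrates the data structure; Hyper Estraier's actual index format is different, and `build_index' and `lookup' are hypothetical names.

```shell
#!/bin/sh
# Toy inverted index: for every file given, emit "word<TAB>file" once per
# distinct lower-cased word, then sort (byte order) so that all files
# containing the same word end up on adjacent lines.
build_index() {
  for f in "$@"; do
    # split into one word per line, lower-case it, drop blanks and duplicates
    tr -cs '[:alnum:]' '\n' < "$f" |
      tr '[:upper:]' '[:lower:]' |
      sed '/^$/d' | LC_ALL=C sort -u |
      while IFS= read -r w; do printf '%s\t%s\n' "$w" "$f"; done
  done | LC_ALL=C sort
}

# Print the files whose text contains the lower-case word $2, using index file $1.
lookup() {
  awk -F '\t' -v w="$2" '$1 == w { print $2 }' "$1"
}
```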

estcmd is provided to administer indexes. estcmd treats each file on the file system of the local host as a document. It can register documents in an index and remove them from it. Moreover, estcmd can gather the documents under a directory and register them in a batch. Supported file formats are plain text, HTML, and e-mail (MIME).

Other formats can also be handled by using filters; the method is described later.

Indexing

Suppose that you run a web site whose contents are under `/home/www/public_html'. Let's register them into an index at `/home/www/casket'.

cd /home/www
estcmd gather -sd casket /home/www/public_html

The files under `/home/www/public_html' are gathered and registered into a new index named `casket'. That's all there is to indexing.

Deployment of the CGI Script

Suppose that the URL of the directory for CGI scripts is `http://www.estraier.ad.jp/cgi-bin/' and its local path is `/home/www/cgi-bin'. Let's deploy the requisite files there.

cd /home/www/cgi-bin/
cp /usr/local/libexec/estseek.cgi .
cp /usr/local/share/hyperestraier/estseek.* .

`/usr/local/libexec/estseek.cgi' and `estseek.(conf|tmpl|top|help)' in `/usr/local/share/hyperestraier/' are copied into `/home/www/cgi-bin/'. estseek.cgi is the CGI script, estseek.conf is the configuration file, estseek.tmpl is the template file, estseek.top contains the message shown on the top page, and estseek.help describes how to use the search functions.

Open estseek.conf with a text editor and modify it. Most items need not be modified, except for `indexname' and `replace'. Edit them as follows.

indexname: /home/www/casket
...
replace: file:///home/www/public_html/{{!}}http://www.estraier.ad.jp/
...

`indexname' specifies the path of the index. `replace' specifies a regular expression and a replacement string, separated by `{{!}}', that convert the local URI of each document into its URL on the web server.
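
You can preview what a `replace' rule does to a document's URI with sed. The helper `apply_replace' below is a hypothetical emulation: it assumes the rule is split at the first `{{!}}' into a pattern and a replacement, and that neither part contains `|' (used as the sed delimiter) or, in the replacement, `&' or `\'. Since sed uses basic regular expressions, matching may differ in detail from estseek.cgi.

```shell
#!/bin/sh
# Hypothetical emulation of an estseek.conf `replace' rule.
# $1 = rule "pattern{{!}}replacement", $2 = local URI of a document.
apply_replace() {
  pattern=${1%%'{{!}}'*}        # text before the first {{!}}
  replacement=${1#*'{{!}}'}     # text after it
  # substitute the first match; URIs that do not match pass through unchanged
  printf '%s\n' "$2" | sed "s|$pattern|$replacement|"
}

# Usage:
#   apply_replace 'file:///home/www/public_html/{{!}}http://www.estraier.ad.jp/' \
#                 'file:///home/www/public_html/doc/index.html'
#   prints http://www.estraier.ad.jp/doc/index.html
```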

Let's Try It

All set? Let's access the URL `http://www.estraier.ad.jp/cgi-bin/estseek.cgi' with your favorite web browser. Usage is described on that page.

Updating the Index

When some documents in your site are modified or new documents are added, update the index at regular intervals. Although you could delete the index and rebuild it, incremental registration is more convenient.

The `-sd' option specified when indexing records the modification time of each document, which is useful for incremental registration. Run the following command.

cd /home/www
estcmd gather -cl -sd -cm casket /home/www/public_html

The `-cm' option makes estcmd ignore files that have not been modified. The `-cl' option cleans up the regions of overwritten documents.
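
The idea behind `-cm' is to skip files whose modification time has not changed. The sketch below approximates it with find and a timestamp file, whereas estcmd relies on the modification times recorded by `-sd'; it is an illustration only, and `modified_since' and the stamp file are hypothetical.

```shell
#!/bin/sh
# Hypothetical approximation of `-cm': list regular files under directory $2
# that were modified after the timestamp file $1 (for example, a file touched
# at the end of the previous run). Without a stamp, every file is listed.
modified_since() {
  if [ -e "$1" ]; then
    find "$2" -type f -newer "$1"
  else
    find "$2" -type f
  fi
}

# Usage:
#   modified_since /home/www/.last-gather /home/www/public_html
#   touch /home/www/.last-gather      # record this run
```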

Reflection of Deleted Documents

If some documents in your site are deleted, reflect the deletion in the index by running the following command.

cd /home/www
estcmd purge -cl casket

All records in the index are scanned, and the records of deleted documents are removed. The `-cl' option cleans up the regions of deleted documents.
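
The check that `purge' performs can be pictured as follows: for each registered document, see whether its file still exists. The toy helper `missing_files' (a hypothetical name) applies that check to a plain list of paths; the real index keeps this information in its own records.

```shell
#!/bin/sh
# Hypothetical illustration of purge's check: read registered file paths
# from standard input, one per line, and print those that no longer exist.
missing_files() {
  while IFS= read -r path; do
    [ -e "$path" ] || printf '%s\n' "$path"
  done
}
```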

Optimization

Repeated `gather' and `purge' operations gradually bloat the index. Optimization eliminates the unnecessary regions and keeps the index small.

cd /home/www
estcmd optimize casket

If `gather' or `purge' is performed without the `-cl' option, the records of deleted documents are only marked as deleted and their data remain in the index. `optimize' removes such void regions.

Automated Administration

`cron' enables you to automate administrative operations. Register the following script with `crontab'.

/usr/local/bin/estcmd gather -cl -sd -cm /home/www/casket /home/www/public_html
/usr/local/bin/estcmd purge -cl /home/www/casket
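
In practice, the commands are usually put in a small script that cron runs on a schedule. A sketch follows; the script path `/home/www/bin/update-casket.sh' and the 03:30 schedule are assumptions, and the ESTCMD variable exists only so that the script can be tested with a stand-in command. It skips `purge' if `gather' fails, and it uses the full path to estcmd because cron runs jobs with a minimal PATH.

```shell
#!/bin/sh
# Hypothetical maintenance script for the index built in this document.
# Example crontab entry (path and schedule are assumptions):
#   30 3 * * * /bin/sh /home/www/bin/update-casket.sh run
ESTCMD=${ESTCMD:-/usr/local/bin/estcmd}
INDEX=/home/www/casket
DOCROOT=/home/www/public_html

update_index() {
  # register new and modified files; clean up overwritten records
  "$ESTCMD" gather -cl -sd -cm "$INDEX" "$DOCROOT" || return 1
  # remove records of deleted files and clean up their regions
  "$ESTCMD" purge -cl "$INDEX"
}

# Do the work only when called with "run", as in the crontab entry above.
if [ "${1:-}" = run ]; then
  update_index
fi
```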

For more detail

Detailed information about the command and the CGI script is given in the user's guide. For information about the API, see the programming guide.


Complement

This section describes how to contact the author and the license of Hyper Estraier.

Contact

Hyper Estraier was written and is maintained by Mikio Hirabayashi. You can contact the author by e-mail to `mikio@fallabs.com'.

License

Hyper Estraier is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License or any later version.

Hyper Estraier is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with Hyper Estraier (See the file `COPYING'); if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.

Acknowledgment

Hyper Estraier was developed under the management of Fumitoshi Ukai with the support of the Exploratory Software Project of the Information-technology Promotion Agency, Japan (IPA).