Isite/Isearch Quick Start Guide

or, Installing and Using Isite/Isearch for the Hopelessly Impatient with Slightly More Thorough Instructions for Those who Care.

Isearch is a text search system developed by CNIDR. CNIDR is the Center for Networked Information Discovery and Retrieval. They are part of a non-profit corporation known as MCNC. MCNC used to stand for "Microelectronics Center of North Carolina", but they dropped the long form and now they're legally just MCNC. Anyway, the lawyers thought you'd like to know that.

Isearch pretty much does one thing: it lets the user search through a bunch of text by hunting for words. It doesn't "understand" your text collection, it doesn't try to parse it, it just uses slightly-less-than-brute-force statistical methods to hunt for words in documents. It does this by building indexes. An Isearch index is just like the index in a book: if you look for a word in the index of a book, it tells you a page number to look at. Likewise, if you search for a word with Isearch, it looks in the index for a filename associated with that word and then shows that file to you. It can do more complicated things, but that's the basic game plan.
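
To make the book-index analogy concrete, here is a minimal shell sketch of a word-to-filename index. It is only an illustration: the function names "build_index" and "lookup" are made up for this example, and the sorted text file it produces is an assumption of ours, not the format Isearch actually uses for its index files.

```shell
#!/bin/sh
# Toy word -> filename index (illustration only; not how Isearch stores its indexes).

# build_index FILE...: print one "word<TAB>filename" line per distinct word per file.
build_index() {
    for f in "$@"; do
        # Turn every run of non-letters into a newline, then lower-case the words.
        tr -cs 'A-Za-z' '\n' < "$f" | tr 'A-Z' 'a-z' |
            awk -v file="$f" 'length($0) > 0 { print $0 "\t" file }'
    done | sort -u
}

# lookup INDEX WORD: print the files that the index lists for WORD (exact match).
lookup() {
    awk -F '\t' -v w="$2" '$1 == w { print $2 }' "$1"
}

# Demo on two small throwaway files.
demo=$(mktemp -d)
printf 'The symbol table was rewritten.\n' > "$demo/a.txt"
printf 'Copyright notice, no tables here.\n' > "$demo/b.txt"
build_index "$demo/a.txt" "$demo/b.txt" > "$demo/words.idx"
lookup "$demo/words.idx" table
```

The lookup prints only the path of a.txt, because b.txt contains "tables" rather than "table". The point is the same as with the book: the expensive work happens once, when the index is built, and each search is just a lookup.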

Installing Isite requires five steps:

  1. Get Isite. If you're reading this, there's a good chance you've already finished this step and the next one.
  2. Uncompress and un-tar the distribution. Again, unless someone printed this and put it on your desk, you're through with this step, too.
  3. Edit the Makefile. This is where you tell Isite what kind of machine you have. If you have a fairly mainstream Unix box, there won't be any problem. If you have a strange machine, then you've probably had a lot of practice porting software by now anyway.
  4. Compile. If you did Step 3 correctly, then this is a no-brainer. On the other hand, this is rarely a no-brainer in the real world.
  5. Index and search some sample text so you know you did things right.

Really, installation usually isn't tricky at all. Start to finish should take around thirty minutes the first time, and about ten when you download new versions.

STEP 0: GET GCC AND GZIP

Okay, we lied: there's an extra step. You really should have gcc and g++ on your machine, and you must have gzip (and its companion gunzip) so you can unpack the archive that Isearch comes in. You can get gzip and gcc via anonymous ftp from prep.ai.mit.edu in the directory /pub/gnu. Gzip installation is fairly simple: unpack the tar file and follow the instructions in the file "README". Installing gcc and g++ is a lot more complicated.

In principle, you could use a vendor-supplied C++ compiler. The Solaris SPARCworks C++ compiler is known to work, and the CenterLine C++ compiler has also compiled Isearch successfully. The bad news is that there are several subtly different versions of the C++ specification, and each compiler views it a little differently. Getting Isearch to work with a different compiler is about as much work as porting Isearch to another machine.

Isearch is also fairly dependent on compiler versions, especially with gcc. Gcc users should make sure they are using at least version 2.7.X. Note that the old, pre-compiled gcc for Solaris is not meant to be a general-purpose compiler: it's just enough of a compiler to compile a newer version of gcc.

Incidentally: you're also going to have to install libg++ if you haven't already. Same ftp site, same directory. Install it after you install g++.
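
If you aren't sure what is already on your machine, a small check like the following can tell you before you start. This is a sketch of our own: the function names are made up, and it only looks for programs on your PATH, so it can't tell you whether libg++ is installed or which gcc version you have.

```shell
#!/bin/sh
# have_tool NAME: succeed if NAME is found on the PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# report_tools NAME...: print one status line per tool; return non-zero if any is missing.
report_tools() {
    missing=0
    for tool in "$@"; do
        if have_tool "$tool"; then
            echo "found:   $tool"
        else
            echo "MISSING: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# The programs this guide relies on: the compilers and gzip now, tar and make later.
report_tools gcc g++ gzip gunzip tar make ||
    echo "Install the missing tools before going on."
```

Once gcc is found, "gcc --version" will tell you whether it is recent enough.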

STEP 1: GET ISITE

You should always work from the latest version of Isite available. Bugs are being corrected and new features are being added daily (literally). You can always get the newest version from ftp.cnidr.org. Here's a sample session in which we download today's version of Isite from ftp.cnidr.org: we log in as "anonymous", send our email address as the password, change the current directory to "/pub/software/Isite", make sure ftp is in binary mode, and get the file. (The host and directory names your ftp client shows may differ from these, as they do in the transcript below.)

% ftp ftp.awcubed.com
Connected to ftp.awcubed.com.
220 awcubed FTP server (Version wu-2.4(1) Sun Jan 1 17:43:49 EST 1995) ready.
Name (ftp.cnidr.org:escott): anonymous
331 Guest login ok, send your complete e-mail address as password.
Password:
230 Guest login ok, access restrictions apply.
ftp> cd /Software
250-Please read the file README
250-  it was last modified on Fri Mar 22 15:41:06 1996 - 3 days ago
250 CWD command successful.
ftp> ls
200 PORT command successful.
150 Opening ASCII mode data connection for /bin/ls.
total 4995
Isite-2.05-linux.tar.gz
Isite-2.05-osf1.tar.gz
Isite-2.05-solaris.tar.gz
Isite-2.05-sunos.tar.gz
Isite-2.05.tar.gz
README
archive
untested
226 Transfer complete.
772 bytes received in 0.031 seconds (24 Kbytes/s)
ftp> binary
200 Type set to I.
ftp> get Isite-2.05.tar.gz
200 PORT command successful.
150 Opening BINARY mode data connection for Isite-2.05.tar.gz (4836638 bytes).
226 Transfer complete.
local: Isite-2.05.tar.gz remote: Isite-2.05.tar.gz
4836638 bytes received in 1.12 seconds (9.9e+02 Kbytes/s)
ftp> quit
% 

Notice a few things:

  1. There are precompiled binary kits for Linux on an Intel box, DEC OSF/1 (now called Digital Unix, soon to be called "The Operating System Formerly Known as 'Prince'") for Alpha (now known as Alpha AXP), Solaris 2.X for Sparc, and SunOS 4.1.X. If you have one of these machines and you really can't get Isearch to compile in Step 4, consider using one of these binary kits. If you don't have one of the above architectures, then you're going to have to build from source code anyway.
  2. There is a directory named "untested". The files in that directory are, well, untested. If you need the absolute newest and greatest version, then look in here.
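
Before moving on, it's worth confirming that the whole file arrived. The ftp server announces the size when the transfer starts (the "150 Opening BINARY mode data connection" line in the session above), so you can compare that number with what landed on disk. Here is a minimal sketch; "check_size" is a name we made up for this example.

```shell
#!/bin/sh
# check_size FILE BYTES: succeed if FILE exists and is exactly BYTES bytes long.
check_size() {
    [ -f "$1" ] || { echo "$1: no such file" >&2; return 1; }
    actual=$(wc -c < "$1" | tr -d ' ')
    if [ "$actual" -eq "$2" ]; then
        echo "$1: size OK ($actual bytes)"
    else
        echo "$1: expected $2 bytes, got $actual" >&2
        return 1
    fi
}

# Demo with a throwaway 10-byte file.
demo=$(mktemp -d)
printf '0123456789' > "$demo/sample.bin"
check_size "$demo/sample.bin" 10
```

For the session above, the call would be "check_size Isite-2.05.tar.gz 4836638". A size mismatch usually means the transfer was cut short or was done in ASCII mode instead of binary.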

STEP 2: UNCOMPRESS AND UN-TAR THE DISTRIBUTION

You should now have a copy of Isearch as a gzipped tar file. The ".gz" suffix indicates a gzipped file, and the ".tar" suffix indicates it's tarred. The first thing to do is to uncompress the file:

% gunzip Isite-2.05.tar.gz

You now have a (much larger) file called "Isite-2.05.tar".

The next step is to un-tar the file we just uncompressed. In our example, we're going to want to install Isearch so it has the path name "/local/project/Isite". To do this, copy or move the file to "/local/project":

% mv Isite-2.05.tar /local/project

Finally, we'll un-tar the distribution:

% cd /local/project
% tar xf Isite-2.05.tar

There should now be a directory named /local/project/Isite, and it should contain the Isearch distribution:

% ls /local/project/Isite
CHANGES    Makefile   TUTORIAL   doctype
COPYRIGHT  README     bin        src

If you see all of that, then you probably got it right.
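
If you are short on disk space, the uncompress and un-tar steps can be combined so that the big intermediate ".tar" file is never written. The sketch below shows one way to do it; "unpack_tgz" is a hypothetical helper of ours, and the demo builds a tiny stand-in archive rather than using the real distribution.

```shell
#!/bin/sh
# unpack_tgz ARCHIVE DESTDIR: test a .tar.gz, then unpack it into DESTDIR
# without leaving an uncompressed .tar file behind.
unpack_tgz() {
    gzip -t "$1" || return 1                 # stop early on a damaged download
    mkdir -p "$2" || return 1
    gzip -dc "$1" | (cd "$2" && tar xf -)    # same result as gunzip followed by tar xf
}

# Demo: build a tiny stand-in archive, then unpack it somewhere else.
demo=$(mktemp -d)
mkdir -p "$demo/src/Isite"
echo "sample" > "$demo/src/Isite/README"
(cd "$demo/src" && tar cf - Isite) | gzip > "$demo/sample.tar.gz"
unpack_tgz "$demo/sample.tar.gz" "$demo/project"
ls "$demo/project/Isite"
```

With the real distribution, the equivalent call would be "unpack_tgz Isite-2.05.tar.gz /local/project". The "gzip -t" test costs a few seconds and catches a corrupted archive before tar starts scattering files around.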

STEP 3: EDIT THE MAKEFILE

This is the part that makes people nervous, but it really isn't that bad. You need to edit the Makefile with your favorite editor, and essentially just follow the instructions. Edit /local/project/Isite/Makefile in this example. You won't have to edit the makefiles in the subdirectories; they automatically inherit what they need.

The first choice is for the compiler. Probably 99% of the world should leave the default "g++". Isearch is developed by people who use g++, so that's your best bet. It's also hard to beat the price.

The second choice is for "CFLAGS", which are the options to pass to the compiler. There are canned selections for you to choose from. If you don't know what to select, then "CFLAGS=-g -DUNIX -Wall" is a good starting guess.

The third choice is for the location to install the finished programs. The default "/usr/local/bin" is a good guess. Note that if you don't elect to actually "make install" in step 4 then you'll never have to set this.

The rest of the choices probably should be left alone.
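
Two habits make this step less nerve-racking: keep a pristine copy of the Makefile, and list the CFLAGS lines so you know where to point your editor. The sketch below does both. The helper names are ours, and the Makefile fragment in the demo is invented for illustration (including the "-O2" line and the "BINDIR" variable); the real Isite Makefile will look different.

```shell
#!/bin/sh
# list_cflags MAKEFILE: show every line that sets CFLAGS, commented out or not,
# with its line number, so the choices are easy to find in an editor.
list_cflags() {
    grep -n -E '^[[:space:]]*#?[[:space:]]*CFLAGS[[:space:]]*=' "$1"
}

# backup_file FILE: keep a pristine copy as FILE.orig (never overwrites an existing backup).
backup_file() {
    [ -e "$1.orig" ] || cp -p "$1" "$1.orig"
}

# Demo on a made-up Makefile fragment (hypothetical contents).
demo=$(mktemp -d)
cat > "$demo/Makefile" <<'EOF'
# Pick one CFLAGS line:
#CFLAGS=-O2 -DUNIX
CFLAGS=-g -DUNIX -Wall
BINDIR=/usr/local/bin
EOF
backup_file "$demo/Makefile"
list_cflags "$demo/Makefile"
```

If an edit goes wrong, copying "Makefile.orig" back over "Makefile" puts you where you started.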

While on the subject of editing files, it's worth noting that the file /local/project/Isite/Isearch/doctype/dtconf.inf describes which "doctypes" your copy of Isearch will be aware of. A doctype is essentially a file type handler; there are doctypes for simple ASCII files, files of separate paragraphs, and so forth. You can almost always just leave this file alone and take the default, which is "Give me all of 'em!".

STEP 4: COMPILE

At this point, you should be able to

% cd /local/project/Isite
% make

And all the right things will happen. Note that many compilers will print a few warning lines along the way, but they're warning about pretty harmless stuff. If there are any errors, then the compilation will grind to a halt. Otherwise, when you're done there will be new files in the "bin" subdirectory:

% ls bin
Iindex        Isearch       Iutil         libIsearch.a
libz3950.a    libsapi.a     izclient      zclient
zping         zserver       zbatch        zcon
zgate
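
If you'd rather not eyeball the listing, a short check can confirm that the core pieces were built. This is a sketch under our own assumptions: "check_bins" is a made-up name, it looks only for the first four files in the listing above, and the demo runs against a stand-in directory of empty placeholder files.

```shell
#!/bin/sh
# check_bins DIR: report whether the core build products are present in DIR.
check_bins() {
    status=0
    for name in Iindex Isearch Iutil libIsearch.a; do
        if [ -f "$1/$name" ]; then
            echo "ok:      $1/$name"
        else
            echo "missing: $1/$name"
            status=1
        fi
    done
    return "$status"
}

# Demo against a stand-in bin directory holding empty placeholder files.
demo=$(mktemp -d)
mkdir "$demo/bin"
for name in Iindex Isearch Iutil libIsearch.a; do
    : > "$demo/bin/$name"
done
check_bins "$demo/bin"
```

After a real build, you would run "check_bins /local/project/Isite/bin". A missing file usually means make stopped on an error further up in its output.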

If you got that far, then you can (optionally):

% make install

and the compiled executables will be copied to the install location you chose in Step 3 (/usr/local/bin by default).

STEP 5: INDEX SOME TEXT AND SEE HOW IT WORKS

Now let's index a little text and see if Isearch really works. First, let's pick a couple of files to index. In this example, we'll index the files "CHANGES" and "COPYRIGHT" since we know everyone has them.

% cd /local/project/Isite
% mkdir testIndex
% cd testIndex
% /local/project/Isite/bin/Iindex -d tester ../CHANGES ../COPYRIGHT
Iindex 1.45
Building document list ...
Building database tester:
   Parsing files ...
   Parsing /local/project/Isite/CHANGES ...
   Parsing /local/project/Isite/COPYRIGHT ...
   Indexing 1870 words ...
   Merging index ...
Database files saved to disk.
% ls
tester.dbi  tester.inx  tester.mdg  tester.mdk  tester.mdt

That created five index files to describe the two files we indexed. Don't worry, we could have indexed every file on your system and still only had five index files. Now let's do a little searching and see how well we did:

% /local/project/Isite/bin/Isearch -d tester table
Isearch 1.45
Searching database tester:

1 document(s) matched your query, 1 document(s) displayed.
      Score   File
   1.   100   /local/project/Isite/CHANGES

Select file #: 

At this point, if you enter "1" and press return, it will display the contents of "CHANGES". If you press return without a number, Isearch will exit. This is sort of a trivial example, but if you index thousands of files, then a search like the one above will list the ones that contain "table"; give it two words, say "RSET table", and it will list the files that contain either one. If you run Isearch without any arguments, it will give a list of all of its options, what they mean, and some examples of their use. For more information, see the Isearch Tutorial included with the distribution.
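
A quick way to double-check a search result on a small collection is to ask grep the same question the slow way and compare the file lists. The sketch below does that for an either-word query. The function name is ours, and it assumes whole-word, case-insensitive matching, which may not be exactly how Isearch treats case and word boundaries.

```shell
#!/bin/sh
# files_with_any_word "WORD1 WORD2 ..." FILE...: list the files that contain at
# least one of the words (whole-word, case-insensitive match).
files_with_any_word() {
    pattern=$(printf '%s' "$1" | tr -s ' ' '|')   # "RSET table" becomes "RSET|table"
    shift
    grep -l -i -w -E -- "$pattern" "$@"
}

# Demo on three throwaway files.
demo=$(mktemp -d)
printf 'Rebuilt the hash table.\n' > "$demo/one.txt"
printf 'RSET handling fixed.\n' > "$demo/two.txt"
printf 'Nothing relevant here; just portable code.\n' > "$demo/three.txt"
files_with_any_word "RSET table" "$demo/one.txt" "$demo/two.txt" "$demo/three.txt"
```

The demo lists one.txt and two.txt but not three.txt, since "portable" only contains "table" as part of a longer word. In the test directory from this step, the matching check would be "files_with_any_word table ../CHANGES ../COPYRIGHT". Grep rereads every file on every search, which is exactly the work the index saves you once the collection gets large.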