
4.6 FIELD SELECTION TABLE (FST)

This is perhaps the most difficult of the four forms to understand.

CDS/ISIS has two ways of finding information in the database, which can be compared with the two ways of finding information in a book. Suppose we have a book on architecture and we want to find any mention of cathedrals. One method is to start at page 1 and scan each page in turn to see whether ‘cathedrals’ occurs on that page. This is known as a ‘serial’ or ‘sequential’ search, because we are searching through the pages in sequence. It would be quite a reliable method (provided we could keep up the concentration) but it would take a long time if the book had several hundred pages.

A much quicker method is to make use of the index (provided that the book has one). We look under C, find ‘cathedrals’, and then see an entry something like:

cathedrals 30, 212, 360

Now we can go straight to those page numbers and read what is said about cathedrals. This method might not be quite so reliable, since it depends on the skills of the indexer. He or she might have considered some mentions of ‘cathedrals’ to be too insignificant to index.

CDS/ISIS allows both these approaches to information retrieval. The first method, scanning through the records sequentially and examining the text contained in each record, is known as free-text searching. It is likely to be a slow process when the database contains more than a few hundred records. The second method, using an index, is the normal way of searching. CDS/ISIS allows you to set up the index automatically and refers to it as the index or inverted file. (The list of terms in the index without the details of their occurrences is also referred to as the terms dictionary.)
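The difference between the two methods can be sketched in a few lines of Python. This is a toy model only, not CDS/ISIS code: the sample records, the function names and the decision to fold everything to lower case are all assumptions made for the illustration.

```python
# Illustrative only: a toy model of the two search methods, not CDS/ISIS code.
from collections import defaultdict

# Hypothetical records, keyed by record number.
records = {
    1: "Gothic cathedrals of northern France",
    2: "Modern office buildings",
    3: "Cathedrals and abbeys of England",
}


def sequential_search(term):
    """Free-text search: examine every record in turn."""
    term = term.lower()
    return [num for num, text in records.items() if term in text.lower().split()]


def build_inverted_file():
    """Index search: note once, in advance, which records each word occurs in."""
    index = defaultdict(list)
    for num, text in records.items():
        for word in text.lower().split():
            if num not in index[word]:
                index[word].append(num)
    return index


inverted_file = build_inverted_file()
# The terms on their own, without the details of their occurrences.
terms_dictionary = sorted(inverted_file)

print(sequential_search("cathedrals"))  # [1, 3]
print(inverted_file["cathedrals"])      # [1, 3]
```

Both calls give the same answer here, but the sequential search reads every record each time it is run, whereas the index lookup reads only the one entry for the term.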

The selection of terms from the database records to go on to the index file is controlled by the Field Selection Table. It is not possible for the computer to select terms according to their significance. Instead the selection depends upon three rules:

i. Which fields from the record are to be indexed (e.g. you probably want authors indexed but not the publisher or the number of pages).

ii. How the index terms are to be constructed from the data in these fields (called the indexing technique). For example, do you want the title ‘Good secretarial practice’ as a whole field under ‘G’, or do you want it split up into separate words so that ‘secretarial’ can be searched under ‘s’?

iii. Which words are stopwords, i.e. words that are not to be used on their own as index terms (e.g. ‘in’, ‘of’ and ‘the’).

CDS/ISIS allows much flexibility in specifying each of these three rules. It is important to consider them carefully, since they determine what searches will be possible on the database. For instance, if you index authors as separate words, then ‘Walpole, Horace’ will appear under ‘Horace’ and under ‘Walpole’: you cannot search for him as ‘Walpole, Horace’. If you index titles as whole fields, then ‘The Concise Oxford Dictionary of Quotations’ cannot be searched under ‘Dictionary’ or under ‘Quotations’. It is, in fact, possible in CDS/ISIS to index the same field in more than one way.

If you have divided the field into subfields, you can index different subfields by different techniques (or some subfields but not others).

Each line of the Field Selection Table comprises three elements: the Tag or Name, the Technique and the Format. You need to make an entry in the table for each field you want to index (i.e. to make searchable) and if the same field is indexed in two ways you need two entries for it.

Again, if you are unsure about writing FSTs, it would be a good idea to engage the services of the Dictionary Assistant. This will give you a dialog box like the one in Figure 4.3.

Figure 4.3 Dictionary Assistant dialog box

All you need do is to choose which technique to apply and which fields to index. The listbox on the right shows the techniques available. The two most commonly used are 0 – by line and 4 – by word.

0 means that the whole field contents will be indexed as a single term.

1 means index each subfield separately and so is relevant only if the field is divided into subfields.

2 means index only words or phrases which have been entered between angle brackets, e.g. <inflation rate>. This technique can be used to select particular terms from a lengthy piece of text such as an abstract. Some CDS/ISIS users like to enter descriptors this way and use technique 2 to index them.

3 is similar to 2 but indexes terms entered between slashes, e.g. /Windward Islands/.

4 signifies that each word in the field will be indexed separately (except stopwords – see Section 4.7). If the field is divided into subfields, you must specify mode mhl or mdl in the extraction format – see Section 5.2.

Other values are also available and are explained in the Reference manual. If you choose one of the values 5 to 8 you will have to edit the format manually to put in the required prefix. For help on choosing the right technique please see Section 4.8.
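To make the differences between techniques 0 to 4 concrete, here is a minimal Python sketch of the terms each one would pick out of a field. It is an approximation for illustration and not the CDS/ISIS implementation: the function name extract_terms, the stopword set and the assumption that subfields are marked as ^a, ^b and so on within the field are all hypothetical, and the real results also depend on the extraction format.

```python
# Illustrative only: approximates techniques 0-4 as described in the text.
import re

STOPWORDS = {"in", "of", "the"}  # example stopwords; a real list is set up separately


def extract_terms(text, technique):
    """Return the index terms a field value would yield under techniques 0-4."""
    if technique == 0:  # whole field contents as a single term
        return [text]
    if technique == 1:  # each subfield separately (assumes ^x subfield markers)
        return [part[1:] for part in text.split("^") if len(part) > 1]
    if technique == 2:  # only terms entered between angle brackets
        return re.findall(r"<([^<>]+)>", text)
    if technique == 3:  # only terms entered between slashes
        return re.findall(r"/([^/]+)/", text)
    if technique == 4:  # each word separately, except stopwords
        words = re.findall(r"\w+", text)
        return [w for w in words if w.lower() not in STOPWORDS]
    raise ValueError(f"technique {technique} is not covered by this sketch")


print(extract_terms("Walpole, Horace", 0))
# ['Walpole, Horace']
print(extract_terms("The <inflation rate> rose sharply", 2))
# ['inflation rate']
print(extract_terms("Exports from the /Windward Islands/", 3))
# ['Windward Islands']
print(extract_terms("The Concise Oxford Dictionary of Quotations", 4))
# ['Concise', 'Oxford', 'Dictionary', 'Quotations']
```

Running the same title through technique 0 and technique 4 shows why the choice matters: the first gives one long term filed under ‘The’, the second gives four separate searchable words.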

Now click the check boxes against the fields you want to be indexed (i.e. searchable) and finally click OK. The FST is then displayed and you can edit it if necessary. Using the Dictionary Assistant, all the fields selected are indexed by the same technique: if you want to apply different techniques to different fields, you will need to make changes here.

Each entry in the FST has three parts. In the top part of the dialog box the entry being edited is shown in three separate boxes. In the Entries box each entry is shown on one line with spaces between the three parts.

The first value, which was called the ID in the DOS version of CDS/ISIS, is normally the same as the tag of the field from which the terms come. (It does not have to be, but this usually makes searching easier.) It can be used to specify the type of term when searching, as we shall see in chapter 7. If you choose a number that corresponds to a field tag, Winisis will show the field name in the Tag/Name box when you are editing it. If you choose a number that does not correspond to a field tag, it will be shown as the number followed by “FST Tag”.

The second value, the indexing technique, specifies how the index terms are to be extracted as explained above.

The third value, the format, shows which field in the record the terms are to come from. As in the display format, fields are specified with v in front of their tags.

So, if the title field has a tag 200 and we want to index each individual word, the entries would be:

Tag/Name: 200 Title Technique: 4 Format: v200

and if the author field is 100 and we want to index the author name as a whole:

Tag/Name: 100 Author Technique: 0 Format: v100

If we want to index only subfield a of field 100 we could specify:

Tag/Name: 100 Author Technique: 0 Format: v100^a
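Since each entry is shown in the Entries box on one line with spaces between the three parts, a table can be read back into its parts quite simply. The Python sketch below is only an illustration: the names FstEntry and parse_fst_line are hypothetical, and it assumes that the first two parts are numbers and that everything after them is the format.

```python
# Illustrative only: splits one-line FST entries into their three parts.
from typing import NamedTuple


class FstEntry(NamedTuple):
    tag: int        # the ID, normally the tag of the field being indexed
    technique: int  # the indexing technique
    fmt: str        # the data extraction format


def parse_fst_line(line):
    """Split on the first two runs of spaces; the rest is the format."""
    tag, technique, fmt = line.split(None, 2)
    return FstEntry(int(tag), int(technique), fmt.strip())


# The example entries from the text, written one per line.
fst_text = """\
200 4 v200
100 0 v100^a
"""

entries = [parse_fst_line(line) for line in fst_text.splitlines() if line.strip()]
for entry in entries:
    print(entry.tag, entry.technique, entry.fmt)
# 200 4 v200
# 100 0 v100^a
```

Limiting the split to the first two gaps means that a format containing spaces of its own is kept in one piece.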

This dialog box works in a similar way to the one for the FDT. When you have entered the data for each field, the focus will be on the Add button. Either click on the button or press {Enter} to add the field to the table (displayed in the Entries box). If you need to correct the details for any entry, just click on that entry in the Entries box and the details will be copied into the boxes used for editing. If you need to remove an entry, highlight it and click the Delete button. An example of an FST is shown in Figure 4.4.

Figure 4.4 Example of Field Selection Table (FST)

For more information on writing the data extraction format, please see Chapter 5, especially Section 5.2 for dealing with subfield markers and Section 5.5 for dealing with repeated fields.

Again, do not be too concerned to get the Field Selection Table right first time. It is best to try it out on a few sample records and look at the index terms produced. If they are not what you want, edit the FST and then regenerate the inverted file.

When you have completed your entries in the Field Selection Table, click the Terminate button. You are then asked to confirm that you want the database to be created. Click Yes and your wish should be granted. You are then invited to select a database to work on: you can choose the one you have just created or a previous one.

Source: https://eprints.mdx.ac.uk/3077/5/whandbk.pdf


Thursday, August 13, 2020
