The Permuterm Subject Index
Abstract
The paper reviews the Science Citation Index and highlights in particular the role and significance of the Permuterm Subject Index as a tool for literature searching by title entries.
Each entry consists of four components: two terms (concepts), the author's name, and the identifying symbol (indicator of identity) of the document. The first term is the primary term; the second is the co-term. The primary term is used as a weighted keyword followed by its corresponding co-terms, while each co-term is followed by the other two components.
The index is processed as follows. Entries are generated automatically from the title by computer. Three machine vocabularies are used: (1) full stop words (irrelevant words); (2) semi-stop words (semi-irrelevant words); and (3) paired word stops (irrelevant word pairs). Using these vocabularies, the computer selects the relevant and semi-stop words from the title. The remaining relevant and semi-stop words in the title are assigned as co-terms to each of the primary relevant words, thus forming primary term - co-term pairs. The computer adds the author's name and the identification sign of the document to these pairs. Finally, the complete entries are arranged in the stipulated order.
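The procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the program actually used to produce the index: the three vocabularies below are tiny hypothetical samples, the names `Entry` and `permuterm_entries` are invented for this example, and the paired word stop list is assumed to suppress a given primary term - co-term combination in either order. The stipulated order is assumed to be alphabetical by primary term, then by co-term.

```python
from typing import NamedTuple

# Hypothetical sample vocabularies (assumptions for illustration only).
FULL_STOPS = {"THE", "OF", "A", "AN", "AND", "IN", "ON", "FOR", "BY"}
SEMI_STOPS = {"METHOD", "STUDY", "EFFECT"}
# Assumed behaviour: a listed pair is never emitted, in either order.
PAIRED_STOPS = {frozenset({"NEW", "METHOD"})}


class Entry(NamedTuple):
    primary: str   # primary term (heading keyword)
    co_term: str   # co-term listed under the primary term
    author: str    # author's name
    doc_id: str    # identifying symbol of the document


def permuterm_entries(title: str, author: str, doc_id: str) -> list[Entry]:
    """Generate primary term - co-term entries from one title."""
    # Keep distinct title words that are not full stop words.
    words: list[str] = []
    for raw in title.split():
        word = raw.strip(".,;:()").upper()
        if word and word not in FULL_STOPS and word not in words:
            words.append(word)

    # Only relevant words (not semi-stop words) may act as primary terms.
    primaries = [w for w in words if w not in SEMI_STOPS]

    entries = []
    for primary in primaries:
        for co_term in words:  # co-terms: relevant and semi-stop words
            if co_term == primary:
                continue
            if frozenset({primary, co_term}) in PAIRED_STOPS:
                continue
            entries.append(Entry(primary, co_term, author, doc_id))

    # Assumed ordering: alphabetical by primary term, then co-term.
    return sorted(entries)


if __name__ == "__main__":
    for e in permuterm_entries(
        "A New Method for Protein Crystallization", "SMITH J", "D1"
    ):
        print(f"{e.primary:<16} {e.co_term:<16} {e.author}  {e.doc_id}")
```

In this sketch a semi-stop word such as METHOD never becomes a heading of its own but still appears as a co-term under the relevant words, which mirrors the distinction between the first two vocabularies.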
A critical examination of the Permuterm Index shows that the information content of the titles is revealed by the maximum number of entries, so the index fulfils its intended purpose. Although entries taken from the title contain a relatively small amount of information, practically every relevant word in the title can be traced. There is, however, a serious drawback: the number of superfluous entries is considerable, because the computer generates all the primary term - co-term pairs from the title, irrespective of their relevance to the content. The redundancy of title entries could only be eliminated by human effort; owing to the amount of intellectual work required, however, this would be far too laborious and costly and is therefore not feasible in practice.
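The scale of this redundancy can be estimated with a rough count. As a simplifying assumption (the symbols r, s and E are introduced here for illustration and do not come from the paper), suppose a title contains r distinct relevant words and s distinct semi-stop words, and that no pair is suppressed by the paired word stop list. Each relevant word serves once as a primary term and takes every other relevant word and every semi-stop word as a co-term, so the number of entries generated for the title is

```latex
E = r\,(r - 1 + s)
```

For example, a title with r = 5 and s = 1 would yield E = 5 × 5 = 25 entries. The count grows roughly with the square of the number of relevant words, whether or not each pairing is meaningful.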
The paper raises the question of whether the index could be made more usable by changing its structure.
One possible modification would be for the entries to contain the full title of the document instead of only the identification sign. This solution, although it might seem useful in theory, is unrealistic because it would considerably increase the size of the index.
Another possible change would be to decrease or increase the number of title terms in each entry. In the first case, the entry would include only one subject term. This structure would obviously be inferior to the existing one, since references under individual entries (concepts) would accumulate to such an extent that searching would be practically impossible. Increasing the number of title terms, in turn, would only result in a considerable growth in the number of entries; consequently, both the size of the index and the number of superfluous entries would grow considerably, and searching would become very difficult.
If we consider the principal factors that determine the structure of the Permuterm Subject Index, it must be admitted that the present structure seems to provide the best possible solution.
Published
2019-02-26
How to Cite
Orosz, G. The Permuterm Subject Index, Scientific and Technical Information, 15(8-9), p. 646–663, 2019.
Section
Articles