Presentation of the thesaurus

Information

  1. Purpose of a thesaurus
  2. Subject areas
  3. Language versions and standards used
  4. Structure
    1. Subject areas and microthesauri
    2. Semantic relationships
      1. Scope notes
      2. Microthesaurus relationship
      3. Equivalence relationship
      4. Hierarchical relationship
      5. Associative relationship
  5. The thesaurus in figures

1. Purpose of a thesaurus

A thesaurus is a structured list of expressions intended to represent unambiguously the conceptual content of the documents in a documentary system and of the queries addressed to that system. This representation is produced through document indexing and query formulation, so that the concepts in both the documents and the queries are expressed using the thesaurus descriptors.

The need for these indexing and query formulation mechanisms becomes clear when one considers that the ordinary language used both by the authors of documents and by the users searching for them is often ambiguous:

  • The same concept can often be expressed by several synonyms (e.g. agriculture, agriculture sector, agriculture and fisheries sector), and a document indexed under one synonym would not be found by a query that uses a different one. One of the purposes of the thesaurus is to eliminate the problems of synonymy: one of the synonyms is chosen, more or less arbitrarily, as a descriptor; the other synonyms for the same concept, and terms denoting concepts very close to the one represented by the descriptor, are given the status of non-descriptors.

    Only descriptors can be used to represent conceptual content when indexing documents and formulating queries; the non-descriptors, which are also listed in the thesaurus, tell users which descriptor to use.

    If a correspondence is established between identical concepts expressed in different languages, the user of a multilingual thesaurus can also query the documentary system in his or her own language and retrieve documents irrespective of the language in which they were indexed.

  • A single term can have several meanings (for example, "press" can mean the newspaper industry, journalists, printing machinery or various machine tools), so a query using such a term would retrieve documents unrelated to the subject the user wants. Another purpose of the thesaurus is to eliminate the problems of homonymy: each descriptor is placed in a context that makes its meaning unambiguous.
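
The synonym handling described above can be sketched as a lookup that rewrites a non-descriptor to its descriptor at both indexing and query time, so that documents and queries meet on the same term. This is a minimal Python illustration, not Eurovoc's own data format: the `USE` mapping and the function names are hypothetical, and the terms are the examples given in the text.

```python
# Hypothetical USE mapping: non-descriptor -> descriptor that takes its place.
# Terms are the illustrative examples from the text, not an extract of Eurovoc.
USE = {
    "agriculture": "agriculture sector",
    "agriculture and fisheries sector": "agriculture sector",
}

def normalize(term: str) -> str:
    """Return the descriptor for a term (case-insensitive); descriptors map to themselves."""
    key = term.strip().lower()
    return USE.get(key, key)

def build_index(documents: dict[str, list[str]]) -> dict[str, set[str]]:
    """Build an inverted index from descriptor to document ids, normalizing every term."""
    inverted: dict[str, set[str]] = {}
    for doc_id, terms in documents.items():
        for term in terms:
            inverted.setdefault(normalize(term), set()).add(doc_id)
    return inverted

def search(inverted: dict[str, set[str]], query: str) -> set[str]:
    """Normalize the query term in the same way, then look it up."""
    return inverted.get(normalize(query), set())

docs = {
    "doc1": ["agriculture sector"],
    "doc2": ["agriculture and fisheries sector"],
}
inv = build_index(docs)
print(sorted(search(inv, "agriculture")))  # ['doc1', 'doc2']
```

Because indexing and querying pass through the same `normalize` step, a query phrased with any of the synonyms retrieves both documents.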

A thesaurus thus comprises:

  • Descriptors, i.e. words or expressions which denote in unambiguous fashion the constituent concepts of the subject area covered by the thesaurus (e.g. agriculture sector);

  • Non-descriptors, i.e. words or expressions which in natural language denote the same concept as a descriptor (e.g. agriculture and fisheries sector/agriculture sector) or equivalent concepts (e.g. agriculture/agriculture sector) or concepts regarded as equivalent in the language of the thesaurus (e.g. banana/tropical fruit);

  • Semantic relationships, i.e. relationships based on meaning (firstly between descriptors and non-descriptors, and secondly between descriptors themselves).

2. Subject areas

The Eurovoc Thesaurus covers all subject areas which are of importance for the activities of the European institutions:

  • Politics
  • International relations
  • European Communities
  • Law
  • Economics
  • Trade
  • Finance
  • Social questions
  • Education and communications
  • Science
  • Business and competition
  • Employment and working conditions
  • Transport
  • Environment
  • Agriculture, forestry and fisheries
  • Agri-foodstuffs sector
  • Production, technology and research
  • Energy
  • Industry
  • Geography
  • International organizations

Some subject areas are more highly developed than others because they are more closely linked to the Community's areas of interest. For example, Eurovoc contains the names of the regions of each Community Member State but not the names of the regions of non-Community countries.

It should also be stressed that, in thesauri generally and in Eurovoc in particular, the grouping of descriptors into subject areas is to some extent arbitrary. Some descriptors relate to two or more subject areas, but to keep the thesaurus manageable and limit its size, it is generally accepted that polyhierarchy (the systematic inclusion of each descriptor in every subject area to which it could belong) must be restricted. Descriptors that could fit into two or more subject areas are therefore generally assigned only to the subject area that seems most natural to users.

3. Language versions and standards used

The Eurovoc Thesaurus is published in the official languages of the European Community.

Eurovoc 4.2 exists in 21 official languages of the European Union (Bulgarian, Spanish, Czech, Danish, German, Estonian, Greek, English, French, Italian, Latvian, Lithuanian, Hungarian, Dutch, Polish, Portuguese, Romanian, Slovak, Slovene, Finnish and Swedish) and in two other languages, Croatian and, more recently, Basque. It has also been translated by the national parliaments of certain other countries (Albania, Russia and Ukraine).

All the Thesaurus languages have equal status: each descriptor in one language necessarily matches a descriptor in each of the other languages.

However, there is no equivalence between the non-descriptors in the various languages, as the richness of the vocabulary in each language varies from field to field.
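
One way to model this asymmetry, offered here only as a sketch under stated assumptions, is to key each descriptor by a language-independent identifier that carries exactly one label per language, while keeping non-descriptors as separate per-language lists. The identifier `D1`, the class and function names, and the French label are hypothetical illustrations, not Eurovoc data.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    """One descriptor: a shared id with exactly one label per language (hypothetical model)."""
    concept_id: str
    labels: dict[str, str]  # language code -> descriptor in that language
    # Non-descriptors are recorded per language and need not correspond across languages.
    non_descriptors: dict[str, list[str]] = field(default_factory=dict)

def build_lookup(concepts: list[Concept]) -> dict[tuple[str, str], Concept]:
    """Map (language, term) to its concept, for descriptors and non-descriptors alike."""
    lookup: dict[tuple[str, str], Concept] = {}
    for concept in concepts:
        for lang, label in concept.labels.items():
            lookup[(lang, label)] = concept
        for lang, terms in concept.non_descriptors.items():
            for term in terms:
                lookup[(lang, term)] = concept
    return lookup

def descriptor_for(lookup: dict[tuple[str, str], Concept],
                   lang: str, term: str, target_lang: str) -> str:
    """Resolve a term entered in one language to the descriptor in another language."""
    return lookup[(lang, term)].labels[target_lang]

agri = Concept(
    concept_id="D1",  # hypothetical identifier
    labels={"en": "agriculture sector", "fr": "secteur agricole"},
    non_descriptors={"en": ["agriculture", "agriculture and fisheries sector"]},
)
lookup = build_lookup([agri])
print(descriptor_for(lookup, "en", "agriculture", "fr"))  # secteur agricole
```

The English non-descriptor resolves to the French descriptor even though no French non-descriptor exists, which mirrors the rule that only descriptors are guaranteed to match across languages.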

The Eurovoc Thesaurus has been compiled in strict accordance with the standards of the International Organization for Standardization (ISO):

  • ISO 2788-1986 - Guidelines for the establishment and development of monolingual thesauri
  • ISO 5964-1985 - Guidelines for the establishment and development of multilingual thesauri

4. Structure

4.1. Subject areas and microthesauri

At a generic level, Eurovoc has a two-tier hierarchical classification:

  • Subject areas, identified by two-digit numbers and titles in words, e.g.:

    10 EUROPEAN COMMUNITIES

  • Microthesauri, identified by four-digit numbers (the first two digits being those of the subject area containing the microthesaurus) and by titles in words, e.g.:

    1011 COMMUNITY LAW

The numbering of subject areas and microthesauri is identical in all language versions.
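
Because the first two digits of a microthesaurus number are those of its subject area, the subject area can be derived from the code alone. A minimal sketch, assuming codes are handled as four-digit strings; the helper name and the two lookup tables are hypothetical and hold only the examples given above.

```python
# Hypothetical lookup tables holding only the examples from the text.
SUBJECT_AREAS = {"10": "EUROPEAN COMMUNITIES"}
MICROTHESAURI = {"1011": "COMMUNITY LAW"}

def subject_area_of(mt_code: str) -> str:
    """Return the two-digit subject area code for a four-digit microthesaurus code."""
    if len(mt_code) != 4 or not mt_code.isdigit():
        raise ValueError(f"expected a four-digit microthesaurus code, got {mt_code!r}")
    return mt_code[:2]

code = "1011"
area = subject_area_of(code)
print(area, SUBJECT_AREAS[area], "/", MICROTHESAURI[code])
# 10 EUROPEAN COMMUNITIES / COMMUNITY LAW
```

Since the numbering is identical in all language versions, such a derivation would not depend on the language of the titles.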

4.2. Semantic relationships

At the specific level of descriptors and non-descriptors, the structure of Eurovoc depends on semantic relationships:

  • Scope note
  • Microthesaurus relationship
  • Equivalence relationship
  • Hierarchical relationship
  • Associative relationship

4.2.1. Scope notes

Some descriptors are accompanied by notes, introduced by the abbreviation SN (Scope note). The notes have a dual purpose:

  • Definition, if this clarifies the meaning of the descriptor (definition note)
  • Guidance on how to use the descriptor when indexing documents and formulating queries (application note)

4.2.2. Microthesaurus relationship

Every descriptor is accompanied by a microthesaurus reference, introduced by the abbreviation MT (Microthesaurus), showing which microthesaurus or microthesauri it belongs to.

4.2.3. Equivalence relationship

The equivalence relationship between descriptors and non-descriptors is shown by the abbreviations:

  • UF (Used For), between the descriptor and the non-descriptor(s) it represents
  • USE, between a non-descriptor and the descriptor that takes its place
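
Since UF and USE express the same link from opposite directions, one list can be generated from the other. The sketch below assumes a hypothetical dictionary of UF entries (reusing the document's examples) and that each non-descriptor points to a single descriptor, as the USE definition above describes; it also rejects a term recorded as both a descriptor and a non-descriptor.

```python
# Hypothetical UF entries: descriptor -> non-descriptors it is used for.
UF = {
    "agriculture sector": ["agriculture", "agriculture and fisheries sector"],
    "tropical fruit": ["banana"],
}

def derive_use(uf: dict[str, list[str]]) -> dict[str, str]:
    """Invert UF into USE (non-descriptor -> descriptor), checking consistency."""
    use: dict[str, str] = {}
    for descriptor, non_descriptors in uf.items():
        for nd in non_descriptors:
            if nd in uf:
                raise ValueError(f"{nd!r} cannot be both a descriptor and a non-descriptor")
            if nd in use and use[nd] != descriptor:
                raise ValueError(f"{nd!r} would point to both {use[nd]!r} and {descriptor!r}")
            use[nd] = descriptor
    return use

USE = derive_use(UF)
print(USE["banana"])  # tropical fruit
```

Storing only one direction and deriving the other keeps UF and USE from drifting apart.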

The equivalence relationship in fact covers relationships of several types:

  • Genuine synonymy, or identical meanings
  • Near-synonymy, or similar meanings
  • Antonymy, or opposite meanings
  • Inclusion, when a descriptor embraces one or more specific concepts that are given the status of non-descriptors because they are not often used

4.2.4. Hierarchical relationship

The hierarchical relationship between descriptors is shown by the abbreviations:

  • BT (Broader Term), between a specific descriptor and a more generic descriptor, together with a number giving how many hierarchical steps separate the specific descriptor from each broader term.

    Notes:

    1. Descriptors with no broader terms are "top terms".
    2. Certain descriptors in subject areas 72 (GEOGRAPHY) and 76 (INTERNATIONAL ORGANISATIONS) are polyhierarchical, in other words they have more than one broader term at the next higher level.
  • NT (Narrower Term), between a generic descriptor and a more specific descriptor, together with a number giving how many hierarchical steps separate the generic descriptor from each narrower term.
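
The step numbers that accompany BT and NT can be computed from the direct, one-step broader-term links alone. The following sketch assumes a hypothetical in-memory mapping from each descriptor to its immediate broader terms (illustrative names, not Eurovoc entries) and walks up the hierarchy breadth-first; where polyhierarchy yields two paths to the same broader term, it keeps the shorter one, which is an assumption of this sketch.

```python
from collections import deque

# Hypothetical one-step broader-term links: descriptor -> immediate broader terms.
# Names are illustrative only. A polyhierarchical descriptor would list several parents.
BROADER: dict[str, list[str]] = {
    "tropical fruit": ["fruit"],
    "citrus fruit": ["fruit"],
    "fruit": ["plant product"],
    "plant product": [],  # no broader term, so this is a top term
}

def broader_terms(descriptor: str) -> dict[str, int]:
    """Map every broader term of `descriptor` to its number of hierarchical steps."""
    steps: dict[str, int] = {}
    queue = deque([(descriptor, 0)])
    while queue:
        term, depth = queue.popleft()
        for parent in BROADER.get(term, []):
            # Breadth-first order means the first visit is along a shortest path.
            if parent not in steps:
                steps[parent] = depth + 1
                queue.append((parent, depth + 1))
    return steps

def narrower_terms(descriptor: str) -> dict[str, int]:
    """NT is the inverse of BT: collect every descriptor below `descriptor`, with steps."""
    result: dict[str, int] = {}
    for candidate in BROADER:
        ancestors = broader_terms(candidate)
        if descriptor in ancestors:
            result[candidate] = ancestors[descriptor]
    return result

top_terms = [d for d, parents in BROADER.items() if not parents]

print(broader_terms("tropical fruit"))  # {'fruit': 1, 'plant product': 2}
print(narrower_terms("fruit"))          # {'tropical fruit': 1, 'citrus fruit': 1}
print(top_terms)                        # ['plant product']
```

Descriptors with an empty list of broader terms come out as top terms, matching note 1 above, and every BT link found this way has a matching NT link, which is why hierarchical relationships are counted in reciprocal pairs.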

4.2.5. Associative relationship

The associative relationship between descriptors is shown by the abbreviation RT (Related Term) between two associated descriptors.

The associative relationship can be of various kinds, including:

  • Cause and effect
  • Agency or instrument
  • Hierarchy (particularly where, as explained in section 2, polyhierarchy has not been allowed; to help the user, the missing hierarchical relationships are then replaced by associative relationships)
  • Concomitance
  • Sequence in time or space
  • Constituent elements
  • Characteristic feature
  • Object of an action, process or discipline
  • Location
  • Similarity (in cases where two near-synonyms have been included as descriptors)
  • Antonymy

Attention should also be drawn to the essential features of the associative relationship:

  • it is symmetrical;
  • it is incompatible with the hierarchical relationship: if two descriptors are linked by a hierarchical relationship, there cannot be an associative relationship between them, and vice versa;
  • descriptors under the same top term cannot be linked by an associative relationship.
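
The three features listed above can be checked mechanically. This is a minimal sketch, assuming hypothetical in-memory BT and RT data with illustrative descriptor names; it treats a hierarchical link at any number of steps as hierarchical, which is one reading of the rule, and it is not an official Eurovoc validation procedure.

```python
# Hypothetical data. BROADER: descriptor -> immediate broader terms.
BROADER: dict[str, list[str]] = {
    "fruit": [],
    "tropical fruit": ["fruit"],
    "citrus fruit": ["fruit"],
    "plant health": [],
}
# RELATED: descriptor -> descriptors it is linked to by RT.
RELATED: dict[str, set[str]] = {
    "tropical fruit": {"plant health", "citrus fruit"},
    "plant health": {"tropical fruit"},
    "citrus fruit": set(),  # reciprocal link deliberately missing
}

def ancestors(term: str) -> set[str]:
    """All broader terms of a descriptor, at any number of steps."""
    found: set[str] = set()
    stack = list(BROADER.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(BROADER.get(parent, []))
    return found

def top_terms(term: str) -> set[str]:
    """Top terms above a descriptor (the descriptor itself if it has no broader term)."""
    return {t for t in ancestors(term) | {term} if not BROADER.get(t)}

def rt_violations() -> list[str]:
    """Report RT links that break symmetry, hierarchy or top-term rules."""
    problems: list[str] = []
    for a, related in RELATED.items():
        for b in related:
            if a not in RELATED.get(b, set()):
                problems.append(f"{a} RT {b}: not symmetric")
            if b in ancestors(a) or a in ancestors(b):
                problems.append(f"{a} RT {b}: also linked hierarchically")
            if top_terms(a) & top_terms(b):
                problems.append(f"{a} RT {b}: same top term")
    return sorted(problems)  # sets are unordered; sort for stable output

for problem in rt_violations():
    print(problem)
```

Here the link between the two fruit descriptors is flagged twice, while the link to the descriptor under a different top term passes all three checks.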

5. The thesaurus in figures

All language versions of the Eurovoc Thesaurus comprise:

  • 21 subject areas
  • 127 microthesauri
  • 6645 descriptors (of which 519 are top terms)
  • 6669 reciprocal hierarchical relationships (6669 BT and 6669 NT)
  • 3636 reciprocal associative relationships

The subject areas, microthesauri, descriptors, hierarchical relationships and associative relationships are strictly equivalent in all languages.

The numbers of non-descriptors and scope notes, on the other hand, vary from language to language:

Language   Scope notes   Non-descriptors
EUS        883           7 596
ES         891           7 756
CS         834           13 139
DA         827           6 602
DE         728           8 295
EL         825           6 866
EN         759           6 769
FR         840           6 691
IT         754           9 453
LV         680           6 009
HU         889           8 618
NL         849           6 828
PL 17 172
PT         761           6 308
SL 27 150
FI         859           5 445
SV         818           6 491
