Controlled vocabularies

Introductory articles

Discussion articles

  • All about facets and controlled vocabularies
    Information architects are fascinated with faceted classification and its application to information architecture problems. However, facets remain difficult to understand and there are few options for learning about them. This is the first in a series of articles that aims to correct this situation. We intend to explain both facets and the more general concept of controlled vocabularies. We want to make the subject accessible to those who don’t have advanced degrees in library and information science. Furthermore, we want to show how these concepts can be applied to solve information architecture problems for the web and other digital information environments.

  • Creating a controlled vocabulary
    This article describes a process for building your own controlled vocabulary (CV). Creating a clear plan early on can save you a lot of trouble down the road and minimise unwelcome surprises. The broad strokes of CV design are like any other type of design: planning and preparation are essential, fundamental steps in producing a good design.

  • Mind your phraseology: using controlled vocabularies to improve findability
    An introductory article from Christina Wodtke showing how controlled vocabularies can help people find information on the web.

  • Preparing a controlled vocabulary for content management and access: an indexer's perspective
    "What is the difference between indexing a resource and developing a controlled vocabulary for classification? It’s really about the perspective of the person organizing the information."
    (Earley and Associates)

  • Synonym rings and authority files
    Synonym rings and authority files are simple tools that can bridge the gap between natural language and complex controlled vocabularies (taxonomies and thesauri) quite nicely.

  • Tomatoes are not the only fruit: a guide to controlled vocabularies
    "This is a brief introduction to the relationships between taxonomies, thesauri and ontologies, and similar 'things'. It doesn't contain definitive, scientific definitions; it is a personal interpretation of some fairly complex structures. It aims to give you a fairly clear idea of what these 'things' are, so librarians or IT people can't blind you with science."
    (Maewyn Cumming)

  • What is a controlled vocabulary?
    A controlled vocabulary is a way to insert an interpretive layer of semantics between the term entered by the user and the underlying database, so that the search better represents what the user intended. Consider what happens when you do not use a controlled vocabulary. An uncontrolled vocabulary simply matches the natural language of the documents against the natural language of the user. This is extremely specific, and it gives the user exactly what they ask for. Sounds great, right? Consider, however, a site about chemistry, where many of the documents use the chemical name of the element ("iron"), and many use the chemical symbol of the element ("Fe"). With an uncontrolled vocabulary, the results will include only documents that contain the exact term the user entered. A user who enters "Fe" in the search box will not get any of the documents that use the term "iron." There is a good chance the user is missing some documents they would like to have. Very few users will enter both terms, and many will review their results believing they are seeing every relevant document.
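
The "iron" / "Fe" example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the documents, the `SYNONYM_RING` table and both search functions are hypothetical and are not taken from the article.

```python
import re

# Hypothetical synonym ring: every variant maps to one preferred term.
SYNONYM_RING = {
    "fe": "iron",
    "iron": "iron",
}

# Hypothetical document collection for a chemistry site.
DOCUMENTS = {
    "doc1": "Iron rusts when exposed to moist air.",
    "doc2": "Fe has atomic number 26.",
    "doc3": "Copper conducts electricity well.",
}


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def uncontrolled_search(query, documents):
    """Return documents that contain the query term exactly as typed."""
    wanted = query.lower()
    return sorted(
        doc_id for doc_id, text in documents.items() if wanted in tokenize(text)
    )


def controlled_search(query, documents, ring):
    """Map the query and each document token to a preferred term first."""
    preferred = ring.get(query.lower(), query.lower())
    return sorted(
        doc_id
        for doc_id, text in documents.items()
        if any(ring.get(token, token) == preferred for token in tokenize(text))
    )


print(uncontrolled_search("Fe", DOCUMENTS))               # ['doc2']
print(controlled_search("Fe", DOCUMENTS, SYNONYM_RING))   # ['doc1', 'doc2']
```

The uncontrolled search misses the document that says "iron", while the controlled search returns both, because the query and the document text are translated to the same preferred term before matching.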

Research articles

  • Converting a controlled vocabulary into an ontology: the case of GEM
    The prevalence of digital information raised issues regarding the suitability of conventional library tools for organising information. The multi-dimensionality of digital resources requires a more versatile and flexible representation to accommodate intelligent information representation and retrieval. Ontologies are used as a solution to such issues in many application domains, mainly due to their ability to specify semantics and relations explicitly and to express them in a computer-understandable language. Conventional knowledge organisation tools such as classifications and thesauri resemble ontologies in that they define concepts and relationships in a systematic manner, but they are less expressive than ontologies when it comes to machine language. This paper used the controlled vocabulary at the Gateway to Educational Materials (GEM) as an example to address the issues in representing digital resources. The theoretical and methodological framework in this paper serves as the rationale and guideline for converting the GEM controlled vocabulary into an ontology. Compared with the original semantic model of the GEM controlled vocabulary, the ontology model adds value through deeper semantics in describing digital objects, both conceptually and relationally.

  • Semantic problems of thesaurus mapping
    With networked information access to heterogeneous data sources, the problem of terminology provision and interoperability of controlled vocabulary schemes such as thesauri becomes increasingly urgent. Solutions are needed to improve the performance of full-text retrieval systems and to guide the design of controlled terminology schemes for use in structured data, including metadata. Thesauri are created in different languages, with different scope and points of view and at different levels of abstraction and detail, to accommodate access to a specific group of collections. In any wider search accessing distributed collections, the user would like to start with familiar terminology and let the system find out the correspondences to other terminologies in order to retrieve equivalent results from all addressed collections. This paper investigates possible semantic differences that may hinder the unambiguous mapping and transition from one thesaurus to another.

  • Vocabulary mapping for terminology services
    The paper describes a project to add value to controlled vocabularies by making inter-vocabulary associations. A methodology for mapping terms from one vocabulary to another is presented in the form of a case study applying the approach to the Educational Resources Information Center (ERIC) Thesaurus and the Library of Congress Subject Headings (LCSH). Our approach to mapping involves encoding vocabularies according to Machine-Readable Cataloging (MARC) standards, machine matching of vocabulary terms, and categorizing candidate mappings by likelihood of valid mapping. Mapping data is then stored as machine links. Vocabularies with associations to other schemes will be a key component of Web-based terminology services. The paper briefly describes how the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is used to provide access to a vocabulary with mappings.
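
As a rough illustration of machine matching and of sorting candidate mappings by likelihood, here is a minimal Python sketch. It is not the method used in the paper (which works from MARC-encoded vocabularies); the `normalize` and `candidate_mappings` functions, the `"exact"` and `"likely"` labels, the 0.8 threshold and the sample terms are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def normalize(term):
    """Lowercase a term, treat hyphens as spaces, and collapse whitespace."""
    return " ".join(term.lower().replace("-", " ").split())


def candidate_mappings(source_terms, target_terms, threshold=0.8):
    """Pair terms from two vocabularies and label each candidate mapping.

    "exact"  : the normalized strings are identical.
    "likely" : the strings are similar (difflib ratio >= threshold).
    Pairs below the threshold are not reported.
    """
    candidates = []
    for source in source_terms:
        for target in target_terms:
            ns, nt = normalize(source), normalize(target)
            if ns == nt:
                candidates.append((source, target, "exact"))
            elif SequenceMatcher(None, ns, nt).ratio() >= threshold:
                candidates.append((source, target, "likely"))
    return candidates


# Hypothetical terms, not actual ERIC or LCSH headings.
vocabulary_a = ["Adult-Education", "Color photography"]
vocabulary_b = ["adult education", "Colour photography", "Geography"]

for mapping in candidate_mappings(vocabulary_a, vocabulary_b):
    print(mapping)
```

In a sketch like this, "exact" candidates could be accepted automatically while "likely" candidates are queued for human review; where to draw that line is a design choice, not something the abstract specifies.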

Glossaries

Bibliographies

  • Publications on thesaurus construction and use
    This is a list of printed and electronic publications about the principles of constructing and using information retrieval thesauri. It is not a list of existing thesauri, although some thesauri have been included when they are good examples or illustrate the results of different approaches to thesaurus construction.

Resource collections

  • ControlledVocabulary.com
    An introduction to controlled vocabularies and some examples of how they are used with a focus on applying a controlled vocabulary to an image database.