Metadata

Introductory articles

  • A metadata primer
    More information is available than ever before--on the Web, your company intranet, in your content management repository, and elsewhere. This is both exciting and problematic, and extremely frustrating when you can't find what you're looking for. What is missing is information about the information--that is, labeling, cataloging and descriptive information--that enables a computer to properly process and search the content elements. This information about information is known as metadata.

  • Overview of metadata
    Part of a series of readings for students of information science, this article provides an introduction to the idea of "metadata" and to how this term is used in library and information science, with respect to both library-cataloging practice and the provision of access to information on the web.

  • Understanding metadata (PDF)
    Understanding Metadata is a revision and expansion of Metadata Made Simpler: A guide for libraries published by NISO Press in 2001. It provides a thorough introduction to metadata and covers the function and structure of metadata, metadata schemes and elements, and creating metadata.

Discussion articles

  • 2003 Dublin Core conference summary
    Dublin what? You may be wondering, "What is Dublin Core? And why would you need a whole conference about it?" The end of September and beginning of October brought representatives from various countries around the world to a sunny and warm Seattle, Washington, host of the 2003 Dublin Core Conference where the theme was Supporting Communities of Discourse and Practice--Metadata Research and Applications.

  • Building a metadata-based website
    The online world has been flooded in recent years with talk of metadata, structured authoring, and cascading style sheets. The idea of a semantic web is gaining momentum. At the confluence of these two broad categories of activity, new models of websites are emerging.

  • Collaborative knowledge gardening
    Conventional wisdom holds that people will never assign metadata tags to content. Abandoning taxonomy is the first ingredient of success. These systems--Flickr and del.icio.us--just use bags of keywords that draw from and extend a flat namespace. Feedback is immediate. As soon as you assign a tag to an item, you see the cluster of items carrying the same tag. If that’s not what you expected, you’re given incentive to change the tag or add another.

  • Death of a meta tag
    My advice about the meta keywords tag has long been simple. For those running large Web sites, or who are short on time, don't worry about it. The stress and time involved in trying to craft a tag is not worth it, in terms of the minor benefit it might bring. It is far more important for site owners to instead concentrate on creating good title tags for their pages, a key page element that has consistently shown it can help with ranking across all major crawlers.

  • Demystifying metadata
    In the faddish dot-com world it's tempting to dismiss metadata as this nanosecond's buzzer button, but metadata is really an age-old answer to an age-old problem. The problem is, how to get the most out of a stored collection of information. Datastores are bigger than ever and so is the problem. A consensus is growing that metadata is the answer. Metadata is often described as "information about information" but I prefer to think of it as another layer of information - simplified, distilled, made orderly - created to help people use an information source.

  • Developing and creatively leveraging hierarchical metadata and taxonomy
    In content metadata and hierarchies, you will often find a goldmine of implicit and explicit data that you can leverage to creatively contextualise content. After a brief introduction on taxonomy and metadata, this article focuses on finding and utilising such relationships in hierarchies.

  • Faceted metadata search and browse
    There is no single way to provide navigation for everyone: users have such disparate needs. Traditional field-based or parametric search engines for structured data have used a command line or provided a form to fill out, but these require a lot of knowledge on the searcher's side. Full text search wipes out the value of the metadata. A good solution to these problems involves exposing the facets in dynamic taxonomies, so that the search user can see exactly the options they have available at any time.

  • Findability with tags: facets, cluster and pivot browsing
    "For a while I have been thinking of different ways of supporting finding information with tags that go beyond tag-clouds. There are three trends that are worth pointing out."
    (Rashmi Sinha)

  • Folksonomies and controlled vocabularies
    "The advantage of folksonomies isn't that they're better than controlled vocabularies, it's that they're better than nothing, because controlled vocabularies are not extensible to the majority of cases where tagging is needed. Building, maintaining, and enforcing a controlled vocabulary is, relative to folksonomies, enormously expensive, both in the development time, and in the cost to the user."
    (Clay Shirky)

  • Folksonomies - cooperative classification and communication through shared metadata
    "This paper examines user-generated metadata as implemented and applied in two web services designed to share and organize digital media to better understand grassroots classification. Metadata - data about data - allows systems to collocate related information, and helps users find relevant information. The creation of metadata has generally been approached in two ways: professional creation and author creation. In libraries and other organizations, creating metadata, primarily in the form of catalog records, has traditionally been the domain of dedicated professionals working with complex, detailed rule sets and vocabularies. The primary problem with this approach is scalability and its impracticality for the vast amounts of content being produced and used, especially on the World Wide Web. The apparatus and tools built around professional cataloging systems are generally too complicated for anyone without specialized training and knowledge. A second approach is for metadata to be created by authors. The movement towards creator described documents was heralded by SGML, the WWW, and the Dublin Core Metadata Initiative. There are problems with this approach as well - often due to inadequate or inaccurate description, or outright deception. This paper examines a third approach: user-created metadata, where users of the documents and media create metadata for their own individual use that is also shared throughout a community."
    (Adam Mathes)

  • Folksonomies: how about metadata ecologies?
    "Folksonomies are clearly compelling, supporting a serendipitous form of browsing that can be quite useful. But they don't support searching and other types of browsing nearly as well as tags from controlled vocabularies applied by professionals."
    (Louis Rosenfeld)

  • It's time to get serious about metadata
    When it comes to the web, there is nothing more misunderstood than metadata. Technical people search vainly for a way to automate its creation. Many editors and writers want nothing to do with it. And yet without quality metadata a website cannot properly achieve its objectives. It’s time to get serious about metadata.

  • Living with topic maps and RDF
    For someone looking into topic maps and RDF today the similarities between them are obvious, and it may appear absurd that users should be forced to choose between two technologies that to them must seem almost indistinguishable. However, as this paper tries to show, there are a number of reasons why this is so.

  • Managing content with automatic document classification
    News articles and web directories represent some of the most popular and commonly accessed content on the web. Information designers normally define categories that model these knowledge domains (i.e. news topics or web categories) and domain experts assign documents to these categories. The paper describes how machine learning and automatic document classification techniques can be used for managing large numbers of news articles, or Web page descriptions, lightening the load on domain experts. The paper uses two datasets, one with more than 800,000 Reuters news stories and another with over 41,000 websites, and classifies them into predefined categories using a Naïve Bayes algorithm. We discuss the different parameters and design decisions that normally appear when building automatic classifiers, including stemming, stop-words, thresholding, amount of data and approaches for improving performance using the structure in XML documents. The methodology developed would enable web based applications or workflow systems to manage information more efficiently, i.e. by assigning documents to topics automatically or assisting humans in the process of doing so.

  • Merging metadata and content-based retrieval
    The article describes a "hybrid" educational resource discovery system, which combines metadata and content-based retrieval methods. A pilot study was conducted to compare this hybrid system with an existing metadata-based system, with the aim of finding out if the hybrid system helps educators locate relevant resources with less effort. The results of the study suggest that the hybrid system decreased the variability in the number of user actions required to locate learning resources. The hybrid system interface featured embedded links, pointing to inner pages within a larger compound learning resource; study participants made use of these embedded links to locate individual learning objects.

  • Metacrap: putting the torch to seven straw men of metadata
    A world of exhaustive, reliable metadata would be a utopia. It's also a pipe-dream, founded on self-delusion, nerd hubris and hysterically inflated market opportunities. There are at least seven insurmountable obstacles between the world as we know it and meta-utopia. I'll enumerate them below.

  • Metadata and search
    A summary of a workshop on metadata and search from the 2003 Dublin Core Conference. Links to presentations used in the workshop are provided.

  • Metadata and taxonomies for a more flexible information architecture
    This presentation describes a methodology for developing customised taxonomies and metadata schema for a collection and its users. The methodology takes into account both the basic indexable aspects of content objects, and the ways that a particular group of users tends to search for them. The presentation then illustrates how an information architecture based on taxonomies and metadata can be used to make a number of basic web site and intranet functions more flexible and dynamic.

  • Metadata and XML: improving the findability of information (PDF)
    Presentation notes from Tekom European Information Development Conference 2004.
    (Peter J. Bogaards)

  • Metadata for the masses
    Many classification systems suffer from an inflexible top-down approach, forcing users to view the world in potentially unfamiliar ways. But what if we could somehow peek inside our users’ thought processes to figure out how they view the world? One way to do that is through ethnoclassification--how people classify and categorize the world around them.

  • Metadata is an essential web writing skill - part 1
    Metadata is one of the most misunderstood aspects of content management and website design. Editors and writers tend to look at it as a technical issue. Technical people look for a software solution. Both are wrong. Metadata is a fundamental skill that web writers and editors must acquire.

  • Metadata is an essential web writing skill - part 2
    Creating great metadata for your content begins with understanding who your reader is. What is the metadata they look for when they read a page of your content? What types of words do they use when they search for your content? When scanning your classification, what are the "trigger words" that will make them want to go deeper into your website?

  • Metadata on the web: on the integration of RDF and topic maps
    There are two competing models by which we can express metadata. RDF (Resource Description Framework) is a W3C recommendation and by design is meant to form the base of the W3C’s vision of the Semantic Web. Topic Maps is the ISO 13250 standard, and although developed independently of the W3C, it has several properties that make it an interesting alternative to RDF. The two languages are rather different even in their basic concepts, and the choice of one model can have far-reaching consequences both on the kind of statements that can be expressed on a resource, and, more importantly, on the long-term usefulness of these statements. We set about developing META, three integrated tools for the coherent management of metadata, both for RDF and Topic Maps. META is composed of a metadata editor, a metadata navigator, and a bidirectional converter from RDF to Topic Maps and vice versa.

  • Metadata: seven tips for writing better keywords
    The shift in how search engines treat keywords is significant. They tend to ignore the keyword meta tag and rather look for keywords in the actual page content. This means that you need to figure out your keywords before you write any content. Then, you include them throughout your content, particularly in headings and summaries.

  • Metadata? Thesauri? Taxonomies? Topic maps!
    Information architects have so far applied known and well-tried tools from library science to the problem of information retrieval, and now topic maps are emerging as another potential tool for information architects. This raises the question of how topic maps compare with the traditional solutions, and that is the question this paper attempts to address. The paper argues that topic maps go beyond the traditional solutions in the sense that they provide a framework within which those solutions can be represented as they are, but also extended in ways which significantly improve information retrieval.

  • Social consequences of social tagging
    "I think folksonomy has incredible value—the two web sites that I use most heavily right now are Flickr and del.icio.us. And I understand that this is something that can’t be stuffed back into the bottle. Nonetheless, I don’t think that means we have to accept it with an uncritical eye, or adopt every new implementation of tagging without consideration."
    (Liz Lawley)

  • The cognitive cost of classification
    The mental effort required to consistently assign keywords outweighs the benefits for most frontline contributors to content, document, and knowledge management systems.

  • The knowledge-model driven enterprise
    This article describes how externally focused metadata is an essential element of a truly robust enterprise data model. It shows how a metadata repository can serve as a fundamental resource for enterprise applications of all kinds. And it argues for a wider role for information architects in designing and developing the kind of metadata required to serve such a broad purpose.
    (Andy Schriever)

  • Web search: how the web has changed information retrieval
    Topical metadata have been used to indicate the subject of Web pages. They have been simultaneously hailed as building blocks of the semantic Web and derogated as spam. At this time major Web browsers avoid harvesting topical metadata. This paper suggests that the significance of the topical metadata controversy depends on the technological appropriateness of adding them to Web pages. This paper surveys Web technology with an eye on assessing the appropriateness of Web pages as hosts for topical metadata. The survey reveals Web pages to be both transient and volatile: poor hosts of topical metadata. The closed Web is considered to be a more supportive environment for the use of topical metadata. The closed Web is built on communities of trust where the structure and meaning of Web pages can be anticipated. The vast majority of Web pages, however, exist in the open Web, an environment that challenges the application of legacy information retrieval concepts and methods.

  • Writing for the web, part 2
    Writing for the Web requires careful planning. Your content needs to fit well within the context of your website. When a reader finds your content, they need to be able to scan it quickly. That's what metadata is about. In order for your website to be found, you need to write for how people search.
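
The entry "Managing content with automatic document classification" above mentions a Naïve Bayes classifier with stop-word removal and thresholding. As a rough illustration of that general technique only (not the paper's implementation), here is a minimal sketch. The class name, the stop-word list, the training sentences and the use of a log-score margin as the threshold are all assumptions made for this example, and stemming is left out.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stop-word list; a real system would use a much fuller one.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "on", "for", "as"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop-words."""
    words = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return [w for w in words if w and w not in STOP_WORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # training documents per category
        self.word_counts = defaultdict(Counter)  # word frequencies per category
        self.vocab = set()                       # all words seen in training

    def train(self, text, category):
        self.doc_counts[category] += 1
        for word in tokenize(text):
            self.word_counts[category][word] += 1
            self.vocab.add(word)

    def log_scores(self, text):
        """Return log P(category) + sum of log P(word | category) per category."""
        total_docs = sum(self.doc_counts.values())
        words = tokenize(text)
        scores = {}
        for category, n_docs in self.doc_counts.items():
            total_words = sum(self.word_counts[category].values())
            score = math.log(n_docs / total_docs)
            for word in words:
                count = self.word_counts[category][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            scores[category] = score
        return scores

    def classify(self, text, min_margin=0.0):
        """Return the best category, or None when the lead over the
        runner-up is below min_margin (left for a human to decide)."""
        ranked = sorted(self.log_scores(text).items(),
                        key=lambda kv: kv[1], reverse=True)
        if not ranked:
            return None
        if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < min_margin:
            return None
        return ranked[0][0]


# Invented training data, for illustration only.
clf = NaiveBayes()
clf.train("Stocks fell as the central bank raised interest rates", "business")
clf.train("The company reported record quarterly profits", "business")
clf.train("The striker scored twice in the cup final", "sport")
clf.train("The team won the league after a late goal", "sport")

headline = "Bank profits rise as rates climb"
label = clf.classify(headline, min_margin=1.0)
uncertain = clf.classify(headline, min_margin=3.0)
```

With this toy data the headline is assigned to "business" at the lower margin, while the stricter margin returns None, which is one way a system could route borderline documents to a domain expert instead of filing them automatically.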

Case studies

  • Metadata based search and browse functionality on the NSW Office of Fair Trading intranet: a case study
    The NSW Office of Fair Trading launched its first intranet in June 2003. At the very beginning of the intranet project we recognised that unless users could find information easily the intranet would not succeed. We also understood that different people prefer to find information in different ways. To maximise the chances of searchers finding relevant information, and to provide flexibility in search options, we developed and implemented metadata-driven search and browse functions. This case study describes the standards, tools and technology we used and how metadata was manipulated to retrieve information in a number of different ways.

Presentations

Tools

  • Taxomita
    Taxomita is a tool for creating faceted taxonomies using PHP and MySQL. Version 1 is in beta testing.

  • FacetMap
    FacetMap was constructed initially to demonstrate the concept of traversing multiple taxonomies simultaneously, and now also to offer a data model and programming interface so that web designers can incorporate the process into their own sites. A solution so simple that we can take any metadata you've got, and turn it into a browsing system.
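
Both tools above deal with faceted taxonomies, that is, browsing a collection along several independent classifications at once. The following is a minimal sketch of that idea, not the API or data model of Taxomita or FacetMap. The item records, the facet names and the function names are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical catalogue: each item carries a value for every facet.
ITEMS = [
    {"title": "Intranet style guide", "type": "guide", "topic": "writing", "year": 2003},
    {"title": "Taxonomy workshop notes", "type": "presentation", "topic": "taxonomy", "year": 2004},
    {"title": "Tagging case study", "type": "case study", "topic": "taxonomy", "year": 2004},
    {"title": "Search log analysis", "type": "guide", "topic": "search", "year": 2004},
]
FACETS = ("type", "topic", "year")


def select(items, filters):
    """Return the items that match every facet value chosen so far."""
    return [item for item in items
            if all(item[facet] == value for facet, value in filters.items())]


def facet_counts(items, filters):
    """For each facet not yet constrained, count the values that remain
    among the matching items, so only live options are offered."""
    matching = select(items, filters)
    return {facet: Counter(item[facet] for item in matching)
            for facet in FACETS if facet not in filters}


# The user has picked year 2004; show what can be narrowed next.
counts = facet_counts(ITEMS, {"year": 2004})
narrowed = select(ITEMS, {"year": 2004, "topic": "taxonomy"})
```

Because the counts are recomputed after every selection, the interface never offers a combination that leads to zero results, which is the main usability argument for exposing facets rather than a blank search form.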

Glossaries

  • Metadata glossary
    In an attempt to summarise the relationship among various metadata formats and how they relate to building Internet systems I wrote a glossary. I then ordered and tied the terms together with a bit of narrative to explain the relationships among the terms.

Interviews

  • Unraveling the mysteries of metadata and taxonomies
    Christina Wodtke of Boxes and Arrows interviews Samantha Bailey (former Argonaut and current lead IA for Wachovia Corporation’s Wachovia.com website) about Information Architecture, her dream process and the mysteries of metadata and taxonomies.

Resource collections

  • Metadata resources
    A collection of metadata-related resources from the International Federation of Library Associations and Institutions.