Legacy technology of library catalogs affects us today. What is the term “enter under” in AACR2 if not a throw-back to the book catalog where indeed titles were literally entered under the author’s name? In 2009, we are creating bibliographic metadata using a code of rules developed quite separately from any computerization of data, a set of rules that was implemented at a time when the size limits of the catalog card and the searching limitations of the card catalog were still the norm. Think back to 1978, those of you who are old enough to have been in libraries then.

As the linked data grassroots folks clarified at their ALA session (see the PowerPoint slides at http://wikis.ala.org/annual2009/index.php/Grassroots_Programs), we are in a different world in terms of data structures and possibilities. The change is not simply to our bibliographic records, as it was throughout the 20th century, but from bibliographic records to something else entirely. The MARC format, yes, was a communications format for bibliographic data but it was also a handy-dandy card set maker, the most obvious instances of that being the Festschrift byte and the second indicator of the 245. (Try explaining the utility of the Festschrift thingie without going into divided catalogs and card printing profiles!)

We have been developing our data content standards and our data structure definitions in isolation from each other. AACR2 and MARC were developed and maintained by two different groups of folks under different auspices, often at cross purposes (though not at odds with each other, just separated and sometimes unaware). Are we headed in that direction again?

Just as AACR2 wasn’t, RDA is not an open standard. It is an owned publication. As such, its maintenance and distribution will be limited by concerns related to publication ownership. Many will be priced out of having access to it while still trying to apply it in data they share with the rest of us. This is wholly antithetical to the development of data standards that are trying to “leave the silo.” It’s not that there is some evil corporate empire behind this control over our cataloging code. We did this to ourselves, starting with the first publication of ALA Rules in 1883.

The changes we must embrace are fundamental. We need to embrace openness in our standards development, joining the rest of the world in what makes the Web work. If we want to encourage use of a standard we must make it readily available to anyone creating bibliographic metadata.

We need to acknowledge that the stand-alone bibliographic record as we know it is not a valid structure in the linked data world. We need to understand that catalogs may work best if instead of policing access to the data we encourage more open wiki-style data development.

We need to figure out how we deal with the change in the economics of bibliographic metadata that must occur for libraries, vendors, system developers, and data providers. LC doesn’t want to be the world’s chief cataloger anymore. Nobody other than the OPAC folks who have to will pay ALA for access to RDA when other, more open metadata practices are available.

If we strive now to adopt RDA while shoe-horning the results into MARC-based bibliographic records and our current models of who does what, are we wasting our time on development of workarounds when we could be developing a wholly new environment? Are we going to spend hundreds and thousands of man-hours training catalogers and other library workers how to do this awkward adaptation, thereby delaying our ability to pay for the eventual greater change?

We have to solve the issues of data structure and infrastructure before we go worrying about how we train catalogers to adapt RDA to MARC and the rule of three, don’t we? I think back to the hours and hours of labor spent on adopting heading changes in card catalogs for AACR2 in the early 80s, changes that really didn’t affect access and would have been much easier a few years later in online catalogs.

So, what exactly does the current testing of RDA by the chosen 20 comprise?

(Unfortunately, I was not able to attend the grassroots meeting and am basing my reference to that group on their slides and other writings by the individuals in the group.)

In an October 11, 2007 AUTOCAT discussion, James Weinheimer said, “Part of fitting into the larger world of metadata will mean that we will have less control over many things, and our terminology will probably be one of the first casualties. Ultimately, I think it will turn out to be one of the easier things we will have to give up.”

Hal Cain responded, “I find it utterly perplexing that RDA is being prepared for cataloguers, its primary audience, yet with no attempt to produce any consensus about the terms in which it is to be expressed; indeed, with no reasonable attempt to explain how the new vocabulary agrees with or differs from the old. It seems to me that what we’re being led into is a different discipline from what we practice now.”

It has taken me the last year and a half to start to understand the new vocabularies and make “mental-crosswalks” to what I know about cataloging, metadata, and other aspects of information retrieval. My understanding today is far ahead of where it was last Fall when I tried to explain what I thought was happening to the joint OLAC/MOUG conference and I have the advantage of the kind of job that allows me lots of time for study and cogitation. Had I been in a library technical services operation now I would just be shaking my head and hoping someone would tell me about it later.

Why is this? Well, I must go back to the blind men and the elephant metaphor. The vocabulary-disjunct described above is like putting boxing gloves on the sightless examiners. “Here, figure this out, why don’t you. Oh, and you will have to interpret it according to what you have knowledge of through this barrier of foam padding.”

Sometimes the simplest ideas seem like gibberish because the terminology is so foreign. Okay, I’ll admit it, I wasn’t getting the idea of FRAD (Functional Requirements for Authority Data) because I’d come upon statements like, “Like FRBR, FRAD describes an entity-relational model, with the focus of FRAD on the entities related to ‘authority data’ rather than to the ‘bibliographic record’ itself,” and realize I couldn’t translate this into something I could understand sufficiently to explain it to students. I admitted to Glenn Patton at the OCLC/MOUG do that I was FRAD-clueless and he told me to think of the ability to fill in blanks with data from elsewhere. Bingo, I started to get it.

Sometimes we spend too much time trying to explain things in detail, losing people along the way as their eyes glaze over. We need clear explanations that use terminology we can understand. A very clear explanation of flat files and relational databases can be found at “What are relational databases?” 23 March 2001. HowStuffWorks.com. http://computer.howstuffworks.com/question599.htm

MARC records are flat file records. Our OPACs break up the content into tables to operate. So far, so good.

MARC records, created by catalogers using content standards (e.g., AACR2, LCSH), have been the primary source of information to populate the tables of our OPACs. Information from 245 $a has been moved to the table of titles. Information from a 650 has been placed in subject tables.
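For those who think better with something concrete in front of them, here is a minimal sketch of that flat-record-to-tables step. It is only an illustration: the list of (tag, subfield, value) triples is a hypothetical, stripped-down stand-in for a MARC record, and the table names, column names, and the load_record function are my own assumptions, not how any particular OPAC does it.

```python
import sqlite3

# Hypothetical, simplified stand-in for a MARC record: a flat list of
# (tag, subfield code, value) triples. Real MARC also has a leader,
# indicators, and many more fields.
marc_record = [
    ("245", "a", "Moby Dick"),
    ("650", "a", "Whaling"),
    ("650", "a", "Sea stories"),
]

def load_record(conn, record_id, fields):
    """Copy 245 $a into a titles table and 650 $a into a subjects table."""
    for tag, code, value in fields:
        if tag == "245" and code == "a":
            conn.execute("INSERT INTO titles VALUES (?, ?)", (record_id, value))
        elif tag == "650" and code == "a":
            conn.execute("INSERT INTO subjects VALUES (?, ?)", (record_id, value))

# An in-memory database with two assumed tables, one per kind of access point.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titles (record_id INTEGER, title TEXT)")
conn.execute("CREATE TABLE subjects (record_id INTEGER, heading TEXT)")
load_record(conn, 1, marc_record)

titles = conn.execute("SELECT title FROM titles").fetchall()
subjects = conn.execute("SELECT heading FROM subjects ORDER BY heading").fetchall()
```

The point of the sketch is that once the data sits in tables, nothing in the tables themselves cares that the values started life in a MARC record.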

The new way of looking at things says, why shouldn’t we be able to populate the tables with data from a variety of sources? If the title has already been entered in a title field in some other data labelling scheme, why could our OPACs not use that data in the absence of data from a MARC record?

Consider, for example, the representation of books on order in our OPACs. Right now, we create a MARC record to represent the ordered item — we basically do preliminary cataloging. What if our OPACs could use the ONIX data a publisher creates in lieu of a MARC record?
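A rough sketch of what that fallback might look like follows. Everything in it is an assumption made for illustration: the ISBNs are made up, the dictionaries are heavily simplified stand-ins for real MARC and ONIX data (only the ONIX element name TitleText is borrowed from the real standard), and display_title is a hypothetical function, not a feature of any existing system.

```python
# Hypothetical, heavily simplified records keyed by made-up ISBNs.
# Real MARC and ONIX records are far richer than this.
marc_records = {
    "9780000000001": {"245a": "Cataloging basics /"},
}
onix_records = {
    "9780000000001": {"TitleText": "Cataloging Basics"},
    "9780000000002": {"TitleText": "Forthcoming Title"},
}

def display_title(isbn):
    """Prefer the MARC title; fall back to publisher ONIX data when no MARC record exists."""
    marc = marc_records.get(isbn)
    if marc and marc.get("245a"):
        return marc["245a"], "MARC"
    onix = onix_records.get(isbn)
    if onix and onix.get("TitleText"):
        return onix["TitleText"], "ONIX"
    return None, None

# The on-order book has no MARC record yet, so the publisher's title is used.
on_order = display_title("9780000000002")
cataloged = display_title("9780000000001")
```

In this sketch the catalog keeps track of where each title came from, so a later MARC record can simply take precedence when it arrives.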

The cataloger in me rises up and says, “Yeah, but the publisher data might not use the same capitalization and punctuation information!” But now I have to ask, “So, what?” Does this matter? Will it affect retrieval or relevance assessment?

We are undergoing a revolution, my friends, but the revolution is not RDA or FRBR. The revolution is one that must reconsider the roles catalogs and cataloging have served. And we need to do this without fear of change but rather excitement for what this might make possible.

Like many of you, I rushed over to read Shawne Miksa’s new article in the ASIS Bulletin, “Resource Description and Access (RDA) and New Research Potentials.”

The article is exactly what it claims to be: a list of suggested research directions, wholly appropriate for the ASIS Bulletin. Don’t look for it to break new ground. Those of us who have been following the issues and discussions will appreciate the articulation of research directions, but this is not really the article that some may be hoping for.

My favorite sentence, however, is in reference to the ongoing tests of the RDA draft. “These tests should generate a considerable amount of data for analysis and study. At the very least, the testing may simply reveal that the rules don’t work and thus show us how not to develop cataloging guidelines, which is always a valuable lesson.”

Paradigm shift is just as overused a phrase as deck chairs on the Titanic but what we have is a cluster of paradigmatic shifts going on. We are shifting from the concept of bibliographic and authority records to mashable metadata. RDA looked in this direction but when work was undertaken five years ago, we weren’t where we are now. We are also coming to realize that we can no longer treat our standards as the writing of a copyrighted book with its ownership in the hands of a publisher. Standards need to be open and the ownership/copyright model is antithetical to this.

I just re-read Roy Tennant’s 2004 Library Hi Tech article on metadata. It makes a great deal more sense to me now than it did five years ago. Among the statements he makes that got my attention were,

“No single organization should own the essential pieces of a new bibliographic infrastructure”

and

“We do not need a bibliographic record format. We need a bibliographic metadata infrastructure that has a number of components, each of which may have multiple variations. Our systems must be able to accommodate a great diversity of record formats to provide us with the flexibility and power that only such diversity can provide.”

Yowser.

If you’re looking for it, he has his author’s version posted at http://roytennant.com/metadata.pdf

Citation: Roy Tennant. A Bibliographic Metadata Infrastructure for the 21st Century. Library Hi Tech 22, no. 2 (2004): 175-181.

First, apologies to those of you who might have been wondering whether I left the planet. Nope. Just getting ready for teaching this summer in the short terms — something I haven’t done in a few years.

Meanwhile, a great discussion has been going on at AUTOCAT and I’ve been participating over there. I will always be loyal to AUTOCAT as it’s kept me connected for so many, many years.

When our current catalogs were set in motion in 1876, there were few other bibliographic tools. Poole’s index appeared occasionally and then Wilson’s Reader’s Guide made its appearance at the beginning of the 20th century but, for most libraries, the catalog played a prime role in providing access to bibliographic information. Catalogs were published in book form and shared among libraries with larger libraries developing extensive collections of the catalogs of other libraries. The twentieth century saw the publication of many subject-specific bibliographies but it wasn’t until the middle part of the century that indexing of the periodical literature really took off.

The second half of the twentieth century saw the library catalog playing a proportionally smaller role in bibliographic searching. By bibliographic searching, I do not mean the search for bibliographic records as we do in technical services operations but rather the search for citations to documents that underlies our use of catalogs and bibliographic databases. The library catalog’s chief utility was in identifying individual monographs owned by individual libraries or groups of libraries. Each catalog did this for a limited number of libraries although the largest did it for thousands of libraries and came to be known as “WorldCat.”

Over the course of the 20th century, libraries found ways to do proportionately less and less creation of bibliographic records. By the end of the century, for most libraries, most records that enter the catalog are created by some other source, often the US Library of Congress. Because the catalog continued to limit its contents to items “held” by a library, a complex system developed for choosing and downloading individual cataloging records. Since the information explosion was under way and, as Ranganathan specified, “The library is a growing organism,” the cataloging operation continued as a major function in many libraries.

From the standpoint of the folks who do it (catalogers) and library administrators, the catalog differs greatly from other bibliographic databases. Its chief differences are two: 1) It limits its scope by library holdings, and 2) it’s done in-house. There are advantages and disadvantages associated with these two characteristics.

As the 20th century came to an end and the 21st century began, the role of the catalog as bibliographic retrieval device continued to diminish in proportion to other retrieval devices. A revolution approaches that requires a wholly new way of looking at bibliographic retrieval by those who are now involved with cataloging.

[A lousy place to break this off, but bear with me. It’s time to go congratulate our local graduates and welcome them to our profession!]

Christine Schwartz over at Cataloging Futures commented about the undervaluing of subject analysis and classification in the new OCLC report, Online Catalogs: What Users and Librarians Want. After discussing the desire of catalog users to have a way for catalogs to retrieve more relevant results, the report says, on page 15 (p. 23 of the .pdf):

“Improving the relevance of search results is an interesting data quality problem whose solution goes well beyond the boundaries of the types of metadata that catalogers have been responsible for supplying, obtaining, managing or mining.”

No, NO, NO!!! What the heck are subject headings & classification but clear indications of relevance to topics specified?!? Weight the subjects in making relevance-based retrieval! Ack! Instead, for myriad reasons, we waste the subjects and think we’d be just as well off without them. Grack!
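To show the kind of thing I mean by weighting the subjects, here is a toy sketch. The weights, the two sample records, and the score function are all hypothetical, and the matching is a crude substring test. It is not anyone’s actual ranking algorithm, just an illustration of letting a subject heading match count for more than a title match.

```python
# Hypothetical field weights: a hit in a subject heading counts
# three times as much as a hit in the title.
FIELD_WEIGHTS = {"subjects": 3.0, "title": 1.0}

def score(record, query):
    """Add a field's weight once for each query term found in that field (substring match)."""
    terms = query.lower().split()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        text = " ".join(record.get(field, [])).lower()
        total += weight * sum(1 for term in terms if term in text)
    return total

# Two made-up records: one merely mentions whales in its title,
# the other was assigned whale-related subject headings by a cataloger.
records = [
    {"id": "a", "title": ["Whales I have known"], "subjects": ["Authors--Biography"]},
    {"id": "b", "title": ["Leviathan"], "subjects": ["Whales", "Whaling"]},
]

# Highest score first.
ranked = sorted(records, key=lambda r: score(r, "whales"), reverse=True)
```

With these made-up weights, the record a cataloger judged to be about whales outranks the one that only has the word in its title.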

Oh, I wish I had time to go into this more coherently today but it’s the end of the semester so this week is kaput. But soon, I will be back with more on this. Tonight is my last online subject analysis/classification class of the semester wherein the talk is about the future. How timely.

Trying to understand what is needed for bibliographic access and what some folks are talking about is not a simple endeavor. And much of the time we seem to be talking at cross purposes. Often we aren’t, but we are speaking different languages. We are at once surrounded by MEGOs.

In his political dictionary, William Safire described the MEGO (My Eyes Glaze Over — me go) as something incredibly important but soporifically dull. Others describe it as a barrage of technical terminology that confuses someone who is too embarrassed to admit they don’t understand it. Our current situation is strewn with MEGOs.

For many non-catalogers, AACR2, authority control, the ins and outs of the MARC format, etc., are all MEGOs. But in recent years catalogers have been inundated by MEGOs: FRBR and all its terms, RDA and all its terms, the terminology of web-based metadata, the list goes on.

I do wish I had heard the NETSL presentation that went along with Rick Block’s slides for “RDA: Boondoggle or Boon? And What About MARC?”. His slides do an excellent job of articulating what many of us are feeling, which is a deep and profound “Huh?”

[Image: pulled in different directions]

Do you feel somewhat like this guy in the middle? I know I do and I’m not even running a cataloging operation. All I really have to do is figure out what the heck I should concentrate on in the upcoming cataloging class. I mean, is it fair to ask them to purchase AACR2? Will ALA Publishing have any in the warehouse? I don’t imagine they plan to print more. No, I don’t expect specific answers to those questions here.

Anyway, just a small wail before I try to understand the MEGOs one at a time.

This started as a comment in reply to Jeffrey Beall’s comment on my last post — but, like Topsy, it jus’ grew. So, here it is as a new post.

I think there are folks out there who think cataloging is wholly unnecessary. One might also think article indexing is unnecessary as we have more access through full text searchable databases. I don’t honestly know if they are right and neither does anyone else on either side of the argument.

We dyed-in-the-wool catalogers can rant until the end of time (or the end of cataloging) that “they” are out to destroy us but, though a cataloger in my heart, I am first a librarian. The purpose of librarianship is not cataloging but providing people with the information they need to be successful in their lives. Cutter’s objects were all about helping folks FIND information. Our first question should be, “How can we best do that?” I can say cataloging and someone else can say Google-books-style access and neither of us can say who is right or wrong because we don’t have EVIDENCE! I mean evidence that user needs are better served by one or the other.

What we are talking about is bibliographic access, not cataloging per se. What does bibliographic data need to look like to best serve its purposes? We have always decided this by guess and by golly with a soupçon of but-we’ve-always-done-it-this-way thrown in. RDA both fights against this and caves in to it, resulting in a mess.

I have to come back down to earth to get ready to teach the Dewey Decimal System tonight. We continue to educate future librarians in what we do now. How can it be otherwise? But I wish I could tell them we are approaching major changes in a rational way based on openness and evidence. I can’t.

We can limit our discussions to the technological aspects of wholesale changes to cataloging and even acknowledge the costs and difficulties of making changes, but I think we unwisely limit our conversations if we don’t start to address openly the issues of control and ownership when it comes to our bibliographic universe.

In a previous squib, I talked about ALA Publishing’s ownership (no quotes) of RDA and its impact but the issue is bigger than that. Look at Diane Hillmann’s April 9th squib talking about, among other things, the process behind RDA. She says, “Can we finally look at what worked and didn’t with the RDA development process, at what the tools available to us provide to meet our needs for broad participation and quality control, and design something that makes more sense? We cannot just keep maintaining the powdered wigs and the formal dancing in the face of the revolution happening outside our gates.”

Think, too, about the request (order?) from LC for Ed Summers to take down lcsh.info, and LC’s claim that they will do it themselves so the rest of us don’t really have to try to improve the US government info that is LCSH. Think about LC’s reliance on the regulation that allowed them to charge 10% above cost for cards as permission to exercise intellectual property ownership over their cataloging output. Karen Coyle helps to make this clear here.

Then there’s OCLC’s continuing attempts to express ownership of the bibliographic data created and shared by many public (and private) institutions.

We have a problem, folks. Technology is not the only thing holding us back. A failure to commit to treating our standards as commonly owned and developed tools has a role, too.