by
Brenda Parris Sibley
School of Library and Information Studies
University of Alabama at Tuscaloosa
July 7, 1998
Introduction
Over the last twenty to thirty years, information technology has been changing the way we work in libraries, including the tools we work with and the kinds of materials through which we provide information. The changes have been very rapid in the last few years, and the Internet is fast becoming the information resource of choice. Some have speculated about whether there would be a need for libraries in the future. Some have said there would certainly be no need for catalogers. But those people have failed to notice there are librarians who are whole-heartedly embracing the new technologies, and that there are catalogers who are bringing their skills into this information age, organizing--yes, cataloging--the Internet!
This paper will explore why the Internet should be cataloged, who is cataloging it, what problems are involved, and what is being done in planning for the future.
Is it really practical to catalog the Internet? Are search engines and web pages of links not enough?
In a 1995 study, Arlene G. Taylor and Patrice Clemson compared search engines and catalogs.1 They found the weaknesses of search engines to be:
This rapidly expanding and changing resource we call the Internet needs organizing, and who can do it better than we who organize information every day? We may try lists of URLs on our web sites, but those lists grow long and hard to keep organized, and the helpful descriptions found in bibliographic records are missing from these lists. Perhaps we still need the MARC record after all?
In the manual used by OCLC for its Internet Cataloging Project, Cataloging Internet Resources: A Manual and Practical Guide, Nancy B. Olson gives the basic premises of the Project:2
Even among librarians, there has been no agreement on the need to catalog the Internet. Jim Holmes, of the University of Texas at Austin, relates that at one meeting access to e-journals through the library's web page was praised, while the access being provided through the library's OPAC was ignored. He believes this perception is changing, though: 3
...there is a perception, which I believe is changing,
that e-journals should not be cataloged. There is a belief that
Web search engines provide better access to information on the
Web and that alphabetical lists of e-journals are an appropriate
approach to these materials. I believe however that most
librarians realize that present Web search technology is not
up to the task of information retrieval for scholarly purposes
and that the catalog currently is the single best mechanism for
organizing and collecting information for all users.
With time, that perception should change further. Search engines simply don't do the job of an OPAC. Neither do subject-organized lists on web pages, helpful as they are. As Hawkins and Shadle have said, "adding Web based serials allows Web OPAC to function as an internet gateway offering users the full range of access points, subject analysis and search functionality available for other types of materials."4 Hsieh-Yee has said that librarians, and catalogers in particular, have the expertise needed for cataloging the Net: 5
Libraries are better suited than search engines or Internet
search services in selecting resources because they have had experience
in acquiring materials of various formats for their local users.
Librarians' expertise in resource selection and their relatively
well-defined local constituencies will ensure their success in
evaluating and selecting Internet resources. Catalogers, in
particular, should be involved in organizing Internet resources
because they have applied these principles to the cataloging of
materials in various formats and should be able to apply these
principles to the cataloging of Internet resources with equal
efficiency.
The Internet is being cataloged in local OPACs across the country and around the world. OCLC's Internet Cataloging Project began in 1991 with 30 catalogers volunteering to catalog a sample of Internet resources. The key findings from this first project were: 6
In response to these findings, Nancy Olson's Cataloging Internet Resources: A Manual and Practical Guide was published, and the USMARC 856 field was devised and proposed. The InterCat Catalog (purl.org/net/intercat) was developed with records taken from WorldCat (the OCLC Online Union Catalog). As Jul relates: 7
the InterCat Catalog demonstrates the union of catalog searching
and its now common host of functions--keyword and phrase searching,
selected index searching, and Boolean operations with direct Web access
to the Internet resources. The basic level of search access that users
have come to expect for a common book had been extended to a collection
of Internet resources. Soon library system vendors began modifying
their products to take advantage of the linking capability of the 856
field and the Web OPAC was born.
A second OCLC Internet cataloging project began in 1994, with 231 participants from all types of libraries and 4,700 Internet resources. A survey in 1996 indicated that participants felt the project to be very successful and that most planned to increase the rate at which they cataloged Internet resources.8 By 1998, when Jul's article was written, 5,000 OCLC member libraries had cataloged more than 18,000 Internet resources--with more than 330,000 holdings recorded by individual libraries. 9
In spite of the success of the InterCat Project, there are difficulties with cataloging web sites because they are different from anything we have cataloged before.
Catalogers are good at describing items they can touch and hold in their hands. They can see the object as a whole and see how its parts relate to one another. This isn't so easy in an online environment, and the constantly changing and growing nature of the Internet compounds the description problem. Hawkins has listed the differences and difficulties in cataloging electronic serials: 10
Banerjee has explained the cataloger's problem with describing a web site by saying that the user determines the organization of the site by how he or she uses it: 11
Print documents can be examined as an integrated whole
because the physical formatting of the work determines the
relationship between the document components (i.e., pages,
chapters, etc.). In the hypertextual environment which allows
users to read documents in the sequence best suited to their
needs, it is frequently difficult to identify what is being
cataloged. Relationships between files depend on how the
user reads the document and the formatting of digital documents
depends largely on user specified preferences.
We cannot hold a web site in our hands to see an ordered relationship between its pages, count its pages, or measure it in centimeters as we can a book. And unlike a print publication, an online publication may change every day. Banerjee comments further on the instability of web sites: 12
The identifying characteristics tend to be volatile because
digital works are easy to modify. The title, content, location,
author, or other information associated with an electronic document
may change frequently. If catalogers attempt to add notes and
access points each time an item changes, the likely result will
be an explosion in the number of records in the database and/or
records which are excessive in length.
Not only do web sites change often, but they also move, and sometimes disappear. Jul addresses the problems involved in cataloging "moving objects":13
Libraries add value to Internet resources by selecting,
cataloging, and integrating the results into local OPACS. But this
still leaves open the question of transience and impermanence. What
Internet user has not encountered "Error 404, File not found?"
This all-too frequent error message can signal, among other things,
the fact that a resource, once identified by a URL, has moved or
ceased to exist.
The 856 field in our OPACs is a wonderful new way to link the local catalog to the Internet, but ever-changing URLs are difficult to manage. Jul tells how this has affected OCLC's InterCat Project:14
URLs represent address-specific locations and encoding them
in bibliographic records that are meant to be distributed across
systems is problematic. Some would call it a cataloger maintenance
nightmare, and such a characteristic is not far off. Data collected
during the OCLC InterCat project revealed that, on the average, 3% of
URLs in the InterCat Catalog could not be accessed during any given
test. This relatively low but still troublesome percentage owes,
no doubt, to the types of resources that libraries select and catalog
and is almost certainly several times lower than the failure rate for
the Internet as a whole. An unknown subset of these links failed
because the URL had changed (the resource had moved.) (Other sources
of failure include the remote system or the network, and determining
the exact cause of a URL's failure can be difficult. Not all failures
can be attributed to the URL.) It soon became apparent that encoding
URLs in bibliographic records brought both benefits and liability.
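To make Jul's 3% figure concrete, here is a minimal Python sketch of the kind of link check a library might run over the URLs in its 856 fields. It is not the procedure OCLC used. The function names `http_check` and `failure_rate` are hypothetical, and counting any error or any HTTP status of 400 or above as a failure is an assumption. As the quotation warns, a failed check does not prove that the resource has moved.

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable, Iterable


def http_check(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers without an error status (assumed rule)."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = getattr(response, "status", None)
            return status is None or status < 400
    except (urllib.error.URLError, http.client.HTTPException,
            ValueError, OSError):
        # HTTP errors, network or server trouble, and malformed URLs all
        # count as failures here; the cause may not be a moved resource.
        return False


def failure_rate(urls: Iterable[str],
                 check: Callable[[str], bool] = http_check) -> float:
    """Fraction of URLs (0.0 to 1.0) that the check reports as failing."""
    url_list = list(urls)
    if not url_list:
        return 0.0
    failures = sum(1 for url in url_list if not check(url))
    return failures / len(url_list)


# Offline demonstration with a stub check: one of four URLs has "moved".
moved = {"http://www.example.org/old-page.html"}
sample = ["http://www.example.org/a.html",
          "http://www.example.org/old-page.html",
          "http://www.example.org/b.html",
          "http://www.example.org/c.html"]
print(failure_rate(sample, check=lambda url: url not in moved))  # 0.25
```

Passing a stub `check` function, as in the last lines, lets the counting logic be tried without network access. Some servers refuse HEAD requests, so a working checker might need to fall back to GET before it reports a failure.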
Because encoding URLs in bibliographic records proved such a liability, OCLC's Office of Research developed PURLs, or Persistent Uniform Resource Locators. With this aliasing system, a PURL is created for a web site and associated with its URL. The PURL does not change, though the URL it points to can be changed as needed. Registered users create the PURL to establish the PURL/URL relationship. Jul explains the benefits of the PURL system:15
The advantage of this system is obvious: a PURL can occur any
number of times (e.g., in multiple copies of bibliographic records,
on Web pages, in an electronic paper, in citations or bibliographies,
in a bookmark list), yet if the URL for the referenced resource were
to change, that change would have to be made only one time using the
PURL server. Suddenly catalog maintenance (and link maintenance
generally) is reduced to a single update.
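The single-update idea can be sketched as a lookup table kept on the server. The Python below is a toy under stated assumptions. `PurlResolver` and its methods are hypothetical names, not OCLC's PURL software, and the old and new target URLs are invented. A real PURL server answers a request for the PURL with an HTTP redirect to the current URL. Here `resolve` simply returns that URL.

```python
from typing import Dict


class PurlResolver:
    """Hypothetical stand-in for a PURL server's PURL -> URL table."""

    def __init__(self) -> None:
        self._targets: Dict[str, str] = {}

    def register(self, purl: str, url: str) -> None:
        """A registered user establishes the PURL/URL relationship once."""
        if purl in self._targets:
            raise KeyError(f"PURL already registered: {purl}")
        self._targets[purl] = url

    def update(self, purl: str, new_url: str) -> None:
        """The resource moved: change its target in this one place."""
        if purl not in self._targets:
            raise KeyError(f"unknown PURL: {purl}")
        self._targets[purl] = new_url

    def resolve(self, purl: str) -> str:
        """Return the URL a real server would redirect this PURL to."""
        return self._targets[purl]


resolver = PurlResolver()
purl = "http://purl.org/net/intercat"
resolver.register(purl, "http://old-host.example/intercat/")  # invented URL

# Three copies of a record, in three catalogs, all store the PURL.
records = [{"856_u": purl} for _ in range(3)]

resolver.update(purl, "http://new-host.example/intercat/")  # one update
print({resolver.resolve(record["856_u"]) for record in records})
# {'http://new-host.example/intercat/'}
```

However many records carry the PURL, the move is recorded once, in `update`, and every copy resolves to the new address.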
PURLs simplify the problem of changing URLs, but catalogers' difficulties in describing web sites remain. As Banerjee has said, "Identifying where and how a work may be accessed by PURL and other constructs is not particularly difficult for the simple reason that mode of access of a resource will never be as amorphous as the description or 'aboutness.'"16 What is being done about this?
Some say the problem is that AACR2 was not written for electronic serials. Others say MARC won't work for web site cataloging (despite the fact that it is working in the OCLC project). Still others say both work well, but their complexity makes cataloging time-consuming. Jul has said:17
Bibliographic records are rich and highly refined examples of metadata that can include descriptive, subject, classification, authority, and access data, all of which are created according to established rules and formats. Not surprisingly such robust content is achieved only through consistent complexity, and it is this very complexity that requires a cataloger's training, skill, and experience. It is this same complexity, which is a natural response to an inherently complex world, that limits the number of bibliographic records that could be created at any time to the potential output of the cataloging workforce at large.
In another article Jul said, "Librarians may wish that the USMARC format were more flexible, able to express hierarchical relationships more easily, for example, or to be able to inherit information from related records. But as a standard communications format, USMARC is serviceable, if not perfect, and remains the most widely accepted format for machine-to-machine communication and exchange of bibliographic information." 18
Though describing a web site with current standards is complex and time-consuming, it can be made a bit easier if the cataloger can find the needed information at the web site being cataloged. To standardize this kind of information, OCLC in 1995 initiated what has become known as the Dublin Core Metadata Workshop Series (named for Dublin, Ohio). The Dublin Core Element Description includes the following elements: 19
Metatags are constructed using the above elements and embedded in the headers of web documents. The cataloger can use the information they contain for cataloging, and search engines can pick up this information to describe the site more accurately. Cataloging software could be developed to translate Dublin Core into MARC. One attempt at this has produced MARCit,20 by Nichols Advanced Technologies, which takes the 245 field from the title metatag and automatically puts the URL into the 856 field. Further development of software like this could yield a program that saves time and expense in cataloging, provided the metatags are correct and complete.
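A MARCit-style translation step might look roughly like the following Python sketch. It is not MARCit's code. The sketch assumes the page carries a Dublin Core title in a meta tag named `DC.Title` (compared case-insensitively), and it produces display strings rather than a true MARC record. The names `DublinCoreMetaParser` and `sketch_marc_fields` are hypothetical. It attempts only the two mappings described above: title to 245 and URL to 856.

```python
from html.parser import HTMLParser
from typing import Dict, List


class DublinCoreMetaParser(HTMLParser):
    """Collect <meta name="DC.xxx" content="..."> pairs from an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = (attributes.get("content") or "").strip()
        if name.startswith("dc.") and content:
            # Keep the first value for each element, e.g. "dc.title" -> "title".
            self.elements.setdefault(name[3:], content)


def sketch_marc_fields(html: str, url: str) -> List[str]:
    """Map a DC title to 245 and the page URL to 856 (display strings only)."""
    parser = DublinCoreMetaParser()
    parser.feed(html)
    parser.close()
    fields: List[str] = []
    title = parser.elements.get("title")
    if title:
        # 245 00: no 1XX main entry; nonfiling characters not computed here.
        fields.append(f"245 00 $a {title}")
    # 856 40: access method HTTP (4), link to the resource itself (0).
    fields.append(f"856 40 $u {url}")
    return fields


page = """<html><head>
<title>Home</title>
<meta name="DC.Title" content="Cataloging Internet Resources">
</head><body></body></html>"""

for field in sketch_marc_fields(page, "http://www.example.org/cir/"):
    print(field)
# 245 00 $a Cataloging Internet Resources
# 856 40 $u http://www.example.org/cir/
```

A cataloger would still need to review and complete these skeletal fields. The output can be no better than the metatags it starts from: a page without a Dublin Core title yields only the 856 field.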
Another possible aid could come through encoding documents with SGML--Standard Generalized Markup Language. Pitti explains: 21
SGML provides a syntax and a meta-language for defining and expressing the logical structure of documents. "SGML, I believe, as a general standard that allows us to structure text and to interrelate many different kinds of information, offers us an opportunity to make the Internet a coherent, standards-based information whole, an orderly information universe."
Whereas HTML is a formatting language, SGML records the logical structure of a web document, and thus could be a great help to a cataloger in describing it. SGML also underlies the Text Encoding Initiative (http://www.uic.edu/org/tei/), "an international cooperative project to develop guidelines for the preparation and exchange of electronic texts for scholarly research."22
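A short Python sketch can illustrate the difference between formatting and logical markup. Python's standard library has no SGML parser, so the example uses XML, a simplified subset of SGML. The element names follow the TEI header (`teiHeader`, `fileDesc`, `titleStmt`, `title`, `author`). The sample title and author and the `descriptive_data` helper are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Formatting markup, for comparison: nothing says which bold text is a title.
HTML_FRAGMENT = "<p><b>Cataloging the Net</b><br>by <i>A. Librarian</i></p>"

# Logical markup: each element names what its text is.
TEI_HEADER = """
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>Cataloging the Net</title>
      <author>A. Librarian</author>
    </titleStmt>
  </fileDesc>
</teiHeader>
"""


def descriptive_data(header_xml: str) -> Dict[str, Optional[str]]:
    """Pull title and author out by element path, not by appearance."""
    root = ET.fromstring(header_xml.strip())
    return {
        "title": root.findtext("fileDesc/titleStmt/title"),
        "author": root.findtext("fileDesc/titleStmt/author"),
    }


print(descriptive_data(TEI_HEADER))
# {'title': 'Cataloging the Net', 'author': 'A. Librarian'}
```

From the HTML fragment, software can tell only that some text is bold and some is italic. In the TEI header, the element names themselves identify the title and author a cataloger needs.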
Assigning classification numbers to Internet resources is another topic of debate. Some think this would be confusing in the local library, since the Internet resources are not on that library's shelves. However, in an April 1998 survey by Richards of two listservs, AUTOCAT and INTERCAT, most respondents favored adding call numbers for the following reasons: 23
Call numbers are the basis for several projects, among them OCLC's Scorpion project, which uses Dewey classification,24 and CyberStacks(sm),25 a virtual library organized by Library of Congress Classification. While classification numbers are not required in MARC records, they offer a possible organizational scheme for the libraries of the future, and thus warrant inclusion in our catalogs now as well as in Internet virtual libraries.
The woes we heard a few years ago, often in response to news of libraries outsourcing cataloging, are not valid. There is plenty of work for librarians, and even catalogers, in the libraries of the future. The important thing to remember is that we need to be actively planning for the future now and participating in the exciting new things that are happening with the Internet and with our local OPACs, on and off the Web. We should heed the advice of Pitti:26
I believe that librarians, and in particular, catalogers, have
a professional obligation to assertively assert themselves in the
creation of this information universe. If librarians sit back and
wait to be asked, the disparate and all too shortsighted forces
developing the Internet will not think to ask them to participate
in the planning until it is too late.
The Internet has not brought the end of libraries, but rather has brought us new and exciting resources to share with our library users. The skills we bring with us as we embrace new technologies will help us to organize these new resources for better utilization by those who use our libraries. Our organizational skills can reach far beyond our local libraries as our web-based catalogs are used by people around the world. We can have an impact on the organization of the Internet as we share our skills and help to plan for the future. No longer are we just librarians or just catalogers; we are information specialists, and organizers of metadata--we are the information professionals.
1 Arlene G. Taylor and Patrice Clemson. "Access to Networked Documents: Catalogs? Search Engines? Both?" OCLC Internet Cataloging Project Colloquium Position Paper. http://www.oclc.org/oclc/man/colloq/taylor.htm (6/27/98)
2 Nancy B. Olson, ed. Cataloging Internet Resources. 2nd ed. http://www.oclc.org/man/9256cat/cover.html (6/27/98)
3 Jim Holmes. "Cataloging E-Journals at the University of Texas at Austin: a Brief Overview." The Serials Librarian. v. 33, no. 1/2 (1998): 175.
4 Les Hawkins and Steve Shadle. "Cataloging Electronic Serials." The Serials Librarian. v. 34, no. 3/4 (1998): 389.
5 Ingrid Hsieh-Yee. "Modifying Cataloging Practice and OCLC Infrastructure for Effective Organization of Internet Resources". OCLC Internet Cataloging Project Position Paper. http://www.oclc.org/oclc/man/colloq/hsieh.htm (6/27/98)
6 Erik Jul. "Cataloging Internet Resources: an Assessment and Prospectus." The Serials Librarian. v. 34, no. 1/2 (1998): 92.
7 Ibid., 93.
8 Kyle Banerjee. "Describing Remote Electronic Documents in the Online Catalog: Current Issues." Cataloging & Classification Quarterly, v. 25, no. 1 (1997): 11.
9 Jul, 93.
10 Les Hawkins. "Serials Published on the World Wide Web: Cataloging Problems and Decisions." The Serials Librarian. v. 33, no. 1/2 (1998): 125.
11 Banerjee, 6.
12 Ibid., 7.
13 Jul, 95-96.
14 Ibid., 97.
15 Jul, 97.
16 Banerjee, 17.
17 Jul, 98.
18 Erik Jul, Eric Childress, and Eric Miller. "42: Don't Panic, It's a Common Disaster." Journal of Internet Cataloging. v. 1, no. 3. http://jic.libraries.psu.edu/jic1nr3-42.html (6/27/98)
19 Stuart Weibel. "Metadata: the Foundations of Resource Description." D-Lib Magazine (July 1995) http://www.dlib.org/dlib/July95/07weibel.html (6/28/98)
20 Nichols Advanced Technologies. MARCit Inc. http://www.marcit.com (6/15/98)
21 Daniel V. Pitti. "Standard Generalized Markup Language and the Transformation of Cataloging." The Serials Librarian. v. 25, no. 3/4 (1995): 243-253.
22 Edward Gaynor. "From MARC to Markup: SGML and Online Library Systems." ALCTS Newsletter. v. 7, no. 2 (1996) http://www.lib.virginia.edu/speccol/scdc/articles/alcts_brief.html (6/28/98)
23 Rob Richards. "Adding Classification Numbers to Bibliographic Records for Internet Resources: Summary of Listserv Responses and Annotated Bibliography." Internet Cataloging Issues & Resources. http://www.colorado.edu/Law/lawlib/ts/classnet.htm (6/27/98)
24 Keith Shafer. "Scorpion Helps Catalog the Web." OCLC Projects. http://orc.rsch.oclc.org:6109/b-asis.html (6/27/98)
25 Gerry McKiernan. "The Once and Future Library." Issues in Science and Technology Librarianship. http://www.library.ucsb.edu/istl/96-fall/mckiernan.html (6/9/98)
26 Pitti, 253.
Anderson, Bill and Les Hawkins. "Development of CONSER Cataloging Policies for Remote Access Computer File Serials." The Public Access Computer Systems Review. v. 7, no. 1 (1996) http://info.lib.uh.edu/pr/v7/n1/ande7n1.html (6/27/98)
Banerjee, Kyle. "Describing Remote Electronic Documents in the Online Catalog: Current Issues." Cataloging & Classification Quarterly. v. 25, no. 1 (1997): 5-20.
Beck, Melissa. "Remote Access Computer File Serials." CONSER Cataloging Manual, Module 31. http://lcweb.loc.gov/acq/conser/module31.html (7/3/98)
Burnett, Thomas C. and Linda K. TerHaar. "Can I Get it or Not?: A Public Services View of Cataloging Electronic Journals." The Serials Librarian. v. 34, no. 1/2 (1998): 177-185.
Butterfield, Kevin L. "Catalogers and the Creation of Metadata Systems: A Collaborative Vision at the University of Michigan." OCLC Internet Cataloging Project Colloquium Position Paper. http://www.oclc.org/oclc/man/colloq/butter.htm (6/27/98)
Caplan, Priscilla. "Cataloging Internet Resources." The Public Access Computer Systems Review. v. 4, no. 2 (1993): 61-66.
Dodd, David G. "Grass-Roots Cataloging and Classification: Food for Thought from World Wide Web Subject-Oriented Hierarchical Lists." Library Resources and Technical Services. v. 40, no. 3 (1996): 275-286.
"Dublin Core Metadata." OCLC's PURL.org. http://purl.oclc.org/metadata/dublin_cor/ (7/3/98)
Duda, Andrea L., ed. Untangling the Web: Proceedings from the Conference Sponsored by the Librarians Association of the University of California, Santa Barbara and Friends of the UCSB Library. (Apr. 26, 1996) http://www.library.ucsb.edu/untangle/ (6/28/98)
Gaynor, Edward. "From MARC to Markup: SGML and Online Library Systems." ALCTS Newsletter. v. 7, no.2 (1996) http://www.lib.virginia.edu/speccol/scdc/articles/alcts_brief.html (6/28/98)
Hawkins, Les, and Steve Shadle. "Cataloging Electronic Serials." The Serials Librarian. v. 34, no. 3/4 (1998): 385-389.
Hawkins, Les. "Serials Published on the World Wide Web: Cataloging Problems and Decisions." The Serials Librarian. v. 33, no. 1/2 (1998): 123-145.
Holmes, Jim. "Cataloging E-Journals at the University of Texas at Austin: a Brief Overview." The Serials Librarian. v. 34, no. 1/2 (1998): 171-176.
Hsieh-Yee, Ingrid. "Modifying Cataloging Practice and OCLC Infrastructure for Effective Organization of Internet Resources". OCLC Internet Cataloging Project Position Paper. http://www.oclc.org/oclc/man/colloq/hsieh.htm (6/27/98)
Jul, Erik. "Cataloging Internet Resources: an Assessment and Prospectus." The Serials Librarian. v. 34, no. 1/2 (1998): 91-104.
Jul, Erik, Eric Childress, and Eric Miller. "42". Journal of Internet Cataloging. v. 1, no. 3. http://jic.libraries.psu.edu/jic1nr3-42.html (6/27/98)
Library of Congress, Network Development and MARC Standards Office. Guidelines for the Use of Field 856. Revised August 1997 http://www.loc.gov/marc/856guide.html (6/27/98)
McDonough, Jerome P. "SGML and the USMARC Standard: Applying Markup to Bibliographic Data". Technical Services Quarterly. v. 15, no. 3 (1998): 21-33.
McKiernan, Gerry. "The Once and Future Library." Issues in Science and Technology Librarianship. http://www.library.ucsb.edu/istl/96-fall/mckiernan.html (6/9/98)
Nichols Advanced Technologies. MARCit Inc. http://www.marcit.com (6/15/98)
Olson, Nancy B., ed. Cataloging Internet Resources: A Manual and Practical Guide. 2nd ed. Dublin, Ohio: OCLC, 1997. http://www.oclc.org/oclc/man/9256cat/toc.htm (6/27/98)
Pitti, Daniel V. "Standard Generalized Markup Language and the Transformation of Cataloging". The Serials Librarian. v. 25, no. 3/4 (1995): 243-253.
"Purl." Persistent URL Home Page. http://purl.oclc.org (6/28/98)
Richards, Rob. "Adding Classification Numbers to Bibliographic Records for Internet Resources: Summary of Listserv Responses and Annotated Bibliography." Internet Cataloging Issues & Resources. http://www.colorado.edu/Law/lawlib/ts/classnet.htm (6/27/98)
Schneider, Karen G. "Cataloging Internet Resources: Concerns and Caveats." American Libraries. (March 1997): 77.
"A Script to Build Meta tags using the Dublin Core Types." Dublin Core Meta Tag Builder. http://vancouver-webpages.com/Vwbot/mk-dublin.html
Sha, Vianne T., Timothy B. Patrick, and Thomas R. Kochtanek. "The Traditional Library and the National Information Infrastructure". OCLC Internet Cataloging Project Colloquium Position Paper. http://www.oclc.org/oclc/man/colloq/sha.htm (6/27/98)
Shadle, Steven C. "A Square Peg in a Round Hole: Applying AACR2 to Electronic Journals." The Serials Librarian. v. 33, no. 1/2 (1998): 147-166.
Shafer, Keith, Stuart Weibel, Erik Jul, and Jon Fausey. "Introduction to Persistent Uniform Resource Locators." PURL Home Page. http://purl.oclc.org/OCLC/PURL/INET96 (6/28/98)
Shafer, Keith. "Scorpion Helps Catalog the Web." OCLC Projects. http://orc.rsch.oclc.org:6109/b-asis.html (6/27/98)
Shieh, Jackie. "Does it Really Matter?: The Cataloging Format, the Sequential Order of Note Fields, and the Specifics of Field 856". OCLC Internet Cataloging Project Colloquium Field Report. http://www.oclc.org/oclc/man/colloq/shieh.htm (6/27/98)
Simpson, Pamela and Robert Seeds. "Electronic Journals in the Online Catalog: Selection and Bibliographic Control". Library Resources & Technical Services. v. 42, no. 2 (1998): 126-132.
Taylor, Arlene G. and Patrice Clemson. "Access to Networked Documents: Catalogs? Search Engines? Both?" OCLC Internet Cataloging Project Colloquium Position Paper. http://www.oclc.org/oclc/man/colloq/taylor.htm (6/27/98)
Weibel, Stuart. "Metadata: The Foundations of Resource Description." D-Lib Magazine (July 1995) http://www.dlib.org/dlib/July95/07weibel.html (6/28/98)
Weibel, Stuart. "PURLs: Persistent Uniform Resource Locators." PURL Home Page. http://purl.oclc.org/OCLC/PURL/SUMMARY (6/28/98)
Xu, Amanda. "Accessing Information on the Internet: Feasibility Study of USMARC Format and AACR2." OCLC Internet Cataloging Project Colloquium Field Report. http://www.oclc.org/oclc/man/colloq/xu.htm (6/27/98)
Younger, Jennifer A. "Resources Description in the Digital Age". Library Trends. v. 45, no. 3 (1997): 462-487.
Copyright © 1998 Brenda Parris Sibley