CATRIONA, SERIALS AND THE INTERNET

The properties of electronic serials and other resources published on the World Wide Web are described with respect to providing and maintaining access to them by library users. The relevance of the CATRIONA project as a methodology for integrated search and retrieval of all bibliographic materials, whether printed or electronic, is discussed.

The initial phase of the CATRIONA (Cataloguing and Retrieval of Information over Networks Applications) Project was carried out at the University of Strathclyde in the second half of 1994 with funding from the British Library Research and Development Department. A report on the project has been published by the British Library as Library and information research report, 105.[1] This first phase investigated the feasibility of using existing and emerging standards for the bibliographic control of electronic information resources stored and distributed on Wide Area Networks (WANs): '... the idea of a distributed catalogue of Internet resources integrated with standard Z39.50 library system OPAC interfaces (and hence, with retrieval of information on hard copy resources) is already a practical proposition at its most basic level ...' (ibid.). In particular, it was shown that it was possible to search the online catalogue of Brigham Young University (BYU) in Utah from a personal computer (PC) in Glasgow, and retrieve a bibliographic record for a particular document which happened to be an electronic text. After the record was displayed on the PC screen, a single click of the mouse button was all that was required to bring up the document itself. This was achieved without any collusion between BYU and Strathclyde University Library: BYU did not create the bibliographic record specifically for CATRIONA; the software used on the PC was 'off the shelf' and not developed specifically for the project; and there was no duplication of either the record or the electronic document to avoid networking problems. Most importantly, the project used standards which are already established or under development, and did not indicate any need for a completely novel approach. How this is achieved, and the ways in which information is being created and stored on the Internet, may be of great significance for serials librarians.
There are five major components and processes involved in the CATRIONA approach: the electronic information object (EIO); the Universal Resource Locator (URL); standard methods of cataloguing; Z39.50; and the Internet.

The Internet and electronic information objects
The cataloguing of serial titles is only the start of the process of providing access to their informational content. Over 90 per cent of the distinct intellectual works in a typical library collection are contained within serial issues, requiring either a huge effort in analytical cataloguing to integrate them with other materials, or the provision of expensive, non-integrated abstracting and indexing services with a coverage far beyond what is normally immediately available for local consultation. Yet libraries are coming under more and more pressure to provide integrated access to all of their information resources. Costs are rising at a much higher rate than background inflation and readers must be encouraged to make greater use of what is available.
Library users, who have increasing expectations of automated systems, find it difficult to understand why they have to log off from a CD-ROM in order to check the catalogue, rather than open multiple windows on the same screen, or to take hand-written notes from printed sources, whether catalogue card, serials list or bound index, when it is so much more convenient to cut-and-paste and save to floppy disk. Sophisticated users might wonder why it is not possible to make a single search of the library's catalogue for all locally-available materials connected with The Economist, whether individual issues of the journal, guides written by the editorial team, or monographs about the history of the publication. Naive users might not realise that they have to look up the printed list of serial holdings, rather than the automated catalogue, or that only a small percentage of the titles covered by the CD-ROM abstracting service can be consulted without incurring inter-library loan charges.
Until recently, librarians have only had to deal with 'local' information resources, either physically stored on the library's shelves, or delivered over the Local Area Network (LAN). These resources share the common property of self-integrity: the book is regarded as a 'whole' work; the CD-ROM database has well-defined parameters in terms of coverage, internal structure, and mode of use. This applies also to remotely-stored resources, such as dial-up indexing and abstracting services. Standard rules for cataloguing, such as AACR, have been developed to incorporate new formats, and machine-readable record structures, such as MARC, have been flexible enough to store new fields. The feasibility of providing a properly integrated search and retrieval service for local resources need not be questioned; it is the relatively sudden emergence of the Internet as a general information storage and distribution system which stimulated the setting-up of the CATRIONA project.
The spread of the Internet, with low-cost, planet-wide data communications accessible to millions, is becoming a significant factor in highlighting the problems of creating 'one-stop information shops'. At the same time, however, the Internet provides a mechanism for resolving them, since it is not just a very large communications network. It links computers together and computers are excellent devices for storing and manipulating electronic information. In addition, the World Wide Web (WWW) has emerged as an important means of 'publishing' information on the Internet. WWW is a standardised system which allows certain types of electronic files to be accessed from personal computers anywhere on the Internet. It is a relatively simple and easy task to create these files. Indeed, many individuals and organisations are doing so in order to supply publicly accessible information about themselves. Such files are formatted for display on screen using Hypertext Mark-up Language (HTML), which also allows sections of text or graphics to be highlighted and linked to other HTML documents. Multiple links can be made: a click of the mouse-button on the highlight results in the linked document being displayed, and a click on a highlight in the new document displays another, etc.
A single electronic file, a journal article, for example, can be linked to many different hyperdocuments. The full text of the article might be available via an 'issue' of the electronic serial, a subject based indexing system, the author's personal bibliography, and the user's personal file of citations. 'Active' hyperdocuments exist: the author or publisher (often the same) of a paper may allow readers to send comments and criticisms which are then linked to the original. Other sources which subsequently cite the original may then be linked, followed by further comments and criticisms, and so on.
In the Internet publishing world there is no stage in the process of acquisition which gives effective control over the resource to the librarian. The librarian cannot influence where a resource is located, when it can be used, or who can have access to it. As ever, it is the librarian's task to facilitate access for those who need it. If the information technology is available to users, this task essentially becomes that of cataloguing and indexing.
The absence of a physical manifestation removes the requirement for physical handling of parts, documents, and copies, which is labour-intensive, but networked electronic information creates some new problems. Access may require licences and passwords. Computers where the information is stored may need special log-on procedures. Publications containing graphics or sound may require special software or hardware to be used effectively, and users need to be alerted to this before attempting to access them.
It can be useful to regard an electronic file which contains information, together with the means of accessing it and manipulating it, as an 'object'. In computer programming terms, an object contains not only data, but also the software instructions for processing it. Therefore, CATRIONA has evolved the concept of an electronic information object (EIO). An EIO includes the content of an electronic resource, the software required to display and manipulate it, information needed to identify and retrieve it, and access parameters such as log-on script and password.
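The four ingredients of an EIO listed above can be pictured as a simple data structure. The following is only an illustrative sketch in Python: the class and field names (ElectronicInformationObject, AccessParameters, viewer_software and so on) are assumptions made for this example and are not taken from the CATRIONA report.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AccessParameters:
    """Hypothetical holder for log-on details needed to reach the resource."""
    logon_script: Optional[str] = None
    password: Optional[str] = None


@dataclass
class ElectronicInformationObject:
    """Illustrative EIO: content plus what is needed to find, fetch and display it."""
    title: str                      # information needed to identify the resource
    location: str                   # information needed to retrieve it (e.g. a URL)
    content: bytes                  # the content of the electronic resource
    viewer_software: List[str] = field(default_factory=list)
    access: AccessParameters = field(default_factory=AccessParameters)

    def requirements_notice(self) -> str:
        """Text to show the user before they attempt to open the resource."""
        needs = []
        if self.viewer_software:
            needs.append("software: " + ", ".join(self.viewer_software))
        if self.access.logon_script or self.access.password:
            needs.append("log-on details required")
        return "; ".join(needs) if needs else "no special requirements"


# Example with invented values.
eio = ElectronicInformationObject(
    title="Sample article",
    location="http://www.example.org/article.html",
    content=b"<html></html>",
    viewer_software=["HTML browser"],
)
print(eio.requirements_notice())
```

Keeping the software and access requirements alongside the content is what allows a catalogue to alert users to them before they try to connect.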

Universal Resource Locator/cataloguing standards
Another major component of the CATRIONA approach is the Universal Resource Locator (URL). The idea of the URL had already been developed to identify, firstly, the 'location' of the electronic document, and, secondly, the links to the parts of which it is composed, so that it can be assembled on screen for the user. The URL is an electronic address, made up of the Internet address of the computer with the directory and filename of the resource within that computer.
There are examples of URLs in the references given at the end of this paper. It is the URL that is used by HTML to link a highlight in one document to another, separate document. The URL is used specifically by CATRIONA to link a particular EIO to a standard catalogue record used to search for and identify the resource.
The standard record structure used by CATRIONA is MARC; the Library of Congress has approved the addition of a new field to the USMARC standard, tagged 856, to contain the URL. Subfields have been designated for log-on scripts, passwords, and other parameters that may be required to connect to the resource. Newly-developed library management systems based on client-server technologies allow a personal computer, or client, to search an automated catalogue on a database server, download a retrieved record, and use the record to connect to another, remote server where the catalogued resource itself is stored, which in turn is downloaded to the client. These systems thus allow the advantages of hyper-linking to be incorporated into tried and tested methods for efficient information retrieval.
There are several methods which could be used for retrieval, in addition to the structured catalogue favoured by CATRIONA. One approach to providing access to electronic resource titles, subjects, and associated persons and organisations is to get the computer to do it automatically, by keyword indexing and searching. Many WWW services provide access to EIOs by listing their 'titles' and hyper-linking to the resource itself via its URL, and a number of initiatives using this approach have been funded under the eLib programme, e.g. the Edinburgh Engineering Virtual Library (EEVL). Precision in retrieval can be enhanced through the use of international standards for bibliographic structures and the mediation of a trained information manager or organiser. This was true when books were chained to the shelves, and remains true today. Standards for determining the titles of resources, categories, classifications or subject indicators are not being widely used and cross-searching of different services may not be very efficient or precise. Most importantly, it is very difficult to integrate this type of approach with access to non-electronic resources, mediated in most libraries by the bibliographic catalogue. A single search cannot determine which resources a library holds and which it has access to via the Internet, not only because systems are technologically incompatible, but also because the bibliographic descriptors used do not follow a common format or content standard.
True integration, allowing all relevant resources to be identified in a single search by the end-user without the need for the assistance of a librarian, can only be achieved if the standards and methods of cataloguing which have been developed over many years are applied to the new electronic resources, as they have been to printed materials. This implies the use of MARC and AACR. It is just not feasible to retroconvert catalogue records to a new standard; nor can we wait until all resources become EIOs. This approach is being investigated by OCLC and other centralised cataloguing agencies.
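The make-up of a URL described above (computer address, then directory and filename) can be illustrated by taking one apart. This is a minimal sketch using Python's standard urllib.parse module; the address shown is invented for the example and the function name describe_url is an assumption.

```python
import posixpath
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into the parts named in the text: host, directory, filename."""
    parts = urlsplit(url)
    directory, filename = posixpath.split(parts.path)
    return {
        "scheme": parts.scheme,      # access method, e.g. http
        "host": parts.hostname,      # Internet address of the computer
        "directory": directory,      # directory within that computer
        "filename": filename,        # file holding the resource
    }


# Invented example address, not a real resource.
print(describe_url("http://www.example.ac.uk/journals/issue1/article.html"))
```

Every element returned here is tied to a particular machine and file layout, which is why a URL stops working as soon as the file is moved.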
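As a rough illustration of how a client might use field 856, the sketch below pulls connection details out of a simplified record. The record layout (a list of tag and subfield pairs) is an assumption made for this example and is not a real MARC parser; the subfield codes used ($u for the locator, $l for log-on, $k for password) follow the published 856 field definition as the editor understands it, and should be checked against the current format documentation before being relied upon.

```python
def access_details(record):
    """Return one dict of connection details per 856 field that carries a URL.

    `record` is a simplified stand-in for a MARC record:
    a list of (tag, [(subfield_code, value), ...]) pairs.
    """
    details = []
    for tag, subfields in record:
        if tag != "856":
            continue
        values = dict(subfields)
        if "u" in values:
            details.append({
                "url": values["u"],
                "logon": values.get("l"),
                "password": values.get("k"),
            })
    return details


# Invented record for illustration only.
record = [
    ("245", [("a", "Sample electronic journal")]),
    ("856", [("u", "http://www.example.org/journal/"), ("l", "guest")]),
]
print(access_details(record))
```

A client that has downloaded such a record has everything it needs to connect to the remote server without the enquirer retyping an address.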
Automated catalogue records have always been seen as detached from the resources they describe, although in principle there is a link via the shelfmark included in individual holdings records. If periodicals were shelved, unbound, in an automated warehouse, it would be possible to use the shelfmark to instruct a robot to fetch the issue and deliver it to the enquirer. With an EIO, the 'robot' is pure software, and the resource can be delivered to the user via the same technologies that have been used to identify it in a search.
The CATRIONA feasibility study identified a number of general problems associated with cataloguing EIOs. Duplicate copies of an electronic resource can be easily made, and often are, to avoid bottlenecks in the Internet. The response time of the Internet slows considerably when North America goes to work (from about 1pm to 9pm GMT), and some documents are duplicated on 'mirror' sites. For example, UK users may be able to connect to a computer in Edinburgh rather than Chicago to access a particular resource and avoid delays caused by having to compete with U.S. users. The URLs will be different, not least because different computers are involved, and careful coordination is required to ensure that both copies remain identical if the original is being updated. There is a problem, then, in determining which copy or version of an electronic resource is the most up to date.
Another problem is the stability of the URL. The publisher or information provider might have to change the location of the file because of a change in local infrastructure or policies, thus invalidating the existing URL. This is the equivalent of changing the shelfmark of an item in a very, very large library, and if the 'catalogue' does not reflect the change, the item is effectively lost. Older, archival information might be deleted at source without consultation or warning. While the Internet itself is designed to operate even if large parts of the network are disconnected, this does not apply to individual computers connected to it; a failure or other interruption to the hard disk where the file is stored, or the computer's connection to the local area network, or that network's connection to the Internet, will result in the resource becoming unavailable.
CATRIONA proposes that a potential solution to URL stability is the use of Universal Resource Names (URNs) which differ from URLs in that volatile elements such as computer address and file name are not used. The idea is that the URN will remain a unique identifier for an EIO; the actual 'location' will still be the URL, but there would be some kind of procedure to link the URN and URL together, and update the URL if it changes. Another, similar, approach is currently being tested by OCLC, using Persistent URLs (PURLs). The catalogue record would store the URN, and thus not require updating.
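The 'procedure to link the URN and URL together' is left open in the text. One way to picture it is a lookup table maintained by whoever knows the current location, as in the sketch below. The class name UrnResolver, the URN string and the addresses are all invented for illustration; this is not a description of any actual URN or PURL service.

```python
class UrnResolver:
    """Hypothetical table mapping stable names (URNs) to current locations (URLs)."""

    def __init__(self):
        self._locations = {}

    def register(self, urn, url):
        """Record, or update, the current location for a name."""
        self._locations[urn] = url

    def resolve(self, urn):
        """Return the current URL for a name, or raise LookupError if unknown."""
        try:
            return self._locations[urn]
        except KeyError:
            raise LookupError("no location registered for " + urn) from None


resolver = UrnResolver()
catalogue_record = {"title": "Sample article", "urn": "urn:example:article-42"}

resolver.register("urn:example:article-42", "http://old.example.org/a42.html")
print(resolver.resolve(catalogue_record["urn"]))

# The file moves: only the resolver entry changes, not the catalogue record.
resolver.register("urn:example:article-42", "http://new.example.org/papers/a42.html")
print(resolver.resolve(catalogue_record["urn"]))
```

The design point is that the volatile information lives in one place, so thousands of catalogue records holding the same name need no maintenance when a file is relocated.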
It would seem to make sense for the ultimate responsibility for the accuracy of this catalogue record to lie with the creator of the resource being catalogued. The backlog of uncatalogued items is enormous and growing at a prodigious rate. It can be very difficult to determine the 'proper' title of an EIO, or who the creator is, or even what the subject is. EIOs do not have ISBNs or ISSNs; and the URL is not a suitable alternative, because it changes if the file location changes, and the same URL cannot be used for copies of an EIO stored on a different computer. A centralised cataloguing agency has no control over the location of an EIO, and it is unlikely that EIO owners will meticulously inform the agency when the URL is changed. It will be hard to determine what level of analytic cataloguing is necessary, or indeed what analytic means when applied to a hyperdocument. It is the information provider who will first know if a URL has changed, and which is the latest version, and what the subject is.
Cataloguing at source is not a new idea. Librarians have been extolling the virtues of 'cataloguing-in-publication' (CIP) for many years. Although CIP never really caught on, it might be argued that the principal reason for this was that the bibliographic information printed on the title page verso could not be accessed remotely until it was incorporated in a machine-readable record. Improved retrieval of the item, and resulting increased sales, had to wait until the CIP data could be searched in bibliographic databases, and librarians would eventually catalogue the item anyway, printed CIP or no. CIP could not be used for advertising the item in the period immediately after publication when currency is a powerful aid to sales.
The situation is entirely different on the Internet, however. The Internet equivalent of a CIP record would be searchable as soon as it was created, and retrievability would be enhanced, if international standards on structure and content were used. Furthermore, the Internet is not, currently, a 'broadcast' medium such as television or billboard posters and there is vehement dislike of unsolicited mail messages: 'Advertisers are looking for ways of reaching the largest market realisable, without incurring the wrath of MUD wizards and electronic warriors who defend the liberty and access of the citizens of Cyberia.' Advertising a particular resource in competition with similar products is very difficult within the Internet, and publishers have to continue to use established channels such as print and post. Using the Internet alone, there are only two means of drawing attention to a new resource: by reputation via e-mail lists and bulletin boards, and by retrieval via bibliographic searches. This is a powerful incentive for information providers to create standard catalogue records and this is a skilled, professional task. "My advice to major Web contributors (and to creators of Web authoring tools) is to hire a library scientist." However, the ideal of sufficiently structured 'metadata' for search and retrieval purposes being incorporated in the EIO at source is likely to take some time to evolve. Meanwhile, CATRIONA envisages a number of 'union' catalogues for Internet resources, created, perhaps, on a national or regional basis, and cooperating to avoid duplication and resolve the problems of version control.
If machine-readable catalogue records are not created centrally, and not stored in a single catalogue, how can they be comprehensively searched? Another recent development provides a solution, and another major component of the CATRIONA model. Z39.50 is an international standard which allows bibliographic searches to be formulated within a local library management system using syntax and procedures familiar to its users. The search is then translated into a common format and transmitted to remote catalogues, where it is reformatted into the syntax required to search those catalogues. Search results are returned to the client using the same protocols. Z39.50 thus allows system-independent searches which can be carried out on a number of different databases. Enhancements are required to Z39.50, however, to allow, in particular, simultaneous searching of catalogues, user-friendly collation of results and their subsequent display.
Z39.50, WWW, URLs, MARC and AACR provide the necessary and sufficient building blocks to build a high-precision, user-friendly method for bibliographic retrieval of locally-stored library materials integrated with globally-dispersed, remotely-accessible electronic information resources. The CATRIONA feasibility model uses these components in the following way.
The enquirer uses a single 'client' workstation for information retrieval. A search is formulated using local terminology and indexing strategies; that is, in a way familiar to the enquirer. Enhanced Z39.50 'middleware' allows the enquirer to select which catalogues and databases to search; these will usually include the local catalogue, and a standard set of CATRIONA-specific Internet union catalogues. The same software carries out the searches simultaneously on the selected catalogues, matches and sorts the results, and delivers a list of hits to the enquirer's workstation. The enquirer selects a hit and more detailed bibliographic information is displayed. If the record has an embedded URL, the workstation highlights it or displays an active
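The translate, transmit and reformat cycle described above can be mimicked in a few lines. The sketch below is a toy model only and does not implement the Z39.50 protocol: the query syntax ('ti=...'), the Catalogue class, the field names and the sample records are all assumptions made for illustration. It shows a locally formulated query being turned into a common form, each catalogue mapping that form onto its own field names, and the searches being run side by side with results collated in one list.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_local_query(text):
    """Turn a local-style query such as 'ti=economist' into a common (access point, term) form."""
    access_point, _, term = text.partition("=")
    return access_point.strip().lower(), term.strip().lower()


class Catalogue:
    """In-memory stand-in for a remote catalogue with its own field names."""

    def __init__(self, name, field_names, records):
        self.name = name
        self.field_names = field_names  # common access point -> this catalogue's field
        self.records = records

    def search(self, query):
        access_point, term = query
        local_field = self.field_names[access_point]  # reformat into local syntax
        return [r for r in self.records if term in r.get(local_field, "").lower()]


def search_all(local_query, catalogues):
    """Run one query against several catalogues at once and collate the hits."""
    query = parse_local_query(local_query)

    def run(catalogue):
        return [(catalogue.name, record) for record in catalogue.search(query)]

    with ThreadPoolExecutor() as pool:
        result_sets = list(pool.map(run, catalogues))  # order follows `catalogues`
    return [hit for result_set in result_sets for hit in result_set]


# Invented sample data.
local = Catalogue("local", {"ti": "title"},
                  [{"title": "The Economist"}, {"title": "Nature"}])
union = Catalogue("union", {"ti": "245a"},
                  [{"245a": "Economist style guide"}])

for source, record in search_all("ti=Economist", [local, union]):
    print(source, record)
```

The enquirer types one query in a familiar form; the differences between catalogues are confined to each catalogue's mapping table.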

Implications for serials librarianship
'Serial: A publication in any medium issued in successive parts bearing numeric or chronological designations and intended to be continued indefinitely. Serials include periodicals; newspapers; annuals (reports, yearbooks, etc.); the journals, memoirs, proceedings, transactions, etc., of societies; and numbered monographic series.' This definition might be readily applied generally to many electronic information objects, if the 'numeric or chronological designations' part is ignored. Of course, some printed serials do not carry any such designation, and the electronic equivalents of traditional serials may eventually lose these distinctions as well. Why wait for the next 'issue' to publish an article if it is ready now? Is there anything fundamentally wrong in publishing the same paper in several different electronic serials? Surely an active hyperdocument emanating from two or more persons with a common interest fits the definition of memoirs and transactions of a society: can a society consist of just one person? Is the concept of a personalised serial valid? It is clear that the distinctions between serials and non-serials are becoming blurred in the electronic environment, and this will have a major impact on serials librarianship, at least in the longer term.
There are no individual parts to be received for an electronic serial, and, therefore, no claiming, no missing issues, no gaps in the back-run, no need for binding. Material need no longer be 'for reference only' or restricted to certain groups of readers to reduce the ad hoc demand for access. Circulation of parts is achieved by providing a direct link from the reader's personal computer to the URL, or copying to floppy disk or local area network if the source is not linked to a wide area network or the Internet. Mounting current issues on display shelves or circulating photocopied contents pages for current awareness is replaced by training users to connect to the URL on a regular basis. Selective dissemination of information is carried out by an intelligent 'agent' identifying keywords and popping-up a customised 'things you should look at real soon now' window when the reader switches on the desktop PC. The agent might be an expert system, or even a librarian. It is not difficult to foresee a time when a librarian maintains and regularly updates a set of electronic indexes, with URLs pointing to separate electronic articles, on behalf of a number of library users. Each index then essentially becomes the contents page of a personalised electronic serial.
What does it all mean for serials librarians? CATRIONA, or any other system which achieves the same aims, will take a long time to be developed. Only a few libraries are using the very latest systems which will allow catalogue records to be linked to EIOs, and therefore allow integrated searching of local and remote resources. There is nothing to prevent libraries with older automated systems cataloguing electronic materials in the same way as printed resources, but special arrangements have to be made for holdings or copy-level information as there is no physical manifestation of the item. Information about access, including the URL and software required to view the EIO, can be placed in the USMARC tag, or a local note, for display to the enquirer. Once the librarian has access to a computer permanently connected to the Internet, perhaps the institution's campus-wide or community information system (CWIS/CIS), the creation of lists of URLs of serials titles, or lists created by others, is simple to achieve, but much more difficult to maintain for the reasons already given. There is much duplication of effort going on, as there is no central authority or framework to coordinate the creation of such lists; libraries who can are doing it for themselves at a local level. Some libraries, like my own, are creating standard records in the main catalogue and a corresponding entry in the CWIS for each EIO selected by subject specialists, with a note in the catalogue alerting the reader to the entry in the CWIS.[7]
Another development marries WWW techniques with MARC cataloguing to form a 'web PAC'. An example involving a serial can be seen on the PAC of Hull University Library by searching for the title 'Renaissance forum'.[8] These initiatives allow integrated retrieval of records for all relevant resources during a non-specific-item search in the catalogue, and also direct access to the resource if it is a WWW EIO. In Napier's case, only separate, unlinked systems are currently available; if the EIO is found in the catalogue, the user has to note the entry details and go to the CWIS to retrieve the item.
It is likely that the number of electronic serials will grow rapidly, principally because it is much cheaper to publish digitally than in print. Serials librarians may very well find that time saved by not having to deal with physical parts will be spent on maintaining lists of URLs. In some institutions, there will be a growing demand from researchers for electronic equivalents of current-awareness and SDI services; the personalised lists described earlier. Even if a CATRIONA-like system is developed, allowing the end-user to carry out ad hoc or repeat searches for themselves, the need for mediation by the serials librarian will not entirely disappear; too much choice is as much a problem for people as too little, and many library users will want some kind of selectivity to be exercised before they initiate the search. The role of the serials librarian will become very similar to the role of the information scientist; there is nothing new under the electronic sun.
The CATRIONA report indicated the need to develop these ideas further, by creating a larger 'demonstrator' system which would allow more detailed study of procedures and technologies with the involvement of a bigger number of institutions and library personnel. Funding is still being sought for this; unfortunately, a bid for eLib money has been rejected. Another bid for eLib funding is currently in preparation, however, to explore strategies and mechanisms for making local EIOs available on the Internet. There are many useful electronic resources, created as part of research and training programmes, 'locked' within individual higher education establishments because they are stored on stand-alone computers, lack URLs, require unusual interfaces, or merely haven't been advertised on the Internet. As well as showing the feasibility of the CATRIONA approach, the project also gathered together a large amount of information about related initiatives and developments which is included in the printed report. Needless to say, this can be accessed over the Internet, including links to WWW documents and other EIOs of interest.[9] A paper outlining the relevance of CATRIONA to cataloguers has also been published.[10]
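The keyword-matching 'agent' mentioned above could, at its very simplest, look like the following sketch. The function name, the profile and the sample items are invented for illustration, and matching on whole words in titles is a deliberately crude assumption; a real service would need stemming, subject headings and handling of punctuation.

```python
def select_for_reader(profile_keywords, new_items):
    """Return the new items whose titles contain any keyword in the reader's profile."""
    wanted = {keyword.lower() for keyword in profile_keywords}
    selected = []
    for item in new_items:
        title_words = set(item["title"].lower().split())
        if wanted & title_words:  # at least one keyword appears in the title
            selected.append(item)
    return selected


# Invented sample data.
new_items = [
    {"title": "Cataloguing the Internet", "url": "http://www.example.org/a1.html"},
    {"title": "Binding costs in 1995", "url": "http://www.example.org/a2.html"},
]
for item in select_for_reader(["internet", "serials"], new_items):
    print(item["title"], item["url"])
```

The list returned for each reader is, in effect, the contents page of that reader's personalised serial.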
