Building a University Digital Library: The Need for a User-Centered Approach


Ann Peterson Bishop, Susan Leigh Star, Laura Neumann, Emily Ignacio, Robert J. Sandusky, Bruce Schatz

University of Illinois at Urbana-Champaign


Introduction

This paper provides a status report on the development of a digital library testbed of engineering journals at the University of Illinois. The project began in Fall 1994 as one of six projects funded by the NSF/ARPA/NASA Digital Library Initiative (DLI). The DLI projects represent an important research component in efforts to develop underlying technologies for a global information infrastructure. The testbed at the University of Illinois comprises material from commercial and professional society publishers. It represents a production library that will allow effective search and display of SGML documents and will be propagated to the Internet for use by faculty and students at the Big Ten universities. (For more information about the Illinois DLI project, send email to thabing@uiuc.edu or visit the project homepage at http://www.grainger.uiuc.edu/dli.)

The focus of this paper is the progress and plans of the user evaluation component of the project. The evaluation team's sociological studies aim at developing methods and models for understanding the nature of particular information interactions, individual work and learning processes, institutional and professional dynamics, and community phenomena in the distributed, digital realm. Implications and issues for academic institutions involved in introducing digital libraries to their constituencies are identified and discussed.

Progress in Building a Digital Library

The ultimate goal of our DLI project is to evolve the Internet into the "Interspace" by bringing the ability to search and display structured documents to the Internet. That is, we are critically concerned with both the functionality needed to interact with structured documents and the infrastructure required to scale such functionality up to a global community. We are building a prototype digital library and studying its use, but we are also interested in learning how digital library use fits into the broader context of information work. Reaching this goal entails constructing a testbed that represents a production digital library, one that will move to a model of federated, distributed repositories and operate within the real constraints faced by publishers, university libraries, users, and the software industry.

Our testbed is primarily intended for use by the academic engineering community. The testbed collection centers around SGML science and technology journals and magazines from a number of commercial and professional publishing partners. At the present time, the testbed contains about 120 periodical articles. We are currently engaged in processing the following titles: AIAA Journal (American Institute of Aeronautics and Astronautics); Applied Physics Letters (American Institute of Physics); Physical Review Letters (American Physical Society); Journal of Transportation (American Society of Civil Engineers); IEEE Transactions (IEEE); and Computer, Computational Science and Engineering, and Software (IEEE Computer Society). We are learning a great deal about what it takes to build and make available a heterogeneous electronic library collection. Our publishing partners have widely differing degrees of experience with publishing in digital formats and none is currently completely prepared for SGML production and networked access for their material. Thus, document processing involves intensive collaboration, training, and negotiation. Each prefers a slightly different document type definition (DTD). Document indexing is a difficult and extremely labor-intensive process, and document display via SGML viewers is still in its infancy.

In a workshop held with our publishing partners, we learned that, in addition to developing expertise with the technological aspects of digital journal production, all our publishers are naturally concerned with such issues as the accuracy and appearance of the resultant digital product, developing models and mechanisms for commercial transactions related to journal use, maintaining the security of the testbed so that only authorized use occurs, learning about the nature of their constituents' use of electronic journals, and planning for operations after the DLI project is over.

We are also beginning to learn what is really involved in the creation of a production digital library facility within an academic library. A number of scenarios regarding the future role of the library in creating and maintaining digital journal repositories exist. It is conceivable that library personnel will pursue their traditional management, technical, and public service functions in this new realm by selecting material, negotiating with publishers, indexing documents, developing search and display mechanisms, designing the library system interface, and supporting people in their use of the collection. As our testbed grows to include thousands of documents and tens of thousands of users from across the University of Illinois and other Big Ten universities, Grainger library staff will be among the first to gain "real world" experience with the creation and support of a large-scale, distributed digital library.

Our long term goal for developing the Interspace depends upon a system model based on federated, distributed repositories and the transition to community-based interactions. We presume that, eventually, publishers, or even individual authors, will maintain repositories for their own collections. The "library" (regardless of its eventual technological and organizational shape) will provide the integration functions of searching and displaying across the multiple heterogeneous sources. There is the potential for journals to both expand and fracture to produce malleable objects suited to the needs of individuals or of a community of users with shared needs, work, or goals. These objects could be created from retrieving and manipulating units that are smaller than (e.g., individual tables, equations, or sentences) or external to (e.g., datasets underlying a table, complete texts for cited references, readers' annotations) the unit represented by the typical printed journal article. The implications of such major infrastructural changes for the academic community deserve special attention.

In pursuing our vision of the Interspace, we are currently working on a number of software development projects, with a number of other partners. Our software development work explores and tests a number of different approaches to building digital library infrastructure; we don't know yet how successful our efforts will be. The testbed team is currently working to integrate OpenText search software with Panorama, the SGML viewer from SoftQuad, and to improve the display of SGML documents. Software is also being developed to translate the various publisher DTDs into a standard canonical form, to fully index SGML documents and put the data into an SQL database structure, and to develop user interface modules for fulltext retrieval.
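
As a rough illustration of the DTD translation step described above, the sketch below renames publisher-specific element names to a single canonical set. It is not the project's translator: the element names and the mapping table are hypothetical, and it assumes well-formed, XML-style markup handled with Python's standard library rather than a full SGML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from publisher-specific element names to canonical names.
CANONICAL_NAMES = {
    "arttitle": "title",
    "ti": "title",
    "au": "author",
    "authname": "author",
    "abs": "abstract",
}


def to_canonical(markup: str) -> str:
    """Rename known publisher elements to their canonical equivalents."""
    root = ET.fromstring(markup)
    for element in root.iter():
        # Unknown elements keep their original name.
        element.tag = CANONICAL_NAMES.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")


# Two invented publishers marking up the same article differently.
publisher_a = "<article><ti>Flow Control</ti><au>J. Doe</au></article>"
publisher_b = (
    "<article><arttitle>Flow Control</arttitle>"
    "<authname>J. Doe</authname></article>"
)

print(to_canonical(publisher_a))
print(to_canonical(publisher_b))
```

Once both inputs share one vocabulary, a single indexing and display pipeline can serve every publisher, which is the practical motivation for a canonical form.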

In addition to the testbed prototype being constructed in the Grainger Library, DLI team members associated with Mosaic development at the National Center for Supercomputing Applications are working on software infrastructure to support digital library use on the Internet. On the client end, they have built the CCI (Common Client Interface) package, which will enable users to follow links to other documents from an article displayed with an SGML viewer. On the server end, they are developing stateful gateways (CGI -- Common Gateway Interface) for SQL. These gateways will not only link across the network to the search engines, but also record the history of the search session so that the results of previous searches can be refined. We will subsequently be developing a Z39.50 gateway as well.
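
The idea of a gateway that remembers the search session can be sketched as follows. This is a minimal illustration under stated assumptions, not the NCSA gateway: the `SearchSession` class, its method names, and the in-memory document dictionary standing in for a repository are all hypothetical, and matching is plain substring search.

```python
from dataclasses import dataclass, field


@dataclass
class SearchSession:
    """Keeps the history of one user's searches so results can be refined."""

    documents: dict[str, str]  # doc id -> text; a stand-in for a repository
    history: list[tuple[str, set[str]]] = field(default_factory=list)

    def _match(self, term: str, candidates) -> set[str]:
        needle = term.lower()
        return {d for d in candidates if needle in self.documents[d].lower()}

    def search(self, term: str) -> set[str]:
        """Run a fresh search over the whole collection and record it."""
        hits = self._match(term, self.documents)
        self.history.append((term, hits))
        return hits

    def refine(self, term: str) -> set[str]:
        """Narrow the most recent result set instead of starting over."""
        if not self.history:
            return self.search(term)
        _, previous_hits = self.history[-1]
        hits = self._match(term, previous_hits)
        self.history.append((term, hits))
        return hits


session = SearchSession(
    {
        "d1": "Turbulent flow in pipes",
        "d2": "Laminar flow models",
        "d3": "Signal processing",
    }
)
first = session.search("flow")
second = session.refine("laminar")
```

Because the session holds every earlier result set, a refinement only has to examine the previous hits, and the recorded history doubles as a trace of how the user's query evolved.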

The stateful gateway is our immediate approach to the problem of providing translation to and from multiple heterogeneous distributed repositories. As part of our DLI project, computer scientists affiliated with NCSA are also designing new server architectures to handle the demands for next generation Web services. One important aspect of our distributed repository model is the use of unique invariant document names (URNs rather than URLs). Another is incorporating "metadata"--a standard set of descriptive and subject-oriented elements--to aid in retrieval. (For a description of the use of metadata that we, along with others, are exploring, see the report Metadata Workshop: The Essential Elements of Network Object Description. URL: http://www.oclc.org:5046/conferences/metadata/metadata.html.) In addition, we plan to collaborate with the Corporation for National Research Initiatives by serving as a testbed for their copyright registration system, and with researchers from Carnegie Mellon University by serving as a testbed for NetBill, software which enables commercial transactions to be conducted over the Internet.

To make distributed repositories work in practice, it is necessary to move toward semantic solutions for information retrieval. Since text search is still word matching at base, scalable approaches to deeper semantics involve interactive programs for suggesting alternative search terms. To help users match their search concepts to controlled vocabulary terms, traditional A&I services use professional indexers to generate a domain thesaurus and assign several thesaurus terms to each document in a collection. We are examining the feasibility of having the user interact directly with a thesaurus, both the manually constructed thesaurus and some new, sophisticated automatic classification schemes. The INSPEC thesaurus serves as the basis for a graphically oriented, browsable online thesaurus developed as part of the DLI effort (Johnson & Cochrane, 1995). In addition, a prototype version of an automatic index system--the creation of a matrix or "concept space" based on term co-occurrence in journal articles--has been developed by Hsinchun Chen. This concept space was generated from 400,000 abstracts from the last 3 years of INSPEC coverage. It can be used to suggest alternative terms for a given term; note that the suggestions are words that occur together with it, not necessarily synonyms. Thus, the automatic classification is based on "context," whereas the manual classification is based on "meaning." An example of Chen's concept space approach--generated from Computer Science Technical Reports--is available on the Web (URL: http://ai.bpa.arizona.edu/cgi-bin/csquest). We will continue this research by expanding semantic mapping across engineering domains and developing an architecture that supports analysis of net objects beyond that currently envisioned for repositories on the Internet.
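
To make the notion of a co-occurrence "concept space" concrete, here is a toy sketch. It is a simplification and not Chen's algorithm: it assumes raw document-level co-occurrence counts with no weighting, and the three sample abstracts and both function names are invented for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_concept_space(documents):
    """Count, for every term, how many documents it shares with each other term."""
    cooccurrence = defaultdict(Counter)
    for text in documents:
        terms = set(text.lower().split())
        for a, b in permutations(terms, 2):
            cooccurrence[a][b] += 1
    return cooccurrence


def suggest_terms(concept_space, term, limit=3):
    """Return the terms that co-occur most often with `term`."""
    neighbours = concept_space.get(term, {})
    # Highest count first; ties are broken alphabetically for stable output.
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:limit]]


abstracts = [
    "neural network training",
    "neural network pruning",
    "network routing protocols",
]
space = build_concept_space(abstracts)
print(suggest_terms(space, "neural"))
```

The suggestions for "neural" are words that appear alongside it, such as "network", rather than synonyms, which is the "context" versus "meaning" distinction drawn above.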

User-Based Research and the DLI Project: Overview

A fundamental component of our DLI project is sociological research and evaluation related to digital library use. The mandate of our testbed evaluation team is to:

  • Provide ongoing user feedback for developing retrieval mechanisms, charging schemes, and other DL system features and functions

  • Document and analyze extent and nature of testbed use, satisfaction, and impacts

  • Identify reasons for use and non-use of the testbed, to gain a more complete understanding of testbed successes and failures

  • Contribute to theoretical understanding of the changing information infrastructure and how it is transforming engineering and library work, communication, and learning practices

  • Develop and assess new methods for conducting user-based digital library research, i.e., for capturing user data in the distributed repository environment.

To pursue these goals, we have developed an integrated research program that combines broad study of use with deep study of social phenomena. Over the course of the DLI project, we will conduct ongoing observations of engineering work and learning activities and how they intersect with the use of distributed, digital information. Individual and group interviews will be conducted with a range of potential and actual testbed users from the engineering community. We will conduct usability tests of various components and versions of the DL prototype and experiment with economic models and charging mechanisms. Extensive data on use will be gathered through large-scale user surveys and system instrumentation (i.e., the creation of testbed transaction logs).

Conceptual Challenges in Studying Digital Library Use

Our two major conceptual challenges in this project are both theoretical and methodological: triangulation and scalability. By triangulation, we mean taking multiple views of the complex phenomenon of building infrastructure, and using multiple methods to do so. We also mean that the project itself involves a kind of triangulation of different concerns in order to make a workable system, including those of designers, librarians, users, and publishers. By scalability, we mean both the scaling-up process of the project itself and the corresponding methodological challenge to us in understanding use. We originally coined the (somewhat unpronounceable) term "nethnography" (net + ethnography) to describe this challenge: part of what we would like to do is to preserve the rich details of ethnographic-type fieldwork and observations by using the electronic medium to "follow the actors." We would thus like to understand usage (both actual and potential) from a number of viewpoints, and in an increasingly wide scale.

Triangulation is not simply a matter of adding views or data points together. Data deriving from different methods, or design decisions taken from different viewpoints, or usage by those with different needs, always means a negotiated process. Even where there is substantial agreement between parties, triangulation requires taking account of different timelines, constraints, and audiences faced by different groups, or the divergent ways data can be aggregated (Star, 1986). Rather than try to resolve these difficult questions by fiat or formula, we have begun to conceptualize our own work in terms of different viewpoints, each with its own epistemological focus. These are the viewpoints of users' work, our analysis framework, and our study methods, with a fourth "meta view" as the process of integrating and triangulating these views. Each of these views also scales, from focusing on digital library use at the individual level (encompassing more cognitive tasks), to wider net- or Web-wide phenomena.

Individual focus. At the individual work level, tasks such as actual browsing of information and information retrieval are important. Methodologically, the major tools here are observations, usability testing and recording individual use sessions through continuous screen captures of prototype systems and transaction logs. Analytically here, the important questions concern those cognitive/conceptual changes for users in moving to digital form, including such anticipated changes as different metaphors and images (e.g., "navigating" in cyberspace vs. "wandering" the stacks). Identifying specific problems faced by individual users will contribute to improving the design of the digital library itself.

Workflow focus. Moving from the individual in front of the screen, the next space we encounter is that of the flow of work, where the digital library is embedded within a workspace. The terminal on the desktop becomes the virtual library, and the questions change correspondingly. Capturing this sort of data means using surveys, interviews, and on-site observations of individuals and teams, as well as in-library observations for some tasks. Analytically, the questions move here to understanding the links between an individual's workspace and workflow and the features of the system. Do the "collections" fit the needs of the tasks at hand? Does the system itself fit easily into the needs and practices of the individual or the team? One example of this sort of question concerns how an individual's "ethno-classification" and personal library and filing system is affected by the digital library. We all have ways of organizing our own information, from the arrangement of items on our desktop and in our filing cabinet to the arrangement of bibliography programs and files on our desktop computer. Will there be important matches and/or mismatches between these and the way the digital library is organized? Will our thesaurus enhance the users' own classification scheme and natural wording of queries? How will people form personal collections from digital library material for subsequent manipulation and sharing within the context of their work?

Institutional and occupational focus. In addition to workspaces and work groups, our project has interesting institutional implications, including occupations and the institutions of the extant physical libraries affected. Within the resources we have, collecting information at this level means making links with the (growing) secondary literature on the sociology of engineering and computer science, as well as the nature of library work and publishing, and synthesizing those findings and testing them back against the interviews, surveys and focus groups we conduct with representatives from these different communities. For example, we recently found Louis Bucciarelli's Designing Engineers (1994) very valuable for its concept of "object world," describing how all engineers simultaneously juggle constraining physical and design demands and optimization challenges with organizational resources and politics. We have begun to adapt his model to our own system development process, examining the different object worlds of various groups of users, and of designers. Analytically, the challenge for this level of focus is to understand the changing distribution of skills posed by the virtual library environment and to understand the nature of organizational transformations. What will happen to the ways librarians, engineers, and publishers currently organize their work and operations at the institutional and professional levels? How is the journal as a mode of scientific and technical communication transforming knowledge communities, and vice versa, given the changing information infrastructure? Another important analytic question here is: what of the sociology of standards? At the institutional and occupational level, many standards are embedded within the digital library projects, and these interact with scores more, from protocols for information transmission to the standards used by engineers and computer scientists themselves.

Web-wide and virtual community focus. At the widest level of scale with which we work, the digital library interfaces with the World Wide Web, and involves a large number of people (whose ties to each other and to the information in the system may be looser than those influenced by shared proximity, tasks, goals, or institutions) in complex cognitive tasks such as information retrieval. How will we understand this? We have begun, methodologically, by developing mechanisms for collecting transaction logs of actual usage through an instrumented version of Mosaic; here the methodological challenge is transforming raw usage data into meaningful patterns. We are also faced with the choice of how, and how far, to follow the users over the net, given confidentiality constraints, which are considerable. For this aspect, we have proposed to establish bulletin boards and dialogues with remote users and, again, to rely on triangulation of data from the other foci to flesh out the picture of Web usage. Scaling up the collection of logs without being inundated with data is potentially a major difficulty. In addition to its use in studying the nature and extent of individual Web behavior, Web-wide transaction data could also be used to understand the document landscape of the NII. By analyzing the matrix of links between documents and users in a kind of cyberspace mapping, we could, for example, develop measures of association between documents, based on their use, that could be employed as a means of suggesting relevant documents to users. This approach to information retrieval provides, from the user's point of view, an alternative to indexing based on subject and descriptive elements associated with a document; in addition, it provides an alternative to profiling users in the attempt to identify documents that meet their needs. Analytically, here, there are many questions in the realm of what we call "sociology of infrastructure," that is, what is the nature of large-scale changes in work and cognition afforded as the entire information infrastructure begins to change?
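
One simple way to derive a use-based measure of association between documents is the Jaccard overlap of their reader sets, sketched below. The choice of Jaccard similarity, the function names, and the sample usage data are assumptions made for illustration; the text above does not commit to any particular measure.

```python
def jaccard(a: set, b: set) -> float:
    """Share of readers two documents have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_documents(usage, doc_id, limit=3):
    """Rank other documents by how strongly their readership overlaps.

    `usage` maps a document id to the set of anonymized user ids that viewed it.
    """
    readers = usage[doc_id]
    scored = [
        (jaccard(readers, other_readers), other)
        for other, other_readers in usage.items()
        if other != doc_id
    ]
    # Drop documents with no shared readers, then sort by descending score.
    scored = [(score, other) for score, other in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:limit]]


usage = {
    "A": {"u1", "u2", "u3"},
    "B": {"u1", "u2"},
    "C": {"u3", "u4"},
    "D": {"u5"},
}
print(related_documents(usage, "A"))
```

Note that nothing here inspects document content or builds a profile of any user; the suggestion comes only from which documents tend to be read by the same people.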

Meta-foci. The above foci can be thought of as concentric, layered rings of problems, each interacting with the others, but with somewhat distinct mandates and methods challenges. Stepping back from these layers, there are also a number of meta problems to be considered. First, the question not only of what is involved in each of the foci, but of how they interact and interweave, is a formal problem in social science terms. Second, the characterization of the whole shift has reflections for each focus, from individual to global electronic networks. What sorts of language and images do we ourselves use to capture this complex imbrication? We have found the economic notion of "transition regime" to be useful here: the idea that large-scale transformations, as well as large systems development efforts, which go on for years, have their own regime of values and routines, including the constant adjustments, maintenance, and upgrading involved in moving individuals and workspaces over to new information infrastructures (Star and Ruhleder, in press). Finally, the metaphor of "organizational texture" seems to be useful for a first cut at describing the different foci--people often explain their experience with digital media, including libraries, in textural terms such as being "close knit," "knotted up," or "dangling here." We are investigating a number of such metaphors as a way of talking about the vast, complicated changes engendered by both digital libraries and the large-scale network changes in which they are embedded (Cooper and Fox, 1990).

Focus Groups

In order to understand the needs of potential users of the DLI testbed, we conducted three focus group interviews with members of the academic engineering community: one with faculty members, one with graduate students, and one with undergraduates. Participants were randomly selected from rosters of computer science, engineering, and physics departments. Between four and seven people participated in each focus group; one member of the DLI evaluation team served as moderator, introducing general topics of conversation, while other team members took notes to accompany the audio recordings of the sessions. Each participant also completed a brief written survey on computer, network, and library use.

Discussion in the focus groups centered on the purpose and nature of participants' use of journals, problems associated with journal use, other information sources important in academic engineering work, and ideal digital library features. In comparing the comments made about journal use and digital libraries by faculty, graduates, and undergraduates, we began to see the individual as a member of a knowledge community whose primary elements include both documents and other members of the community. One's sense of the knowledge community and one's own place in that community develops over time, as experience with the knowledge domain, knowledge structures, and other community members expands and coalesces around personal interests and needs.

Undergraduates had little sense of themselves as members of, contributors to, or discerning consumers within their knowledge community. Journal use tended to be assigned, as opposed to self-initiated, and their needs were reflected in a "one-stop shopping" approach to engineering journals. Their goal was to find several articles on a particular topic as quickly as possible, and they had little knowledge of particular authors, journals, perspectives, or terms. At the other extreme, faculty members had a keen sense of the organization of their knowledge community and their own place in it. They seemed to exemplify a "cruise and swoop" approach to the journal literature, needing to maintain an overview of particular topics and research developments and quickly retrieve specific information of interest based on their knowledge of particular authors, journals, developments in their own and others' work, and the ability to predict what would be found and how needed information would be identified and acquired. Graduate students occupied the middle ground and seemed to be "working the literature" intensely in order to formulate a well-developed sense of their knowledge community and their own role within it. Journals were used to thoroughly review a particular topic, learn about publication norms, develop a personal perspective or interest, support research, and find specific facts, theories, and methods. Graduate students had a preference for certain authors and journals, and spent a great deal of time trying to understand library systems and organizing, annotating, and sharing their personal collections.

This analytic perspective is helping us understand which digital library features will be important to broad classes of users and why. For example, while undergraduates expressed much less interest in the ability to search for journal articles by particular authors, graduate students hoped that the digital library would improve their ability to perform comprehensive author searches, and faculty members wanted contact information for each author included so that they could get in touch with them to discuss their work in greater depth. But we also feel that we can identify digital library requirements that cut across groups. The focus groups made it clear, for example, that virtually all journal users would like to be able to access specific article components independently and follow cited references forward and back; all have difficulty identifying documents through topic words, whether controlled or uncontrolled vocabularies are used. Ideal digital library features most often mentioned by focus group participants include:

  • Flexible and user-controlled search and display mechanisms that would support both browsing and direct retrieval, with substantial improvements to help users "find the right words" to use in topic searches

  • Powerful searching at, and movement among, all levels (from entire collections to components within individual documents, from document records and representations to the document itself)

  • Convenient, comfortable, easy, and inviting physical access

  • Comprehensive and integrated access to information in various formats, places, sources, and disciplines

  • Orientation, explanation, and examples to facilitate the user's understanding of what the digital library contains and how it can be used

  • "Live" documents that would allow users to access and manipulate information contained in figures, references, etc.

The focus groups provided important insights into journal and information system use within the context of work within the academic engineering community. Results have contributed to developing our theoretical framework for understanding digital library use; suggesting which features are perceived as most important to potential users and why; and providing an indication of the types and levels of online and offline support that users will need.

DLI Software Instrumentation and User Registration

We are currently developing two primary mechanisms for automatically collecting DLI usage/user data. First, many of the DLI software components are instrumented, or programmed, to collect detailed transaction logs of each user session. Second, an automated DLI user registration process collects demographic information about each DLI user and provides a confidential mechanism to link the data in the transaction logs to individual DLI users. These data are being collected to serve two primary purposes. The first purpose is to provide various sorts of management data. This includes summary data for project management and the outside world concerning the number of users and their aggregate behavior, as well as system performance data to ensure that the DLI is operating within acceptable tolerances. The second purpose is the collection of detailed data on individual DLI users and their behavior.

Management data are provided both by the data generated by the instrumented software and by the data collected during the user registration process. For example, the registration data can be summarized to provide profiles of the DLI user community. The transaction logs can be used to provide summaries of user behavior in the aggregate, answering questions such as how many searches are performed during a given unit of time, what are the most frequently used functions, etc. This type of "how much, how often" information is also useful to DLI systems administrators in terms of monitoring performance of the DLI and planning for changes in demand and capacity.
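
The "how much, how often" summaries mentioned above could be computed along the following lines. The log line format (ISO timestamp, user id, action name), the sample records, and the function names are hypothetical; the actual DLI transaction logs may be structured quite differently.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log lines: "<ISO timestamp> <user id> <action>"
LOG_LINES = [
    "1995-10-02T09:15:00 user17 search",
    "1995-10-02T09:16:30 user17 display",
    "1995-10-02T14:02:10 user04 search",
    "1995-10-03T11:45:00 user17 search",
]


def parse(line):
    """Split one log line into (timestamp, user, action)."""
    timestamp, user, action = line.split()
    return datetime.fromisoformat(timestamp), user, action


def searches_per_day(lines):
    """Number of search requests on each calendar day."""
    return Counter(
        ts.date().isoformat()
        for ts, _, action in map(parse, lines)
        if action == "search"
    )


def most_used_functions(lines, limit=2):
    """The most frequently logged actions, most frequent first."""
    return Counter(action for _, _, action in map(parse, lines)).most_common(limit)


print(dict(searches_per_day(LOG_LINES)))
print(most_used_functions(LOG_LINES, limit=1))
```

Both summaries are aggregate counts, so they can be reported to project management or system administrators without exposing any individual's session.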

Information about the behavior of individual users is provided by both the instrumented software and the user registration data. Prior to being allowed access to the DLI, a prospective user must successfully complete a simple online registration process which collects contact information as well as information about the user's professional background and the extent of the user's familiarity with common computing and communications systems. Information which could be used to identify the users is kept confidential and in a separate location. The DLI is an aggregation of several independently developed software systems, including database management systems, database search engines, thesauri, graphical World Wide Web browsers, SGML tools, and SGML and other data format viewers. Instrumentation is being introduced into several of these systems to provide details (in the form of textual transaction logs) of user interaction with each system. Essentially, each of the instrumented systems reports details on menu selections, search requests, and specified mouse movements in order to provide a fairly clear indication of what the user was doing at the system's interface. By logging the user's interactions with each of these systems, we can focus on one of the systems in isolation or all of the systems as components of a digital library. A higher level methodological question we hope to address is the extent to which these types of transaction logs can be used effectively to understand the behavior and needs of individual users, instead of using methods such as interviews and field observations.
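
One possible way to link log records to registration data while keeping identifying information separate is to log only a keyed pseudonym. The sketch below is an assumption about how such a mechanism might look, not a description of the DLI registration system; the key, the email address, and the field names are placeholders.

```python
import hashlib
import hmac


def pseudonym(user_email: str, secret_key: bytes) -> str:
    """Derive a stable identifier that cannot be reversed without the key."""
    digest = hmac.new(secret_key, user_email.lower().encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# Placeholder key; in practice it would be stored with the registration records.
secret_key = b"kept-with-the-registration-records"

# Registration data, held in a separate location (hypothetical fields).
registration = {
    "a.user@example.edu": {"status": "graduate", "department": "physics"},
}

# Analysis table keyed by pseudonym; it contains no contact information.
profiles = {
    pseudonym(email, secret_key): info for email, info in registration.items()
}

# A transaction log record carries only the pseudonym.
log_record = {"user": pseudonym("A.User@example.edu", secret_key), "action": "search"}
print(profiles[log_record["user"]])
```

Analysts can then join logs to demographic profiles through the pseudonym, while re-identifying a person requires access to both the key and the separately stored registration records.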

We plan to ask the following questions of the registration and log data:

  • What are the demographics of the DLI user community?

  • How many people have registered as DLI users? How has the number of users changed over time?

  • What is the level of usage per registered user? How does the passage of time affect frequency of usage?

  • What are the levels of system usage over time? What are the performance and capacity characteristics of this system?

  • What portions of the DLI need modification in order to address system or interface concerns revealed by the transaction logs? What DLI features and content are used most frequently or least frequently? What influences the frequency of use of individual system features?

  • What are the patterns of user behavior? How do individuals use the system, both per session and longitudinally? How do the usage patterns of various demographic groups match or differ from other demographic groups?

We will apply a variety of methods when analyzing these data, including statistical analysis (frequencies, cross-tabulation, etc.), transaction log analysis (Peters, 1993), and examination of logs in combination with interview and observation data.
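To make the statistical side of this plan concrete, the sketch below computes simple frequencies and a cross-tabulation over a handful of invented records that join a registration attribute to a logged feature. The field names (`status`, `feature`) and all values are hypothetical and are not drawn from DLI data; the sketch only shows the shape of the analysis.

```python
from collections import Counter

# Hypothetical records joining registration data to log events.
# Field names and values are illustrative assumptions only.
records = [
    {"status": "faculty", "feature": "search"},
    {"status": "faculty", "feature": "display"},
    {"status": "graduate", "feature": "search"},
    {"status": "graduate", "feature": "search"},
    {"status": "undergraduate", "feature": "browse"},
    {"status": "undergraduate", "feature": "search"},
]


def frequencies(rows, field):
    """Count how often each value of one field occurs."""
    return Counter(row[field] for row in rows)


def crosstab(rows, row_field, col_field):
    """Count co-occurrences of two fields: {row value: Counter of column values}."""
    table = {}
    for row in rows:
        cells = table.setdefault(row[row_field], Counter())
        cells[row[col_field]] += 1
    return table


feature_counts = frequencies(records, "feature")
status_by_feature = crosstab(records, "status", "feature")

# Print the cross-tabulation as a small text table.
columns = sorted(feature_counts)
print("status".ljust(15) + "".join(c.rjust(9) for c in columns))
for status in sorted(status_by_feature):
    cells = status_by_feature[status]
    print(status.ljust(15) + "".join(str(cells[c]).rjust(9) for c in columns))
```

A table of this kind addresses questions such as which features are used most or least frequently and how usage differs across demographic groups; interpreting the counts still depends on the interview and observation data.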

Interviews and Observations

To better understand the needs, preferences, and practices of potential DLI users, we have conducted individual interviews with, and observations of, high school students, undergraduate and graduate students, and engineering faculty members. We wish to give the testbed designers feedback concerning users' search techniques and problems with existing systems, as well as to study the engineering community more generally, and we are using multiple sources to collect these data. Because this work calls for deriving concepts from varied data rather than testing fixed hypotheses, we are using the grounded theory approach introduced by Glaser and Strauss (1967) to analyze these data. Our approach involves transcribing the interviews and observations, coding the most frequent occurrences of actions and perceptions within the data, and asking under what conditions these actions take place. The resulting codes are then abstracted into memos, which are compared with other sources of data. This is an iterative process directed at inductively generating concepts and theories from the data collected.

The short-term goal of our work is not only to provide feedback to the testbed team on issues that they specifically ask about, but also to watch for "the unexpected." For those observations and interviews most directly related to user interactions with library systems, summaries of coded data are given to the testbed engineers immediately so that they benefit from direct user input as they design the digital library testbed. As more observations and interviews are done, the coding scheme is expanded and refined and can be applied to other data sources as well, and more integrated and analytic feedback can be presented to system designers. Further, the integration of all of our data in this manner serves our long-term goal of generating theories about information infrastructure, knowledge gathering and sensemaking, and communication in the engineering community.

In the fall of 1994, ten interviews were conducted with individual engineering and physics professors. They were asked about their information finding and computer usage habits. It was found that each person has developed personal information systems which vary greatly: some work primarily within institutional structures and employ formal searching strategies, such as going to the library regularly; some are just the opposite, primarily using browsers to find relevant papers on the Web; others combine both approaches. An important aspect of information seeking is the person's reliance on others--their colleagues, students, and friends--to know what is "out there" and to keep informed. As one professor pointed out, "a lot of it [information finding] is word of mouth, in a way. It sort of originates with somebody finding a written document, but if something is interesting then you are more likely to hear about it from someone saying 'oh, hey, did you see this?' than actually searching around and going to the library." Additionally, it was found that individual engineers and physicists have differing levels of computer literacy, which in turn affects where they go to find information. This has larger implications for developing the digital library; clearly, it needs to be easily integrated into existing information finding systems as well as usable in a variety of situations by a wide range of people.

Observations were conducted at the Grainger Engineering Library, both at the reference desk and at various library computer stations. At the reference desk, we observed what patrons were asking and how their questions were answered by the reference librarians. Most patrons came to the reference librarian only when they had exhausted their own skills. Most of their questions stemmed from the fact that the patrons had incomplete or inaccurate information to search with, and they did not have a complete understanding of the systems and databases that they were using. Compared to library users, librarians used many different search strategies, tools, and information sources: searching in fields other than author or title (such as ISSN or ISBN numbers), using other databases, reference books, and Library of Congress headings. It seemed that the difference in searching ability between the reference librarians and the patrons was based not only on formal knowledge but also on the tacit know-how that the reference librarians acquired through experience. They often recognized documents by description only and knew an extra piece of information that allowed them to find the item. In addition, the reference librarians had also developed work-arounds to common problems associated with this particular library's systems. For example, a binder they called "the cheat sheet" contained an alphabetical listing of the journals and issues that the library owned, a valuable tool not generally accessible to library patrons.

We have also begun observing how patrons use existing computer systems related to the retrieval of fulltext material: Mosaic, Engineering Index on CD-ROM, the expanded OPAC, and IEEE's fulltext journal system. Our testbed designers were interested in learning more about the online information behavior of people who would soon be trying out our prototype system. Preliminary analysis reinforces the findings from the reference desk observations: patrons do not usually have a deep understanding of either the content or nature of the systems that they are using, and they employ a more random searching strategy with general information (i.e., subject searches with uncontrolled vocabulary). When patrons did not know what a database contained, they would simply enter their subject search terms and see what was returned, rather than trying to find a source of information to tell them if the contents of the database were relevant to their search subject.

Another set of observations was conducted at a local middle and high school. We watched students use computers generally, as well as in the library setting. These observations were followed by a small number of interviews. We found, first, that students are quite sophisticated in their information finding abilities; for example, on the Web, they could distinguish among the search engines available to them by the results retrieved. In the library, many of the people we talked to could and did search the entire University library system as well as the state-wide library system for items that they needed. Second, students seem to fall into two groups: those who are very computer literate but are unfamiliar with more traditional information finding systems in the library, and those who know the information systems in the library but are not as computer literate. These groups had little overlap.

Students also seem prone to exploring the systems, searching randomly and "creatively." For example, when a student was asked to show us a Web site that she had seen, she tried searching for it with the usual search engine and did not find what she wanted. She had originally learned of the Web site through a newsgroup, so she tried accessing that newsgroup again to see if the posting that gave her the URL was still there. It was not, so she instead went back to her email account to find a message in which she had given a friend the URL of a FAQ page on a related topic, and that page pointed her to the page she originally wanted. In this rather roundabout fashion, she did get the information she was looking for. This particular search strategy also illustrates the manner in which interpersonal and formal pointers are used for information retrieval in the digital realm.

Implications for Academic Institutions and their Constituencies

Our work during the first year of the DLI project leads us to several conclusions related to academic institutions and the communities they serve. We feel that the transition regime in moving to digital, distributed infrastructure will affect the entire texture of existing organizations. To understand how the transition will affect them and their clients, academic libraries must be aware of transformations at the individual, workflow, institutional and occupational, and Web-wide community levels.

Academic libraries should begin considering their role not just as gateways to information, but as developers and implementers of new systems. We should recognize that "journals" as we know them are a socially constructed phenomenon and that their functionality has been constrained by the technologies employed in their production and use. Another important consideration for academic institutions is that, in the new infrastructural regime, the concept of user access expands to encompass contributors to, as well as users of, digital libraries. Academic libraries should also consider their approach to the current separation between Web and library-based systems for identifying and manipulating needed information. The shifting boundaries between personal, professional, and institutional information infrastructures are linked to different tools and settings in a manner that may negatively affect individual knowledge creation and library operations.

It is important for academic libraries to consider the full range of organizational impacts from their chosen role in the development of digital libraries. What part of the information universe are they concerned with and what part of the information creation, search, and use cycle will they directly support? What new blend of human and computer processing--in creating digital repositories, selecting material, helping people match their concepts to system terminology, assessing system performance, and helping people to use the system--do they need to prepare for? These decisions will affect the nature of library work dramatically.

DLI projects provide a testbed for academic institutions in another sense: what are the implications for universities of taking on these kinds of large-scale R&D efforts? And how will they manage their work? How can the resources and talents of different units, from libraries to computer services to individual faculty and student researchers, be integrated most productively? In pursuing our sociological and evaluation research, we have found it very difficult to study the use of different systems at different levels in order to meet the different needs of the various participants in the system design process. Collecting and presenting data within the environment constructed by the differing views of designers, librarians, publishers, social science researchers, and users presents a considerable challenge. While our DLI work has unveiled some of the ways that academic institutions and their constituencies are forging new identities and alliances around digital libraries, it is clear that the nature of the large-scale infrastructural changes that we are currently immersed in will emerge slowly.

References

Bucciarelli, L. (1994). Designing engineers. Cambridge, MA: MIT Press.

Cooper, R., & Fox, S. (1990). The 'texture' of organizing. Journal of Management Studies, 27, 575-582.

Peters, T. A. (1993). The history and development of transaction log analysis. Library Hi Tech, 11(2), 41-66.

Star, S. L. (1986). Triangulating clinical and basic research: British localizationists, 1870-1906. History of Science, 24, 29-48.

Star, S., & Ruhleder, K. (In press). Steps toward an ecology of infrastructure. Information Systems Research.

Glaser, B. G., & Strauss, A. L. (1967). The discovery of grounded theory: Strategies for qualitative research. New York: Aldine de Gruyter.