From Usability to Use: Measuring Success of Testbeds in the Real World
Laura J. Neumann (L-NEUMA1@UIUC.EDU)
Ann Peterson Bishop
15 June, 1998
to appear in the Proceedings of the UIUC DLI Spring Partners Workshop and 35th GSLIS Clinic. March 22-24, 1998.
In 1994, as part of their commitment to the National Information Infrastructure (NII), NSF, DARPA, and NASA funded six digital library projects at different universities under the Digital Library Initiative ("DLI" for short). Our Illinois DLI is a distributed, multidisciplinary project. Computer scientists, librarians, and social scientists are working together to develop an SGML database and protocols for federating repositories of data. The Testbed Team has constructed a prototype system that contains the full text of over 50 engineering, physics, and computer science journals. Some of the innovative aspects of our DLI project can be seen in DeLIver, its web-based search interface. Through SGML markup of the scientific articles and enhanced search features, users can search for and display either individual parts of an article (e.g., "MIT" in the author affiliation or "spectrum" in a figure caption) or its full text at their desktops via the web.
Our Social Science Team is charged with carrying out user studies and evaluation work on the project, a vague charter at best. Each team on the project brings different expertise, interests, and assumptions about what the DLI project is about and how it should work. We have often found ourselves at the crux of these differences in our multiple roles: providing user feedback, running usability tests, meeting with the reference librarians responsible for incorporating the system into the library, testing new methods of study, and taking broader theoretical perspectives. Over almost four years of moving between these different expectations, we have had to negotiate multiple and sometimes competing visions of what the project is about, who its target audience is, and how best to reach and serve that audience (Neumann and Star, 1996).
Just as our work means many things to many different people, so does our DLI testbed itself. DeLIver is a hybrid system: part research system, part system to demonstrate to project sponsors, and part production system that users rely on (Bishop, 1998a). Each of these facets implies different research strategies and foci for the Social Science Team. We have dealt with this by tacking between "usability" and "use" issues. Only in the last year, when our system began to attract a substantial body of users, did use and usability begin to merge.
History of Social Science Team
The DLI Social Science Team is composed of researchers with expertise ranging from qualitative research to computer programming to the collection of transaction logs. We perform both formative and summative evaluations aimed at improving system design and documenting system use. While gaining new knowledge about the use of our system, we are also trying new methods for studying the use of a digital library. Both lines of work inform our broader interests in the work habits of our potential and actual users, the use of scientific and engineering journals, and the use of digital libraries generally. To these ends, we have conducted needs assessments and have provided ongoing user feedback to system designers. We are documenting and analyzing the extent and nature of testbed use, user satisfaction, and impacts in the context of broader issues surrounding the changing information infrastructure. We have had to develop new methods for capturing user behavior that spans the on-line and off-line environments, and we have tried to assess these methods and discuss them with our scholarly community (Bishop, 1995; Bishop, 1996; Bishop and Star, 1996). In this paper, we present an overview of our findings on usability and use, as well as a summary of some of the lessons we have learned in the course of our research.
Our goal is to create an integrated research program that combines studies of testbed users, broad studies of use, and deep studies of the social phenomena connected to the use of our digital library. We have pursued this through a wide variety of qualitative and quantitative methods, such as observation of engineering work and learning activities, interviews and focus groups with a range of potential and actual system users, usability testing (particularly following Monk, Wright, Haber, and Davenport, 1993), and large-scale user surveys. In addition, we are employing a number of automated data-gathering techniques, such as user registration, on-line feedback, and system instrumentation (the creation of testbed transaction logs).
We have also been experimenting with methods such as on-line "exit polls": short surveys that pop up during a session after a set amount of time has elapsed. We adapted our instrument from one created by researchers at the Alexandria DLI project in Santa Barbara. These polls ask about the purpose of the session and whether users succeeded in accomplishing their goals, along with a few questions about their overall impressions of DeLIver. A second method that we have found fruitful is what we call "situated usability" interviews. This involves selecting people who fit particular criteria from the pool of registered system users and pulling their transaction logs. Using screen dumps of the different parts of the system and the transaction logs as conversation prompts, we asked the selected individuals about things they had done with the system, why they had done them, and any comments they had about usability issues.
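The selection step behind situated usability interviews can be sketched as a filter over registration records followed by a per-user pull of log entries. This is a minimal illustration only: the `Registrant` and `LogEntry` record layouts, their field names, and the "graduate students only" criterion are hypothetical, not DeLIver's actual schema.

```python
from dataclasses import dataclass


@dataclass
class Registrant:
    # Hypothetical registration record; field names are illustrative only.
    user_id: str
    status: str  # e.g. "graduate", "faculty", "undergraduate"
    field: str   # self-reported primary field


@dataclass
class LogEntry:
    # Hypothetical transaction-log entry, not DeLIver's real log format.
    user_id: str
    timestamp: float  # seconds since an arbitrary epoch
    action: str       # e.g. "search", "view_abstract"


def select_interviewees(registrants, criteria):
    """Return IDs of registrants who satisfy every criterion (each a predicate)."""
    return [r.user_id for r in registrants if all(test(r) for test in criteria)]


def pull_logs(log, user_ids):
    """Collect each selected user's log entries in time order, to use as interview prompts."""
    by_user = {uid: [] for uid in user_ids}
    for entry in sorted(log, key=lambda e: e.timestamp):
        if entry.user_id in by_user:
            by_user[entry.user_id].append(entry)
    return by_user


registrants = [
    Registrant("u1", "graduate", "civil engineering"),
    Registrant("u2", "undergraduate", "psychology"),
    Registrant("u3", "graduate", "computer science"),
]
log = [
    LogEntry("u3", 20.0, "view_abstract"),
    LogEntry("u1", 5.0, "search"),
    LogEntry("u3", 10.0, "search"),
    LogEntry("u2", 1.0, "search"),
]

# Hypothetical criterion: graduate students only.
chosen = select_interviewees(registrants, [lambda r: r.status == "graduate"])
prompts = pull_logs(log, chosen)
for uid, entries in prompts.items():
    print(uid, [e.action for e in entries])
```

Expressing the criteria as a list of predicates keeps the selection rule explicit, so the same pool can be re-sampled with different criteria (for example, by field or by number of sessions) without changing the code.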
We are bringing the results of each of these methods together, in order to triangulate the findings and provide a deeper understanding of the nature of digital library use and the social phenomena involved.
Usability to Use
Until this past year, our research efforts felt largely fragmented. On the one hand, we were doing usability tests to aid interface development and talking with users who were testing early versions of the system. On the other hand, we were doing mini-ethnographies examining information gathering in "real world" settings. The two sets of activities overlapped very little. However, when the "roll-out" of our web client, DeLIver, began in October 1997, pieces from the full range of our data collection initiatives began to fall into place to form a more coherent picture of use (see Bishop, 1998a for a more extensive discussion of what follows). Being forced to think about how we should gauge success and measure use let us bring together all of the work we had done up to that point.
Making that Transition: Roll-out Panic
To access DeLIver, prospective users must first enter their University of Illinois network identification number (NetID) in an on-line "NetID form." This gives the publishers of material in the testbed reasonable assurance that access is restricted to campus affiliates, in accordance with our original agreement. After entering their NetID, prospective users must complete a registration form that provides basic demographic data to help us learn more about who is using DeLIver.
Analysis of the web logs revealed that between November 1 and 14, 1997, 1276 (83%) of 1540 attempted accesses were abandoned at the NetID form. Of the 186 people who entered a NetID, 91 (49%) stopped at the registration form. These grim numbers led to a certain amount of panic on the project, so the Social Science and Testbed Development teams met to discuss what was going on. The debates centered on both why we should reduce barriers to use and how to do so. In answering the "why" question, the hybrid nature of our project became clear: in terms of creating a production system, some people were primarily concerned with making the system easier to access; in terms of demonstrating our system to key stakeholders, others were concerned with showing high usage statistics; and as a research project, we all wanted to draw users in so that we could learn more about digital libraries and their use.
Proposals for reducing barriers to use included simplifying the testbed functionality by removing the multiple search options, temporarily removing the login and registration procedures entirely, streamlining the login and registration forms, and stepping up publicity by pinpointing the hubs of use and reaching more people. The last two options were chosen as the least drastic and were given a try.
This situation also caused us to reflect on what the "bailouts" were telling us. We turned to our own and others' more general research on digital library and web use to reframe our thinking. A number of possible explanations emerged:
- People aren't UIUC affiliates. We had to think of the potential pool of people who could reach our login screen as anyone on the web. The vast majority of web users neither attend nor work for our university, and these people would turn away when asked for a "UIUC NetID."
- Lack of awareness among the target audience. Our own and others' research indicated that our system was most likely to be useful to graduate students in a particular set of fields (Star, Bowker, and Neumann, under consideration). Up to this point, publicity had been concentrated mainly in the libraries, and people in our target disciplines are not heavy library users (Entlich et al., 1997; Lancaster, 1995; Pinelli, 1991; Garvey and Griffith, 1980; and others).
- Registration form equals fee. Given the many systems on the web, and some services at our university, that charge for use, potential users had reason to suspect that they were being asked to register so that they could be billed for their system use.
- Lack of real need: just surfing. The DLI project has gained a certain amount of visibility among people interested in building digital libraries. These people, and others, may simply have been curious about our site and surfed by to take a look. When presented with the NetID form, they left.
- Confusion: "NetID? What do I do now?" The use of a NetID for course registration and other purposes is fairly new at our university, so many people may not have known what a NetID was.
- Registration form is too long. Our initial registration form was longer than an average screen. When confronted with what appeared to be an endless list of questions, potential users may have decided that the system was not worth the time needed to complete the required registration form.
The question of how we should measure use and define success also became salient at this point. What could we expect as a reasonable number of users? And what counted as a "real use" of the system? In terms of success, we looked at the actual size of our pool of "potential users." There are approximately 1,000 graduate students and over 500 faculty members in various engineering areas, physics, and computer science at this university, so the scale of use should be in this ballpark. Moreover, the limited collection in our testbed does not comprehensively cover any of these areas, and some types of engineering are not represented at all.
When considering the likely frequency of use, we had to ask which of these people would actually be interested in our journals and, of those, which would not prefer to use their own paper subscriptions. In addition, our research indicated that users' searching and browsing habits follow the cycles of research and of the semester, so when would they be likely to use the system? The periods we examined fell just before finals at the U of I and during the winter break, when students are studying for finals or writing up projects and faculty are grading and catching up on work. Heavy use of our system was not likely at these times (Ignacio, Neumann, and Sandusky, 1995). Finally, our expectations of use should account for the time needed to market the system effectively, allow new users to learn it, and allow people to develop into committed users.
A useful strategy for framing usage statistics will be to gather comparative data. We must consider how many people ever use the university's other on-line library systems, how often the paper journals are used, and how many registered users other digital library systems (including an earlier version of our own) have.
Ultimately, once we had clarified and better explained the NetID form, made it clear that the system was free, and shortened the registration form, the "bailout" statistics improved somewhat:
Dates                   Attempted accesses / stopped at NetID form   Entered NetID / stopped at registration form
Nov. 1-14, 1997         1540 / 1276 (83%)                            186 / 91 (49%)
Dec. 9-19, 1997         462 / 259 (56%)                              113 / 35 (31%)
Jan. 1-23, 1998         560 / 240 (43%)                              320 / 49 (27%)
Feb. 18-Apr. 9, 1998    1978 / 750 (38%)                             1228 / 162 (29%)
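The table's percentages express stop rates: the share of people who reached a form and went no further. A minimal sketch of that arithmetic, checked against the first-stage (NetID form) counts in the table; the function name `stop_rate` is ours, chosen for illustration.

```python
def stop_rate(reached: int, stopped: int) -> int:
    """Percentage of people who reached a form and went no further, rounded to a whole percent."""
    if reached <= 0:
        raise ValueError("reached must be positive")
    return round(100 * stopped / reached)


# First-stage counts from the table: (attempted accesses, stopped at the NetID form).
netid_stage = {
    "Nov. 1-14, 1997": (1540, 1276),
    "Dec. 9-19, 1997": (462, 259),
    "Jan. 1-23, 1998": (560, 240),
    "Feb. 18-Apr. 9, 1998": (1978, 750),
}

for period, (reached, stopped) in netid_stage.items():
    print(f"{period}: {stop_rate(reached, stopped)}% stopped at the NetID form")

# The same function applies to the registration stage, e.g. November: 91 of 186.
print(f"Nov. 1-14, 1997 registration stage: {stop_rate(186, 91)}%")
```

Computing each stage's rate against the number of people who actually reached that stage, rather than against all attempted accesses, shows where in the access sequence people drop out.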
The definition of "real use" of the system is still under consideration. Our project, and DeLIver in particular, has gained some visibility on campus. We learned that several large classes on interface design have used DeLIver as a case study for students to critique and discuss in class, and no doubt many undergraduates from non-science disciplines accessed the system for this reason. DeLIver is also used as an example system in information retrieval classes in the library school. Many librarians in the science-related libraries have logged into the system to become familiar with it in case a patron asks about it. An untold number of people also log onto the system just to "check it out" or "mess around" to see what it is for. Whether these people should be considered "real" users is an open question. They are not interacting with the system's content or interface to complete work tasks by accessing information in the scientific and technical journals. However, who are we to say what people "should" do with our system? On the other hand, if the goal is to discuss what researchers find valuable about DeLIver, or to see whether the ability to search parts of articles has proved useful in retrieving relevant items, then the logs of people who used the system "just to try it" or to critique the interface could be misleading.
Just as working with the nitty-gritty details of our registration and transaction log data has led us to think more deeply about broader theoretical questions, we see those data as embedded in larger questions. As we have noted, our definition of use depends on the identity and intentions of our users, as well as on how they perceive and interact with digital libraries and information more generally. Our findings begin to address these larger questions.
Understanding Digital Library Use
Several data collection and analysis efforts, such as transaction logging and large-scale survey analysis, are still underway, but we have gathered some preliminary information on the extent and nature of use, which will be addressed more comprehensively later. We have also investigated other areas related to social practices and digital libraries. These are sketched out below.
Who Are Our Users? What Are They Doing?
From our registration process we know that, as of 5 June 1998, DeLIver has 1174 registered patrons. "Patron" refers to someone who neither works for the project nor is employed at the library where our project is based. Approximately 75% of our patrons are men, 70% speak English as their primary language, and they fall mainly in the 23-29 age bracket. About 50% of DeLIver users are graduate students, and 30% are undergraduates. The system has a surprisingly wide audience: all kinds of engineering are represented, as are other science-related fields such as ecological modeling, materials science, and biology, along with fields such as communications, education, and psychology.
Despite this wide audience, we have found that our heaviest users closely reflect the content of our testbed, which holds a large collection of items from civil engineering, electrical and computer engineering, and computer science. People who identify their primary field as "engineering, general" also make up a large percentage of our user population and have the highest average number of sessions. Graduate students log in most often, but not by much. The relatively small number of faculty members who use the system seem to be intense users. A preliminary look at over 200 recently completed user surveys shows that respondents are generally satisfied with our system and consider its search power generally adequate.
Our transaction logs will provide the most information when we begin to delve into use of the system. We are still working with that data, and results are not yet available. However, interviews, usability tests, and logs from a previous version of the system indicate that people use multiple search terms but do not take advantage of the ability to search different parts of the articles. In interviews, on the other hand, people say that this type of search is a nice idea.
Interviews reveal that users often take advantage of the ability to see an article's abstract either instead of or before retrieving the full text. Although several people we talked to were looking for components of articles, they stated strongly that they are not interested in seeing "only" the figures or equations, because without the surrounding text these cannot be evaluated. The extent to which people use the available full text is complicated by the need for extra software. When the software is available and functioning properly, the full text is used; however, downloading the software onto a local machine and getting it to work is no simple task. Accessing the full text also requires either a Windows environment running Netscape, or a PDF version of the desired article together with a PDF viewer on the user's machine. Not all publishers have made PDF versions of their articles available; at last count, PDFs covered about two-thirds of the contents of the database. A few of the users we talked to did not understand the difference between SGML and PDF, and others did not understand what software configuration was needed. We have already talked to several users who have had many problems with these issues.
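The two routes to the full text described above can be written as a simple decision rule. This is a sketch under stated assumptions: the `Workstation` attributes are hypothetical, we assume the "extra software" applies to the SGML route, and preferring SGML when both routes work is an arbitrary choice made only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Workstation:
    # Hypothetical description of a user's machine; attribute names are illustrative.
    os: str                      # e.g. "Windows", "Unix"
    browser: str                 # e.g. "Netscape"
    sgml_software_working: bool  # assumed: the extra software for the SGML full text
    has_pdf_viewer: bool


def full_text_route(ws: Workstation, pdf_available: bool) -> Optional[str]:
    """Return the format in which the user can open the full text, or None.

    SGML route (assumed): Windows + Netscape + working extra software.
    PDF route: the publisher supplied a PDF and the machine has a PDF viewer.
    """
    if ws.os == "Windows" and ws.browser == "Netscape" and ws.sgml_software_working:
        return "SGML"
    if pdf_available and ws.has_pdf_viewer:
        return "PDF"
    return None


unix_box = Workstation("Unix", "Netscape", sgml_software_working=False, has_pdf_viewer=True)
print(full_text_route(unix_box, pdf_available=True))   # PDF
print(full_text_route(unix_box, pdf_available=False))  # None: this article has no PDF
```

Writing the rule out this way makes visible how many independent conditions must all hold before a user sees any full text at all.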
Given the kinds of searching and display made possible by SGML and by the layered display of search results, we have explored how researchers use journal components (such as abstracts, figures, equations, or bibliographic citations) in their work (Bishop, 1998b, in press). We have identified five basic purposes for using article components: to identify documents of interest; to assess the relevance of an article before retrieving and reading the full text; to create a customized document surrogate after retrieval that combines bibliographic and other elements (e.g., the author's name, the article title, tables); to provide specific pieces of information such as an equation, a fact, or a diagram; and finally, to convey knowledge not easily rendered in words, especially through figures and tables.
Engineers describe a common pattern of utilizing document components to zoom in on and to filter information in their initial reading of an article. They tend to read the title and abstract first, then skim section headings. Next, they look at lists, summary statements, definitions, and illustrations before zeroing in on key sections, reading conclusions, and skimming references. But engineers pursue unique practices after this initial reading, as they disaggregate and reaggregate article components for use in their own work. Everyone takes scraps or reusable pieces of information from the article, but everyone does this differently, for example, by using a marker to highlight text portions of interest, or making a mental register of key ideas. People then create some kind of transitory compilation of reusable pieces, such as a personal bibliographic database, folders containing the first page of an article stapled to handwritten notes, or a pile of journal issues with key sections bookmarked. These intellectual and physical practices associated with component use seem to be based on a combination of tenure in the field, the nature of the task at hand, personal work habits, and cognitive style.
Making Sense of New On-line Systems
Our digital library also provides an opportunity to step back and take a broader look at the use of on-line digital collections and how people attempt to make sense of them. By analyzing data from several different data collection efforts, we have found that users can be confused by a system like ours, and that it takes some time and interaction for them to figure out what the system is. In many usability tests, we identified patterns of user actions aimed at "uncovering" what sort of system our DLI was and what it could do. What first appeared to be random "trial and error" use of the interface was actually structured exploration that recurred frequently across sessions. We argue that users take a "cut and try" approach to differentiate our system from other genres of on-line systems, such as a general web search engine or an OPAC. Users were also looking for cues that would tell them which conventions from other platforms and interfaces would hold. Because our DeLIver interface in particular draws on many different genres without carrying any one through entirely, users are confused. For example, in one version of the system not all underlined terms were links, and in the current version not all links are easily identifiable. However, there are no defined conventions for interfaces to web-based digital libraries, and we need to find a way to signal to users what is and is not different about the digital library systems they encounter (Neumann and Ignacio, 1998).
Another area of general research deals with the larger implications of our changing information infrastructure. Star, Bowker, and Neumann (under consideration) discuss how communities of practice converge with information artifacts and information infrastructure to produce the ready-to-handness of particular resources. What appears to be transparency is this coming together of infrastructure, community-based work practices, and bits of information. It is created and maintained through access to and participation in communities of practice and their associated information worlds. The argument is developed through three case studies: academic researchers, a profession creating a classification of work practices, and a large-scale classification system.
In the case of academic researchers, as a person becomes a full member of a community, he or she gains easy access to information that is part of day-to-day life and work. These processes are mostly invisible to outsiders and generally consist not of formal information systems but of colleague networks, professional duties, and personal collections. What appears from the outside to be transparent access to a field of information is really a product of the individual's particular social location. Professions and communities also deliberately create convergence on language and practice in order to present a unified whole. Across levels of scale, "transparency and ease of use for groups are products of a shifting alignment of information resources and social practices" (p. 4). This research can inform work on digital libraries by making an implicit level of information gathering explicit and by more clearly defining the role of formalized systems.
The most important lesson we have learned in our work is that triangulating data on all aspects of use and usability is crucial. This process has allowed us to pursue the different social issues surrounding DL use as well as to deal with specific usability issues in our DLI search system. Triangulation requires planning and effort to build on past data collection efforts and methods in order to fill out a holistic picture of use. A full understanding of use becomes much more attainable when complementary evidence is merged from multiple sources.
A second lesson involves defining the place of user support and marketing of the system. This includes answering user questions about the system, writing documentation for "help" pages, and distributing pamphlets and putting up signs about the system. Our project did not explicitly assign responsibility for these tasks to particular project members, so some things have slipped through the cracks and people have had to scramble to deal with them later.
As for our data collection efforts, we are now more aware of the realities of on-line data collection. Requiring users to register so that we can gather general information about them involves a trade-off: some people will not use the system because of it, but requiring basic information adds analytical power. Surveys administered only on-line will have an extremely low return rate; this is not unusual, as others (Entlich et al., 1997; Borghuis et al., 1996) have reported similar experiences. As far as we can tell, few people have "faked" information or entered dummy information. More often, users decline to answer: fully 25% decline to answer our ethnicity question, 10% decline to give their age, and only 1% decline to give their gender.
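Per-question decline rates like these can be computed directly from registration records. A minimal sketch, assuming a hypothetical layout in which each record is a dictionary of answers and a declined or skipped question is either absent or marked with a sentinel value; the toy records below are invented for illustration and are not DeLIver data.

```python
DECLINED = "declined"  # hypothetical sentinel for "prefer not to answer"


def decline_rates(records, questions):
    """Percentage of records that declined each question.

    A missing key is treated the same as an explicit decline (an assumption).
    """
    if not records:
        raise ValueError("no records")
    rates = {}
    for q in questions:
        n_declined = sum(1 for rec in records if rec.get(q, DECLINED) == DECLINED)
        rates[q] = 100 * n_declined / len(records)
    return rates


# Invented toy records, for illustration only.
toy_records = [
    {"gender": "F", "age": "23-29", "ethnicity": DECLINED},
    {"gender": "M", "age": DECLINED, "ethnicity": DECLINED},
    {"gender": "M", "age": "30-39"},
    {"gender": "M", "age": "23-29", "ethnicity": "answered"},
]
print(decline_rates(toy_records, ["gender", "age", "ethnicity"]))
```

Treating a skipped question the same as an explicit decline is one design choice; keeping the two separate would distinguish users who saw a question and refused from those who never answered it at all.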
The transaction log data has been particularly challenging to work with. Because DeLIver is web-based, some problems are more difficult than they would be with stand-alone systems. Data is gathered in a continuous stream, but the system sometimes retrieves results or the full text in several chunks, so parsing individual actions has proven difficult. Decisions also had to be made about how to define a session, that is, how long a period without any action should end it. Finally, default settings on the search form have been problematic: it becomes impossible to distinguish people who actually wanted to search, for example, the full text from those who did not notice that there was an option to search any part of the article.
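The two log-processing decisions described above, folding chunked retrievals into a single action and splitting a user's stream into sessions after a period of inactivity, can be sketched as follows. The `Hit` record, the 5-second chunk window, and the 30-minute inactivity timeout are all hypothetical parameters chosen for illustration; the values actually used for DeLIver are not given here.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    # Hypothetical parsed log line; not DeLIver's actual log format.
    user_id: str
    timestamp: float  # seconds since an arbitrary epoch
    action: str       # e.g. "search", "fulltext"
    target: str       # query string or article identifier


def collapse_chunks(hits, chunk_window=5.0):
    """Fold consecutive hits by the same user for the same action and target,
    each arriving within chunk_window seconds of the previous one, into one action."""
    merged = []
    last_seen = 0.0  # timestamp of the latest hit folded into merged[-1]
    for hit in sorted(hits, key=lambda x: (x.user_id, x.timestamp)):
        if merged:
            prev = merged[-1]
            same = (prev.user_id, prev.action, prev.target) == (hit.user_id, hit.action, hit.target)
            if same and hit.timestamp - last_seen <= chunk_window:
                last_seen = hit.timestamp  # another chunk of the same retrieval
                continue
        merged.append(hit)
        last_seen = hit.timestamp
    return merged


def sessionize(hits, timeout=30 * 60):
    """Split each user's time-ordered hits into sessions wherever the gap
    between consecutive hits exceeds `timeout` seconds."""
    sessions, current = [], []
    for hit in sorted(hits, key=lambda x: (x.user_id, x.timestamp)):
        if current and (hit.user_id != current[-1].user_id
                        or hit.timestamp - current[-1].timestamp > timeout):
            sessions.append(current)
            current = []
        current.append(hit)
    if current:
        sessions.append(current)
    return sessions


raw = [
    Hit("a", 0.0, "search", "q1"),
    Hit("a", 60.0, "fulltext", "art7"),
    Hit("a", 62.0, "fulltext", "art7"),  # second chunk of the same article
    Hit("a", 64.0, "fulltext", "art7"),  # third chunk
    Hit("a", 4000.0, "search", "q2"),    # more than 30 minutes later
    Hit("b", 10.0, "search", "q3"),
]
actions = collapse_chunks(raw)
sessions = sessionize(actions)
print(len(actions), [len(s) for s in sessions])  # 4 [2, 1, 1]
```

Both parameters change the resulting counts: a shorter timeout yields more, shorter sessions, and a wider chunk window may merge genuinely repeated requests. Such definitions therefore need to be fixed and reported before session statistics are compared.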
Using the web to survey users has taught us to expect extremely low response rates. However, a web survey is so much more convenient than a mass mailing that it is sometimes the best option. Finally, as we have already noted, we have used several different styles of surveys on the web: feedback forms, pop-up exit polls, and the large-scale user survey, which was both sent out on paper and made available on the web. Comparing responses across these will be difficult because their response rates vary so widely.
Usability and use are two sides of the same coin. Three years of moving between the two types of study has been a series of managerial and time allocation challenges, but as the project nears an end, it is clear that it was worth the trouble. Having specific and concrete data on the usability of our system in its multiple iterations and versions has informed our wider theoretical perspective. But the flip side is that work on usability was greatly informed by the more general work that was carried out on work practices, journal use, and the changing information infrastructure. Each informs the other, and both are necessary for a clear picture of the emerging phenomena of digital libraries.
Bishop, Ann Peterson. (1998a, in press). "Understanding Use in the Real World." In Proceedings of the IEEE Socioeconomic Dimensions of Electronic Publishing Workshop, Santa Barbara, CA. Piscataway, NJ: IEEE.
Bishop, Ann Peterson. (1998b, in press). "Digital Libraries and Knowledge Disaggregation: The Use of Journal Article Components." In DL '98: Proceedings of the 3rd ACM International Conference on Digital Libraries. New York: ACM.
Bishop, Ann Peterson, compiler. (1996). "Libraries, People, and Change: A Research Forum on Digital Libraries," 38th Allerton Institute, Oct. 27-29. (http://edfu.lis.uiuc.edu/allerton/96/)
Bishop, Ann Peterson, compiler. (1995). "How we do User-Centered Design and Evaluation of Digital Libraries: A Methodological Forum," 37th Allerton Institute, Oct. 29-31, 1995. (http://edfu.lis.uiuc.edu/allerton/95/)
Bishop, Ann Peterson and S. Leigh Star. (1996). "Social Informatics for Digital Library Infrastructure and Use," In: Martha Williams, ed. Annual Review of Information Science and Technology, vol. 30.
Borghuis, Marthyn, H. Brinckman, A. Fischer, K. Hunter, E. van der Loo, R. ter Mors, P. Mostert, J. Zijlstra. (1996). TULIP: Final Report. Elsevier Science: New York, New York.
Entlich, Richard, Lorrin Garson, Michael Lesk, Lorraine Normore, Jan Olsen, and Stuart Weibel. (1997). "Making a Digital Library: The Contents of the CORE Project," ACM Transactions on Information Systems 15 (2): 103-123.
Garvey, William, and Belver Griffith. (1980). "Scientific Communication: its Role in the Conduct of Research and Creation of Knowledge," Key Papers in Information Science. White Plains, NY: Knowledge and Industry Publications.
Ignacio, Emily N., L. J. Neumann, R. J. Sandusky. (1995). "John and Jane Q. Engineer: What About Our Users?" internal report. http://anshar.grainger.uiuc.edu/dlisoc/socsci_site/J.J.Q.Engineer.html.
Lancaster, F. W. (1995). "Needs, Demands and Motivations in the Use of Sources of Information," Journal of Information, Communication and Library Science. 1 (3): 3-19.
Monk, A., P. Wright, J. Haber, and L. Davenport. (1993). Improving your human-computer interface: A practical technique. New York, NY: Prentice Hall.
Neumann, L. J. and E. Ignacio. (1998). "Trial and Error as a Learning Strategy in System Use," to appear in the American Society for Information Science, Annual Conference, October 26-29, Pittsburgh, PA.
Neumann, L. and S. L. Star. (1996). "Making Infrastructure: The Dream of a Common Language," Proceedings of the Participatory Design Conference 1996 (PDC '96), Cambridge, MA: Computer Professionals for Social Responsibility/ACM.
Pinelli, Thomas. (1991). "The Information-Seeking Habits and Practices of Engineers," Science and Technology Libraries 11 (3): 5-25.
Star, S. L., G. C. Bowker and L. J. Neumann. (under consideration). "Transparency beyond the Individual Level of Scale: Convergence between Information Artifacts and Communities of Practice." Journal of the American Society for Information Science.