DLI Social Science Team Home Page


Future Research: Linking Social and Technical Aspects of Digital Libraries

Prepared for Site Visit, March 1996

S. Leigh Star


One of our overall goals for the social science team is to understand how people use digital material in their workplaces (and/or homes), and what difference it makes to the work organization. In future work we would like to focus part of this interest on an investigation of those workplaces. I propose to investigate how people develop ad hoc, local "folk" classifications for documents in their offices, and then how those classifications (or cognitive maps) match or mis-match the emerging digital library. This would include how remote access is managed at the desktop, what associated activities occur (printing, and filing on the physical or virtual desktop, and according to what categories), and whether information seeking is a response to browsing or to pointers from friends via e-mail, phone, or face-to-face contact.

Such an investigation would use observational techniques and interviews in the context of workplace assessment. (If homes turn out to be a significant venue they could be added at a later date.) In addition, I will write a supplementary proposal (initially to the UIUC Research Board and the Advanced Information Technologies Lab here) for a graduate student in industrial design or photography, who could accompany me on field visits and take photographs for comparison. Ron Rice at Rutgers has been conducting a similar study in workplaces, with photographic data, and has offered to share data. Barbara Kwasnik at Syracuse has done interesting work to classify how documents are named in offices, and we will attempt to use and perhaps expand her document taxonomy. Suchman, Trigg and Blomberg at Xerox PARC have investigated a similar set of problems in legal offices.

Rationale

The categories represented on a desktop are fairly ad hoc and individual, not even real anthropological folk or ethno-classifications, but more like a combination of cognitive maps and fingertip skills. But everyone has them in some form, and they are important in organizing computer-based work. If we are to understand larger-scale classifications, we also need to understand how desktop classifications link up with those which are formal, standardized, and widespread.

Every time you make a link in hypertext, you create a category. You make some judgment about two objects: they are the same, or alike, or functionally linked, or linked as part of an unfolding series. The rummage sale of information on the world wide web is overwhelming, and we all agree that finding information is much less of a problem than assessing its quality -- the nature of its categorical associations, and by whom they are made. The lens of classification, from desktop to wide-scale infrastructure, is a good one through which to view problems of indexing, tracking, and even bibliography on the web. In its social and cultural dimensions, it can offer insights into both the logical structure of classification schemes and thesauri and into their social implications.

Related Background

One of the research groups with which I am involved at Illinois, the Illinois Research Group on Classification, has conducted a number of studies of the history and sociology of medical classification. We have looked at formal classification systems such as the International Classification of Diseases and the Nursing Interventions Classification, and examined the processes of negotiation by which differing approaches to disease and to medical work are resolved into a single category scheme (Bowker and Star, 1994; Bowker, Timmermans and Star, 1995). We know from this research that the tensions and tradeoffs involved in classifying are never fully resolved. Ambiguities and local, moral, scientific and ethical differences are always a part of any workable scheme. Furthermore, idiosyncrasies and workarounds in encoding practices make "data quality" extraordinarily elusive from the point of view of control; any coding scheme is only as good as the work of the coders, and in many instances in medical classification, the coding work is neither valued nor monitored. (We know all too little about this aspect, by the way.) Nonetheless, in spite of these obstacles, at a large scale, these systems have become relatively stable, and are an important part of medical infrastructure. At least in part, they become stable in their very tolerance for ambiguity and for local tailoring of classificatory needs.

The equivalent of coding work in digital libraries resides in the hands of readers who take the classification schemes (in various forms) of materials and adapt them. These schemes come from librarians, programmers, and other information intermediaries, as well as from other readers, often including oneself in the form of bibliographies, links and notes made at an earlier date.

The messy part comes in the seemingly innocuous verbs "taking and adapting." We know that navigating the web/digital library and information retrieval is an active selective process. It is now paired with the ability to download, filter and select -- in short, to begin to make a customized library for oneself. These materials are an interestingly messy combination of paper and electronic media; they are also a combination of pointers, abstracts, and documents. Whatever I pull onto my desktop, or navigate through from my link to the web, at this point as a reader, then, begins to weave in with my desk itself, my office, my filing cabinets, and all the rest of what Kling and Scacchi called "the web of computing" (1982).

The difficult part of tracing this web backwards through the net is its distribution in space, of course. If we want to understand how people are using our interface, our repository, and other federated repositories in the context of the web, we need to go out into some offices and understand more about the ecology of their workplaces. One way to do this is through an understanding of how readers are classifying their local work spaces and materials, down to the level of the filing cabinet, but also at the level of the computer desktop, web browsers, and group-level software. Matching these schemes with the larger, more formal classification schemes is something we all do and have done almost routinely (how do you file your reprints? do you have different stacks for "must read" and "maybe someday"? for "I'll use this in teaching" and "I'm just keeping this because I feel too guilty to throw it away"?). Treating this as a topic for analysis will be one strong link between the social and technical aspects of designing digital libraries.

References

Bowker, Geoffrey and Susan Leigh Star. 1994. "Knowledge and Infrastructure in International Information Management: Problems of Classification and Coding." Pp. 187-213 in L. Bud, ed., Information Acumen: The Understanding and Use of Knowledge in Modern Business. London: Routledge.

Bowker, Geoffrey, Stefan Timmermans and Susan Leigh Star. 1995. "Infrastructure and Organizational Transformation: Classifying Nurses' Work." Proceedings IFIP WG8.2 Conference: Information Technology and Changes in Organizational Work. Cambridge, England: IFIP, December 1995.

Kling, R. and W. Scacchi. 1982. "The Web of Computing: Computing Technology as Social Organization." Advances in Computers 21: 3-78.


Laura Neumann

During the rest of 1996, I would like to spend my research time focusing on some topics that have been suggested by our previous data collection efforts, but which have not been fully explored. In this context, I would like to spend time observing people in their day to day work practices. Generally, my interests revolve around learning how and why people navigate the information space that they are in.

A prerequisite to dealing with these questions is learning more about what information space people find themselves in during their day to day work, where that information space comes from, who shapes it, and how. Earlier research suggests that this is highly dependent on the experience of the individual in their field as well as other factors.

As I explore this first issue, I would like to look at how people go about answering questions that they have. They may be drawing upon personal, informal, and/or traditional resources. Some of these "resources" probably include colleagues, friends, the web, the library, or a desk drawer full of articles. Individuals make choices as to which source of information they will go to in any given situation based on a variety of factors which probably include convenience, reliability, and their perceptions of how each resource will be able to deal with their question. I would like to look at what choices are made, and why the choice was made the way it was.

Part of this interest in where people go to answer questions is discovering how they formulate those questions and how they understand the information space around them. Their opinions on the usefulness of the web versus the library or on the difficulty of using a given system will mediate which system they choose to help them with any particular problem.

Another similar issue is how people remember and find whatever they need to answer their question. It is my impression that the "traditional" ways in which a system can retrieve items (e.g. author, title, date, journal, etc.) are oftentimes the least important ways that people retrieve material. Spatial memory, mapping, and individual classification systems are constantly in use as people go about finding things. My observations also lead me to believe that the color, size and feel of items are equally important.

Finally, I would like to examine the way in which members of a group interact with colleagues as their first or second most important information resource, which suggests that there is a great deal of information flowing between people. Items such as books or articles, in addition to ideas and feedback, are passed amongst people. I would like to learn more about the dynamics of these flows of information.

All of these issues are tied into understanding the use of electronic information resources, generally, and our DL, specifically. E-mail and on-line archives are important resources for people who are looking for information. But these systems, as well as developing "digital libraries," impact and alter how people can go about looking for information. How is a spatial map for navigating a physical collection translated into the world wide web? How do individuals use cues such as color and "feel" to make sense of on-line resources? How can digital information be made ready-to-hand in a similar way to the manner in which an article sitting on one's desk is easily accessible? In the end, I would like to take a greater understanding of the workplace, setting, and workflows of information finding and question answering and apply that to improving our understanding of the growing and developing world of on-line information resources.


Ann P. Bishop

In the past, journal articles have been viewed as relatively stable and unified information units. Highlighting important sections, quoting particular passages, annotating personal copies, and photocopying only the bibliography are examples of how people currently both disaggregate and augment individual documents. One of the most innovative aspects of our DL is its capacity, through SGML and enhanced search features, to support the retrieval of individual components of documents. In this manner, the electronic journal article becomes increasingly malleable. Individual figures or equations can be disaggregated from their surrounding textual packages. As we've heard from many scientists and engineers, this capability can be highly valuable. Oftentimes a figure is more useful than an article's title or abstract as a predictor of whether or not the article is worth retrieving and reading. Sometimes an equation is all that is needed from an article. And the ability to search for a term in some particular portion of the text could make information searches much more precise. Augmentation of the electronic journal article occurs as an individual cuts and pastes material from a variety of formal and informal sources or makes use of links from one document to related documents, data, or other material.
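
As a rough illustration of what component retrieval could look like, the sketch below pulls figures and equations out of a small marked-up article and restricts a term search to one portion of the text. It is only a sketch under stated assumptions: the sample uses XML as a stand-in for SGML, and the element names (article, abstract, fig, caption, eq) and the helper functions are hypothetical, not the markup or software used in the testbed.

```python
import xml.etree.ElementTree as ET

# Hypothetical article markup. The element names are illustrative
# assumptions, not the DTD used by any particular publisher or testbed.
ARTICLE = """
<article>
  <title>Heat Flow in Thin Films</title>
  <abstract>We measure thermal conductivity in thin films.</abstract>
  <sec>
    <p>Conductivity falls with film thickness.</p>
    <fig id="f1"><caption>Conductivity versus thickness</caption></fig>
    <eq id="e1">q = -k * dT/dx</eq>
  </sec>
</article>
"""


def components(xml_text, tag):
    """Return (id, text) pairs for every element named `tag`."""
    root = ET.fromstring(xml_text)
    return [
        (el.get("id"), "".join(el.itertext()).strip())
        for el in root.iter(tag)
    ]


def search_in(xml_text, tag, term):
    """Return True if `term` occurs in the text of any `tag` element."""
    return any(
        term.lower() in text.lower()
        for _, text in components(xml_text, tag)
    )


# Disaggregate figures and equations from the surrounding text.
figures = components(ARTICLE, "fig")
equations = components(ARTICLE, "eq")

# Search for a term in one particular portion of the document.
in_abstract = search_in(ARTICLE, "abstract", "thickness")
in_captions = search_in(ARTICLE, "caption", "thickness")
```

In this toy example the term "thickness" appears in a figure caption but not in the abstract, which is the kind of case in which a figure predicts an article's relevance better than its abstract does.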

Our interactions with potential DL users lead us to believe that component retrieval and links to related material will be two of the most important features of our system. To what extent will scientists and engineers actually use these features once they are available? And in what manner will these new capabilities be employed in research, teaching and other endeavors? How will knowledge creation and transfer be influenced? What are the difficulties and dangers in "fragmenting" knowledge in this manner? I would like to focus future research on an investigation of this area by combining the analysis of DL transaction logs with workplace (office and library) observations and interviews. I believe that the results will be important in evaluating and designing our DL testbed as well as in improving our understanding of information needs and uses in the context of day to day work.