



5. Language

Received wisdom in the 1950s was that IR required some kind of formal indexing or coding scheme. (Exactly what kind of scheme was one of the topics of endless debate.) Thus, items required indexing/coding in terms of the scheme, which probably in turn required a human indexer (though a machine might be taught to do it). A similar process was required at the search stage, in respect of the query or need, though an end-user might possibly learn enough about the formal scheme to conduct a satisfactory search.

The results of the early experiments, together with the developing technology and changing perceptions of how it might be used, caused a backlash against this received wisdom. It became feasible to throw into the computer larger and larger quantities of text, and retrieve on the basis of words in text rather than assigned keywords or codes. At first sight, the necessity for any kind of indexing scheme, at either end of the process, seems to disappear: the user can use "natural" language to search a "natural" language database, without any interference from librarians.

We have since come to a much more balanced view of language, though the debate continues to generate new questions as we develop our highly interactive systems. It is clear that natural language searching is a powerful device that can often produce good results economically. However, it places a large burden on the searcher, and besides, certain kinds of queries are not well served. In recognition of these points, many modern databases include both formal indexing and searchable natural language text.

Formal artificial languages (in which category I include library classification schemes) represent particular views of the structure and organization of knowledge. One idea that emerged from the analysis of such languages, and that is central to modern indexing languages as well as to the practice of searching, is that of the facet. Once it is recognized that topics and problem areas are potentially highly complex, it becomes essential to approach the problem of describing them via different aspects, or facets, and combining the resulting descriptions in a building-block fashion [8]. (The idea of a faceted classification scheme, while originally due to Ranganathan in the 1930s, was put in its most concise form by Vickery; B.C. Vickery, Faceted Classification, London: Aslib, 1960.)

Many modern indexing languages, while not necessarily following the rules of faceted classification, reflect an essentially facet-based approach to the organization of knowledge. But the approach also has value at the searching stage, whether or not the database being searched is indexed by such a language. This theme is taken up again below.

6. Boolean logic, search strategy, and intermediaries

Long before either natural language searching or faceted classification, and certainly long before the modern computer, it became apparent that certain kinds of information retrieval would benefit from the ability to search on combinations of characteristics, in a way that we would now normally represent by the Boolean operator AND. Indeed, much effort went into devising mechanisms to allow such searching. Prime examples were Hollerith's mechanically sorted punched cards of the 1890s and a scheme of optical coincidence cards first invented in 1915 [13].

Most currently available computer-based systems allow searching using Boolean logical search statements, together with a few extensions to Boolean logic appropriate to searching text; for example, an operator to indicate that two words should not only occur in the same record but should also be adjacent to each other. In this respect, they appear to differ little from the punched-card systems of the 1930s. However, we may point to two major differences. Firstly, as discussed above, we have the possibility of searching natural language text. Secondly, the systems are designed specifically to allow and encourage certain kinds of feedback during the search. In other words, it is not expected that a searcher will be able to specify, precisely and a priori, the characteristics of the desired item(s). Rather, the search is expected to proceed in an iterative fashion, with the results of one (partial) search statement serving to inform the next search.
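
To make the adjacency extension concrete, here is a minimal sketch in Python (invented for illustration, not the query language of any particular system) contrasting a plain Boolean AND with an adjacency operator over the words of a record:

```python
def tokens(text):
    """Split a record's text into lowercase word tokens."""
    return text.lower().split()

def boolean_and(record, *words):
    """Plain Boolean AND: every word occurs somewhere in the record."""
    toks = set(tokens(record))
    return all(w in toks for w in words)

def adjacent(record, w1, w2):
    """Extended operator: w1 and w2 occur next to each other, in that order."""
    toks = tokens(record)
    return any(a == w1 and b == w2 for a, b in zip(toks, toks[1:]))

record = "probabilistic models of information retrieval and feedback"
print(boolean_and(record, "information", "retrieval"))  # True
print(adjacent(record, "information", "retrieval"))     # True
print(adjacent(record, "retrieval", "information"))     # False: wrong order, not adjacent
```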

Feedback in Boolean systems is of a very limited kind (this point will be taken up again below). Furthermore, the use of Boolean logic seems to suggest an analogy with traditional database management systems, where feedback is not normally an issue. However, the use of even a crude form of feedback is a recognition of the cognitive problems discussed earlier and a departure from the simple input-output model of information retrieval.

The problem of formulating a search and developing a search strategy for a Boolean system has received a great deal of attention [2]. This work has been informed by theoretical developments (the ASK hypothesis and facet analysis) and by the results of experiments. But one would not describe such work as theoretical or experimental so much as a codification of good practice. Searching such systems is best described as a skilled task. For this reason, it has not been the norm for scientists themselves to conduct their own searches. This does not mean, of course, that it does not happen. The companies offering search services on large text databases have been predicting for many years the dominance of end-user searching and the demise of the search intermediary. However, the intermediary has signally failed to disappear! (An intermediary is normally a specialist in the art of searching large text databases, perhaps but not necessarily with some subject knowledge.)

The impact of experimental work on the study of search strategy has not been very great. However, there was one experimental result from the 1960s that encapsulated, in a surprising way, the problems subsequently addressed by the ASK hypothesis and that was seminal in our understanding of the search strategy problem. It is therefore worth describing this result.

The experiment involved the Medlars Demand Search Service of the National Library of Medicine (NLM) in the United States [6]. At that time, in order to conduct a search on a medical topic, the user would have to communicate with the NLM, either directly or via a local expert. Local experts existed at various places around the world to help users make the best use of the service.

The requests that were collected for the experiment could be divided into those where the user talked face-to-face with an expert and those where the user wrote a letter requesting a search. The expectation was that the face-to-face communication would be beneficial to the search by enabling the development of a better search formulation. However, in the event, the reverse was the case: on average, the letter-based requests performed slightly better than the face-to-face ones.

The experimenters' explanation of this result, after studying the data, was to suggest that users often came to face-to-face meetings without clearly verbalized requests, and that intermediaries tended to suggest formulations that were easy in system terms. The letter writers, on the other hand, were forced to articulate their needs more systematically before encountering the constraints of the system.

This result made a clear link between the Taylor/Belkin view of information retrieval and the practical problems of searching and had a major impact on the training of search intermediaries.

The relation between Boolean searching and facet analysis is a simple one: an analysis of the search topic from a facet viewpoint fits naturally with a canonical form of search statement, as follows:

(A1 OR A2 OR ...) AND (B1 OR B2 OR ...) AND ...

Here the separate facets (or component concepts of the topic) are A, B, ..., and each search term Ai is one of the ways of representing concept A. A number of "intelligent" front-end systems for helping users to construct searches assume this canonical form. It is, however, an extremely limited form if interpreted literally: for example, it fails to allow that one of the Ais might itself be a phrase or a combination of concepts.
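
As an illustrative sketch (the facet terms are invented), the canonical form can be read as "every facet must be matched by at least one of its alternative terms":

```python
# A hypothetical facet-based query: each inner list is one facet (concept),
# and the terms within it are alternative ways of expressing that concept.
query = [
    ["maize", "corn", "zea mays"],   # facet A: the crop
    ["drought", "water stress"],     # facet B: the condition
]

def matches(record_text, faceted_query):
    """Canonical form: (A1 OR A2 OR ...) AND (B1 OR B2 OR ...) AND ..."""
    text = record_text.lower()
    return all(any(term in text for term in facet) for facet in faceted_query)

print(matches("Effect of water stress on corn yields", query))   # True
print(matches("Effect of water stress on wheat yields", query))  # False: facet A unmatched
```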

7. Associative methods

Experimenters and theorists in IR have been working for many years with alternatives to Boolean search statements. In particular, they have been using what might be described as "associative" methods, where the retrieved documents may not match exactly the search statement but may be allowed to match approximately. For example, the search statement may consist of a list of desirable characteristics, but the system may present as possibly useful items that lack some of the characteristics. Then the output of the system may be a ranked list, where the items at the top of the list are those that match best in some sense, but the list would include items that match less well.
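
A minimal sketch of this idea, assuming documents are simply sets of index terms (the collection here is invented), is coordination-level matching: rank by how many of the desired characteristics each item shares, rather than insisting on all of them:

```python
# Hypothetical document collection: each document is a set of index terms.
documents = {
    "doc1": {"irrigation", "maize", "yield", "drought"},
    "doc2": {"maize", "yield"},
    "doc3": {"wheat", "rust"},
}

desired = {"maize", "drought", "yield", "smallholder"}

# Score each document by how many desired characteristics it shares;
# documents matching only some of them are still retrieved, just ranked lower.
ranking = sorted(documents.items(),
                 key=lambda item: len(item[1] & desired),
                 reverse=True)

for name, terms in ranking:
    print(name, len(terms & desired))   # doc1 3, doc2 2, doc3 0
```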

Until relatively recently, this work had had little impact on operational systems. Most such systems continued to use Boolean (or extended Boolean) search logic. However, some recent systems have adopted some associative retrieval ideas. For reasons that will become apparent, I believe that associative retrieval offers far better possibilities for systems that genuinely help end-users to resolve information problems or ASKs. Therefore I welcome this development and indeed see it as long overdue.

There is a wide range of possible approaches to the problem of providing associative retrieval. This section will give a very brief overview of some of the approaches, and the following section will look at one particular model in somewhat more detail.

The associative approach with the most substantial history of development is the vector space model of Salton and others [11]. In this model, the documents and queries are seen as points in a vector space: retrieval involves finding the nearest document points to the query point. This model leads naturally to various associative-retrieval ideas, such as ranking, document clustering, relevance feedback, etc. It has been the basis of a number of experimental systems from the early 1960s on, and many different ideas have been incorporated at different times and subjected to experimental test. Mostly these tests have been along the lines described in section 4 above, with the system treated in input-output fashion.
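
The following is a deliberately crude sketch of the vector-space idea, using raw term-frequency vectors and cosine similarity on an invented toy collection; Salton's SMART system [11] is far more elaborate, so treat this only as an illustration of "finding the nearest document points to the query point":

```python
import math
from collections import Counter

def vector(text):
    """Represent a text as a term-frequency vector (a very crude sketch)."""
    return Counter(text.lower().split())

def cosine(v1, v2):
    """Cosine of the angle between two term vectors: 1.0 = identical direction."""
    dot = sum(v1[t] * v2[t] for t in v1)
    norm = math.sqrt(sum(c * c for c in v1.values())) * \
           math.sqrt(sum(c * c for c in v2.values()))
    return dot / norm if norm else 0.0

docs = {
    "doc1": "probabilistic models for document retrieval",
    "doc2": "relevance feedback in interactive retrieval systems",
    "doc3": "soil chemistry of tropical farming systems",
}
query = vector("interactive document retrieval")

# Rank documents by closeness to the query point in the vector space.
for name, text in sorted(docs.items(), key=lambda kv: -cosine(vector(kv[1]), query)):
    print(name, round(cosine(vector(text), query), 3))
```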

A second approach is suggested by Zadeh's fuzzy set theory. There have been a few attempts to apply fuzzy set theory to information retrieval, though it has not received nearly as much attention as the vector-space model [19]. (This reference describes the original fuzzy set theory. Much related work has occurred since in, for example, fuzzy logic or fuzzy decision-making. An example of an application in IR is: W.M. Sachs, "An Approach to Associative Retrieval through the Theory of Fuzzy Sets," Journal of the American Society for Information Science, 27: 85-87.) The main attraction for the IR application is that it seems to present the possibility of combining associative ideas with Boolean logic, although there are actually some serious theoretical problems in that combination [4]. There is a conspicuous lack of any attempt to evaluate fuzzy set theory-based systems.
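
One common convention in fuzzy retrieval (a sketch, not the formulation of any cited work) is to give each term a graded degree of membership in each document and to interpret AND and OR as minimum and maximum respectively, which yields a rankable, graded result while retaining Boolean-looking statements:

```python
# Hypothetical index: for each document, the degree (0..1) to which each
# term describes it -- a fuzzy membership value rather than a yes/no posting.
index = {
    "doc1": {"retrieval": 0.9, "feedback": 0.7},
    "doc2": {"retrieval": 0.4, "feedback": 0.9},
    "doc3": {"retrieval": 0.2},
}

def degree(doc, term):
    return index[doc].get(term, 0.0)

def fuzzy_and(doc, t1, t2):
    """One common convention: AND as the minimum of the memberships."""
    return min(degree(doc, t1), degree(doc, t2))

def fuzzy_or(doc, t1, t2):
    """...and OR as the maximum."""
    return max(degree(doc, t1), degree(doc, t2))

for doc in index:
    print(doc, fuzzy_and(doc, "retrieval", "feedback"))
# doc1 0.7, doc2 0.4, doc3 0.0 -- a graded, rankable result instead of a strict set
```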

A third approach is that based on statistical (probabilistic) models. Although statistical ideas have been around in IR for a very long time, most such work nowadays is based on a specific probabilistic approach, which attempts to assess the probability that a given item will be found relevant by the user. In this sense it belongs firmly with the evaluation tradition discussed in section 4 and with the ideas of relevance that emerged from that tradition, although it turns out to fit very naturally with more recent ideas of highly interactive systems. The probabilistic approach is discussed in more detail in the next section.

It is not strictly necessary to regard these three approaches as incompatible. It is possible to devise methods that make use of ideas based on more than one approach. However, they do suggest very different conceptions of the notion of degree of match between documents and queries.

8. Probabilistic models

Once it is assumed that the function of an IR system is to retrieve items that the user would judge relevant to his/her information need or ASK, then it becomes apparent that this is essentially a prediction process. These judgements of relevance have not yet happened. Or rather, if any items have been seen and judged for relevance, then those items are no longer of interest from the retrieval point of view because the user already knows about them. The system must in some fashion predict the likely outcome of the process in respect of any particular item should it present that item to the user. On the assumption that relevance is a binary property (the user would like to be informed of the existence of this item, or not), the prediction becomes a process of estimating the probability of relevance of each item and of ranking the items in order of this probability [9].
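
In symbols, this ranking principle amounts to the following (a paraphrase, not a quotation from [9]):

```latex
% Present documents d_{(1)}, d_{(2)}, ... for query q in decreasing order of
% the estimated probability that the user would judge them relevant:
\[
  P(\mathrm{rel} \mid d_{(1)}, q) \;\ge\; P(\mathrm{rel} \mid d_{(2)}, q) \;\ge\; \cdots
\]
```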

Translating this idea into a practical system depends on making assumptions about the kinds of information that the system may have on which to estimate the probability and how this information is structured. A very simple search-term weighting scheme, collection frequency weighting, seems to derive its power from being an approximation to a probabilistic function [5]. But more complex techniques may depend on the system learning from known judgements of relevance, either by the current user in respect of the current query or by other users in the past. The latter possibility has not yet, to my knowledge, been put into effect in any operational context, but the former is the basis for more than one operational system.
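
As an illustration of what collection frequency weighting can look like, here is a sketch of an inverse-document-frequency style weight (one common formulation, not necessarily the exact function analysed in [5]), applied to a toy collection invented for the example:

```python
import math

# Hypothetical collection of documents represented as sets of index terms.
collection = [
    {"maize", "drought", "yield"},
    {"maize", "irrigation"},
    {"wheat", "rust"},
    {"drought", "soil"},
]
N = len(collection)

def collection_frequency_weight(term):
    """Weight a term by how rare it is in the collection (IDF-style):
    rare terms get high weights, ubiquitous terms get weights near zero."""
    n = sum(1 for doc in collection if term in doc)
    return math.log((N + 1) / (n + 1))   # +1 smoothing for unseen terms

def score(doc, query_terms):
    """Score a document by summing the weights of the query terms it contains."""
    return sum(collection_frequency_weight(t) for t in query_terms if t in doc)

query = {"maize", "drought"}
for rank, doc in enumerate(sorted(collection, key=lambda d: -score(d, query)), 1):
    print(rank, doc, round(score(doc, query), 3))
```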

This is the idea of relevance feedback: after an initial search, the user is asked to provide relevance judgements on some or all of the items retrieved, and the system uses this information for a subsequent iteration of the search. Once again, the idea of relevance feedback is not exclusive to the probabilistic framework but fits very naturally within it. Indeed, the idea was first demonstrated in the context of the vector-space model.

Relevance feedback information can be used by the system partly to re-estimate the weights of the search terms originally used, but mainly to suggest to the system new terms that might usefully form part of the query. These new terms can again be weighted automatically, and might then be used automatically or presented to the user for evaluation. Thus, as the search iterates, the search statement may be not only imprecise but actually invisible to the user. The system can locate items that the user might want to see on the basis of criteria of which the user is not aware.
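
A simplified sketch of probabilistic term weighting from feedback, using a Robertson/Sparck Jones-style relevance weight on a toy set of judged documents (both the data and the choice of formula are illustrative assumptions, not the OKAPI implementation):

```python
import math

# Hypothetical state after an initial search: the user has judged a few
# retrieved documents, each represented here as a set of terms.
judged = [
    ({"maize", "drought", "yield", "stress"}, True),    # judged relevant
    ({"maize", "irrigation", "yield"},        True),    # judged relevant
    ({"maize", "market", "price"},            False),   # judged not relevant
]
N = len(judged)
R = sum(1 for _, rel in judged if rel)

def relevance_weight(term):
    """A Robertson/Sparck Jones-style weight (one standard formulation):
    high if the term occurs mostly in the documents judged relevant."""
    n = sum(1 for doc, _ in judged if term in doc)
    r = sum(1 for doc, rel in judged if rel and term in doc)
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))

# Candidate expansion terms drawn from the judged documents; the best ones
# could be added to the query automatically or offered to the user.
candidates = set().union(*(doc for doc, _ in judged))
for term in sorted(candidates, key=relevance_weight, reverse=True)[:5]:
    print(term, round(relevance_weight(term), 3))
```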

Although relevance feedback seems at first glance to be not too far removed from the input-output model (being an explicit form of feedback within the same framework), and also seems to embody a relatively mechanical notion of relevance, its implications are actually revolutionary. We begin to perceive the user not as feeding in a question and getting out an answer, but as exploring a country that is only partially known and where any clue as to location in relation to where the user wants to be should be seized upon. This concept of retrieval is explored further in the next section.

An example of a system that incorporates relevance feedback is OKAPI [17]. Although an experimental system, it functions in an operational environment, with a real database of realistic size and real users, in order to allow a variety of evaluation methods to be applied. Some results of a recent experiment using OKAPI will help to inform the next section.

9. Information-seeking behaviour

A recent experiment investigated various aspects of searching or information-seeking behaviour, including the behaviour of repeated users of the system [18]. The system was accessible over the network in an academic environment and was available to many users through terminals on their desks or very close by. There was no direct cost to using the system, and since it is very easy to use (being designed so that someone walking in off the street could be expected to be able to use it), there was no barrier of any kind to its repeated and frequent use. Individual users were logged.

What was found was that a number of users made repeated use of the system, quite often (surprisingly) starting with a query that was very similar to or even absolutely identical to their previous query. It is clear that they were not simply asking the same question again, but rather using the entry point that they already knew about as a way into this somewhat unfamiliar country, a familiar starting point for a new exploration. Relevance feedback (which is just one of the mechanisms of which they might make use) is not a matter of saying "this is correct," but rather of saying, "supposing we try this direction, where will it take us?"

Thus it seems that starting with a theoretical approach based on a traditional, input-output model of IR has led us to methods and techniques that fit very well with the ASK hypothesis and a problem-solving or exploratory view of IR. We have arrived at the right answer, but for the wrong reasons!

There are, of course, researchers in the field who are entitled to say "I told you so!" Examples include Oddy's THOMAS system and Swanson's view of retrieval as a trial-and-error process [7]. However, we do now have evidence that we are capable of providing information retrieval systems that can have a genuine impact on information-seeking behaviour in a broad sense. One task that faces us is to develop our methods and ideas of evaluation to take into account this broader view. We, researchers in information retrieval, need to know much more about how users (including scientists and technologists) approach their information-seeking or problem-solving tasks, preferably over a period of time rather than simply in response to a suddenly perceived information need [10].

Indeed, I have found it instructive now to revisit some work that was (when it was undertaken) right outside the field of information retrieval: T.J. Allen's work on communication in science and engineering [1]. What is critical here is the user's perception of his or her information environment and the sources and channels of communication that are open. One of Allen's conclusions concerned the relative importance of informal as against formal channels. The more we can design systems that appear to the user to be less formal, perhaps the better we shall be able to serve him or her. An information retrieval system should be as accessible and as easy to communicate with as a colleague in the next office; only then will the real breakthrough occur.

10. Intelligence

It may be noted that I have not yet mentioned any of the work in the artificial intelligence (AI), expert system or knowledge-based system (KBS) areas. There have indeed been many attempts to apply such ideas to information retrieval, though there is in my view less evidence for their effect or effectiveness in the context of operational systems.

The possible role(s) for knowledge bases in IR is the subject of much debate. One approach is to treat the expert intermediary as the source of knowledge, in other words to try to encapsulate the intermediary's skill in a system [16]. However, a major component of the intermediary's expertise, at least as represented in such systems, seems to be the manipulation of Boolean search statements. If we can get by without such statements, then much of the point of these systems seems to be lost.

The other kind of knowledge that, in principle, should be of use would be that embodied in a thesaurus, classification scheme, or other formalized indexing language. But such knowledge does not seem to fit very easily with established KBS ideas.

My own opinion, for what it's worth, is that the way forward may be to incorporate selective and small-scale "intelligent" (or moderately clever) methods into the associative retrieval framework, without attempting to go all the way to an intelligent system. Cleverness need not take the form expected in the current KBS tradition: a relevance feedback system based on the probabilistic model already seems quite clever to the user. Perhaps the central point is that we are attempting to provide tools to help the user solve his or her own problems; we are not attempting to solve their problems for them. Relatively simple tools may be best suited to that purpose.

References

1. Allen, T.J. (1968). "Organizational Aspects of Information Flow in Technology." Aslib Proceedings 20: 433-454.

2. Bates, M. (1987). "How to Use Information Search Tactics Online." Online 11: 47-54.

3. Belkin, N.J. (1980). "Anomalous States of Knowledge as the Basis for Information Retrieval." Canadian Journal of Information Science 5: 133-143.

4. Buell, D.A. (1985). "A Problem in Information Retrieval with Fuzzy Sets." Journal of the American Society for Information Science 36: 398-401.

5. Croft, W.B., and D.J. Harper (1979). "Using Probabilistic Models of Document Retrieval without Relevance Information." Journal of Documentation 35: 285-295.

6. Lancaster, F.W. (1968). "Evaluation of the Medlars Demand Search Service." Bethesda, Md.: National Library of Medicine.

7. Oddy, R.N. (1977). "Information Retrieval through Man-Machine Dialogue." Journal of Documentation 33: 1-14; D. Swanson, "Information Retrieval as a Trial-and-Error Process." Library Quarterly 47: 128-148.

8. Ranganathan, S.R. (1937). Prolegomena to Library Classification. Madras: Library Association. 2nd ed. London: Library Association, 1975.

9. Robertson, S.E. (1977). "The Probability Ranking Principle in IR." Journal of Documentation 33: 294-304.

10. Robertson, S.E., and M.M. Hancock-Beaulieu (1992). "On the Evaluation of IR Systems." Information Processing and Management. Forthcoming.

11. Salton, G. (1971). The SMART Retrieval System: Experiments in Automatic Document Processing. Englewood Cliffs, N.J.: Prentice Hall.

12. Schamber, L., M.B. Eisenberg, and M.S. Nilan (1990). "A Re-examination of Relevance: Toward a Dynamic, Situational Relevance." Information Processing and Management 26: 755-776.

13. Taylor, H. (1915). "Selective Devices." U.S. Patent no. 1165465.

14. Taylor, R.S. (1968). "Question-Negotiation and Information-seeking in Libraries." College and Research Libraries 29: 178-194.

15. van Rijsbergen, C.J. (1989). "Towards an Information Logic." Proceedings of the Twelfth ACM SIGIR Conference on Research and Development in Information Retrieval, 77-86.

16. Vickery, A., H.M. Brooks, B. Robinson, and B.C. Vickery (1987). "A Reference and Referral System Using Expert System Techniques." Journal of Documentation 43: 1-23.

17. Walker, S., and R. DeVere (1990). Improving Subject Retrieval in Online Catalogues, 2: Relevance Feedback and Query Expansion. London: British Library.

18. Walker, S., and M. Hancock-Beaulieu (1991). OKAPI at City: An Evaluation Facility for Interactive IR. BL Report no. 6056. London: British Library.

19. Zadeh, L.A. (1965). "Fuzzy Sets." Information and Control 8: 338-353.

Computerized front-ends in retrieval systems


Abstract
1. Introduction: The information environment
2. Definition of front-ends in retrieval systems
3. Taxonomy of front-ends
4. Examples of front-ends
5. Evaluation of front-ends
6. Directions for research and development
7. Conclusion: Implications for developing countries
References


Linda C. Smith

Abstract

The paper explores the role of expert systems and gateway software in retrieval systems as aids to individual researchers. Covered are the definition of front-ends, a taxonomy of front-ends, examples and an evaluation of existing front-ends, directions for future research and development, and the implications for developing countries.

1. Introduction: The information environment

Information technology - the set of computer and telecommunications technologies that makes possible computation, communication, and the storage and retrieval of information - has changed the conduct of scientific, engineering, and clinical research [24].

While information technology offers "the prospect of new ways of finding, understanding, storing, and communicating information" [24], there are a number of barriers to realizing that potential. The number, diversity, and scatter of information sources and systems now available in electronic form challenge the ability of individual researchers to make effective use of these resources. This paper explores the role of computerized front-ends in retrieval systems as aids to individual researchers seeking to locate information.

The complexity of the electronic information environment is well known. Databases now may contain bibliographic records, full texts, numeric data, images, or combinations of these types. The retrieval systems on which they are mounted are diverse, with differing interfaces and means of query formulation. The interconnectivity of telecommunications networks internationally provides the potential for interactive access to retrieval systems throughout the world. While most of the early front-ends were developed as aids to accessing a small number of commercial systems and the databases mounted on such systems, these resources now represent only a fraction of the resources of potential value to the researcher. Many universities and government agencies now provide public access to on-line catalogues, campus-wide information systems, and other electronic data repositories that supplement the resources provided by commercial vendors. While access to commercial systems is billed and restricted to those who are recognized users, many of the other resources on the network do not currently have such access restrictions.

Within the scope of this paper it is not possible to review in detail the many front-ends already developed. The reader is referred to two recent reviews by Drenth, Morris, and Tseng [10] and by Efthimiadis [12] for extensive lists of references to prior work. This paper provides a framework for understanding the potential roles of front-ends in retrieval systems by exploring: definition of front-ends, taxonomy of front-ends, examples of front-ends, evaluation of front-ends, directions for research and development, and implications for developing countries.

2. Definition of front-ends in retrieval systems

Turning first to dictionaries for aid in understanding what is meant by "front-ends in retrieval systems," one finds the following definitions in the Macmillan Dictionary of Information Technology [21]:

Front-end system. Synonymous with Intermediary system.

Intermediary system. In online information retrieval, a kind of expert system used to assist end users searching online databases. Such systems offer assistance with query definition, database selection, search strategy formulation and search revision. Compare Gateway software. Synonymous with Front-end system, Intelligent intermediary system.

Gateway software. In online information retrieval, dedicated communications software that acts as an interface between end users and online databases. The software typically offers automatic dialing and logon, offline search formulation, downloading and may also allow text or data processing. Compare Intermediary system.

While these definitions attempt to clearly distinguish between software providing assistance with communications (gateway software) and software providing expert assistance in query processing (front-end systems, intermediary systems, intelligent intermediary systems), in practice the terms have not been used consistently, and a single system may perform multiple functions. For the purposes of this paper, the term "front-end" will be used to encompass all forms of assistance "which in some way, and to some degree, make the differentness or difficulty of database use transparent to the user" [40]. This assistance can range from the purely clerical, such as automatic dial-up, to fully intellectual, such as selecting the best way to modify an unsuccessful search query. The front-end controls what the user can request and how, as well as what the user is given from the external system and the way it is presented.

3. Taxonomy of front-ends

In order to characterize what has been accomplished in the development of front-ends to date as well as to suggest new directions for research and development, it will be helpful to provide a taxonomy by which front-ends can be classified. A number of dimensions are useful in distinguishing among the efforts to provide computer-based assistance to users of retrieval systems.

3.1 Resources Accessible via the Front-end

The front-end is the user's "window" on the world of information available on-line. The front-end may be highly specialized, intended to help the user in answering a particular type of question. In this case, it may link to a single database on a single system and focus on only a part of its contents (e.g., locating literature on particular cancer therapies in a database like Medline, which covers many other aspects of clinical medicine as well). A slightly more general front-end would support full use of a single database on a single system. Increased accessibility would be provided by front-ends linking to multiple databases on a single system, and finally to multiple databases on multiple systems. In this context, it is also important to note the types of resources supported: bibliographic, full text, numeric, and/or image. Recognizing the value of networks to support communication between individuals as one important type of information resource, front-ends may also assist users in navigating directories of other people accessible on the networks.

3.2 Location of the Front-end

"Front-end" implies that the software is located somewhere between the user and the system being accessed, but there are multiple possible locations. Front-ends may reside on the user's workstation, on another computer on the network, or on the host system itself. Meadow [22] has provided a detailed analysis of the trade-offs in the location of a front-end. Front-ends on the user's workstation have the advantages of reducing costs by allowing local editing of queries, supporting graphic displays [25], and performing functions such as data analysis that may not be available in the host system. However, it may be difficult to update such front-ends to reflect changes in the resources to which they are providing access. While front-ends on the host system can be maintained by the managers of the host, they may be a more costly mode of access for the user than using that same host with its "native" command language. They also limit the user to databases available on that particular host. Front-ends resident on the network computers can simplify access to multiple host systems. Use of such a front-end may add to the cost of performing a search, and the user is dependent on the developer of the front-end to accommodate any changes in the host systems being accessed.

3.3 Types of Assistance Provided by the Front-end

The definition of "intermediary system" given above indicates that front-ends could offer assistance in query definition (concept identification), database selection (identification of databases to be searched and the systems where they reside), search strategy formulation (selection of subject terms, selection of other types of terms, selection of logical operators), and search revision (broadening, narrowing, or other changes). In practice, not all of these forms of assistance may be provided by a single front-end, which may instead focus on a particular step in the process, such as database selection. Going beyond information retrieval, a front-end may also support post-processing functions, ranging from formatting records for easier reading to performing statistical analyses on the content of records. Sormunen, Nurminen, Hamalainen, and Hiirsalmi [31] have developed a quite complete requirements specification for a front-end. They analyse the types of information and decisions as well as the information sources associated with each of seven on-line search tasks: setting terminal configurations; database selection; search profile formulation; log-on and log-off; search execution; evaluating and editing; and post-processing of results. This is followed by a statement of functional requirements for the front-end together with identification of problems and issues for further development. The list includes both clerical and intellectual tasks. Several operational front-ends are limited to assisting with clerical tasks because they are easier to automate.

There have been efforts to characterize the knowledge that needs to be incorporated in front-ends to provide assistance with the various search tasks. Pollitt [26] suggests four categories: (1) system (the command language and facilities available in the search system); (2) searching (strategy and tactics to be employed in searching); (3) subject; and (4) user (knowledge about each individual user). Often there is a trade-off, with some front-ends having general knowledge to support searches of multiple databases on multiple systems while others have in-depth knowledge to support specific categories of questions and/or databases.

3.4 The Nature of Assistance Provided by the Front-end

When examining the user assistance provided by front-ends, it is helpful to distinguish between two possible roles for the computer, i.e., computer-assisted vs. computer-delegated [7]. In computer-assisted mode, the system provides advice to the user in making decisions. In computer-delegated mode, the system makes decisions automatically, given some initial input from the user. Thus database selection could be handled by suggesting some alternatives to the user or by automatically making the selection of which database(s) to search, given some initial information about the query to be answered. Buckland and Florian [7] suggest that a computer-assisted approach is likely to be more effective because the intelligence of the system and the intelligence of the user ought to augment each other. Bates [3] has also identified the need to provide optimal combinations of searcher control and system retrieval power, arguing that many users would not want to delegate the entire search process to the front-end.

3.5 User Modelling Capabilities

Front-ends are designed to simplify access to electronic information resources by masking some of the complexity. A further level of assistance could be provided by user modelling capabilities, such that responses would be tailored to particular users. As Allen [1] notes, while the term "user model" emphasizes the information about the person, situational, task, or environmental information may also be encoded in the model. In front-ends, user models could be employed to adapt explanations to the user's level of expertise as well as to adapt to user preferences. They may affect database selection, query formulation, and the natural language interaction with the user. Brajnik, Guida, and Tasso [6] point out that information about the user may be obtained through two different mechanisms: (1) external acquisition, where information is obtained in response to questions posed by the front-end, or (2) internal derivation, where information about the user is obtained through inference from the search session. Borgman and Plute [5] call these forms of models stated vs. inferred. They also distinguish user models as either static (an unchanging model that is embedded in the system) or dynamic (changing throughout the search session and over a period of time to incorporate new information about the user). They caution that user models make assumptions about users' goals and intents and make decisions for them. Therefore, "while accurate models indeed are helpful and reduce the burden on the searcher, inaccurate user models may do more harm than good."
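
A hypothetical sketch (all names and data invented for illustration) of how a front-end might represent these distinctions, holding stated information alongside information inferred and updated dynamically during a session:

```python
from dataclasses import dataclass, field

@dataclass
class UserModel:
    """Hypothetical user model for a front-end, illustrating the
    stated vs. inferred and static vs. dynamic distinctions."""
    # Stated: supplied by the user in answer to explicit questions.
    stated_expertise: str = "novice"          # e.g. "novice" or "expert"
    stated_subject_areas: list = field(default_factory=list)
    # Inferred: derived by the front-end from the user's search behaviour.
    inferred_preferred_databases: dict = field(default_factory=dict)

    def record_database_use(self, database: str) -> None:
        """Dynamic updating: revise the model as the session proceeds."""
        self.inferred_preferred_databases[database] = \
            self.inferred_preferred_databases.get(database, 0) + 1

model = UserModel(stated_expertise="novice",
                  stated_subject_areas=["tropical agriculture"])
model.record_database_use("AGRIS")
model.record_database_use("AGRIS")
print(model.inferred_preferred_databases)   # {'AGRIS': 2}
```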

