



Session 1: Access to science and technology and the information revolution


Introduction: Access to science for the benefit of mankind
Keynote presentation: the impact of information technology on the access to science


Chairperson: Huzihiro Araki

Introduction: Access to science for the benefit of mankind




Sir John Kendrew

I think that this audience will find no difficulty with the idea that science is for the benefit of mankind. But I have to remind you that not everyone would agree with this opinion; there are some who think that science has had a negative influence on the spiritual values of mankind, and indeed in my own country a certain Dr. Appleyard has just published a book [1] that, I understand, strongly maintains this point of view. But as I say, I believe it will not be necessary to argue this point with those present here today.

Communication is an integral part of science and of the mechanism for increasing human knowledge. The lonely genius remains frustrated and useless without communication. In mathematics the classical example was Ramanujan, a poor and uneducated Indian whose quality was only revealed when he wrote a letter containing some of his results to the Cambridge mathematician Professor Hardy, who recognized his genius and brought him to England to work. Without this letter most of Ramanujan's remarkable theorems would probably have been lost to the human race.

Communication is of basic importance at three levels. The first is communication within science itself. I think we would all agree that free communication is essential for progress in science, even though there have been problems at the frontier with technology because in the world of commerce you cannot reveal everything you are doing. These frontier problems have been exacerbated in recent times; in many fields, my own of molecular biology in particular, the old openness has to some extent disappeared. Some of my younger colleagues do not want to talk in public about their work, because they think that if they speak about their own "secret," somebody else will exploit it and make money out of it that they could have made themselves.

When I began research in the 1950s I don't think any of us ever imagined that molecular biology would one day have any practical value; we thought of it only in terms of increasing human knowledge for its own sake; but now of course it has given rise to the major industry of biotechnology and fundamental advances in medical treatment. Of course, this is only a partial view: science is also for the benefit of mankind and this is indeed reflected in the title of my present talk. But the process of transmission of knowledge from the academic world, with its tradition of openness, to the world of technology and commerce does present difficulties that have still not been resolved. In my view they are very serious problems, though they do not fall within the remit of the present meeting.

The second level at which communication is of basic importance concerns the developing world in particular, as the Rector has already mentioned. And in this context I am not thinking only of what, as a western European, I characterize as the South, but also of the East; many of the problems that exist, for example in Africa, are present also in the countries of Eastern Europe following, and in spite of, the big political changes there. In the developing world, in this wider sense, there is a tremendous thirst for knowledge and at the same time a severe lack of journals, of books, and of means of rapid communication at low cost. The phrase "at low cost" is tremendously important in countries where financial resources are extremely limited; and here, I think, we enter a field where the experts present at this meeting can make enormous contributions. Improvements here can help, can be of really practical assistance, in beginning to reduce the gap between the affluent nations from which most of us here come and the poor nations of the world - a gap that, alas, in spite of all that has been done so far, is still increasing rather than getting smaller. In the most literal sense the future of the human race depends on changing the increase into a decrease, and the new technologies to be discussed at this meeting can make a very important contribution.

My third problem of communication is one that more particularly concerns the affluent countries. I refer again to the development of an anti-science movement, indeed in some circles an anti-intellectual movement. It becomes increasingly difficult to persuade young people to enter the profession of science. In the United States, for example, young foreign scientists are increasingly being imported to fill the gap. And even in Japan, where a few years ago problems of this kind seemed to be non-existent, we read that there are now difficulties in recruitment to the profession. At bottom, the problem is one of communication: young people simply do not understand the importance or the excitement of science. They need better education, and so indeed do their parents who give them their values. And beyond them, ministers and administrators need to know more of science. These are often very competent and well-trained people, but their education was generally not in science; and this is why in most countries governments do not provide adequate support for science and do not understand that science is at the basis of the civilization and health that they enjoy, and of the improvements for which they hope in the future.

So we have a whole set of world problems that demand better communication for their solution. In this meeting we are to discuss the technical means for communication - only a part of the problems, but certainly a part sufficiently important to justify the United Nations University's choice of the topic for this symposium.

Historically the first of the mechanisms for communication was the human voice; in eighteenth century Europe, scientists used to travel a great deal and discuss their results verbally; and this is a mechanism very important today on a larger scale, with the proliferation of international meetings; the only problem here is to provide funds and facilities for younger scientists of all countries, including the developing ones, to attend them. It's all very well to have professors moving around to talk to one another, but it is perhaps even more important for young people to do so. At least some of the political obstacles to travel in Europe have been removed, but the financial ones remain there and in many other parts of the world.

Next came the written word, the journals with their scientific papers and the newspapers with their mission of educating and informing the public. Here I believe we have a relatively new difficulty. In a recent issue of Nature you will find a very interesting article by Professor Donald Hayes of the Department of Sociology at Cornell entitled "The Growing Inaccessibility of Science" [2]. It is an account of a piece of statistical research examining the language used over a period of years in various types of publication. He allots an index of intelligibility to the language used by various types of publication; for an internationally read newspaper the index is 0; technical articles in scientific journals are in the range +40 to +60; fiction is -20; casual conversation between adults about -40; and at the bottom of his scale, farm workers talking to cows are about -60. He then examines popular science journals; until 1947 Nature was near 0, since when its index has risen decade by decade, until today it is about +30; Science began in 1883 at -8.5 and took off from a near-zero level around 1960; today it is +28; Scientific American remained near zero until 1970 and then increased; when its index reached +15, there was a decline of over 125,000 subscribers; when later its index dropped back to +10, there was a coincident increase in subscriptions. These figures mean that even for professional scientists much of the literature is unintelligible except to those actually working in the same field - including, by the way, most of the contributions to the present symposium! Experiments like Basic English have been abandoned, and we see no solution in sight. Is this adequate communication? How can we expect to communicate with citizens at large when we cannot even communicate with other scientists?

One of my professors was Lawrence Bragg, who held the view that young scientists should not read too many journals; if they did, he believed, they would probably discover that the experiment they had thought of doing had been done already; if they did not, they would do their experiment with an open mind and might discover something new. He also used to say that if you did read a journal, the important thing was flipping over the pages so that your eye might be caught by something entirely unexpected. I don't think he would have approved of abstract journals or Current Contents, or of the computerized database searches that are commonplace today.

Another of my professors was Desmond Bernal, and I have been thinking a great deal of him in these last days because just 44 years ago the Royal Society held its Scientific Information Conference [3] with a purpose not dissimilar to that of our meeting today, discussing as it did some of the then contemporary technical advances like punched cards and microfilm. I was present myself when Bernal proposed his scheme whereby journals should not be published in their present form but only as a set of titles and abstracts, so that you would write in to the editorial office and request an offprint of papers that interested you rather than subscribing to the whole journal. Well, of course the idea got nowhere, but it has been echoed in recent times with the promotion of publications like Current Contents. Of course in those first postwar years, one spent an immense time sending off postcards to authors asking for offprints; and a little later on, when photocopiers became common, in copying papers page by page - another technical advance of great importance.

Now we have in our hands a still more important technical advance, the computer; and this has given us access to databases and abstracts, and simple pieces of software like word processors have enormously facilitated the preparation and revision of manuscripts for press and the handling of them by the printer. Now things move still further; I went to a meeting last September in the United States where I heard an account of a new journal in the medical field that will be entirely computerized; that is to say, you will read it by calling down papers onto your computer screen, and then copying anything you wish to retain: still another step in the direction of that "unrealistic" 1948 proposal by Desmond Bernal.

Of course another important development in this field is electronic mail. I personally use it every day and I only wish I could give a better report of its efficiency. The last of the Rector's symposia was about chaos, and this would aptly describe what often goes on in electronic mail: dozens of different networks with different types of address, not always linking with one another; the same address has to be read backwards or forwards depending on whether you are in the United States or in the United Kingdom. In spite of months of effort, I have been able to establish only one-way communication between myself and a certain important international organization located in Belgium. It's not a technical mess, but it is certainly an administrative mess, in strong contrast with the telefax system, which is well standardized and in general provides excellent, trouble-free service. I mention electronic mail because I believe it is particularly important for isolated laboratories and in particular for scientists in the developing world (where the problems are not of course just administrative but also technical, since the system depends on adequate telephone links). If electronic mail works, and you are trying to do an experiment described in the literature, you can ask one of the authors for help and get the answer back in just a few minutes.

Communication in science is a means to an end, and the technical advances we are going to discuss are means to that means. But they are tremendously important, not only to those of us who are fortunate enough to work in the advanced countries of the world, but even more so to those in developing countries who cannot hope to improve their position without the provision of these technical means at prices they can afford.

References

1. Appleyard, B. (1992). Understanding the Present: Science and the Soul of Modern Man. London: Picador.
2. Hayes, D.P. (1992). "The growing inaccessibility of science." Nature 356: 739.
3. The Royal Society (1948). The Royal Society Scientific Information Conference (21 June-2 July 1948). The Royal Society.

Keynote presentation: the impact of information technology on the access to science


Abstract
1. Introduction
2. Diversity of information requirements
3. Numeric and factual databases
4. Evaluation and quality control
5. Traditional access mechanisms
6. Electronic access to scientific data
7. Data as an international commodity
8. The future
References


David R. Lide

Abstract

Access to scientific information is crucial to continued scientific advance and to technological progress. After discussion of the diversity of information requirements, this paper takes up mechanisms for the organization of data. The importance and advantages of computer and telecommunications technology in access to scientific data are described, and a brief overview is given of the availability of numerical and factual databases in different areas of science. The paper concludes with a consideration of data as an international commodity and of prospects for future developments.

1. Introduction

It is no coincidence that the birth of modern science followed shortly after the introduction of the printing press, more reliable sea transportation, and the other technological innovations that brought Europe out of the medieval world. Science could not have developed in the way it did without the capability of scientists to communicate their results, ideas, and speculations to each other. While a few profound advances in science have resulted from the insights of a single individual working in isolation, closer analysis always shows that those insights rested upon a body of information developed by other scientists, often over a long period of time, and made available to their colleagues through one or another mechanism of information transfer. The importance of access to scientific information - experimental results, interpretations, and theories - looms even greater when one looks at the translation of basic scientific advances into useful technology that improves the lot of mankind. Advances in our understanding of nature would have little impact on humanity at large if the knowledge remained confined to the laboratory or university. An effective mechanism for communicating that knowledge to scientists and engineers is crucial to the development of new and better technology that benefits us all.

This paper will give an overview of the ways in which modern information technology is affecting the access to scientific and technical information. The development of digital computers and high-speed communication networks has already had profound effects on information storage, retrieval, and dissemination. Nevertheless, we are probably still in the early stages of this electronic revolution. Even in highly developed countries, only a small fraction of scientists make significant use of the information technology now available. We can expect changes in the next 40 years just as dramatic as those we have seen in the 40 years since digital computers entered our lives.

2. Diversity of information requirements

Access to technical information is crucial to all phases of the scientific process. However, a scientist's information needs can vary from simple items like laboratory instrument manuals and catalogues to megabytes of data telemetered from a space probe millions of miles away. In discussing the effect of modern information technology on data access, it is helpful to break the subject down into several categories of information requirements; for example:

- access by research scientists to raw data obtained elsewhere
- access to the archival scientific literature
- access to reliable factual data
- access by government officials and the public at large to scientific findings that affect the general welfare

These aspects will be discussed in turn and important differences noted.

In certain fields of science, the common pattern is for many scientists, often in different countries, to collaborate on analysing the data obtained in a single large facility. One example is high-energy physics, where a few extremely expensive particle accelerators provide data for a worldwide community of theoretical physicists. Another is space science, where the data gathered by satellites and space probes are distributed to many investigators for interpretation. In such areas, networks are already in place that allow access to massive amounts of data by dozens (sometimes hundreds) of researchers who are collaborating on a project. Electronic bulletin boards and computer conferences allow the participants to try out new ideas and obtain their colleagues' reactions virtually in real time. This has introduced a new dimension to scientific collaboration, especially at the international level. While those involved in "big science" led the way, many others who are working on more modest research problems have adopted the same approach. We can expect a rapid growth of this type of research collaboration as low-cost, high-capacity networks are introduced throughout the world. The possibilities for bringing third-world scientists into collaborations of this kind are particularly intriguing.

The second type of access is to the archival scientific literature. As this literature has grown over the last generation, it has become increasingly difficult for scientists to follow their fields of interest. The introduction of on-line searching of abstract files 25 years ago has been a major factor in alleviating this problem. This facility is now available in every major field of science, making it possible to search millions of papers in a very short time and retrieve citations to pertinent documents. The next step in this evolution will be to make the full text of scientific papers accessible electronically. Experiments of this type have already started. The American Chemical Society provides on-line access to its journals, but without the graphical items. European publishers in the biomedical area have established the ADONIS program, which provides current journals to libraries in CD-ROM form. A purely electronic journal, Current Clinical Trials, has been started by the American Association for the Advancement of Science to publish papers on testing of new drugs. A new "paper" is accessible via an on-line network within 24 hours of its acceptance by the editor. Thus, the momentum is building for a transition from the traditional printed journal, which has served as the archival record of science for the last 300 years, to a new pattern of electronic dissemination.

Access to numerical data and other forms of factual information presents a different set of considerations. This type of information has traditionally been published in handbooks and compilations; it represents a distillation (ideally, including a measure of critical analysis) of the data reported in the archival literature. Such data are needed at every level, from basic research to engineering. Great strides have been made in the use of computer technology for accessing this kind of data; a more detailed discussion appears later in the paper.

The final topic deals with the needs of public officials and private citizens. In the early days of computers, certain visionaries predicted that every government official would soon be able to access the full information base of science, leading to a better understanding and wiser decisions. Expensive demonstration systems have even been built for this purpose. However, the reality has not quite met the promise. There are too many opportunities for a non-technical person to misinterpret or misuse the results that he can instantly access via an electronic system. Quick access is not so important as a balanced, intelligent analysis of the information. In this arena, the human mind is still far ahead of the computer.

It should not be inferred that these dramatic advances in the techniques for accessing scientific information have occurred without problems. In fact, the introduction of electronic technology has put a great deal of stress on our traditional information mechanisms. There is a widespread perception in the scientific community that electronic access is too expensive. On the other hand, many organizations in the information business have found that the revenue from new electronic services does not make up for the ensuing loss of income from their traditional printed products. The scientific societies that publish scholarly journals in their fields are deeply concerned about the economics of the new media. Many questions of copyright and protection of intellectual property in the electronic age remain unsolved. The institution of peer review, which has been so important in maintaining the integrity of scientific publications, is threatened by the computer bulletin boards offering rapid but unscreened access to new research. New mechanisms will be needed to assure that scientists receive proper credit for their intellectual contributions, since the origin of individual pieces of data is sometimes lost when they are incorporated in large databases. Like all upheavals, the information revolution is causing its share of disruptions in the behaviour patterns and culture of the scientific community.

3. Numeric and factual databases

One class of scientific information where the new technology promises to have major impact is the hard factual data, usually numeric in nature, which form the lifeblood of science and are an essential ingredient in the transfer of scientific knowledge into useful technology. At its most simplistic level, a scientific datum involves an object and an attribute; examples are the boiling-point of benzene, the radius of the earth, and the gestation period of the elephant. Of course, it is usually necessary to specify certain auxiliary parameters if the meaning of the datum is to be completely clear. Thus the boiling-point of a liquid is ambiguous unless the pressure is specified, and the value of the radius of the earth differs between the equator and the poles. It is often necessary to associate a complex set of "metadata" with each data point in order to make that datum useful.
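The object, attribute, and metadata structure described above can be sketched as a small record type. This is only an illustrative sketch: the class name `Datum`, its field names, and the `describe` helper are assumptions introduced here, not part of any database discussed in this paper, and the sample values are standard rounded figures chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Datum:
    """One scientific datum: an attribute of an object, plus qualifying metadata.

    All names here are hypothetical, chosen only to mirror the text.
    """
    obj: str                    # the object, e.g. "benzene"
    attribute: str              # the attribute, e.g. "boiling point"
    value: float
    unit: str
    metadata: Dict[str, str] = field(default_factory=dict)  # auxiliary parameters

    def describe(self) -> str:
        """Render the datum together with the conditions that make it unambiguous."""
        base = f"{self.attribute} of {self.obj}: {self.value} {self.unit}"
        if not self.metadata:
            return base
        conditions = ", ".join(f"{k}={v}" for k, v in sorted(self.metadata.items()))
        return f"{base} ({conditions})"


# A boiling-point is ambiguous unless the pressure is stated.
benzene_bp = Datum("benzene", "boiling point", 80.1, "°C",
                   {"pressure": "101.325 kPa"})

# The radius of the earth depends on where it is measured.
earth_equatorial = Datum("earth", "radius", 6378.1, "km", {"location": "equator"})
earth_polar = Datum("earth", "radius", 6356.8, "km", {"location": "pole"})

print(benzene_bp.describe())
print(earth_equatorial.describe())
print(earth_polar.describe())
```

Keeping the auxiliary parameters in the same record as the value means a number cannot be retrieved without the conditions under which it holds.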

The preservation and dissemination of such data have been recognized as crucial since the beginning of modern science. In our own time, the cost to society of acquiring scientific data has risen enormously, not only in regard to data produced by expensive devices such as satellites and particle accelerators, but even for routine measurements done in the laboratory. It is therefore incumbent on us to assure that scientists and engineers have ready access to all the existing data that might expedite their work.

The particular field of science has a large bearing on the nature of the data encountered in that discipline. In particular, data tend to have different characteristics in the physical sciences, geosciences, and biosciences. The table is taken from a report [8] prepared by CODATA, the ICSU Committee on Data for Science and Technology. This table indicates the various ways to categorize data and gives examples in the three broad areas of science. While the details of the classification are not relevant to this paper, it is important to note that most physical science data are independent of location and time; in principle, the measurements can be repeated at a different place and time with the same result. Furthermore, much of the data in physics and chemistry can be analysed in terms of well-established quantitative theories. It is therefore possible to cross-check data against theory and compare them with data on other materials. This provides a means of evaluating a set of data to establish its level of quality and, in some cases, to represent a large amount of data in a concise mathematical form.

In the geosciences, on the other hand, much of the pertinent data is location dependent and some (such as the data associated with earthquakes or solar flares) come from non-repeatable observations. Bioscience data, at least of the classical variety, are dominated by the variability of living organisms. Thus one must specify not only the central value of some characteristic, but also the range of values found, and sometimes even the form of statistical distribution. Because of these discipline-dependent factors, the design of a data storage and dissemination system must be approached very carefully, taking into consideration the inherent nature of the data and the way they are going to be used.
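For variable data of the bioscience kind, a stored record might carry a central value together with the observed range and a measure of spread. The following is a minimal sketch using only the Python standard library; the `Summary` type, the `summarize` function, and the sample readings are hypothetical and serve only to illustrate the point made above.

```python
import statistics
from typing import NamedTuple, Sequence


class Summary(NamedTuple):
    """Central value, spread, and range for one measured characteristic."""
    n: int
    median: float
    mean: float
    stdev: float   # sample standard deviation
    low: float
    high: float


def summarize(values: Sequence[float]) -> Summary:
    """Summarize repeated observations of a variable characteristic."""
    if len(values) < 2:
        raise ValueError("need at least two observations to describe variability")
    return Summary(
        n=len(values),
        median=statistics.median(values),
        mean=statistics.mean(values),
        stdev=statistics.stdev(values),
        low=min(values),
        high=max(values),
    )


# Hypothetical readings of some characteristic across five individuals.
readings = [60, 62, 64, 66, 68]
print(summarize(readings))
```

A fuller treatment would also record the form of the statistical distribution, which this sketch leaves out.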

Varieties of categories of data

Each category of data is followed by examples in chemistry/physics, the geo-/astro-sciences, and the biosciences.

a1 Data that can be measured repeatedly
- Chemistry/physics: most data
- Geo-/astro-sciences: geological structures, rocks; acceleration due to gravity; fixed stars
- Biosciences: most data

a2 Data that can be measured only once
- Chemistry/physics: (no entry)
- Geo-/astro-sciences: volcanic eruptions; solar flares, novae
- Biosciences: rare specimens; fossils

b1 Location-independent
- Chemistry/physics: most data
- Geo-/astro-sciences: minerals; global tectonics
- Biosciences: most data, excluding extraterrestrial

b2 Location-dependent
- Chemistry/physics: (no entry)
- Geo-/astro-sciences: rocks, fossils; astronomical data; meteorological data
- Biosciences: rare specimens; fossils

c1 Primary observational or experimental data
- Chemistry/physics: optical spectra; crystallographic F-values
- Geo-/astro-sciences: seismographic records; weather charts
- Biosciences: physiological data (e.g., respiration rates, blood volumes, etc.); biochemical data (e.g., composition of tissues and organs)

c2 Combinations of primary data with the aid of a theoretical model
- Chemistry/physics: fundamental constants; crystal structures
- Geo-/astro-sciences: fossil zoning; temperature distribution in sun
- Biosciences: genetic code; body surface area; model of vascular bed; dimensions of tracheobronchial tree

c3 Data derived by theoretical calculation
- Chemistry/physics: molecular properties calculated by quantum mechanics
- Geo-/astro-sciences: solar eclipses predicted by celestial mechanics
- Biosciences: prediction of phenotypic expression from genotypes

d1 Determinable data
- Chemistry/physics: most macroscopic data
- Geo-/astro-sciences: elements of planetary orbits
- Biosciences: gene loci; chromosome numbers

d2 Stochastic data
- Chemistry/physics: polymer data; structure-sensitive properties
- Geo-/astro-sciences: soil and rock composition; solar flares; frequency of visible meteors per unit interval
- Biosciences: most data

e1 Quantitative data
- Chemistry/physics: most data
- Geo-/astro-sciences: seismic data; meteorological data
- Biosciences: physiological data; biochemical data

e2 Semi-quantitative data
- Chemistry/physics: Mohs hardness scale
- Geo-/astro-sciences: wind force scale
- Biosciences: (no entry)

e3 Qualitative data
- Chemistry/physics: chemical structural formulae; properties of nuclides
- Geo-/astro-sciences: rock classification; classification of stellar spectra
- Biosciences: fossil shapes; amino acid sequences; taxonomic classification of organisms

f1 Data presented as numerical values
- Chemistry/physics: (no entry)
- Geo-/astro-sciences: meteorological data
- Biosciences: physiological data; biochemical data

f2 Data presented as graphs or models
- Chemistry/physics: phase diagrams; stereoscopic molecular diagrams; molecular models
- Geo-/astro-sciences: geological maps; weather maps; sky mapping at a particular radio frequency (e.g., 21 cm)
- Biosciences: metabolic pathways; electrocardiograms; electroencephalograms

f3 Symbolic data
- Chemistry/physics: (no entry)
- Geo-/astro-sciences: lithology in bore hole data
- Biosciences: (no entry)

Note: A given group of data can be categorized simultaneously by several "facets" a, b, c, etc.; for instance, the nature of meteorological data is characterized as a2, b2, c2, d2, e1, and f1 (or f2).
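The faceted classification in the note can be sketched as a simple grouping of codes by facet letter. This is a hypothetical illustration, not part of the CODATA report: the `FACETS` mapping only lists the codes that appear in the table, and the function name `classify` is an assumption.

```python
from typing import Dict, Iterable, Set

# Facet codes as they appear in the table (a1 ... f3).
FACETS: Dict[str, Set[str]] = {
    "a": {"a1", "a2"},          # repeatable vs. measured only once
    "b": {"b1", "b2"},          # location-independent vs. location-dependent
    "c": {"c1", "c2", "c3"},    # primary, combined with a model, or calculated
    "d": {"d1", "d2"},          # determinable vs. stochastic
    "e": {"e1", "e2", "e3"},    # quantitative, semi-quantitative, qualitative
    "f": {"f1", "f2", "f3"},    # numerical values, graphs or models, symbolic
}


def classify(codes: Iterable[str]) -> Dict[str, Set[str]]:
    """Group facet codes by facet letter, rejecting codes not in the table."""
    grouped: Dict[str, Set[str]] = {}
    for code in codes:
        facet = code[:1]
        if code not in FACETS.get(facet, set()):
            raise ValueError(f"unknown facet code: {code!r}")
        grouped.setdefault(facet, set()).add(code)
    return grouped


# Meteorological data, as characterized in the note (f1 or f2).
meteorological = classify(["a2", "b2", "c2", "d2", "e1", "f1", "f2"])
print(meteorological)
```

Allowing a set of codes per facet accommodates cases such as "f1 (or f2)", where the same data may be presented in more than one form.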

4. Evaluation and quality control

The quality of experimental or observational data may vary widely, depending on the care taken by the scientist who did the research. Furthermore, most measurements depend on some form of calibration, which can change over the years. The risk has long been recognized, especially in the physical sciences, of assuming a piece of data taken from the literature is valid without further checking. A distinct methodology of data analysis and evaluation has evolved, leading to compilations of "evaluated data" that can be used with confidence by the general scientific community. The details of the methodology vary with the type of data but it usually includes a careful study of the way the measurement was made (as described by the author); application of various corrections needed because of changes in temperature scale, fundamental constants, and the like; and comparison with applicable theory. Ideally, this evaluation procedure is applied systematically to a large body of data, so that any discrepant numbers are more visible.
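One reason systematic evaluation makes discrepant numbers more visible is that many reported values of the same quantity can be compared at once. The sketch below flags values that lie far from the median, using the median absolute deviation as a robust yardstick. This is a generic statistical screen introduced here for illustration, not the evaluation methodology used by any data centre; the function name, the threshold of 3.5, and the sample values are all assumptions.

```python
import statistics
from typing import Dict, List


def flag_discrepant(reported: Dict[str, float], threshold: float = 3.5) -> List[str]:
    """Return the sources whose reported value lies far from the median.

    Uses a modified z-score based on the median absolute deviation (MAD).
    """
    values = list(reported.values())
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # More than half of the values coincide; anything else stands out.
        return [src for src, v in reported.items() if v != med]
    return [
        src for src, v in reported.items()
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical literature values of one property from five sources.
literature = {"A": 80.08, "B": 80.10, "C": 80.09, "D": 80.11, "E": 81.50}
print(flag_discrepant(literature))
```

A flagged value is only a candidate for closer study; the evaluator would still examine how the measurement was made and apply any needed corrections before accepting or rejecting it.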

This approach to quality control is not so easily applied in the geosciences and biosciences because of the different nature of the data, as already discussed. The most important consideration is to establish quality control before the experiment or observation is made. Thus the calibration of the instruments should be carefully documented and a valid statistical design established. Nevertheless, an independent peer review after the results are published often turns up errors and inconsistencies.

5. Traditional access mechanisms

Until the present generation, most scientific data were stored as ink on paper, in the form of tables of numbers or graphs. These data can be accessed in the archival research literature that is preserved in major libraries. Some journals maintain depositories, often as microfilm, where authors can put additional data too voluminous to print in a journal article. However, retrieving data from the primary literature is not an easy task, even with the help of abstracting services. Most abstracting and indexing services are oriented more to concepts, ideas, and theories than to the data content of a paper.

In the physical sciences, the need to aggregate and organize the data in the primary literature became apparent more than a century ago. The great German handbooks, such as Beilstein, Gmelin, and Landolt-Börnstein, were started at that time in order to give scientists easier access to data. This represented a great advance, and the handbooks still function today. Another important step occurred in the 1920s, when the International Critical Tables were published [13]. This project introduced the idea of critical evaluation and selection of the best data, rather than simply recording all data found in the literature. More recently, other publication outlets for evaluated data in physics and chemistry have appeared. The Journal of Physical and Chemical Reference Data was started in 1972 as a joint project of the American Chemical Society, American Institute of Physics, and the National Bureau of Standards. This journal publishes papers with recommended data based upon an evaluation of all pertinent values found in the literature; the method of evaluation and criteria for selecting the data are fully documented. Somewhat similar publication series have been started in Germany and in the former Soviet Union. International organizations such as CODATA, IUPAC (the International Union of Pure and Applied Chemistry), and IUCr (the International Union of Crystallography) have also published many high-quality data books. Such efforts not only make data easier to locate but also assure that the user gets the most reliable values.

Much data in the geo- and biosciences is also preserved in the primary literature and in handbooks and compilations. In addition, certain types of data have traditionally been kept at the site of the measurements or in special depositories. Museums and culture collections are important repositories for biological data. Astronomical observatories have collections of photographic plates of stellar observations going back many years. The system of World Data Centers was set up by the ICSU at the time of the 1957 International Geophysical Year to preserve data from various geophysical observations, including earthquakes, solar flares, and tidal waves. An important feature of the World Data Centers is that records are duplicated at several sites throughout the world, in order to protect against loss of data through a disaster at one of the centres.

Patterns for storing and making accessible scientific data, even before the computer age, were therefore quite diverse. Many directories have been prepared to help scientists locate data, especially data in fields outside their own specialties. Two recent efforts of this type can be mentioned. CODATA has produced a CODATA Referral Database in computerized form that contains descriptions of data centres and depositories throughout the world [3]. It does not supply factual data but is intended to guide a user to organizations that can possibly provide the data needed; many of these data sources are particularly oriented to developing countries. The International Council for Scientific and Technical Information (ICSTI) has recently published a directory of numerical databases [11] that is also a useful guide.

