As systems become large, basic issues of organization assume an importance of their own. As long as the total size of the system is much smaller than the effective size of the machine, one can be fairly cavalier about simply putting things together and running them. But with very large systems, new issues arise as to how the codes are to be accommodated on the machine. Depending on the size and characteristics of both the system and the machine, the consequences of trying to ignore the impact of size may be as gross as vastly increased costs or inability to run at all, or as subtle as slightly reduced performance or response.

Independently of what the host machine or operating system looks like, it is generally desirable to keep the host system's perception of how large the applications system is in operation (the working set) as low as possible. At the same time, the user's perception of application system size should be one of a vast and growing collection of capabilities and functions. While the user will be happy to see more facilities in the system, that happiness will disappear quickly if the presence of additional facilities causes last year's task to operate more slowly or at higher cost.

A system may be thought of as having two separate "sizes": one determined by the machine resources required to execute its individual programs, and the other determined by its total extent when stored. If a system has many and diverse capabilities, no single user is going to use more than a small fraction of the system on any given day, or even in one lifetime. As a result, the first of these sizes must depend on what the user is doing in a particular session - the programs being run - rather than on the total system size (the second type of measurement). Very few operating systems provide facilities that make this easy, and simulating it in an application is quite difficult.
An application can simulate it, however, and a few existing systems and packages have managed to do so, but it is not to be taken lightly and requires an intimate understanding of the operating environment that must be tricked.
An application that provides linkage facilities that depend only on what programs the user asks for, as they are asked for, has several advantages in addition to keeping the bills down, the performance up, and the machine requirements at a minimum for the user's application. Among the most important of these is the ability to utilize user-supplied codes to supplement system facilities without having to create private copies of the system or major components of it. Such copies are a liability, not merely because of the costs or inefficiency they entail and the burdens they place on the user, but because they promote retention of ancient versions of systems and codes and subsequent conversion problems.
Very large virtual memories may provide some of the facilities that will help in tricking an operating system into behaving well when confronted with large systems, but they are not in themselves a solution. Misuse of such memories can introduce severe performance degradation. The major benefit of large virtual memories, in addition to programming convenience, is that as application size limits are reached the system degrades more or less gracefully; in the absence of virtual memories, the same situation leads to catastrophic system collapse. The goal and the challenge should be to avoid both alternatives, especially when the amount of software and data actually in use falls well within the host system's natural limits for application sizes.
The issue of stability is very important if the system is to be large and to have a long life expectancy. The problem is similar to that of advanced operating systems: the developers and their staff want things to be better, current, and as sophisticated and clever as possible; programmers want a completely stable interface (this is especially true of those amateur users who write a little code to add one small facility to the system to make it perfect for their purposes). Users also believe that they want commands and results that are predictable: what worked last year should work today in exactly the same way. On the other hand, they also want the programs, algorithms, functional capabilities, and data to reflect last month's journal article. Of course, no one but developers sees incompatibilities among these objectives.
An obvious possibility is to make a rule that a routine, once installed under a particular name, has the same behaviour forever, and that new things, or new versions of old things, must have new names. This approach is easier to apply to a subroutine library - especially one from which users are permitted to copy and embed "obsolete" routines that have been replaced by newer ones with different names - than to a major integrated collection, where it requires a great deal of discipline on the part of the maintainers and tolerance on the part of the users. Unless the architectural and linkage problems have been solved exceptionally well, the accumulation of semi-obsolete codes will also cause degradation and high disk costs much sooner than if the system had grown through an extension and replacement strategy alone.
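The install-once rule can be made concrete with a small sketch. In this hypothetical Python registry, a name, once bound to a routine, can never be rebound; a revised algorithm must be installed under a new name, so everything that already uses the old name keeps its old behaviour. The routine names are invented for illustration.

```python
class RoutineLibrary:
    """Install-once library: a name, once bound, never changes meaning."""

    def __init__(self):
        self._routines = {}

    def install(self, name, func):
        # Refuse to rebind: the new version must take a new name.
        if name in self._routines:
            raise ValueError(
                f"'{name}' is already installed; give the new version a new name")
        self._routines[name] = func

    def call(self, name, *args):
        return self._routines[name](*args)

lib = RoutineLibrary()
lib.install("mean", lambda xs: sum(xs) / len(xs))
# A revised algorithm is installed under a new name; "mean" is untouched,
# so last year's analyses still compute exactly what they used to.
lib.install("mean.trimmed", lambda xs: sum(sorted(xs)[1:-1]) / (len(xs) - 2))
```

The cost noted above shows up directly: `mean` can never be retired, so the library only ever grows unless obsolete names are eventually pruned by some separate, disciplined process.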
While the stress here has been on programs rather than on data, the problems with data are quite similar. The ability to reproduce an old result may require that the associated data be retained forever (or nearly so), even when they have been found to be out of date or substandard. There have been a few cases in which users rather carefully built procedures to compensate for data inadequacies they were aware of, only to have those procedures produce seriously incorrect answers when the original data values were replaced with "corrected" ones. There are strong arguments for being able to associate particular vintages of data with particular analyses or studies, but such requirements impose great difficulties on the system's design and its human managers. So the data and program problems are not very much different after all.
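One way to reconcile reproducibility with corrections is to make data append-only and vintage-addressed: a correction adds a new vintage rather than overwriting the old values, and an analysis can record, and later re-run against, the vintage it used. The following Python sketch is purely illustrative; the class, labels, and values are hypothetical.

```python
class VintagedSeries:
    """Append-only data series: corrections add vintages, never overwrite."""

    def __init__(self):
        self._vintages = []  # list of (label, values); never edited in place

    def publish(self, label, values):
        self._vintages.append((label, list(values)))

    def latest(self):
        return self._vintages[-1]

    def as_of(self, label):
        for vintage_label, values in self._vintages:
            if vintage_label == label:
                return values
        raise KeyError(label)

series = VintagedSeries()
series.publish("1982", [3.1, 2.8, 4.0])
# The correction becomes a new vintage; the 1982 values remain addressable.
series.publish("1983-corrected", [3.1, 2.9, 4.0])

# An old analysis that recorded its vintage reproduces its original result.
old_mean = sum(series.as_of("1982")) / 3
```

New work naturally reads `latest()`, while an old study keyed to "1982" is insulated from the correction - exactly the behaviour the compensating-procedure anecdote above calls for.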
Many of the problems - with programs if not with data - that have been discussed here can be avoided by designing a system around primitive tools that provide no more facility than is necessary for a user to put things together to produce the computations needed. Such a system provides adequate facilities for the right user, tends to be very extensible, and can typically be kept very small in spite of being integrated and powerful. Most important, such systems are conceptually very simple. However, they do tend to be disastrous for unsophisticated users, and even sophisticated users spend too much time fussing with the tools themselves. In a rich environment, that fussing often has more to do with the process of moving objects back and forth - looking for tools to make square pegs fit round holes - than with anything substantively interesting. Further, all other things being equal, systems of primitives tend to be slower in operation than higher-level integrated systems, and are sometimes so slow as to result in poor response rather than merely poor resource consumption. Nonetheless, such an architecture may be a reasonable choice for some audiences.
The critical challenge in developing a large integrated system is the same as the critical challenge in developing any system: figuring out what the real goals and requirements are, and what is to be sacrificed in order to get there. The costs, complexities, and additional problems that arise when systems become very large, or when the integration requirements become more stringent, are sufficiently major that the decision to build such a system should be justified on the basis of real requirements.
Once the decision to build a large and integrated system has been made, and the necessary resources for planning, building prototypes and tools, examining design issues, and actually building the system have been secured, the short-term implications of a variety of questions that really make sense only in the long term must be considered. Each of these questions and issues poses a significant challenge for which there are no clear answers that are right for all cases. Those discussed in this paper that directly affect the development process itself include the choice of the operating base, the language of implementation, organization of the system, and how stability and growth are to be managed and accommodated over time. More user-related issues include documentation, command languages, presentation and output, and how to permit users to extend the system when needed. A final design concern is how to contain damage resulting from incorrect decisions, which are inevitable no matter how much care is taken. The challenges are nearly overwhelming, but can and must be met.
1. American National Standards Institute, Proposed American National Standard Programming Language BASIC, ANSI BSR X3.113-198X, 1983.
2. American National Standards Institute, American National Standard Programming Language COBOL, ANSI X3.23-1974. Revision (ANSI BSR X3.23-198X, June 1983; also ISO DP 1989.2) undergoing public review, autumn 1983.
3. American National Standards Institute, American National Standard Programming Language PL/I, ANSI X3.53-1976. Equivalent document approved by ISO, as ISO 6160-1979.
4. American National Standards Institute, American National Standard for Minimal BASIC ANSI X3.60-1978.
5. American National Standards Institute, American National Standard Programming Language PL/I General Purpose Subset, ANSI X3.74-1981. Equivalent document under ISO review as ISO DP6522.
6. American National Standards Institute, American National Standard Programming Language FORTRAN, ANSI X3.9- 1978. Equivalent document approved by ISO, as ISO 1539- 1980.
7. K. N. Berk and I. Francis, "A Review of Manuals for BMDP and SPSS," J. Amer. Stat. Assoc., 73, 361: 65-71 (1978).
8. S. Buhler and R. Buhler, p-STAT 78 User's Manual (P-STAT Inc., Princeton, N.J., 1979).
9. G. J. Culler and B. D. Fried, "The TRW Two-Station, On-Line Scientific Computer: General Description," in M. A. Sass and W. D. Wilkinson, eds., Computer Augmentation of Human Reasoning (Spartan Books, Washington, D.C., 1965), pp. 66-67.
10. R. Dawson and J. C. Klensin, "User Extensions to Statistical Software," in American Statistical Association, 1980 Proceedings of the Statistical Computing Section (American Statistical Association, Washington, D.C., 1980), pp. 332-334.
11. US Department of Defense, Reference Manual for the Ada(R) Programming Language, ANSI/MIL-STD-1815A-1983. (Ada is a registered trademark of the US Department of Defense.)
12. I. Francis, R. Heiberger and P. Velleman, "Criteria and Considerations in the Evaluation of Statistical Packages," Amer. Stat., 29: 52-56 (1975).
13. International Mathematical and Statistical Libraries, Inc., IMSL Library 2 Reference Manual (IMSL, Houston, Tex., 1975).
14. International Standards Organization, International Standard Programming Language - Pascal, ISO 7185, 1983. Also published in the US as ANSI/IEEE770X3.97-1983 and in the UK as BS 6192-1982.
15. J. C. Klensin, "Short-term Friendly and Long-term Hostile?" paper presented at the Conference on Easier and More Productive Use of Computer Systems, 20-22 May 1981, Ann Arbor, Mich., and reprinted in SIGSOC Bulletin, 13, 2-3: 105-110 (1982).
16. M. R. Muller, "A Review of Manuals for BMDP and SPSS," J. Amer. Stat. Assoc., 73, 361: 71-80 (1978).
17. Numerical Algorithms Group, NAG FORTRAN Library Manual, Mark 10 (Numerical Algorithms Group, Oxford, 1983).