

This is the old United Nations University website. Visit the new site at http://unu.edu


Session 4 : Intelligent access to information: Part 2


Machine translation
The new world of computing: The sub-language paradigm
Real-world computing and flexible information access: MITI's new programme
Discussion


Chairperson: Meinolf Dierkes

Machine translation


Abstract
1. A brief history of machine translation
2. System configurations
3. Ability of current machine translation systems
4. Introduction and use of machine translation
5. Evaluation factors of machine translation systems
6. Japanese machine translation systems
7. Japanese governmental efforts
8. Dictionary
9. State of the art in Europe and the United States
10. The international association for machine translation
11. The future of MT


Makoto Nagao

Abstract

After a quick review of the history of machine translation, the paper sketches the machine translation process. The capacities of currently available systems are indicated. The installation and use of these systems are described, as are the factors to be considered in their evaluation. The Japanese efforts and experience, as well as international experience and prospects, are outlined.

1. A brief history of machine translation

The use of computers for translation was proposed by Weaver around 1945, at the very beginning of digital computer development. During the 1950s, there were several research efforts on machine translation. A famous example is the Georgetown University-IBM joint project on machine translation from Russian into English. The US Government invested a significant amount of research money in this area in the late 1950s and early 1960s, mainly to develop the Russian-English system. Around 1960, 20-30 widely scattered research groups in the United States were involved with machine translation and natural language processing. However, several years of research effort did not produce really usable translation results, and therefore the US Government stopped funding machine translation research and development in 1966. After that, the boom of machine translation there faded quickly, and US researchers turned to the more basic field of computational linguistics. Europe, Japan, and the Soviet Union started machine translation research in the middle of the 1950s. The Electrotechnical Laboratory in Tsukuba, Japan, demonstrated English-Japanese machine translation in 1959. Kyushu University constructed a machine translation computer that translated Japanese to English or German and the reverse. I started research on machine translation and natural language processing in 1961 when I finished my M.Sc. degree and entered research life at Kyoto University.

At that time, we had many difficulties in natural language processing by computer because we could not handle Chinese characters. We had to struggle for more than 10 years to solve these difficulties. By around 1975, we had arranged for relatively easy Chinese character input and output. But really satisfactory input/output was finally reached with the widespread usage of Japanese word processors at the beginning of the 1980s. Although we encountered many difficulties, we completed a comparatively good machine translation system called TITRAN in 1978. This translated the titles of research papers from English into Japanese. It was used on a trial basis by the researchers at Tsukuba Science City to quickly scan current research information. The success of this system encouraged the development of Japanese-English and Japanese-French title translation systems, which were completed in 1980 and 1981, respectively.

The Japanese Government, attracted by the success of this title translation, launched a national project of machine translation that aimed at the translation of abstracts of scientific and technical papers between Japanese and English. The project started in 1982 and lasted four years. I was the project leader, and four organizations, namely Kyoto University, Electrotechnical Laboratory, Tsukuba Information and Computing Center, and Japan Information Center of Science and Technology (JICST), cooperated with each other. We constructed two prototype systems, one from Japanese to English and the other from English to Japanese. After the project was completed, the Japanese to English system was further improved at JICST, and the dictionary was enlarged from 70,000 words to 540,000 words and is now in daily operation at JICST. An abstract that is composed of about 300 Japanese characters on the average is translated in 25 seconds. Some pre-editing and post-editing are done by outside companies, which require about three weeks for 1,000 abstracts. This translation speed is more than twice that of human translation. The total cost of translation by machine is about 60 per cent that of human translation. JICST is now translating about 91,000 abstracts of Japanese scientific and technical documents into English in a year. These translated abstracts are in the JICST database and can be accessed from the United States and Europe as well as from inside Japan.
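As a rough check on the JICST figures quoted above, the raw machine time can be worked out directly. This is only a back-of-envelope sketch: it assumes the abstracts are translated one after another on a single machine, which the text does not state, and it ignores pre-editing and post-editing time. The function names are chosen here for illustration.

```python
# Figures quoted in the text above.
SECONDS_PER_ABSTRACT = 25      # machine time for one abstract
CHARS_PER_ABSTRACT = 300       # average length in Japanese characters
ABSTRACTS_PER_YEAR = 91_000    # annual volume at JICST


def machine_hours_per_year(abstracts: int, seconds_each: float) -> float:
    """Total raw machine translation time in hours, ignoring all editing."""
    return abstracts * seconds_each / 3600


def chars_per_second(chars: int, seconds: float) -> float:
    """Raw machine throughput in characters per second."""
    return chars / seconds


hours = machine_hours_per_year(ABSTRACTS_PER_YEAR, SECONDS_PER_ABSTRACT)
rate = chars_per_second(CHARS_PER_ABSTRACT, SECONDS_PER_ABSTRACT)
print(f"{hours:.0f} machine hours per year, {rate:.0f} characters per second")
```

Under these assumptions the machine itself is busy for only a small part of the year, which is consistent with the text's point that the editing stages, not the raw translation, dominate the turnaround time.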

This national project had a great impact on Japanese industry. Many computer companies and other information-related companies started the development of machine translation systems in the early 1980s. Nowadays, there are about 10 commercial machine translation systems, some from Japanese to English and others from English to Japanese. Furthermore, several other companies are developing machine translation systems for commercialization.

Figure 1 Machine translation process

2. System configurations

The machine translation process is shown in figure 1. Morphological analysis determines the word form, including inflections, tense, number, part of speech, and so on. Syntactic analysis determines which word is the subject, which the object, and so on. Semantic and contextual analysis determines the correct interpretation of a sentence from the multiple results produced by the syntactic analysis. Syntactic and semantic analyses are very often combined and executed simultaneously. The result is the internal representation of a sentence. The internal representation in the target language is often the same as that of the source language, but sometimes a change in the internal representation is required. The generation phase is just the reverse of the analysis process.
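The stages described above can be illustrated with a deliberately tiny sketch. It is not the design of any system named in this paper: the three-word lexicon, the romanized Japanese equivalents, the fixed noun-verb-noun sentence pattern, and all function names are assumptions made for the example, and the semantic and contextual analysis stage is omitted because the toy input is unambiguous.

```python
from dataclasses import dataclass


@dataclass
class Token:
    surface: str  # word as it appears in the input
    lemma: str    # dictionary form
    pos: str      # part of speech


# Hypothetical source-language dictionary: surface form -> (lemma, part of speech).
LEXICON = {
    "dogs": ("dog", "NOUN"),
    "cats": ("cat", "NOUN"),
    "chase": ("chase", "VERB"),
}

# Hypothetical transfer dictionary: English lemma -> romanized Japanese word.
TRANSFER = {"dog": "inu", "cat": "neko", "chase": "ou"}


def morphological_analysis(sentence: str) -> list[Token]:
    """Look up the dictionary form and part of speech of each word.

    Raises KeyError for a word that is not in the toy lexicon.
    """
    return [Token(w, *LEXICON[w]) for w in sentence.lower().split()]


def syntactic_analysis(tokens: list[Token]) -> dict[str, str]:
    """Assign subject, verb, and object, accepting only NOUN VERB NOUN input."""
    if [t.pos for t in tokens] != ["NOUN", "VERB", "NOUN"]:
        raise ValueError("toy grammar only covers NOUN VERB NOUN sentences")
    subject, verb, obj = tokens
    return {"subject": subject.lemma, "verb": verb.lemma, "object": obj.lemma}


def transfer(representation: dict[str, str]) -> dict[str, str]:
    """Replace source lemmas with target words; the structure is unchanged."""
    return {role: TRANSFER[lemma] for role, lemma in representation.items()}


def generate(representation: dict[str, str]) -> str:
    """Produce subject-object-verb order with the particles 'ga' and 'o'."""
    r = representation
    return f"{r['subject']} ga {r['object']} o {r['verb']}"


def translate(sentence: str) -> str:
    tokens = morphological_analysis(sentence)
    return generate(transfer(syntactic_analysis(tokens)))


print(translate("Dogs chase cats"))  # inu ga neko o ou
```

Here the internal representation is the same role dictionary on both sides, which corresponds to the common case mentioned above; a real system would also have to change that structure for some sentence types.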

Present-day machine translation systems are still quite imperfect, and we must be skilled to use such imperfect systems. However, if they are used with care we can profit from machine translation in both translation speed and translation cost. The technology will make rapid progress, experience will be accumulated, and grammars and dictionaries will be improved step by step. Therefore, we should not underestimate the usefulness and the cost-effectiveness of machine translation in the near future.

A machine translation system consists of several components, as shown in figure 2. For the analysis of a source language sentence, a set of grammar rules and the dictionary of the source language are necessary. The same applies for the target-language generation.

The transfer dictionary contains information on the correspondence between the source language word (phrase) and the target language word (phrase). The rules of grammar for a language usually range from a few hundred to more than one thousand, and dictionaries include 50,000 or more ordinary words and many special terms for a subject field.
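Because the transfer dictionary holds phrase entries as well as word entries, a lookup has to decide which entry applies. The sketch below uses greedy longest-match, which is one common choice and an assumption here, not a method described in this paper. The entries and the convention of marking unknown words in angle brackets are also invented for illustration.

```python
# Hypothetical transfer dictionary. Keys are tuples of source words so that
# single words and multi-word phrases can live in the same table.
TRANSFER_DICT = {
    ("machine", "translation"): "kikai hon'yaku",
    ("machine",): "kikai",
    ("translation",): "hon'yaku",
    ("dictionary",): "jisho",
}
MAX_PHRASE_LEN = max(len(key) for key in TRANSFER_DICT)


def lookup(words: list[str]) -> list[str]:
    """Greedy longest-match lookup: prefer a phrase entry over word entries."""
    output = []
    i = 0
    while i < len(words):
        # Try the longest possible phrase starting at position i first.
        for n in range(min(MAX_PHRASE_LEN, len(words) - i), 0, -1):
            key = tuple(words[i:i + n])
            if key in TRANSFER_DICT:
                output.append(TRANSFER_DICT[key])
                i += n
                break
        else:
            # No entry at all: flag the word so a post-editor can find it.
            output.append(f"<{words[i]}>")
            i += 1
    return output


print(lookup(["machine", "translation", "dictionary", "rules"]))
```

The flagged unknown word shows why dictionary enhancement with special terms matters so much in practice: every missing entry becomes work for the post-editor.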

One of the obstacles in machine translation systems is text input. If electronic media such as floppy disks are the usual way of text input, there is no problem. But if printed pages are to be input, either human typing or optical character reading (OCR) has to be used. In the first case, the speed and the expense of typists are the problem. In the other, the problem is the expense of an OCR and the proofreading and error correction by humans. Japanese text input is very difficult for European or American organizations. They can use Japanese OCR, but they have to employ Japanese persons for error correction tasks.

Machine translation is automatic, but the translation output is poor in quality. Errors are included in translation and sometimes no translation comes out at all. The machine gives up on the translation when an input sentence is too complex. Therefore, post-editing is an essential task for current machine translation systems. For human translation, two-stage translation is often adopted; namely raw translation and brushing-up. Post-editing in machine translation may correspond to the brushing-up process. Some brushing-up is required even for human translation, and this is, of course, much heavier in machine translation. Post-editing is done either on printout paper or on a computer screen. The latter is more profitable and economical because no printout is required during error correction. Editing, such as the insertion of figures and tables, is also very easy on computer screens nowadays. Page formatting can be done very conveniently on the computer, and only the final results need be printed out.

Pre-editing is sometimes required. In particular, the segmentation of a long sentence into two or more shorter sentences is very often performed. Pre-editing and post-editing have a certain correlation. When a heavy or elaborate pre-editing is performed, very simple post-editing is sufficient, and vice versa. However, two persons have to be employed in this case, one for pre-editing who knows the source language, and the other for post-editing who is a native speaker of the target language. In any case, the post-editing is unavoidable, and the overall cost of pre-editing and post-editing is generally much more than that for elaborate post-editing alone. Therefore, many machine translation users do only post-editing.
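The segmentation step can be pictured with a simple mechanical sketch. Real pre-editing is done by a person who rewrites the pieces into complete sentences; the function below only splits at punctuation and is an illustration under assumed parameters. The 60-character limit, the delimiter set, and the function name are all hypothetical.

```python
import re


def segment(sentence: str, max_chars: int = 60, delimiters: str = ",;、") -> list[str]:
    """Split a long sentence at delimiters, keeping pieces within max_chars
    where possible. A single clause longer than max_chars is left whole."""
    if len(sentence) <= max_chars:
        return [sentence]
    pattern = f"[{re.escape(delimiters)}]"
    clauses = [c.strip() for c in re.split(pattern, sentence) if c.strip()]
    pieces = []
    current = ""
    for clause in clauses:
        candidate = f"{current}, {clause}" if current else clause
        if not current or len(candidate) <= max_chars:
            current = candidate  # clause still fits in the current piece
        else:
            pieces.append(current)  # close the current piece, start a new one
            current = clause
    if current:
        pieces.append(current)
    return pieces


long_sentence = (
    "The pump is started by the operator, the valve is opened slowly, "
    "the pressure is checked on the gauge"
)
for piece in segment(long_sentence, max_chars=40):
    print(piece)
```

Each piece would then be submitted to the translation system as a separate input, at the cost of extra work on the source-language side.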

Figure 2 Machine translation system

The organizations that introduce machine translation systems must not imagine a situation in which "everybody who needs copy comes to a copying machine, puts in documents, presses a button, and receives copies." Present-day machine translation systems are not that simple. Organizations must employ professionals to operate a system with pre-editing, post-editing, dictionary maintenance, and so on. An organization must sometimes change the flow of its documents to achieve the maximum from a machine translation system. Such changes include the training of people in document-writing sections to write in a clear, unambiguous style. This is ultimately profitable to both human readers and the translation machine.

3. Ability of current machine translation systems

Present-day machine translation systems are not perfect, so that we have to be very careful about the use of these systems. Otherwise we cannot achieve the anticipated productivity. One of the important factors in the use of machine translation systems is the text categories for machine translation. Headlines of news articles and the titles of papers can be translated rather easily if the subject areas of these texts are specifically restricted to a narrow domain. Technical information and news articles are suitable for machine translation without post-editing if the readers do not care about the sentence structure of translated materials and just want a rough idea of what is happening (information-gathering reading). The most widely used text category for machine translation is the operation and maintenance manuals of industrial products, such as computers, automobiles, nuclear plants, and so on. The texts themselves are rather simple, but the volume is enormous. Therefore machine translation is inevitable.

There are increasing demands for the translation of scientific and technical papers all over the world. Japan produces a lot of scientific and technical information every year, but almost all of the information is published in Japanese, and many foreigners say that Japan is protected by a language barrier. We are making great efforts to break this barrier by developing Japanese-to-English machine translation systems. There are many demands for the translation of legal texts, contract documents, and patent documents from Japanese into English. These are very hard for machines to translate, because the translation must be very accurate. The sentence structures of these documents are rather specific, usually very long and so complex that people must sometimes read them several times to grasp their precise meaning. Machines cannot handle such sophisticated sentences and generally fail at translation. Literary works and dialogue sentences are the most difficult categories of all for machines because they are mentally sophisticated, contain lots of omissions, and presuppose the implicit application by the reader of world knowledge.

We can also classify sentence categories by their difficulty from the standpoint of sentence or text types. Sentences that express only facts are very easy for machines to understand and translate. Sentences that include time relations, expectations, assumptions, conditions, etc., fall into the next category for machine translation. Then come sentences that describe the speaker's intention and mental state. These are sometimes very difficult to translate because the interpretation must include discourse information. Present-day systems translate input sentences one by one independently. They do not have the ability to see the interrelationship among adjacent sentences. Therefore, the translation of those sentences that require contextual information is very imperfect, although the minimum discourse phenomena, such as anaphora and ellipsis, are treated to a certain extent.

Other information that is required in the understanding of sentences is world knowledge, everyday-life knowledge, or common-sense knowledge. This information is particularly needed for the interpretation of dialogue, which presupposes the listener's knowledge. A new model for interpretation and translation has to be developed for such cases that includes speaker and listener models in the translation system. Intensive research is being done on this topic at ATR, the Automatic Translation Telephony Research Institute located in Kyoto, Japan. There are many sentences that include metaphors and culture-specific expressions. We still do not know how to handle these expressions, which are heavily connected to world knowledge. The study has just started.

Present-day machine translation systems are based on the compositionality principle; that is, the translation is based on a one-to-one correspondence with the words and phrases in the original input and can rarely use the information implicitly given in the sentence. In other words, the system transforms partial word sequences of an input sentence mechanically into different forms again and again before arriving at the final translation. When a given sentence is long, the process becomes very complex and very often fails. According to one statistic, Japanese sentences of fewer than 20 characters are translated successfully more than 80 per cent of the time, those of more than 60 characters less than 30 per cent of the time, and almost all sentences of more than 100 characters are untranslatable by machine. However, we have recently developed a very interesting method for conquering this difficulty, and the success rate for sentences of more than 100 characters has risen to 66 per cent.

4. Introduction and use of machine translation

In previous sections it was mentioned that machine translation systems are very complex and still incomplete and that their users must be very careful in their introduction and use. Here more details will be given.

First, the projected use for machine translation must be clearly established. Then the kinds of documents must be specified, and the expected quality, speed, and cost of translation must be clarified.

When a system is introduced in an organization, it cannot achieve its full potential the next day. At least two specialists must be assigned to the possible pre-editing and to the post-editing. They need to have backgrounds in linguistics or the languages that the machine will translate. They must be trained for a few days for operations such as pre-editing, post-editing, and dictionary change and enhancement. Special terms in the subject field of the documents must be collected, and the dictionary must be enhanced. The styles of the sentences in the input and output documents must be carefully studied, and translation equivalents must be determined. The grammar must sometimes be changed to fit the document styles. The special terms used in documents in some subject fields may total more than 10,000. A substantial number of documents have to be translated on a trial basis, and the dictionary and the grammar must be improved. This test period will last at least half a year and may sometimes exceed two years, depending on the type of texts.

Users must measure the translation rate, speed, post-editing time, and estimated cost during this test period. Users must always compare the achievement of the machine with that of human translation. When the user of a system has come to the conclusion that the overall cost of running the system is no more than that of human translation, or that machine translation has become the more rapid method, then the operational stage will start. During the operational stage, the system must constantly be improved by feedback from the post-editor, who is in the best position to catch the weak points of the system.

A few users in Japan have reported a 30 per cent decrease in translation cost by machine compared to that of human translation, as well as a 30 per cent speed-up in overall translation time. The latter depends heavily on the post-editing practice. Many post-editors still prefer to work on printout pages, but post-editing on the CRT display is much more efficient and speedy. Translation speed by machine alone varies from 3,000 words to 80,000 words per hour, depending on the computer. Nowadays the shift is being made from mainframe computers to workstations, which are more convenient and cost-effective. In the latter, the raw translation speed will be between 3,000 and 10,000 words per hour. Total translation cost depends heavily on how post-editing is done; for example, on whether it is performed on printout paper or on the CRT screen, whether or not it is performed by subject specialists, and whether a high quality of translation is required.

Machine translation is expensive, particularly when the cost of possible pre-editing and of post-editing is included. Therefore, there must be a considerable and regular volume of documents throughout the year. One vendor says that 2,000 pages per year is a break-even point for machine translation. This volume depends on many factors and varies greatly case by case.
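The idea of a break-even volume can be made concrete with a simple linear cost model. This is a sketch under stated assumptions, not the vendor's calculation: it supposes a fixed yearly cost for the system (purchase, maintenance, staff training) plus a per-page cost for editing, against a flat per-page cost for human translation. All three cost figures below are invented, chosen only so that the result matches the 2,000 pages quoted above.

```python
def break_even_pages(fixed_cost_per_year: float,
                     human_cost_per_page: float,
                     machine_cost_per_page: float) -> float:
    """Yearly page volume at which machine and human translation cost the same.

    Machine cost = fixed_cost_per_year + machine_cost_per_page * pages
    Human cost   = human_cost_per_page * pages
    """
    saving_per_page = human_cost_per_page - machine_cost_per_page
    if saving_per_page <= 0:
        raise ValueError("machine translation never breaks even at these rates")
    return fixed_cost_per_year / saving_per_page


# Hypothetical figures in arbitrary currency units.
pages = break_even_pages(fixed_cost_per_year=40_000,
                         human_cost_per_page=50,
                         machine_cost_per_page=30)
print(f"break-even at {pages:.0f} pages per year")
```

The model also shows why the volume "varies greatly case by case": heavier post-editing raises the per-page machine cost, which shrinks the saving per page and pushes the break-even point up sharply.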

5. Evaluation factors of machine translation systems

There is a growing interest in machine translation. Some people have an accurate understanding of the capability of present-day machine translation systems, but many others have the idea that machine translation can perform perfectly, that it will deprive translators of jobs, or at the opposite extreme, that the machine cannot do anything significant, and that a machine translation system is useless.

For the proper use of present-day systems, there is need for a kind of common standard for the evaluation of machine translation as well as detailed knowledge of how machines translate sentences. Serious discussions have just been started about establishing such a standard, but it will take a long time for wide agreement, because there are so many factors to consider. Evaluation factors will differ according to the purposes of usage. This further complicates establishment of a standard. The following are factors that must be considered in the evaluation.

(1) The cost of purchase, operation, and maintenance; training, text input, dictionary enhancement for ordinary words and for special terminology; post-editing, final text preparation, and the relation between translation quality and cost.
(2) The speed of text input, pre-processing (including the handling of tables, figures, and so on), machine translation, post-editing, final text format arrangement (including tables, figures, and so on), and the relation between pre-editing and post-editing time.
(3) The quality in terms of fidelity, intelligibility, and naturalness of style.
(4) The capacity of the system for improvement in terms of dictionary change, enhancement of the system's main dictionary and user dictionary, grammar improvement, particularly for sub-language expressions in a specific domain, and the extent to which users are allowed to improve the dictionaries and the grammars, and to test the method after improvement.
(5) The capacity for system extension from one text field to another and from one pair of languages to another.
(6) The extent of effort required of users in terms of the number of operating staff required, including the pre-editor, post-editor, and dictionary maintenance person; the period of time required for testing and tuning, for adapting the organization's documentation system structure to machine translation, and for controlling source language expressions when authors create original texts.
(7) The capacities and limitations of the system for dealing with grammar, semantics, discourse - including anaphora and ellipsis - dictionary information, software speed, learning capability, interactive translation or post-editing, ambiguity resolution, and for producing only the best possible translation output or all the possible outputs for a sentence.

6. Japanese machine translation systems

The Japan Information Center of Science and Technology (JICST) is translating abstracts of scientific and technical papers from Japanese to English by the improved version of the Mu machine translation system that we developed in 1985. Sentences in abstracts are comparatively long, and sentence structures tend to be very complex. Therefore, a certain degree of pre-editing is performed on the original Japanese abstracts. Post-editing is also done for the translated English abstracts. There is a complementary relation between the pre-editing and post-editing. That is, when a heavy pre-editing is done, the post-editing is light, and vice versa. JICST is now measuring what degrees of pre-editing and post-editing are best from the standpoint of overall cost.

The original system had a dictionary of about 70,000 words. Among them, about 20,000 words were common words and the rest were terminology in computer and electrical engineering. This dictionary was quite insufficient for JICST because abstracts from various scientific and technical fields were to be translated. So, JICST increased the vocabulary to 200,000 words and obtained an improved but still unsatisfactory translation rate. Finally, when the vocabulary was increased to 540,000 words, the expected translation rate was achieved.

Measurement of the translation quality at JICST is very difficult because both pre-editing and post-editing are performed to achieve a tolerable quality for general readers. JICST reports that the cost of machine translation, including pre-editing and post-editing, is about 60 per cent that of human translation, and that the speed of translation has been improved significantly.

Many private companies are involved in machine translation. Although they do not yet make a profit, they invest a lot because they consider that natural language processing will be a key technology in the future information society. The systems listed below are typical ones in Japan. Some other companies are developing small systems on personal computers. Those marked with an asterisk are R&D systems.

ALT J/E: Information Processing Laboratories, NTT*
ARGO: CSK(JE)
AS-TRANSAC: Toshiba Corp.
ATLAS-I: Fujitsu, Ltd.
ATLAS-II: Fujitsu, Ltd.
CONTRAST: ETL*
DUET-E/1: Sharp Corp.
HANTRAN: CBU
HICATS E/J: Hitachi, Ltd.
HICATS J/E: Hitachi, Ltd.
KAN-TRAN III: Carozelia Japan*
LAMB: Canon Inc.*
MELTRAN-J/E: Mitsubishi Electric Corp.
MU: Kyoto University*
MU2: JICST
PAROLE: Matsushita Electric Industrial Co., Ltd.
PENSEE, PENSEE II: Oki Industrial Co., Ltd.
PIVOT: NEC Corp.
RMT E/J: Ricoh Co., Ltd.
SHALT: IBM Japan, Ltd.
SHALT/JETS: IBM Japan, Ltd.
STAR: CATENA Resource (on Unix)
SYSTRAN: Systran
THE TRANSLATOR: CATENA Resource (on Macintosh)
Translation Word Processor SWP-7800: Sanyo Electric Co., Ltd.
Transer/EJ: Nova
UNNAMED: Nippon-Data General Corp.*
XJE: SPIRIT*

Two or three nationwide on-line machine translation services via computer network are commercially used.

7. Japanese governmental efforts

The Japanese Government, particularly the Ministry of International Trade and Industry (MITI), the Science and Technology Agency (STA), and recently the Ministry of Post and Telecommunications (MPT), has been interested in machine translation. The STA in cooperation with MITI supported the national project of machine translation (Mu project) during 1982-1986. MITI was the main promoter of the First Machine Translation Summit Meeting in Hakone in 1987, which was followed by the second in Munich in 1989 and the third in Washington, D.C., in 1991. MITI realized the importance of dictionary construction for machine translation, development of which requires a long time and a heavy investment. Individual companies construct their own MT dictionaries independently, which is a waste of money. MITI established a neutral organization for MT dictionary construction named Japan Electronic Dictionary Research Institute (EDR) in 1986. Seventy per cent of the financing comes from the Government and the other 30 per cent is provided by eight major electronic companies. It constructs a kind of neutral dictionary based on concepts that are matched to lexical items and phrases of individual languages. Within the project length of nine years, the EDR will establish a dictionary of 200,000 concepts and its corresponding English and Japanese dictionaries for machine use. The project is now in the improvement stage and its quality is being tested by a limited number of users. When the dictionaries are complete, they will be available to any organization in the world at a proper price.

MITI supports another project conducted by the CICC (Center for International Cooperation for Computerization). The project includes the development of a multilingual machine translation system between the Japanese, Chinese, Malaysian, Thai, and Indonesian languages. The project started in 1987 and is continuing. They successfully demonstrated a first prototype system in 1990.

The Ministry of Post and Telecommunications supports a speech translation project, the ATR Automatic Speech Translation Telephony Research Institute, which was established five years ago. It is now located in Kansai Research Park, 30 km south of Kyoto City. The project aims at simultaneous speech translation by computer between Japanese and English. It comprises the recognition of spoken dialogue in Japanese and English, the translation of dialogue between the two languages, and the synthesis of speech from translated sentences. At present, the dialogue domain is very narrowly restricted, and present knowledge in this domain is fully utilized. The first phase of the project is almost completed, and a demonstration of speech translation with a limited vocabulary will be given shortly. The project has formulated a plan for a second stage lasting another 7-8 years.

8. Dictionary

Dictionaries are drawing increasing attention from more and more types of people. This is true not only of language dictionaries but also of terminological dictionaries in every field, and also of encyclopaedias. People are interested in learning the meaning of terms that appear in newspapers, books, TV programmes, and so on. In particular, they want not only the definition of a term they encounter but also that of related terms. People want on the one hand to know the specific details of a problem and, on the other, to grasp the total picture or to have a general understanding. This is one of the main reasons for consulting dictionaries. In the language area, a few good new language dictionaries have appeared, such as the Longman dictionaries of contemporary English and the COBUILD English Language dictionary. These dictionaries provoked discussion about what constitutes a good dictionary.

Dictionaries are becoming more and more important in the computer processing of natural language. Not only machine translation but also man-machine dialogue systems, information retrieval systems, and so on require good dictionaries for their own purposes. Here we have to clarify what kind of information must be included in a dictionary.

Dictionary construction requires much time as well as a big investment. It is quite difficult to change and reconstruct one. Therefore, there must be a careful design at the outset. In the past, dictionary contents were severely limited by the number of pages. This condition is no longer true for electronic dictionaries, which can store large amounts of information. A problem in an electronic dictionary, particularly for use by computer programs, is the representation structure as well as the kinds of information to be stored. Humans can interpret dictionary contents in a flexible and adaptive way, but the computer lacks such flexibility. Therefore, we have to provide all detailed information explicitly for the computer.

It is difficult to construct a bilingual dictionary because a word of a source language has varieties of meanings or concepts, and for each meaning there exist many corresponding words and expressions in a target language. Very often there are cases where concepts in a language correspond to several concepts in another language, and it is not easy to fix minimum "grains" for concepts that have one-to-one correspondence to linguistic expressions in both languages.

When we consider the multilingual dictionary, we have first to identify minimum "grains" of concepts in each language and culture, and then to determine those that are common to the concepts set for all the languages concerned. In this way, a neutral multilingual dictionary depends on the language set. Bilingual dictionaries are difficult to construct, and so multilingual dictionaries of several languages are even more difficult to construct, particularly when these languages have very different cultural backgrounds. It is very hard to identify pivot concepts through which words and phrases can be exchanged between every language pair. It would be possible to establish pivot concepts for major conceptual words, but it will be very difficult for many other concepts in ordinary life. These concepts, particularly those strongly related to human life, must be explained in terms of the cultural background where the language is used. Therefore, for very detailed machine translation dictionaries of ordinary words, it is essential to construct bilingual dictionaries. It is difficult to construct multilingual dictionaries covering several languages that reflect completely different cultural backgrounds.

Multilingual dictionaries will be possible, however, for concepts in natural science and technology, which are less dependent on different cultures. Also, if high quality translation is not required, we may be able to construct multilingual dictionaries among certain sets of languages.
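The problem of concept "grains" discussed in this section can be shown with a small data-structure sketch. It is not the EDR format, which is not described here; the concept identifiers, the two entries, and the function name are hypothetical. The example uses the familiar case in which English "water" covers what Japanese divides into mizu (cold water) and yu (hot water).

```python
# Hypothetical pivot dictionary: concept id -> words per language.
# The grain is set by the finer distinction, here the Japanese one.
CONCEPTS = {
    "C_WATER_COLD": {"en": ["water"], "ja": ["mizu"]},
    "C_WATER_HOT": {"en": ["hot water", "water"], "ja": ["yu"]},
}


def candidates(word: str, source: str, target: str) -> list[tuple[str, list[str]]]:
    """Return every (concept, target words) pair reachable from a source word."""
    return [
        (concept, entry[target])
        for concept, entry in CONCEPTS.items()
        if word in entry.get(source, [])
    ]


# English "water" is ambiguous at this grain: two concepts, two Japanese words.
print(candidates("water", "en", "ja"))
# Japanese "yu" maps to a single concept, with two English renderings.
print(candidates("yu", "ja", "en"))
```

Going from English to Japanese, the dictionary alone cannot choose between the two concepts; that choice needs context or world knowledge. Adding a third language with its own distinctions would force the grains to be split again, which is the sense in which a neutral dictionary depends on the language set.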

Another question that needs to be discussed is how to use dictionary information. Dictionaries are useful not only in machine translation but also in natural language processing systems such as dialogue systems and information retrieval systems. Internal organization of a large dictionary is important from the standpoints of speed and flexible reference to the related information. Hypertext structure will help people use complex electronic dictionaries. We have to develop more powerful ways for using electronic dictionaries.

9. State of the art in Europe and the United States

There were several basic research projects in Europe in the 1960s, but the first system that aimed at practical use was the Systran Russian-English machine translation system at CEC Euratom, Ispra, in the late 1960s. Systran English-French was introduced to the EC Commission in Luxembourg in 1976. After long trial use and improvement of the system, it entered semi-operational use at the beginning of the 1980s. Now Systran English-French is used very satisfactorily by many translators in the EC through computer network communication. For example, the translation speed is 40 pages (averaging 250 words per page) per minute, or 500,000 words per hour. On average, minor corrections are necessary at about 10 places on a page.

The EC Commission has several other Systran systems for language pairs such as French-English, English-German, English-Italian, and French-German. The translation quality of these systems is lower than that of the English-French system.

The EC has to deal with nine languages in the Community. This might have meant that 72 systems of machine translation from one language to another had to be constructed. To avoid this difficulty, it seemed necessary to develop a multilingual machine translation system that could translate one language to all the others at the same time. The EC started preliminary investigation for this multilingual machine translation system in 1978 under the EUROTRA project. The official EUROTRA project was started in the early 1980s for the seven languages then used in the EC (extended to nine languages afterwards), and many researchers from every EC country participated in it. However, the R&D of multilingual machine translation systems was too difficult, and the project was stopped at the beginning of the 1990s. The EC has shifted its research from machine translation to dictionary construction and language industries.
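The figure of 72 follows from counting every ordered (source, target) pair among nine languages, since each direction needs its own system. A minimal sketch of that arithmetic is given here; the function name and the placeholder language labels are illustrative only and do not come from the EUROTRA project.

```python
from itertools import permutations


def directed_pair_count(n_languages: int) -> int:
    """Count one-directional translation systems needed to cover
    every ordered (source, target) pair among n languages."""
    return n_languages * (n_languages - 1)


# Cross-check by enumerating the ordered pairs explicitly.
# The labels are placeholders, not the actual EC languages.
labels = [f"lang{i}" for i in range(1, 10)]
pairs = list(permutations(labels, 2))

print(directed_pair_count(9))  # nine languages
print(len(pairs))              # same count, by enumeration
print(directed_pair_count(7))  # the seven languages of the initial project
```

For nine languages this gives 9 x 8 = 72 systems, and for the original seven it gives 7 x 6 = 42, which shows how quickly the pairwise approach grows as languages are added.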

Germany has recently started a speech translation project called VERBMOBIL, stimulated by the Japanese speech translation project conducted by the ATR Speech Translation Telephony Research Institute. The project, which will continue for eight years, is divided into two four-year phases.

The United States had leading research groups in machine translation at its beginnings, but unfortunately the US Government abandoned R&D in machine translation and stopped government research funding in 1966, following the issue of the well-known ALPAC report. Since then, the United States has made no significant progress in machine translation research and has instead pursued computational linguistics. However, there is now growing interest in machine translation in sections of the Government, and new funding for research is expected.

In the US private sector, there are some commercial machine translation systems, such as Systran, Logos, Weidner, and recently Globalink. Systran was based on the research results of the Georgetown machine translation project conducted in the late 1950s and early 1960s. It was improved and has been used mainly at the EC Commission since 1976, where the number of available language pairs has been greatly increased. It is one of the more successful machine translation systems in the world. Logos was first developed for English-Vietnamese, and English-French and some other language pairs were added later.

Canada is a bilingual country using English and French. The Canadian Government has long been interested in machine translation and has supported several R&D projects. One of the most successful systems is TAUM METEO, which translates weather forecast sentences from English into French. It has been in operation on a 24-hour basis since 1978. It is fully automatic, with neither pre-editing nor post-editing. When a complex sentence comes in that the system cannot handle, it is displayed on the operator's screen, where it is translated and typed in immediately. The Canadian Government has a large translation bureau under the Secretary of State, which does daily translation manually. The bureau is now using the Logos English-French system on a trial basis, as well as translators' workstations developed internally.

10. The international association for machine translation

As mentioned before, Japan took the initiative in organizing the first Machine Translation Summit Meeting, and three summit meetings have already been held, in Hakone (Japan), Munich (Germany), and Washington, D.C. (USA). We also organized the International Forum for Translation Technology (IFTT) in 1989 at Oiso, Japan, where we proposed the organization of the International Association for Machine Translation (IAMT). Discussions were continued in Munich, and finally, at the Third MT Summit in July 1991, the IAMT was established. The purposes and activities of the Association are outlined below:

(i) Purposes

1.1 The International Association for Machine Translation (IAMT) brings together users, developers, researchers, sponsors, and other individuals or institutional or corporate entities interested in machine translation (MT) for the purpose of promoting and fostering, by every available means, the development and active use of MT systems. The IAMT offers opportunities and occasions to exchange information, study MT technologies and applications, and discuss and establish reference criteria or standards in areas of common interest to its members.

1.2 The specific concerns of the IAMT may include:

1.2.1 Collection and compilation of information

The IAMT may serve as a repository for historical and current documentation on MT, converting the texts to machine-readable form when and as feasible.

1.2.2 Exchange of information

The IAMT may serve as a clearing-house for current information of interest to MT users and developers, including:

- For users: translation market trends, available MT systems, types of MT applications, introduction of MT systems, approaches to text file input, pre- and post-editing strategies, evaluation of MT, etc.
- For developers: theories of MT, MT technologies, improvement of MT, approaches to dictionary-building, databases and other files available for exchange, etc.

1.2.3 Dissemination of information

The IAMT may publish information of general interest in the field of machine translation, such as:

- A regular newsletter on MT
- An MT handbook, including agreed definitions of terminology
- Bibliographies

1.2.4 Standardization

The IAMT may develop and propose reference criteria and standards in such areas as:

- Common document format for MT input
- Exchange format for dictionaries
- Design of controlled language
- Evaluation of MT output and MT systems

(ii) Activities

2.1 The IAMT shall convene the biennial General Assembly of its membership. The Assembly shall preferably be held in conjunction with the MT Summit, a technical forum that is open to the general public and that shall also be convened by the Association.

The IAMT may also undertake the following activities:

2.2 Sponsorship and support of workshops, symposia, and conferences on MT and related technologies and applications.

2.3 Organization of tutorials and training courses on MT applications and skills involved in the use of MT.

2.4 Organization of tutorials and workshops on MT technologies.

2.5 Establishment of technical committees, special interest groups, and study teams.

(iii) Membership

Membership in the IAMT shall be open to active or potential MT users, developers, and researchers, and to any other individuals or institutional or corporate entities with interest in the purposes of the Association. There shall be three categories of membership: (1) individual, (2) corporate, and (3) institutional. All of these must belong to a regional association, except where there is no suitable association in the region where a person interested in IAMT lives.

(iv) Organization

4.1 The IAMT is the federation of the Regional Associations for machine translation (one each in Europe, the Americas, and Asia).

4.2 The organs of the IAMT shall be the General Assembly, which is the supreme governing body of the Association, and the Council, which executes the decisions of the General Assembly and carries out the business of the Association.

4.3 The IAMT shall have a permanent Secretariat. The location of the Secretariat shall be decided by the General Assembly.

The Japan Association for Machine Translation (JAMT) is trying to extend its membership and to become the Asian Association for Machine Translation. The CICC is considering the linkage of Asian countries for the exchange of technology on machine translation. For example, we can imagine the establishment of a machine translation centre in each country, where R&D and services of machine translation will be performed, and these centres working together for the progress and wider usage of machine translation. The IAMT will play an important role in such activities in the future.

