ICT4LT Module 3.5
It is the aim of this module to explore some of the aspects and challenges in Human Language Technologies (HLT) that are of relevance to Computer Assisted Language Learning (CALL). Starting with a brief outline of some of the early attempts in HLT, using the example of Machine Translation (MT), it will become apparent that experiences and results in this area had a direct bearing on some of the developments in CALL. CALL soon became a multi-disciplinary field of research, development and practice. Some researchers began to develop CALL applications that made use of Human Language Technologies, and a few such applications will be introduced in this module. The advantages and limitations of applying HLT to CALL will be discussed, using the example of parser-based CALL. This brief discussion will form the basis for first hypotheses about the nature of human-computer interaction (HCI) in parser-based CALL.
This Web page is designed to be read from the printed page. Use File / Print in your browser to produce a printed copy. After you have digested the contents of the printed copy, come back to the onscreen version to follow up the hyperlinks.
Piklu Gupta: At the time of writing this module Piklu was a lecturer in German Linguistics at the University of Hull, UK. He is now working for the Fraunhofer Integrated Publication and Information Systems Institute: http://www.ipsi.fraunhofer.de.
Mathias Schulze: At the time of writing this module Mathias was a lecturer in German at UMIST, now merged with the University of Manchester, UK. He is now working at the University of Waterloo, Canada: http://www.uwaterloo.ca. His main research interest is in parser-based CALL and linguistics. He is an active member of NLP SIG, within the EUROCALL professional association, and ICALI, within the CALICO professional association.
Graham Davies, ICT4LT Editor, Thames Valley University, UK. Graham has been interested in Machine Translation since 1976.
Human Language Technologies (HLT) is a relatively new term that embraces a wide range of areas of research and development in the sphere of what used to be called Language Technologies or Language Engineering. The aim of this module is to familiarise the student with key areas of HLT, including a range of Natural Language Processing (NLP) applications. NLP is a general term used to describe the use of computers to process information expressed in natural (i.e. human) languages. The term NLP is used in a number of different contexts in this document and is one of the most important branches of HLT. There is a Special Interest Group in Language Processing, NLP SIG, within the EUROCALL professional association, and a Special Interest Group in Intelligent Computer Assisted Language Instruction, ICALI, within the CALICO professional association. Both have similar aims, namely to further research in a number of areas that are mentioned in this module, such as:
All of the above are areas of research that have produced results which have proven, are proving and will prove very useful in the field of Computer Assisted Language Learning.
Of course, this module cannot teach you everything there is to know about HLT. This is neither necessary nor possible. The two main authors of this module are living proof of that; they both started off as language teachers and then got interested in HLT.
The title of this module has been brought into line with the rebranding of the Language Engineering sector of the Telematics Applications Programme (TAP) of the European Commission as Human Language Technologies.
A useful introductory publication, entitled Language and technology: from the Tower of Babel to the Global Village, was published by the European Commission in 1996.
A multilingual CD-ROM entitled A world of understanding was produced in 1998 on behalf of the Information Society and Media Directorate General of the European Commission under its former name, DGXIII. The aim of the CD-ROM was to demonstrate the importance of HLT in helping to realise the benefits of the Multilingual Information Society, in particular forming a review and record of the Language Engineering Sector of the Fourth Framework Programme of the European Union (1994-98).
See this more recent EU publication (2006) in downloadable PDF format, entitled Human Language Technologies for Europe: http://europa.eu/languages/en/document/88/17. This publication is a valuable source of information on the state of the art in Human Language Technologies. It gives an overview of the challenges and opportunities awaiting Europe in this important research field. The document has been issued by the EU-funded research project TC-STAR (Technology and Corpora for Speech to Speech Translation): http://www.tc-star.org
[...] there is no doubt that the development of tools (technology) depends on language - it is difficult to imagine how any tool - from a chisel to a CAT scanner - could be built without communication, without language. What is less obvious is that the development and the evolution of language - its effectiveness in communicating faster, with more people, and with greater clarity - depends more and more on sophisticated tools. (EC: Language and technology 1996:1)
Language and technology lists the following examples of language technology (using an admittedly broad understanding of the term):
Many of these are already being used in language learning and teaching. Today most of the research and development that aims to enable humans to communicate more effectively with each other (e.g. email and Web-conferencing) and with machines (e.g. machine translation and natural language interfaces for search engines) is carried out in the context of Human Language Technologies:
The field of human language technology covers a broad range of activities with the eventual goal of enabling people to communicate with machines using natural communication skills. Research and development activities include the coding, recognition, interpretation, translation, and generation of language. ... Advances in human language technology offer the promise of nearly universal access to online information and services. Since almost everyone speaks and understands a language, the development of spoken language systems will allow the average person to interact with computers without special skills or training, using common devices such as the telephone. These systems will combine spoken language understanding and generation to allow people to interact with computers using speech to obtain information on virtually any topic, to conduct business and to communicate with each other more effectively. (Cole 1996)
Facilitating and supporting all aspects of human communication through machines has interested researchers for a number of centuries. The use of mechanical devices to overcome language barriers was first proposed in the seventeenth century. Suggestions for numerical codes to be used to mediate between languages were then made by Leibniz, Descartes and others (v. Hutchins 1986:21). The beginnings of what we describe today as Human Language Technologies are, of course, closely connected to the advent of computers. In a report entitled Intelligent Machinery, which was written in 1948 for the National Physical Laboratory, Alan Turing, one of the fathers of Artificial Intelligence (AI), who worked on cryptanalysis at Bletchley Park during World War II, mentions a number of different ways in which these new computers could demonstrate their "intelligence", including learning and translating natural languages:
(i) Various games, e.g. chess, noughts and crosses, bridge, poker
(ii) The learning of languages
(iii) Translation of languages
(iv) Cryptography
(v) Mathematics
Of these (i), (iv), and to a lesser extent (iii) and (v) are good in that they require little contact with the outside world. For instance in order that the machine should be able to play games its only organs need be 'eyes' capable of distinguishing the various positions on a specially made board, and means for announcing its own moves. Mathematics should preferably be restricted to branches where diagrams are not much used. Of the above possible fields the learning of languages would be the most impressive, since it is the most human of these activities. This field seems however to depend too much on sense organs and locomotion to be feasible. (Turing 1948:9)
Later on, Machine Translation enjoyed a period of popularity with researchers and funding bodies in the United States and the Soviet Union:
From 1956 onwards, the dollars (and roubles) really started to flow. Between 1956 and 1959, no less than twelve research groups became established at various US universities and private corporations and research centres. [...] The kind of optimism and enthusiasm with which researchers tackled the task of MT [Machine Translation] may be illustrated best by some prophecies of Reifler, whose views may be taken as representative of those of most MT workers at that time: "... it will not be very long before the remaining linguistic problems in machine translation will be solved for a number of important languages" (Reifler 1958:518), and, "in about two years (from August 1957), we shall have a device which will at a glance read a whole page and feed what it has read into a tape recorder and thus remove all human co-operation on the input side of the translation machines" (Reifler 1958:516). (Buchmann 1987:14)
Although linguists, language teachers and computer users today may find these predictions ridiculous, it was the enthusiasm and the work during this time that form the basis of many developments in HLT today.
Research and development in HLT is nowadays more rapidly transferred into commercial systems than was the case up until the 1980s. Indeed HLT is becoming increasingly pervasive in our everyday lives. Here are some examples:
Other previously unexpected areas of use are emerging. It is now, for instance, common for mobile phones to have what is known as predictive text input to aid the writing of short text messages: http://www.tegic.com. Instead of having to press one of the nine keys a number of times to produce the correct letter in a word, software in the phone compares users' key presses to a linguistic database to determine the correct (or most likely) word. Most Internet search engines also now incorporate some kind of linguistic technology to enable users to enter a query in natural language, for example "What is meant by log-likelihood ratio?" is as acceptable a query as simply "log-likelihood ratio".
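The lookup described above can be illustrated with a minimal sketch. It assumes the standard letter layout of a phone keypad and a tiny invented lexicon with made-up frequency counts. It is not the algorithm of any particular phone or vendor, only an illustration of how a sequence of single key presses can be matched against a word list.

```python
from collections import defaultdict

# Standard letter layout on a phone keypad (keys 2-9).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def key_sequence(word):
    """Return the digits pressed for `word`, one press per letter."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower())


def build_index(word_frequencies):
    """Group words by key sequence, most frequent word first."""
    index = defaultdict(list)
    for word, freq in word_frequencies.items():
        index[key_sequence(word)].append((freq, word))
    return {
        keys: [w for _, w in sorted(entries, key=lambda e: (-e[0], e[1]))]
        for keys, entries in index.items()
    }


# Hypothetical mini-lexicon; the frequency counts are invented for illustration.
LEXICON = {"good": 50, "home": 40, "gone": 20, "hood": 5, "hello": 30}
INDEX = build_index(LEXICON)


def predict(keys):
    """Return candidate words for a digit sequence, most likely first."""
    return INDEX.get(keys, [])


print(predict("4663"))  # ['good', 'home', 'gone', 'hood']
```

The four candidates for the key sequence 4663 show why the software can only offer the "most likely" word: several words share the same keys, and a frequency ranking is one simple way of choosing between them.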
What are the possible benefits for language teaching and learning of using HLT? Here are some examples:
Machine Translation (MT) has been the dream of computer scientists since the 1940s. The student's attention is drawn in particular to the following publications, which provide a very useful introduction to MT:
See also other sources of information and publications on Machine Translation at the website of the British Computer Society Natural Language Translation Specialist Group: http://www.bcs-mt.org.uk
Initial work on Machine Translation (MT) systems was typified by what we would now consider to be a naive approach to the "problem" of natural language translation. The concept of Machine Translation in the modern age can be traced back to the 1940s, when the successful decoding of encrypted messages by machines during World War II led some scientists, most notably Warren Weaver, to view the translation process as essentially analogous to decoding. Weaver, Director of the Natural Sciences Division of the Rockefeller Foundation, wrote to his friend Norbert Wiener on 4 March 1947, shortly after the first computers and computer programs had been produced:
Recognising fully, even though necessarily vaguely, the semantic difficulties because of multiple meanings, etc., I have wondered if it were unthinkable to design a computer which would translate. Even if it would translate only scientific material (where the semantic difficulties are very notably less), and even if it did produce an inelegant (but intelligible) result, it would seem to me worth while.
Also knowing nothing official about, but having guessed and inferred considerable about, powerful new mechanized methods in cryptography - methods which I believe succeed even when one does not know what language has been coded - one naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say "This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode".
Have you ever thought about this? As a linguist and expert on computers, do you think it is worth thinking about? Cited in Hutchins (1997).
Weaver was possibly chastened by Wiener's pessimistic reply:
I frankly am afraid the boundaries of words in different languages are too vague and the emotional and international connotations are too extensive to make any quasi-mechanical translation scheme very hopeful.
But Weaver remained undeterred and composed his famous 1949 Memorandum, entitled simply "Translation", which he sent to some 30 noteworthy minds of the time. It posited in more detail the need for and possibility of MT. Thus began the first era of MT research.
The first generation (henceforth referred to as 1G) of MT systems worked on the principle of direct transfer; that is to say, the route taken from source language text to its target language equivalent was a short one consisting essentially of two processes: replacement and adjustment. A direct system would comprise a bilingual dictionary containing potential replacements or target language equivalents for each word in the source language. A restriction of such MT systems was therefore that they were unidirectional and, unlike the systems that followed, could not accommodate many languages. Rules for choosing correct replacements were incorporated but functioned on a basic level: although there was some initial morphological analysis prior to dictionary lookup, subsequent local re-ordering and final generation of the target text, there was no scope for syntactic analysis, let alone semantic analysis. Inevitably this often led to poor quality output, which certainly contributed to the severe criticism of MT in the 1966 Automatic Language Processing Advisory Committee (ALPAC) report, which stated that it saw little use for MT in the foreseeable future. The damning judgment of the ALPAC report effectively halted research funding for machine translation in the USA for the rest of the 1960s and throughout the 1970s.
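The two processes of a direct system, replacement and adjustment, can be sketched in a few lines. The French-to-English dictionary, the word-class labels and the single re-ordering rule below are invented for illustration and omit the morphological analysis mentioned above. The sketch is not a reconstruction of any historical system.

```python
# Hypothetical bilingual dictionary: source word -> (target word, word class).
DICTIONARY = {
    "le": ("the", "DET"),
    "chat": ("cat", "NOUN"),
    "noir": ("black", "ADJ"),
    "dort": ("sleeps", "VERB"),
}


def replace(words):
    """Replacement: look up each source word; unknown words pass through."""
    return [DICTIONARY.get(w, (w, "UNK")) for w in words]


def adjust(tagged):
    """Adjustment: local re-ordering only, here NOUN + ADJ -> ADJ + NOUN."""
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "NOUN" and out[i + 1][1] == "ADJ":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return out


def translate(sentence):
    """Translate word by word, with no syntactic or semantic analysis."""
    tagged = adjust(replace(sentence.lower().split()))
    return " ".join(word for word, _ in tagged)


print(translate("Le chat noir dort"))  # the black cat sleeps
```

Because each source word has exactly one stored replacement and the only structural knowledge is a rule about neighbouring words, such a design has no means of resolving ambiguity or handling longer-range differences in word order, which is the weakness the paragraph above describes.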
We can say that both technical constraints and the lack of a linguistic basis hampered MT systems. The system developed at Georgetown University, Washington DC, and first demonstrated at IBM in New York in 1954 had no clear separation of translation knowledge and processing algorithms, making modification of the system difficult.
In the period following the ALPAC report the need was increasingly felt for an approach to MT system design which would avoid many of the pitfalls of 1G systems. By this time opinion had shifted towards the view that linguistic developments should influence system design and development. Indeed it can be said that the second generation (2G) of "indirect" systems owed much to linguistic theories of the time. Modularity is an important design feature of 2G systems, and in contrast to 1G systems, which operate on a 'brute force' principle in which translation takes place in one step, the steps involved in analysis of source text and generation of target text ideally constitute distinct processes. 2G systems can be divided essentially into "interlingual" and "transfer" systems. We will look first of all at interlingual systems, or rather those claiming to adopt an interlingual approach.
Although Warren Weaver had put forward the idea of an intermediary "universal" language as a possible route to machine translation in his 1947 letter to Norbert Wiener, linguistics was unable to offer any models to apply until the 1960s. By virtue of its introduction of the concept of "deep structure", Noam Chomsky's theory of transformational generative grammar appeared to offer a route towards "universal" semantic representations and thus appeared to provide a model for the structure of a so-called interlingua. An interlingua is not a natural language, rather it can be seen as a meaning representation which is independent of both the source and the target language of translation. An interlingua system maps from a language's surface structure to the interlingua and vice versa. A truly interlingual approach to system design has obvious advantages, the most important of which is economy, since an interlingual representation can be applied for any language pair and facilitates addition of other language pairs without major additions to the system. The next section looks at "transfer" systems.
In a transfer model the intermediate representation is language dependent, there being a bilingual module whose function it is to interpose between source language and target language intermediate representations. Thus we cannot say that the transfer module is language independent. The nature of these transfer modules has obvious ramifications for system design in that addition of another language to a system necessitates not only modules for analysis and synthesis but also additional transfer modules, whose number is dictated by the number of languages in the existing system and which increases quadratically with the number of languages. (For n languages the number of transfer modules required would be n(n-1), or n(n-1)/2 if the modules are reversible.)
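The growth in the number of transfer modules is easy to tabulate. The short sketch below simply evaluates the two formulas given above for a few values of n; the function name is chosen for illustration.

```python
def transfer_modules(n, reversible=False):
    """Transfer modules needed to cover every language pair among n languages."""
    directed_pairs = n * (n - 1)
    return directed_pairs // 2 if reversible else directed_pairs


print("languages  one-way  reversible")
for n in (2, 3, 5, 10, 20):
    print(f"{n:9d}  {transfer_modules(n):7d}  {transfer_modules(n, reversible=True):10d}")
```

Adding one language to a system that already covers n languages requires 2n new one-way transfer modules (or n reversible ones), in addition to the analysis and synthesis modules for the new language itself.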
An important advance in 2G systems when compared to 1G was the separation of algorithms (software) from linguistic data (lingware). In a system such as the Georgetown model the program mixed language modelling, translation and the processing thereof in one program. This meant that the program was monolithic and it was easy to introduce errors when trying to rectify an existing shortcoming. The move towards separating software and lingware was hastened by parallel advances in both computational and linguistic techniques. The adoption of linguistic formalisms in the design of systems and the development of high level programming languages enabled MT workers to code in a more problem-oriented way. The development in programming languages meant that it was becoming ever easier to code rules for translation in a meaningful manner and arguably improved the quality of these rules. The declarative nature of linguistic description could now be far more explicitly reflected in the design of programs for MT.
Early MT systems were predominantly parser-based, one of the first steps in such a system being to parse and tag the source language: see Section 5 on Parsing and Tagging in HLT. More recent approaches to MT rely less on formal linguistic descriptions than the transfer approach described above. Translation Memory (TM) systems are in widespread commercial use (see below and Chapter 10 of Arnold et al. 1994). EBMT (Example-Based Machine Translation) is a relatively new technology which aims to combine both traditional MT and more recent TM paradigms by reusing previous translations and applying various degrees of linguistic knowledge to convert fuzzy matches into exact ones. However, some early definitions of EBMT refer to what is now known as TM, and often exclude the concept of fuzzy matches. An even more radical approach to MT is the statistical approach (v. Brown et al. 1993), which requires the use of large bilingual corpora that serve as input for a statistical translation model. Thus we have, in a sense, come full circle in that Weaver's ideas of applying statistical techniques are seen as a fruitful basis for MT.
There are many cheap translation packages on the market - as well as free packages on the Web. While such packages may be useful for extracting the gist of a text they should not be seen as a serious replacement for the human translator. Some are not all that bad, producing translations that are half-intelligible, letting you know whether a text is worth having translated properly. See:
Google has a translation tool too - click on "more" on the Google opening page menu and then select "Translate Tool".
Professional human translators are making increasing use of Translation Memory (TM) packages. TM packages store texts that have previously been translated, together with their source texts, in a large database. Chunks of new texts to be translated are then matched against the translated texts in the database and suggested translations are offered to the human translator wherever a match is found. The human translator has to intervene regularly in this process of translation, making corrections and amendments as necessary. TM systems can save hours of time (estimated at up to 80% of a translator's time), especially when translating texts that are repetitive or that use lots of standard phrases and sentence formulations. Producing updates of technical manuals is a typical application of TM systems. Examples of TM systems include:
The European Commission uses TM tools. Have a look at these sites, which contain useful information about translating in the EU and the tools that are used to speed up workflow:
An example of automatic translations can be found at the Newstran website. This site is headed "Automatically Translate Virtually Every Major Newspaper in the World!". It is extremely useful for locating newspapers in a wide range of languages. You can also locate selected newspapers that have been translated using Babel Fish. The quality and accuracy are what you can expect from Babel Fish or any other automatic translation system - but you can get the gist, e.g. "The red-green coalition agreement pushes in the SPD obviously increasingly on criticism" as a rendering of "Die rot-grüne Koalitionsvereinbarung stößt in der SPD offensichtlich zunehmend auf Kritik": http://www.humanitas-international.org/newstran/
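The matching step that Translation Memory packages perform, comparing a new segment with stored source texts and offering the stored translation when the two are similar enough, can be sketched as follows. The two-entry memory, the threshold of 0.75 and the use of character-level similarity from Python's standard difflib module are all assumptions made for illustration. Commercial TM packages use their own matching methods, which are not described here.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: (English source, stored German translation).
MEMORY = [
    ("Press the red button to stop the machine.",
     "Drücken Sie den roten Knopf, um die Maschine anzuhalten."),
    ("Switch off the power before cleaning.",
     "Schalten Sie vor der Reinigung den Strom ab."),
]


def similarity(a, b):
    """Character-level similarity between 0.0 and 1.0 (1.0 = identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def lookup(segment, memory=MEMORY, threshold=0.75):
    """Return (score, source, target) for the best stored match, or None."""
    best = max(
        ((similarity(segment, src), src, tgt) for src, tgt in memory),
        key=lambda match: match[0],
    )
    return best if best[0] >= threshold else None


match = lookup("Press the green button to stop the machine.")
if match:
    score, source, target = match
    print(f"{score:.0%} match with: {source}")
    print(f"Suggested translation: {target}")
```

The suggestion offered for the "green button" sentence still says "roten" (red), which illustrates why the human translator has to intervene and correct fuzzy matches rather than accept them unchanged.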
Another approach to translation is the stored phrase bank, for example LinguaWrite, which was aimed at the business user and contained a large database of equivalent phrases and sentences in different languages to facilitate the writing of business letters. LinguaWrite was programmed by Marco Bruzzone in the 1980s and marketed by Camsoft, but it is no longer available and has not been updated. David Sephton's Tick-Tack (Primrose Publishing) began as a package consisting of "building blocks" of language for business communication but now embraces other topics: http://www.rmplc.co.uk/com/dsephton/TickTack/
Lost in translation
Just for fun, see what happens when an English sentence is translated by computer into and from a sequence of five different languages: http://www.tashian.com/multibabel
I typed: Once upon a time there were three bears who lived in the middle of a deep, dark forest.
The result was: It was, was not seriously three bears, that one had deeply lived in the average the one and sunk forest.
Computers are normally associated with two standard input devices, the keyboard and the mouse, and two standard output devices, the display screen and the printer. All these restrict language input and output. However, computer programs and hardware devices that enable the computer to handle human speech are now commonplace. Multimedia PCs allow the user to attach a microphone to the soundcard and record his/her own voice. Similarly, a cassette recorder can be attached to the computer and by transmitting the sounds recorded on the tape to the computer these sounds can be digitised. Storing these sound files is no longer a problem, thanks to the immensely increased capacity and reduced cost of storage media and improved compression techniques that enable the size of sound files to be substantially reduced. For further information on the applications of sound recording and playback technology to CALL see Module 2.2, Introduction to multimedia CALL.
A range of computer software is also available for processing and analysing speech. Spoken input can be analysed according to a wide variety of parameters and the analysis can be represented graphically or numerically. Of course, graphic output is not immediately useful for the uninitiated viewer, and hence we are not arguing that this kind of graphical representation will prove useful to the language learner. On the other hand, specialists are well capable of interpreting this speech analysis data. See http://cslu.cse.ogi.edu/tutordemos/SpectrogramReading/spectrogram_reading.html for details and some explanations about different analyses.
The information we get from speech analysis has proven very valuable indeed for speech synthesis and speech recognition, which are dealt with in the following two sections.
Speech synthesis describes the process of generating human-like speech by computer. Producing natural sounding speech is a complex process in which one has to consider a range of factors that go beyond just converting characters to sounds because very often there is no one-to-one relation between them. The intonation of particular sentence types and the rhythm of particular utterances also have to be considered.
Currently speech synthesis is far more advanced and more robust than speech recognition (see Section 4.2 below). The naturalness of artificially produced utterances is now very impressive compared to what used to be produced by earlier speech synthesis systems in which the intonation and timing were far from natural and resulted in the production of monotonous, robot-like speech. Many people are now unaware that so-called talking dictionaries use speech synthesis software rather than recordings of human voices. In-car satellite navigation (satnav) systems can produce a range of different types of human voices, both male and female in a number of different languages, and "talk" to the car driver guiding him/her to a chosen destination.
So far, however, speech synthesis has not been as widely used in CALL as speech recognition. This is probably due to the fact that language teachers' requirements regarding the presentation of spoken language are very demanding. Anything that sounds artificial is likely to be rejected. Some language teachers even reject speakers whose regional accent is too far from what is considered standard or received pronunciation.
There is, however, a category of speech synthesis technology known as Text To Speech (TTS) technology that is widely used for practical purposes. TTS software falls into the category of assistive technology, which has a vital role in improving accessibility for a wide range of computer users with special needs. Accessibility is now governed by legislation in the UK: the Special Educational Needs and Disability Act (SENDA) of 2001 covers educational websites and obliges their designers "to make reasonable adjustments to ensure that people who are disabled are not put at a substantial disadvantage compared to people who are not disabled." See JISC's website on disability legislation: http://www.jisclegal.ac.uk/disability/accessibility.htm. See the Glossary for definitions of assistive technology and accessibility.
TTS is important in making computers accessible to blind or partially sighted people as it enables them to "read" from the screen. TTS technology can be linked to any written input in a variety of languages, e.g. automatic pronunciation of words from an online dictionary, reading aloud of a text, etc. These are examples of TTS software:
Voki is a website that enables you to create and customise your own speaking cartoon character that can be embedded in your favourite social networks, blogs and websites. You can choose the text-to-speech option (as in Graham Davies's example on the right) to give the character a voice, or you can record your own voice: http://www.voki.com
An excellent tool that helps people with hearing impairments to learn how to articulate is the CSLU Speech Toolkit. This features BALDI, an animated 3D talking head that automatically synchronises natural or synthetic speech with realistic lip, tongue, mouth and facial movements. Ron Cole introduced conference participants at EUROCALL 2000 to BALDI in an impressive demonstration of the toolkit's features. The CSLU Speech Toolkit can be downloaded from here: http://cslu.cse.ogi.edu/toolkit/
To what extent speech synthesis systems are suitable for CALL is a matter for further discussion. See the article by Handley & Hamel (2005), who report on their progress towards the development of a benchmark for determining the adequacy of speech synthesis systems for use in CALL. The article mentions a Web-based package called FreeText, for advanced learners of French, the outcome of a project funded by the European Commission: http://www.latl.unige.ch/freetext/
Speech recognition describes the use of computers to recognise spoken words. Speech recognition has not reached such a high level of performance as speech synthesis (see Section 4.1 above):
The accuracy of past generations of speech recognition software topped out at a little more than 90% (nearly one error every ten words), making them questionable as productivity-enhancing tools. The good news about the latest speech software is that most of the products provide recognition accuracy above 95% and help you get more done in less time - as long as you have sufficient PC speed, an adequate sound card and microphone, and the ability to speak clearly at all times. (Alwang 1999)
Speech recognition is a non-trivial task because the same spoken word does not produce entirely the same sound waves when uttered by different people or even when uttered by the same person on different occasions. The process is complex: the computer has to digitise the sound, transform it to discard unneeded information, and then try to match it with words stored in a dictionary.
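The three steps named above (digitise, transform, match) can be illustrated with a toy sketch. It assumes that the sound has already been digitised into a list of sample values, reduces the samples to one energy value per frame, and compares the resulting sequence with stored word templates using dynamic time warping, a classic technique for aligning sequences spoken at different speeds. The templates and feature values are invented, and real recognisers use far richer features and statistical models than this.

```python
def frame_energies(samples, frame_len):
    """Transform step: keep one average energy value per frame of samples."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # stay on the same template frame
                cost[i][j - 1],      # stay on the same input frame
                cost[i - 1][j - 1],  # advance both sequences together
            )
    return cost[len(a)][len(b)]


# Hypothetical stored "dictionary": one feature sequence per known word.
TEMPLATES = {
    "yes": [0.1, 0.9, 0.8, 0.2],
    "no": [0.7, 0.6, 0.1, 0.1],
}


def recognise(features, templates=TEMPLATES):
    """Match step: return the stored word whose template is closest."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))


# A slower utterance of "yes": similar shape, but one frame longer.
utterance = [0.1, 0.8, 0.9, 0.9, 0.3]
print(recognise(utterance))  # yes
```

The utterance never matches a template exactly. The program can only pick the closest stored pattern, which is why variation between speakers and occasions makes the task so difficult.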
The most efficient speech recognition systems are speaker-dependent, i.e. they are trained to recognise a particular person's speech and can then distinguish thousands of words uttered by that person. Speaker-independent systems are far less efficient, but they are gaining in use in CALL and there are a number of programs that make use of what is known as Automatic Speech Recognition (ASR): see Section 3.4.7, Module 2.2, headed CD-ROMs incorporating Automatic Speech Recognition (ASR).
Speech technology relies heavily on speech analysis, which is mentioned in Section 4.1 above. If one remembers that each of the parameters analysed could have been affected by some speaker-independent background noise or by some idiosyncratic pronunciation features of this particular speaker, then it already becomes clear how difficult the interpretation of the analysis data is for a speech recognition program. The problem is compounded by the fact that words are normally not produced in isolation - they are normally uttered in what we call connected speech, i.e. the pronunciation of one word might influence another, and intonation and rhythm will have a bearing on how a single word is going to be pronounced. However, some commercial software that makes good use of this new technology has been on the market for a while: see Section 3.4.7, Module 2.2, headed CD-ROMs incorporating Automatic Speech Recognition (ASR).
The following information is taken from an article written by Norman Harris of DynEd, a publisher of CALL software incorporating ASR:
Speech recognition technology has finally come of age - at least for language training purposes for young adults and adults. Computer programs that truly "understand" natural speech, the Holy Grail of artificial intelligence researchers, may be a decade or more away, and today's speech recognition programs may be merely pattern matching devices, still incapable of parsing real language, of achieving anything like "understanding," but, nonetheless, they can now provide language students with realistic, highly effective, and motivating speech practice.
The essence of real language is not in discrete single words - language students need to practice complete phrases and sentences in realistic contexts. Moreover, programs which were trained to accept a speaker's individual pronunciation quirks were not ideally suited to helping students move toward more standard pronunciation. These technologies also failed if the speaker's voice changed due to common colds, laryngitis and other throat ailments, rendering them useless until the speaker recovered or retrained the speech engine.
The solution to these problems came with the development of continuous speech recognition engines that were speaker independent. These programs are able to deal with complete sentences spoken at a natural pace, not just isolated words. They require no special hardware, are small enough and fast enough to work on normal PCs, and importantly for the typical language training environment, do not require a training period - they allow a variety of individual language learners working on the same computer to practice speaking English from the first moment they talk into the microphone.
Such flexibility with regard to pronunciation paradigms means that today's speaker-independent speech recognition programs are not ideal for direct pronunciation practice. Nonetheless, exercises which focus on fluency and word order, and with native speaker models which are heard immediately after a student's utterance had been successfully recognized, have been shown to indirectly result in much improved pronunciation. Another trade off is that the greater flexibility and leniency which allows these programs to "recognize" sentences spoken by students with a wide variety of accents, also limits the accuracy of the programs, especially for similar sounding words and phrases. Some errors may be accepted as correct.
Native speakers testing the "understanding" of programs "tuned" to the needs of non-native speakers may be bothered by this, but most teachers, after careful consideration of the different needs and psychologies of native speakers and learners, will accept the trade off. Students do not expect to be understood every time. If they are required occasionally to repeat a sentence which the program has not recognized or which the program has misinterpreted, there may be some small frustration, but language students are much more likely to take this in their stride than would native speakers. On the other hand, if the program does "understand" such students, however imperfect their pronunciation, they typically experience a huge sense of satisfaction, a feel good factor native speakers simply cannot enjoy to anywhere near the same degree. The worst thing for a student is a program that is too demanding of perfection - such programs will quickly lead to student frustration or the kind of embarrassed, hesitant unwillingness to speak English typical of many classrooms. Even if we accept that accuracy needs to be responsive to proficiency in order to encourage students to speak, we must, as teachers, be concerned that errors do not become reinforced.
From Norman Harris: Speech recognition: considerations for use in language training: http://www.dyned.com/about/speech.shtml
See also Ehsani & Knodt (1998) and Chun (1998).
See the EyeSpeak website for information about software for helping students improve their English pronunciation: http://www.eyespeakenglish.com
Put in simple terms, a parser is a program that maps strings of a language into its structures. The most basic components needed by a parser are a lexicon containing words that may be parsed and a grammar, consisting of rules which determine grammatical structures. The first parsers were developed for the analysis of programming languages; obviously as artificial, regular languages they present fewer problems than a natural language. It is most useful to think of parsing as a search problem which has to be solved. It can be solved using an algorithm which can be defined as:
... a formal procedure that always produces a correct or optimal result. An algorithm applies a step-by-step procedure that guarantees a specific outcome or solves a specific problem. The procedure of an algorithm performs a computation in a finite amount of time. Programmers specify the algorithm the program will follow when they develop a conventional program (Smith 1990)
Parsing algorithms define a procedure that looks for the optimum combination of grammatical rules that generate a tree structure for the input sentence. How might we define these grammatical rules in a concise way that is amenable to computer processing? A useful construct for our purposes is a so-called context-free grammar (CFG). A CFG consists of rules containing a single symbol on the left-hand side and one or more on the right-hand side. For example, the statement that a sentence can consist of a noun phrase and a verb phrase can be expressed by the following rewrite rule:
S → NP VP
This means that a sentence S can be 'rewritten' as a noun phrase NP followed by a verb phrase VP which are in their turn defined in the grammar. A noun phrase, for example, can consist of a determiner DET and a noun N. These symbols are known as non-terminals and the words represented by these symbols are terminal symbols.
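To make the idea of rewrite rules concrete, here is a minimal sketch in Python of how such a grammar and lexicon might be stored and applied. The rule VP → V NP, the word lists and the helper name `expand` are assumptions made for illustration only; they are not taken from any particular parser.

```python
# A toy context-free grammar: each non-terminal maps to a list of
# possible right-hand sides. VP -> V NP is an assumed rule.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DET", "N"]],
    "VP": [["V", "NP"]],
}

# A toy lexicon: the terminal symbols (words) listed under their category.
LEXICON = {
    "DET": {"the", "a"},
    "N": {"man", "park", "telescope"},
    "V": {"saw"},
}

def expand(symbols):
    """Rewrite the leftmost symbol that has a grammar rule, using its first rule."""
    for i, symbol in enumerate(symbols):
        if symbol in GRAMMAR:
            return symbols[:i] + GRAMMAR[symbol][0] + symbols[i + 1:]
    return symbols  # nothing left to rewrite

# Apply rewrite rules step by step, starting from S.
derivation = [["S"]]
while expand(derivation[-1]) != derivation[-1]:
    derivation.append(expand(derivation[-1]))

for step in derivation:
    print(" ".join(step))
```

Each printed line is one rewriting step. The last line contains only word categories, which the lexicon can then replace with actual words.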
Parsing algorithms can proceed top-down or bottom-up. In some cases, top-down and bottom-up algorithms can be combined. Below are simple descriptions of two parsing strategies.
A top-down strategy starts with the rule
S → NP VP
and then breaks the NP and the VP down into their constituents. The strategy assumes we have an S and tries to fit it in. If we choose to search depth first, then we proceed down one side of the tree at a time. The search will end successfully if it manages to break down the sentence into all its terminal symbols (words).
A bottom-up strategy looks at elements of an S and assigns categories to them to form larger constituents until we arrive at an S. If we choose to search breadth first, then we proceed consecutively through each layer and stop successfully once we have constructed a sentence.
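The top-down, depth-first strategy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the grammar, the lexicon and the function names `parse` and `recognise` are hypothetical, and the sketch merely recognises sentences without building a tree.

```python
# Assumed toy grammar and lexicon, for illustration only.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DET", "N"]],
    "VP": [["V", "NP"], ["V"]],
}
LEXICON = {"the": "DET", "a": "DET", "man": "N", "dog": "N", "saw": "V"}

def parse(symbols, words):
    """Top-down, depth-first search: expand the leftmost symbol, backtrack on failure."""
    if not symbols:
        return not words  # success only if every word has been consumed
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        # Try each rule for this non-terminal in turn, going down one branch at a time.
        return any(parse(rule + rest, words) for rule in GRAMMAR[first])
    # Word category: it must match the category of the next input word.
    return bool(words) and LEXICON.get(words[0]) == first and parse(rest, words[1:])

def recognise(sentence):
    """Assume we have an S and try to fit it to the words of the sentence."""
    return parse(["S"], sentence.lower().split())

print(recognise("The man saw a dog"))
print(recognise("Saw the man"))
```

A bottom-up parser would work in the opposite direction, starting from the word categories and combining them into larger constituents. Note that a naive top-down search like this one does not terminate on left-recursive rules such as NP → NP PP, which is one reason why practical parsers use more elaborate algorithms.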
Let's look now at one linguistic phenomenon which causes problems for parsers - that of so-called attachment ambiguity. Consider the following sentence:
The man saw the man in the park with a telescope.
Clearly there are a number of possible interpretations, for instance the telescope could be the instrument used to see the second man or the scope of 'in the park with a telescope' could be such that it defines the park, i.e. 'the one with a telescope'. Parser output can be represented as a bracketed list or, more commonly, a tree structure. Here is the output of two possible parses for the sentence above.
Figure 1: Parse Tree Version 1.0
Figure 2: Parse Tree Version 2.0
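To see how quickly attachment ambiguity multiplies, the following sketch counts the analyses that a small grammar licenses for the example sentence. It uses a bottom-up chart method (the CYK algorithm, which is not described in this module), and the grammar rules, lexicon and function name `count_parses` are assumptions made for illustration.

```python
from collections import defaultdict

# Assumed toy grammar with binary rules: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("NP", "DET", "N"),
    ("NP", "NP", "PP"),   # a PP attached to a noun phrase
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # a PP attached to the verb phrase
    ("PP", "P", "NP"),
]
LEXICON = {
    "the": "DET", "a": "DET",
    "man": "N", "park": "N", "telescope": "N",
    "saw": "V", "in": "P", "with": "P",
}

def count_parses(sentence, goal="S"):
    """Count the distinct parse trees for the sentence with a CYK-style chart."""
    words = sentence.lower().rstrip(".").split()
    n = len(words)
    # chart[(i, j)][X] = number of trees of category X covering words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        chart[(i, i + 1)][LEXICON[word]] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # every possible split point
                for parent, left, right in BINARY_RULES:
                    chart[(i, j)][parent] += chart[(i, k)][left] * chart[(k, j)][right]
    return chart[(0, n)][goal]

print(count_parses("The man saw the man in the park with a telescope."))
```

Under this toy grammar the sentence receives five analyses, because each of the two prepositional phrases can attach at more than one point in the tree.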
One way of dealing with the problem of sentences which have more than one possible parse is to concentrate on specific elements of the parser input and to not deal with such phenomena as attachment ambiguity. Ideally we expect a parser to successfully analyse a sentence on the basis of its grammar, but often there are problems caused by errors in the text or incompleteness of grammar and lexicon. Also the length of sentences and ambiguity of grammars often make it hard to successfully parse unrestricted text. An approach which addresses some of these issues is partial or shallow parsing. Abney (1997:125) succinctly describes partial parsing thus:
"Partial parsing techniques aim to recover syntactic information efficiently and reliably from unrestricted text, by sacrificing completeness and depth of analysis."
Partial parsers concentrate on recovering pieces of sentence structure which do not require large amounts of information (such as lexical association information); attachment remains unresolved for instance. We can see that in this way parsing efficiency is greatly improved.
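A very simple form of partial parsing is noun phrase chunking. The sketch below assumes that the input has already been tagged with Penn Treebank style tags and groups an optional determiner, any adjectives and one or more nouns into a chunk. The function name `np_chunks` and the chunk pattern are hypothetical choices for illustration.

```python
NOUN_TAGS = {"NN", "NNS"}

def np_chunks(tagged):
    """Return noun phrase chunks matching: optional DT, any JJ, one or more nouns."""
    chunks = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DT":
            j += 1
        while j < n and tagged[j][1] == "JJ":
            j += 1
        first_noun = j
        while j < n and tagged[j][1] in NOUN_TAGS:
            j += 1
        if j > first_noun:  # at least one noun was found, so this is a chunk
            chunks.append(" ".join(word for word, _ in tagged[i:j]))
            i = j
        else:
            i += 1  # no chunk starts here, move on by one token
    return chunks

# Assumed tagging of the example sentence used earlier in this section.
sentence = [
    ("the", "DT"), ("man", "NN"), ("saw", "VBD"), ("the", "DT"), ("man", "NN"),
    ("in", "IN"), ("the", "DT"), ("park", "NN"), ("with", "IN"),
    ("a", "DT"), ("telescope", "NN"),
]
print(np_chunks(sentence))
```

The chunker finds the four noun phrases but makes no attempt to decide where the prepositional phrases attach, which is exactly the information a partial parser chooses to leave unresolved.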
Another strategy for analysing language is part-of-speech tagging, in which we do not seek to find larger structures such as noun phrases but instead label each word in a sentence with its appropriate part of speech. To give you some idea of what tagger output looks like, a paragraph of this Web page has been tagged using a tagger developed at the University of Stuttgart: http://www.ims.uni-stuttgart.de/projekte/corplex/DecisionTreeTagger.html. Here is the original paragraph from Section 3 of this document:
In a transfer model the intermediate representation is language dependent, there being a bilingual module whose function it is to interpose between source language and target language intermediate representations. Thus we cannot say that the transfer module is language independent. The nature of these transfer modules has obvious ramifications for system design in that addition of another language to a system necessitates not only modules for analysis and synthesis but also additional transfer modules, whose number is dictated by the number of languages in the existing system and which would increase polynomially according to the number of additional languages required.
The following table shows the tagger output, and we can see that most of the words have been correctly identified.
Table 1: Tagger output
| Word | Tag | Lemma |
|---|---|---|
| In | IN | in |
| a | DT | a |
| transfer | NN | transfer |
| model | NN | model |
| the | DT | the |
| intermediate | JJ | intermediate |
| representation | NN | representation |
| is | VBZ | be |
| language | NN | language |
| dependent | JJ | dependent |
| , | , | , |
| there | RB | there |
| being | VBG | be |
| a | DT | a |
| bilingual | JJ | bilingual |
| module | NN | module |
| whose | WP$ | whose |
| function | NN | function |
| it | PP | it |
| is | VBZ | be |
| to | TO | to |
| interpose | VB | interpose |
| between | IN | between |
| source | NN | source |
| language | NN | language |
| and | CC | and |
| target | NN | target |
| language | NN | language |
| intermediate | JJ | intermediate |
| representations | NNS | representation |
| . | SENT | . |
| Thus | RB | thus |
| we | PP | we |
| cannot | VBP | can |
| say | VB | say |
| that | IN | that |
| the | DT | the |
| transfer | NN | transfer |
| module | NN | module |
| is | VBZ | be |
| language | NN | language |
| independent | JJ | independent |
| . | SENT | . |
| The | DT | the |
| nature | NN | nature |
| of | IN | of |
| these | DT | these |
| transfer | NN | transfer |
| modules | NNS | module |
| has | VBZ | have |
| obvious | JJ | obvious |
| ramifications | NNS | ramification |
| for | IN | for |
| system | NN | system |
| design | NN | design |
| in | IN | in |
| that | DT | that |
| addition | NN | addition |
| of | IN | of |
| another | DT | another |
| language | NN | language |
| to | TO | to |
| a | DT | a |
| system | NN | system |
| necessitates | VBZ | necessitate |
| not | RB | not |
| only | JJ | only |
| modules | NNS | module |
| for | IN | for |
| analysis | NN | analysis |
| and | CC | and |
| synthesis | NN | synthesis |
| but | CC | but |
| also | RB | also |
| additional | JJ | additional |
| transfer | NN | transfer |
| modules | NNS | module |
| , | , | , |
| whose | WP$ | whose |
| number | NN | number |
| is | VBZ | be |
| dictated | VBN | dictate |
| by | IN | by |
| the | DT | the |
| number | NN | number |
| of | IN | of |
| languages | NNS | language |
| in | IN | in |
| the | DT | the |
| existing | JJ | existing |
| system | NN | system |
| and | CC | and |
| which | WDT | which |
| would | MD | would |
| increase | VB | increase |
| polynomially | RB | <unknown> |
| according | VBG | accord |
| to | TO | to |
| the | DT | the |
| number | NN | number |
| of | IN | of |
| additional | JJ | additional |
| languages | NNS | language |
| required | VBN | require |
| . | SENT | . |
As with partial parsing, we are not trying to find correct attachments, and since it is a limited task the success rate is quite high. The information derived from tagging can itself feed into partial parsing or be used to improve the performance of traditional parsers. Where a word can be assigned more than one part of speech, part of the decision as to which one is correct is based on the probability of two- or three-word sequences (bigrams and trigrams) occurring. For instance, in our example tagged text the sequence 'the transfer module' occurs. Transfer is of course also a verb, but the likelihood of a determiner (the) being followed by a verb is lower than the likelihood of a determiner-noun sequence.
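The bigram idea can be sketched as follows. The training data here is just the tag sequence of the first clause in Table 1, which is far too little for a real tagger, and the function names are hypothetical. Real taggers also combine these transition probabilities with the likelihood of each tag for the word itself; this sketch shows only the transition part.

```python
from collections import Counter

def tag_bigram_counts(tags):
    """Count how often each tag is immediately followed by each other tag."""
    return Counter(zip(tags, tags[1:]))

def transition_probability(counts, previous_tag, tag):
    """Estimate P(tag | previous_tag) by relative frequency."""
    total = sum(c for (prev, _), c in counts.items() if prev == previous_tag)
    return counts[(previous_tag, tag)] / total if total else 0.0

def choose_tag(counts, previous_tag, candidates):
    """Pick the candidate tag that is most likely to follow previous_tag."""
    return max(candidates, key=lambda t: transition_probability(counts, previous_tag, t))

# Tags of "In a transfer model the intermediate representation is language dependent"
training_tags = ["IN", "DT", "NN", "NN", "DT", "JJ", "NN", "VBZ", "NN", "JJ"]
counts = tag_bigram_counts(training_tags)

# "transfer" could be a noun (NN) or a verb (VB); it follows the determiner "the".
print(choose_tag(counts, "DT", ["NN", "VB"]))
```

With these counts a determiner is followed by a noun in half of the observed cases and never by a verb, so the noun reading of 'transfer' is preferred.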
For more detailed discussion see Chapter 10 of Manning & Schütze (1999).
See also the Visual Interactive Syntax Learning (VISL) website: http://visl.hum.sdu.dk. An online parser and a variety of other tools concerned with English grammar, including games and quizzes, can be found here. The site also contains links to corpora in different languages: http://visl.hum.sdu.dk/visl/corpus.html
Of course, in CALL we are dealing with texts that have been produced by language learners at various levels of proficiency and accuracy. It is therefore reasonable to assume that the parser has to be prepared to deal with linguistic errors in the input. One thing we could do is to complement our grammar for correct sentences with a grammar of incorrect sentences - an error grammar, i.e. we capture individual and/or typical errors in a separate rule system. The advantage of this error grammar approach is that the feedback can be very specific and is normally fairly reliable because this feedback can be attached to a very specific rule. The big drawback, however, is that individual learner errors have to be anticipated in the sense that each error needs to be covered by an adequate rule.
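The following sketch illustrates the error grammar idea on a deliberately tiny scale. The rules, the lexicon, the feedback messages and the function name `diagnose` are all hypothetical; a real system would attach feedback to rules in a full grammar rather than to pairs of word categories.

```python
# Rules for well-formed input: subject and verb agree in number.
CORRECT_RULES = [
    ("NP_sg", "V_sg"),
    ("NP_pl", "V_pl"),
]

# Error rules: each anticipated error pattern carries its own feedback message.
ERROR_RULES = {
    ("NP_sg", "V_pl"): "The subject is singular, so the verb needs a singular form.",
    ("NP_pl", "V_sg"): "The subject is plural, so the verb needs a plural form.",
}

# Assumed toy lexicon mapping phrases and words to categories.
LEXICON = {
    "the man": "NP_sg", "the men": "NP_pl",
    "sees": "V_sg", "see": "V_pl", "seeing": "V_ing",
}

def diagnose(subject, verb):
    """Return (status, feedback) for a subject-verb pair."""
    pattern = (LEXICON[subject], LEXICON[verb])
    if pattern in CORRECT_RULES:
        return "correct", None
    if pattern in ERROR_RULES:
        return "error", ERROR_RULES[pattern]
    return "unparsed", None  # an error that nobody anticipated

print(diagnose("the men", "sees"))
print(diagnose("the man", "seeing"))
```

The first call produces specific feedback because the error was anticipated by a rule. The second shows the drawback: an error not covered by any rule cannot be diagnosed at all.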
However, as already stated it is not only in texts that have been produced by language learners that we find erroneous structures. Machine translation is facing similar problems. Dina & Malnati review approaches "concerning the design and the implementation of grammars able to deal with 'real input'." (Dina & Malnati 1993:75). They list four approaches:
Consequently, the "most plausible interpretation of a [...] sentence is the one which satisfies the largest number of constraints." (Dina & Malnati 1993:80)
They conclude that "weak constraint-based parsing has proven to be useful in increasing the robustness of an NLP system" (Dina & Malnati 1993:88), basing their conclusion on the following advantageous features of the approach: non-redundancy, built-in preference mechanism, globality, efficiency, linguistic flexibility.
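The principle that the most plausible interpretation is the one satisfying the largest number of constraints can be sketched as below. This is not Dina & Malnati's implementation; the candidate analyses, the three constraints and the function names are invented purely to illustrate the selection step.

```python
# Each constraint is a name plus a check on a candidate analysis (assumed examples).
CONSTRAINTS = [
    ("has a subject", lambda a: a["subject"] is not None),
    ("has a finite verb", lambda a: a["verb"] is not None),
    ("subject-verb agreement", lambda a: a["subject_number"] == a["verb_number"]),
]

def satisfied_count(analysis):
    """Number of constraints the analysis satisfies."""
    return sum(1 for _, check in CONSTRAINTS if check(analysis))

def violated(analysis):
    """Names of the constraints the analysis violates (useful for feedback)."""
    return [name for name, check in CONSTRAINTS if not check(analysis)]

def most_plausible(candidates):
    """Prefer the analysis that satisfies the largest number of constraints."""
    return max(candidates, key=satisfied_count)

# Two hypothetical analyses of the learner sentence "the men sees".
candidates = [
    {"label": "noun phrase fragment", "subject": "the men sees", "verb": None,
     "subject_number": "pl", "verb_number": None},
    {"label": "clause with agreement error", "subject": "the men", "verb": "sees",
     "subject_number": "pl", "verb_number": "sg"},
]

best = most_plausible(candidates)
print(best["label"], violated(best))
```

Because constraints are allowed to fail, the erroneous sentence still receives an analysis, and the violated constraint points directly to the error that feedback should address.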
We have seen in Section 3 that Machine Translation (MT) and the political and scientific interest in machine translation played a significant role in the acceptance (or non-acceptance) as well as the general development of Human Language Technologies.
By 1964, however, the promise of operational MT systems still seemed distant and the sponsors set up a committee, which recommended in 1966 that funding for MT should be reduced. It brought to an end a decade of intensive MT research activity. (Hutchins 1986:39)
It is then perhaps not surprising that the mid-1960s saw the birth of another discipline: Computer Assisted Language Learning (CALL). The PLATO project, which was initiated at the University of Illinois in 1960, is widely regarded as the beginning of CALL - although CALL was just part of a huge package of general Computer Assisted Learning (CAL) programs running on mainframe computers. PLATO IV (1972) was probably the version of this project that had the biggest impact on the development of CALL. At the same time, another American university, Brigham Young University, received government funding for a CALL project, TICCIT (Time-Shared Interactive, Computer Controlled Information Television) (Levy 1997:18). Other well-known and still widely used programs were developed soon afterwards:
In the UK, John Higgins developed Storyboard in the early 1980s, a total Cloze text reconstruction program for the microcomputer (Higgins & Johns 1984:57). Levy (1997:25) describes how other programs extended this idea further:
Other programs such as Fun with Texts extended the total text reconstruction idea considerably by adding further activities.
See Section 8, Module 1.4, headed Text manipulation, for further information on total Cloze programs.
In recent years, the development of CALL has been greatly influenced by the technology and by our knowledge of and our expertise with it, so that not only the design of most CALL software but also its classification has been technology-driven. Wolff (1993:21) distinguishes five groups of applications:
The late 1980s saw the beginning of attempts which are mostly subsumed under Intelligent CALL (ICALL), a "mix of AI [Artificial Intelligence] techniques and CALL" (Matthews 1992b:i). Early AI-based CALL was not without its critics, however (Last 1989):
In going down this dangerous path, it seems to me that we are indeed seeking to marginalise humanity and create a race of computerised monsters which, when the power of decision-making is given into their hands, will decree that the human race, with its passions, inconsistencies, foibles and frailties, should be declared redundant, and that the intelligent machine shall inherit the earth. And that, fundamentally, is why my initial enthusiasm has now turned so sour. (Last 1989:153)
For a more up-to-date and positive view of Artificial Intelligence, see Dodigovic (2005).
Bowerman (1993:31) notes: "Weischedel et al. (1978) produced the first ICALL [Intelligent CALL] system which dealt with comprehension exercises. It made use of syntactic and semantic knowledge to check students' answers to comprehension questions."
As far as could be ascertained, this was just the early swallow that did not create a summer. Krüger-Thielmann (1992:51ff.) lists and summarises the following early projects in ICALL: ALICE, ATHENA, BOUWSTEEN & COGO, EPISTLE, ET, LINGER, VP2, XTRA-TE, Zock.
Matthews (1993:5) identifies Linguistic Theory and Second Language Acquisition Theory as the two main disciplines which inform Intelligent CALL and are (or will be) informed by Intelligent CALL and adds "[t]he obvious AI research areas from which ICALL should be able to draw the most insights are Natural Language Processing (NLP) and Intelligent Tutoring Systems (ITS)" (Matthews 1993:6). Matthews shows that it is possible to "conceive of an ICALL system in terms of the classical ITS architecture" (ibid.). The system consists of three modules - expert, student and teacher module - and an interface. The expert module is the one that "houses" the language knowledge of the system. It is this part which can process any piece of text produced by a learner - in an ideal system. This is usually done with the help of a parser of some kind.
The use of parsers in CALL is commonly referred to as intelligent CALL or 'ICALL'; it might be more accurately described as parser-based CALL, because its 'intelligence' lies in the use of parsing - a technique that enables the computer to encode complex grammatical knowledge such as humans use to assemble sentences, recognise errors, and make corrections. (Holland et al. 1993:28)
This notion of parser-based CALL not only captures the nature of the field much better than the somewhat misleading term "Intelligent CALL" (Is all other CALL un-intelligent?), it also identifies the use of Human Language Technologies as one possible approach in CALL alongside others such as multimedia-based CALL and Web-based CALL and thus identifies parser-based CALL as one possible way forward for CALL. In some cases, the (technology-defined) borders between these sub-fields of CALL are not even clearly identifiable, as we will see in some of the projects mentioned in the following paragraphs.
To exemplify recent advances in the use of sophisticated human language technology in CALL, let us have a look at some of the projects that were presented at two conferences in the late 1990s. The first one is the Language Teaching and Language Technology conference in Groningen in 1997 (Jager et al. 1998).
Carson-Berndsen (1998) discussed APron, Autosegmental Pronunciation Teaching, which uses a phonological knowledge base and generates "event structures for some utterance" (op.cit.:15) and can, for example, visualise pronunciation processes of individual sounds and well-formed utterances using an animated, schematic vocal tract. Witt & Young (1998), on the other hand, are concerned with assessing pronunciation. They implemented and tested a pronunciation scoring algorithm which is based on speech recognition (see Section 4.2