ICT4LT Module 2.4
The aim of this module is to introduce language teachers to the use of concordances and concordance programs in the Modern Foreign Languages classroom. Concordancing is part of Corpus Linguistics, which is dealt with by Tony McEnery & Andrew Wilson in Module 3.4. Section 2.2 of this module includes a brief introduction to corpus linguistics.
Marie-Noëlle Lamy, Open University, UK.
Hans Jørgen Klarskov Mortensen, Vordingborg Gymnasium, Denmark.
With an introduction by Graham Davies, Thames Valley University, UK.
A concordance, according to the Collins Cobuild English Dictionary, is:
An alphabetical list of the words in a book or a set of books which also says where each word can be found and often how it is used.
I first heard the term concordance from one of the lecturers who taught me at university during the early 1960s. He had produced a concordance of the complete works of a modern German writer - manually and without the help of a computer. It was a massive and laborious task, during the course of which a good deal was revealed about the writer's use of language, and it gained the lecturer a PhD. Nowadays such an undertaking would not qualify for the award of a doctorate, because a computer can do the job in a matter of hours, or even minutes.
I was introduced to concordancing programs - concordancers for short - in the late 1970s, initially using COCOA and OCP, both of which ran on mainframe computers. In the early 1980s I wrote my own concordance program in BASIC on a Prime minicomputer and used it with language students at Ealing College of Higher Education in connection with my classes on text analysis. A version of this concordancer was also incorporated into the 1985 BBC Micro version of Fun with Texts and adapted for the 1992 DOS version by Marco Bruzzone.
Nowadays I often use a concordancer to check my own writing style. It picks up my over-frequent use of certain words, and it is particularly helpful when used in conjunction with a thesaurus. A thesaurus never gives you enough authentic examples of usage to tell you how to use a word with which you are unfamiliar, but a concordancer does - providing you have a decent corpus of authentic texts: see Activity 12 in Section 4.
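A word frequency list is the usual way of spotting over-used words of this kind. The Python below is a minimal sketch of the idea, not the concordancer referred to above; the function name word_frequencies and the tiny stoplist are our own assumptions. It counts word forms, skips a few very common function words and lists the rest, most frequent first.

```python
import re
from collections import Counter

# Hypothetical stoplist: common function words that would otherwise
# crowd the top of the list and hide the words we actually want to see.
STOPLIST = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}

def word_frequencies(text, stoplist=STOPLIST):
    """Return (word, count) pairs, most frequent first, ignoring stoplisted words."""
    # \w+ also matches accented letters, so French or German words stay whole;
    # note that it splits contractions such as "don't" into "don" and "t".
    words = re.findall(r"\w+", text.lower())
    counts = Counter(w for w in words if w not in stoplist)
    return counts.most_common()

draft = ("It is actually quite clear. Actually, the results are actually "
         "better than expected, and the method is quite simple.")
for word, count in word_frequencies(draft)[:3]:
    print(word, count)
```

On this made-up draft, "actually" comes out at the top of the list, followed by "quite".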
Concordancers are used extensively these days for creating glossaries and dictionaries, and they are extremely valuable tools for the language teacher. Curiously, they are rarely used by teachers of Modern Foreign Languages in the UK, but they are well established in the EFL profession worldwide. Let's hope that this module will make a few converts.
Tim Johns was one of the first language teachers to make use of concordancers in the languages classroom: see Tim Johns's website. Back in the early 1980s Johns began to make use of the concordancers available on the big mainframe computers at the University of Birmingham. He then wrote a concordance program that ran on one of the first popular microcomputers, the Sinclair ZX81: see Higgins & Johns (1984:88-93). Johns later developed the concept of Data Driven Learning (DDL): see Johns (1991). DDL is an approach to language learning whereby the learner gains insights into the language that he/she is learning by using concordance programs to locate authentic examples of language in use, i.e. what this module is all about. In DDL the learning process is no longer based solely on the teacher's initiative, his/her choice of topics and materials and the explicit teaching of rules, but on the learner's own discovery of rules, principles and patterns of usage in the foreign language. In other words, learning is driven by authentic language data. Johns wrote one of the first commercially available classroom concordancers, MicroConcord, which was published by Oxford University Press. Murison-Bowie (1993), the author of the MicroConcord Manual, gives some very persuasive reasons for using a concordancer:
Whether one opts for putting up a case, or for knocking one down, any search using [a concordancer] is given a clearer focus if one starts out with a problem in mind, and some, however provisional, answer to it. You may decide that your answer was basically right, and that none of the exceptions is interesting enough to warrant a re-formulation of your answer. On the other hand, you may decide to tag on a bit to the answer, or abandon the answer completely and to take a closer look. Whichever you decide, it will frequently be the case that you will want to formulate another question, which will start you off down a winding road to who knows where. (Murison-Bowie 1993:46, cited in Rézeau 2001:153)
Rézeau writes:
It is precisely this "winding road", along which one may come across serendipity learning, which give concordances a certain appeal. In addition, once you have started relying on the evidence of the data for checking the "rules" found in grammar-books as well as your own "intuitions" about language, concordances tend to become an indispensable tool. It is hoped that the rationale and examples given in this chapter will have convinced its readers to take a trip to the country of concordancers to observe "the company that words keep" (Firth 1957:187), cited in Rézeau (2001:154)
A memorable phrase: "the company that words keep". It's what we have been teaching language learners to be aware of for many, many years. Now technology makes it easier.
The remainder of this module has been written by Marie-Noëlle Lamy and Hans Jørgen Klarskov Mortensen. Over to them.
What is a concordance? The simplest way to answer this is to look at some English examples first. For instance, here is a concordance for the word "sin", prepared manually and shown with the text from which its four occurrences are taken.
Concordance 1 on the word "sin":
| No. | Left context | Keyword | Right context |
| 1. | Thus from my lips, by yours, my | sin | is purged. |
| 2. | Then have my lips the | sin | that they have took. |
| 3. | | Sin | from thy lips? O trespass sweetly urged! |
| 4. | Give me my | sin | again. |
Text used as the basis for the concordance, with the keyword in capitals:
JULIET
Ay, pilgrim, lips that they must use in prayer.
ROMEO
O, then, dear saint, let lips do what hands do;
They pray, grant thou, lest faith turn to despair.
JULIET
Saints do not move, though grant for prayers' sake.
ROMEO
Then move not, while my prayer's effect I take.
Thus from my lips, by yours, my SIN is purged.
JULIET
Then have my lips the SIN that they have took.
ROMEO
SIN from thy lips? O trespass sweetly urged!
Give me my SIN again.
So a concordance is a list of the occurrences of a word (called the keyword, here "sin") taken from a piece of authentic language (the corpus, here Romeo and Juliet). Each occurrence is displayed in the centre of the page and shown with part of the context in which it occurs (here a maximum of 29 characters to the left and to the right of the keyword). This is also known as a Key Word In Context or KWIC concordance.
Now look at that same concordance, displayed with fuller context (here between 75 and 80 characters each side, including blank spaces):
| 1. | move not, while my prayer's effect I take. Thus from my lips, by yours, my sin is purged. JULIET Then have my lips the sin that they have took. ROMEO |
| 2. | Thus from my lips, by yours, my sin is purged. JULIET Then have my lips the sin that they have took. ROMEO Sin from thy lips? O trespass sweetly urged! |
| 3. | is purged. JULIET Then have my lips the sin that they have took. ROMEO Sin from thy lips? O trespass sweetly urged! Give me my sin again |
| 4. | they have took. ROMEO Sin from thy lips? O trespass sweetly urged! Give me my sin again. |
The KWIC and the fuller context display are both useful, depending on what you want to do with the material.
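Both displays can be produced by the same simple procedure, varying only the amount of context. The Python below is a minimal sketch, not the code of any concordancer discussed in this module; the function name kwic is our own, and it measures context in characters, as in the 29-character display above. It finds whole-word matches regardless of case and pads the left context so that every keyword starts in the same column.

```python
import re

def kwic(text, keyword, width=29):
    """Return KWIC lines with up to `width` characters of context each side.

    Matching is case-insensitive and on whole words only, so "sin" does not
    match inside "single" or "sinner".
    """
    # Collapse line breaks and repeated spaces so context runs across lines
    flat = " ".join(text.split())
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        # Right-align the left context so every keyword starts in column `width`
        lines.append(f"{left:>{width}}{m.group()}{right}")
    return lines

speech = """Thus from my lips, by yours, my sin is purged.
Then have my lips the sin that they have took.
Sin from thy lips? O trespass sweetly urged!
Give me my sin again."""

for line in kwic(speech, "sin"):
    print(line)
```

Calling kwic(speech, "sin", width=80) gives a display close to the fuller-context version.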
So there you have the basic ingredients for any concordance: a text base and a procedure. But our procedure was manual, and it gave us an extremely limited concordance with only four citations, in which the meanings of the word "sin" are rooted in the poetic world of Romeo and Juliet. Below, in contrast, is a concordance on the same keyword, based this time on a 25-citation sample created by a concordancer from a corpus of contemporary English that includes British and American books, ephemera, newspapers, magazines, radio transcripts and transcriptions of ordinary conversations.
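The module does not say how the concordancer chose its 25 citations from what was presumably a much larger number of hits. A random sample is one plausible method, and it is simply assumed here. The Python sketch below (function name our own) picks n lines at random and returns them in corpus order, with an optional seed so that the same sample can be reproduced.

```python
import random

def sample_citations(lines, n=25, seed=None):
    """Pick n concordance lines at random, returned in their original corpus order."""
    if len(lines) <= n:
        return list(lines)
    rng = random.Random(seed)  # pass a seed to get the same sample again
    # Sample positions rather than lines so the corpus order can be restored
    picked = sorted(rng.sample(range(len(lines)), n))
    return [lines[i] for i in picked]

hits = [f"citation {i}" for i in range(1, 1201)]  # stand-in for real concordance lines
for line in sample_citations(hits, n=5, seed=42):
    print(line)
```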
Concordance 2 on the word "sin":
| No. | Left context | Keyword | Right context |
| 1. | said cohabiting was no longer a | sin | . Serbs free last six |
| 2. | daily care of others was the ultimate | sin | . We arranged for Ted to spend a |
| 3. | remarkable. Shaw's rendition was a | sin | against culture, an insult to Eliot |
| 4. | them that God wants them to turn from | sin | and transform their lives. Women |
| 5. | the ascendancy to and loss of power; | sin | and redemption; self-doubt and |
| 6. | to prove that all that a life of sex, | sin | and St Tropez sun brings is wrinkles |
| 7. | taken seriously. Julian's account of | sin | and forgiveness stands unexcelled |
| 8. | deepening anxiety over the question of | sin | and evil, she took it up. Carolly |
| 9. | can spring as much from a sense of | sin | as from sanctity. That, thank God, |
| 10. | Roebuck was dismissed to the | sin | bin for 10 minutes for his part in |
| 11. | is pride, covetousness, deceit and | sin | , but say you'll accept adultery and |
| 12. | is like Sodom and Gomorrah -you know, | Sin | City. So the very word Youngstown |
| 13. | of rubber safety bumpers, as ugly as | sin | . Few mourned its passing. [p] That |
| 14. | White.26 He finds the earthly ideas of | sin | , guilt, punishment, good and evil |
| 15. | BERLIN CABARET NOW Decadence, satire, | sin | bohemian excess Once |
| 16. | sumptuous food shops. with a sense of | sin | , I bought some on Nevsky Prospekt |
| 17. | to mine without a tumble. The only | sin | I've committed is not having you |
| 18. | sin of all: I have heard of a certain | sin | . I thank God that I do not know of |
| 19. | cannot announce God's forgiveness of | sin | in the Absolution and cannot |
| 20. | It was during the Reformation that | sin | in Scotland really got going. Any |
| 21. | sin is prevalent. Although this | sin | is a comment on all of mankind, it |
| 22. | sounds a bit stage-ethnic: 'The only | sin | is to believe that happiness is gone |
| 23. | insisting on the concept of original | sin | . It would take on a kind of |
| 24. | bed the selfsame one! More primal than | sin | itself, this fell to me. [f] |
| 25. | do nothing to deal with her problem of | sin | . Joni was disturbed by Carl's |
In contrast to the Shakespeare concordance, in which the original lines from the play were short enough to fit entirely within the display, here the left and right contexts are cut off, in this case at a maximum of 38 characters including spaces. In many concordancers this width can be adjusted to make the citations look less disorienting. We will see in Section 5 how important (and also how contentious) the issue of doctoring the results of a search is.
First, for those teachers who like to work with both the target language and the mother tongue, we will say a few words about bilingual or multilingual concordances, also known as parallel concordances. Imagine a novel in Language A and a translation of that text in Language B. Or, in a European context, think of an official document translated into all the languages of the EU. Suppose you want to study how a French word such as the preposition "pour" is rendered in the translation at different points in the original text. Using normal concordancing techniques, the program finds all occurrences of "pour" in the French text and identifies the paragraphs and sentences in which they occur, e.g. sentence 3 in paragraph 2, sentence 4 in paragraph 3, and so on. The parallel concordancer then finds the equivalent sentences in the translated text. Preparation of the corpus for use with parallel concordancers has to be meticulous. The two (or more) texts must have been aligned in advance paragraph by paragraph, so that paragraph 3 in one language is equivalent to paragraph 3 in the other (but not sentence by sentence, as translators may well render one sentence by two, or two sentences by one, and so on). Here is an example showing how "pour" relates to various structures in English:
A parallel French-English concordance on "pour" using an extract from Le Petit Prince by Antoine de Saint-Exupéry
| Original text | Translation |
| 1. Ainsi, quand il aperçut POUR la première fois mon avion [...] | 1. The first time he saw my aeroplane, for instance [...] |
| 2. Alors elle avait forcé sa toux POUR lui infliger quand même des remords. | 2. Then she forced her cough a little more SO THAT he should suffer from remorse just the same. |
| 3. -Approche-toi que je te voie mieux, lui dit le roi qui était tout fier d'être enfin roi POUR quelqu'un. | 3. Approach, so that I may see you better, said the king, who felt consumingly proud of being at last a king OVER somebody. |
| 4. Car, POUR les vaniteux, les autres hommes sont des admirateurs. | 4. For, TO conceited men, all other men are admirers. |
| 5. C'est comme POUR la fleur. | 5. It is just as it is WITH the flower. |
| 6. C'est donc POUR ça encore que j'ai acheté une boîte de couleurs et des crayons. | 6. It is FOR THAT PURPOSE, again, that I have bought a box of paints and some pencils. |
| 7. C'est le même paysage que celui de la page précédente, mais je l'ai dessiné une fois encore POUR bien vous le montrer. | 7. It is the same as that on page 90, but I have drawn it again TO impress it on your memory. |
| 8. Elle ferait semblant de mourir POUR échapper au ridicule. | 8. She would [...] pretend that she was dying, TO avoid being laughed at. |
| 9. et c'était bien commode POUR faire chauffer le déjeuner du matin | 9. and they were very convenient FOR heating his breakfast in the morning. |
| 10. Il commença donc par les visiter POUR y chercher une occupation et POUR s'instruire. | 10. He began therefore, by visiting them, IN ORDER TO add to his knowledge. |
| 11. Il me fallut longtemps POUR comprendre d'où il venait. | 11. It took me a long time TO learn where he came from. |
| 12. J'avais le reste du jour POUR me reposer, et le reste de la nuit POUR dormir... | 12. I had the rest of the day FOR relaxation and the rest of the night FOR sleep. |
| 13. POUR toi je ne suis qu'un renard semblable à cent mille renards | 13. TO you, I am nothing more than a fox like a hundred thousand other foxes |
This example, along with other useful examples and ideas for exploiting them in the classroom, can be found at Joseph Rézeau's website on Data Driven Learning. See also St John (2003).
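The alignment procedure described above can be sketched in Python. This is an illustration under stated assumptions, not how any particular parallel concordancer works: the texts are given as lists of already-aligned paragraphs, sentences are split naively at ".", "!" or "?", and the function name parallel_concordance is our own. Because the alignment is by paragraph only, each French hit is paired with the whole English paragraph.

```python
import re

def parallel_concordance(source_paras, target_paras, keyword):
    """Find keyword hits in the source text and pair each with its aligned paragraph.

    Assumes the texts are aligned paragraph by paragraph: source_paras[i] and
    target_paras[i] are translations of each other. Returns tuples of
    (paragraph no., sentence no., source sentence, target paragraph).
    """
    if len(source_paras) != len(target_paras):
        raise ValueError("texts are not aligned: paragraph counts differ")
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = []
    for p_no, (src, tgt) in enumerate(zip(source_paras, target_paras), start=1):
        # Naive sentence split after ., ! or ? followed by whitespace
        for s_no, sentence in enumerate(re.split(r"(?<=[.!?])\s+", src), start=1):
            if pattern.search(sentence):
                # Only paragraphs are aligned, so show the whole target paragraph
                hits.append((p_no, s_no, sentence, tgt))
    return hits

french = ["Il me fallut longtemps pour comprendre d'où il venait.",
          "C'est comme pour la fleur."]
english = ["It took me a long time to learn where he came from.",
           "It is just as it is with the flower."]
for p, s, fr, en in parallel_concordance(french, english, "pour"):
    print(f"para {p}, sentence {s}: {fr}  =>  {en}")
```

The whole-word match means that words such as "pourquoi" are not picked up as hits for "pour".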
Multiconcord is an example of a multilingual parallel concordancer. The following Web page describes the work undertaken at the University of Birmingham as a contribution to an EC-funded Lingua project, coordinated by Francine Roussel, Université de Nancy II, to develop a parallel concordancer for classroom use (programmer David Woolls, with Birmingham University support from Philip King and Tim Johns): http://artsweb.bham.ac.uk/pking/multiconc/l_text.htm
If anyone tries to tell you that this sounds like the sort of work that goes on only at university level, don't believe them! Secondary school children are quite capable of making use of concordancers, provided you and they are well prepared for the task, as we will try to illustrate in Section 5.
Another interesting use of concordances is to compare texts produced by native speakers and by learners of a language. For example, you could put your students' French or German essays into a concordancer (assuming they had prepared them on a word-processor in the first place), alongside a body of authentic French or German texts. Then you could study how the students position words in sentences, and compare this with the way native speakers do it. Better still, you could get students to do this comparison themselves, as in Activity 13 in Section 4.
Although powerful professional concordancers can produce many different types of concordances and other sophisticated data, and as such are invaluable to linguists, literary researchers and lexicographers, for most teachers KWIC concordances in the target language are quite sufficient for their needs. The rest of this module will concentrate on those, with some interesting exceptions.
Have a go at creating your own KWIC concordance, using an English keyword of your choice. See our list of online concordancers and corpora in the Websites section below.
Two things are required to produce a set of Key Words in Context (KWIC): a piece of concordance software (a concordancer) and a corpus of (electronic) texts. First, the software.
A simple concordancer can make a concordance of a string of words, parts of words or other elements of a text (such as punctuation marks). As mentioned in Section 1.1, we call this a KWIC concordance. But some concordancers are also able to produce a full concordance comprising all the words and other linguistic elements of the corpus. In practice there are many other features to look for, such as speed, the size of corpus the software can handle, the languages supported, and the amount and quality of the documentation; the last point may be especially important if you are new to concordancing.
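The difference between a KWIC concordance and a full concordance can be shown in a few lines of Python. The sketch below, with a function name of our own, builds a full concordance: every word form in the text, in alphabetical order, with the line number and word position of each occurrence. This is only an illustration; a real concordancer may record other kinds of reference, such as file names or page numbers.

```python
import re
from collections import defaultdict

def full_concordance(text):
    """Index every word form in the text by (line number, word number).

    Returns a dict in alphabetical order, as in a printed concordance.
    """
    index = defaultdict(list)
    for line_no, line in enumerate(text.splitlines(), start=1):
        for word_no, match in enumerate(re.finditer(r"\w+", line), start=1):
            index[match.group().lower()].append((line_no, word_no))
    return dict(sorted(index.items()))

speech = """Thus from my lips, by yours, my sin is purged.
Then have my lips the sin that they have took.
Sin from thy lips? O trespass sweetly urged!
Give me my sin again."""

for word, places in full_concordance(speech).items():
    print(f"{word:<10} {places}")
```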
It is also important to bear in mind that this brief presentation of some of the concordancers on the market is not a software review as such, but simply an introduction to some of the key features to look for and the screens that you'll be working with. Trial or demo versions of most of these concordancers are available on the Web, and all the necessary information can be found at their websites. Here we'll deal only with the very basic differences. Pricing also varies a lot, as do the amount and quality of the documentation.
MicroConcord, which was written by Mike Scott in collaboration with Tim Johns, was originally released by Oxford University Press in 1993, together with a substantial corpus of texts from the Independent newspaper and a manual by Tim Johns. This publication is now out of print.
MicroConcord can only produce a KWIC concordance, but it is very fast indeed. It has other limitations too, but if you can get used to a somewhat dated DOS interface (no icons, no graphics) it does the job very well. It also allows a concordance to be printed (.txt format), and it is possible to blank out the search word if you want to make a fill-in exercise. It supports most European languages, but characters bearing diacritics are a problem.
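MicroConcord's option to blank out the search word can be imitated in a few lines. The Python below is our own sketch, not MicroConcord's code; the function name gap_fill and the 30-character context are assumptions. It builds KWIC lines, masks the keyword with underscores of the same length (so the keyword column stays aligned), and keeps the original forms as an answer key.

```python
import re

def gap_fill(text, keyword, width=30):
    """Turn KWIC lines into a fill-in exercise. Returns (exercise_lines, answers).

    Every whole-word occurrence of the keyword in a displayed line is replaced
    by underscores of the same length, so the keyword column stays aligned and
    other occurrences in the context do not give the answer away.
    """
    flat = " ".join(text.split())
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)

    def mask(s):
        # A word cut off at the edge of the context (e.g. "sin" from "sinner")
        # may also be masked; acceptable for a classroom handout.
        return pattern.sub(lambda w: "_" * len(w.group()), s)

    exercise, answers = [], []
    for n, m in enumerate(pattern.finditer(flat), start=1):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        line = f"{left:>{width}}{m.group()}{right}"
        exercise.append(f"{n:>3}. {mask(line)}")
        answers.append(m.group())  # keep the original form, e.g. "Sin" or "sin"
    return exercise, answers

speech = ("Thus from my lips, by yours, my sin is purged. Then have my lips "
          "the sin that they have took. Sin from thy lips? O trespass sweetly "
          "urged! Give me my sin again.")
lines, key = gap_fill(speech, "sin")
print("\n".join(lines))
print("Answers:", key)
```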
Wordsmith is a set of language tools developed by Mike Scott. The concordancer that comes with this pack can make both KWIC concordances and a full concordance. The search is quite fast, and it has many features, but it may be a bit difficult for the beginner. Naturally it allows printing of a concordance. The documentation comes in the form of an online help file, which gives a very detailed explanation of Wordsmith's immense range of features.
Concordance by R.J.C. Watt of Dundee University makes both a full concordance and a KWIC concordance (which Watt calls a Fast Concordance). The Fast Concordance is really fast. The Full Concordance is, of course, a bit slower, and making a full concordance of a very large corpus will require a lot of computer power and patience. Even so, a full concordance of Sir Walter Scott's Ivanhoe (about 200,000 words) took about 5 minutes on a Pentium 166MHz machine with 64MB of RAM.
The user interface is quite intuitive once you have worked a little bit with it. The split screen, with a wordlist on the left and the concordance on the right, is a nice feature. Printing a concordance is possible. This concordancer supports most European languages. Unlike the other concordancers, Concordance is able to convert a full concordance into HTML format so that the concordance can be used interactively through a Web browser. This makes it well suited for literary studies: see Activity 14 in Section 4.
Figure 2: A screenshot from Concordance with Text view window opened
MonoConc by Athelstan is much like a Windows version of MicroConcord. It can only produce single-word concordances, but it is very fast indeed, and since it is not so crammed with features the screen layout is very simple to work with. Like the others, this piece of software allows printing of the concordances.
Figure 3: A screenshot from MonoConc
Another concordancer is the Simple Concordance Program (SCP), written by Alan Reed and available free of charge. It lets you create word lists and search natural language text files for words, phrases and patterns, and it can read texts written in many languages. There are built-in alphabets for English, French, German, Greek, Russian, etc., and an alphabet editor which you can use to create alphabets for any other language.
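Why a custom alphabet matters is easy to show. Sorting by raw character codes puts the Danish letters in the order å, æ, ø, whereas Danish alphabetical order has æ, ø, å after z. The Python sketch below illustrates the idea only; it is not SCP's alphabet editor, and the alphabet string and function names are our own. It also ignores refinements of real Danish alphabetisation, such as treating "aa" as "å".

```python
# Hypothetical user-defined alphabet: Danish puts æ, ø, å after z
DANISH = "abcdefghijklmnopqrstuvwxyzæøå"

def make_sort_key(alphabet):
    """Return a sort key that orders words by a user-defined alphabet."""
    rank = {ch: i for i, ch in enumerate(alphabet)}
    unknown = len(alphabet)

    def key(word):
        # Letters missing from the alphabet sort after all known letters,
        # and among themselves by code point
        return [(rank.get(ch, unknown), ch) for ch in word.lower()]
    return key

words = ["ål", "øl", "æble", "zebra", "abe"]
print(sorted(words))                             # plain code-point order
print(sorted(words, key=make_sort_key(DANISH)))  # Danish alphabetical order
```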
A different kind of analysis tool is PhraseContext. According to its author, Hans Jørgen Klarskov Mortensen, the main idea behind it was not to create yet another concordancer, but to produce a more interactive tool. Most concordancers mainly present results which can be perused on the screen. PhraseContext can export nearly all its results in plain text format, which can be edited directly in its built-in editors. These results can also be sent to the Clipboard and, in most cases, saved to a text file. An extension of this is what the author calls a "PhraseBook", a collection of annotated keywords and KWICs. In this way, people who use a specialised corpus as a language reference in their research (and there seem to be more and more of them) can build up a collection of linguistic problems they have already solved.
Another of PhraseContext's features is the ability to save wordlists, concordance lines and the PhraseBook to XML files. This output can be manipulated by means of CSS style sheets, JavaScript and/or XSL formatting files for use in Web browsers. So far such scripts are sadly lacking, but the current version of PhraseContext comes with some basic XSL formatting files.
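To give an idea of what XML output of concordance lines might look like, here is a sketch using Python's standard xml.etree.ElementTree module. The element names (concordance, line, left, kw, right) and the function name are invented for this example; PhraseContext's own file format is not described here.

```python
import xml.etree.ElementTree as ET

def concordance_to_xml(keyword, hits):
    """Serialise KWIC hits, given as (left, keyword, right) triples, as an XML string.

    The element and attribute names are invented for this example and are
    not PhraseContext's actual file format.
    """
    root = ET.Element("concordance", keyword=keyword)
    for n, (left, kw, right) in enumerate(hits, start=1):
        line = ET.SubElement(root, "line", n=str(n))
        ET.SubElement(line, "left").text = left
        ET.SubElement(line, "kw").text = kw
        ET.SubElement(line, "right").text = right
    # encoding="unicode" returns a str; special characters such as & are escaped
    return ET.tostring(root, encoding="unicode")

print(concordance_to_xml("sin", [("by yours, my ", "sin", " is purged."),
                                 ("my lips the ", "sin", " that they have took.")]))
```

A style sheet or script could then turn a file like this into a Web page, which is the kind of processing the paragraph above describes.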
Besides ordinary concordancing tasks such as word frequency lists and the application of stoplists, PhraseContext also calculates statistics for collocations (T-score, Z-score, MI and standard deviation), and it retrieves clusters of up to six words.
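How PhraseContext computes its scores is not described here. As a sketch under that caveat, the Python below uses one common textbook formulation: the expected co-occurrence count E = f(node) × f(collocate) × span / N, mutual information MI = log2(O / E), and t-score = (O - E) / √O, where O is the observed count of the collocate within the window and N is the corpus size in words. The function name, window size and frequency threshold are our own choices.

```python
import math
import re
from collections import Counter

def collocates(text, node, window=4, min_freq=2):
    """Score words found within `window` words either side of `node`.

    Returns {collocate: (observed, MI, t_score)}, using
    E = f(node) * f(collocate) * span / N, MI = log2(O / E), t = (O - E) / sqrt(O).
    """
    tokens = re.findall(r"\w+", text.lower())
    node = node.lower()
    n_tokens = len(tokens)
    freq = Counter(tokens)
    observed = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            # Count every word within the window, excluding the node itself
            for j in range(max(0, i - window), min(n_tokens, i + window + 1)):
                if j != i:
                    observed[tokens[j]] += 1
    span = 2 * window  # word positions examined around each occurrence of the node
    scores = {}
    for word, o in observed.items():
        if o < min_freq:  # very rare pairs give unreliable scores
            continue
        expected = freq[node] * freq[word] * span / n_tokens
        scores[word] = (o, math.log2(o / expected), (o - expected) / math.sqrt(o))
    return scores

text = "strong tea strong tea weak coffee strong tea"
print(collocates(text, "strong", window=1))
```

On a toy text like this the numbers mean little; the measures only become informative on a corpus of reasonable size.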
The documentation explains the main features of the software and outlines the linguistic choices the author had to make. References to relevant literature are also included. As for speed: generating a wordlist of about 700,000 tokens (57 files in all) took about 20 seconds on a 650 MHz machine. On the same machine, retrieving the 23,232 concordance lines for "the" in a corpus of slightly less than 1,000,000 words (Shakespeare's collected works) took 930 msec, and the last unique word in that corpus was displayed in about 110 msec.
Figure 3a: A screenshot from PhraseContext

It is necessary to have some notion of what a corpus is, in order to work with a concordancer. Concordancing is part of Corpus Linguistics, which is dealt with by Tony McEnery & Andrew Wilson in Module 3.4. See also Michael Barlow's Corpus Linguistics Site.
In this module we will only cover the most basic elements.
A corpus is either a single text or a collection of texts. In Section 1 a sample KWIC concordance from Romeo and Juliet is shown; in this case the corpus was Shakespeare's play. A corpus can also be just one student's essay. It goes without saying that if the intention is to study the style of, say, Shakespeare, the corpus must be limited to his works, but if the intention is to study the grammar and semantics of a whole language, the corpus must contain many texts representing many genres. Likewise, if we want to study 18th-century English, we must make sure that the corpus contains a representative selection of texts from the 18th century only. So the contents of a corpus depend on the aims of the user.
How big a corpus needs to be also depends on what it is to be used for. Basically, the corpus must be big enough to contain sufficient occurrences of the language elements we want to study. For comparison: Cobuild uses a corpus of about 200 million words of written and spoken UK, US and NZ English in dictionary compilation. Birmingham University's Bank of English corpus comprises about 500 million words and is well suited for linguistic research. Letting our students loose on such vast masses of text is, in most cases, likely to create more confusion than clarity. Less will often do. But, of course, if you are confronted with a really ardent advocate of misguided ideas about what is and is not correct usage, a failure to find examples of the misguided expressions in a corpus of 400-500 million words just might make an impression on him/her. Chris Tribble argues that a specialist micro corpus of about 25,000-30,000 words will be quite adequate for most educational purposes. On the other hand, see Tribble and Jones (1997:11): "We tend to think that a word like crime is a common word but it actually occurs only about 20 times in every one million words of a 'balanced' collection of texts such as the Longman-Lancaster corpus." Later we'll show examples of what can be done with a corpus of about 50,000 German words.
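Tribble and Jones's figure can be turned into a quick estimate of how many examples a corpus of a given size is likely to yield: expected occurrences = rate per million × corpus size / 1,000,000. For "crime", at about 20 per million words, a 50,000-word corpus gives 20 × 50,000 / 1,000,000 = 1 occurrence, and a 30,000-word micro corpus only 0.6. The Python sketch below (function names our own) does this arithmetic. It assumes the word is spread evenly through the corpus, which real texts rarely are.

```python
def per_million(raw_count, corpus_size):
    """Normalise a raw frequency to occurrences per million words."""
    return raw_count * 1_000_000 / corpus_size

def expected_hits(rate_per_million, corpus_size):
    """How many occurrences to expect for a word with a given rate per million."""
    return rate_per_million * corpus_size / 1_000_000

# "crime", at about 20 per million words, in corpora of different sizes
for size in (30_000, 50_000, 1_000_000, 500_000_000):
    print(f"{size:>11,} words: about {expected_hits(20, size):g} occurrences of 'crime'")
```

The same per_million function lets you compare frequencies across corpora of different sizes, for example a class set of essays against a larger native-speaker corpus.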
One of the prime advantages of concordancing in language teaching is the opportunity to use relevant, authentic and interesting examples as opposed to made-up traditional grammar examples. This means that if we are trying to teach students how to write an argumentative essay, we should use authentic argumentative texts to teach them the language that such essays call for. And likewise, if the subject is imaginative writing, we should use model texts that fit this genre. How difficult an issue this really is can also be seen from the following example. Recently a Danish publisher released a massive 2277-page English-Danish dictionary based primarily on a corpus of 19th century texts. As one reviewer of the dictionary comments:
If you are reading Unsworth's medieval novel from 1995 you will not be able to find Ostler, Tourney, Morality Play, Lychgate nor Mead. [...] If you are reading classical ballads, you will not be able to find fain. [...] Reading a Bram Stoker short story, Dracula's Guest, you will not be able to find "he answered fencingly".
These examples are more than just a pedant's protest - they illustrate how vast and complex our languages are (Source: Mogens Kjær: To-i-en?, Gymnasieskolen, Nr. 3, 2000, pp. 27ff.).
In a few cases both concordance software and a useful corpus can be found online. Here are some examples:
English and Multilingual
British National Corpus: A very large corpus of modern British English designed to present as wide a range of modern English as possible. English only.
Collins Cobuild: In a few cases the technical problems have already been solved for you. We have already mentioned Cobuild's Bank of English corpus. The online corpus comprises about 56 million words of contemporary UK, US and NZ spoken and written English. Access is by subscription. English only.
Google: Using Google as a simple concordancer, e.g. to check for possible collocations, works quite well. Is it possible, for example, to say "a metal wood"? Yes, indeed! Google finds numerous examples. In German, does one say "Ich bin im Internet gesurft" or "Ich habe im Internet gesurft"? Well, both are used, but one form definitely dominates. Enter the whole phrase in inverted commas in Google's search box and you will find hundreds of examples of how the phrase is used. You can use a wildcard (*, the asterisk character) if you are not sure of the spelling of a word or wish to look for two words used together but separated by other letters or words, e.g. a search for ich * habe gesurft (no inverted commas round the phrase) will find "Ich habe gesurft" and "Ich habe gestern mittag noch normal gesurft" - very handy in German when different parts of the verb are separated. Enter the combination ich * habe * Internet * gesurft (no inverted commas round the phrase) and you should find examples such as "Dann habe ich im Internet nach Rezepten gesurft": http://www.google.co.uk. See Robb (2003). Multilingual.
KWICFinder: A concordancer that rides on the back of a standard search engine, enabling the whole Web to be used as a text corpus - very impressive! Multilingual.
KWICionary: A Web-based Data Driven Learning tool, produced at Penn State University, that permits users to search a corpus of over 70 million words in three genres of German: literary, journalistic and conversational usage.
Web Concordancer: A simple Web concordancer is available at the EDICT Virtual Language Centre of the Polytechnic University of Hong Kong. The corpus that this service provides is somewhat smaller than Cobuild but still very useful. See Activity 11 in Section 4. English, French, Chinese, Japanese.
WebCorp: A concordancer that works right across the Web, riding on the back of different search engines. It's quite slow, but it produces good results. WebCorp includes a word-list generator that will produce a word frequency list of a Web page in a wide variety of languages. Operated and maintained by the Research and Development Unit for English Studies (RDUES) at Birmingham City University, UK. English.
German
Mannheim Corpus: A very big - and free - corpus of German texts. This includes a choice of corpora and a lot of search facilities.
French
Corpus Lexicaux Québécois: Canadian French corpora with search facilities.
Other languages
See the references under Websites.
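The Google wildcard searches described above can also be imitated on a corpus of your own. The Python below is a minimal sketch, assuming that * stands for any run of zero to five whole words and that word order matters (as in a phrase search in inverted commas); these rules are our own and are not how Google defines its wildcard. The function names wildcard_pattern and wildcard_search are hypothetical. Because order matters here, the query for the last Google example is written habe * im Internet * gesurft.

```python
import re

def wildcard_pattern(query, max_gap=5):
    """Compile a query such as "ich habe * gesurft" into a regular expression.

    Each * matches between zero and max_gap whole words; the other terms must
    appear in the order given, ignoring case.
    """
    pieces = []
    for term in query.split():
        if term == "*":
            pieces.append(r"(?:\w+\s+){0,%d}" % max_gap)
        else:
            pieces.append(re.escape(term) + r"\s+")
    regex = "".join(pieces)
    if regex.endswith(r"\s+"):
        regex = regex[:-3]  # drop the whitespace requirement after the final word
    return re.compile(r"\b" + regex + r"\b", re.IGNORECASE)

def wildcard_search(sentences, query, max_gap=5):
    """Return the sentences that match the wildcard query."""
    pattern = wildcard_pattern(query, max_gap)
    return [s for s in sentences if pattern.search(s)]

corpus = ["Ich habe gesurft.",
          "Ich habe gestern mittag noch normal gesurft.",
          "Ich bin im Internet gesurft.",
          "Dann habe ich im Internet nach Rezepten gesurft."]
print(wildcard_search(corpus, "ich habe * gesurft"))
print(wildcard_search(corpus, "habe * im Internet * gesurft"))
```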
A characteristic feature of online corpora and concordancers is their size - they are in fact very big indeed. They can be used to create your own handouts for your students - or for your own reference. But classroom use of them is perhaps only suitable for quite advanced students who are really interested in linguistic details and who really understand what a corpus is, what the search facilities do and how they work. Later we will show some examples of their use.
For this project we need a German corpus. Let us suppose that this does not already exist so we will have to make it ourselves. In this case we are only interested in examples of relatively elementary German grammar, so almost any modern German text written by a professional writer will do. We aim to use the Internet to get the texts. This is the step-by-step process:
If for some reason you do not want to use this method, you can save the Web pages. Browsers such as Netscape Navigator and Internet Explorer allow you to save the page on the screen in .txt format. In both cases you choose Save as: Internet Explorer has a drop-down menu from which you can choose .txt format, while Netscape Navigator will save in .txt format if you change the extension (usually .htm or .html) to .txt. Otherwise you will have to use the Windows Clipboard: select the text, copy it to the Clipboard, paste it into your word-processor and save it as plain ASCII/ANSI text: see Section 2.2.6 below for further explanations of these terms.
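If you save pages in their original .htm or .html form instead, the markup can be stripped out afterwards. Below is a minimal sketch using Python's standard html.parser module; the class and function names are our own. It keeps all text outside script and style elements, including navigation menus and other page furniture that you may still want to delete by hand, and the result can then be saved as plain text.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the text of an HTML page, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default: &eacute; becomes é
        self.chunks = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)

def html_to_text(html):
    """Return the visible text of an HTML document with whitespace normalised."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Joining chunks with spaces can split a word marked up as <b>Wo</b>rd;
    # that is an accepted simplification here.
    return " ".join(" ".join(parser.chunks).split())

page = ("<html><head><title>Test</title></head>"
        "<body><p>Ich habe im Internet gesurft.</p></body></html>")
print(html_to_text(page))
```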
CD-ROM
Instead of using texts collected from online sources you can use CD-ROM encyclopaedias or any other source of electronic text, as suggested by Chris Tribble. The practical method is basically the same as the one described above in Section 2.2.5.
Texts on paper
It is also possible to convert texts on paper into machine-readable text. For this you'll need a scanner, which makes a digitised image of the printed text. A so-called OCR (Optical Character Recognition) program can then convert this image into machine-readable text. Nowadays good scanners and quite sophisticated OCR software are quite reasonably priced. OCR software is usually supplied with the scanner, and more often than not that software will be adequate for these needs. Of course, scanning and recognising paper texts takes much longer than just copying them from the Internet, although it may take you time to find what you want on the Internet.
Typing text
The most time-consuming way of getting machine-readable text is to type it into a word-processor. But it can be done!
What format? ASCII or ANSI?
All the concordancers described in this module require plain ASCII/ANSI text, and they usually prefer the text to have a CR/LF (a so-called hard return) at the end of each line. All modern word-processors can save in ASCII or ANSI format: usually you choose Save as and then select the format from a drop-down menu.
There is a difference between ASCII and ANSI text format, which matters if you are working with languages other than English. ASCII is the oldest computer text format and was designed for English. ANSI text, the extension of ASCII used by Windows, adds consistent codes for characters with diacritics, allowing us to make concordances of all the European languages, and of non-European languages too if the appropriate fonts are installed on the computer. The Windows concordancers mentioned above all work with ANSI text format, whereas MicroConcord (a DOS program) works with ASCII text format.
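The practical difference can be seen with a few lines of Python. This is a minimal sketch, assuming that "ANSI" means Windows code page 1252 (cp1252), the variant used on Western European versions of Windows; other Windows installations use other code pages. The functions to_ansi_bytes and save_for_concordancer are hypothetical names: they write each line followed by CR/LF, as the concordancers prefer, and show that a German character such as ä has a code in cp1252 but none in ASCII.

```python
# Hypothetical helper names; a sketch assuming "ANSI" = Windows code page 1252.
def to_ansi_bytes(text, encoding="cp1252"):
    """Encode text for a Windows concordancer: CR/LF after every line.

    A character the code page cannot represent raises UnicodeEncodeError
    instead of being silently lost.
    """
    return "".join(line + "\r\n" for line in text.splitlines()).encode(encoding)


def save_for_concordancer(text, path, encoding="cp1252"):
    """Write text to path as plain ANSI (or ASCII, if encoding='ascii') text."""
    with open(path, "wb") as f:
        f.write(to_ansi_bytes(text, encoding))


# 'ä' has a code in cp1252 (byte 0xE4) but none in 7-bit ASCII.
print(to_ansi_bytes("Mädchen"))  # b'M\xe4dchen\r\n'
try:
    to_ansi_bytes("Mädchen", "ascii")
except UnicodeEncodeError:
    print("ASCII has no code for 'ä'")
```

Failing loudly on an unrepresentable character is a deliberate choice: a concordance built from a file in which accented letters have been silently replaced would give misleading results.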
Stevens (1995) notes that the corpus preparer may introduce bias into the corpus by selecting data on the basis of preconceived notions of what ought to be there, or on pedagogic grounds. Is this really a tricky issue? If so, why? Compare this with traditional grammars and textbooks.
Contents of Section 3
Concordances date back to the Middle Ages, when, like other massive undertakings such as Gothic cathedrals or the Bayeux Tapestry, they required an unimaginable amount of labour. An early example, according to Tribble & Jones (1997), is the first known complete concordance of the Latin Bible, the work of some five hundred Benedictine monks working under Hugo de Sancto Charo. Biblical concordances are indexes of the words in the Bible, giving the location of the passages where each word can be found. The Encyclopaedia Britannica lists a number of early biblical concordances, including one drawn up by Mercator, the 16th-century cartographer. The other favourite corpus for early concordance-makers, at least in the English-speaking world, was Shakespeare. The Encyclopaedia Britannica tells us that Bartlett, the American bookseller and editor best known for his Familiar Quotations, wrote, after many years of labour, a Complete Concordance to Shakespeare's Dramatic Works and Poems (1894), a standard reference work that surpassed all its predecessors in the number and fullness of its citations.
Because of the canonical status they have in the culture of the English-speaking world, Biblical and Shakespearian texts have two things in common: they need to be accessed frequently and efficiently, and they have to be interpreted (and reinterpreted). So these early concordances functioned both as archiving tools, meeting the need for access, and as text-analysis tools, facilitating the interpretation of meanings by bringing words and their contexts closer together on the page, and thus into sharper focus.
Today's computerised concordances still fulfil these two functions, the practical and the scholarly. Professional archives, on the Internet or on library and company intranets, illustrate the more practical use. For example, if I am a lawyer or a law student, I can access a concordance of legal contexts for the keyword I'm interested in, and I can then assess the currency and coverage of the legal concept under scrutiny. This is clearly of great practical advantage to me. On the other hand, if I am working with language itself, whether as a lexicographer, a translator, a terminologist, a researcher in linguistics, a literary scholar, a language policy specialist or even a forensic linguist, I will be interested both in accessing the right texts fast and in interpreting the language I discover. So intense is the interest in the scholarly application of concordancing that since the 1980s many national cultures have invested heavily in creating large searchable electronic databases, which are real monuments to their language and their literature. For non-English-speaking cultures, such initiatives can also be an important part of a political strategy for linguistic survival.
Search the Internet to find out as much as possible about one of the following great national language corpora. For instance: what is it called, when was it set up and by whom, how big is it, what kind of corpus is it, what are the conditions for access to it, how frequently is it updated, and what search facilities does it offer?
French: The ARTFL Project, the project for American and French Research on the Treasury of the French Language
German: The Mannheim Corpus
Italian: The OVI Project (Opera del Vocabolario Italiano)
English: The British National Corpus
Educational concordancing also has a history, although a much shorter one, having started in the 1980s. For a summary of its evolution in ELT, see Stevens (1995). Claims made for concordancing for educational purposes typically have several facets. One is that concordancers facilitate access to 'real' target language (TL) lexical and grammatical structures. Another is that they can make students more active and independent analysers of language, turning them to some extent into language researchers. The rest of this section looks at how concordancers fit in with current teaching methodologies and practices.
Language teachers want to provide activities and materials that reflect native speakers' use of the language. The belief is that this is motivating and better prepares learners for contact with written or spoken native-speaker utterances. Many traditional grammar books, textbooks and dictionaries contain only invented examples, which can only reflect the particular ways in which their authors, eminent scholars though they may be, use their mother tongue. However, a language is owned by all its native speakers, not by one small subset, and furthermore it evolves all the time. Can we as teachers, whether we are trained non-native speakers of the TL or TL native speakers, always claim to have a realistic perception of real usage as it evolves? How many of us have had the embarrassing experience of giving a "rule" to a learner, only to be contradicted by some piece of TL evidence? Sinclair (1986:185-203) encapsulates the teacher's problem with the comment that we need to find explanations that fit the evidence, rather than adjusting the evidence to fit a pre-set explanation. Working with real language data, which Tim Johns calls Data-Driven Learning (DDL), fits this aim perfectly. DDL gives very easy access to a huge number of extremely varied native-speaker productions (although, unsurprisingly, transcriptions of spoken language are a little harder to find). See also Module 3.4, Corpus Linguistics.
However, this raises the question of how prescriptive teachers should be in the choice of TL models offered. It is all very well for French native speakers to write des grands bateaux, flouting the grammar-book rule that says it should be de grands bateaux, but should we teach grammar-book usage or street usage? This question arises in every form of language teaching, but concordance-based teaching brings it into sharper focus. If the corpus we use contains unedited material, as it should if we want to be authentic, then concordance searches will throw up some questionable usages. There are ways of avoiding this (for example pre-editing the corpus ourselves, or using only carefully edited TL texts such as encyclopaedias and other pedagogical texts), but this reintroduces an element of teacher control by the back door and defeats the purpose of exposing learners to real TL forms. Pragmatic decisions will be needed, based on learners' proficiency levels and teaching objectives.
3.2.1.1 Discussion topic
According to McEnery & Wilson (1996), it is important to provide students with examples taken from real corpora, because these expose students at an early stage in the learning process to the kinds of sentences and vocabulary they will encounter when reading genuine texts in the language or using the language in real communicative situations. Discuss some arguments for and against this idea (perhaps adopting in turn the perspectives of grammar-based teaching and of communicative teaching).
3.2.1.2 Learning task
Think about how you would explain the difference between "uninterested" and "disinterested" to (a) native English speakers and (b) non-natives. Use one of the online concordancers and corpora that we list below under Websites to help you determine the difference. How would you use this data to enrich your explanations?
Kettemann (1996) cites the Council of Europe's document (1994:10) concerning the Common European Framework of Reference (CEFR) for Languages: "Language pedagogy has hitherto paid little attention to this dimension but should in fact develop explicit objectives and practices to teach methods of discovery and analysis." This, as Kettemann reminds us, is a very good reason for teachers to use computers in the classroom. The computer, he points out, "is a powerful hypothesis testing device on vast amounts of data, [...] allows controlled speculation, makes hidden structures visible, enhances at the same time imagination and checks it by inductivity, thus making higher degrees of objectivity possible." See also Kettemann's article entitled "On the role of context in syntax and semantics".
Why would we want a powerful hypothesis-testing device? Because learners need to have tested any rule against as many examples as possible before they can fully internalise it. Why involve the imagination? Because learners remember the knowledge which they have formulated themselves rather than formulations which have been imposed on them.
These are only assumptions, but they have found support in research (see Stevens 1995). Undeniably, whether they are displayed as KWIC lists (like the example of "sin" in Section 1) or as columns of matching data (like the parallel concordance example using "pour"), concordance outputs make patterns more noticeable.
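To make the idea of a KWIC list concrete, here is a minimal Python sketch of how such a display can be produced. It is not how any particular concordancer in this module works: the function name kwic and the choice to measure context in characters (rather than words) are our own assumptions. The keyword is matched as a whole word, ignoring case, and the left context is right-aligned so that every occurrence of the keyword lines up in a central column.

```python
import re


def kwic(text, keyword, width=30):
    """Return one KWIC line per occurrence of keyword (hypothetical helper).

    Matches whole words, ignoring case. Contexts are measured in characters
    and may cut words in half at the edges, as in many printed KWIC lists.
    """
    flat = " ".join(text.split())  # turn line breaks and tabs into single spaces
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()].rstrip()
        right = flat[m.end():m.end() + width].lstrip()
        # Right-align the left context so every keyword sits in the same column.
        lines.append(f"{left:>{width}} | {m.group(0)} | {right}")
    return lines


sample = "Il devient donc difficile de proposer un plan. Elle permet donc la circulation des avions."
for line in kwic(sample, "donc", width=20):
    print(line)
```

Aligning the keyword in one column is what makes recurring neighbours stand out when the eye runs down the list; a real concordancer would also offer sorting by left or right context, which this sketch omits.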
Have a look at Activity 4 (French) or Activity 9 (German) in Section 4, and formulate an empirically-based "rule" for the patterns of behaviour of the word.
Talking about the teaching of linguistics, McEnery & Wilson (1996) have remarked:
In our own teaching we have found that students who have been taught using traditional syntax textbooks, which contain only simple example sentences such as Steve puts his money in the bank [...] often find themselves unable to analyse longer, more complex corpus sentences such as The government has welcomed a report by an Australian royal commission on the effects of Britain's atomic bomb testing programme in the Australian desert in the fifties and early sixties...
Discuss to what extent this also applies to the teaching of English to non-native speakers.
Communicative teaching coupled with exclusive use of the target language has undoubtedly brought many benefits to learners, but its merits are currently being re-evaluated. Discussing which language should be used in the classroom, Klapper (1998:22-28) approves of "that revolution which has come over many secondary classrooms in recent years: the use of the FL as the principal language of instruction", but goes on to point out the need to avoid "immersion dogmatism". A common L1 in an L2 learning setting is an obvious classroom resource which should not be overlooked. Raising learners' awareness of the TL is linked to raising their mother-tongue awareness, which argues for a reassessment of bilingual (or multilingual) work with learners, and for greater encouragement to learners to "notice" forms rather than simply use them. In other words, let learners move between L1 and L2 as appropriate: this will help them overcome the myth that there are one-to-one equivalences between one language and the next, gain a better grasp of what language forms are, and get into the habit of discussing them.
Another source of support for the concordancer as a facilitator of language awareness comes from Willis (1999) and his efforts to promote the "lexical syllabus". Challenging the distinction between grammar and lexis, he argues that words should be taught in their "patterns" or "frames". For instance, a pattern like the idea (or risk, or thought, or hope) of -ing has the right "feel" for English. Other frames might be possible but simply don't occur, such as the wish of -ing. Teaching all these words individually, or teaching the rule about of + -ing, is not sufficiently helpful to the learner. Willis rejects the view that words are single items and that grammar tells us how these items combine. For him, "Rather than grammar on the one hand and lexis on the other, we have an intricate relationship between the two." The interrelationship of lexis and syntax jumps off the page or the screen for anyone at all familiar with concordancer searches. KWIC concordances are rich in patterns like the one Willis mentions, and, given a little dexterity with search techniques, users can easily build large collections of them for further learning.
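As an illustration of the kind of search technique meant here, the Python sketch below collects instances of the frame the ___ of -ing from a plain-text corpus and counts which nouns fill the slot. It is a rough approximation, not Willis's own method: the regular expression and the function names of_ing_frames and of_ing_examples are our own, and because any word ending in -ing (thing, something) will match, the results still need to be checked by eye.

```python
import re
from collections import Counter

# Hypothetical pattern: "the" + one word + "of" + a word ending in -ing.
# Only a rough approximation of Willis's frame: words such as "something"
# also end in -ing, so the hits still need to be checked by eye.
FRAME = re.compile(r"\bthe\s+(\w+)\s+of\s+(\w+ing)\b", re.IGNORECASE)


def of_ing_frames(text):
    """Count the nouns that fill the slot in 'the ___ of -ing'."""
    return Counter(m.group(1).lower() for m in FRAME.finditer(text))


def of_ing_examples(text):
    """List each matching frame exactly as it appears in the text."""
    return [m.group(0) for m in FRAME.finditer(text)]


sample = "She liked the idea of leaving. The risk of failing was small."
print(of_ing_frames(sample).most_common())
print(of_ing_examples(sample))
```

Run over a large corpus, the counts show which nouns habitually occupy the slot, and the absence of a noun such as wish is itself informative.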
Language learning pedagogy has for some years now argued in favour of developing learner autonomy. For example, Little (1996) claims that successful language use over time depends on continued language learning, and that to develop proficiency in a second language we need to be ready to turn almost any occasion of language use into an occasion of conscious language learning. So good language learning means regularly stepping back from purely communicative activities and casting a critical eye over one's own understanding and one's own strategies. Because concordancers facilitate language awareness, they also provide opportunities for such critical activity.
Furthermore, increased exposure to authentic texts has turned teachers (and some students) into more discriminating users of textbooks and led them to question the authority of grammars and dictionaries. McEnery & Wilson (1996) list four separate studies of ESL textbooks showing that teaching materials not based on authentic data can be positively misleading to students. Combating this has been one major aim of the Collins Cobuild project, through the publication of printed corpus-based dictionaries and textbooks such as the Collins Cobuild English Language Dictionary and the Collins Cobuild Students' Grammar. But even these tools are edited, and are therefore less useful than a raw concordance for those who like to exercise their critical faculties on a piece of language.
What about curiosity, though? Students can be in charge without necessarily growing more curious. For example, they can be given data to manipulate, individually or in groups, or a group can be asked to create concordances for use as gap-filling tests for another group, as part of a competitive game. Classroom applications are numerous, as Section 4 aims to show. But unless these tasks are given a validity other than that conferred by the teaching setting, students may well not sustain interest beyond the initial buzz of working with a new piece of software. Student-centredness implies a large measure of real freedom to choose learning tasks, in this case tasks outside the teaching context, for instance searching the Internet for texts of real relevance to their individual lives. Teachers willing to grant this freedom will have to evolve pedagogies that reconcile student interests (unpredictable and changeable) with the cognitive content of the learning activity (which has to be delivered in a planned and reliable way). Well-thought-out concordancer tasks using as varied a corpus as can be mustered can offer one way to square this particular circle, particularly if students' preferences are genuinely allowed to influence the choice of corpus texts.
In this section we aim to provide you with some practical examples of worksheets which you can print and use with your students if you think it appropriate. We have given you only a small selection of what can be achieved, and we hope that these examples will act as triggers to stimulate your creativity! Some of the examples are in French, others in English, others in German. In most cases the kind of thinking used in a French example can be applied equally well in a Spanish or Swedish context, provided it is translated not only into that language as such but into its grammar, syntax and lexis.
When working with concordances, whether paper-based or interactive, students have to cope with a double set of limitations: their level of language proficiency and their level of familiarity with concordances. When you offer a series of concordance-based exercises, you will not always be able to grade them according to both criteria. Our experience is that you cannot overestimate students' need to become familiar with the appearance of concordances, or their need for guidance in drawing conclusions from lists of citations. One way of meeting these needs is to provide plenty of practice with paper-based exercises first, so that students get used to inductive reasoning before they are asked to cope with the additional burden of manipulating a piece of software, however simple it may seem to you. Also, by providing paper handouts in the early stages of classroom work with concordances, you will be able to simplify somewhat the sometimes startling physical appearance of concordances, so that learners get used to them and can move on to the inevitably more complex ones they will create when they start using concordancers interactively.
In any preparation for teaching it goes without saying that you should try the exercises out on guinea pigs before presenting them to a class. With concordancing, however, this becomes essential: your results will depend to a large extent on the composition of your corpus, so be warned and always try activities out first!
Since a corpus and a concordancer in principle let you and your students examine almost every aspect of your target language, the crucial point is to find ways of making the relevant aspects of the target language appear in the concordances. Or, to put it less formally, the teacher's task when using concordancing in the classroom is to ask precisely the right questions.
Unlike teaching with traditional textbooks, exercise books and grammars, concordancing will often lead the teacher to discover that a particularly productive question or activity brings up material and linguistic facts that neither student nor teacher expected. This calls for a different teacher role from the most traditional one(s). We'll return to that point in Section 6.
| ACTIVITY NAME AND LANGUAGE | PURPOSE OF ACTIVITY | PRESENTATION |
| 1. Guess the mystery word (F) | beginners, lexis | paper, online (LAN) |
| 2. Donc on peut dire que (F) | style, usage | paper, online (LAN) |
| 3. S'agit (F) | derive a rule, grammar | paper, online (LAN) |
| 4. Beware false friends (F) | lexis | paper, online (LAN) |
| 5. Coffee or tea? (F) | cultural differences | paper, online (LAN) |
| 6. Les Anglais et les Britanniques (F) | political correctness, usage, lexis | paper, online (LAN) |
| 7. Changing lifestyles (F) | cultural differences, usage | paper, online (LAN) |
| 8. dürfen and müssen (G) | lexis, usage | paper |
| 9. Preposition am (G) | usage | paper |
| 10. Syntax of adverbs (E) | derive a rule, grammar, syntax, statistics, usage | online (WWW) |
| 11. reason + because (E) | correctness, usage | online (WWW) |
| 12. Students' own writing (E) | variety, usage | online, standalone PC |
| 13. L'aigle noir (F) | literary analysis | online (LAN) |
| 14. Mariana (E) | literary analysis | online (LAN or WWW) |
Aim of the activity: To familiarise students with the physical appearance of a KWIC concordance and with the importance of left-context and right-context when working with keywords. This is to be used with people who are complete beginners at concordancing.
Worksheet: Read the grid below, where the nonsense word "gloup" has been entered instead of a real word. Your job is to decide with your group what that mystery real word is. When you have made up your mind, discuss with your group what the answers to the three questions listed below the grid should be.
| 1. pport critique sur certaines utilisations abusives de la | gloup | est devenu un geste banal plus qu'une décision. |
| 2. que pour beaucoup d'entre nous le fait d'allumer une | gloup | . |
| 3. laquelle on est pris pour gens qui s'abrutissent à la | gloup | , dans une proportion croissante depuis 1896 |
| 4. Tous les grands moments de | gloup | superposent un message recherché et un messa |
| 5. sieurs postes et l'augmentation du temps de diffusion ( | gloup | du matin et de la nuit). |
| 6. d'ailleurs 21% des Français reconnaissent regarder la | gloup | Même si le programme les ennuie. 34% seulem |
| 7. rmettent une plus grande maîtrise individuelle de la | gloup | . Les comportements des téléspectateurs en ont é |
| 8. ux Pays-Bas que l'on regarde le moins longtemps la | gloup | : 89 minutes par jour, contre 228 en Grande-Bret |
| 9. publications, diffusent des émissions de radio ou de | gloup | . Et découvrent les vertus des communications |
Ideas for the creation of handouts in different languages: try similar exercises with words that have dual meanings or different meanings in different varieties of the language. With such words, contexts will contribute strongly to the guesswork required. E.g. English chip or bathroom (in its US meaning), French dépanneur (and its Québécois French meaning), Spanish and Latin American pasaje or manejar, Italian penna or colpito.
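Handouts like the mystery-word grid above can be prepared semi-automatically. The Python sketch below is one possible approach, under our own assumptions: the function name mystery_word_rows, the default context width of 50 characters and the pipe-separated layout are hypothetical choices made to mirror the grid. It numbers each occurrence of the keyword and replaces it, and any other whole-word occurrence in the surrounding context, with a nonsense word.

```python
import re


def mystery_word_rows(text, keyword, mask="gloup", width=50):
    """Build numbered KWIC rows with keyword replaced by a nonsense word.

    Hypothetical helper. Every whole-word occurrence (any case) is masked,
    including ones inside another row's context. A keyword cut in half at the
    edge of a context window is NOT caught, so check the output before printing.
    """
    flat = " ".join(text.split())
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hide = lambda s: pattern.sub(lambda _m: mask, s)  # lambda avoids escape handling in mask
    rows = []
    for n, m in enumerate(pattern.finditer(flat), start=1):
        left = hide(flat[max(0, m.start() - width):m.start()]).rstrip()
        right = hide(flat[m.end():m.end() + width]).lstrip()
        rows.append(f"| {n}. {left} | {mask} | {right} |")
    return rows


corpus = "Il regarde la télévision le soir. La télévision les ennuie."
for row in mystery_word_rows(corpus, "télévision", width=20):
    print(row)
```

Because the contexts are cut at a fixed number of characters, a fragment such as évision can survive at the edge of a row and give the game away, so always read through the output before printing it.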
Aims of the activity: To raise students' awareness of the stylistic differences in the placement of "donc" in French in formal written texts compared with informal ones. Ideally, this comparison should be carried out by contrasting a written corpus with a corpus of spoken language. However, spoken corpora are very hard to come by for most languages, so instead we have used two corpora from Corpus Lexicaux Québécois, one of them an archive of letters written by people with no formal education, in a style very close to spoken French. Many of the words were spelt phonetically in the original, and to avoid confusing students we have corrected the spelling. However, the original word order and punctuation have been preserved.
Worksheet: In French you can place a word like "donc" at the very beginning of a sentence (e.g. "Donc on peut dire que..."). But you often find it between the main verb and what follows the main verb (e.g. "On peut donc dire que..."). Is one better than the other? Where should you put "donc"? To find out, have a look at the two lists below, both from Québec (French Canada). List A is from a set of engineers' reports on the building of an airport in Inuit territory. List B is from a collection of handwritten letters sent by poor farmers to the authorities (in this case a priest) asking for financial help. These letter-writers have not studied composition and they write more or less as they speak.
Your job is to look at how many times "donc" is found at the very beginning of sentences and how many times it occurs immediately after the main verb, in each list. When you have done this, draw a conclusion about where you should place "donc" when you write a formal essay, and where you may put it when you are writing (or speaking) in an everyday situation.
List A
| Il devient | donc | difficile de proposer un plan de gestion de ces troupeaux. |
| u Québec et de tout promoteur développer les terres et | donc | , à restreindre les droits des autochtones. |
| Elle permet | donc | la circulation des avions de grandes dimensions en cas de besoin. |
| Ce droit exclusif n'atténue | donc | pas du tout les droits de ces derniers puisque ils ont accès à toutes les |
| Les autochtones ont | donc | priorité quant à la récolte. |
| Les Inuit ont | donc | fait part de leurs points de vue sur l'impact du Complexe ainsi que leur |
| Il serait | donc | intéressant de comparer des données plus récentes afin de |
| Ces chiffres révèlent | donc | une tendance à la baisse entre 1976 et 1980 mais sûrement aussi |
| Les données furent | donc | suggéré de répartir les territoires de chasse selon des zones |
| Ils ont | donc | peur de ne pas avoir suffisamment obtenu de terrains pour permettre |
| L'un des aspects négatifs de la mise en application de | donc | de ne pas avoir atteint l'objectif de mettre en place un mécanisme |
| Il faut | donc | déterminer les espèces touchées et leur importance relative |
|
C'est |
donc |