Association of British Science Writers

Association of British Science Writers
Wellcome Wolfson Building
165 Queen's Gate
London
SW7 5HD

Tel: 0870 770 3361

absw"at"absw.org.uk

These pages were designed, well, cobbled together, by Michael Kenward on behalf of the ABSW.

Computers, language and speech*


Themes of the meeting
Combining approaches: models
Combining approaches: sentences
Combining approaches: sounds
New words
Ambiguity
Proper names
Changing focus
Machine translation
Computer-generated text and personality
Train timetables
Post-operative briefings


Summary

If you are trying to send a document to people in different countries, you may well decide to approach them in their own language. Your computer may translate your text for you; and to do this, its program will incorporate rules about the structure of the different languages it translates. If, instead, you are researching the way television news presents science, you might like to run videos of last year's news bulletins through a computer which will spot key words and print them out in a database showing when and how they were used. This is speech processing, and it has historically used very different methods from the language processing needed for machine translation. The discussion meeting held at the Royal Society on 22 and 23 September, 1999, aimed to build on a recent trend: to show researchers in each area how their work can benefit from using each other's methods.

Language-processing is a top-down process driven by people writing rule-sets. The rules have been grounded in the philosophical outlook of researchers like Noam Chomsky, who see language as the physical manifestation of some underlying process in the minds of speakers and listeners. The process is governed by grammar, and our propensity to understand language through grammar is something we inherit. Speech processing, on the other hand, is a bottom-up approach which looks at what people actually say and sorts out useful information from it on a statistical basis. Before a closer association began to develop between the two groups about ten years ago, the speech-processing engineers and statisticians did not see the importance of rules, while the language-processing linguists regarded statistics as a limited tool. In recent years, however, each group has begun to see how useful the other's methods could be.

This document outlines the contributions of the various participants in the meeting. It ranges over the spectrum of research in the two communities, to give an impression of the questions that are engaging them.

* Summary of a discussion meeting held on 22 and 23 September 1999


Themes of the meeting

The meeting began with one of its organisers, Dr Karen Sparck Jones (University of Cambridge), speaking about the needs of both communities to ground their models in data or use a model to guide abstractions from data. The amount of work along these lines has grown over the past few years, partly because of the growth in computing power, which has enabled ever-increasing bodies of data to be processed. The papers would help answer questions about the sample sizes needed to build rules; how to convert patterns observed in data into rules that would enable speech- and language processors to work; whether the statistical approach is suitable for all levels of language, from syllables to discourse; how to combine quantified and non-quantified data in a language processor; and which applications of language processing are well-suited to using statistical data. 

Combining approaches: models

Dr F Pereira (AT&T Laboratories, New Jersey) argued that statistical analysis of speech has been surprisingly successful. In speech recognition, researchers have used an analytical technique called Markov models to capture some of the predictability of language. These models break up sentences into a sequence of words. They predict what the first word will be, then the second on the basis of the first, the third on the basis of the first two, and so on. They have become more sophisticated now with so-called hidden Markov models, which incorporate hidden variables that can represent past experience, uncertainty about correct grammar, and ambiguity. These models can now be learned from data, and the speech processing community has greatly benefited from them. Statistical information retrieval methods have beaten all alternatives in tests, and in some tests, machine learning and statistical techniques are reaching close to human performance. Dr Pereira, however, called for an analytical framework that combines elements of grammar with statistical methods. He outlined several requirements for probabilistic modelling and stressed that researchers should be thinking in terms of the information which is available in language that a machine can extract, rather than of specific machines to pull out available information.
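The word-by-word prediction described above can be illustrated with a minimal sketch. It is not Dr Pereira's model: it assumes the simplest possible case, a "bigram" model in which each word is predicted from the one word before it, and it uses a three-sentence corpus invented for the purpose. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # "<s>" marks the start of a sentence, so the first word can be predicted too.
        words = ["<s>"] + sentence.lower().split()
        for previous, current in zip(words, words[1:]):
            counts[previous][current] += 1
    return counts


def next_word_probability(counts, previous, current):
    """Estimate P(current | previous) as a relative frequency."""
    total = sum(counts[previous].values())
    if total == 0:
        return 0.0  # the preceding word was never seen in the corpus
    return counts[previous][current] / total


# Toy corpus, invented for illustration only.
corpus = [
    "the train leaves at noon",
    "the train arrives at noon",
    "the flight leaves at night",
]
model = train_bigram_model(corpus)

print(next_word_probability(model, "the", "train"))
print(next_word_probability(model, "train", "leaves"))
```

In this toy corpus "train" follows "the" in two cases out of three, so the model prefers it to "flight". Real systems condition on longer histories and are trained on very much larger bodies of data.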

Combining approaches: sentences

Professor Ronald Rosenfeld (Carnegie Mellon University, Pittsburgh) drew attention to the two distinct communities at the meeting: the linguists, and the speech processors working with practical applications of models of speech. He argued that their work is different because the statistical language-modellers estimate the prevalence of word-sequences in the language, whereas the computational linguists try to come up with rules and theories for the probability that the underlying structures of the language will produce those word-sequences. His laboratory is working on a general framework that could combine the two approaches. He uses a bottom-up approach typical of speech processors: he starts with a simplified model of language and sees what it does not capture about the language, then tries to put those features in.

Professor Rosenfeld is trying to advance on the 20-year-old Markov models still used in all commercially available speech recognition products or information retrieval systems. He took one of these state-of-the-art language models and asked it to generate sentences. The resulting sentences violated nearly all of the structures and theories of language, and were very easy to tell apart from proper sentences. Focusing on the sentences as entities, rather than as sequences of words, he tried to identify linguistic features which the generated sentences did not incorporate, and to compute them. The idea is to add these to the model, ask it to generate more sentences, see if they make more sense, and repeat the process until the statistically-generated sentences do make sense - ie, until they capture all the linguistic features needed for comprehensible language. This method is in its infancy, but Professor Rosenfeld argues that it is a new framework for modelling language which opens the door to putting language back into language modelling.

Combining approaches: sounds

Professor Marie Ostendorf (Boston University) also spoke about bringing linguistics into statistical modelling: in her case, modelling of the sounds of language. The performance of speech recognisers dips quite dramatically from reproducing a news reporter's spoken report, to an interview, to more spontaneous speech. These cases vary a lot in their pronunciation and inflection, and experiments suggest that the speech recognisers could be made to work far more accurately if they had better models of pronunciation. Trying to build in all possible variations of the way "and" is pronounced does not necessarily help because it will produce mistakes in other areas: for example, "I hurt my 'and"! While statistical models already use linguistic knowledge, they might be improved with the injection of more information, for example on the timing of sounds. 

New words

Statistics can help language processing by guiding the process of developing rules. Dr Julie Carson-Berndsen (University College Dublin) illustrated this with a presentation about her work. A big problem in speech technology is how to treat new words: that is, words which have the right sound to fit into the language but which are not part of its vocabulary. As an example she took "blant", which could be an English word but isn't, and compared it with "bnanlt", which is obviously not an English word. Dr Carson-Berndsen analyses the sound of speech on the basis of what sounds are right for that language. She is concerned with the development of part of a speech recognition system which uses linguistic information below the level of the word: knowledge about the structure of syllables and words. She hopes that it will cover the whole language and not just be relevant for a particular speaker or area. It is a top-down, knowledge-based approach, which imposes the constraints of the rightness of sounds onto the speech recognition process. Her method can be made more accurate, however, by introducing statistical parameters which, for example, give right-sounding sounds precedence over wrong-sounding ones, or rank the ways in which the information processed from the data could be improved. These statistical inputs will lead to a reduction in the error rate and will improve the accuracy of the model. 
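The difference between "blant" and "bnanlt" can be sketched as a rule check on the consonant clusters at the start and end of a syllable. This is only a rough illustration, not Dr Carson-Berndsen's system: it assumes that spelling can stand in for sound, it handles one-syllable words only, and the lists of permitted clusters are small hand-picked samples rather than an inventory of English.

```python
VOWELS = set("aeiou")

# Small hand-picked samples of permitted clusters: assumptions for
# illustration, not a complete description of English.
LEGAL_ONSETS = {"", "b", "bl", "br", "d", "dr", "f", "fl", "g", "gr",
                "l", "p", "pl", "s", "sl", "st", "str", "t", "tr"}
LEGAL_CODAS = {"", "d", "l", "lt", "m", "mp", "n", "nd", "nt", "p", "st", "t"}


def split_syllable(word):
    """Split a one-syllable spelling into (onset, nucleus, coda).

    Returns None if the word has no vowel or more than one vowel group.
    """
    start = 0
    while start < len(word) and word[start] not in VOWELS:
        start += 1
    end = start
    while end < len(word) and word[end] in VOWELS:
        end += 1
    onset, nucleus, coda = word[:start], word[start:end], word[end:]
    if not nucleus or any(letter in VOWELS for letter in coda):
        return None
    return onset, nucleus, coda


def is_plausible_syllable(word):
    """Accept a word only if its opening and closing clusters are both permitted."""
    parts = split_syllable(word.lower())
    if parts is None:
        return False
    onset, _, coda = parts
    return onset in LEGAL_ONSETS and coda in LEGAL_CODAS


print(is_plausible_syllable("blant"))   # "bl" + "a" + "nt"
print(is_plausible_syllable("bnanlt"))  # "bn" + "a" + "nlt"
```

The check accepts "blant" because "bl" and "nt" are both on the permitted lists, and rejects "bnanlt" because neither "bn" nor "nlt" is. The statistical refinement described above would replace this yes-or-no answer with a ranking of more and less likely sound sequences.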

Ambiguity

The problem of ambiguity is another that researchers have to cope with. Sentences such as "Flying planes can be dangerous" or "He saw the man with the telescope" can have more than one meaning. Dr Stephen Pulman (University of Cambridge) explained that statistical methods borrowed from speech processing have proved quite successful in solving some ambiguity, but that where the decisions they make are wrong, there is no way of finding out why. This prompted him to try another way of solving ambiguity. Working with easily-available air travel information about fares, flights, breakfasts and lunches (eg sentences such as "I'd like the cheapest flight from Washington to Atlanta", and "Do they serve a meal on the flight from San Francisco to Atlanta?"), he used the rules of grammar to codify the correct analysis of the sentences. He fed the resulting data into a computer to see if the computer analysis would accurately identify the ambiguous and unambiguous statements. The experiment mostly worked, and where it didn't Dr Pulman was able to go back to his grammatical work and discover why it didn't. His sample was small but he is cautiously optimistic that this way of working might be applicable to larger bodies of data, with the added bonus of being transparent when faults occur.

A different sort of ambiguity was discussed by Dr H Baayen (Max Planck Institute, The Netherlands). He and his colleagues are using statistical methods to arrive at the correct meaning of words which, because of their structure, may be misunderstood in speech processing systems. For example, the word "behave" could be wrongly broken up into "be" and "have"; or "lipstick" into "lips" and "tick". The problem exists in English but is much worse in Dutch. Without using any linguistic knowledge, Dr Baayen's complicated statistical model produces correct readings of shorter words from a 200-word sample with 97% accuracy, and with 82% accuracy for longer words. He hopes, however, to introduce linguistic knowledge into his model once the statistical analysis has been done, in order to make it more accurate.
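The ambiguity itself is easy to demonstrate. The sketch below lists every way of cutting a word into pieces that are themselves words. It shows only the problem, not Dr Baayen's statistical solution, and the eight-word lexicon and the function name are invented for illustration.

```python
def segmentations(word, lexicon):
    """Return every way of splitting `word` into a sequence of lexicon entries."""
    if word == "":
        return [[]]  # one way to split nothing: the empty sequence
    results = []
    for i in range(1, len(word) + 1):
        prefix = word[:i]
        if prefix in lexicon:
            # Keep this prefix and split whatever is left in every possible way.
            for rest in segmentations(word[i:], lexicon):
                results.append([prefix] + rest)
    return results


# Tiny lexicon, invented for illustration only.
LEXICON = {"be", "have", "behave", "lip", "lips", "stick", "tick", "lipstick"}

for word in ("behave", "lipstick"):
    print(word, segmentations(word, LEXICON))
```

With this lexicon "behave" has two readings and "lipstick" has three, since "lip" and "stick" is also possible. A system then needs some basis, statistical or linguistic, for choosing among them.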

Proper names

Dr S Renals (University of Sheffield) is trying to identify entities such as proper names, times, dates and monetary amounts from news broadcasts in North America. Up to 9% of the words of these broadcasts consist of proper names, and the aim is to build a system that would classify them automatically. They could be straightforward, like "John", or a fuller version of a name, like "Isaac Newton", or a more complicated name involving punctuation, like "Rabbit & Runce Solicitors". There can be ambiguities in names, too; "South Yorkshire Beekeepers Association", for example, might be interpreted as two separate parts - a two-word location and a two-word organisation - or as a four-word organisation. Again: part of "Nobel Prize" might be identified as the name of a person, or the whole phrase could be identified as one of a class of scientific or research prizes. Until recently, approaches to this sort of identification were based on rules involving specially-constructed grammars; but in the last two or three years, the problem has been tackled statistically. Dr Renals uses a statistical approach. The broadcast news is fed into a speech recogniser which produces a transcription. A statistical model is then used to identify proper names. Most of his presentation was concerned with the details of two statistical models he is using, the better of which identifies names with nearly 90% accuracy compared with a hand transcription of the original broadcast. Although both models are purely statistical, Dr Renals is considering whether he can integrate rule-based approaches into the models to make them perform more accurately. 

Changing focus

Dr Geoffrey Sampson (University of Sussex) urged the natural language computing (NLC) community to change its current focus. He compared it with information technology, arguing that software engineers used to see their jobs purely in terms of writing the program code. However, the IT community gradually realised that it was far more important to analyse and define the problem they were trying to solve than to write the code, and that once the problem was analysed in an agreed way, the code-writing became a rather minor part of the whole. Applying this analogy to NLC, Dr Sampson urged researchers to put more effort into finding linguistic rules that would apply to the huge structural variety of written and spoken English, and to pay less attention to the details of the software currently used in NLC.

Machine translation

Dr Hiyan Alshawi (AT&T Laboratories, New Jersey) is taking a novel approach to machine translation: systems which translate one language to another. One of the previous efforts to do this took parallel translations of text as published in the Canadian Hansard, which is printed in English and French and, using these texts as a baseline, tried to build a system which would produce the text of one language, given the other. The system only worked for short sentences and therefore had very limited application. The problem was one of computational cost: increasing the length of the translated sentences would have required a prohibitively expensive system. Dr Alshawi is trying to overcome this problem by changing the computing model to one which deals with the data in a different way. Instead of working with huge rule-sets for each of the languages being translated, Dr Alshawi's model is purely statistical. It produces rules of translation itself, rather than relying on the rules put into the system by linguists; and it establishes which translations are best by comparing its efforts with translations produced by humans. Dr Alshawi expects that this system - whose results are not yet very good - will be able to deal with translations between any languages without having to have any language rules built into it.

Computer-generated text and personality

Text generated by a computer can be more or less fluent, according to features like the length of sentences and the diversity of vocabulary. Dr Jan Oberlander (University of Edinburgh) illustrated this with some examples from his Intelligent Labelling Explorer, which is a text-generating computer designed to describe objects in museum collections for visitors. There are features of language which strongly influence the reader's (or listener's) perception of the speaker, who can appear more or less competent in use of language and more or less dominant according to the fluency of the text. Dr Oberlander wants to be able to tailor the language produced by a computer to the personality of the user. When this happens, the user thinks the computer is faster and more accurate. Doing this is a matter of adjusting style, and can be likened to an author producing a text which is then edited for readability by a reviewer. Dr Oberlander has experimented with language generated by a computer in the style of Shakespeare, and has had incorporated into it a style feature of another author - in this case, the diversity of vocabulary of Mark Twain. He outlined some models which enable this to be done with different degrees of success.

Train timetables

Where members of the public can telephone for information about train timetables, stock prices or cinema listings, the conversation they have with the computer on the other end of the line is mediated by a spoken dialogue system. Their input is passed to a speech recogniser and converted to text, which is in turn converted to logical concepts which form another input to a dialogue manager. This generates a response which is synthesised into speech and fed back to the user. The goals of these systems are to give as many people as possible the information they need in a minimum transaction time, and to satisfy the user. Professor S Young (University of Cambridge) discussed ways of optimising the performance of such systems and of estimating how well they are fulfilling these goals.

Post-operative briefings

In many applications where spoken language is the output, not only must speech be synthesised, but also the content and the wording of the language must be computed. These systems are known as concept-to-speech systems because they take conceptual representation as input and produce language as output. Professor Kathy McKeown (Columbia University) is working with a concept-to-speech system which generates briefings on the post-operative state of by-pass patients before they are admitted to the intensive care unit (ICU). The ICU staff need to know what medications the patients were given during their operations and how they reacted, in order to care for them. The system produces the briefings in the form of spoken language and animated graphics. The aim is to produce natural-sounding speech, which depends on prosody: variations in pitch, rhythm and tempo. Professor McKeown contrasted different ways of producing prosody, which is evaluated by native English speakers to determine which sounds most natural.

Dr Paul Taylor (University of Edinburgh) concentrated on the second half of a concept-to-speech system: the speech synthesis. He explained the technicalities of a method that he hopes will improve on the quality of current speech synthesis systems. He hopes to make one which could cope with everything from train announcements to text-to-speech tasks.

Wendy Barnaby
ABSW
December 1999


Contacts:-

Dr H Alshawi
AT&T Laboratories, New Jersey, USA
Tel 1 973 360 8538 Fax 1 973 360 7111

Dr H Baayen
Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
Tel 31 24 352 1510 Fax 31 24 361 5970/352 1213

Dr J Carson-Berndsen
Department of Computer Science, University College Dublin
Tel 353 1 706 2493 Fax 353 1 269 7262

Professor GJM Gazdar
School of Cognitive and Computing Sciences, University of Sussex
Tel 01273 678029/678030 Fax 01273 671320

Dr E Hajicova
Charles University, Prague, Czech Republic
Tel 420 2 781 0623 Fax 420 2 219 14309

Professor GR Sampson
School of Cognitive and Computing Sciences, University of Sussex
Tel 01273 678525/606755 Fax 01273 671320

Dr M Huckvale
Department of Phonetics and Linguistics, University College London
Tel 0171 504 5002 Fax 0171 383 4108

Mr S Isard
CSTR, University of Edinburgh
Tel 0131 650 2792 Fax 0131 650 6351

Professor G Leech
Department of Linguistics and Modern English Language, University of Lancaster
Tel 01524 593036 Fax 01524 843085

Dr S McGlashan
SICS, Stockholm, Sweden
Tel 46 708 46 24 32 Fax 46 708 27 83 58

Professor K McKeown
Computer Science Department, Columbia University, New York, USA
Tel 212 939 7004 Fax 212 666 0140

Professor RM Needham
Microsoft Research Ltd, Cambridge
Tel 01223 334607 Fax 01223 334678

Dr J Oberlander
Division of Informatics, University of Edinburgh
Tel 0131 650 4439 Fax 0131 650 4587

Professor M Ostendorf
Electrical and Computer Engineering Department, Boston University, USA
Tel 617 353 5430 Fax 617 353 8437

Dr F Pereira
AT&T Laboratories, New Jersey, USA
Tel 1 973 360 8320 Fax 1 973 360 8970

Dr S Pulman
Computer Laboratory, University of Cambridge
Tel 01223 334613 Fax 01223 334678

Dr S Renals
Department of Computer Science, University of Sheffield
Tel 0114 222 1836 Fax 0114 222 1810

Professor R Rosenfeld
School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
Tel 412 268 7678 Fax 412 268 5576

Professor D Scott
ITRI, University of Brighton
Tel 01273 642900

Dr K Sparck Jones
Computer Laboratory, University of Cambridge
Tel 01223 334631/4607 Fax 01223 334678

Professor H Somers
Department of Language Engineering, UMIST
Tel 0161 200 3107 Fax 0161 200 3099

Dr P Taylor
Centre for Speech Technology Research, University of Edinburgh
Tel 0131 650 2793 Fax 0131 650 6351

Professor Y Wilks
Department of Computer Science, University of Sheffield
Tel 0114 222 1804 Fax 0114 222 1810

Professor S Young
Speech, Vision and Robotics Group, Engineering Department, University of Cambridge
Tel 01223 332654 Fax 01223 332662

Copyright ABSW  © 2008  Last update 30 May 2008