USC biomedical engineers Jim-Shih Liaw and Theodore W. Berger have just developed a speech-recognition system capable of picking a needle of meaning out of a haystack of ambience. Their "neural net" approach, modelled on the workings of the human brain, can accurately and automatically transcribe conversations at signal-to-noise ratios of 1:1000. In other words, machines can now "understand" the speech of virtually anyone, in any crowd, at any time, and do so much better than human beings can. Liaw and Berger say their system could help the deaf, as well as others, such as police officers and air-traffic controllers, who must understand speech in noisy environments. It can instantly generate clean printouts that identify each participant in a conversation.
If that's the present state of the art, what can we expect in the near future? François Richard, a Montreal programmer working in California for speech-recognition industry leader Nuance, predicted: "Look for voice browsing. You'll pick up the phone and request 'coffee shop in Montreal near Park Avenue' and the system will read back a list of them. Or say 'BACK' to get a new selection." Soon, Richard added, we can expect "Web sites to become voice sites. With voice recognition, you will be able to obtain non-visual information interactively."
Where does that leave those of us who just bought Dragon Systems' Naturally Speaking 4.0? "Hopelessly in the past," said Richard. "All the personal voice-recognition systems are stripped-down versions of early techniques that have since become much more sophisticated in their industrial applications." One increasingly pervasive industrial application is directory assistance: over the last few years, computers have widely replaced operators across the continent. These systems can handle multiple languages and millions of voices despite poor line quality and background noise.
But since I can't go to the store and buy a "411" voice-recognition system for my PC, I am still stuck in the past with Naturally Speaking 4.0, the best, I am convinced, of an antediluvian bunch. I shouldn't complain. Naturally Speaking has improved in each of the three years I've been using it, so I can now cut the time it takes me to literally spit out a first draft by a good 80%, dictating some 5,000 words an hour. Of course, NS 4.0 offers no help with subsequent editing, so the time saved in producing a finished translation nets out to about 15%. The accuracy of Naturally Speaking is equivalent to that of my own typing, but the mistakes are of a very different kind. NS never misspells. I always do. NS may get an entire word wrong, though, and has a lot of trouble with names not native to the language version being used ("Roubillard" in an English edition may come out as "ruby year"). Still, sound-alike errors can easily be corrected with search/replace, and you can teach the system unfamiliar terms through training.
Despite NS's imperfections, I find the main glitch in using it to be my own thought process. Even if Naturally Speaking transcribed every word I dictate perfectly, I would still probably shuffle most of the text around in the editing process. If I were typing my first draft, I would make some of those changes as I went, producing a cleaner draft to start with, but at one-fifth to one-tenth the speed of NS. Of course, the more you talk, train and correct, the better Naturally Speaking will work over time.
Still, to benefit from voice recognition, you need a top-end system. By that I mean an Intel PIII or AMD K7 with at least a Creative PCI 128 sound card, a VXI Parrott mike (around US$60) or better, and 256 MB of memory. Your text will then splash onto the screen as you speak it (with my old Pentium I 200, the lag was several minutes), so you can easily correct mistakes on the fly, and the system will generally get them right the second time around.
A less powerful setup may still work, but it will not excel. The gains from excellent recognition will quickly pay for the price difference of a high-performance system. And your satisfaction with the faster results may help you forget how hopelessly trapped in the past you still are.
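The column's two productivity figures, an 80% cut in first-draft time that nets out to about a 15% saving on a finished translation, say something about how the work divides up. Assuming the whole saving comes from the drafting phase (editing time unchanged), the net saving is simply the drafting share of the job multiplied by the drafting cut, so drafting must account for roughly 0.15 / 0.80, or just under a fifth, of total job time. The short Python sketch below works through that back-of-envelope arithmetic; the function names are illustrative only and do not come from any product.

```python
def net_time_saving(drafting_share: float, drafting_cut: float) -> float:
    """Fraction of total job time saved when only the drafting phase speeds up.

    drafting_share: fraction of the whole job spent on the first draft
                    (an assumption; the column does not state it directly).
    drafting_cut:   fractional reduction in drafting time (0.80 in the column).
    Editing time is assumed to be unchanged by dictation.
    """
    return drafting_share * drafting_cut


def implied_drafting_share(net_saving: float, drafting_cut: float) -> float:
    """Invert the relation: which drafting share yields the observed net saving?"""
    return net_saving / drafting_cut


# Figures from the column: 80% faster drafting, about 15% net saving overall.
share = implied_drafting_share(0.15, 0.80)
print(f"Implied drafting share of total job time: {share:.1%}")
```

Plugging in other values shows why faster dictation alone has a ceiling: even a 100% cut in drafting time cannot save more than the drafting share itself, which is why editing remains the bottleneck.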
Josh Wallace, c.tr.