Speech Recognition Comes of Age
Larry D. Rosen, Ph.D.
The National Psychologist
I am writing this article using a new speech recognition program called NaturallySpeaking by DragonDictate. It is a very interesting new technology that allows me to dictate a letter to you and not have to type on the computer.
I've been working on this review for nearly a year now and in net year I've seen the field of speech recognition change dramatically. So, with that in mind I'm pleased to be able to tell you that for a very manageable price you are now able to talk to your computer and hadn't understand nearly everything you say.
You may notice a few peculiar words in my short note to Henry Saeman, Editor and Publisher of The National Psychologist. The computer made four mistakes in my letter. At the beginning, where I wrote "Dear," the computer changed it to "Dealer." At the beginning of the second paragraph "net" was really "that," and at the very end of that paragraph the computer heard me say "hadn't" when I really said "have it." Finally, in the closing it typed "U.S." instead of "Yours." Four mistakes out of 102 words translate to a 96% hit rate. Not a bad result considering that I was speaking at a normal conversational rate and that I had spent less than one hour training NaturallySpeaking to recognize my voice.
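For readers who want to check the arithmetic behind that hit rate, here is a minimal sketch in Python. The function name hit_rate is a hypothetical helper chosen for illustration and is not part of any speech recognition product; the only figures used are the 102 words and four mistakes reported above.

```python
def hit_rate(total_words: int, errors: int) -> float:
    """Return the percentage of words recognized correctly.

    hit_rate is a hypothetical helper for illustration only.
    """
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    if not 0 <= errors <= total_words:
        raise ValueError("errors must be between 0 and total_words")
    return 100.0 * (total_words - errors) / total_words


# Figures from the dictated letter: 102 words, 4 misrecognized.
rate = hit_rate(102, 4)
print(f"{rate:.1f}%")
```

The same calculation can be applied to any dictation sample: count the words, count the misrecognized ones, and compare the result with the 75% to 90% range typical of the discrete systems.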
A year ago I first saw an advertisement for a new product called Speech Writer for Mental Health (Voice Input Technologies, a division of CMHC Systems 888-97-VOICE). But I soon discovered that Speech Writer was quite expensive ($5,000 - $6,000) and required specific hardware, so it was geared to larger mental health centers and clinics.
Another year would pass before additional models would enter the market, models that expanded the scope of speech recognition programs.
There are two types of speech recognition programs: discrete speech and continuous speech. The former require a brief pause between words while the latter allow you to speak naturally. I arranged for demonstration models from major companies, which included DragonDictate 2.5 (Dragon Systems - 800-825-5897), VoiceType 3.0 (IBM - now upgraded and renamed Simply Speaking - 800-426-3333) and VoicePro (Kurzweil - 800-380-1234). I spent quite a few hours "test driving" them.
All three performed adequately, but I felt quite frustrated at how slowly I had to speak. Yet, over time each program was able to recognize more and more of my words. Nonetheless, even after a couple of hours of training I felt that I was topping out at a pace that resembled a moderate speed typist. And the recognition rate for all three varied from 75% to 90% depending on the program and the type of material.
But when the CMHC representative arrived to demonstrate their Speech Writer program, the slow-speech, inaccurate system I had come to expect from the demonstration models had been replaced. I was surprised when the rep was able to dictate a letter at a very rapid clip. While it made a few mistakes, we were finally talking hands-off dictation. A desktop version of Speech Writer is due sometime this year, but its cost will exceed $5,000 -- a price tag too steep for many private practitioners.
Enter Dragon Systems' NaturallySpeaking. In late July I received a copy of this revolutionary program that claimed to capture natural speech at an affordable cost (list price of $695 but a street price of under $300) and excitedly prepared to load it into my computer. But first I read the system requirements, which told me that my year-old Pentium needed to be upgraded before the program would work. Off to the shop I trundled and added some more RAM so that now I had a Pentium computer with 32 MB of RAM running at 133 MHz, the minimum requirements for running NaturallySpeaking.
Again, I sat down to load NaturallySpeaking and found that, with clear instructions, it took only about five minutes to set up the microphone headset (which comes included with all speech recognition systems) by plugging two plugs into the back of my tower. Then I followed another set of clear instructions, and NaturallySpeaking tested the microphone setup and sound card quality and helped me adjust both the microphone and the volume. So, in less than 10 minutes, I was ready to start training NaturallySpeaking to recognize my voice. I was given a choice of two passages to read aloud and chose an excerpt from Dave Barry's very funny book Dave Barry in Cyberspace over an interesting passage from Arthur C. Clarke's 3001: The Final Odyssey. About 20 minutes later the computer told me I was done, and then it hummed and worked for another 20 minutes and told me it was ready for me to start dictating.
Compared with my experience with discrete dictation systems, using NaturallySpeaking was quite amazing. I started speaking at a normal pace and, in a little box on the screen, I watched in amazement as the computer worked to figure out what I was saying. The letter that I dictated to Henry was only my second dictation attempt! When I was finished dictating my letter I told NaturallySpeaking to "Copy All to Clipboard" and then told it to "Switch to Next Window" (where I had already loaded Microsoft Word, my word processing program). When MS Word opened up I told NaturallySpeaking to "Paste That" and my letter was now in my word processing program.
NaturallySpeaking comes equipped with a 30,000-word vocabulary which you can double by training it with words that you tend to speak. You can even have it analyze documents you have written and add those words to its vocabulary. Its only shortcomings appear to be that it is restricted to a single user and that the only operation it can perform in other programs is pasting text from your dictation screen.
Other programs (including IBM's Simply Speaking, Kurzweil's VoicePro and Dragon Systems' own DragonDictate) allow you to use your voice to maneuver in many PC-based programs. However, Dragon Systems will soon release its NaturallySpeaking Deluxe (expected street price of around $400), which will expand on the earlier version to address most of these shortcomings.
IBM was to release its continuous speech recognition system, called ViaVoice, in late summer. It can be programmed to work inside MS Word. Clicking on a button in MS Word or saying "Begin Dictation" activates your microphone and allows Word to capture your speech. Unfortunately, I couldn't preview ViaVoice without upgrading my 133 MHz Pentium to a minimum 150 MHz with MMX. However, most new computers run at this speed or better, so ViaVoice may be an option for you. ViaVoice is expected to cost around $200.
The obvious choice is NaturallySpeaking for anyone who can afford $400 for the system and has a computer that runs fast enough with sufficient RAM. With a minimum of training, you can release your hands from the tedium of typing and avoid any possibility of carpal tunnel syndrome, too. If that price is too steep, but you still want to be able to dictate at a moderate speed, try IBM's Simply Speaking at under $100.
What about Macintosh users? Well, I am afraid that you are basically, once again, left by the wayside. Development money has gone into PC-based systems and there are very few alternatives for Apple users. Dragon Systems does offer Dragon Power Secretary which sells for around $600, but it is a discrete recognition system. None of the other big companies appear to have a Mac option and none are under development.
The fact is that the business world is a PC world. Love your Mac and use its powerful multimedia options, but figure that for any business applications you are going to need to switch to a PC. Luckily, PC prices have come down drastically and are now very affordable.
Copyright, 1997, The National Psychologist. Reprinted with permission. The National Psychologist is a privately-owned bimonthly newspaper which may be purchased for $30 a year. Write or call: TNP, 6100 Channingway Blvd., Suite 303, Columbus, OH 43232; telephone: 614.861.1999 or fax with Visa or MC to 614.861.1996.