"Industry Begins to Use Voice Recognition Technology"
Futures Industry, October/November 1995
by Sean G. Thomas
Take a veteran equities trader accustomed to passing hand-written notes to a personal assistant for order
entry. Now take away his assistant. Is he most likely to (a) hold his own deck, (b) deck you, or
(c) talk to a computer?
According to Kevin Dunne, "as long as it understands his New York accent," the answer is (c). Dunne is
a product specialist at FICOMP Systems, Inc., who was given the formidable task of training a group of
Bear Stearns traders to use FSI's new Interpreter 6000 Automated Speech Recognition System. As Dunne puts
it, "these are not patient guys."
Patient or not, Bear Stearns now uses the Interpreter 6000 for live trading. Once used mainly by the
injured or disabled as an alternative to keyboard entry, voice recognition systems are gaining acceptance
as a reliable method of entering information quickly -- and on the trading floor, where a few seconds can
make the difference between gain and loss, the quicker the better.
How It Works
In an ASR system, human speech is broken into frequencies and digitized by a computer, which then matches
the frequencies against a stored representation of phonemes (the smallest bits of spoken language), words or
phrases. Traditionally, ASR systems have required speakers to speak... in... discrete... words, but
technological advances now allow for continuous speech recognition. Word recognition accuracy has also
risen in recent years, from the 80 percent range to 95 to 98 percent in some systems, though many are
speaker-dependent and recognize the speech patterns of only one individual.
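The matching step described above can be illustrated with a toy sketch. It is not how the Interpreter 6000 or any commercial recognizer works: the sound labels, the synthetic tones standing in for speech, and the function names are all assumptions made for illustration. Each "sound" is broken into frequency magnitudes with a naive Fourier transform, then compared against stored templates, and the closest template wins.

```python
import math

def magnitude_spectrum(samples):
    """Naive discrete Fourier transform: magnitude of each frequency bin."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2):
        re = sum(s * math.cos(2 * math.pi * k * t / n) for t, s in enumerate(samples))
        im = -sum(s * math.sin(2 * math.pi * k * t / n) for t, s in enumerate(samples))
        spectrum.append(math.hypot(re, im) / n)
    return spectrum

def tone(freqs, n=64):
    """Synthetic stand-in for a digitized sound: a sum of sine waves."""
    return [sum(math.sin(2 * math.pi * f * t / n) for f in freqs) for t in range(n)]

# Hypothetical stored templates: one reference spectrum per sound label.
TEMPLATES = {
    "ah": magnitude_spectrum(tone([3, 9])),
    "ee": magnitude_spectrum(tone([2, 14])),
    "oo": magnitude_spectrum(tone([4, 6])),
}

def recognize(samples):
    """Return the label whose stored spectrum is closest (Euclidean distance)."""
    spec = magnitude_spectrum(samples)

    def distance(template):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(spec, template)))

    return min(TEMPLATES, key=lambda label: distance(TEMPLATES[label]))

# A quieter rendition of "ee" still lands nearest the "ee" template.
quiet_ee = [0.9 * s for s in tone([2, 14])]
print(recognize(quiet_ee))
```

In a speaker-dependent system, the stored templates would be built from one person's voice, which is why a different speaker, or the same speaker with a cold, can fall outside the match.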
ASR technology has met with skepticism in the past. In 1969, the pioneering Bell Labs director John
Pierce denounced the pursuit of speech recognition as the folly of "mad inventors and unreliable engineers."
No computer, he argued, could ever amass the knowledge and experience needed for fluency. Raj Reddy had
just begun his career in ASR research when Pierce's influential article was published, an event he describes
as "a humbling experience. The rest of us did not necessarily agree with him, and we were successful
enough to keep going, but not enough to say 'I've solved the problem.'"
Reddy, now the dean of Carnegie-Mellon's Computer Science School, predicts that the goal of computer
fluency is still 30 years away. But with the advent of the PC revolution, processing power has become cheap
and readily available. According to Reddy, much of the technology used in today's ASR systems is based
directly on experimental systems such as his own mid-seventies Hearsay program; the difference is that
today's computers compile the information two orders of magnitude faster, at a fraction of the cost. And
while a 95 percent word recognition rate might not qualify a person as completely fluent, it is accurate enough
for many commercial applications.
The Interpreter 6000 system is one of the latest commercial products in the financial industry to
incorporate ASR technology. FSI's vice president of sales and marketing, Alan Kazanoff, describes his product
as "middleware," an integration of several products on both the hardware and software sides. According to
Kazanoff, when Bear Stearns asked FSI to develop voice input for a new order entry system, "their
instructions were, 'don't touch a line of our code.' As trading screens and fields changed, they wanted a
system that would adapt with them."
Working with Verbex voice recognition software, also used by the Sydney Futures Exchange for price
reporting in their trading pits, FSI spent a year creating the software platform to integrate the component
systems. After that, says Kazanoff, "we initially thought we were done."
The Human Element
But as Kevin Dunne discovered, "human feedback was incredibly critical. If people aren't comfortable with
the system, they won't use it. If they don't use it, it won't work." Hardware was an immediate issue: FSI
and its Bear Stearns trainees tested 26 different microphones, finally settling on a custom-molded model that
fits in the speaker's ear and transmits internal sound vibrations. FSI worked with San Diego's JABRA
Corporation (which originally designed the mics for use with car phones) to customize the noise-cancellation
software, which was crucial for use on the trading floor.
Modifying the system's software vocabulary was also a challenge. Traders needed a vocabulary flexible
enough, as Dunne says, to "take the thinking out" of their spoken commands, one which would include such
phrase variations as "fifty," "five-oh," and "half a hundred." Although the system has a vocabulary of
400 words, according to Dunne it recognizes 28,000 phrases. Since the Interpreter 6000 is speaker-dependent,
FSI included a feature whereby traders could update their voice pattern files on-line. If a trader suffers
from allergies and the system fails to recognize his voice, he can in effect re-record his voice profile
immediately.
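How a small vocabulary can yield a far larger set of recognized phrases can be sketched as follows. This is an illustration only, not FSI's grammar: the slots, the spoken variants other than "fifty," "five-oh," and "half a hundred," the stock nickname, and the function names are all assumed. Each slot lists the ways a trader might say the same thing, and every combination of variants becomes one accepted phrase that maps to a single canonical order.

```python
from itertools import product

# Hypothetical grammar: each slot maps spoken variants to one canonical value.
SLOTS = [
    {"buy": "BUY", "bid": "BUY", "sell": "SELL", "offer": "SELL"},
    {"fifty": 50, "five-oh": 50, "half a hundred": 50,
     "one hundred": 100, "a hundred": 100},
    {"ibm": "IBM", "big blue": "IBM"},
]

def build_phrase_table(slots):
    """Expand every combination of spoken variants into a phrase -> order map."""
    table = {}
    for combo in product(*(slot.items() for slot in slots)):
        phrase = " ".join(spoken for spoken, _ in combo)
        table[phrase] = tuple(value for _, value in combo)
    return table

PHRASES = build_phrase_table(SLOTS)

def interpret(utterance):
    """Normalize case and spacing, then look up the canonical order (or None)."""
    key = " ".join(utterance.lower().split())
    return PHRASES.get(key)

# Different wordings resolve to the same order.
print(interpret("Sell half a hundred IBM"))
print(interpret("offer five-oh big blue"))

# The number of phrases grows much faster than the number of distinct words.
words = {word for phrase in PHRASES for word in phrase.split()}
print(len(words), "words ->", len(PHRASES), "phrases")
```

Because the phrase count is the product of the choices in each slot, this toy grammar turns 13 distinct words into 40 phrases; adding slots and variants multiplies the total, which is the same kind of growth that separates a 400-word vocabulary from tens of thousands of phrases.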
Improving Technology
John Oberteuffer, president of Voice Recognition Associates in Lexington, Massachusetts, says that the
Bear Stearns implementation is exciting for two reasons: "One, the user is a professional whose time is
valuable, who has a large database of information which he can control and gain easy access to. Two, it
creates an immediate, real-time record of the trade which can be confirmed. It increases the ease of
billing and tracking trades."
Harry Bergen, vice president of turrets and voice products at IPC in New York, questions some
commercial claims of deal capture in ASR systems. "The accuracy has to be extremely good for that,
otherwise you'll spend time reviewing the trading records anyway." But he emphasizes the improved
recognition of today's systems: "If a customer said 'I'd like you to incorporate [ASR] into our
turret system,' I'd say okay. I wouldn't have said it five years ago."
IPC has previously worked with BBN HARK, a software-only speaker-independent product, and Bergen says that
programming trading turrets could potentially be simplified with ASR technology. FSI's Dunne agrees, but
says that the challenge will be to use one system to switch between the separate frequencies of trading
turret phone lines and computers.
As for speech recognition's future, VRA's Oberteuffer notes what may be a significant parallel: "Fifteen
years ago, when nobody else had computers on their desk, financial traders were the first to use them. The
financial industry has always been an area of technical leadership in terms of information technology, and
it's significant that they're looking into this."