Speech Recognition 2003
Microsoft Speech API
CSLU Toolkit
Microsoft Speech API (SAPI)
Training and Evaluation
We will begin with training.
1. Open <Start<All Programs<Microsoft Speech SDK 5.1<C++ Samples<Dictation Pad> and run it.
2. Select <Voice<Microphone Setup> and set up your microphone.
3. Select <Voice<Voice Training> to train the speech recognizer.
4. Test how well your voice is being recognized in Dictation Pad by reading the following text:
"He found himself standing in a landscape that looked exactly like a giant chessboard. On every black square there was a monster: there were two-tongued snakes and lions with three rows of teeth, and four-headed dogs and five-headed demon kings and so on. He was, so to speak, looking out through the eyes of the young hero of the story. It was like being in the passenger seat of an automobile; all he had to do was watch, while the hero dispatched one monster after another and advanced up the chessboard towards the white stone tower at the end."
Calculate the Word Error Rate (WER) as defined in the book:
$$\mathrm{WER} = \frac{\text{substitutions} + \text{deletions} + \text{insertions}}{\text{number of words in the correct text}}$$
where substitutions, deletions and insertions are the numbers of words that were wrongly substituted, deleted and inserted by the recognizer, respectively. E-mail your results for ALL rounds to , and state (if possible) your microphone type. (NB: When sending in the results of an assignment, always list your group members.)
5. Repeat steps 3 and 4 a couple of times (until you are satisfied).
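Counting substitutions, deletions and insertions by hand is error-prone for a passage of this length. A minimal sketch of how the counts can be obtained is a word-level Levenshtein (edit-distance) alignment followed by a backtrace that labels each edit. The names `ErrorCounts`, `countErrors` and `wordErrorRate` are hypothetical, and the sketch assumes both texts have already been lower-cased and stripped of punctuation, since otherwise "chessboard." and "chessboard" would count as a substitution.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Counts obtained by aligning the recognized text with the correct text.
struct ErrorCounts {
    std::size_t substitutions = 0;
    std::size_t deletions = 0;
    std::size_t insertions = 0;
    std::size_t referenceWords = 0;  // number of words in the correct text
};

// Splits on whitespace only; case and punctuation are assumed to be
// normalized beforehand.
std::vector<std::string> splitWords(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> words;
    std::string word;
    while (in >> word) words.push_back(word);
    return words;
}

// Word-level Levenshtein alignment followed by a backtrace that attributes
// each edit to a substitution, deletion or insertion.
ErrorCounts countErrors(const std::string& reference, const std::string& hypothesis) {
    const std::vector<std::string> ref = splitWords(reference);
    const std::vector<std::string> hyp = splitWords(hypothesis);
    const std::size_t n = ref.size();
    const std::size_t m = hyp.size();

    // cost[i][j] = minimum number of edits turning ref[0..i) into hyp[0..j)
    std::vector<std::vector<std::size_t>> cost(n + 1, std::vector<std::size_t>(m + 1, 0));
    for (std::size_t i = 0; i <= n; ++i) cost[i][0] = i;  // i deletions
    for (std::size_t j = 0; j <= m; ++j) cost[0][j] = j;  // j insertions
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            const std::size_t diag = cost[i - 1][j - 1] + (ref[i - 1] == hyp[j - 1] ? 0 : 1);
            cost[i][j] = std::min({diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1});
        }
    }

    // Walk back from the bottom-right cell along an optimal path.
    ErrorCounts counts;
    counts.referenceWords = n;
    std::size_t i = n, j = m;
    while (i > 0 || j > 0) {
        const bool same = i > 0 && j > 0 && ref[i - 1] == hyp[j - 1];
        if (i > 0 && j > 0 && cost[i][j] == cost[i - 1][j - 1] + (same ? 0 : 1)) {
            if (!same) ++counts.substitutions;  // diagonal step: match or substitution
            --i;
            --j;
        } else if (i > 0 && cost[i][j] == cost[i - 1][j] + 1) {
            ++counts.deletions;  // reference word missing from the output
            --i;
        } else {
            ++counts.insertions;  // extra word in the output
            --j;
        }
    }
    return counts;
}

// WER = (S + D + I) / N; returns 0 for an empty reference to avoid dividing by zero.
double wordErrorRate(const ErrorCounts& c) {
    if (c.referenceWords == 0) return 0.0;
    return static_cast<double>(c.substitutions + c.deletions + c.insertions) /
           static_cast<double>(c.referenceWords);
}
```

For example, aligning the reference "the cat sat on the mat" with the output "the bat sat on mat" yields one substitution (cat/bat) and one deletion (the), so the WER is 2/6. Because the alignment is a minimum-cost one, the sketch never reports more errors than necessary; when several optimal alignments exist, it picks one of them, so the split between error types (though not their total) may differ from a hand count.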
Programming
Next, we will create a small Turing-test program using the existing TalkBack application.
• Copy the whole MS Speech directory (not just the CPP or TalkBack directories) to your D: drive (Data dir) so that you have write access.
• Locate the TalkBack program in your local copy of the Microsoft Speech SDK 5.1.
• Compile the TalkBack program.
• Get the Babble code from http://www.liacs.nl/~erwin/speechrecognition.html
• Compile the Babble code (it is just another incarnation of the well-known Eliza program). The compiler may give some warnings, which you can ignore.
• Combine the two programs so that you can hear 'Eliza' speak and she can hear you.
• Send your solution to (first run <Build<Clean> on your project, then put all the files of your project in a zip file and send it).
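One way to think about the combination step, assuming TalkBack follows a recognize-then-speak loop in which the recognized text is spoken back, is to route the recognized text through the chatbot before it reaches the voice. The sketch below shows only that data flow: `elizaReply` is a hypothetical stand-in for the Babble code (a few keyword rules, not Babble's actual logic), `runDialogue` is a hypothetical glue function, and the `speak` callback stands in for whatever SAPI voice call TalkBack already makes.

```cpp
#include <cassert>
#include <cctype>
#include <functional>
#include <string>
#include <vector>

// Lower-cases a copy of the input so keyword matching ignores case.
std::string toLower(std::string s) {
    for (char& ch : s) ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    return s;
}

// Hypothetical stand-in for Babble: a few Eliza-style keyword rules.
// In the assignment, this call would go into the Babble code instead.
std::string elizaReply(const std::string& input) {
    const std::string text = toLower(input);
    if (text.find("mother") != std::string::npos) return "Tell me more about your family.";
    if (text.find("i feel") != std::string::npos) return "Why do you feel that way?";
    return "Please go on.";
}

// Hypothetical glue loop: each recognized utterance is passed to the chatbot,
// and the reply (rather than the recognized text itself) is handed to 'speak',
// which in TalkBack would wrap the existing text-to-speech call.
std::vector<std::string> runDialogue(const std::vector<std::string>& recognized,
                                     const std::function<void(const std::string&)>& speak) {
    std::vector<std::string> replies;
    for (const std::string& utterance : recognized) {
        const std::string reply = elizaReply(utterance);
        speak(reply);
        replies.push_back(reply);
    }
    return replies;
}
```

Keeping the recognizer and the voice behind a plain string interface means the Babble code needs no knowledge of SAPI, and it can be tested from the console before the speech parts are wired in.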
CSLU Toolkit
RAD Introduction
• Start the RAD tool by selecting <Start<CSLU Toolkit<Rad>.
• Select <Examples> from the RAD File menu and load the Pizza example.
• Compile the Pizza example and run it. (Note that you may have to select CUAnimate instead of the Baldi animation in the <File<Preferences> menu.)
• Load the other examples and run them.
• Go to <Documentation<RAD Tutorial> and browse through the tutorial so that you understand the main functionality of the RAD tool.
• Extend the Pizza example so that
  - you can interrupt at an earlier stage if something was not recognized correctly, and
  - you can choose a drink (e.g., beer, cola, water).
• Send your solution to
Optional
Take a look at the Coffee tutorial of Microsoft's Speech SDK 5.1 and create a similar or better program using the RAD tool. Do you notice any significant difference in speech recognition quality between the two systems?