Mei-Yuh Hwang
Education:
· PhD, Computer Science, Carnegie Mellon University, 1993.
Thesis: Sub-phonetic Acoustic Modeling for Speaker-Independent Continuous Speech Recognition --- Senone Representation.
Allen Newell Research Excellence Medal.
· M.S., Computer Science, Carnegie Mellon University, 1989.
· B.S., Computer Science, National Taiwan University, 1986.
Phi Tau Phi Scholastic Honor Society (http://www.phitauphi.org.tw/ ), 1986.
Professional Interests:
My interests lie in statistical pattern recognition (especially speech and handwriting recognition), heuristic search, discrete math and algorithms. I am also fascinated by and hoping to get involved with machine translation, data mining, and statistical genome analysis.
My work has always combined research and product development. I enjoy research, publication, and supervising graduate students and interns, but I am also a strong believer in turning research ideas into something useful, and I personally enjoy programming. I therefore worked in both the research and product divisions at Microsoft, mentored graduate and undergraduate interns, and transferred research technologies into real-time products. The products I was heavily involved in include the Whisper dictation system, Microsoft Speech API, Office XP dictation for English, Mandarin, and Japanese, and Speech Server. I am now once again enjoying research, publication, and mentoring graduate students at the University of Washington. Through these experiences, I have become familiar with several state-of-the-art speech systems: CMU-SPHINX, CMU-LM, Microsoft-Whisper, CU-HTK, SRI-LM, and SRI-Decipher.
For the past three-plus years I have worked at the SSLI Lab as a senior research scientist, supervising graduate students closely, leading the research and development of the SRI/UW Mandarin automatic speech recognition (ASR) system, and reviewing graduate school applications. Both of our DARPA-funded systems, for telephone conversations (the EARS project) and broadcast news (the GALE project), have been very successful and were evaluated as competitive with the other top systems in the world. In particular, in the most recent evaluation in June 2007, UW demonstrated the best Mandarin ASR system!
Team / Error rate
UW / 9.1%
RWTH / 12.1%
UW+RWTH / 8.9%
BBN+Cambridge Univ / 9.4%
IBM+CMU / 9.8%
My career has been intellectually rewarding, as it satisfies my zest for both brainstorming new ideas and proving them with real systems. I am passionate about both research and products, and would be happy to steer my career in either direction, as a researcher, an engineer, or a product manager.
Work Experience:
· Principal Software Design Engineer, Microsoft, 3/2008---.
· Senior Research Scientist, Electrical Engineering Dept., University of Washington, 4/2004-2/2008:
o 2005-2006: GALE (http://www.darpa.mil/ipto/programs/gale/). Built and improved speech recognition for Mandarin broadcast news and broadcast conversation, and researched related issues. Supervised graduate students.
o 2004-2005: EARS (http://www.darpa.mil/ipto/Programs/ears/index.htm). Investigated speech recognition for Mandarin telephone conversations using the SRI Decipher system, which was evaluated as competitive with the state-of-the-art system. Supervised graduate students.
· Researcher/Engineer, Microsoft Corporation, Redmond, Washington, 1994-2004:
Researched and developed Whisper (Windows Highly Intelligent SPEech Recognizer, http://research.microsoft.com/srg/srproject.aspx), a state-of-the-art speaker-independent continuous speech recognition research system that was incorporated into many Microsoft products. General research responsibilities for speech recognition and its applications. Mentored interns.
· Research programmer, Carnegie Mellon University, 1992-1993:
A key player in building and improving the SPHINX-II system (http://www.speech.cs.cmu.edu/). Built the top system in DARPA Resource Management, Wall Street Journal, and ATIS dictation evaluations.
· Teaching Assistant, Computer Science, National Taiwan University, 1987.
Professional Service:
· Member of Technical Committee, ISCSLP 2008, Kunming, Yun-nan, China.
· Reviewer for HLT-2008.
· Member of Technical Committee, Interspeech 2007, Antwerp, Belgium.
· Member of Technical Committee, International Symposium on Chinese Spoken Language Processing, 2006, Singapore (http://www.iscslp2006.org/ ).
· Editorial board of Journal of Negative Results in Speech and Audio Sciences, 2004 ---.
· Member of Technical Committee, International Symposium on Chinese Spoken Language Processing, 2004, Hong Kong.
· Publicity Chair, IEEE-ICASSP 1998, Seattle, WA.
· Reviewer for IEEE Transactions on Audio, Speech and Language Processing, Computer Speech and Language, Speech Communication.
Book Chapter:
· X.D. Huang, A. Acero, F. Alleva, M.Y. Hwang, L. Jiang, and M. Mahajan, Chapter "From Sphinx-II to Whisper -- Making Speech Recognition Usable", in Automatic Speech and Speaker Recognition -- Advanced Topics, Kluwer Academic Publishers, pp. 481-508, 1996.
· M.Y. Hwang, Part I, Chapter 7, "Acoustic Modeling for Mandarin Large Vocabulary Continuous Speech Recognition", in Advances in Chinese Spoken Language Processing, World Scientific Publishing, pp. 153-178, 2007.
Journal Publications:
· M.Y. Hwang, G. Peng, W. Wang, A. Faria, A. Heidel, and M. Ostendorf, “Building A Highly Accurate Mandarin Speech Recognizer with Language-Independent Technologies and Language-Dependent Modules”, in preparation for IEEE Transactions on Audio, Speech, and Language Processing, 2008.
· X. Lei, M. Ostendorf, and M.Y. Hwang, "Lexical Tone Modeling for Mandarin Large Vocabulary Speech Recognition", in preparation for IEEE Transactions on Audio, Speech, and Language Processing, 2008.
· A. Stolcke, B. Chen, H. Franco, R. Gadde, M. Graciarena, M.Y. Hwang, K. Kirchhoff, X. Lei, A. Mandal, N. Morgan, T. Ng, M. Ostendorf, K. Sonmez, A. Venkataraman, D. Vergyri, W. Wen, J. Zheng, and Q. Zhu, “Recent Innovations in Speech-to-Text Transcription at SRI-ICSI-UW”, IEEE Transactions on Audio, Speech and Language Processing, 14(5), 2006, pp. 1729-1744.
· F. Alleva, X.D. Huang, M.Y. Hwang, and L. Jiang, “Can Continuous Speech Recognizers Handle Isolated Speech?” Speech Communication, 26(3), pp. 183-189, November 1998.
· M.Y. Hwang, X.D. Huang, and F. Alleva, “Predicting Unseen Triphones with Senones”, IEEE Transactions on Speech and Audio Processing, Vol. 4, No. 6, November 1996, pp. 412-419.
· M.Y. Hwang and X.D. Huang, “Shared-Distribution Hidden Markov Models for Speech Recognition”, IEEE Transactions on Speech and Audio Processing, Vol. 1, No. 4, October 1993, pp. 414-420.
· X.D. Huang, H.W. Hon, M.Y. Hwang, and K.F. Lee, “A Comparative Study of Discrete, Semi-Continuous and Continuous Hidden Markov Models”, Computer Speech and Language, Vol. 7, No. 4, October 1993, pp. 359-368.
· X.D. Huang, F. Alleva, H.W. Hon, M.Y. Hwang, K.F. Lee, and R. Rosenfeld, “The SPHINX-II speech recognition system: an overview”, Computer Speech and Language, Vol. 7, No. 2, April 1993, pp. 137-148.
· K.F. Lee, H.W. Hon, M.Y. Hwang, and X.D. Huang, “Speech Recognition Using Hidden Markov Models: A CMU Perspective”, Speech Communication, Vol. 9, 1990, pp. 497-508.
· K.F. Lee, H.W. Hon, M.Y. Hwang, and S. Mahajan, “Recent Progress and Future Outlook of the Sphinx Speech Recognition System”, Computer Speech and Language, Vol. 4, No. 1, 1990, pp. 57-69.
Conference Publications:
· D. Hillard, M.Y. Hwang, M. Harper, and M. Ostendorf, “Parsing-based Objective Functions for Speech Recognition in Translation Applications”, ICASSP-2008.
· M.Y. Hwang, G. Peng, W. Wang, A. Faria, A. Heidel, and M. Ostendorf, “Building a Highly Accurate Mandarin Speech Recognizer”, ASRU 2007.
· M.Y. Hwang, W. Wang, X. Lei, J. Zheng, O. Cetin, and G. Peng, “Advances in Mandarin Broadcast Speech Recognition”, Interspeech 2007.
· G. Peng, M.Y. Hwang, and M. Ostendorf, “Automatic Acoustic Segmentation for Speech Recognition of Broadcast Recordings”, Interspeech 2007.
· J. Zheng, O. Cetin, M.Y. Hwang, X. Lei, A. Stolcke, and N. Morgan, “Combining Discriminative Feature, Transform, and Model Training for Large Vocabulary Speech Recognition”, ICASSP-2007.
· M.Y. Hwang, X. Lei, W. Wang, and T. Shinozaki, “Investigation on Mandarin Broadcast News Speech Recognition”, Interspeech-2006.
· X. Lei, M. Siu, M.Y. Hwang, M. Ostendorf, and T. Lee, “Improved Tone Modeling for Mandarin Broadcast News Speech Recognition”, Interspeech-2006.
· A. Stolcke, F. Grezl, M.Y. Hwang, X. Lei, N. Morgan, and D. Vergyri, “Cross-Domain and Cross-Language Portability of Acoustic Features Estimated by Multilayer Perceptrons”, ICASSP-2006.
· X. Lei, M.Y. Hwang, and M. Ostendorf, “Incorporating Tone-related MLP Posteriors in Feature Representation for Mandarin ASR”, Interspeech-2005, pp. 2981-2984.
· T. Ng, M. Ostendorf, M.Y. Hwang, M. Siu, I. Bulyko, and X. Lei, "Web-Data Augmented Language Models for Mandarin Conversational Speech Recognition", ICASSP-2005, pp. 589-592.
· M.Y. Hwang, X. Lei, T. Ng, I. Bulyko, M. Ostendorf, A. Stolcke, W. Wang, J. Zheng, V. Gadde, M. Graciarena, M. Siu, and Y. Huang, “Progress on Mandarin Conversational Telephone Speech Recognition”, ISCSLP-2004, pp. 1-4.
· T. Ng, M. Ostendorf, M.Y. Hwang, M. Siu, I. Bulyko, and X. Lei, “Improving Language Models for Mandarin Conversational Speech Recognition with Web Data”, DARPA RT-04 Workshop, 2004.
· M.Y. Hwang, X. Lei, T. Ng, M. Ostendorf, A. Stolcke, W. Wang, J. Zheng, and V. Gadde, “Porting Decipher from English to Mandarin”, DARPA RT-04 Workshop, 2004.
· D. Yu, M.Y. Hwang, P. Mau, A. Acero and L. Deng, “Unsupervised Learning from Users’ Error Correction in Speech Dictation”, ICSLP-2004, pp. 267-270.
· M. Richardson, M.Y. Hwang, A. Acero, and X.D. Huang, “Improvements on Speech Recognition for Fast Talkers”, EuroSpeech-1999, pp. 411-414.
· M.Y. Hwang and X.D. Huang, “Dynamically Configurable Acoustic Models for Speech Recognition”, ICASSP-1998, pp. 669-672.
· F. Alleva, X.D. Huang, M.Y. Hwang, and L. Jiang, “Can continuous speech recognizers handle isolated speech?” Proceedings of Eurospeech-1997, pp. 911-914.
· X.D. Huang, M.Y. Hwang, L. Jiang and M. Mahajan, “Deleted Interpolation and Density Sharing for Continuous Hidden Markov Models”, ICASSP-1996, pp. 885-888.
· F. Alleva, X.D. Huang, and M.Y. Hwang, "Improvements on the Pronunciation Prefix Tree Search Organization", ICASSP-1996, pp. 133-136.
· X.D. Huang, A. Acero, F. Alleva, M.Y. Hwang, L. Jiang, and M. Mahajan, "Microsoft Windows Highly Intelligent Speech Recognizer: Whisper", ICASSP-1995, pp. 93-96.
· M.Y. Hwang, R. Rosenfeld, E. Thayer, R. Mosur, L. Chase, R. Weide, X.D. Huang, and F. Alleva, “Improving Speech Recognition Performance via Phone-Dependent VQ Codebooks and Adaptive Language Models in SPHINX-II”, ICASSP-1994, pp. 549-552.
· M.Y. Hwang, R. Rosenfeld, E. Thayer, R. Mosur, L. Chase, R. Weide, X.D. Huang, and F. Alleva, “Improved Acoustic and Adaptive Language Models for Continuous Speech Recognition”, ARPA Workshop on Spoken Language Technology, 1994.
· M.Y. Hwang, X.D. Huang, and F. Alleva, “Predicting unseen triphones with senones,” Proceedings of ICASSP-1993, pp. 311-314.
· F. Alleva, X.D. Huang, and M.Y. Hwang, “An Improved Search Algorithm Using Incremental Knowledge for Continuous Speech Recognition”, ICASSP-1993, pp. 307-310.
· X.D. Huang, F. Alleva, M.Y. Hwang, and R. Rosenfeld, “An overview of the SPHINX-II speech recognition system”, DARPA Workshop on Human Language Technology, pp. 81-86, March, 1993.
· X.D. Huang, M. Belin, F. Alleva, and M.Y. Hwang, “Unified Stochastic Engine (USE) for Speech Recognition”, ICASSP-1993, pp. 636-639.
· M.Y. Hwang, X.D. Huang, and F. Alleva, “Senones, Multi-Pass Search, and Unified Stochastic Modeling in SPHINX-II”, Eurospeech-1993, pp. 2143-2146.
· M.Y. Hwang and X.D. Huang, “Sub-phonetic Modeling with Markov States – Senones”, Proceedings of ICASSP-92, pp. 33-36.
· W. Ward, S. Issar, X.D. Huang, H.W. Hon, M.Y. Hwang, S. Young, M. Matessa, F.H. Liu, R. Stern, “Speech Understanding in Open Tasks”, DARPA Workshop on Speech and Natural Language Understanding, pp. 78-83, 1992.
· M.Y. Hwang and X.D. Huang, “Subphonetic modeling for speech recognition”, DARPA Workshop on Speech and Natural Language Understanding, pp. 174-179, 1992.
· F. Alleva, H.W. Hon, X.D. Huang, M.Y. Hwang, R. Rosenfeld, and R. Weide, “Applying SPHINX-II to the DARPA Wall Street Journal CSR task”, DARPA Workshop on Speech and Natural Language Understanding, pp. 393-398, 1992.
· X.D. Huang, K.F. Lee, H.W. Hon, and M.Y. Hwang, “Improved Acoustic Modeling for the SPHINX Speech Recognition System”, ICASSP-1991, pp. 345-348.
· M.Y. Hwang and X.D. Huang, “Improved Speaker-Independent Continuous Speech Recognition Using Shared-Distribution Semi-Continuous Models”, IEEE Workshop on Speech Recognition, Arden House, New York, 1991.
· M.Y. Hwang and X.D. Huang, “Acoustic Classification of Phonetic Hidden Markov Models”, Eurospeech-1991.
· X.D. Huang, F. Alleva, S. Hayamizu, H.W. Hon, M.Y. Hwang, and K.F. Lee, “Improved hidden Markov modeling for speaker-independent continuous speech recognition”, DARPA Workshop on Speech and Natural Language, pp. 327-331, May 1990.
· M.Y. Hwang, H.W. Hon, and K.F. Lee, “Modeling Between-Word Co-articulation in Continuous Speech Recognition”, Proceedings of Eurospeech-1989, pp. 5-8.
· M.Y. Hwang, H.W. Hon, and K.F. Lee, "Modeling Inter-Word Co-articulation Using Generalized Triphones", The 117th Meeting of the Acoustical Society of America, Syracuse, NY, May 1989.
· K.F. Lee, H.W. Hon, M.Y. Hwang, S. Mahajan, and R. Reddy, “The SPHINX Speech Recognition System”, ICASSP-1989, pp. 445-448.
· K.F. Lee, H.W. Hon, and M.Y. Hwang, “Recent Advances in Large-Vocabulary Speaker-Independent Continuous Speech Recognition”, DARPA Workshop on Speech Technology, 1989.
Technical Reports:
· M.Y. Hwang, X. Lei, T. Ng, and M. Ostendorf, “Porting Decipher from English to Mandarin”, UWEE Technical Report #UWEETR-2006-0013.
· M.Y. Hwang, X.D. Huang, and F. Alleva, “Predicting Unseen Triphones with Senones”, Carnegie Mellon University Technical Report, CMU-CS-93-139, 1993.
· X.D. Huang, F. Alleva, H.W. Hon, M.Y. Hwang, and R. Rosenfeld, “The SPHINX-II Speech Recognition System: An Overview”, Carnegie Mellon University Technical Report, CMU-CS-92-112, 1992.
· M.Y. Hwang and X.D. Huang, “Shared-Distribution Hidden Markov Models for Speech Recognition”, Carnegie Mellon University Technical Report, CMU-CS-91-124, 1991.
· M.Y. Hwang, H.W. Hon, and K.F. Lee, “Between-word Co-articulation Modeling for Continuous Speech Recognition”, Carnegie Mellon University Technical Report, CMU-CS-89-141R, 1989.
Patents:
1. Improving letter-to-sound derivation using graphonemes, 2004.
2. Improving speech recognition from user correction, 2004.
3. Improving new-word pronunciation learning using a pronunciation graph constructed from both acoustics and LTS rules, 2003.
4. Modeling and processing filled pauses and noises in speech recognition, 2003.
5. Generating a task-adapted acoustic model from one or more supervised and/or unsupervised corpora, 2002.
6. Methods and apparatus for performing speech recognition using acoustic models which are improved through an iterative process, 2000.