US5839105A - Speaker-independent model generation apparatus and speech recognition apparatus each equipped with means for splitting state having maximum increase in likelihood - Google Patents
- Publication number
- US5839105A (application US08/758,378)
- Authority
- US
- United States
- Prior art keywords
- state
- speaker
- hidden markov
- likelihood
- markov model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/144—Training of HMMs
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0631—Creating reference templates; Clustering
Abstract
Description
$$Q(\theta \mid \theta^{(r)}) = E_{\theta^{(r)}}\left[\log p(y_1^T, s_1^T \mid y_1^T, \theta)\right] \tag{1}$$
$$Q(\theta \mid \theta^{(r)}) \geq Q(\theta^{(r)} \mid \theta^{(r)}) \;\rightarrow\; L(\theta) \geq L(\theta^{(r)}) \tag{2}$$
$$\gamma_t(s) = p(s_t = s \mid y_1^T, \theta^{(r)}) \tag{4}$$
$$\xi_t(s, s') = p(s_t = s,\, s_{t-1} = s' \mid y_1^T, \theta^{(r)}) \tag{5}$$
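The posteriors in equations (4) and (5) are commonly obtained with the forward-backward recursion. The following Python sketch shows one way to compute them with NumPy. It is illustrative only and is not taken from the patent: the function name `state_posteriors`, the argument layout (`pi`, `A`, `B`), and the per-frame scaling are assumptions of the sketch. The array `xi[t, s, s_prev]` follows the argument order of equation (5), with the current state first.

```python
import numpy as np

def state_posteriors(pi, A, B):
    """Scaled forward-backward pass for a discrete-state HMM (illustrative sketch).

    pi : (S,)   initial state probabilities
    A  : (S, S) A[i, j] = p(s_t = j | s_{t-1} = i)
    B  : (T, S) B[t, s] = p(y_t | s_t = s)

    Returns gamma (T, S) as in eq. (4) and xi (T, S, S) as in eq. (5),
    where xi[t, s, s_prev] = p(s_t = s, s_{t-1} = s_prev | y); xi[0] is all zeros.
    """
    T, S = B.shape
    alpha = np.zeros((T, S))
    beta = np.ones((T, S))
    scale = np.zeros(T)

    # Forward pass, normalising each frame to avoid underflow.
    alpha[0] = pi * B[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[t]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    # Backward pass, reusing the forward scaling factors.
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[t + 1] * beta[t + 1])) / scale[t + 1]

    gamma = alpha * beta

    xi = np.zeros((T, S, S))
    for t in range(1, T):
        # joint[s_prev, s] = p(s_{t-1} = s_prev, s_t = s | y)
        joint = alpha[t - 1][:, None] * A * (B[t] * beta[t])[None, :] / scale[t]
        xi[t] = joint.T  # reorder to [s, s_prev], matching eq. (5)
    return gamma, xi
```

Summing `xi[t]` over its second index recovers `gamma[t]`, and summing over its first index recovers `gamma[t - 1]`, which gives a quick consistency check on any implementation.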
$$i(s) = E\left[L(Y, \theta(s)) \mid s\right] \tag{12}$$
$$d(s, \hat{y}) = E\left[L(Y, \hat{y}) \mid s\right] - i(s) \tag{14}$$
$$A_k^{(r+1)} = \{x_j : \alpha(x_j)^{(r+1)} = k\} \tag{20}$$
$$\theta^{(0)}(s_0) = \theta(s) = (\mu(s), C(s)) \tag{24}$$
$$\theta^{(0)}(s_1) = (\mu(s)(1+\epsilon), C(s)) \tag{25}$$
$$L^{(r)} = -N_0 \log\left|C^{(r)}(s_0)\right| - N_1 \log\left|C^{(r)}(s_1)\right| \tag{30}$$
$$L^{(r)} \geq L^{(r-1)} \tag{31}$$
$$2N_j \log\left|C(s_0)\right| + \mathrm{tr}\left[\left(S_j^2 - 2 S_j^1 \mu(s_0)^t + N_j \mu(s_0)\mu(s_0)^t\right) C(s_0)^{-1}\right] \leq 2N_j \log\left|C(s_1)\right| + \mathrm{tr}\left[\left(S_j^2 - 2 S_j^1 \mu(s_1)^t + N_j \mu(s_1)\mu(s_1)^t\right) C(s_1)^{-1}\right] \tag{36}$$
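Equations (20) through (36) can be read as an iterative two-way partition: perturb the parent Gaussian to obtain two children as in (24) and (25), assign each group x_j to the cheaper child by comparing costs built from its sufficient statistics as in (36), re-estimate the children, and track the objective (30). The Python sketch below illustrates that loop under stated assumptions and is not the patent's implementation. The helper names (`group_stats`, `cost`, `fit`, `split`) are hypothetical. The cost used is the standard Gaussian -2 log-likelihood up to a constant, in which the log-determinant is weighted by N_j; that weighting is an assumption of the sketch and differs from the 2N_j printed in (36).

```python
import numpy as np

def group_stats(X):
    """Sufficient statistics of one group of vectors X (n, d):
    count N_j, first-order sum S_j^1, second-order sum S_j^2."""
    return len(X), X.sum(axis=0), X.T @ X

def cost(stats, mu, C):
    """-2 * Gaussian log-likelihood of a group, up to an additive constant
    (assumed form: N_j log|C| + tr[(S2 - 2 S1 mu^t + N mu mu^t) C^-1])."""
    N, S1, S2 = stats
    scatter = S2 - 2.0 * np.outer(S1, mu) + N * np.outer(mu, mu)
    return N * np.linalg.slogdet(C)[1] + np.trace(scatter @ np.linalg.inv(C))

def fit(stats_list):
    """Maximum-likelihood count, mean and covariance from pooled statistics."""
    N = sum(s[0] for s in stats_list)
    S1 = sum(s[1] for s in stats_list)
    S2 = sum(s[2] for s in stats_list)
    mu = S1 / N
    return N, mu, S2 / N - np.outer(mu, mu)

def split(stats_list, eps=0.01, max_iter=20):
    """Two-way partition of groups. Returns (assignment, objective history);
    the assignment is None if no non-trivial split is found."""
    _, mu, C = fit(stats_list)
    params = [(mu, C), (mu * (1.0 + eps), C)]  # initialisation, eqs. (24)-(25)
    assign, history = None, []
    for _ in range(max_iter):
        # Assignment step: each group goes to the child with the lower cost.
        new_assign = [0 if cost(s, *params[0]) <= cost(s, *params[1]) else 1
                      for s in stats_list]
        if new_assign == assign or len(set(new_assign)) < 2:
            break
        assign = new_assign
        # Re-estimation step, followed by the objective of eq. (30).
        fitted = [fit([s for s, a in zip(stats_list, assign) if a == k])
                  for k in (0, 1)]
        params = [(m, c) for _, m, c in fitted]
        history.append(-sum(n * np.linalg.slogdet(c)[1] for n, _, c in fitted))
    return assign, history
```

Working from per-group statistics rather than raw vectors keeps each iteration independent of the number of training frames, which is presumably why the rule in (36) is written in terms of S_j^1 and S_j^2. Because each step of the loop can only lower the total cost, the recorded objective should be non-decreasing, in line with (31).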
$$\gamma_t(s^*) = \gamma_t(q_0) + \gamma_t(q_1) \tag{48}$$
$$\xi_t(s^*, s^*) = \xi_t(q_0, q_0) + \xi_t(q_1, q_0) + \xi_t(q_1, q_1) \tag{49}$$
$$\gamma_t(i) = p(s_t = i \mid Y) \tag{50}$$
$$\xi_t(i, j) = p(s_t = i,\, s_{t-1} = j \mid Y) \tag{51}$$
$$\hat{\gamma}_t(q) = p(q_t = q \mid s_t = s^*, Y) \tag{52}$$
$$\hat{\xi}_t(q, q') = p(q_t = q,\, q_{t-1} = q' \mid s_t = s^*, s_{t-1} = s^*, Y) \tag{53}$$
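Equations (52) and (53) define posteriors for the child states q_0 and q_1 conditioned on the parent state s*. By the chain rule, multiplying them by the parent posteriors of (50) and (51) gives the unconditional child posteriors, which then satisfy the constraints (48) and (49). The short Python sketch below illustrates this relationship. The function names and array layouts are hypothetical, and the reading that the pair (q_0, q_1) is missing from (49) because q_1 may not be followed by q_0 is an assumption.

```python
import numpy as np

def child_posteriors(gamma_star, gamma_hat):
    """gamma_star : (T,)   gamma_t(s*), eq. (50) evaluated at the parent state
    gamma_hat  : (T, 2) conditional child posteriors of eq. (52)
    Returns (T, 2) with gamma_t(q) = gamma_hat_t(q) * gamma_t(s*)."""
    gamma_star = np.asarray(gamma_star, dtype=float)
    gamma_hat = np.asarray(gamma_hat, dtype=float)
    return gamma_hat * gamma_star[:, None]

def child_transitions(xi_star, xi_hat):
    """xi_star : (T,)      xi_t(s*, s*), eq. (51) evaluated at the parent state
    xi_hat  : (T, 2, 2) conditional posteriors of eq. (53), indexed [q, q_prev]
    Returns (T, 2, 2) with xi_t(q, q') = xi_hat_t(q, q') * xi_t(s*, s*)."""
    xi_star = np.asarray(xi_star, dtype=float)
    xi_hat = np.asarray(xi_hat, dtype=float)
    return xi_hat * xi_star[:, None, None]
```

When each row of `gamma_hat` sums to one, the rows of the result sum to `gamma_star`, which is the constraint in (48). The same holds for (49) when `xi_hat[t, 0, 1]` is zero and the remaining three entries sum to one.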
TABLE 1: % Accuracy
Speaker | 200 states, 1 Gaussian/state: SSS | 200 states, 1 Gaussian/state: SI-SSS | 400 states, 1 Gaussian/state: SSS | 400 states, 1 Gaussian/state: SI-SSS | 400 states, 3 Gaussians/state: SSS | 400 states, 3 Gaussians/state: SI-SSS |
---|---|---|---|---|---|---|
MHT | 93.9 | 92.8 | 95.4 | 94.5 | 96.1 | 96.0 |
MAU | 93.6 | 93.2 | 95.2 | 95.2 | 96.4 | 96.7 |
MXM | 91.7 | 91.9 | 93.6 | 93.9 | 95.3 | 95.1 |
FTK | 91.5 | 91.1 | 92.9 | 94.0 | 94.7 | 95.0 |
FMS | 89.7 | 91.3 | 91.9 | 93.2 | 94.2 | 94.6 |
FYM | 90.7 | 92.4 | 92.9 | 93.6 | 95.1 | 95.5 |
Average | 91.9 | 92.1 | 93.7 | 94.1 | 95.3 | 95.5 |
Claims (10)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP7312286A JP2871561B2 (en) | 1995-11-30 | 1995-11-30 | Unspecified speaker model generation device and speech recognition device |
JP7-312286 | 1995-11-30 |
Publications (1)
Publication Number | Publication Date |
---|---|
US5839105A (en) | 1998-11-17 |
Family
ID=18027425
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US08/758,378 Expired - Lifetime US5839105A (en) | 1995-11-30 | 1996-11-29 | Speaker-independent model generation apparatus and speech recognition apparatus each equipped with means for splitting state having maximum increase in likelihood |
Country Status (2)
Country | Link |
---|---|
US (1) | US5839105A (en) |
JP (1) | JP2871561B2 (en) |
Cited By (68)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6049797A (en) * | 1998-04-07 | 2000-04-11 | Lucent Technologies, Inc. | Method, apparatus and programmed medium for clustering databases with categorical attributes |
US6108628A (en) * | 1996-09-20 | 2000-08-22 | Canon Kabushiki Kaisha | Speech recognition method and apparatus using coarse and fine output probabilities utilizing an unspecified speaker model |
US6195636B1 (en) * | 1999-02-19 | 2001-02-27 | Texas Instruments Incorporated | Speech recognition over packet networks |
US6223159B1 (en) * | 1998-02-25 | 2001-04-24 | Mitsubishi Denki Kabushiki Kaisha | Speaker adaptation device and speech recognition device |
US6263309B1 (en) * | 1998-04-30 | 2001-07-17 | Matsushita Electric Industrial Co., Ltd. | Maximum likelihood method for finding an adapted speaker model in eigenvoice space |
US6266636B1 (en) * | 1997-03-13 | 2001-07-24 | Canon Kabushiki Kaisha | Single distribution and mixed distribution model conversion in speech recognition method, apparatus, and computer readable medium |
US6266637B1 (en) * | 1998-09-11 | 2001-07-24 | International Business Machines Corporation | Phrase splicing and variable substitution using a trainable speech synthesizer |
US6269334B1 (en) * | 1998-06-25 | 2001-07-31 | International Business Machines Corporation | Nongaussian density estimation for the classification of acoustic feature vectors in speech recognition |
US6343267B1 (en) | 1998-04-30 | 2002-01-29 | Matsushita Electric Industrial Co., Ltd. | Dimensionality reduction for speaker normalization and speaker and environment adaptation using eigenvoice techniques |
US6377921B1 (en) * | 1998-06-26 | 2002-04-23 | International Business Machines Corporation | Identifying mismatches between assumed and actual pronunciations of words |
US6377924B1 (en) * | 1999-03-12 | 2002-04-23 | Texas Instruments Incorporated | Method of enrolling phone-based speaker specific commands |
US6380934B1 (en) * | 1998-11-30 | 2002-04-30 | Mitsubishi Electric Research Laboratories, Inc. | Estimating targets using statistical properties of observations of known targets |
US6421641B1 (en) * | 1999-11-12 | 2002-07-16 | International Business Machines Corporation | Methods and apparatus for fast adaptation of a band-quantized speech decoding system |
US20020095289A1 (en) * | 2000-12-04 | 2002-07-18 | Min Chu | Method and apparatus for identifying prosodic word boundaries |
US20020099547A1 (en) * | 2000-12-04 | 2002-07-25 | Min Chu | Method and apparatus for speech synthesis without prosody modification |
US20020116192A1 (en) * | 1998-09-09 | 2002-08-22 | Makoto Shozakai | Speech recognizer |
US6526379B1 (en) | 1999-11-29 | 2003-02-25 | Matsushita Electric Industrial Co., Ltd. | Discriminative clustering methods for automatic speech recognition |
US20030050814A1 (en) * | 2001-03-08 | 2003-03-13 | Stoneking Michael D. | Computer assisted benchmarking system and method using induction based artificial intelligence |
US6535849B1 (en) * | 2000-01-18 | 2003-03-18 | Scansoft, Inc. | Method and system for generating semi-literal transcripts for speech recognition systems |
WO1999059135A3 (en) * | 1998-05-11 | 2003-04-03 | Siemens Ag | Arrangement and method for computer recognition of a predefined vocabulary in spoken language |
US20030067467A1 (en) * | 2001-03-23 | 2003-04-10 | Microsoft Corporation | Methods and systems for merging graphics for display on a computing device |
US6549899B1 (en) * | 1997-11-14 | 2003-04-15 | Mitsubishi Electric Research Laboratories, Inc. | System for analyzing and synthesis of multi-factor data |
US20030071818A1 (en) * | 2001-03-23 | 2003-04-17 | Microsoft Corporation | Methods and systems for displaying animated graphics on a computing device |
US6571208B1 (en) | 1999-11-29 | 2003-05-27 | Matsushita Electric Industrial Co., Ltd. | Context-dependent acoustic models for medium and large vocabulary speech recognition with eigenvoice training |
US20030120488A1 (en) * | 2001-12-20 | 2003-06-26 | Shinichi Yoshizawa | Method and apparatus for preparing acoustic model and computer program for preparing acoustic model |
US20030163313A1 (en) * | 2002-02-26 | 2003-08-28 | Canon Kabushiki Kaisha | Model generation apparatus and methods |
GB2387008A (en) * | 2002-03-28 | 2003-10-01 | Qinetiq Ltd | Signal Processing System |
US20030187647A1 (en) * | 2002-03-29 | 2003-10-02 | At&T Corp. | Automatic segmentation in speech synthesis |
US6691087B2 (en) * | 1997-11-21 | 2004-02-10 | Sarnoff Corporation | Method and apparatus for adaptive speech detection by applying a probabilistic description to the classification and tracking of signal components |
US6697769B1 (en) * | 2000-01-21 | 2004-02-24 | Microsoft Corporation | Method and apparatus for fast machine training |
US20040181408A1 (en) * | 2003-03-13 | 2004-09-16 | Microsoft Corporation | Method for training of subspace coded gaussian models |
US20040193398A1 (en) * | 2003-03-24 | 2004-09-30 | Microsoft Corporation | Front-end architecture for a multi-lingual text-to-speech system |
US6804648B1 (en) * | 1999-03-25 | 2004-10-12 | International Business Machines Corporation | Impulsivity estimates of mixtures of the power exponential distrubutions in speech modeling |
US6816101B2 (en) | 2002-03-08 | 2004-11-09 | Quelian, Inc. | High-speed analog-to-digital converter using a unique gray code |
US20050102122A1 (en) * | 2003-11-10 | 2005-05-12 | Yuko Maruyama | Dynamic model detecting apparatus |
US6910000B1 (en) * | 2000-06-02 | 2005-06-21 | Mitsubishi Electric Research Labs, Inc. | Generalized belief propagation for probabilistic systems |
US20060111905A1 (en) * | 2004-11-22 | 2006-05-25 | Jiri Navratil | Method and apparatus for training a text independent speaker recognition system using speech data with text labels |
US20070033044A1 (en) * | 2005-08-03 | 2007-02-08 | Texas Instruments, Incorporated | System and method for creating generalized tied-mixture hidden Markov models for automatic speech recognition |
US20070050180A1 (en) * | 2000-05-04 | 2007-03-01 | Dov Dori | Modeling system |
US20070088551A1 (en) * | 2002-06-06 | 2007-04-19 | Mcintyre Joseph H | Multiple sound fragments processing and load balancing |
US20070100624A1 (en) * | 2005-11-03 | 2007-05-03 | Fuliang Weng | Unified treatment of data-sparseness and data-overfitting in maximum entropy modeling |
US7318032B1 (en) * | 2000-06-13 | 2008-01-08 | International Business Machines Corporation | Speaker recognition method based on structured speaker modeling and a “Pickmax” scoring technique |
US20080059190A1 (en) * | 2006-08-22 | 2008-03-06 | Microsoft Corporation | Speech unit selection using HMM acoustic models |
US20080059184A1 (en) * | 2006-08-22 | 2008-03-06 | Microsoft Corporation | Calculating cost measures between HMM acoustic models |
US20080091424A1 (en) * | 2006-10-16 | 2008-04-17 | Microsoft Corporation | Minimum classification error training with growth transformation optimization |
US20080147403A1 (en) * | 2002-06-06 | 2008-06-19 | International Business Machines Corporation | Multiple sound fragments processing and load balancing |
US20080147579A1 (en) * | 2006-12-14 | 2008-06-19 | Microsoft Corporation | Discriminative training using boosted lasso |
US20080201139A1 (en) * | 2007-02-20 | 2008-08-21 | Microsoft Corporation | Generic framework for large-margin MCE training in speech recognition |
US7725079B2 (en) | 2004-12-14 | 2010-05-25 | Quellan, Inc. | Method and system for automatic control in an interference cancellation device |
US7729431B2 (en) | 2003-11-17 | 2010-06-01 | Quellan, Inc. | Method and system for antenna interference cancellation |
US20100185444A1 (en) * | 2009-01-21 | 2010-07-22 | Jesper Olsen | Method, apparatus and computer program product for providing compound models for speech recognition adaptation |
US7804760B2 (en) | 2003-08-07 | 2010-09-28 | Quellan, Inc. | Method and system for signal emulation |
US7934144B2 (en) | 2002-11-12 | 2011-04-26 | Quellan, Inc. | High-speed analog-to-digital conversion with improved robustness to timing uncertainty |
US8005430B2 (en) | 2004-12-14 | 2011-08-23 | Quellan Inc. | Method and system for reducing signal interference |
US8068406B2 (en) | 2003-08-07 | 2011-11-29 | Quellan, Inc. | Method and system for crosstalk cancellation |
US8311168B2 (en) | 2002-07-15 | 2012-11-13 | Quellan, Inc. | Adaptive noise filtering and equalization for optimal high speed multilevel signal decoding |
US8494850B2 (en) * | 2011-06-30 | 2013-07-23 | Google Inc. | Speech recognition using variable-length context |
US8576939B2 (en) | 2003-12-22 | 2013-11-05 | Quellan, Inc. | Method and system for slicing a communication signal |
US8727991B2 (en) | 2011-08-29 | 2014-05-20 | Salutron, Inc. | Probabilistic segmental model for doppler ultrasound heart rate monitoring |
US8924212B1 (en) * | 2005-08-26 | 2014-12-30 | At&T Intellectual Property Ii, L.P. | System and method for robust access and entry to large structured data using voice form-filling |
US20150371633A1 (en) * | 2012-11-01 | 2015-12-24 | Google Inc. | Speech recognition using non-parametric models |
US9252983B2 (en) | 2006-04-26 | 2016-02-02 | Intersil Americas LLC | Method and system for reducing radiated emissions from a communications channel |
US20170200447A1 (en) * | 2013-04-25 | 2017-07-13 | Nuance Communications, Inc. | Systems and methods for providing metadata-dependent language models |
US9753912B1 (en) | 2007-12-27 | 2017-09-05 | Great Northern Research, LLC | Method for processing the output of a speech recognizer |
US9858922B2 (en) | 2014-06-23 | 2018-01-02 | Google Inc. | Caching speech recognition scores |
EP3174262A4 (en) * | 2015-03-20 | 2018-01-17 | Baidu Online Network Technology (Beijing) Co., Ltd | Artificial intelligence based voiceprint login method and device |
US10204619B2 (en) | 2014-10-22 | 2019-02-12 | Google Llc | Speech recognition using associative mapping |
CN111523565A (en) * | 2020-03-30 | 2020-08-11 | 中南大学 | Streaming processing method, system and storage medium for big data |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6078872B2 (en) * | 2012-10-01 | 2017-02-15 | 国立研究開発法人産業技術総合研究所 | Automatic topology generation of AR-HMM |
- 1995
  - 1995-11-30: JP application JP7312286A, patent JP2871561B2 (en), not active, Expired - Fee Related
- 1996
  - 1996-11-29: US application US08/758,378, patent US5839105A (en), not active, Expired - Lifetime
Non-Patent Citations (21)
Title |
---|
Anderson, An Introduction To Multivariate Statistical Analysis, 2nd Ed., John Wiley & Sons, (1984), pp. 404-411. * |
Bahl et al. Decision Trees for Phonological Rules in Continuous Speech, IEEE Proceedings of the International Conference on Acoustic Speech and Signal Processing, (1991), pp. 185-188. * |
Bahl et al., A Tree-Based Statistical Language Model . . . , IEEE Transactions on Acoustic Speech and Signal Processing, vol. 37, No. 7, (1989), pp. 507-514. * |
Bahl et al., Context Dependent Vector Quantization for Continuous Speech Recognition, IEEE Proceedings of the International Conference on Acoustic Speech and Signal Processing, (1993), pp. II-632-II-635. * |
Breiman et al., Classification And Regression Trees, Wadsworth, Inc., (1984), pp. 266-271. * |
Chou, Optimal Partitioning for Classification And Regression Trees, IEEE Transactions On Pattern Analysis and Machine Intelligence, vol. 13, No. 4, Apr. 1991, pp. 340-354. * |
Dempster et al., Maximum Likelihood from Incomplete Data . . . , Royal Statistical Society, Journal, Series B. vol. 39, No. 1, (1977), pp. 1-38. * |
Huang et al., An Overview of the SPHINX-II Speech Recognition System, Proceedings of ARPA Workshop on Human Language Technology, pp. 81-86. * |
Kannan et al., Maximum Likelihood Clustering of Gaussians for Speech Recognition, IEEE Transactions on Speech and Audio Processing, vol. 2, No. 3, (1994), pp. 453-455. * |
Kosaka et al., Tree-Structured Speaker Clustering for Speaker-Independent Continuous . . . , ICSLP, (1994), pp. 1375-1378. * |
Kurematsu et al., ATR Japanese Speech Database As A Tool Of Speech . . . , Speech Communication 9, Elsevier Science Publishers B.V. (North-Holland), (1990), pp. 367-363. * |
Lee et al., Allophone Clustering For Continuous Speech Recognition, IEEE Proceedings of the International Conference on Acoustic Speech and Signal Processing, (1990), pp. 749-752. * |
Linde et al., An Algorithm for Vector Quantizer Design, IEEE Transactions On Communications, vol. COM-28, No. 1, Jan. 1980, pp. 84-95. * |
Nadas et al., An Iterative "Flip-Flop" Approximation Of the Most Informative . . . , IEEE Proceedings of the International Conference on Acoustic Speech and Signal Processing, (1991), pp. 565-568. * |
Nagai et al., Atreus: A Comparative Study of Continuous Speech . . . , 1993 IEEE ICASSP-93 reprint, pp. II-139-II-142. * |
Nagai et al., The SSS-LR Continuous Speech Recognition System . . . , Proceedings of International Conference on Spoken Language Processing, (1992), pp. 1511-1514. * |
Sagayama et al., ATREUS: a Speech Recognition Front-end for a Speech Translation System, Proceedings of European Conference on Speech Communication and Technology, (1993), pp. 1287-1290. * |
Singer et al., Speech Recognition Without Grammar Or Vocabulary Constrains, ICSLP, (1994), pp. 2207-2210. * |
Takami et al., A Successive State Splitting Algorithm for Efficient Allophone Modelling, IEEE, (1992), pp. I-573-I-576. * |
Takami et al., Automatic Generation of Speaker-Common Hidden . . . Proceedings of Acoustic Society in Japan (partial English translation), (1992), pp. 155-156. * |
Young et al., Tree-Based State Tying for High Accuracy Acoustic Modelling, pp. 286-291. * |
Cited By (114)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6108628A (en) * | 1996-09-20 | 2000-08-22 | Canon Kabushiki Kaisha | Speech recognition method and apparatus using coarse and fine output probabilities utilizing an unspecified speaker model |
US6266636B1 (en) * | 1997-03-13 | 2001-07-24 | Canon Kabushiki Kaisha | Single distribution and mixed distribution model conversion in speech recognition method, apparatus, and computer readable medium |
US6549899B1 (en) * | 1997-11-14 | 2003-04-15 | Mitsubishi Electric Research Laboratories, Inc. | System for analyzing and synthesis of multi-factor data |
US6691087B2 (en) * | 1997-11-21 | 2004-02-10 | Sarnoff Corporation | Method and apparatus for adaptive speech detection by applying a probabilistic description to the classification and tracking of signal components |
US6223159B1 (en) * | 1998-02-25 | 2001-04-24 | Mitsubishi Denki Kabushiki Kaisha | Speaker adaptation device and speech recognition device |
US6049797A (en) * | 1998-04-07 | 2000-04-11 | Lucent Technologies, Inc. | Method, apparatus and programmed medium for clustering databases with categorical attributes |
US6263309B1 (en) * | 1998-04-30 | 2001-07-17 | Matsushita Electric Industrial Co., Ltd. | Maximum likelihood method for finding an adapted speaker model in eigenvoice space |
US6343267B1 (en) | 1998-04-30 | 2002-01-29 | Matsushita Electric Industrial Co., Ltd. | Dimensionality reduction for speaker normalization and speaker and environment adaptation using eigenvoice techniques |
US7003460B1 (en) | 1998-05-11 | 2006-02-21 | Siemens Aktiengesellschaft | Method and apparatus for an adaptive speech recognition system utilizing HMM models |
WO1999059135A3 (en) * | 1998-05-11 | 2003-04-03 | Siemens Ag | Arrangement and method for computer recognition of a predefined vocabulary in spoken language |
US6269334B1 (en) * | 1998-06-25 | 2001-07-31 | International Business Machines Corporation | Nongaussian density estimation for the classification of acoustic feature vectors in speech recognition |
US6377921B1 (en) * | 1998-06-26 | 2002-04-23 | International Business Machines Corporation | Identifying mismatches between assumed and actual pronunciations of words |
US20020116192A1 (en) * | 1998-09-09 | 2002-08-22 | Makoto Shozakai | Speech recognizer |
US6868382B2 (en) * | 1998-09-09 | 2005-03-15 | Asahi Kasei Kabushiki Kaisha | Speech recognizer |
US6266637B1 (en) * | 1998-09-11 | 2001-07-24 | International Business Machines Corporation | Phrase splicing and variable substitution using a trainable speech synthesizer |
US6380934B1 (en) * | 1998-11-30 | 2002-04-30 | Mitsubishi Electric Research Laboratories, Inc. | Estimating targets using statistical properties of observations of known targets |
US6195636B1 (en) * | 1999-02-19 | 2001-02-27 | Texas Instruments Incorporated | Speech recognition over packet networks |
US6377924B1 (en) * | 1999-03-12 | 2002-04-23 | Texas Instruments Incorporated | Method of enrolling phone-based speaker specific commands |
US6804648B1 (en) * | 1999-03-25 | 2004-10-12 | International Business Machines Corporation | Impulsivity estimates of mixtures of the power exponential distrubutions in speech modeling |
US6421641B1 (en) * | 1999-11-12 | 2002-07-16 | International Business Machines Corporation | Methods and apparatus for fast adaptation of a band-quantized speech decoding system |
US6571208B1 (en) | 1999-11-29 | 2003-05-27 | Matsushita Electric Industrial Co., Ltd. | Context-dependent acoustic models for medium and large vocabulary speech recognition with eigenvoice training |
US6526379B1 (en) | 1999-11-29 | 2003-02-25 | Matsushita Electric Industrial Co., Ltd. | Discriminative clustering methods for automatic speech recognition |
US6535849B1 (en) * | 2000-01-18 | 2003-03-18 | Scansoft, Inc. | Method and system for generating semi-literal transcripts for speech recognition systems |
US6697769B1 (en) * | 2000-01-21 | 2004-02-24 | Microsoft Corporation | Method and apparatus for fast machine training |
US20070050180A1 (en) * | 2000-05-04 | 2007-03-01 | Dov Dori | Modeling system |
US6910000B1 (en) * | 2000-06-02 | 2005-06-21 | Mitsubishi Electric Research Labs, Inc. | Generalized belief propagation for probabilistic systems |
US7318032B1 (en) * | 2000-06-13 | 2008-01-08 | International Business Machines Corporation | Speaker recognition method based on structured speaker modeling and a “Pickmax” scoring technique |
US7263488B2 (en) * | 2000-12-04 | 2007-08-28 | Microsoft Corporation | Method and apparatus for identifying prosodic word boundaries |
US7127396B2 (en) | 2000-12-04 | 2006-10-24 | Microsoft Corporation | Method and apparatus for speech synthesis without prosody modification |
US20040148171A1 (en) * | 2000-12-04 | 2004-07-29 | Microsoft Corporation | Method and apparatus for speech synthesis without prosody modification |
US20020099547A1 (en) * | 2000-12-04 | 2002-07-25 | Min Chu | Method and apparatus for speech synthesis without prosody modification |
US6978239B2 (en) | 2000-12-04 | 2005-12-20 | Microsoft Corporation | Method and apparatus for speech synthesis without prosody modification |
US20020095289A1 (en) * | 2000-12-04 | 2002-07-18 | Min Chu | Method and apparatus for identifying prosodic word boundaries |
US20050119891A1 (en) * | 2000-12-04 | 2005-06-02 | Microsoft Corporation | Method and apparatus for speech synthesis without prosody modification |
US20030050814A1 (en) * | 2001-03-08 | 2003-03-13 | Stoneking Michael D. | Computer assisted benchmarking system and method using induction based artificial intelligence |
US8788452B2 (en) * | 2001-03-08 | 2014-07-22 | Deloitte Development Llc | Computer assisted benchmarking system and method using induction based artificial intelligence |
US7315307B2 (en) | 2001-03-23 | 2008-01-01 | Microsoft Corporation | Methods and systems for merging graphics for display on a computing device |
US7439981B2 (en) | 2001-03-23 | 2008-10-21 | Microsoft Corporation | Methods and systems for displaying animated graphics on a computing device |
US20050083339A1 (en) * | 2001-03-23 | 2005-04-21 | Microsoft Corporation | Methods and systems for displaying animated graphics on a computing device |
US20030071818A1 (en) * | 2001-03-23 | 2003-04-17 | Microsoft Corporation | Methods and systems for displaying animated graphics on a computing device |
US20040217960A1 (en) * | 2001-03-23 | 2004-11-04 | Microsoft Corporation | Methods and systems for merging graphics for display on a computing device |
US20040212621A1 (en) * | 2001-03-23 | 2004-10-28 | Microsoft Corporation | Methods and systems for merging graphics for display on a computing device |
US7315308B2 (en) | 2001-03-23 | 2008-01-01 | Microsoft Corporation | Methods and system for merging graphics for display on a computing device |
US7239324B2 (en) | 2001-03-23 | 2007-07-03 | Microsoft Corporation | Methods and systems for merging graphics for display on a computing device |
US20030067467A1 (en) * | 2001-03-23 | 2003-04-10 | Microsoft Corporation | Methods and systems for merging graphics for display on a computing device |
US20030120488A1 (en) * | 2001-12-20 | 2003-06-26 | Shinichi Yoshizawa | Method and apparatus for preparing acoustic model and computer program for preparing acoustic model |
US7209881B2 (en) * | 2001-12-20 | 2007-04-24 | Matsushita Electric Industrial Co., Ltd. | Preparing acoustic models by sufficient statistics and noise-superimposed speech data |
US20030163313A1 (en) * | 2002-02-26 | 2003-08-28 | Canon Kabushiki Kaisha | Model generation apparatus and methods |
US7260532B2 (en) | 2002-02-26 | 2007-08-21 | Canon Kabushiki Kaisha | Hidden Markov model generation apparatus and method with selection of number of states |
US6816101B2 (en) | 2002-03-08 | 2004-11-09 | Quelian, Inc. | High-speed analog-to-digital converter using a unique gray code |
GB2387008A (en) * | 2002-03-28 | 2003-10-01 | Qinetiq Ltd | Signal Processing System |
US7664640B2 (en) | 2002-03-28 | 2010-02-16 | Qinetiq Limited | System for estimating parameters of a gaussian mixture model |
US8131547B2 (en) | 2002-03-29 | 2012-03-06 | At&T Intellectual Property Ii, L.P. | Automatic segmentation in speech synthesis |
US20030187647A1 (en) * | 2002-03-29 | 2003-10-02 | At&T Corp. | Automatic segmentation in speech synthesis |
US7266497B2 (en) * | 2002-03-29 | 2007-09-04 | At&T Corp. | Automatic segmentation in speech synthesis |
US20070271100A1 (en) * | 2002-03-29 | 2007-11-22 | At&T Corp. | Automatic segmentation in speech synthesis |
US7587320B2 (en) * | 2002-03-29 | 2009-09-08 | At&T Intellectual Property Ii, L.P. | Automatic segmentation in speech synthesis |
US20090313025A1 (en) * | 2002-03-29 | 2009-12-17 | At&T Corp. | Automatic Segmentation in Speech Synthesis |
US7788097B2 (en) | 2002-06-06 | 2010-08-31 | Nuance Communications, Inc. | Multiple sound fragments processing and load balancing |
US7747444B2 (en) | 2002-06-06 | 2010-06-29 | Nuance Communications, Inc. | Multiple sound fragments processing and load balancing |
US20070088551A1 (en) * | 2002-06-06 | 2007-04-19 | Mcintyre Joseph H | Multiple sound fragments processing and load balancing |
US20080147403A1 (en) * | 2002-06-06 | 2008-06-19 | International Business Machines Corporation | Multiple sound fragments processing and load balancing |
US8311168B2 (en) | 2002-07-15 | 2012-11-13 | Quellan, Inc. | Adaptive noise filtering and equalization for optimal high speed multilevel signal decoding |
US7934144B2 (en) | 2002-11-12 | 2011-04-26 | Quellan, Inc. | High-speed analog-to-digital conversion with improved robustness to timing uncertainty |
US20040181408A1 (en) * | 2003-03-13 | 2004-09-16 | Microsoft Corporation | Method for training of subspace coded gaussian models |
US7571097B2 (en) * | 2003-03-13 | 2009-08-04 | Microsoft Corporation | Method for training of subspace coded gaussian models |
US7496498B2 (en) | 2003-03-24 | 2009-02-24 | Microsoft Corporation | Front-end architecture for a multi-lingual text-to-speech system |
US20040193398A1 (en) * | 2003-03-24 | 2004-09-30 | Microsoft Corporation | Front-end architecture for a multi-lingual text-to-speech system |
US7804760B2 (en) | 2003-08-07 | 2010-09-28 | Quellan, Inc. | Method and system for signal emulation |
US8068406B2 (en) | 2003-08-07 | 2011-11-29 | Quellan, Inc. | Method and system for crosstalk cancellation |
US8605566B2 (en) | 2003-08-07 | 2013-12-10 | Quellan, Inc. | Method and system for signal emulation |
US20050102122A1 (en) * | 2003-11-10 | 2005-05-12 | Yuko Maruyama | Dynamic model detecting apparatus |
US7660707B2 (en) * | 2003-11-10 | 2010-02-09 | Nec Corporation | Dynamic model detecting apparatus |
US7729431B2 (en) | 2003-11-17 | 2010-06-01 | Quellan, Inc. | Method and system for antenna interference cancellation |
US8576939B2 (en) | 2003-12-22 | 2013-11-05 | Quellan, Inc. | Method and system for slicing a communication signal |
US20060111905A1 (en) * | 2004-11-22 | 2006-05-25 | Jiri Navratil | Method and apparatus for training a text independent speaker recognition system using speech data with text labels |
US20080235020A1 (en) * | 2004-11-22 | 2008-09-25 | Jiri Navratil | Method and apparatus for training a text independent speaker recognition system using speech data with text labels |
US7447633B2 (en) * | 2004-11-22 | 2008-11-04 | International Business Machines Corporation | Method and apparatus for training a text independent speaker recognition system using speech data with text labels |
US7813927B2 (en) * | 2004-11-22 | 2010-10-12 | Nuance Communications, Inc. | Method and apparatus for training a text independent speaker recognition system using speech data with text labels |
US8005430B2 (en) | 2004-12-14 | 2011-08-23 | Quellan Inc. | Method and system for reducing signal interference |
US7725079B2 (en) | 2004-12-14 | 2010-05-25 | Quellan, Inc. | Method and system for automatic control in an interference cancellation device |
US8135350B2 (en) | 2004-12-14 | 2012-03-13 | Quellan, Inc. | System for reducing signal interference |
US8503940B2 (en) | 2004-12-14 | 2013-08-06 | Quellan, Inc. | Reducing signal interference |
US20070033044A1 (en) * | 2005-08-03 | 2007-02-08 | Texas Instruments, Incorporated | System and method for creating generalized tied-mixture hidden Markov models for automatic speech recognition |
US9824682B2 (en) | 2005-08-26 | 2017-11-21 | Nuance Communications, Inc. | System and method for robust access and entry to large structured data using voice form-filling |
US9165554B2 (en) | 2005-08-26 | 2015-10-20 | At&T Intellectual Property Ii, L.P. | System and method for robust access and entry to large structured data using voice form-filling |
US8924212B1 (en) * | 2005-08-26 | 2014-12-30 | At&T Intellectual Property Ii, L.P. | System and method for robust access and entry to large structured data using voice form-filling |
US8700403B2 (en) * | 2005-11-03 | 2014-04-15 | Robert Bosch Gmbh | Unified treatment of data-sparseness and data-overfitting in maximum entropy modeling |
US20070100624A1 (en) * | 2005-11-03 | 2007-05-03 | Fuliang Weng | Unified treatment of data-sparseness and data-overfitting in maximum entropy modeling |
US9252983B2 (en) | 2006-04-26 | 2016-02-02 | Intersil Americas LLC | Method and system for reducing radiated emissions from a communications channel |
US20080059190A1 (en) * | 2006-08-22 | 2008-03-06 | Microsoft Corporation | Speech unit selection using HMM acoustic models |
US20080059184A1 (en) * | 2006-08-22 | 2008-03-06 | Microsoft Corporation | Calculating cost measures between HMM acoustic models |
US8234116B2 (en) * | 2006-08-22 | 2012-07-31 | Microsoft Corporation | Calculating cost measures between HMM acoustic models |
US8301449B2 (en) * | 2006-10-16 | 2012-10-30 | Microsoft Corporation | Minimum classification error training with growth transformation optimization |
US20080091424A1 (en) * | 2006-10-16 | 2008-04-17 | Microsoft Corporation | Minimum classification error training with growth transformation optimization |
US20080147579A1 (en) * | 2006-12-14 | 2008-06-19 | Microsoft Corporation | Discriminative training using boosted lasso |
US20080201139A1 (en) * | 2007-02-20 | 2008-08-21 | Microsoft Corporation | Generic framework for large-margin MCE training in speech recognition |
US8423364B2 (en) * | 2007-02-20 | 2013-04-16 | Microsoft Corporation | Generic framework for large-margin MCE training in speech recognition |
US9805723B1 (en) | 2007-12-27 | 2017-10-31 | Great Northern Research, LLC | Method for processing the output of a speech recognizer |
US9753912B1 (en) | 2007-12-27 | 2017-09-05 | Great Northern Research, LLC | Method for processing the output of a speech recognizer |
US9418662B2 (en) * | 2009-01-21 | 2016-08-16 | Nokia Technologies Oy | Method, apparatus and computer program product for providing compound models for speech recognition adaptation |
US20100185444A1 (en) * | 2009-01-21 | 2010-07-22 | Jesper Olsen | Method, apparatus and computer program product for providing compound models for speech recognition adaptation |
US8494850B2 (en) * | 2011-06-30 | 2013-07-23 | Google Inc. | Speech recognition using variable-length context |
US8959014B2 (en) | 2011-06-30 | 2015-02-17 | Google Inc. | Training acoustic models using distributed computing techniques |
US8727991B2 (en) | 2011-08-29 | 2014-05-20 | Salutron, Inc. | Probabilistic segmental model for doppler ultrasound heart rate monitoring |
US9336771B2 (en) * | 2012-11-01 | 2016-05-10 | Google Inc. | Speech recognition using non-parametric models |
US20150371633A1 (en) * | 2012-11-01 | 2015-12-24 | Google Inc. | Speech recognition using non-parametric models |
US20170200447A1 (en) * | 2013-04-25 | 2017-07-13 | Nuance Communications, Inc. | Systems and methods for providing metadata-dependent language models |
US10102849B2 (en) * | 2013-04-25 | 2018-10-16 | Nuance Communications, Inc. | Systems and methods for providing metadata-dependent language models |
US9858922B2 (en) | 2014-06-23 | 2018-01-02 | Google Inc. | Caching speech recognition scores |
US10204619B2 (en) | 2014-10-22 | 2019-02-12 | Google Llc | Speech recognition using associative mapping |
EP3174262A4 (en) * | 2015-03-20 | 2018-01-17 | Baidu Online Network Technology (Beijing) Co., Ltd | Artificial intelligence based voiceprint login method and device |
CN111523565A (en) * | 2020-03-30 | 2020-08-11 | 中南大学 | Streaming processing method, system and storage medium for big data |
CN111523565B (en) * | 2020-03-30 | 2023-06-20 | 中南大学 | Big data stream processing method, system and storage medium |
Also Published As
Publication number | Publication date |
---|---|
JP2871561B2 (en) | 1999-03-17 |
JPH09152886A (en) | 1997-06-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5839105A (en) | | Speaker-independent model generation apparatus and speech recognition apparatus each equipped with means for splitting state having maximum increase in likelihood |
Ostendorf et al. | | HMM topology design using maximum likelihood successive state splitting |
EP0771461B1 (en) | | Method and apparatus for speech recognition using optimised partial probability mixture tying |
US5864810A (en) | | Method and apparatus for speech recognition adapted to an individual speaker |
EP0921519B1 (en) | | Technique for adaptation of hidden Markov Models for speech recognition |
EP0635820B1 (en) | | Minimum error rate training of combined string models |
US7054810B2 (en) | | Feature vector-based apparatus and method for robust pattern recognition |
EP0691640B1 (en) | | Adaptive training method for pattern recognition |
Lee et al. | | Improved acoustic modeling for large vocabulary continuous speech recognition |
US6076053A (en) | | Methods and apparatus for discriminative training and adaptation of pronunciation networks |
US5778341A (en) | | Method of speech recognition using decoded state sequences having constrained state likelihoods |
Chengalvarayan et al. | | HMM-based speech recognition using state-dependent, discriminatively derived transforms on Mel-warped DFT features |
US20080312921A1 (en) | | Speech recognition utilizing multitude of speech features |
US5946656A (en) | | Speech and speaker recognition using factor analysis to model covariance structure of mixture components |
WO2001022400A1 (en) | | Iterative speech recognition from multiple feature vectors |
WO1996022514A9 (en) | | Method and apparatus for speech recognition adapted to an individual speaker |
Ney et al. | | The RWTH large vocabulary continuous speech recognition system |
US5956676A (en) | | Pattern adapting apparatus using minimum description length criterion in pattern recognition processing and speech recognition system |
Chen et al. | | Automatic transcription of broadcast news |
Sankar | | Experiments with a Gaussian merging-splitting algorithm for HMM training for speech recognition |
Frankel et al. | | Speech recognition using linear dynamic models |
Yamagishi et al. | | HSMM-based model adaptation algorithms for average-voice-based speech synthesis |
JP3589044B2 (en) | | Speaker adaptation device |
JPH1185186A (en) | | Nonspecific speaker acoustic model forming apparatus and speech recognition apparatus |
Furui | | Generalization problem in ASR acoustic model training and adaptation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AS | Assignment | Owner name: ATR INTERPRETING TELECOMMUNICATIONS RESEARCH LABOR Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:OSTENDORF, MARI;SINGER, HARALD;REEL/FRAME:008363/0503 Effective date: 19961212 |
 | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
 | FEPP | Fee payment procedure | Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
 | FPAY | Fee payment | Year of fee payment: 4 |
 | AS | Assignment | Owner name: ATR INTERPRETING TELECOMMUNICATIONS RESEARCH LABOR Free format text: CHANGE OF ADDRESS;ASSIGNOR:ATR INTERPRETING TELECOMMUNICATIONS RESEARCH LABORATORIES;REEL/FRAME:013211/0068 Effective date: 20000325 |
 | AS | Assignment | Owner name: DENSO CORPORATION, JAPAN Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:ATR INTERPRETING TELECOMMUNICATIONS RESEARCH LABORATORIES;REEL/FRAME:013552/0465 Effective date: 20021031 |
 | FPAY | Fee payment | Year of fee payment: 8 |
 | FPAY | Fee payment | Year of fee payment: 12 |