speaker independent: speech recognition that works without per-speaker training
speaker dependent: speech recognition that requires training on the speaker's voice, which yields higher accuracy
DTW(Dynamic Time Warping)
LPC(Linear Predictive Coding)
Cepstrum
HMM(Hidden Markov Model)
n-gram language model
back-off model (multiple-length n-gram model)
maximum likelihood estimation
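The last three items fit together: the sketch below (plain Python; the toy corpus and back-off weight are made up for illustration) estimates bigram probabilities by maximum likelihood and falls back to a weighted unigram estimate when a bigram was never observed, which is the idea behind a back-off model.

```python
from collections import Counter

# Toy corpus, purely illustrative.
corpus = "the cat sat on the mat the cat ate".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
total = sum(unigrams.values())

def bigram_mle(prev, word):
    """Maximum likelihood estimate: count(prev, word) / count(prev)."""
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0

def backoff(prev, word, alpha=0.4):
    """Crude back-off: use the bigram MLE when the bigram was seen,
    otherwise fall back to a weighted unigram estimate
    (alpha is a placeholder, not a tuned Katz/Kneser-Ney discount)."""
    if bigrams[(prev, word)] > 0:
        return bigram_mle(prev, word)
    return alpha * unigrams[word] / total

print(backoff("the", "cat"))  # seen bigram -> MLE
print(backoff("cat", "on"))   # unseen bigram -> backed-off unigram
```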
generative model: models the joint distribution P(input, output) as its objective
discriminative model: models the conditional distribution P(output | input) as its objective
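A quick way to see the contrast (scikit-learn assumed available; the dataset is synthetic): Gaussian Naive Bayes is generative because it fits class priors and per-class feature distributions, i.e. the joint p(x, y), while logistic regression is discriminative because it fits p(y | x) directly.

```python
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

generative = GaussianNB().fit(X, y)               # models p(x | y) p(y), i.e. the joint
discriminative = LogisticRegression().fit(X, y)   # models p(y | x) directly

print(generative.predict_proba(X[:1]))      # posterior recovered via Bayes' rule
print(discriminative.predict_proba(X[:1]))  # posterior modeled directly
```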
feedforward ANN
RNN
LSTM(Long short-term memory)
avoids the vanishing gradient problem
can remember events from thousands of discrete time steps earlier, which makes it useful for speech recognition
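A minimal sketch of how an LSTM is typically used here (PyTorch assumed; feature, phoneme, and hidden sizes are illustrative): a sequence of acoustic feature vectors goes in, per-frame phoneme scores come out.

```python
import torch
import torch.nn as nn

NUM_FEATURES, NUM_PHONEMES, HIDDEN = 40, 45, 128   # illustrative sizes

class PhonemeLSTM(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(NUM_FEATURES, HIDDEN, batch_first=True)
        self.proj = nn.Linear(HIDDEN, NUM_PHONEMES)

    def forward(self, x):          # x: (batch, time, NUM_FEATURES)
        out, _ = self.lstm(x)      # out: (batch, time, HIDDEN)
        return self.proj(out)      # per-frame phoneme logits

model = PhonemeLSTM()
frames = torch.randn(2, 100, NUM_FEATURES)  # 2 utterances, 100 frames each
print(model(frames).shape)                  # torch.Size([2, 100, 45])
```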
acoustic model: models the relationship between the audio signal and phonemes
language model: a probability distribution over sequences of words
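For context, these two models are usually combined with the standard decoding rule (not stated in the note itself): the recognizer searches for the word sequence W that maximizes

\hat{W} = \arg\max_W P(W \mid X) = \arg\max_W P(X \mid W)\, P(W)

where P(X | W) is the acoustic model and P(W) is the language model.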
In the long history of speech recognition, both shallow and deep forms (e.g. recurrent nets) of artificial neural networks were explored for many years during the 1980s, 1990s, and the early 2000s.[47][48][49] But these methods never won over the non-uniform, internally handcrafted Gaussian mixture model/hidden Markov model (GMM-HMM) technology based on generative models of speech trained discriminatively.
GMM(Gaussian mixture model)
TDNN(Time Delay Neural Networks)
Autoencoder
RNN-CTC model (the CTC objective is sketched after this list)
LAS(Listen, Attend and Spell, Attention-based ASR model)
LSD(Latent Sequence Decomposition)
WLAS(Watch, Listen, Attend and Spell)
MD-LSTM (2D-LSTM, Multidimensional LSTM)
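To make the RNN-CTC entry above concrete, here is a sketch of the CTC objective (PyTorch assumed; shapes follow torch.nn.CTCLoss, with class 0 as the blank, and all sizes are illustrative):

```python
import torch
import torch.nn as nn

T, N, C = 50, 2, 28                                   # frames, batch size, classes incl. blank
log_probs = torch.randn(T, N, C).log_softmax(dim=2)   # stand-in for RNN outputs
targets = torch.randint(1, C, (N, 10))                # label sequences (no blanks)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 10, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
print(loss.item())   # CTC sums over all alignments of the labels to the frames
```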