speaker independent: speech recognition that works without training on the individual speaker

speaker dependent: speech recognition that requires training on the target speaker, which yields higher accuracy

 

DTW(Dynamic Time Warping)
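
A minimal sketch of the DTW idea, assuming two 1-D feature sequences and a plain dynamic-programming cost table; the function name and the absolute-difference local distance are illustrative choices, not from the original notes.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic DTW: minimal cumulative cost to align sequences a and b."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])              # local distance
            cost[i, j] = d + min(cost[i - 1, j],       # insertion
                                 cost[i, j - 1],       # deletion
                                 cost[i - 1, j - 1])   # match
    return cost[n, m]

# the same "word" spoken at a different speed still aligns with low cost
print(dtw_distance([1, 2, 3, 4], [1, 1, 2, 2, 3, 4]))   # 0.0
```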

LPC(Linear Predictive Coding)

Cepstrum

HMM(Hidden Markov Model)

n-gram language model

back-off model (combines n-gram models of multiple lengths, backing off to a shorter n-gram when the longer one is unseen)
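
A toy sketch of a bigram model that backs off to unigrams, assuming a tiny made-up corpus; the constant back-off weight below is a placeholder, not a proper Katz or Kneser-Ney estimate.

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ate".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
total = len(corpus)

def p_backoff(prev, word, alpha=0.4):
    """P(word | prev): use the bigram if it was seen, otherwise back off to the unigram."""
    if bigrams[(prev, word)] > 0:
        return bigrams[(prev, word)] / unigrams[prev]
    return alpha * unigrams[word] / total        # fall back to the shorter n-gram

print(p_backoff("the", "cat"))   # seen bigram   -> bigram estimate
print(p_backoff("mat", "ate"))   # unseen bigram -> discounted unigram estimate
```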

maximum likelihood estimation

generative model: an approach whose goal is to model the joint distribution of (input, output)

discriminative model: an approach whose goal is to model the conditional distribution of (output | input)
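
A small sketch of the contrast on count data, assuming a made-up table of (input, output) pairs; both "models" here are just relative frequencies.

```python
from collections import Counter

# toy (input, output) observations
pairs = [("loud", "music"), ("loud", "music"), ("loud", "speech"),
         ("quiet", "speech"), ("quiet", "speech"), ("quiet", "music")]

joint = Counter(pairs)
inputs = Counter(x for x, _ in pairs)
total = len(pairs)

# generative view: estimate the joint distribution P(input, output)
p_joint = {xy: c / total for xy, c in joint.items()}

# discriminative view: estimate the conditional P(output | input) directly
p_cond = {xy: c / inputs[xy[0]] for xy, c in joint.items()}

print(p_joint[("loud", "music")])   # P(loud, music)  = 2/6
print(p_cond[("loud", "music")])    # P(music | loud) = 2/3
```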

feedforward ANN

RNN

LSTM(Long Short-Term Memory): avoids the vanishing gradient problem and can remember content from thousands of discrete time steps earlier, which makes it useful for speech recognition
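
A minimal NumPy sketch of one LSTM cell step, only to show the gates and the additive cell-state update that helps keep gradients from vanishing; the weight layout and sizes are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step; W stacks the four gate weight matrices, b their biases."""
    z = W @ np.concatenate([x, h_prev]) + b
    i, f, o, g = np.split(z, 4)                   # input/forget/output gates + candidate
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c_prev + i * g                        # additive update: old state can flow through
    h = o * np.tanh(c)
    return h, c

# tiny example: input size 3, hidden size 2
rng = np.random.default_rng(0)
x, h, c = rng.normal(size=3), np.zeros(2), np.zeros(2)
W, b = rng.normal(size=(8, 5)), np.zeros(8)
h, c = lstm_step(x, h, c, W, b)
print(h, c)
```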

acoustic model: models the relationship between the audio signal and phonemes

language model: a probability distribution over sequences of words
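
In standard statistical ASR the two models are combined at decoding time: the recognizer picks the word sequence W that maximizes P(X | W) · P(W), the acoustic score times the language score. A toy sketch with made-up log-probabilities:

```python
import math

# made-up log-probabilities for two candidate transcriptions of the same audio X
candidates = {
    "recognize speech":   {"acoustic": math.log(0.20), "language": math.log(0.010)},
    "wreck a nice beach": {"acoustic": math.log(0.30), "language": math.log(0.001)},
}

def total_score(scores, lm_weight=1.0):
    # log P(X|W) + lm_weight * log P(W)
    return scores["acoustic"] + lm_weight * scores["language"]

best = max(candidates, key=lambda w: total_score(candidates[w]))
print(best)   # the language model pulls the decision toward the plausible sentence
```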

In the long history of speech recognition, both shallow and deep forms (e.g. recurrent nets) of artificial neural networks were explored for many years during the 1980s, 1990s and a few years into the 2000s.[47][48][49] But these methods never won over the non-uniform internal-handcrafting Gaussian mixture model / hidden Markov model (GMM-HMM) technology based on generative models of speech trained discriminatively.

GMM(Gaussian mixture model)

TDNN(Time Delay Neural Networks)

Autoencoder

RNN-CTC model

LAS(Listen, Attend and Spell, Attention-based ASR model)

LSD(Latent Sequence Decomposition)

WLAS(Watch, Listen, Attend and Spell)

MD-LSTM(2D-LSTM, Multidimensional LSTM)

 
