Extracting frequency domain features
In the Transforming audio signals into the frequency domain recipe, we discussed how to convert a signal into the frequency domain. Most modern speech recognition systems work with frequency domain features. Once a signal has been converted into the frequency domain, it still needs to be reduced to a compact, usable set of features. Mel Frequency Cepstral Coefficients (MFCCs) are a common way to do this. To compute MFCCs, we take the power spectrum of the signal and then apply a bank of mel-scale filters followed by a discrete cosine transform (DCT) to extract the features.
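To make the MFCC pipeline concrete (power spectrum, filter bank, DCT), here is a minimal NumPy-only sketch for a single frame of audio. It is an illustration under stated assumptions, not the implementation used by any particular library: the helper names (hz_to_mel, mel_to_hz, mel_filter_bank, mfcc_frame), the 26 filters, the 13 coefficients, the 512-point FFT, and the synthetic 440 Hz test tone are all choices made for this example. Library implementations may include further steps, such as pre-emphasis and liftering, that are omitted here.

```python
import numpy as np


def hz_to_mel(hz):
    # Common formula for converting frequency in Hz to the mel scale
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_bank(num_filters, fft_size, sample_rate):
    # Filter edges are equally spaced on the mel scale from 0 Hz to Nyquist
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             num_filters + 2)
    bins = np.floor((fft_size + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)

    bank = np.zeros((num_filters, fft_size // 2 + 1))
    for i in range(num_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        # Rising edge of the triangular filter
        for k in range(left, center):
            bank[i, k] = (k - left) / (center - left)
        # Falling edge, starting from the peak value of 1 at the center bin
        for k in range(center, right):
            bank[i, k] = (right - k) / (right - center)
    return bank


def mfcc_frame(frame, sample_rate, num_filters=26, num_coeffs=13, fft_size=512):
    # 1. Power spectrum of the windowed frame (zero-padded to fft_size)
    windowed = frame * np.hamming(len(frame))
    power = (np.abs(np.fft.rfft(windowed, n=fft_size)) ** 2) / fft_size

    # 2. Energy in each mel filter, then log compression
    energies = mel_filter_bank(num_filters, fft_size, sample_rate) @ power
    log_energies = np.log(np.maximum(energies, np.finfo(float).eps))

    # 3. Unnormalized DCT-II of the log energies; keep the first num_coeffs
    n = np.arange(num_filters)
    k = np.arange(num_coeffs)[:, None]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * num_filters))
    return dct_basis @ log_energies


# Hypothetical input: one 25 ms frame of a 440 Hz tone sampled at 16 kHz
sample_rate = 16000
t = np.arange(400) / sample_rate
frame = np.sin(2 * np.pi * 440.0 * t)

coeffs = mfcc_frame(frame, sample_rate)
print(coeffs.shape)  # (13,)
```

In practice, a full signal is split into short overlapping frames and this computation is repeated per frame, giving one row of coefficients for each frame.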
Getting ready
In this recipe, we will see how to use the python_speech_features package to extract frequency domain features. You can find the installation instructions...