Getting back to openEAR, what we have at hand is a folder with a lot of code files and a tutorial with limited documentation, most of which deals with configuration files. Now comes the part where we use openEAR.
Step 1. Feature Extraction:
The features for recognizing emotions are extracted from the audio files using openSMILE, which was also built by the same group at TUM. First, check that the tool runs:
cd openEAR-0.1.0
./SMILExtract -h
If you see something like this:
===============================================================
openSMILE version 0.1.0
Build date: 01-09-2009
(c) 2008-2009 by Florian Eyben, Martin Woellmer, Bjoern Schuller
TUM Institute for Human-Machine Communication
===============================================================
Usage: SMILExtract [-option (value)] ...
-h Show this usage information
.
.
.
(MSG) [2] in SMILExtract : openSMILE starting!
(MSG) [2] in SMILExtract : config file is: smile.conf
(MSG) [2] in cComponentManager : successfully registered 68 component types.
then you have installed openEAR correctly.
Since I did not have Red Hen's clip command on my local machine, I wrote a shell script that uses ffmpeg to extract audio in the wav format with a sampling rate of 16000 Hz, and then clipped it via some PHP code that uses ffmpeg to get only a small chunk of the audio between two given timestamps for testing. Let's assume a file called sample.wav for now. Though it is nothing significant, you may find it on GitHub.
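For reference, here is a minimal sketch of that step, assuming ffmpeg is installed. The file names, the mono downmix, and the use of ffmpeg's -ss/-to options for the clipping (in place of the PHP code) are my own placeholders, not the exact script:

#!/bin/bash
# Sketch: extract 16000 Hz audio from a video, then clip a test chunk.
# Usage: ./extract_audio.sh input_video 00:01:00 00:01:10
INPUT="$1"
START="$2"
END="$3"

# Extract the audio track as wav, resampled to 16000 Hz
# (-vn drops the video stream; -ac 1 downmixes to mono, an assumption)
ffmpeg -i "$INPUT" -vn -ar 16000 -ac 1 full_audio.wav

# Clip the chunk between the two timestamps for testing
ffmpeg -i full_audio.wav -ss "$START" -to "$END" sample.wav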
To extract features, we use (from within the openEAR folder)
./SMILExtract -C config/config_file -I path_to_wav -O path_to_output_with_extension
For example,
./SMILExtract -C config/emobase.conf -I ../RHSOC/sample.wav -O ../RHSOC/sample_wav.arff
Step 2. Conversion and Scaling:
The weka arff files need to be converted to a format that libsvm can handle. Hence, openEAR provides some perl scripts in the scripts folder to do this, of which we need to use arffToLsvm.pl. Convert the file as follows:
cd scripts/modeltrain
perl arffToLsvm.pl arff_file output_lsvm_format_file
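To give a sense of what the conversion produces: an arff file lists each instance as a comma-separated row under its @data section, while the libsvm format puts one instance per line as a class label followed by sparse index:value pairs. An illustrative line of the output (with made-up values) looks like:

1 1:0.2316 2:-1.0452 3:0.0871 4:1.9984 ...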
Each of the config files provided with openEAR corresponds to a standard emotion set or database that has been used to classify emotions. A few of them come with a pre-defined scale that needs to be applied to the features of the input wav file, so that the data the model was trained on and the testing data are on the same scale, or in other words, are comparable. The libsvm package provides a utility called svm-scale for this purpose. You may use it like this:
svm-scale -r path_to_reference_scale path_to_file_to_scale > output
Here, -r tells svm-scale to restore the scaling parameters from the reference file. So in our case, with sample_wav.lsvm, from ~/RHSOC, I ran
svm-scale -r ../openEAR-0.1.0/models/emo/emodb.emobase.scale sample_wav.lsvm > sample_wav.scaled.lsvm
Step 3. Prediction:
The final phase of using openEAR is the emotion prediction! Now that we have everything compatible with the pre-trained model, we may use svm-predict to find the emotion for the given audio file.
svm-predict scaled_lsvm_test_file path_to_model_file result_file
For example,
svm-predict sample_wav.scaled.lsvm ~/openEAR-0.1.0/models/emo/emodb.emobase.model resultOutSample
And voila! The result file would magically contain the number corresponding to the class of emotion, as given in the models folder for that particular model. One could also see the probability distribution over all the classes (in the case of emoDB) by setting the -b flag of svm-predict to 1. Although I did manage to get the tool working, every sample I entered gave me the same result with the exact same probability distribution. Had the output varied across samples, one might have suspected the tool itself; since it did not, there seems to be some mistake in the way I'm entering the data, or some catch to the way it should be entered, that I was unable to figure out.
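For completeness, the probability run is just the same command with the flag added; in standard libsvm, -b 1 at prediction time works only if the model was trained with probability estimates, which the emoDB model here evidently was:

svm-predict -b 1 sample_wav.scaled.lsvm ~/openEAR-0.1.0/models/emo/emodb.emobase.model resultOutSample

The result file then starts with a line naming the class labels, followed by, for each instance, the predicted class and the per-class probabilities.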