Sunday, 28 June 2015

Week 2: Using openEAR

Getting back to openEAR: what we have at hand is a folder with a lot of code files and a tutorial with limited documentation, most of it about configuration files. Now comes the part where we actually use openEAR.

1. Feature Extraction:

The features for recognizing emotions are extracted from the audio files using openSMILE, which was built by the same group at TUM. To check that the toolkit is in place, run:
cd openEAR-0.1.0
./SMILExtract -h

If you see something like this:
 =============================================================== 
   openSMILE version 0.1.0
   Build date: 01-09-2009
   (c) 2008-2009 by Florian Eyben, Martin Woellmer, Bjoern Schuller
   TUM Institute for Human-Machine Communication
 =============================================================== 
 
Usage: SMILExtract [-option (value)] ...
 
 -h    Show this usage information
.
.
.
(MSG) [2] in SMILExtract : openSMILE starting!
(MSG) [2] in SMILExtract : config file is: smile.conf
(MSG) [2] in cComponentManager : successfully registered 68 component types.
then you have installed openEAR correctly.

Since I did not have Red Hen's clip command on my local machine, I wrote a shell script that uses ffmpeg to extract the audio as WAV at a sampling rate of 16,000 Hz, and then clipped it with some PHP code (also calling ffmpeg) to keep only a small chunk of audio between two given timestamps for testing. Let's assume a file called sample.wav for now. The script is nothing special, but you can find it on GitHub.
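The idea is roughly along these lines (the input file name and the timestamps here are just placeholders):
# extract mono 16 kHz WAV audio from the recording
ffmpeg -i news_recording.mp4 -vn -ac 1 -ar 16000 -acodec pcm_s16le full_audio.wav
# keep only the chunk between two timestamps for testing
ffmpeg -i full_audio.wav -ss 00:05:00 -to 00:05:30 -acodec copy sample.wav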

To extract features, we use (from within the openEAR folder)
./SMILExtract -C config/config_file -I path_to_wav -O path_to_output_with_extension
For example:
./SMILExtract -C config/emobase.conf -I ../RHSOC/sample.wav -O ../RHSOC/sample_wav.arff

2. Conversion and Scaling:

The Weka ARFF files need to be converted to a format that libsvm can handle. For this, openEAR ships some Perl scripts in the scripts folder, of which we need arffToLsvm.pl. Convert the file as follows:

cd scripts/modeltrain
perl arffToLsvm.pl arff_file output_lsvm_format_file
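Continuing with our sample file (and assuming, as later in this post, that it lives in ~/RHSOC), the call would look something like:
perl arffToLsvm.pl ~/RHSOC/sample_wav.arff ~/RHSOC/sample_wav.lsvm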

Each of the config files provided with openEAR corresponds to a standard emotion set or database that has been used to classify emotions. A few of them come with a pre-defined scale that must be applied to the features of the input wav file, so that the data the model was trained on and the testing data are on the same scale, or in other words, are comparable.
The libsvm package provides a utility called svm-scale for this purpose. You may use it like this:
svm-scale -r path_to_reference_scale path_to_file_to_scale > output
So in our case, with sample_wav.lsvm, from ~/RHSOC, I ran
svm-scale -r ../openEAR-0.1.0/models/emo/emodb.emobase.scale sample_wav.lsvm > sample_wav.scaled.lsvm

3. Prediction:

The final phase of using openEAR is the emotion prediction! Now that everything is compatible with the pre-trained model, we can use svm-predict to find the emotion for the given audio file.
svm-predict scaled_lsvm_test_file path_to_model_file result_file
 
For example:
svm-predict sample_wav.scaled.lsvm ~/openEAR-0.1.0/models/emo/emodb.emobase.model resultOutSample
And voila! The result file will contain the number corresponding to the predicted emotion class, as listed in the models folder for that particular model. In the case of emoDB, you can also see the probability distribution over all classes by passing -b 1 to svm-predict. Although I did get the tool working end to end, every sample I entered gave me the same result with the exact same probability distribution. Had the distributions varied, one might have suspected the tool itself; since they were identical every time, there seems to be some mistake in the way I am feeding in the data, or some catch to how it should be entered, that I was unable to figure out.
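For instance, to get the per-class probabilities for our sample (the output file name here is just illustrative), something like this should do it:
svm-predict -b 1 sample_wav.scaled.lsvm ~/openEAR-0.1.0/models/emo/emodb.emobase.model resultOutSampleProb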

Week 1: openEAR


As the Summer of Code began, the first target was to get a few results from openEAR as an emotion recognizer on news audio.

1. Obtaining openEAR:

openEAR is the Munich Open-Source Emotion and Affect Recognition toolkit created by Florian Eyben, Martin Woellmer and Bjoern Schuller at the Institute for Human-Machine Communication, Technische Universitaet Muenchen. It provides audio feature extraction implemented in C++, classifiers and pre-trained models, along with some Perl scripts that make it flexible enough to tailor one's own emotion recognizer.

It is hosted on SourceForge and can be found here. Since I am working on Ubuntu, I will provide instructions for a Debian-based OS. Download the tarball and decompress it using:
mv ~/Downloads/openEAR-0.1.0.tar.gz ~/ 
tar -zxvf openEAR-0.1.0.tar.gz


2. Installing openEAR:


Before we install openEAR, let's get the dependencies (just in case they are not already present)
sudo apt-get install autoconf automake libtool build-essential libpthread-stubs0-dev libc6-dev

Most of these are probably already present on your machine, but running the command anyway is a harmless precaution.

Then change directory into the openEAR folder and run the following. Ensure you have execute permissions for the files.
./autogen.sh
./autogen.sh    # for some reason, it works only when run twice
./configure
make
sudo make install
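If autogen.sh complains about permissions, a quick chmod from inside the openEAR folder (just an example) sorts it out:
chmod +x autogen.sh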

3. Post Installation:

openEAR uses Weka, so as a next step, let's install Weka if you do not already have it.
sudo apt-get install weka
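To see where the Debian package put the Weka jar (you will need this path in a moment), you can ask dpkg:
dpkg -L weka | grep jar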

Edit: Something I recently found about openEAR is that the files fsel.pl and arff-functions.pl in the scripts folder have hard-coded paths to Weka, so we need to change them to the paths of our own Weka installation.

So,
$wekapath = "\$CLASSPATH:/home/don/eyb/inst/weka-3-5-6/weka.jar" should be changed to point to the jar on your machine. I found the jar at /usr/share/java/weka.jar and used that.
Also,
$wekacmd = "java -Xmx1024m -classpath /home/don/eyb/inst/weka-3-5-6/weka.jar "; should be changed accordingly.
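If you would rather not edit the files by hand, a sed one-liner run from inside the openEAR folder does the same substitution in both files (a sketch that assumes the scripts sit directly under scripts/ and that your jar is at /usr/share/java/weka.jar):
sed -i 's|/home/don/eyb/inst/weka-3-5-6/weka.jar|/usr/share/java/weka.jar|g' scripts/fsel.pl scripts/arff-functions.pl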

Although this seems quite straightforward, it took me a good amount of time and effort to get it set up right. There seems to be a lack of resources for amateurs (like me) trying to install openEAR, and I hope that changes with this post and whatever further posts come up regarding openEAR, making it easier for others down the line.

I spent bits and pieces of the week going through some news archives using EDGE. I watched a couple of Nancy Grace videos and The O'Reilly Factor to get an idea of what sort of speech I would be predicting emotions from.

Monday, 22 June 2015

Introductory Post: What is OpenEMO?


[Image: Emotion tag cloud]


Hello Reader!

You must be wondering what this blog, "OpenEMO", is all about. If you guessed that it is some sort of open-source project that deals with emotions, guess what? You are absolutely correct!

OpenEMO is an open-source framework that Red Hen Lab is working on that will be able to recognize emotions from audio streams. Under the guidance of Prof. Steen of Red Hen, I will be attempting to build it over these few weeks.

This open-source project, hosted on GitHub, is meant to be a generic tool for emotion recognition from audio, much like the openEAR project. However, openEMO will first involve modules built to detect emotions in news, and only then grow into a more generic tool.

The ultimate aim of this project is to annotate the news transcripts generated from the audio. For this, the audio should be free of advertisements, as we will not be measuring emotions over those. Also, the transcripts need to be force-aligned so that we can annotate them correctly based on the analysis of the audio stream.

To articulate the thoughts that went into this project when we began working on it, these are the questions that came up before we planned the workflow:
1. What is the set of emotions that we want to detect? Will they not be application-specific?
2. How do we decide on the unit (in terms of time/speaker) over which emotion must be measured?
3. Are there not cases where multiple emotions can be expressed simultaneously? (e.g. anger and sarcasm together)

To get some insight into these, we decided to look into openEAR's implementation to understand what emotions it was built to recognize. We also agreed that the output of the diarization module (audio segments) would be the basic unit over which we associate an emotion. For now, we will report the single best-fitting emotion, and take up the problem of multiple simultaneous emotions once we can detect a single emotion satisfactorily.

The workflow that I had initially intended to implement can be found at this link. The code for this project can be found here. With each new post, I will go on to explain what work was done during a particular week/phase and how openEMO is progressing.

Do reach out by commenting on the posts if you have any more queries!