openAUDIO.eu

Editors:

Bjoern Schuller (Technische Universitaet Muenchen, Germany)
Florian Eyben (Technische Universitaet Muenchen, Germany)
Felix Weninger (Technische Universitaet Muenchen, Germany)



Content and Software

openBliSSART

Authors: Felix Weninger, Alexander Lehmann, Bjoern Schuller

openBliSSART is a C++ framework and toolbox that provides "Blind Source Separation for Audio Recognition Tasks". Its areas of application include, but are not limited to, instrument separation (e.g. extraction of drum tracks from popular music), speech enhancement, and feature extraction. It features various source separation algorithms, with a strong focus on variants of Non-Negative Matrix Factorization (NMF).
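
As a rough illustration of the Euclidean NMF variant mentioned above, the following Python sketch factorizes a non-negative matrix V (for audio, typically a magnitude spectrogram) into non-negative factors W and H using the classic multiplicative update rules. It is not openBliSSART code and does not use its API; the function names, parameters, and the small `eps` safeguard are assumptions made for this example only.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def squared_error(v, w, h):
    """Squared Euclidean (Frobenius) distance between V and W*H."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))


def nmf_euclidean(v, rank, iterations=200, seed=0, eps=1e-9):
    """Approximate V by W*H with non-negative W (rows x rank) and H (rank x cols)."""
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    # Random positive initialization (hypothetical choice for this sketch).
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + eps for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_cols)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n_rows)]
    return w, h
```

In a source separation setting, each column of W would be read as a spectral basis and each row of H as its activation over time; a production implementation would use optimized matrix routines rather than nested lists.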

Besides basic blind (unsupervised) source separation, it provides support for component classification by Support Vector Machines (SVM) using common acoustic features from speech and music processing. For component playback and data set creation, a Qt-based GUI is available. Furthermore, supervised NMF can be performed for source separation as well as audio feature extraction.

openBliSSART is fast: typical real-time factors are on the order of 0.1 (Euclidean NMF) on a state-of-the-art desktop PC, i.e. processing takes roughly one tenth of the duration of the audio. It is written in C++, follows strict coding standards, and adheres to modular design principles for seamless integration into multimedia applications.
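
The real-time factor quoted above is the ratio of processing time to audio duration. The following Python sketch shows how such a figure could be measured; the helper names are hypothetical and are not part of openBliSSART.

```python
import time


def real_time_factor(processing_seconds, audio_seconds):
    """Return processing time divided by audio duration (below 1 is faster than real time)."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(process, audio_seconds):
    """Time a zero-argument callable and relate its run time to the audio duration."""
    start = time.perf_counter()
    process()
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_seconds)
```

For example, separating a 60-second recording in 6 seconds corresponds to a real-time factor of 0.1.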

Interfaces are provided to Weka and HTK (Hidden Markov Model Toolkit).

openBliSSART is free software and licensed under the GNU General Public License.

We provide a demonstrator that uses various features of openBliSSART to separate drum tracks from popular music. This demonstrator, along with extensive documentation, including a tutorial, reference manual, and description of the framework API, can be found in the openBliSSART source distribution.

If you want to use openBliSSART for your research, please cite the following paper:

Bjoern Schuller, Alexander Lehmann, Felix Weninger, Florian Eyben, Gerhard Rigoll: "Blind Enhancement of the Rhythmic and Harmonic Sections by NMF: Does it help?", in Proc. NAG/DAGA 2009, Rotterdam, The Netherlands, pp. 361-364.

 


openSMILE

Authors: Florian Eyben, Martin Woellmer, Bjoern Schuller

The openSMILE tool enables you to extract large audio feature spaces in real time. SMILE is an acronym for Speech & Music Interpretation by Large Space Extraction. It is written in C++ and is available both as a standalone command-line executable and as a dynamic library (a GUI version is coming soon). The main features of openSMILE are its capability for on-line incremental processing and its modularity. Feature extractor components can be freely interconnected to create new and custom features, all via a simple configuration file. New components can be added to openSMILE via an easy plugin interface and a comprehensive API.

openSMILE is free software licensed under the GPL and is currently available in a pre-release state via Subversion (http://subversion.tigris.org/). Commercial licensing options are available upon request.

To directly check out the Subversion repository, type the following command in a command-line prompt on a system where SVN is installed:
   svn co https://opensmile.svn.sourceforge.net/svnroot/opensmile opensmile

If you use openSMILE for your research, please cite the following paper:
Florian Eyben, Martin Wöllmer, Björn Schuller: "openEAR - Introducing the Munich Open-Source Emotion and Affect Recognition Toolkit", in Proc. 4th International HUMAINE Association Conference on Affective Computing and Intelligent Interaction 2009 (ACII 2009), IEEE, Amsterdam, The Netherlands, 10.-12.09.2009.

A brief summary of openSMILE's features is given here:

  • Cross-platform (Windows, Linux, Mac)
  • Fast and efficient incremental processing in real-time
  • High modularity and reusability of components
  • Plugin support
  • Multi-threading support for parallel feature extraction
  • Audio I/O:
    • WAVE file reader/writer
    • Sound recording and playback via the PortAudio library
    • Acoustic echo cancellation for full duplex recording/playback in an open-microphone setting
  • General audio signal processing:
    • Windowing functions (Hamming, Hann, Gauss, Sine, ...)
    • Fast Fourier Transform (FFT)
    • Pre-emphasis filter
    • Comb filter (available soon)
    • FIR/IIR filter (available soon)
    • Autocorrelation
    • Cepstrum
  • Extraction of speech-related features:
    • Signal energy
    • Loudness (pseudo)
    • Mel-spectra
    • MFCC
    • Pitch
    • Voice quality
    • Formants (available soon)
    • LPC (available soon)
  • Music-related features:
    • Pitch classes (semitone spectrum)
    • Chroma features
    • Chroma-based CENS features
    • Tatum and Meter vector
  • Moving average smoothing of feature contours
  • Moving average mean subtraction (e.g. for on-line cepstral mean subtraction)
  • Delta regression coefficients of arbitrary order
  • Functionals:
    • Means, Extremes
    • Moments
    • Segments
    • Peaks
    • Linear and quadratic regression
    • Percentiles
    • Durations
    • Onsets
    • DCT coefficients
    • ...
  • Popular feature file formats supported:
    • Hidden Markov Model Toolkit (HTK) parameter files (write)
    • WEKA ARFF files (read/write; currently only non-sparse)
    • Comma-separated values (CSV) text
    • LibSVM feature file format
  • Fully HTK compatible MFCC, energy, and delta regression coefficient computation
  • Fast: 6k features extracted at a real-time factor (RTF) of 0.02
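
To make one item of the feature list above concrete, here is a minimal Python sketch of delta regression coefficients, using the regression formula commonly given in the HTK documentation and replicating the first and last frame at the edges. It is an illustration only, not openSMILE code; the function names and the default window size are assumptions.

```python
def delta(contour, window=2):
    """Delta regression coefficients of a feature contour (one value per frame).

    d[t] = sum_{k=1..window} k * (c[t+k] - c[t-k]) / (2 * sum_{k=1..window} k^2)
    Frames beyond the edges are replaced by the first or last frame.
    """
    norm = 2 * sum(k * k for k in range(1, window + 1))
    last = len(contour) - 1
    result = []
    for t in range(len(contour)):
        acc = 0.0
        for k in range(1, window + 1):
            ahead = contour[min(t + k, last)]
            behind = contour[max(t - k, 0)]
            acc += k * (ahead - behind)
        result.append(acc / norm)
    return result


def delta_of_order(contour, order, window=2):
    """Apply the delta regression repeatedly to obtain higher-order coefficients."""
    for _ in range(order):
        contour = delta(contour, window)
    return contour
```

Calling `delta_of_order(contour, 2)` yields what is usually called the acceleration (delta-delta) contour.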

Acknowledgment: openSMILE's development has received funding from the European Community’s Seventh Framework Programme (FP7/2007-2013) under grant agreement No. 211486 (SEMAINE).

 


openEAR

Authors: Florian Eyben, Martin Woellmer, Bjoern Schuller

The Munich openEAR toolkit is a complete package for automatic speech emotion recognition. Its acronym stands for open Emotion and Affect Recognition Toolkit. It is based on the openSMILE feature extractor and is thus capable of real-time on-line emotion recognition. Pre-trained models on various standard corpora are included, as well as scripts and tools to quickly build and evaluate custom model sets. The classifier currently included is a Support Vector Machine (SVM) based on the LibSVM library. Bidirectional Long Short-Term Memory recurrent neural networks, discriminative multinomial Bayesian networks, and lazy learners are to follow soon.

openEAR is free software licensed under the GPL. The first release (including model sets and pre-compiled openSMILE) will be available soon on SourceForge (openEAR). Meanwhile, please refer to the openSMILE project, where we provide the feature extraction engine.

If you use openEAR for your research, please cite the following paper:

Florian Eyben, Martin Wöllmer, Björn Schuller: "openEAR - Introducing the Munich Open-Source Emotion and Affect Recognition Toolkit", in Proc. 4th International HUMAINE Association Conference on Affective Computing and Intelligent Interaction 2009 (ACII 2009), IEEE, Amsterdam, The Netherlands, 10.-12.09.2009.

Acknowledgment: openEAR's development has received funding from the European Community’s Seventh Framework Programme (FP7/2007-2013) under grant agreement No. 211486 (SEMAINE).

DOWNLOAD: The first release of openEAR can be downloaded at http://www.mmk.ei.tum.de/~eyb/openEAR-0.1.0.tar.gz. A short tutorial is included with the release. The release also contains pre-compiled binaries of the openSMILE engine for Windows and Linux, including PortAudio support. The live emotion recognition GUI is not yet included in the release; it will be made available within the next few weeks.

 

 


Annotations

Author: Bjoern Schuller

The annotation of the MTV music data set for Automatic Mood Classification is available as PDF or comma-separated values (CSV) text files. For details, please refer to the following paper, and cite it if you use the annotation:

Björn Schuller, Clemens Hage, Dagmar Schuller, Gerhard Rigoll: "'Mister D.J., Cheer Me Up!': Musical and Textual Features for Automatic Mood Classification", Journal of New Music Research, Routledge Taylor & Francis, Vol. 39, Issue 1, pp. 13-34, 2010.

The annotation of the NTWICM music data set for Automatic Mood Classification is available as an ARFF file. This file can be read as a plain text file and resembles comma-separated values (CSV) with an explanatory header. The corresponding labelling tool can be downloaded as a Foobar2000 plugin. It allows audio to be annotated in the valence-arousal plane. For details, please refer to the following paper, and cite it if you use the annotation or the tool:

Björn Schuller, Johannes Dorfner, Gerhard Rigoll: "Determination of Non-Prototypical Valence and Arousal in Popular Music: Features and Performances", EURASIP Journal on Audio, Speech, and Music Processing (JASMP), Special Issue on "Scalable Audio-Content Analysis", vol. 2010, Article ID 735854, 19 pages, 2010.
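
For readers unfamiliar with ARFF, the following Python sketch reads a small ARFF-style text consisting of a header followed by CSV-like data rows. The sample content and attribute names are invented for illustration and do not reflect the actual NTWICM annotation file; the reader handles only the simple, non-sparse case without quoted attribute names.

```python
import csv

# Hypothetical sample, not taken from the NTWICM annotation.
SAMPLE = """\
% Lines starting with a percent sign are comments
@relation mood
@attribute song string
@attribute valence numeric
@attribute arousal numeric
@data
song_a,1,-2
song_b,0,1
"""


def read_arff(text):
    """Return (attributes, rows) from a simple, non-sparse ARFF text."""
    attributes = []   # list of (name, type) pairs from the header
    data_lines = []   # raw CSV lines following the @data marker
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue
        if in_data:
            data_lines.append(line)
        elif line.lower().startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            attributes.append((name, kind))
        elif line.lower().startswith("@data"):
            in_data = True
    names = [name for name, _ in attributes]
    # Values are kept as strings; convert them according to the attribute type if needed.
    rows = [dict(zip(names, values)) for values in csv.reader(data_lines)]
    return attributes, rows
```

Calling `read_arff(SAMPLE)` yields the three attribute definitions and one dictionary per data row.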


In any case, do not hesitate to contact us.
We look forward to hearing from you,


Bjoern Schuller
Florian Eyben
Felix Weninger

More information will follow shortly.




Last updated: July 28th, 2010