This document describes how to use MusicMiner for research in audio similarity. The MusicMiner audio features are based on the research results of the Databionic Research Group. A large-scale analysis of candidate audio features revealed a small set of high-quality sound descriptors. On several music datasets, these features showed superior performance for visualization and clustering of music compared with other feature sets from the scientific literature.

Usage of MusicMiner audio features

The prototypes of the audio features used in the publications were written in Matlab. Due to limited programming resources it was impossible to create an exact reproduction in Java. Instead, we repeated the methodology described in the publications in Java, using a slightly different set of low-level features and a much larger set of high-level features. The resulting top 20 features are defined in etc/features/musicminer-VERSION.xml; they showed performance similar to the original 20 features defined, e.g., in our technical report.

If you want to extract these features on your own music datasets you need to

  1. Install MusicMiner
  2. Use e.g. mmadd -f ~/music/mp3 -r to add all music files in the specified folder and all subdirectories to the database. Use the option -g to force a genre; otherwise it is read from the ID3 tag.
  3. Use e.g. mmafe -f musicminer -s 30 -x ~/music/feat to extract the MusicMiner audio features from a 30s segment in the center of each song.
The results are stored in two files per song in the specified folder, named with a unique key per song. The *.lrn files contain the unique key and the 20 features in a single tab-separated line. The *.names files contain the unique key, the original filename, and the genre (if available), separated by tabs. The file musicminer.names contains the names of the features. You can add more songs later with mmadd and rerun mmafe. The extraction works incrementally: the program checks for existing files and only processes songs for which no features have been extracted yet. Note that when using the -x option the features are only stored in the files, not in the database.
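As an illustration of the file layout described above, the per-song feature and metadata files can be joined by their shared unique key. This Python sketch is not part of MusicMiner itself; the exact tab layout is an assumption based on the description:

```python
import glob
import os

def load_features(folder):
    """Join per-song *.lrn feature lines with *.names metadata by unique key.

    Assumed layout (from the description above):
      <key>.lrn   -> key \t f1 \t ... \t f20   (one tab-separated line)
      <key>.names -> key \t filename \t genre  (genre may be missing)
    """
    songs = {}
    for lrn in glob.glob(os.path.join(folder, "[0-9]*.lrn")):
        with open(lrn) as fh:
            parts = fh.readline().rstrip("\n").split("\t")
        key, values = parts[0], [float(v) for v in parts[1:]]
        songs[key] = {"features": values}
    for nm in glob.glob(os.path.join(folder, "[0-9]*.names")):
        with open(nm) as fh:
            parts = fh.readline().rstrip("\n").split("\t")
        key = parts[0]
        if key in songs:
            songs[key]["filename"] = parts[1]
            songs[key]["genre"] = parts[2] if len(parts) > 2 else None
    return songs
```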

For further processing you can e.g. use the Unix commands cat [0-9]*.lrn > songs.lrn and cat [0-9]*.names > songs.names to merge the single files and import them into your machine learning tools (for Windows, see GNU utilities for Win32). Note that the raw feature values are stored. For clustering and distance calculations, a normalization, e.g. to zero mean and unit variance, should be applied.
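Since the raw feature values are stored, such a normalization step is needed before computing distances. A minimal sketch of column-wise zero-mean, unit-variance scaling using NumPy (an illustration, not part of MusicMiner):

```python
import numpy as np

def zscore(features):
    """Scale each feature column to zero mean and unit variance."""
    x = np.asarray(features, dtype=float)
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    return (x - mean) / std
```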

The option -k can be used with mmafe to save the downsampled (mono, 22050 Hz), trimmed, and normalized (DC offset 0, maximum absolute amplitude 1) *.wav files in the same location as the original sound files, with changed file extensions. This is useful to ensure identical conditions when extracting audio features with different programs for comparison.
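The normalization conditions mentioned above (DC offset removed, peak absolute amplitude scaled to 1) can be sketched on a raw sample array. This is an illustration of the stated conditions, not MusicMiner's actual code:

```python
import numpy as np

def normalize_signal(samples):
    """Remove the DC offset and scale the peak absolute amplitude to 1."""
    x = np.asarray(samples, dtype=float)
    x = x - x.mean()          # DC offset -> 0
    peak = np.abs(x).max()
    if peak > 0:
        x = x / peak          # max |amplitude| -> 1
    return x
```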

Other feature sets can be defined using the Yale XML syntax. For some examples see etc/features/*.xml:

  musicminer     Default MusicMiner audio features                 20
  generate40k    Large set of candidate audio features             39,760
  generate688k   Huge set of candidate audio features              687,960
  mfcc           Mean and standard deviation of first 20 MFCC      40
  chroma         Mean and standard deviation of 12 Chroma tones    24

If you need further assistance or want to help making this more comfortable, please contact us. If you use the MusicMiner features in your scientific work, please indicate the version number of the features (see etc/features/musicminer-VERSION.xml) you used and cite one of the following references:

Moerchen, F., Ultsch, A., Thies, M., Loehken, I.: Modelling timbre distance with temporal statistics from polyphonic music, IEEE Transactions on Speech and Audio Processing, 14(1), pp. 81-90, (2006)
Moerchen, F., Ultsch, A., Thies, M., Loehken, I., Noecker, M., Stamm, C., Efthymiou, N., Kuemmerer, M.: MusicMiner: Visualizing perceptual distances of music as topographical maps, Technical Report No. 47, Dept. of Mathematics and Computer Science, University of Marburg, Germany, (2005)

Future work

We do not claim that MusicMiner uses the very best audio features ever. The results are surely somewhat biased towards the dataset we used. Further, the dataset was comparatively small and the ground truth was based on the consensus of only a few listeners. We are looking for larger datasets annotated with timbre ground truth to repeat our methodology and select even better audio features. If you are interested in cooperating on further studies with a large corpus of music data, or in other music similarity research, please contact the Databionic Research Group.