This document describes how to use MusicMiner for research on audio similarity. The MusicMiner audio features are based on the research results of the Databionic Research Group. A large-scale analysis of candidate audio features revealed a small set of high-quality sound descriptors. On several music datasets, these features outperformed other feature sets from the scientific literature for visualization and clustering of music.
The prototypes of the audio features used in the publications were written in Matlab. Due to limited programming resources, an exact reproduction in Java was not feasible. Instead, we repeated the methodology described in the publications in Java, using a slightly different set of low-level features and a much larger set of high-level features. The resulting top 20 features are defined in etc/features/musicminer-VERSION.xml; they showed performance similar to the original 20 features defined, e.g., in our technical report.
If you want to extract these features on your own music datasets you need to
For further processing you can, e.g., use the Unix commands cat [0-9]*.lrn > songs.lrn and cat [0-9]*.names > songs.names to merge the single files and import them into your machine learning tools (for Windows, see GNU utilities for Win32). Note that the raw feature values are stored; for clustering and distance calculations, a normalization, e.g. to zero mean and unit variance, should be applied.
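Since the stored feature values are raw, each feature column should be normalized before clustering or distance calculations. The following is a minimal sketch in Java of the zero-mean, unit-variance normalization mentioned above; the class name and the row-per-song array layout are illustrative, not part of MusicMiner:

```java
import java.util.Arrays;

public class Normalize {
    // Z-score each feature column: subtract the column mean,
    // then divide by the column's (population) standard deviation.
    public static double[][] zScore(double[][] data) {
        int rows = data.length, cols = data[0].length;
        double[][] out = new double[rows][cols];
        for (int c = 0; c < cols; c++) {
            double mean = 0.0;
            for (int r = 0; r < rows; r++) mean += data[r][c];
            mean /= rows;
            double var = 0.0;
            for (int r = 0; r < rows; r++) {
                double d = data[r][c] - mean;
                var += d * d;
            }
            double std = Math.sqrt(var / rows);
            for (int r = 0; r < rows; r++) {
                // Constant columns are mapped to 0 to avoid division by zero.
                out[r][c] = std > 0 ? (data[r][c] - mean) / std : 0.0;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Three songs, two raw features on very different scales.
        double[][] raw = { {1.0, 10.0}, {2.0, 20.0}, {3.0, 30.0} };
        System.out.println(Arrays.deepToString(zScore(raw)));
    }
}
```

After this transformation, every feature contributes on the same scale to Euclidean distances between songs.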
The option -k can be used with mmafe to save the downsampled (mono, 22050 Hz), trimmed, and normalized (DC offset 0, maximum absolute amplitude 1) *.wav files in the same location as the original sound files, with changed file extensions. This is useful for ensuring identical conditions when extracting audio features with different programs for comparison.
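The amplitude normalization that -k applies can be sketched as follows. This is an illustrative re-implementation based only on the description above (subtract the DC offset, then scale to a maximum absolute amplitude of 1), not mmafe's actual code; resampling and trimming are omitted:

```java
public class PeakNormalize {
    // Remove the DC offset (make the mean 0), then scale so the
    // maximum absolute amplitude is exactly 1 (unless the signal is silent).
    public static double[] normalize(double[] samples) {
        double mean = 0.0;
        for (double s : samples) mean += s;
        mean /= samples.length;

        double[] out = new double[samples.length];
        double peak = 0.0;
        for (int i = 0; i < samples.length; i++) {
            out[i] = samples[i] - mean;
            peak = Math.max(peak, Math.abs(out[i]));
        }
        if (peak > 0) {
            for (int i = 0; i < out.length; i++) out[i] /= peak;
        }
        return out;
    }

    public static void main(String[] args) {
        // A signal with a DC offset of 0.7 becomes {-1, 0, 1}.
        double[] y = normalize(new double[] {0.5, 0.7, 0.9});
        System.out.println(java.util.Arrays.toString(y));
    }
}
```

Applying the same preprocessing to every file removes loudness and offset differences as a source of variation between feature extractors.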
Other feature sets can be defined using the Yale XML syntax. For some examples see etc/features/*.xml:
|Feature set|Description|Number of features|
|musicminer|Default MusicMiner audio features|20|
|generate40k|Large set of candidate audio features|39,760|
|generate688k|Huge set of candidate audio features|687,960|
|mfcc|Mean and standard deviation of the first 20 MFCCs|40|
|chroma|Mean and standard deviation of 12 chroma tones|24|
If you need further assistance or want to help make this more convenient, please contact us. If you use the MusicMiner features in your scientific work, please indicate the version number of the features you used (see etc/features/musicminer-VERSION.xml) and cite one of the following references:
Moerchen, F., Ultsch, A., Thies, M., Loehken, I.: Modelling timbre distance with temporal statistics from polyphonic music, IEEE Transactions on Speech and Audio Processing, 14(1), pp. 81-90, (2006)
Moerchen, F., Ultsch, A., Thies, M., Loehken, I., Noecker, M., Stamm, C., Efthymiou, N., Kuemmerer, M.: MusicMiner: Visualizing perceptual distances of music as topographical maps, Technical Report No. 47, Dept. of Mathematics and Computer Science, University of Marburg, Germany, (2005)
We are not claiming that MusicMiner uses the best audio features possible. The results are surely somewhat biased towards the dataset we used. Further, the dataset was comparatively small, and the ground truth was based on the consensus of only a few listeners. We are looking for larger datasets annotated with timbre ground truth to repeat our methodology and select even better audio features. If you are interested in a cooperation to perform further studies on a large corpus of music data, or in other music similarity research, please contact the Databionic Research Group.