This document gives a brief introduction to the technology and source code of the Databionic MusicMiner for interested developers. More information can be found in the Javadoc and in the source itself. If you have any questions or ideas for new features, please contact us on the mailing lists.
This document describes what needs to be done to compile and set up MusicMiner. The description covers Linux, Windows, and OS X Tiger. It might work on other operating systems as well; please report any experiences.
Environment variables |
Many of the following steps require setting environment variables. This is how you do it on
|
Java |
|
Maven |
|
MusicMiner |
|
Audio tools |
|
Compiling |
|
Installing |
|
Documentation |
|
Misc |
|
If something doesn't work as described above, please ask for assistance. We might have forgotten to mention something.
The MusicMiner describes the sound of each song by features extracted from its audio data, an approach based directly on the latest research results on audio similarity. Each song is described by a fixed-length vector of real-valued features. The conversion from *.mp3 to raw audio data is performed using external tools, e.g. mplayer and sox. Pure Java processing of the audio files is planned. The actual audio features are then extracted using the Value Series Plugin of the Yale machine learning software. Yale offers a modular processing concept based on XML. We had to extend the Value Series Plugin to fit our purpose, however; these changes are already incorporated into the latest CVS version of the plugin.
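To illustrate what "a fixed-length vector of real-valued features" means, here is a minimal sketch that reduces raw audio samples of any length to three numbers. The class name `FeatureSketch` and the three features chosen (RMS energy, zero-crossing rate, mean absolute amplitude) are assumptions made for illustration only; they are not the features computed by the Value Series Plugin.

```java
import java.util.Arrays;

/** Hypothetical sketch: maps raw samples of any length to a fixed-length feature vector. */
final class FeatureSketch {

    /** Returns {RMS energy, zero-crossing rate, mean absolute amplitude}. */
    static double[] extract(double[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double sumSquares = 0.0;
        double sumAbs = 0.0;
        int crossings = 0;
        for (int i = 0; i < samples.length; i++) {
            sumSquares += samples[i] * samples[i];
            sumAbs += Math.abs(samples[i]);
            // A zero crossing is a sign change between consecutive samples.
            if (i > 0 && (samples[i - 1] < 0) != (samples[i] < 0)) {
                crossings++;
            }
        }
        double rms = Math.sqrt(sumSquares / samples.length);
        double zcr = samples.length > 1 ? (double) crossings / (samples.length - 1) : 0.0;
        double meanAbs = sumAbs / samples.length;
        return new double[] {rms, zcr, meanAbs};
    }

    public static void main(String[] args) {
        // Two clips of very different length yield vectors of the same length.
        double[] shortClip = {0.5, -0.5, 0.5, -0.5};
        double[] longClip = new double[44100];
        for (int i = 0; i < longClip.length; i++) {
            longClip[i] = Math.sin(2 * Math.PI * 440 * i / 44100.0);
        }
        System.out.println(Arrays.toString(extract(shortClip)));
        System.out.println(Arrays.toString(extract(longClip)));
    }
}
```

Because every song ends up with a vector of the same length, songs of different duration can be compared directly with an ordinary distance measure.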
The MusicMaps used to visualize the sound space are U-Matrix visualizations of Emergent Self-Organizing Maps (ESOM) trained on the extracted audio features. Emergence is the ability of a system to develop high-level structures through the cooperation of many elementary processes. A popular example of an emergent phenomenon is the so-called La Ola in a sports stadium: a large number of people perform the simple task of standing up and cheering for a short moment, and in this way a wave rolling through the crowd is formed. The wave is not visible to those who are part of it, only from a distance. Transferring the principles of self-organization to data analysis leads to Emergent Self-Organizing Maps. An ESOM consists of a large number of prototypes describing differently sounding points in the sound space. During training, the prototypes are iteratively adjusted to the structures of the personal music collection. The result is a low-dimensional projection that preserves the perceptual topology of the input space as well as possible.
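The iterative adjustment of prototypes can be sketched as a plain online self-organizing map update. This is not the Databionic ESOM Tools implementation: the class `SomSketch`, the planar rectangular grid, the Gaussian neighborhood, and the linear cooling schedule are all assumptions chosen to keep the example short.

```java
import java.util.Random;

/** Hypothetical sketch of online SOM training on a planar rectangular grid. */
final class SomSketch {
    /** weights[r][c] is the prototype vector at grid position (r, c). */
    final double[][][] weights;

    SomSketch(int rows, int cols, int dim, long seed) {
        weights = new double[rows][cols][dim];
        Random rnd = new Random(seed);
        for (double[][] row : weights) {
            for (double[] prototype : row) {
                for (int k = 0; k < dim; k++) {
                    prototype[k] = rnd.nextDouble();
                }
            }
        }
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int k = 0; k < a.length; k++) {
            double d = a[k] - b[k];
            sum += d * d;
        }
        return sum;
    }

    /** Returns {row, col} of the prototype closest to x. */
    int[] bestMatch(double[] x) {
        int[] best = {0, 0};
        double bestDist = Double.POSITIVE_INFINITY;
        for (int r = 0; r < weights.length; r++) {
            for (int c = 0; c < weights[r].length; c++) {
                double d = squaredDistance(weights[r][c], x);
                if (d < bestDist) {
                    bestDist = d;
                    best = new int[] {r, c};
                }
            }
        }
        return best;
    }

    /** Pulls the best match and its grid neighbors toward x. */
    void update(double[] x, double learningRate, double radius) {
        int[] bmu = bestMatch(x);
        for (int r = 0; r < weights.length; r++) {
            for (int c = 0; c < weights[r].length; c++) {
                double dr = r - bmu[0];
                double dc = c - bmu[1];
                // Gaussian neighborhood: 1 at the best match, decaying with grid distance.
                double h = Math.exp(-(dr * dr + dc * dc) / (2.0 * radius * radius));
                double[] w = weights[r][c];
                for (int k = 0; k < w.length; k++) {
                    w[k] += learningRate * h * (x[k] - w[k]);
                }
            }
        }
    }

    /** Repeats the update over all data points while shrinking rate and radius linearly. */
    void train(double[][] data, int epochs, double startRate, double startRadius) {
        for (int e = 0; e < epochs; e++) {
            double decay = 1.0 - (double) e / epochs;
            double radius = Math.max(startRadius * decay, 0.5);
            for (double[] x : data) {
                update(x, startRate * decay, radius);
            }
        }
    }
}
```

A call such as `new SomSketch(50, 80, 3, 1L).train(features, 20, 0.5, 10.0)` would fit a 50 by 80 map to three-dimensional feature vectors; the grid size and parameters here are illustrative values, not recommendations.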
The U-Matrix is the canonical display of ESOM. The local differences in sound are displayed at each map position as a height value, creating a 3D landscape of the high-dimensional sound space. The height will be large in areas where no or few songs reside, creating mountain ranges for large perceptual differences in sound. The height will be small in areas with many similar-sounding songs. Thus homogeneous groups of music are depicted as valleys. Using the same coloring commonly found on geographical maps, an intuitive visualization of the sound space of your personal music collection is created.
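One common way to obtain such height values is to average the distance between each prototype and its direct grid neighbors. The sketch below does that for a planar grid with four neighbors per position; the class name `UMatrixSketch` and the choice of neighborhood are assumptions, and the ESOM Tools may compute heights differently.

```java
/** Hypothetical sketch: U-Matrix heights as mean distance to the four grid neighbors. */
final class UMatrixSketch {

    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int k = 0; k < a.length; k++) {
            double d = a[k] - b[k];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** weights[r][c] is the prototype at grid position (r, c); returns one height per position. */
    static double[][] heights(double[][][] weights) {
        int rows = weights.length;
        int cols = weights[0].length;
        int[][] offsets = {{-1, 0}, {1, 0}, {0, -1}, {0, 1}};
        double[][] height = new double[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                double sum = 0.0;
                int neighbors = 0;
                for (int[] o : offsets) {
                    int nr = r + o[0];
                    int nc = c + o[1];
                    // Positions outside the planar grid are skipped.
                    if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) {
                        continue;
                    }
                    sum += distance(weights[r][c], weights[nr][nc]);
                    neighbors++;
                }
                height[r][c] = neighbors == 0 ? 0.0 : sum / neighbors;
            }
        }
        return height;
    }
}
```

Positions whose neighbors hold very different prototypes get large heights (mountains), while positions surrounded by similar prototypes get small heights (valleys).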
We have implemented the latest research results on ESOM in the Databionic ESOM Tools, a software package for scientists interested in using ESOM technology for visualization, clustering, and classification. A simplified version offering fewer options is integrated in the MusicMiner to provide an intuitive way of navigating personal music collections.
The database stores the meta information on songs, such as artist and title, as well as the extracted audio features and the MusicMaps. It is programmed using the Torque database abstraction layer, so that in theory many different backend databases can be used. Currently we are using the pure Java HSQL database because it can easily be distributed and set up with the MusicMiner installer. If you are interested in integrating support for other databases, e.g. MySQL, please contact us on the mailing lists.
The database schema is designed to be very flexible. Most information, e.g. genres, artists, and albums, is stored in separate tables to reduce redundancy. A song can be in the database without an actual file being present. In fact, a song can also be linked to several audio files at different locations, though this is not yet heavily used within MusicMiner. Playlists can contain songs, albums, and other playlists. Ratings can be stored per user and song or per user and playlist.
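The relationships described above can be pictured with a small in-memory model. This is only a sketch: the class and field names are hypothetical and do not correspond to the Torque-generated classes in databionics.mm.om, and treating an album as a kind of playlist is an assumption made to keep the example compact.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory picture of the schema; not the Torque object model. */
final class SchemaSketch {

    /** Anything that can appear inside a playlist. */
    interface Entry {
        List<Song> songs();
    }

    static final class Song implements Entry {
        final String title;
        /** Zero, one, or several file locations for the same song. */
        final List<String> fileUrls = new ArrayList<>();

        Song(String title) {
            this.title = title;
        }

        @Override
        public List<Song> songs() {
            return Collections.singletonList(this);
        }
    }

    /** Used here for both albums and user playlists (an assumption of this sketch). */
    static final class Playlist implements Entry {
        final String name;
        final List<Entry> entries = new ArrayList<>();

        Playlist(String name) {
            this.name = name;
        }

        /** Flattens nested playlists into a single list of songs. */
        @Override
        public List<Song> songs() {
            List<Song> all = new ArrayList<>();
            for (Entry e : entries) {
                all.addAll(e.songs());
            }
            return all;
        }
    }

    /** Ratings keyed by user name, then by the rated song or playlist. */
    static final Map<String, Map<Entry, Integer>> ratings = new HashMap<>();

    static void rate(String user, Entry entry, int stars) {
        ratings.computeIfAbsent(user, u -> new HashMap<>()).put(entry, stars);
    }
}
```

A song with an empty `fileUrls` list corresponds to a database entry without an actual file, and nesting one `Playlist` inside another mirrors playlists that contain albums or other playlists.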
The import and export of metadata is based on XML. The dom4j library is used for this purpose. An example XML file looks as follows:
    <mm>
      <artist id="197" name="Bob Marley" sortname="Bob Marley"/>
      <artist id="198" name="The Wailers" sortname="The Wailers"/>
      <playlist id="141">
        <name>Legend</name>
      </playlist>
      <media id="2" name="localhost"/>
      <song id="844" playtime="233" position="3">
        <name>Could You Be Loved</name>
        <added>2005-03-23</added>
        <comment></comment>
        <artist ref="197"/>
        <artist ref="198"/>
        <album ref="141"/>
        <user id="3" name="mminer"/>
        <file id="894" size="5282829">
          <url>file:///home/mm/reggae/Bob Marley &amp; The Wailers - 03 - Could You Be Loved.mp3</url>
          <bitrate>320</bitrate>
          <mpegid>1</mpegid>
          <layer>III</layer>
          <frequency>44100</frequency>
          <channelmode>Joint Stereo (Stereo)</channelmode>
          <emphasis>none</emphasis>
          <framecount>8998</framecount>
          <media ref="2"/>
        </file>
      </song>
    </mm>

Not all database fields are supported yet. If you are interested in completing the XML support of MusicMiner, please contact us on the mailing lists.
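The example file links songs to artists through `id` and `ref` attributes. The following sketch shows how such references could be resolved when reading a file. MusicMiner itself uses dom4j; to stay runnable without extra libraries, this sketch swaps in the DOM parser that ships with the JDK. The class `XmlImportSketch` and the method `artistsBySong` are hypothetical names, and the code assumes every song has a `<name>` child.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical sketch using the JDK DOM parser instead of dom4j. */
final class XmlImportSketch {

    /** Maps each song name to the names of its artists by resolving ref -> id. */
    static Map<String, List<String>> artistsBySong(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse metadata XML", e);
        }
        Element root = doc.getDocumentElement();

        // Artist definitions carry an id; references inside songs carry a ref instead.
        Map<String, String> artistNames = new HashMap<>();
        NodeList artists = root.getElementsByTagName("artist");
        for (int i = 0; i < artists.getLength(); i++) {
            Element artist = (Element) artists.item(i);
            if (artist.hasAttribute("id")) {
                artistNames.put(artist.getAttribute("id"), artist.getAttribute("name"));
            }
        }

        Map<String, List<String>> result = new LinkedHashMap<>();
        NodeList songs = root.getElementsByTagName("song");
        for (int i = 0; i < songs.getLength(); i++) {
            Element song = (Element) songs.item(i);
            // Assumes the first <name> below <song> is the song title.
            String title = song.getElementsByTagName("name").item(0).getTextContent();
            List<String> names = new ArrayList<>();
            NodeList refs = song.getElementsByTagName("artist");
            for (int j = 0; j < refs.getLength(); j++) {
                Element ref = (Element) refs.item(j);
                names.add(artistNames.get(ref.getAttribute("ref")));
            }
            result.put(title, names);
        }
        return result;
    }
}
```

The same two-pass idea, first collecting definitions by `id` and then resolving each `ref`, applies to the `album` and `media` references in the example as well.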
databionics.mm | All command line applications and the MusicMiner graphical user interface including the definitions for command line options in the *Options classes. |
databionics.mm.add | Some helper classes for parsing filenames and ID3 tags and merging the information before adding it to the database. |
databionics.mm.afe | Audio feature extraction. Includes conversion of audio files and an interface to Yale that performs the feature extraction. |
databionics.mm.gui | Everything around the graphical user interface. |
databionics.mm.om | The object model as generated by Torque, plus many convenience functions added for easy object-oriented use in the rest of MusicMiner. |
databionics.mm.xml | Some helper classes for XML import with dom4j. |