================= Bodhidharma 1.2 by Cory McKay =================

-- OVERVIEW --

Bodhidharma is a general-purpose software system for automatically classifying musical MIDI recordings based on pre-defined taxonomies. Although the documentation for the software emphasizes genre classification, content-based classification of almost any type can be performed simply by changing the taxonomy and the model training recordings. Recordings could be classified by composer, performer, geographical area, mood, or any other criteria that the user desires.

The Bodhidharma software allows the user to custom design a hierarchical taxonomy, choose the features on which to base classification, control the metadata associated with recordings, extract features from MIDI recordings, train customized classifiers on model recordings and classify unknown recordings with trained classifiers. All of this can be done through a relatively easy-to-use graphical user interface with full saving functionality.

This software includes an extensive library of high-level musical features as well as a novel hybrid classification system that makes use of hierarchical, flat and round robin classification. Both k-nearest neighbour and neural network-based classifiers are used, and feature selection and weighting are performed using genetic algorithms.

-- AUTHORSHIP AND LICENSING --

This software was produced as part of Cory McKay's master's thesis at McGill University. All software design, development and testing was performed by Cory McKay. As this is the work of one person with limited time, there may be some bugs left in the software. Please contact Cory McKay at cory.mckay@mail.mcgill.ca with any bug reports or questions.

This software is included in Cory McKay's 2004 thesis copyright. This software is distributed freely, with the understanding that the author is not liable for any damages of any sort incurred through the use of the software. There is no express or implied warranty.
Furthermore, the author must be cited in any publications involving the use of the Bodhidharma system. This software is open source, and users are welcome to modify the code as they wish.

This software includes the Xerces XML parsing software developed by the Apache Software Foundation (http://www.apache.org/). Their licence is available in the on-line help of the Bodhidharma software.

Financial support was provided to the author through a grant from the Fonds Québécois de la recherche sur la société et la culture. The advice of Ichiro Fujinaga, Cory's thesis advisor, was of great help throughout the development of this software.

-- FURTHER INFORMATION --

Those wishing information beyond the scope of this manual should consult the author's master's thesis (available at www.music.mcgill.ca/~cmckay) or contact him at cory.mckay@mail.mcgill.ca. Chapter 6 of the thesis will be of particular interest to those wishing to understand how the training and classification performed by this software work.

-- COMPATIBILITY ISSUES --

The Bodhidharma software is written in Java, which means that it can theoretically be run on any system that has the Java Runtime Environment (JRE) installed. It is nonetheless recommended that the software be used with Windows XP, as it was developed and tested under that operating system and the interface's appearance is optimized for Windows. Although the software will theoretically run under earlier versions of Windows, OS X, Linux, Solaris or any other operating system with the JRE installed, it has not yet been tested on other platforms, so difficulties may be encountered.

This software was developed with version 1.4.1 of the JDK (Java Development Kit), so it is suggested that the corresponding version or higher of the JRE be installed on the user's computer.
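As a quick way to check the version suggestion above, the following is a minimal sketch that compares the running JRE's version string against 1.4.1. It is not part of the Bodhidharma distribution: the class name JavaVersionCheck and its methods are hypothetical, and the parsing assumes the version string begins with dot-separated numbers (for example "1.4.1_02" or "17.0.2"). The source sticks to Java 1.4 language features, but that has not been verified on a real 1.4 compiler.

```java
// Hypothetical helper, not part of Bodhidharma: checks whether a Java
// version string is at least a required major.minor.patch version.
class JavaVersionCheck {

    // Extracts up to the first three numeric fields of a version string,
    // e.g. "1.4.1_02" -> {1, 4, 1} and "17.0.2" -> {17, 0, 2}.
    // Missing fields are treated as 0 (e.g. "9" -> {9, 0, 0}).
    public static int[] parse(String version) {
        String[] parts = version.split("[^0-9]+");
        int[] numbers = new int[3];
        int count = 0;
        for (int i = 0; i < parts.length && count < 3; i++) {
            if (parts[i].length() == 0) {
                continue; // skip the empty field produced by a leading separator
            }
            numbers[count++] = Integer.parseInt(parts[i]);
        }
        return numbers;
    }

    // Returns true if the given version is >= major.minor.patch.
    public static boolean isAtLeast(String version, int major, int minor, int patch) {
        int[] found = parse(version);
        int[] required = { major, minor, patch };
        for (int i = 0; i < 3; i++) {
            if (found[i] != required[i]) {
                return found[i] > required[i];
            }
        }
        return true; // all three fields are equal
    }

    public static void main(String[] args) {
        // "java.version" is a standard system property of the running JRE.
        String version = System.getProperty("java.version");
        if (isAtLeast(version, 1, 4, 1)) {
            System.out.println("JRE " + version + " meets the suggested 1.4.1 minimum.");
        } else {
            System.out.println("JRE " + version + " is older than the suggested 1.4.1.");
        }
    }
}
```

Running java -version at a command prompt gives the same information without any code; the sketch is only useful if the check needs to be automated.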
-- INSTALLING THE JAVA RUNTIME ENVIRONMENT --

If your system already has the JRE installed, you may skip this section. If not, you will need to install it in order to run Bodhidharma. It can be downloaded for free from the java.sun.com web site. The JDK includes the JRE. When the download is complete, follow the installation instructions that come with it. In particular, be sure to follow the steps needed to make it possible to run the JRE from arbitrary directories. Under Windows XP, this means adding an entry such as C:\j2sdk1.4.1\bin to the system PATH variable.

-- INSTALLING THE BODHIDHARMA SOFTWARE --

All of the Bodhidharma files are contained in the Bodhidharma.zip file. They can be unpacked using WinZip, Stuffit Expander or any other archive expansion utility. The files should be unpacked to the C:\Bodhidharma directory (under Windows). Although the software will still work if it is placed in another directory, some of the sample project files will not function properly.

-- GUIDE TO BUNDLED FILES --

After unpacking, the C:\Bodhidharma directory should contain the following items:

* MIDI_Files: A directory holding MIDI files. Will be empty for copyright reasons under the standard Bodhidharma distribution.
* ProgramFiles: A directory holding files used by the Bodhidharma software, including the html files for the on-line manual, the Xerces XML file parser and graphics files used by the software.
* Projects: Contains several sample projects. See below for more details. Not available in all Bodhidharma distributions.
* Source_Code: Contains the Java source code for the Bodhidharma main class as well as the Bodhidharma package. Also includes Javadoc API documentation.
* Bodhidharma.jar: The Bodhidharma classes in a jar archive.
* default_file_locations.xml: An XML file holding the file path of the default project that is loaded at startup of the Bodhidharma software.
* Manual.html: The manual for the Bodhidharma software.
* README.txt: Basic installation and running instructions.

-- RUNNING THE BODHIDHARMA SOFTWARE --

One of the files that is extracted is called Bodhidharma.jar. This is the file that should be run in order to use the Bodhidharma software. You can run it either by double clicking on it or by typing the following line at a command prompt, such as the DOS prompt, in the Bodhidharma directory:

    java -jar Bodhidharma.jar

It should be noted that the JRE does not always allocate sufficient memory for Bodhidharma to extract features from long MIDI files or perform lengthy classifications. Running Bodhidharma using the above method could result in an out of memory error if particularly long MIDI recordings are processed. It is therefore preferable to manually allocate a greater amount of memory. 500 to 700 MB should be more than enough for any realistic situation, but one should be careful to leave at least 300 MB unallocated under Windows XP.

If you have 512 MB of RAM, you should run the Bodhidharma software as follows at the command prompt:

    java -mx300M -jar Bodhidharma.jar

If you have 1 GB or more of RAM, you should run Bodhidharma as follows:

    java -mx700M -jar Bodhidharma.jar

-- RUNNING THE BODHIDHARMA SOFTWARE THROUGH THE COMMAND LINE --

If you already have Bodhidharma configuration files set up and do not want to use the GUI, then command line arguments may be used to run the software. This approach is only recommended for those with significant experience with Bodhidharma, however.

The command line action options are as follows:

-extract features: Extract features from MIDI files.
-train : Train on all currently loaded recordings and save to the given path.
-classify : Classify all currently loaded recordings and output results to the given path.

The command line settings are as follows:

-proj : Specify the path of a project file to load.
-tax : Specify the path of a taxonomy file to load.
-feat : Specify the path of a feature settings file to load.
-rec : Specify the path of a recordings file to load.
-clas : Specify the path of a trained classifiers file to load.
-pref : Specify the path of a preferences file to load.
-dir : Specify the path of the directory to use as the current directory.

Note that if a project file is specified at the command line, then this overrides other specified files (e.g. taxonomy file, feature settings file, etc.).

For example, if one wished to extract features from a training and a testing set of MIDI recordings (given a pre-defined taxonomy, set of features to use and preferences settings), one could type:

    java -mx500M -jar Bodhidharma.jar -tax Projects\Command_Line_Test\SimpleTaxonomy.xml -feat Projects\Command_Line_Test\AllFeatures.xml -pref Projects\Command_Line_Test\CommandLinePreferences.xml -rec Projects\Command_Line_Test\CommandLineTestingRecordings.xml -extract features

and

    java -mx500M -jar Bodhidharma.jar -tax Projects\Command_Line_Test\SimpleTaxonomy.xml -feat Projects\Command_Line_Test\AllFeatures.xml -pref Projects\Command_Line_Test\CommandLinePreferences.xml -rec Projects\Command_Line_Test\CommandLineTrainingRecordings.xml -extract features

Training could then be performed by typing:

    java -mx500M -jar Bodhidharma.jar -tax Projects\Command_Line_Test\SimpleTaxonomy.xml -feat Projects\Command_Line_Test\AllFeatures.xml -pref Projects\Command_Line_Test\CommandLinePreferences.xml -rec Projects\Command_Line_Test\CommandLineTrainingRecordings.xml -train Projects\Command_Line_Test\TrainedClassifiers.xml

And finally, classifications could be produced and output to the given results path by typing:

    java -mx500M -jar Bodhidharma.jar -tax Projects\Command_Line_Test\SimpleTaxonomy.xml -feat Projects\Command_Line_Test\AllFeatures.xml -pref Projects\Command_Line_Test\CommandLinePreferences.xml -rec Projects\Command_Line_Test\CommandLineTestingRecordings.xml -clas Projects\Command_Line_Test\TrainedClassifiers.xml -classify Projects\Command_Line_Test\results.txt

-- USING TAB-DELIMITED FORMATS INSTEAD OF BODHIDHARMA FORMATS --

Bodhidharma uses custom XML files to store settings. However, some users may wish to use tab-delimited text files as alternatives to the taxonomy and recording list XML file formats. The following two file formats may therefore be used instead.

For the recording list file, each line denotes a different MIDI file, and each line should consist of:

    \t< ... >\n

where the < and > characters are not included, \t denotes a tab, \n denotes a new line and < ... > denotes the possible inclusion of further \t's to denote membership in multiple genres. If no genre is given, then an automatic classification of Unknown will be provided.

For the taxonomy file, each line denotes a different leaf genre, and each line should consist of:

    \t\t< ... >\n

where the < and > characters are not included, \t denotes a tab, \n denotes a new line and < ... > includes further parents (in order from low position in the genre hierarchy to the root).

Note that Bodhidharma automatically determines whether a text file or an XML file is being used.

-- RECOMPILING THE SOFTWARE --

Those who wish to edit and recompile the Java code may do so by simply typing:

    javac *.java

in the Source_Code directory.
The Javadocs may be rebuilt by typing:

    javadoc -tag author:a:"Author:" -link http://java.sun.com/j2se/1.4/docs/api -d JavaDocs C:\Bodhidharma\Source_Code\*.java C:\Bodhidharma\Source_Code\bodhidharma\*.java C:\Bodhidharma\Source_Code\bodhidharma\classifiers\*.java C:\Bodhidharma\Source_Code\bodhidharma\classifiers\bio_k_nearest_neighbour\*.java C:\Bodhidharma\Source_Code\bodhidharma\classifiers\feedforward_neural_networks\*.java C:\Bodhidharma\Source_Code\bodhidharma\classifiers\genetic_algorithms\*.java C:\Bodhidharma\Source_Code\bodhidharma\data_structures\*.java C:\Bodhidharma\Source_Code\bodhidharma\midi_parsing\*.java C:\Bodhidharma\Source_Code\bodhidharma\midi_parsing\multi_dimensional_features\*.java C:\Bodhidharma\Source_Code\bodhidharma\midi_parsing\one_dimensional_features\*.java C:\Bodhidharma\Source_Code\bodhidharma\utilities\*.java C:\Bodhidharma\Source_Code\bodhidharma\xml_parsing\*.java

If you have recompiled the software into class files, you can run the software simply by typing:

    java Bodhidharma

Modifications to the system CLASSPATH variable are not necessary if the user is simply using the Bodhidharma.jar file as is. Users who recompile the software so that the Bodhidharma class and bodhidharma package are in the C:\Bodhidharma directory, however, will need to add the following entry to the CLASSPATH:

    C:\Bodhidharma\Source_Code;C:\Bodhidharma\Source_Code\bodhidharma;C:\Bodhidharma\ProgramFiles\xerces.jar;

The class files may be repackaged into a jar by typing:

    jar cfm0 Bodhidharma.jar MANIFEST.MF Bodhidharma.java Bodhidharma.class bodhidharma

where the MANIFEST.MF file can be taken from the existing Bodhidharma.jar file. Users who recompile the code and place it in a new jar file must place the jar file in the Bodhidharma directory.

-- GUIDE TO SAMPLE PROJECTS --

The following sample projects are found in the Projects directory. They can be opened from inside Bodhidharma.
* T_9_No_Recordings.xml: A basic nine leaf category taxonomy with no MIDI recordings or classifiers loaded yet.
* T_9_Trained_With_Recorings.xml: A basic nine leaf category taxonomy with trained classifiers and the 225 recordings that were used to train them. The recordings may be used even if the MIDI files are not included with this distribution, as the features have been extracted from the MIDI files and saved. The user may load his or her own recordings and classify them into the nine leaf category taxonomy using the trained classifiers.
* T_9_Trained_Without_Recorings.xml: A basic nine leaf category taxonomy with classifiers that have been trained on 225 recordings that are not included. The user can load his or her own recordings and classify them into the nine leaf category taxonomy using the trained classifiers.
* T_9_Untrained_With_Recordings: A basic nine leaf category taxonomy with 225 recordings divided among its categories. The recordings may be used even if the MIDI files are not included with this distribution, as the features have been extracted from the MIDI files and saved. The user may experiment with training and testing classifiers using these recordings.
* T_38_No_Recordings.xml: A thirty-eight leaf category taxonomy with no MIDI recordings or classifiers loaded yet.
* T_38_Trained_With_Recorings.xml: A thirty-eight leaf category taxonomy with trained classifiers and the 950 recordings that were used to train them. The recordings may be used even if the MIDI files are not included with this distribution, as the features have been extracted from the MIDI files and saved. The user may load his or her own recordings and classify them into the thirty-eight leaf category taxonomy using the trained classifiers.
* T_38_Trained_Without_Recorings.xml: A thirty-eight leaf category taxonomy with classifiers that have been trained on 950 recordings that are not included. The user can load his or her own recordings and classify them into the thirty-eight leaf category taxonomy using the trained classifiers.
* T_38_Untrained_With_Recordings: A thirty-eight leaf category taxonomy with 950 recordings divided among its categories. The recordings may be used even if the MIDI files are not included with this distribution, as the features have been extracted from the MIDI files and saved. The user may experiment with training and testing classifiers using these recordings.

-- CURRENT LIMITATIONS --

- All MIDI files must be placed in the same directory if recording lists are to be saved in XML format. This limitation does not hold true if the tab-delimited format is used.
- No taxonomy may have a leaf class at the root level.

-- NEW TO VERSION 1.1.1 --

- Reports on average weightings of features across cross-validation folds are now generated and saved.

-- NEW TO VERSION 1.1.2 --

- Minor bugs fixed in the FeatureSelectionEvaluator and FeatureWeightingEvaluator classes that were causing an incorrect count of the number of possible categories to be generated.
- Gives the user informative error messages if an insufficient number of recordings is available for training.

-- NEW TO VERSION 1.2 --

- Added command line functionality so that Bodhidharma can be run without the GUI.

-- NEW TO VERSION 1.2.2 --

- Added ability to parse taxonomies and recording lists in tab-delimited MIREX 2005 text file formats as well as the previous Bodhidharma XML formats.
- Generates proper XML escape characters in XML strings when parsing taxonomy and recording list (but not feature settings) files (currently cannot load them from XML).
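
For readers unfamiliar with the XML escape characters mentioned in the last item, the following is a minimal sketch of what such escaping generally involves: replacing the five characters reserved by XML with their predefined entities. This is an illustration only, not Bodhidharma's actual code; the class name XmlEscapeSketch and the method escape are hypothetical.

```java
// Hypothetical illustration, not Bodhidharma's own implementation:
// replaces the five characters reserved by XML with predefined entities.
class XmlEscapeSketch {

    public static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);        // all other characters pass through
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A genre name containing a reserved character.
        System.out.println(escape("Rhythm & Blues")); // prints: Rhythm &amp; Blues
    }
}
```

Without this kind of escaping, a genre or file name containing a character such as & would produce an XML file that a parser cannot read back.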