This project was developed by Bradley Buda at the University of Michigan for EECS 595, the Natural Language Processing course. A detailed paper describing the system and the source code are available below.
This paper presents a system for identifying rhyming words in a block of text. The system is designed for finding rhymes in hip-hop (rap) lyrics, but is general enough to work on any text that contains rhymes. It runs on Windows or Linux using the .NET Framework. The system relies on the CMU Pronouncing Dictionary to find the phonemes (constituent sounds) of the supplied words, then uses a series of custom algorithms to find patterns that indicate rhymes, including rhymes that span multiple words. Experimental results indicate that the system finds nearly all of the intentional rhymes in a lyric, but it also finds a large number of unintentional rhymes. The paper describes the design of the system and the algorithms it uses, gives experimental results, and analyzes the system's successes and shortcomings, with suggestions for future work.
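The paper's custom rhyme-finding algorithms are not reproduced here. As a rough illustration of the general idea (look each word up in the CMU Pronouncing Dictionary, then compare sounds), the sketch below substitutes a common textbook heuristic: two words or phrases rhyme if their phonemes match from the last stressed vowel to the end, ignoring stress digits. Multi-word phrases are handled by simply concatenating the words' pronunciations. It assumes a modern .NET SDK (C# 9 or later, top-level statements) rather than the .NET 1.1 toolchain the project targets. The helper names (LoadCmuDict, Pronounce, RhymeTail, Rhymes) are hypothetical, and the dictionary lines are a small hand-written excerpt in CMUdict's format, so check them against the real dictionary.

```csharp
// Sketch only: a textbook "perfect rhyme" test, NOT the RhymeFinder algorithm.
// Assumes a modern .NET SDK (C# 9+ top-level statements) and CMUdict's text
// format: "WORD  PH1 PH2 ...", vowels carry a stress digit (0, 1 or 2),
// alternate pronunciations look like "WORD(1)", comment lines start with ";;;".
using System;
using System.Collections.Generic;
using System.Linq;

// Hand-written excerpt in CMUdict format; for real use, pass
// File.ReadLines(pathToCmudict) instead (the path is up to you).
var dict = LoadCmuDict(new[]
{
    ";;; comment lines are skipped",
    "A  AH0",
    "A(1)  EY1",
    "DELIGHT  D IH0 L AY1 T",
    "FILL  F IH1 L",
    "TONIGHT  T AH0 N AY1 T",
    "VILLA  V IH1 L AH0",
});

Console.WriteLine(Rhymes(dict, "delight", "tonight"));
Console.WriteLine(Rhymes(dict, "villa", "fill a"));   // multi-word phrase
Console.WriteLine(Rhymes(dict, "delight", "villa"));

// Word -> phonemes, keeping only the first pronunciation listed for each word.
static Dictionary<string, string[]> LoadCmuDict(IEnumerable<string> lines)
{
    var map = new Dictionary<string, string[]>(StringComparer.OrdinalIgnoreCase);
    foreach (var line in lines)
    {
        if (line.StartsWith(";;;")) continue;
        var parts = line.Split(' ', StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length < 2 || parts[0].EndsWith(")")) continue; // skip WORD(1) variants
        map.TryAdd(parts[0], parts[1..]);
    }
    return map;
}

// Concatenated phonemes of a space-separated phrase; empty if any word is unknown.
static string[] Pronounce(Dictionary<string, string[]> cmu, string phrase)
{
    var phones = new List<string>();
    foreach (var word in phrase.Split(' ', StringSplitOptions.RemoveEmptyEntries))
    {
        if (!cmu.TryGetValue(word, out var wordPhones)) return Array.Empty<string>();
        phones.AddRange(wordPhones);
    }
    return phones.ToArray();
}

// Phonemes from the last stressed vowel (stress 1 or 2) to the end, stress digits
// removed. Falls back to the last vowel of any stress if nothing is stressed.
static string[] RhymeTail(string[] phones)
{
    int start = Array.FindLastIndex(phones, p => p.EndsWith('1') || p.EndsWith('2'));
    if (start < 0) start = Array.FindLastIndex(phones, p => char.IsDigit(p[^1]));
    if (start < 0) return Array.Empty<string>();
    return phones[start..].Select(p => p.TrimEnd('0', '1', '2')).ToArray();
}

// Two phrases "rhyme" under this heuristic if their non-empty tails are identical.
static bool Rhymes(Dictionary<string, string[]> cmu, string a, string b)
{
    string[] tailA = RhymeTail(Pronounce(cmu, a));
    string[] tailB = RhymeTail(Pronounce(cmu, b));
    return tailA.Length > 0 && tailA.SequenceEqual(tailB);
}
```

Because it only compares the tail after the last stressed vowel, this heuristic misses imperfect (slant) rhymes and treats a word as rhyming with itself; it is meant only to make the dictionary-lookup step concrete.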
Download the entire paper (969 KB PDF File)
Download the source code (1.4 MB Zip File)
The source code for this project, written in C#, is freely available (see the licensing terms below).
The easiest way to use this code is on Windows with Visual Studio .NET 2003: unzip the code archive and open the 595-project.sln solution file. Build and run the RhymeFinderTest project to search for rhymes in a text file, or the RhymeTester project to check whether two words rhyme.
If you don't have Visual Studio, you can still build the code with the free Microsoft .NET Framework SDK. Download and install the SDK, then open the SDK command prompt. To build the project, change to the folder where you unzipped the source code and type:
csc /out:RhymeFinder.exe Base\*.cs RhymeFinderTest\*.cs
or, to build the RhymeTester program:
csc /out:Rhyme.exe Base\*.cs RhymeTester\*.cs
You may see some warnings during the build; these can be safely ignored.
To use this project on Linux, you will need Mono, an open-source implementation of the .NET Framework. Mono 1.05 (the current version as of this writing) can be downloaded from the Mono project website, which also offers a Windows version if you would rather not use Microsoft's implementation. Once Mono is installed, you can build the project with the command:
mcs Base/*.cs RhymeFinderTest/*.cs -o RhymeFinder.exe
and execute it with:
mono RhymeFinder.exe
The RhymeTester program can be built and run the same way; substitute the RhymeTester folder and the Rhyme.exe output name from the Windows instructions above.
The project is at a very early stage: it may crash, and the algorithms are far from bulletproof. The program accepts the name of a text file, which must contain only space-separated words with no punctuation. The source code comes with five sample files:
delight.txt - Rapper's Delight by The Sugarhill Gang
intergalactic.txt - Intergalactic by The Beastie Boys
itstricky.txt - It's Tricky by Run-D.M.C.
joyofyourworld.txt - The Joy of your World by MC Paul Barman
remotecontrol.txt - Remote Control by The Beastie Boys
The paper describing this work is available under the Creative Commons Share-Alike license, and the code is available under the GPL. If you have any questions about licensing or appropriate use of these items, please contact the author (see below).
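To try RhymeFinder on lyrics other than the bundled samples, the text first has to be reduced to space-separated words with no punctuation. The following is a minimal sketch of such a cleanup step, assuming a modern .NET SDK (C# 9 or later, top-level statements); the Normalize helper is hypothetical and not part of the RhymeFinder source. It drops apostrophes (so "don't" becomes "dont") and turns every other non-alphanumeric character, including line breaks, into a single space.

```csharp
// Hypothetical helper, not part of the RhymeFinder source: reduces raw lyrics to
// the format RhymeFinder expects (space-separated words, no punctuation).
// Assumes a modern .NET SDK (C# 9+ top-level statements).
using System;
using System.IO;
using System.Text;

// Usage: pass an input path and an output path; with no arguments, run a short demo.
if (args.Length == 2)
    File.WriteAllText(args[1], Normalize(File.ReadAllText(args[0])));
else
    Console.WriteLine(Normalize("I said a hip-hop, the hippie, the hippie,\nto the hip, hip-hop, and you don't stop"));

static string Normalize(string text)
{
    var sb = new StringBuilder(text.Length);
    foreach (char c in text)
    {
        if (c == '\'' || c == '\u2019') continue;      // drop apostrophes: "don't" -> "dont"
        sb.Append(char.IsLetterOrDigit(c) ? c : ' ');  // other punctuation, line breaks -> space
    }
    // Split on spaces and rejoin so words are separated by exactly one space.
    return string.Join(" ", sb.ToString().Split(' ', StringSplitOptions.RemoveEmptyEntries));
}
```

Stripping apostrophes can produce spellings (such as "dont") that the CMU Pronouncing Dictionary may not list, since it spells many contractions with the apostrophe; keeping them instead is a one-line change if your input works better that way.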
The author would like to thank the people at the CMU Pronouncing Dictionary for their invaluable tool. The author would also like to thank Nick Shawver for his consultations on the project.
This is an experimental student project, and I am unable to provide detailed support for the source code; if you wish to modify it, or if you are having trouble with it, you're pretty much on your own. However, I can answer simple questions about the code; I can be contacted at bradleybuda {at} gmail {dot} com.
Copyright Bradley Buda, 2004