Realtime Modification Of Sound Data In The Frequency Domain
Situation
One cannot help but notice that the vast majority of interesting effects one can perform upon audio occur within the frequency domain. There are currently a great many pieces of hardware, and some pieces of software, which perform audio functions in the frequency domain, but few operate in realtime. Additionally, the quantity of open-source software devoted to realtime audio processing is slim. It would seem to be of great value to the computing community at large to develop an architecture, preferably platform-independent, capable of converting waveform audio to frequency data; manipulating, transforming, analyzing, and mixing that data; and producing waveform audio output, all in realtime with minimal latency.
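The core round trip such an architecture needs can be sketched in a few lines. The following is a minimal, illustrative sketch only: a naive O(N²) DFT/inverse-DFT pair stands in for the FFT a real implementation would use, and all names are my own invention, not part of any proposed API.

```python
import cmath
import math

def dft(frame):
    """Naive discrete Fourier transform (a real system would use an FFT)."""
    N = len(frame)
    return [sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N)
                for n in range(N)) for k in range(N)]

def idft(spectrum):
    """Inverse transform: frequency bins back to waveform samples."""
    N = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / N)
                for k in range(N)).real / N for n in range(N)]

# One 64-sample frame of a 440 Hz test tone at an 8 kHz sample rate.
N, sr = 64, 8000
frame = [math.sin(2 * math.pi * 440 * n / sr) for n in range(N)]
spectrum = dft(frame)      # waveform -> frequency domain
rebuilt = idft(spectrum)   # frequency domain -> waveform
error = max(abs(a - b) for a, b in zip(frame, rebuilt))
```

Because the transform is exactly invertible, any frame that is analyzed and immediately resynthesized comes back essentially unchanged; the interesting work happens in whatever manipulation is inserted between the two transforms.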
Utility
Strategic Use #1: Mixing MP3 streams
Coupled with an MP3 decompression engine that could take snippets of MP3s and expose their source frequency data, this system could easily allow equalization and pitch-shifting to be applied to individual streams, multiple streams to be mixed together with global filters and equalization applied, and a final audio stream to be output to the sound card. Such a system could feasibly form the basis of an "MP3 DJ" project, and would prove much more efficient and clean than converting the MP3 data to waveform data, back into frequency data for channel equalization, back into waveform data for mixing, back into frequency data for global equalization, and then finally back again into waveform output for the sound card.
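The efficiency claim can be illustrated in miniature: once each stream's frame is in the frequency domain, per-stream equalization is a per-bin multiply, mixing is a per-bin add, and only one inverse transform is needed for the combined output. This is a hypothetical sketch, with a naive DFT standing in for the decoder's own frequency data and invented EQ curves:

```python
import cmath
import math

def dft(frame):
    N = len(frame)
    return [sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N)
                for n in range(N)) for k in range(N)]

def idft(spectrum):
    N = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / N)
                for k in range(N)).real / N for n in range(N)]

N, sr = 64, 8000
stream_a = [math.sin(2 * math.pi * 500 * n / sr) for n in range(N)]
stream_b = [math.sin(2 * math.pi * 1500 * n / sr) for n in range(N)]

# Illustrative per-stream EQ curves: boost stream A's lows, cut stream B's.
# (Symmetric over the full spectrum so the mixed output stays real.)
eq_a = [1.5 if min(k, N - k) < N // 4 else 1.0 for k in range(N)]
eq_b = [0.5 if min(k, N - k) < N // 4 else 1.0 for k in range(N)]

A, B = dft(stream_a), dft(stream_b)
mixed_spectrum = [eq_a[k] * A[k] + eq_b[k] * B[k] for k in range(N)]
mixed = idft(mixed_spectrum)   # one inverse transform for the whole mix
```

By the linearity of the transform, this single inverse pass produces exactly the same samples as equalizing and inverse-transforming each stream separately and then adding the waveforms, while doing a fraction of the work per frame.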
Strategic Use #2: Speaker Identification / Voice Recognition
Once given access to the raw incoming frequency data, it might be feasible to base a speaker identification and/or voice recognition system upon this toolkit. Simply picking out the top three frequency peaks (the formants) after applying a vocal-cord filter would give one access to realtime speech data for processing, or even manipulation. (Want to sound like a woman? A man? Something really weird?)
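Peak-picking itself is trivial once a frame is in the frequency domain. The following toy sketch simply returns the strongest bins of one frame; it uses a naive DFT, bin-exact test frequencies, and made-up names, and omits the vocal-cord filtering mentioned above:

```python
import cmath
import math

def dft(frame):
    N = len(frame)
    return [sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N)
                for n in range(N)) for k in range(N)]

def top_peaks(frame, sample_rate, count=3):
    """Return the `count` strongest frequencies (in Hz) in one frame."""
    N = len(frame)
    mags = [abs(x) for x in dft(frame)[:N // 2]]  # positive frequencies only
    bins = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:count]
    return sorted(b * sample_rate / N for b in bins)

# Synthetic "voice": three sinusoids at bin-exact frequencies.
N, sr = 128, 8000   # 62.5 Hz per bin
frame = [math.sin(2 * math.pi * 250 * n / sr)
         + 0.8 * math.sin(2 * math.pi * 750 * n / sr)
         + 0.6 * math.sin(2 * math.pi * 2500 * n / sr)
         for n in range(N)]
peaks = top_peaks(frame, sr)   # -> [250.0, 750.0, 2500.0]
```

A real system would need windowing and interpolation between bins to track formants that fall between bin centers, but the realtime data path is the same.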
Strategic Use #3: Analog -> MIDI conversions (singing pianos)
Dominant frequencies in the incoming data could be mapped to their closest musical notes. One signal could be sent when sufficient energy was found at a given frequency (e.g., F#) and another when that energy was no longer present. One possible representation for these signals would be MIDI messages over a MIDI cable, allowing the human voice (or anything else, for that matter) to act as a MIDI controller. Additionally, intelligent programs could musically analyze the performance and play along with appropriate accompaniment, allowing the computer to assist the performer in (hopefully!) pleasant and customizable ways. This system would also allow for the trivial transcription of sung music, or perhaps eventually of any performed piece.
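The note mapping described above is a simple logarithm under equal temperament (A4 = 440 Hz = MIDI note 69), and the on/off signaling is a threshold crossing. A minimal sketch, with the function names and the energy threshold being my own illustrative choices:

```python
import math

def freq_to_midi(freq_hz):
    """Nearest equal-tempered MIDI note number (A4 = 440 Hz = note 69)."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))

def note_events(frames, threshold=0.5):
    """Emit (note, 'on'/'off') events as per-frequency energy crosses
    a threshold. `frames` is a list of {frequency_hz: energy} dicts."""
    events, sounding = [], set()
    for frame in frames:
        loud = {freq_to_midi(f) for f, e in frame.items() if e >= threshold}
        events += [(n, "on") for n in sorted(loud - sounding)]
        events += [(n, "off") for n in sorted(sounding - loud)]
        sounding = loud
    return events

# A sung F#4 (~370 Hz) appearing, sustaining, then dying away.
frames = [{370.0: 0.9}, {370.0: 0.8}, {370.0: 0.1}]
print(note_events(frames))   # -> [(66, 'on'), (66, 'off')]
```

The resulting event stream maps directly onto MIDI note-on/note-off messages; a real implementation would add hysteresis so that a note hovering near the threshold does not chatter on and off.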
Plan
For my Music 320 project, I would like to implement the beginnings of such an enabling architecture, and I aspire to produce a working demonstration of the system modifying realtime data on either a Linux box or a Windows box by the end of the quarter. My eventual hope is to make it an official OpenSource™ project and release it into the public domain. I will also try to integrate it into MPlayer's custom system as part of a larger realtime Internet-based collaborative-music project.
This plan created on October 12, 1998 by David Weekly.