image2wav.py - what is it?

This program generates a mono 16-bit .wav file from an image file. When the audio is viewed in a spectrograph, it shows a resemblance to the source image.

I wrote this after reading about Aphex Twin placing images in his music this way. I thought software for the task should not be difficult to write, given access to suitable image handling and Fourier transform libraries.
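The idea can be sketched in a few lines of modern Python. This is only an illustration, not the code in image2wav.py: it uses plain additive synthesis (one sine wave per image row) rather than the Fourier transform routines the program includes, and the function name, the tiny hard-coded image, the sample rate and the frequency range are all assumptions made for the example.

```python
import math

def image_to_samples(pixels, sample_rate=8000, column_seconds=0.05,
                     f_low=500.0, f_high=3500.0):
    """Turn a grid of brightness values (0.0 to 1.0) into audio samples.

    pixels[0] is the TOP row of the image, so it maps to the highest
    frequency. Each image column becomes one short slice of time.
    All defaults here are illustrative assumptions.
    """
    rows = len(pixels)
    cols = len(pixels[0])
    per_column = int(sample_rate * column_seconds)
    # One frequency per row, spread linearly from f_high (top) to f_low (bottom).
    if rows > 1:
        freqs = [f_high - (f_high - f_low) * r / (rows - 1) for r in range(rows)]
    else:
        freqs = [f_low]
    samples = []
    for c in range(cols):
        for n in range(per_column):
            t = (c * per_column + n) / sample_rate
            # Brighter pixels contribute a louder sine at their row's frequency.
            value = sum(pixels[r][c] * math.sin(2 * math.pi * freqs[r] * t)
                        for r in range(rows))
            samples.append(value / rows)  # scale so the result stays in -1.0 .. 1.0
    return samples

# A hypothetical 3x4 "image": a line falling from top-left to bottom-right.
image = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
samples = image_to_samples(image)
```

Viewed in a spectrogram, the resulting audio should show a falling line: a high tone first, then a middle tone for two time slices, then a low tone.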

license

The program is copyright duncan@linuxbandwagon.com and released under the GPL 'as is'.

The program was developed with Python 1.5.2 on RedHat Linux 7.3. It should work on any system that Python runs on, although the .wav output code has not been checked for endian issues.
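In current Python the endian question can largely be left to the standard library. The sketch below is not taken from image2wav.py: it hands native-order 16-bit samples to the wave module, which writes the header and swaps the sample bytes itself on big-endian machines so the file comes out little-endian as the WAV format requires. The function name and the 44100 Hz default are assumptions for illustration.

```python
import array
import wave

def write_mono_16bit_wav(path, samples, sample_rate=44100):
    """Write floats in the range -1.0 .. 1.0 as a mono 16-bit WAV file."""
    # 'h' is a signed 16-bit integer in the machine's native byte order.
    ints = array.array(
        'h',
        (int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples),  # clip, then scale
    )
    with wave.open(path, 'wb') as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        # wave expects native-order data and byte-swaps on big-endian hosts.
        wav.writeframes(ints.tobytes())
```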

Update: 17/10/2004: After this project had sat idle for ages, I tweaked it so that it works with newer versions of Python and the Python Imaging Library.

Update: 16/6/2014: The source code is now on GitHub.

requirements

The fast Fourier transform functions I used are included in the program, but they came from Recipes In Python.

To install Python Imaging Library on Debian, do:
apt-get install python-imaging

download

examples of usage

original image

Example input image: in this case Jintara Punlarp, superstar of Mor Lam music from northeastern Thailand.

Run the command:

image2wav.py jintara.png
which will create the .wav file jintara.png.wav.

We can then generate a spectrogram of this audio file, in this case using the spectromatic program:

spectromatic --step 1024 --logarithmic  --window 1024 --inverse jintara.png.wav
This produces the following output spectrogram (the command-line parameters were chosen to give a good aspect ratio):

converted back image
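For readers without spectromatic, the windowing that the --window and --step options appear to control can be sketched as follows. This is a guess at the general technique, not spectromatic's implementation: it slides a fixed-size window over the samples and takes the magnitude of a naive discrete Fourier transform of each window. The function names are made up for the example, and the naive transform is slow for a real 1024-sample window, where an FFT would normally be used.

```python
import cmath
import math

def dft_magnitudes(window):
    """Magnitudes of the first half of a naive DFT (bins 0 .. N/2 - 1)."""
    n = len(window)
    return [abs(sum(window[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]

def spectrogram(samples, window_size, step_size):
    """One list of bin magnitudes per window position (one image column each)."""
    return [dft_magnitudes(samples[start:start + window_size])
            for start in range(0, len(samples) - window_size + 1, step_size)]

# A test tone that sits exactly on bin 5 of a 32-sample window.
tone = [math.sin(2 * math.pi * 5 * t / 32) for t in range(64)]
columns = spectrogram(tone, window_size=32, step_size=32)
# Index of the strongest bin in each column.
loudest = [col.index(max(col)) for col in columns]
```

With the step equal to the window size, as in the command above, the windows do not overlap and each one contributes a single column of the picture.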

effects of mp3

Interestingly, converting the above .wav file to mp3 with the LAME encoder and then taking the spectrogram again reveals:

lame jintara.png.wav
LAME version 3.96 MMX  (http://www.mp3dev.org/)
CPU features: MMX (ASM used), 3DNow! (ASM used), SSE
Using polyphase lowpass filter, transition band: 16538 Hz - 17071 Hz
Encoding jintara.png.wav to jintara.png.wav.mp3
Encoding as 44.1 kHz  64 kbps single-ch MPEG-1 Layer III (11x) qval=3
    Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA
   371/373    (99%)|    0:00/    0:00|    0:00/    0:00|   13.650x|    0:00
average:  64.0 kbps

Writing LAME Tag...done
ReplayGain: +17.6dB
spectromatic --step 1024 --logarithmic  --window 1024 --inverse jintara.png.wav
samples = 428032, step_size = 1024, window_size = 1024

converted to mp3 and back
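One effect can be estimated from the LAME log above, which reports a lowpass filter with a transition band of 16538 Hz to 17071 Hz. As a rough worked example, assuming the file is sampled at 44.1 kHz and the spectrogram uses a 1024-sample window with a linear frequency axis (the --logarithmic option may change how this is drawn), the following works out which spectrogram rows lie above the start of that filter:

```python
sample_rate = 44100      # Hz, assumed from the "Encoding as 44.1 kHz" log line
window_size = 1024       # samples, from --window 1024
lowpass_start = 16538    # Hz, start of the transition band in the LAME log

bin_width = sample_rate / window_size           # Hz covered by one spectrogram row
usable_bins = window_size // 2                  # rows from 0 Hz up to the Nyquist limit
first_cut_bin = int(lowpass_start / bin_width)  # first row touched by the filter
fraction_cut = 1 - first_cut_bin / usable_bins  # share of the frequency range affected

print(f"{bin_width:.2f} Hz per row; rows from {first_cut_bin} of {usable_bins} "
      f"upward are attenuated (about {fraction_cut:.0%} of the frequency range)")
```

Under those assumptions, about the top quarter of the frequency range falls above the filter, so image detail placed there would be expected to fade after mp3 encoding.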

spectromatic copy

As the spectromatic home page appears to be unavailable at the moment, here is a local copy of this most handy program: