Raw harmonies

If you find transmediation and glitch art interesting and you haven’t read Masuma Ahuja and Denise Lu’s piece “What Paris looks like with an echo” in the Washington Post, I highly suggest you do. In it, Masuma and Denise apply audio effects to images, like this one of the Eiffel Tower with an echo, to create several transmediated visuals.

Paris with an echo effect (source).

Moreover, the authors give one of the most comprehensive explanations I have been able to find about databending:

“All types of media — whether it’s an image, a video or even a sound file — contains raw data. This is encoded information which has data that your computer reads and translates into the appropriate format. For an image, this includes information about pixels and color, hue and contrast.

But what happens when you edit this information in an editor intended for another file type?”

“This practice — of manipulating data in an editor traditionally used to edit media of another format — is databending.” — Masuma Ahuja & Denise Lu [2014]
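To make the idea concrete, here is a minimal Python sketch of databending by hand: it takes any bytes and wraps them in a WAV container as unsigned 8-bit audio, using only the standard library's wave module. The function name bytes_to_wav and the stand-in data are hypothetical (they are not from the Washington Post piece), and an audio editor's raw import may treat the details differently.

```python
import io
import wave


def bytes_to_wav(raw: bytes, sample_rate: int = 44100, channels: int = 1) -> bytes:
    """Wrap arbitrary bytes in a WAV container as unsigned 8-bit PCM."""
    # Drop trailing bytes so the data holds a whole number of frames.
    usable = len(raw) - (len(raw) % channels)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(1)  # 1 byte per sample, i.e. 8-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(raw[:usable])
    return buf.getvalue()


# Stand-in for the bytes of an image file (made-up data, not a real image).
fake_image = bytes(range(256)) * 4
wav_bytes = bytes_to_wav(fake_image)
```

With a real file you would read its bytes first (for example with open(path, "rb").read()) and write the result to disk with a .wav extension.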

Because virtually any form of media can be subjected to databending (image to sound, text to sound, text to image, etc.), the potential for fun is vast. In their article, Masuma and Denise affirm that an image opened in an audio editor produces “atonal white noise.” In fact, Denise published the audio file of the image on SoundCloud. I have put it here for you to listen to:

Paris — Databending, by Denise Lu.

I found these sounds fascinating and beautiful. Maybe I am crazy, but I definitely noticed tonal variations in Paris, enough to become convinced that there had to be common patterns between images and transmediated audio files. In fact, I convinced myself that such patterns would have to be stable enough for one to be able to “draw music” at the raw file level from an image editor. That’s exactly what Dr. Kedrick James and I tried to find out. In Raw Harmonies… (just accepted in Leonardo, MIT Press), Kedrick and I provide some insights on the common patterns between visuals and audio at the raw level. These patterns relate to file size, number of channels, tone, rhythm and envelope. The next image is an early experiment in applying such patterns:

Duotone, the image
“Duotone”. In Stereo, signed 8-bit, little-endian. Again, mind the volume!
“Duotone2”. Same raw file, different integers: unsigned 8-bit, little-endian.
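The difference between the two versions comes down to how the same bytes are read. As a small sketch (the byte values are made up for illustration, and the scaling to the range from -1 to 1 is the usual convention for 8-bit PCM, which I am assuming the audio editor follows):

```python
# The same six bytes, as they might sit in a raw image file (illustrative values).
raw = bytes([0, 64, 127, 128, 192, 255])

# Unsigned 8-bit: values run 0..255 and silence sits at 128.
unsigned = list(raw)
unsigned_amp = [(u - 128) / 128 for u in unsigned]

# Signed 8-bit: bytes above 127 wrap around to negative values, silence sits at 0.
signed = [b - 256 if b > 127 else b for b in raw]
signed_amp = [s / 128 for s in signed]

print(signed)        # [0, 64, 127, -128, -64, -1]
print(unsigned_amp)  # [-1.0, -0.5, -0.0078125, 0.0, 0.5, 0.9921875]
print(signed_amp)    # [0.0, 0.5, 0.9921875, -1.0, -0.5, -0.0078125]
```

A black pixel (0) is the bottom of the wave in the unsigned reading but the silent midpoint in the signed one, so the same drawing produces a differently shaped waveform.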

The sample above is a 10-second stereo bichord of A and C during the first 5 seconds and A and D during the last 5. A is constant on the left channel, while the right channel changes from C to D around the 5-second mark (try using headphones or earbuds and comparing the sound on each side). The sample also has a (sort of) tremolo effect. The file was drawn entirely in Photoshop CS6, saved as .raw, and imported into Audacity. Granted, it is not the prettiest thing you’ve ever heard, but it serves its purpose. Now, let me break it down for you:


Each audio sample is translated as a pixel. One second of audio at 44.1 kHz can be translated into a square of 210 × 210 pixels (210 × 210 = 44,100). So, to produce a 10-second stereo audio file at 44.1 kHz, we multiplied 44,100 × 10 and then distributed the resulting 441,000 samples in a two-dimensional frame (as I did for the example above). The easiest way, although it is not exact, is to take the square root of 441,000, which gives a square image of 664 by 664 pixels in 8-bit (664 × 664 = 440,896, so 104 samples are left over). If you want exact proportions, you can use a factoring calculator. For example, for 441,000 samples, 600 × 735 or 630 × 700 would work much better.
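If you don't have a factoring calculator at hand, the arithmetic above can be sketched in a few lines of Python. The function name frame_sizes is just an illustrative choice; the numbers are the ones from the example.

```python
import math


def frame_sizes(n_samples: int) -> list[tuple[int, int]]:
    """Return every (width, height) with width <= height and width * height == n_samples."""
    pairs = []
    for width in range(1, math.isqrt(n_samples) + 1):
        if n_samples % width == 0:
            pairs.append((width, n_samples // width))
    return pairs


sample_rate = 44_100
seconds = 10
n = sample_rate * seconds       # 441,000 samples

side = math.isqrt(n)            # 664, the inexact square
leftover = n - side * side      # 104 samples that do not fit in 664 x 664

pairs = frame_sizes(n)
closest_to_square = pairs[-1]   # (630, 700)

print(side, leftover, closest_to_square)
```

The last pair in the list is always the exact frame closest to a square, which is handy if you want to keep the canvas easy to work with.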


There are very specific patterns in the translation between image and audio channels, and one has to be careful with them. The easiest way to produce a stereo file from an image editor is to work on each channel separately and then combine them in a duotone image. Not an RGB image that resembles a duotone, but an actual one: an image with only two channels. It is possible to generate a stereo file from an RGB image, of course, but when translating a 3-channel image to a 2-channel audio track, the tone will be affected. The data is simply redistributed from three channels to two, so if one calculates the tone based on three channels with a value of 1, that value will become 1.5 when the channels are reduced to two.

We call this “the principle of data conservation”…
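Here is a back-of-the-envelope sketch of where that 1.5 comes from. It assumes 8-bit data with the channels interleaved byte by byte, which is my assumption about how the raw file is laid out; the function name regrouped_length is hypothetical.

```python
def regrouped_length(units: int, channels_in: int, channels_out: int) -> float:
    """Length of a pattern after the same bytes are regrouped into another channel count.

    Assumes 8-bit data, i.e. one byte per channel value, so no byte is added or lost.
    """
    total_bytes = units * channels_in
    return total_bytes / channels_out


print(regrouped_length(1, 3, 2))    # 1.5
print(regrouped_length(100, 3, 2))  # 150.0
print(regrouped_length(100, 2, 2))  # 100.0
```

The bytes are all still there, they are just spread over fewer channels, so anything measured per pixel in the three-channel image gets stretched by a factor of 1.5 in the two-channel audio.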

To prevent any change in the tone, you can alternatively open the image in the audio editor as a 3-channel audio file. This will produce a left, a right and a mono channel, but much of that information will be lost when you render the audio file to stereo. For this example, we produced a duotone of cyan and magenta.


Tone is my favourite pattern. When the samples in an audio file are transformed into pixels, the image editor arranges them from left to right, top to bottom. Tones are produced by drawing arrays of dark and light pixels in a similar way in which sound pressure is usually represented:

Representation of sound pressure and sound waves. (Source)

The number of pixels in these arrays can be calculated by dividing the sampling frequency by the tonal frequency of each note. For example, if the sampling frequency is 44.1 kHz and you want to draw A4, you should divide 44,100 by 440. Because there is no way to divide a pixel, the resulting 100.22 has to be rounded down to 100. An A4 square wave would be drawn by arraying 100 dark pixels (over 50% black) and then 100 light pixels (under 50% black); a sine wave would be a gradient from grey to black to grey to white to grey in 200 pixels. The volume of the sample is determined by the contrast between the positive and the negative half cycles of the wave. To draw the sample above, we created a pattern of 100 dark pixels and 100 light pixels (a square wave) over one channel, with enough slight variations to create the wavy effect. For the other channel, we performed the same calculations to determine the distribution of pixels in C5 (44,100 ÷ 523.25 = 84[.28]) and D5 (44,100 ÷ 587.33 = 75[.08]).
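The same recipe can be sketched in Python for anyone who would rather generate the pixels than draw them. This is only an approximation of what we did by hand in Photoshop: the function names are hypothetical, the pixel values 0 and 255 are an assumption that gives maximum contrast, and the slight variations that produce the tremolo are left out. The frequencies are the standard-tuning values used above (587.33 Hz is the D above C5).

```python
import math

SAMPLE_RATE = 44_100
NOTES_HZ = {"A4": 440.00, "C5": 523.25, "D5": 587.33}


def run_length(freq_hz: float, sample_rate: int = SAMPLE_RATE) -> int:
    """Pixels per dark (or light) array: sampling frequency / note frequency, rounded down."""
    return math.floor(sample_rate / freq_hz)


def square_pattern(run: int, total: int, dark: int = 0, light: int = 255) -> bytes:
    """Alternate `run` dark pixels and `run` light pixels until `total` pixels are filled."""
    cycle = bytes([dark]) * run + bytes([light]) * run
    repeats = total // len(cycle) + 1
    return (cycle * repeats)[:total]


runs = {name: run_length(hz) for name, hz in NOTES_HZ.items()}
print(runs)  # {'A4': 100, 'C5': 84, 'D5': 75}

# One channel of a 630 x 700 frame, filled left to right, top to bottom.
width, height = 630, 700
channel = square_pattern(runs["A4"], width * height)
rows = [channel[i:i + width] for i in range(0, len(channel), width)]
```

Lowering the contrast (say, 96 and 160 instead of 0 and 255) should give a quieter version of the same pattern, since the volume follows the contrast between the dark and light arrays.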

These are not the only patterns, but they are certainly the most relevant ones, or at least the ones that would allow anyone to start experimenting with transmediation through raw data. By the way, you don’t have to use Photoshop. GIMP works as well (in many ways even better), but it has a different raw data format (.data as opposed to .raw), which adds an extra step between the image editor and the audio editor.

Now what?

Well, we’ll see… we will probably try to produce a few pieces for exhibition or prepare a workshop, we don’t know yet. For the time being, we are having a blast. We want to see how far we can get with this, and we hope that other people will start experimenting with this form of transmediation. The paper will be formally published in 2020 (maybe), but the first online version is available here. If you are interested in other possible implications and uses of this kind of approach, you should take a look at Pixualization…


