Perceptually Uniform Pitch-Loudness Scales for Data Sonification
Musical scales are based on perceptual pitch intervals that correspond to frequency ratios, not differences (e.g., a ratio of 2.0 gives the pitch interval called the “octave”). The decibel (dB) scale is a log scale for sound intensity or pressure, on which a pressure ratio of 2.0 corresponds to about 6 dB (and an intensity ratio of 2.0 to about 3 dB).
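As a quick check of those ratios, here is a minimal Python sketch. The function names are my own, chosen for illustration, and the pitch conversion assumes the usual convention of 12 equal-tempered semitones per octave.

```python
import math

def ratio_to_semitones(ratio):
    """Pitch interval, in equal-tempered semitones, for a frequency ratio."""
    return 12 * math.log2(ratio)

def pressure_ratio_to_db(ratio):
    """Level difference in dB for a sound-pressure ratio."""
    return 20 * math.log10(ratio)

def intensity_ratio_to_db(ratio):
    """Level difference in dB for a sound-intensity (power) ratio."""
    return 10 * math.log10(ratio)

print(ratio_to_semitones(2.0))               # one octave = 12 semitones
print(round(pressure_ratio_to_db(2.0), 2))   # about 6.02 dB
print(round(intensity_ratio_to_db(2.0), 2))  # about 3.01 dB
```

Doubling the pressure quadruples the intensity, which is why the same doubling reads as roughly 6 dB for pressure and roughly 3 dB for intensity.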
One of the challenges in producing a good visualization is the construction of appropriate visual scales that do not bias the interpretation of the data. When constructing color scales, for example, it’s important to base them on how we perceive the colors, not simply on raw RGB intensity levels or raw hue values. (See my previous post, as well as Gregor Aisch’s article on the subject.)
An analogous problem in sonification is the construction of perceptually linear frequency and intensity scales. Directly mapping data to frequency and intensity (sound pressure level) produces misleading results. To begin with, frequency perception is logarithmic and best represented as pitch (frequency on a log scale).[1] But not only is intensity perception logarithmic (hence the logarithmic decibel scale), it is also highly frequency dependent. Consider this map of perceived loudness as a function of frequency and intensity (data from the current ISO standard, derived from multiple empirical studies):
Phon levels are perceptually equal loudness levels across frequencies. Phon differences represent perceptually equal loudness differences. At 1000 Hz, the phon and dB scales are identical. For a 100 Hz tone and a 1000 Hz tone to sound equally loud, the 100 Hz tone must have a much higher SPL, since the human ear is much less sensitive to low frequencies.
Figure 1. ISO226:2003 Equal loudness contours. Produced with Matlab and Plotly.
Take a hearing test like the ones used to collect data for those phon curves.
In visual perception, perceived brightness depends on frequency (hue) and not just on light intensity. Likewise, in auditory perception, perceived loudness depends strongly on frequency and not just on sound intensity. That means that if we simply construct a linear pitch scale at a constant intensity (or sound pressure, or “gain” in digital audio terms), some pitches will sound much louder than others, making them seem “more important” and biasing the interpretation of the data. So we need to construct a perceptually uniform “pitch-loudness space” analogous to a perceptually uniform “color space” (such as CIELAB).
This 2D pitch-loudness space based on ISO226:2003 shows equal perceived differences in pitch on the X axis (as log Frequency), and equal perceived differences in loudness (phons) on the Y axis. The colored contours show the SPL required to produce a desired loudness (phon) level at that frequency.
Let’s hear that. First, here’s a sine tone sweeping from low to high across the audible frequency range, at a single Sound Pressure Level (dB). This sound would be a horizontal line on the first graph, and a contour line on the second graph. Notice how the sound is hard to hear at first (in the low range) and at the end (high range).
Listen with decent headphones in a quiet room for best results. Turn the volume up to a comfortably loud level.
31.5 Hz to 12,500 Hz, equal SPL:
31.5 Hz to 12,500 Hz at equal loudness (using the 80 phon equal-loudness contour):
Notice that the low and high sounds are now easier to hear. If you listen on good “reference” speakers in a recording studio environment, the sine tone should sound equally loud across the whole range.
Using that second graph of pitch-loudness space, you can now imagine constructing more complex scales, such as a diagonal line that goes from low+loud to high+soft, or divergent scales analogous to these divergent color scales (code by Gregor Aisch). (We might want to take into account the Sone scale of loudness to construct certain scales in 2D pitch-loudness space, but that’s a topic for another day…)
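To make the “diagonal line” idea concrete, here is a minimal Python sketch that maps a data value onto a point in pitch-loudness space. The function name, its parameters and the default endpoints (110 Hz at 80 phon to 1760 Hz at 50 phon) are all hypothetical choices for illustration. Pitch is interpolated on a log-frequency axis and loudness linearly in phons; turning the resulting phon level into an SPL still requires the ISO226 equal-loudness data.

```python
def diagonal_scale(value, vmin, vmax,
                   f_low=110.0, f_high=1760.0,
                   phon_at_low=80.0, phon_at_high=50.0):
    """Map a data value to (frequency in Hz, loudness in phons).

    The scale runs along a straight line in pitch-loudness space,
    from low+loud (f_low, phon_at_low) to high+soft (f_high, phon_at_high).
    """
    # Normalise the data value to 0..1 and clamp out-of-range input.
    t = (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)
    # Equal steps in t give equal frequency ratios (equal pitch steps).
    freq = f_low * (f_high / f_low) ** t
    # Equal steps in t give equal phon differences.
    phon = phon_at_low + (phon_at_high - phon_at_low) * t
    return freq, phon

for v in (0, 5, 10):
    print(v, diagonal_scale(v, 0, 10))
```

A divergent scale could be sketched the same way by running two such lines outward from a shared midpoint.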
Note: All of this is only valid for sounds that are “pure tones” (i.e. sine tones). If we want to construct objective, linear scales for sonification using complex tones we will need a higher-dimensional timbre-space, and one that is perceptually uniform—something like this (Hiroko et al. “Perceptual distance in timbre space”) or this (Hoffman et al. “Feature-Based Synthesis for Sonification and Psychoacoustic Research”). But that’s a topic for yet another day.
The ISO226 standard (2003 revision) is currently the most widely accepted representation of perceptually uniform pitch-loudness space (for sine tones). It is a commercial standard, and you need to purchase it in order to read the complete document. Here are a couple of tools that implement it:
There is a good Matlab function called iso226(), written by Christopher Hummersone, that outputs SPL in dB for given frequency and phon level(s). The function implements the mathematics in ISO226:2003 and uses Matlab’s shape-preserving pchip method for interpolation.
For my own purposes, I wrote an abstraction for Cycling74’s Max environment called can.phon2dB that does the same thing, using a pre-calculated lookup table. The lookup table is in Cycling74’s “jitter matrix” format and is well suited for real-time applications. (I’m using it to build a sonification of microbial populations; more on that in a later post…)
can.phon2dB.zip — The zip file contains 3 files:
- can.phon2dB.maxpat - the abstraction
- can.phon2dB.maxhelp - the help patch
- iso226-2003.jxf.jit - the matrix file (64MB)
The .jxf.jit file must be in your search path. This lookup table has a resolution of 1 Hz and 0.25 phon, which is about the same as or less than the “just-noticeable difference” for pitch and loudness, respectively.
The Mathematics: Formula for deriving SPL from Phons
Read on if you’re interested in the math involved in constructing scales from the equal-loudness curves, or in practical tools for sonification. The formula is taken from ISO226.
The sound pressure level Lp of a sine tone at frequency f at perceived loudness level LN is:
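$$L_p = \left(\frac{10}{\alpha_f}\cdot\log_{10} A_f\right)\ \text{dB} - L_U + 94\ \text{dB}$$

where

$$A_f = 4.47\times10^{-3}\cdot\left(10^{0.025\,L_N} - 1.15\right) + \left[0.4\cdot 10^{\left(\frac{T_f + L_U}{10} - 9\right)}\right]^{\alpha_f}$$

and: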
- Tf is the threshold of hearing for frequency f, in dB;
- αf is the exponent for loudness perception at frequency f;
- LU is a magnitude of the linear transfer function normalized at 1000 Hz;
The values of Tf, αf and LU are given in the following table:
| Frequency f (Hz) | Loudness perception exponent αf | Transfer function magnitude LU (dB) | Threshold of hearing Tf (dB) |
|---|---|---|---|
| 20 | 0.532 | -31.6 | 78.5 |
| 25 | 0.506 | -27.2 | 68.7 |
| 31.5 | 0.480 | -23.0 | 59.5 |
| 40 | 0.455 | -19.1 | 51.1 |
| 50 | 0.432 | -15.9 | 44.0 |
| 63 | 0.409 | -13.0 | 37.5 |
| 80 | 0.387 | -10.3 | 31.5 |
| 100 | 0.367 | -8.1 | 26.5 |
| 125 | 0.349 | -6.2 | 22.1 |
| 160 | 0.330 | -4.5 | 17.9 |
| 200 | 0.315 | -3.1 | 14.4 |
| 250 | 0.301 | -2.0 | 11.4 |
| 315 | 0.288 | -1.1 | 8.6 |
| 400 | 0.276 | -0.4 | 6.2 |
| 500 | 0.267 | 0.0 | 4.4 |
| 630 | 0.259 | 0.3 | 3.0 |
| 800 | 0.253 | 0.5 | 2.2 |
| 1000 | 0.250 | 0.0 | 2.4 |
| 1250 | 0.246 | -2.7 | 3.5 |
| 1600 | 0.244 | -4.1 | 1.7 |
| 2000 | 0.243 | -1.0 | -1.3 |
| 2500 | 0.243 | 1.7 | -4.2 |
| 3150 | 0.243 | 2.5 | -6.0 |
| 4000 | 0.242 | 1.2 | -5.4 |
| 5000 | 0.242 | -2.1 | -1.5 |
| 6300 | 0.245 | -7.1 | 6.0 |
| 8000 | 0.254 | -11.2 | 12.6 |
| 10,000 | 0.271 | -10.7 | 13.9 |
| 12,500 | 0.301 | -3.1 | 12.3 |
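The formula and table can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the tools mentioned earlier: the function and variable names are my own, and the intermediate term A_f is written out as I understand it from ISO226:2003 (A_f = 4.47e-3 · (10^(0.025·LN) − 1.15) + [0.4 · 10^((Tf + LU)/10 − 9)]^αf), so check it against the standard before relying on it. The sketch only handles the 29 tabulated frequencies; anything in between needs interpolation.

```python
import math

# Parameters transcribed from the table above (29 tabulated frequencies).
FREQS = [20, 25, 31.5, 40, 50, 63, 80, 100, 125, 160, 200, 250, 315, 400,
         500, 630, 800, 1000, 1250, 1600, 2000, 2500, 3150, 4000, 5000,
         6300, 8000, 10000, 12500]
ALPHA_F = [0.532, 0.506, 0.480, 0.455, 0.432, 0.409, 0.387, 0.367, 0.349,
           0.330, 0.315, 0.301, 0.288, 0.276, 0.267, 0.259, 0.253, 0.250,
           0.246, 0.244, 0.243, 0.243, 0.243, 0.242, 0.242, 0.245, 0.254,
           0.271, 0.301]
L_U = [-31.6, -27.2, -23.0, -19.1, -15.9, -13.0, -10.3, -8.1, -6.2, -4.5,
       -3.1, -2.0, -1.1, -0.4, 0.0, 0.3, 0.5, 0.0, -2.7, -4.1, -1.0, 1.7,
       2.5, 1.2, -2.1, -7.1, -11.2, -10.7, -3.1]
T_F = [78.5, 68.7, 59.5, 51.1, 44.0, 37.5, 31.5, 26.5, 22.1, 17.9, 14.4,
       11.4, 8.6, 6.2, 4.4, 3.0, 2.2, 2.4, 3.5, 1.7, -1.3, -4.2, -6.0,
       -5.4, -1.5, 6.0, 12.6, 13.9, 12.3]

def phon_to_spl(phon, freq):
    """SPL (dB) of a sine tone at a tabulated frequency and loudness level (phon)."""
    i = FREQS.index(freq)  # raises ValueError for non-tabulated frequencies
    alpha, l_u, t_f = ALPHA_F[i], L_U[i], T_F[i]
    # Intermediate term A_f (assumed form, see the note above).
    a_f = (4.47e-3 * (10 ** (0.025 * phon) - 1.15)
           + (0.4 * 10 ** ((t_f + l_u) / 10 - 9)) ** alpha)
    return (10 / alpha) * math.log10(a_f) - l_u + 94

def equal_loudness_contour(phon):
    """List of (frequency, SPL) pairs for one phon level."""
    return [(f, phon_to_spl(phon, f)) for f in FREQS]

for f, spl in equal_loudness_contour(40):
    print(f, round(spl, 1))
```

At 1000 Hz the result should land very close to the phon value itself, which is a handy sanity check on the transcription.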
To produce smooth curves, we must interpolate between the 29 data points. The graphs and software tools in this post all interpolate using Matlab’s built-in pchip method, a shape-preserving Piecewise Cubic Hermite Interpolating Polynomial (PCHIP). It is similar to Bézier curve interpolation, but with a tighter-fitting curve that does not “overshoot” the actual data points, so that the curve’s shape is preserved.
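For readers without Matlab, the idea behind shape-preserving cubic Hermite interpolation can be sketched in plain Python. This is a simplified illustration and not Matlab’s implementation: interior slopes use a weighted harmonic mean of the neighbouring secant slopes (zero at a local extremum), and the end slopes are simply set to the end secants, which is my own simplification. The sample data are the threshold-of-hearing values from the table between 500 Hz and 2000 Hz, interpolated on a log-frequency axis.

```python
import bisect
import math

def pchip_slopes(xs, ys):
    """Knot slopes that keep the cubic from overshooting the data."""
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    delta = [(ys[i + 1] - ys[i]) / h[i] for i in range(n - 1)]
    d = [0.0] * n
    d[0], d[-1] = delta[0], delta[-1]  # simplified end conditions (assumption)
    for i in range(1, n - 1):
        if delta[i - 1] * delta[i] > 0:  # same sign: weighted harmonic mean
            w1 = 2 * h[i] + h[i - 1]
            w2 = h[i] + 2 * h[i - 1]
            d[i] = (w1 + w2) / (w1 / delta[i - 1] + w2 / delta[i])
        # otherwise the knot is a local extremum and the slope stays 0
    return d

def pchip_eval(xs, ys, d, x):
    """Evaluate the piecewise cubic Hermite interpolant at x."""
    i = min(max(bisect.bisect_right(xs, x) - 1, 0), len(xs) - 2)
    h = xs[i + 1] - xs[i]
    t = (x - xs[i]) / h
    h00 = (1 + 2 * t) * (1 - t) ** 2
    h10 = t * (1 - t) ** 2
    h01 = t * t * (3 - 2 * t)
    h11 = t * t * (t - 1)
    return h00 * ys[i] + h10 * h * d[i] + h01 * ys[i + 1] + h11 * h * d[i + 1]

freqs = [500, 630, 800, 1000, 1250, 1600, 2000]
thresholds = [4.4, 3.0, 2.2, 2.4, 3.5, 1.7, -1.3]  # Tf in dB, from the table
xs = [math.log10(f) for f in freqs]
slopes = pchip_slopes(xs, thresholds)
print(round(pchip_eval(xs, thresholds, slopes, math.log10(900)), 2))
```

Because every slope is capped relative to the adjacent secants, each segment stays between its two end values, which is the “no overshoot” property described above.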
The ISO226 specification also provides the formula for deriving Phons from SPL, which uses the same tables for Tf, αf and LU.
[1] There is another, empirically derived scale for frequency perception called the Mel scale, which in some contexts can be more appropriate, but the ordinary logarithmic frequency scale is probably best in most cases. The Mel scale seems to capture the way we perceive frequency best in the absence of any “musical” context. But if we listen to linear scales constructed from small equal steps (where we can easily compare each interval with the adjacent intervals), then scales based on the simple mathematical definition of pitch (equal frequency ratios = equal perceptual difference) produce more even results than ones based on the Mel scale, especially for anyone with musical experience. Listen to an example of the Mel scale here if you’d like to judge for yourself: http://www.sfu.ca/sonic-studio/handbook/Mel.html
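To see how the two scales differ numerically, here is a small Python sketch. It assumes one commonly used analytic approximation of the Mel scale, m = 2595 · log10(1 + f/700); that formula is not from this post, and other variants exist. The function name is my own.

```python
import math

def hz_to_mel(f):
    """Approximate Mel value for a frequency in Hz (one common formula, assumed)."""
    return 2595 * math.log10(1 + f / 700)

# Successive octaves: equal steps on a log-frequency (pitch) scale.
octaves = [110, 220, 440, 880, 1760]
mel_steps = [hz_to_mel(b) - hz_to_mel(a) for a, b in zip(octaves, octaves[1:])]

for (a, b), step in zip(zip(octaves, octaves[1:]), mel_steps):
    print(f"{a} Hz -> {b} Hz: {step:.1f} mel")
```

Under this approximation the octave steps are unequal in mels and grow with frequency, so a scale built from equal Mel steps and one built from equal frequency ratios will not line up.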