D3, written by Michael Bostock with Vadim Ogievetsky and Jeffrey Heer at the Stanford Visualization Group (now the University of Washington Interactive Data Lab), is a visualization library now widely used as the basis for many of the most powerful online visualizations. According to Heer, it is intentionally designed as a low-level system.
During the early design of D3, we even referred to it as a “visualization kernel” rather than a “toolkit” or “framework.” In addition to custom design, D3 is intended as a supporting layer for higher-level visualization tools.1 — Jeff Heer
There are numerous higher-level (i.e., easier, more conceptual) third-party tools built on top of D3, including sophisticated high-level GUIs like Plot.ly, simpler drag-and-drop editors like Raw, and tools leveraging D3 in computing environments like R and Python.
But the creators of D3 have also built their own in-house “stack” of increasingly high-level languages and tools on top of D3, known as the “Vega” family of tools.
low level: Data Visualization “Kernel”
low-mid level: Visualization Grammar
Vega by the creators of D3 (Interactive Data Lab). A visualization grammar. User language: JSON. User writes a “visualization specification” as a JSON file. The Vega renderer draws the visualization from the spec in SVG or HTML Canvas in the browser. Online editor: http://vega.github.io/vega-editor/
Vega-lite by the creators of D3. This simplified version of Vega serves as a visual analysis grammar. User writes a “visualization specification” with minimal styling options as a JSON file. Vega-lite then generates a full Vega spec. Online editor: https://vega.github.io/vega-editor/?mode=vega-lite
High level data exploration (GUI)
Voyager built on top of Vega-lite. This web-based GUI automatically builds a set of recommended visualizations from your dataset. Use Voyager to explore your data and possible visualizations.
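To make the idea of a “visualization specification” concrete, here is a minimal sketch of what a Vega-lite style spec can look like, built as a Python dictionary and serialized to JSON. The dataset and the field names "category" and "amount" are made up for illustration, and the exact set of supported properties depends on the Vega-lite version you use, so treat this as a sketch rather than a reference.

```python
import json

# Hypothetical toy dataset: the field names "category" and "amount"
# are invented for this example.
spec = {
    "description": "A minimal bar chart (illustrative only).",
    "data": {
        "values": [
            {"category": "A", "amount": 28},
            {"category": "B", "amount": 55},
            {"category": "C", "amount": 43},
        ]
    },
    # One mark type, plus a mapping from data fields to visual channels.
    "mark": "bar",
    "encoding": {
        "x": {"field": "category", "type": "ordinal"},
        "y": {"field": "amount", "type": "quantitative"},
    },
}

# Serialize to the JSON text that would be pasted into an online editor.
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

The spec says what to draw (bars, with category on x and amount on y), not how to draw it. That is the sense in which these tools sit at a higher level than D3.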
Pole✭ (Polestar), Voyager and Lyra are not yet bug-free and ready for prime time, but they represent important pieces of the design stack. Voyager is a visualization recommendation engine that helps organize data exploration, saving loads of time in the process. Polestar is a welcome addition to the list of drag-and-drop visualization builders.
Lyra, in particular, represents an important category of high-level design tool that just hasn’t existed yet - a full GUI design environment for data-driven graphics. It still has a long way to go to become the Illustrator of data graphics, but it’s heading in that direction. Once Lyra (or something like it) reaches maturity, we will see another surge in the prevalence of high-quality data visualizations that parallels the surge that the introduction of D3 brought us.
Vega is well documented on the wiki. Mostly for those like me who want a bird’s-eye view of Vega, I’ve assembled a Vega 2.2 cheat sheet, which you can read in my next post.
Back in 2013, in honor of the 100th birthday of John Cage, the Forum of Contemporary Music Leipzig [FZML] asked me and 124 other composers to collectively write an exquisite corpse composition based on an idea of John Cage’s. Each of us wrote one page of manuscript, passing on only the last bar to the next composer. The sequence of composers was chosen by a random process involving tossing coins and consulting the I Ching, one of Cage’s favorite tools.
125 Party Pieces was premiered in New York by Ensemble Either/Or in October 2013 for the finale of the international Cage 100 Festival. The 125 manuscripts were exhibited at Galerie für Zeitgenössische Kunst, Leipzig, Germany in August-September 2013.
Party Pieces will get its European premiere in Leipzig on January 20, 2016 (tickets).
Perceptually Uniform Pitch-Loudness Scales for Data Sonification
Musical scales are based on perceptual pitch intervals corresponding to frequency ratios, not differences (e.g., a ratio of 2.0 gives the pitch interval called the “octave”). The decibel scale (dB) is a log scale for sound intensity or pressure, where a pressure ratio of 2.0 ≈ 6 dB and an intensity ratio of 2.0 ≈ 3 dB.
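As a quick numerical check of these ratio-based scales, here is a minimal Python sketch. The function names are my own, chosen for illustration. It converts a frequency ratio to semitones, and a pressure or intensity ratio to decibels, using the standard 20·log10 (pressure) and 10·log10 (intensity) definitions.

```python
import math

def semitones(f1_hz, f2_hz):
    """Pitch interval from f1 to f2 in equal-tempered semitones (12 per octave)."""
    return 12.0 * math.log2(f2_hz / f1_hz)

def pressure_ratio_db(ratio):
    """Level difference in dB for a given sound-pressure ratio."""
    return 20.0 * math.log10(ratio)

def intensity_ratio_db(ratio):
    """Level difference in dB for a given sound-intensity (power) ratio."""
    return 10.0 * math.log10(ratio)

print(semitones(220.0, 440.0))   # an octave: frequency ratio of 2.0
print(pressure_ratio_db(2.0))    # doubling the pressure
print(intensity_ratio_db(2.0))   # doubling the intensity
```

The same ratio always gives the same interval or level difference, regardless of the starting frequency or pressure. That is what makes these scales logarithmic.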
One of the challenges in producing a good visualization is the construction of appropriate visual scales that do not bias the interpretation of the data. When constructing color scales, for example, it’s important to base them on how we perceive the colors, not simply on raw RGB intensity levels or raw hue values. (See my previous post, as well as Gregor Aisch’s article on the subject.)
An analogous problem in sonification is the construction of perceptually linear frequency and intensity scales. Directly mapping data to frequency and intensity (sound pressure level) produces misleading results. To begin with, frequency perception is logarithmic and best represented as pitch (frequency on a log scale).1 But not only is intensity perception logarithmic (hence the log decibel scale), it is also highly frequency dependent. Consider this map of perceived loudness as a function of frequency and intensity (data from the current ISO standard, derived from multiple empirical studies):
Phon levels are perceptually equal loudness levels across frequencies. Phon differences represent perceptually equal loudness differences. At 1000 Hz, the phon and dB scales are identical. For a 100Hz tone and a 1000Hz tone to sound at equal loudness, the 100 Hz tone must have a much higher SPL, since the human ear is much less sensitive to low frequencies.
Figure 1. ISO226:2003 Equal loudness contours. Produced with Matlab and Plotly.
Take a hearing test like the ones used to collect data for those phon curves.
Just as in visual perception, perceived brightness is dependent on frequency (hue) and not just light intensity, in auditory perception, perceived loudness is also highly dependent on frequency (and not just sound intensity). That means that if we simply construct a linear pitch scale at a constant intensity (or sound pressure, or “gain” in digital audio terms), then some pitches will sound much louder than others, making them sound “more important” and biasing the interpretation of the data. So, we need to construct a perceptually uniform “pitch-loudness space” analogous to a perceptually uniform “color space” (such as CIELAB).
This 2D pitch-loudness space based on ISO226:2003 shows equal perceived differences in pitch on the X axis (as log Frequency), and equal perceived differences in loudness (phons) on the Y axis. The colored contours show the SPL required to produce a desired loudness (phon) level at that frequency.
Figure 2. SPL contours for given frequency and loudness level, derived from ISO226:2003. Produced with Matlab.
Let’s hear that. First, here’s a sine tone sweeping from low to high across the audible frequency range, at a single Sound Pressure Level (dB). This sound would be a horizontal line on the first graph, and a contour line on the second graph. Notice how the sound is hard to hear at first (in the low range) and at the end (high range).
Listen with decent headphones in a quiet room for best results. Turn the volume up to a comfortably loud level.
31.5 Hz to 12,500 Hz, equal SPL:
Next, the same sweep, but with the SPL constantly following the 80 phon “equal loudness contour.” That’s a contour line on the first graph, and a horizontal line on the second graph.
31.5 Hz to 12,500 Hz at equal loudness (using 80 phon equal loudness contour):
Notice the low and high sounds are easier to hear. If you listened on good “reference” speakers in a recording studio environment, the sine tone would sound equally loud across the whole range.
Using that second graph of pitch-loudness space, you can now imagine constructing more complex scales, such as a diagonal line that goes from low+loud to high+soft, or divergent scales analogous to these divergent color scales (code by Gregor Aisch). (We might want to take into account the Sone scale of loudness to construct certain scales in 2D pitch-loudness space, but that’s a topic for another day…)
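As a rough sketch of such a diagonal scale, the following Python function maps a normalized data value to a point on a straight line in pitch-loudness space: equal steps in the data give equal pitch steps (equal frequency ratios) and equal phon steps. The function name, the 31.5 to 12,500 Hz range and the 80 to 40 phon range are assumptions chosen for illustration. Turning the resulting phon level into an SPL still requires the equal-loudness data.

```python
import math

def pitch_loudness_scale(t, f_lo=31.5, f_hi=12500.0, phon_lo=80.0, phon_hi=40.0):
    """Map t in [0, 1] to (frequency in Hz, loudness level in phon).

    Frequency is interpolated on a log2 (pitch) axis and loudness on a
    linear phon axis, so the path is a straight diagonal line in
    pitch-loudness space, from low+loud to high+soft.
    """
    log_f = math.log2(f_lo) + t * (math.log2(f_hi) - math.log2(f_lo))
    phon = phon_lo + t * (phon_hi - phon_lo)
    return 2.0 ** log_f, phon

# Five equally spaced data values along the scale.
for step in range(5):
    t = step / 4
    freq, phon = pitch_loudness_scale(t)
    print(f"t={t:.2f}  {freq:9.1f} Hz  {phon:5.1f} phon")
```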
Note: All of this is only valid for sounds that are “pure tones” (i.e. sine tones). If we want to construct objective, linear scales for sonification using complex tones we will need a higher-dimensional timbre-space, and one that is perceptually uniform—something like this (Hiroko et al. “Perceptual distance in timbre space”) or this (Hoffman et al. “Feature-Based Synthesis for Sonification and Psychoacoustic Research”). But that’s a topic for yet another day.
The ISO226 standard (2003 revision) is currently the most widely accepted representation of perceptually uniform pitch-loudness space (for sine tones). This is a commercial standard, and you need to purchase it in order to read the complete document. Here are a couple of tools that implement it:
There is a good Matlab function called iso226(), written by Christopher Hummersone, that outputs SPL in dB for given frequency and phon level(s). The function implements the mathematics in ISO226:2003 and uses Matlab’s shape-preserving pchip method for interpolation.
For my own purposes, I wrote an abstraction for Cycling74’s Max environment called can.phon2dB that does the same thing, using a pre-calculated lookup table. The lookup table is in Cycling74’s “jitter matrix” format and is well suited for real-time applications. (I’m using it to build a sonification of microbial populations; more on that in a later post…)
The .jxf.jit file must be in your search path. This lookup table has a resolution of 1 Hz and 0.25 phon, which is about the same as or less than the “just-noticeable difference” for pitch and loudness, respectively.
The Mathematics: Formula for deriving SPL from Phons
Read on if you’re interested in the math involved in constructing scales from the equal loudness curves, or in practical tools for sonification. The formula is taken from ISO226.
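The sound pressure level Lp of a sine tone at frequency f at perceived loudness level LN is:
$$L_p = \frac{10}{\alpha_f}\,\log_{10}(A_f) - L_U + 94 \ \text{dB}$$
with
$$A_f = 4.47\times10^{-3}\left(10^{0.025\,L_N} - 1.15\right) + \left[0.4\times10^{\frac{T_f + L_U}{10} - 9}\right]^{\alpha_f}$$
where:
Tf is the threshold of hearing for frequency f, in dB;
αf is the exponent for loudness perception at frequency f;
LU is a magnitude of the linear transfer function normalized at 1000 Hz.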
The values of Tf, αf and LU are given in the ISO226 table, which lists the following for each tabulated frequency: the loudness perception exponent αf, the transfer function magnitude LU, and the threshold of hearing Tf.
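The ISO226:2003 formula for SPL from phons can be sketched in a few lines of Python. This is a minimal illustration, not the Matlab iso226() function or the Max abstraction mentioned above. The function name is my own, and the two table rows (100 Hz and 1000 Hz) are the values as I recall them from the standard, so please verify them against the standard before relying on them. No interpolation between frequencies is attempted here.

```python
import math

# Two rows of the parameter table, keyed by frequency in Hz:
# (alpha_f, L_U in dB, T_f in dB).
# ASSUMPTION: values quoted from memory of ISO226:2003; check the standard.
ISO226_ROWS = {
    100.0: (0.367, -8.1, 26.5),
    1000.0: (0.250, 0.0, 2.4),
}

def spl_from_phon(freq_hz, phon):
    """SPL in dB needed for a sine tone at freq_hz to sound phon phons loud."""
    alpha_f, l_u, t_f = ISO226_ROWS[freq_hz]
    # A_f = 4.47e-3 * (10^(0.025 * L_N) - 1.15)
    #       + [0.4 * 10^((T_f + L_U) / 10 - 9)]^alpha_f
    a_f = 4.47e-3 * (10.0 ** (0.025 * phon) - 1.15) + (
        0.4 * 10.0 ** ((t_f + l_u) / 10.0 - 9.0)
    ) ** alpha_f
    # L_p = (10 / alpha_f) * log10(A_f) - L_U + 94
    return (10.0 / alpha_f) * math.log10(a_f) - l_u + 94.0

for freq in (100.0, 1000.0):
    print(f"{freq:6.0f} Hz at 40 phon -> {spl_from_phon(freq, 40.0):.1f} dB SPL")
```

At 1000 Hz the result is very close to the phon value itself, as expected, while the 100 Hz tone needs a considerably higher SPL to sound equally loud.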
To produce smooth curves, we must interpolate between the 29 data points. The graphs and software tools in this post all interpolate using Matlab’s built-in pchip method, a shape-preserving PCHIP (Piecewise Cubic Hermite Interpolating Polynomial). It is similar to Bézier curve interpolation, but with a tighter-fitting curve that does not “overshoot” actual data points, so that the curve’s shape is preserved.
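To illustrate what “shape preserving” means, here is a simplified pure-Python sketch of monotone cubic Hermite interpolation in the spirit of PCHIP. It is not Matlab’s pchip: the endpoint slopes are simply set to the first and last secant slopes, which is an assumption made for brevity. The function name and the sample data are also made up for illustration.

```python
from bisect import bisect_right

def monotone_cubic(xs, ys):
    """Return f(x) interpolating (xs, ys) with cubic Hermite pieces.

    Interior slopes are a weighted harmonic mean of the neighbouring
    secant slopes (or zero at a local extremum), which keeps the curve
    from overshooting the data.
    """
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]            # interval widths
    d = [(ys[i + 1] - ys[i]) / h[i] for i in range(n - 1)]   # secant slopes
    m = [0.0] * n                                            # slopes at the data points
    m[0], m[-1] = d[0], d[-1]                                # simplified endpoint rule
    for i in range(1, n - 1):
        if d[i - 1] * d[i] > 0:                              # same sign: harmonic mean
            w1 = 2 * h[i] + h[i - 1]
            w2 = h[i] + 2 * h[i - 1]
            m[i] = (w1 + w2) / (w1 / d[i - 1] + w2 / d[i])

    def f(x):
        i = min(max(bisect_right(xs, x) - 1, 0), n - 2)      # interval containing x
        t = (x - xs[i]) / h[i]
        h00 = (1 + 2 * t) * (1 - t) ** 2                     # cubic Hermite basis
        h10 = t * (1 - t) ** 2
        h01 = t * t * (3 - 2 * t)
        h11 = t * t * (t - 1)
        return (h00 * ys[i] + h10 * h[i] * m[i]
                + h01 * ys[i + 1] + h11 * h[i] * m[i + 1])

    return f

# Made-up, step-like monotone data of the kind that makes an ordinary
# cubic spline overshoot.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 0.1, 1.0, 1.1, 1.1]
curve = monotone_cubic(xs, ys)
samples = [curve(k / 20) for k in range(81)]
print(min(samples), max(samples))
```

The interpolated samples never leave the range of the data and never reverse direction, which is the property that matters when drawing equal loudness contours through sparse table values.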
The ISO226 specification also provides the formula for deriving Phons from SPL, which uses the same tables for Tf, αf and LU.
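For the reverse direction, here is a minimal Python sketch obtained by algebraically inverting the SPL-from-phons relation (L_p = (10/α_f)·log10(A_f) − L_U + 94), rather than by copying the inverse formula printed in the standard. The function name is my own, and only the 1000 Hz parameters are included. They are quoted from memory of ISO226:2003 and should be checked against the standard.

```python
import math

# ASSUMPTION: 1000 Hz parameters as recalled from ISO226:2003.
ALPHA_F = 0.250   # loudness perception exponent
L_U = 0.0         # transfer function magnitude, dB
T_F = 2.4         # threshold of hearing, dB

def phon_from_spl_1khz(spl_db):
    """Loudness level in phon of a 1000 Hz sine tone at spl_db dB SPL."""
    # Undo L_p = (10 / alpha_f) * log10(A_f) - L_U + 94 to recover A_f.
    a_f = 10.0 ** (ALPHA_F * (spl_db + L_U - 94.0) / 10.0)
    # Threshold term of A_f, independent of the loudness level.
    threshold_term = (0.4 * 10.0 ** ((T_F + L_U) / 10.0 - 9.0)) ** ALPHA_F
    # Solve A_f = 4.47e-3 * (10^(0.025 * L_N) - 1.15) + threshold_term for L_N.
    return 40.0 * math.log10((a_f - threshold_term) / 4.47e-3 + 1.15)

print(f"40 dB SPL at 1000 Hz -> {phon_from_spl_1khz(40.0):.1f} phon")
```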
There is another, empirically derived scale for frequency perception called the Mel scale, which in some contexts can be more appropriate, but the ordinary logarithmic frequency scale is probably best in most cases. The Mel scale seems to capture the way we perceive frequency best in the absence of any “musical” context. But if we listen to a linear scale constructed from small equal steps (where we can easily compare each interval with the adjacent intervals), then scales based on the simple mathematical definition of pitch (equal frequency ratios = equal perceptual difference) produce more even results than ones based on the Mel scale, especially for anyone with musical experience. Listen to an example of the Mel scale here if you’d like to judge for yourself: http://www.sfu.ca/sonic-studio/handbook/Mel.html
Here is an exquisite visualization of ocean surface waves by Cameron Beccario @cambecc. The animation below is just one visualization possible with his tool, Earth. Visit earth.nullschool.net to play with wind, current and temperature data. Scroll down to read more about Earth.
The color scale shows the peak wave period (time between crests) for ocean swells. The longest-period waves (up to 25 seconds) are brighter cyan, and choppy, short-period waves approach black. The animation on top is a representation of the surface wave “vector field,” and not an animation of actual surface waves.
Wave conditions (updated every 3 hours). Scroll down to see more.
Earth: an Excellent Example of Good Design
Earth is an excellent example of how good design can produce visualizations that engage a public audience and do a better job of representing data. A good data visualization begins by formulating the task as a Design Problem. The design problem that Earth sets out to solve might be worded like this:
“Represent near real-time global weather forecast in an accurate yet holistic way that captures geographical context.”
You might add “and is visually successful as fine art,” since one of Beccario’s intended uses of Earth is to generate fine art prints and other materials for exhibition. But a really good design can produce museum quality work in any case, even if it has a functional purpose.
Beccario decided that what was required was not a single visualization but a tool to generate visualizations of a choice of datasets from a variety of perspectives. The design of the resulting tool, Earth, flows from that requirement. It uses a visually efficient, limited, and unified design language and intuitive interface to allow users to display a very large range of possible visualizations.
One small component of that design language is the carefully constructed set of color scales. Color scales are often an afterthought in scientific visualizations, often relying on default “rainbow” scales. These naïve color scales can cause intelligibility problems and bias the interpretation of the data. For example, the colors in the standard rainbow scale are not of equal perceived brightness; therefore some bands of information can be perceived as more important than they are, and we can perceive boundaries between color bands that are not necessarily there in the data.1 Beccario designed more perceptually relevant color scales with the help of the online tool ColorBrewer. (There are many other similar tools out there. See also New York Times’ designer Gregor Aisch’s excellent article on color scales.) Beccario’s color scale for wave periods is a perceptually even brightness gradient of a single hue (cyan), so any boundaries we see in the map are real boundaries, and not artifacts of the color scale.
Beccario's color scale for wave period is a perceptually linear brightness gradient of a single hue.
I encourage everyone, including scientists, to explore Cameron Beccario’s tool Earth at their leisure, to discover the difference good design can make in data visualization.
Watch Beccario talk about his design process for Earth at the Graphical Web 2014 conference:
Timeline (2010): Apr 20, rig explodes; Apr 22, rig sinks; Jul 15, well capped; Sep 19, well sealed.
5 years ago, in 2010, the Deepwater Horizon drilling platform in the Gulf of Mexico exploded when a pulse of high-pressure methane gas from the 1500m deep Macondo wellhead expanded into the drilling riser and rose into the drilling rig. The resulting oil spill was the largest accidental marine oil spill in history.
The quantity of oil released, 4.9 million barrels (206 million gallons),1 is difficult to grasp. To try to wrap my head around what that volume of oil looks like, I first represented it as an oil storage tank roughly 26m (85ft) in diameter like these…
View of Chevron crude oil depot, Richmond CA. Josh Cassidy/KQED. (Apologies to Chevron --- The Gulf spill was BP's accident. Chevron's tanks are only used in this article to show scale.)
…but 1500m high (nearly a mile). That’s the depth of the wellhead. There is no oil tanker in existence that could contain all that oil, though the now scrapped super tanker “Seawise Giant” could have held most of it.
Deepwater Horizon oil slick, in a May 2010 NASA image
Of course, the spill didn’t look like that. About half of the oil spread across the surface as a slick, and half spread out in a deepwater plume2, never reaching the surface but impacting the water column and ocean bottom. However, an intuitive feel for the quantities can help us begin to understand the impact of the spill.
Natural oil and gas seeps in the Gulf
ECOGIG studies the effect of natural hydrocarbon seeps on the Gulf ecosystem, and compares them with the effects of large accidental releases.
The Gulf of Mexico is well known as an oil reservoir, and like other ocean oil reservoirs, the Gulf features thousands of naturally occurring seeps that release small quantities of oil and gas into the ocean on a continual basis. One might think that since the Gulf ecosystem has evolved to cope with these natural hydrocarbon releases, the impact of oil spills would be small (or at least reduced). But to understand the relationship between natural and accidental hydrocarbon releases in the Gulf, we need to begin by understanding the differences in scale.
1 barrel (42 gallons)
A very active natural oil seep in the Gulf of Mexico can put out around 1 barrel of oil per day and looks like this:
Surface sheen above a natural oil seep in the Gulf of Mexico. Photo by Beth Orcutt.
Natural oil seep in the Gulf of Mexico. Photo ECOGIG
There are something like 20,000 seeps in the Gulf, but most put out much less than one barrel per day. The total daily output of all the seeps in the Gulf is not well constrained, but a “reasonable” estimate based on extrapolation from looking at small areas would be from 2500 to 10,000 barrels per day.34
The large tanks in the background at left, some of the largest ever built, are 88m in diameter and hold 750,000 barrels each. A 747 could park comfortably inside.
One day's output from all 20,000 natural oil seeps in the entire Gulf of Mexico (2500 to 10,000 barrels) would fill a hypothetical tank 19.6m (64 ft) high and 5 to 10 meters (16 to 32 ft) in diameter. *The large tanks in the background hold 750,000 barrels and would comfortably fit a 747 inside.
By contrast, the Macondo wellhead spilled from 57,000 to 70,000 barrels per day from a single point source. It would take from 1 to 4 weeks for the output of all the natural seeps in the Gulf to equal one day’s output from the Macondo wellhead.
Or you could flood the passenger and cargo holds of 8.75 to 10.75 Boeing 747s.
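The rate comparisons in this article are simple divisions, and a few lines of Python make them easy to check. All of the input figures below come from the article itself (seep and blowout rates, the 750,000-barrel tank, and 4.9 million barrels over 86 days). The variable names are my own.

```python
# Figures quoted in this article (barrels and days).
SEEPS_PER_DAY = (2_500, 10_000)       # all natural Gulf seeps, low/high estimate
BLOWOUT_PER_DAY = (57_000, 70_000)    # Macondo wellhead, low/high estimate
BIG_TANK_BARRELS = 750_000            # one 88m diameter storage tank
TOTAL_SPILL_BARRELS = 4_900_000
SPILL_DAYS = 86

# Days of natural seepage needed to equal one day of the blowout.
days_to_match_min = BLOWOUT_PER_DAY[0] / SEEPS_PER_DAY[1]
days_to_match_max = BLOWOUT_PER_DAY[1] / SEEPS_PER_DAY[0]

# Days of blowout output needed to fill one large storage tank.
fill_days_min = BIG_TANK_BARRELS / BLOWOUT_PER_DAY[1]
fill_days_max = BIG_TANK_BARRELS / BLOWOUT_PER_DAY[0]

# Average daily rate implied by the total spill volume.
average_rate = TOTAL_SPILL_BARRELS / SPILL_DAYS

print(f"Seeps match one blowout day in {days_to_match_min:.1f} to {days_to_match_max:.1f} days")
print(f"Blowout fills one large tank in {fill_days_min:.1f} to {fill_days_max:.1f} days")
print(f"Average blowout rate: {average_rate:,.0f} barrels per day")
```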
Macondo wellhead spewing oil and gas. Photo credit: U.S. Geological Survey
One day's output from the Macondo wellhead blowout would have filled a hypothetical tank the same height (19.6m or 64 ft) but 24 to 27 meters (80 to 88 ft) in diameter.
At that rate, the wellhead’s output would fill a giant 88m diameter storage tank (750,000 barrels) in 11-13 days. The wellhead discharged oil continually for 86 days, outputting 4.9 million barrels. It would take all the storage tanks in the field below to contain that oil:
Section of the Chevron Richmond Refinery. These tanks are used to store crude oil offloaded from oil tankers. The tanks on the right are the ersatz 747 hangars.
Or, one might construct a single storage tank 6 stories tall and nearly 2 1/2 football fields in diameter.
This hypothetical storage tank, too large to ever be built, is 19.6m high and 225m in diameter.
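The hypothetical tank sizes in this article follow from the volume of a cylinder. Here is a minimal Python sketch for reproducing them. The function name is my own, and the only constant not taken from the article is the barrel-to-cubic-metre conversion (42 US gallons, about 0.159 m³).

```python
import math

BARREL_M3 = 0.158987294928  # one oil barrel (42 US gallons) in cubic metres

def tank_diameter_m(barrels, height_m):
    """Diameter in metres of a cylindrical tank of the given height holding `barrels`."""
    volume_m3 = barrels * BARREL_M3
    # V = pi * (d / 2)^2 * h, so d = 2 * sqrt(V / (pi * h)).
    return 2.0 * math.sqrt(volume_m3 / (math.pi * height_m))

cases = [
    ("All seeps, one day (low)", 2_500, 19.6),
    ("All seeps, one day (high)", 10_000, 19.6),
    ("Macondo, one day (low)", 57_000, 19.6),
    ("Macondo, one day (high)", 70_000, 19.6),
    ("Entire spill, 19.6m high", 4_900_000, 19.6),
    ("Entire spill, 1500m high", 4_900_000, 1500.0),
]
for label, barrels, height in cases:
    print(f"{label:28s} {tank_diameter_m(barrels, height):7.1f} m diameter")
```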
To sum up the comparison of natural vs. accidental oil releases in the Gulf of Mexico, let’s look at those quantities side by side.
You can read more about the Deepwater Horizon spill and its effects on the Gulf ecosystem at ecogig.org.
If you’d like to explore the visualizations in this article yourself in Google Earth, download the KMZ files below. Double click on the .kmz files and they will open in Google Earth.