February 23, 2016

The D3 / Vega “stack”

D3, written by Michael Bostock with Vadim Ogievetsky and Jeffrey Heer at the Stanford Visualization Group (now the University of Washington Interactive Data Lab), is a visualization library now widely used as the basis for many of the most powerful online visualizations. According to Heer, it is intentionally designed as a low-level system.

During the early design of D3, we even referred to it as a “visualization kernel” rather than a “toolkit” or “framework.” In addition to custom design, D3 is intended as a supporting layer for higher-level visualization tools.1 - Jeff Heer

There are numerous higher-level (i.e. easier, more conceptual) third-party tools built on top of D3, including pretty sophisticated high-level GUIs like Plot.ly, simpler drag-and-drop editors like Raw and tools leveraging D3 in computing environments like R and Python.

But the creators of D3 have built their own in-house “stack” of increasingly higher-level languages and tools, built on D3, known as the “Vega” family of tools.

low level: “Data Visualization Kernel”

D3 (“Data Driven Documents”) - a JavaScript library for manipulating documents based on data. Users write JavaScript code to produce SVG graphics driven by data.

low-mid level: Visualization Grammar

Vega by the creators of D3 (Interactive Data Lab). A visualization grammar. User language: JSON. User writes a “visualization specification” as a JSON file. The Vega renderer draws the visualization from the spec in SVG or HTML Canvas in the browser. Online editor: http://vega.github.io/vega-editor/

(You can run Vega locally with Vegaserver)

mid level: Visual Analysis Grammar

Vega-lite by the creators of D3. This simplified version of Vega serves as a visual analysis grammar. User writes a “visualization specification” with minimal styling options as a JSON file. Vega-lite then generates a full Vega spec. Online editor: https://vega.github.io/vega-editor/?mode=vega-lite
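To make the idea of a JSON “visualization specification” concrete, here is a minimal sketch that builds a Vega-lite-style bar chart spec as a Python dictionary and serializes it to JSON. The data values and the field names `category` and `amount` are made up for illustration, and the properties a given Vega-lite release accepts may differ, so treat this as a shape to adapt rather than a definitive spec.

```python
import json

def bar_chart_spec(rows, x_field, y_field):
    """Build a minimal Vega-lite-style spec: inline data, a mark, and encodings."""
    return {
        "data": {"values": rows},
        "mark": "bar",
        "encoding": {
            # Each channel names a data field and how that field is measured.
            "x": {"field": x_field, "type": "ordinal"},
            "y": {"field": y_field, "type": "quantitative"},
        },
    }

# Hypothetical inline dataset, used only to show the structure.
rows = [
    {"category": "A", "amount": 28},
    {"category": "B", "amount": 55},
    {"category": "C", "amount": 43},
]

spec = bar_chart_spec(rows, "category", "amount")
spec_json = json.dumps(spec, indent=2)
print(spec_json)  # paste the printed JSON into the online editor to try it
```

Note how little is specified: no axes, scales, or legends. That is the point of the mid-level grammar, which leaves those details to be filled in when the full Vega spec is generated.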

High level data exploration (GUI)

Voyager built on top of Vega-lite. This web-based GUI automatically builds a set of recommended visualizations from your dataset. Use Voyager to explore your data and possible visualizations.

Pole✭ (Polestar) built on top of Vega-lite. Web-based Drag-and-drop GUI for quickly building visualizations from your dataset. Online editor: http://uwdata.github.io/polestar/#
Compare to Tableau

Full GUIs (no coding)

Lyra A full GUI design environment built on Vega. Currently in Beta, and buggy. Try it here
See also: http://idl.cs.washington.edu/projects/lyra/


State of the Stack

Pole✭ (Polestar), Voyager and Lyra are not yet bug free and ready for prime time, but they represent important pieces of the design stack. Voyager is a visualization recommendation engine that helps organize data exploration, saving loads of time in the process. Polestar is a welcome addition to the list of drag-and-drop visualization builders.

Lyra, in particular, represents an important category of high level design tool that just hasn’t existed yet - a full GUI design environment for data-driven graphics. It still has a long way to go to become the Illustrator of data graphics, but it’s heading in that direction. Once Lyra (or something like it) reaches maturity, then we will see another surge in the prevalence of high quality data visualizations that parallels the surge that the introduction of D3 brought us.

The basis for these advances is Vega, the declarative visualization grammar. Compared to coding in D3, Vega can be learned by non-coders quite easily. No background in JavaScript, or any programming language, is needed. At the same time, Vega is rich enough to be expressive, and not just easy to use.

Vega is well documented on the wiki. Mostly for those like me who want a bird’s-eye view of Vega, I’ve assembled a Vega 2.2 cheat sheet, which you can read in my next post.


  1. Jeff Heer : https://github.com/vega/vega/wiki/Vega-and-D3, edited May 1, 2014, accessed Aug 26, 2015

November 9, 2015

European Premiere of Party Pieces

Back in 2013, in honor of the 100th birthday of John Cage, the Forum of Contemporary Music Leipzig [FZML] asked me and 124 other composers to collectively write an exquisite corpse composition based on an idea of John Cage’s. Each of us wrote one page of manuscript, passing on only the last bar to the next composer. The sequence of composers was chosen by a random process involving tossing coins and consulting the I Ching, one of Cage’s favorite tools.


125 Party Pieces was premiered in New York by Ensemble Either/Or in October 2013 for the finale of the international Cage 100 Festival. The 125 manuscripts were exhibited at Galerie für Zeitgenössische Kunst, Leipzig, Germany in August-September 2013.

Party Pieces will get its European premiere in Leipzig on January 20, 2016 (tickets).

Thanks to Pierre Boulez for his patronage of this project and the German Federal Cultural Foundation for the funding.

September 28, 2015

Perceptually Uniform Pitch-Loudness Scales for Data Sonification

Musical scales are based on perceptual pitch intervals corresponding to frequency ratios, not differences (e.g., a ratio of 2.0 gives the pitch interval called the “octave”). The decibel scale (dB) is a log scale for sound intensity (or pressure), where a pressure ratio of 2.0 ≈ 6 dB.
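As a quick illustration of these two log scales, here is a small sketch that converts ratios into pitch intervals and decibels. The function names are my own, and the semitone conversion assumes 12-tone equal temperament. It also shows why the 6 dB figure applies to sound pressure: the same ratio of 2.0 applied to intensity (power) is only about 3 dB.

```python
import math

def ratio_to_octaves(freq_ratio):
    """Pitch interval in octaves for a frequency ratio (2.0 -> one octave)."""
    return math.log2(freq_ratio)

def ratio_to_semitones(freq_ratio):
    """Pitch interval in semitones, assuming 12 equal steps per octave."""
    return 12 * math.log2(freq_ratio)

def pressure_ratio_to_db(ratio):
    """Level difference in dB for a ratio of sound pressures (amplitudes)."""
    return 20 * math.log10(ratio)

def intensity_ratio_to_db(ratio):
    """Level difference in dB for a ratio of sound intensities (powers)."""
    return 10 * math.log10(ratio)

print(ratio_to_octaves(2.0), ratio_to_semitones(1.5))
print(pressure_ratio_to_db(2.0), intensity_ratio_to_db(2.0))
```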

One of the challenges in producing a good visualization is the construction of appropriate visual scales that do not bias the interpretation of the data. When constructing color scales, for example, it’s important to base them on how we perceive the colors, not simply on raw RGB intensity levels or raw hue values. (See my previous post, as well as Gregor Aisch’s article on the subject.)

An analogous problem in sonification is the construction of perceptually linear frequency and intensity scales. Directly mapping data to frequency and intensity (sound pressure level) produces misleading results. To begin with, frequency perception is logarithmic and best represented as pitch (frequency on a log scale).1 But not only is intensity perception logarithmic (hence the log decibel scale), it is also highly frequency dependent. Consider this map of perceived loudness as a function of frequency and intensity (data from the current ISO standard, derived from multiple empirical studies):

Phon levels are perceptually equal loudness levels across frequencies. Phon differences represent perceptually equal loudness differences. At 1000 Hz, the phon and dB scales are identical. For a 100Hz tone and a 1000Hz tone to sound at equal loudness, the 100 Hz tone must have a much higher SPL, since the human ear is much less sensitive to low frequencies.

Figure 1. ISO226:2003 Equal loudness contours. Produced with Matlab and Plotly.


Take a hearing test like the ones used to collect data for those phon curves.

Just as in visual perception, perceived brightness is dependent on frequency (hue) and not just light intensity, in auditory perception, perceived loudness is also highly dependent on frequency (and not just sound intensity). That means that if we simply construct a linear pitch scale at a constant intensity (or sound pressure, or “gain” in digital audio terms), then some pitches will sound much louder than others, making them sound more “important” and biasing the interpretation of the data. So, we need to construct a perceptually uniform “pitch-loudness space” analogous to a perceptually uniform “color space” (such as CIELAB).
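For reference, the “linear pitch scale” half of the problem is easy to sketch: map the data linearly onto log-frequency, so that equal data steps become equal frequency ratios. The function name and the default 220 Hz to 880 Hz range below are arbitrary choices for illustration. This handles pitch only; the loudness correction discussed here is a separate step.

```python
def data_to_frequency(value, vmin, vmax, f_low=220.0, f_high=880.0):
    """Map a data value onto a frequency so equal data steps give equal pitch steps."""
    t = (value - vmin) / (vmax - vmin)       # position in the data range, 0..1
    return f_low * (f_high / f_low) ** t     # linear in log-frequency

# Equal steps in the data produce a constant ratio between neighbouring tones.
data = [0, 2, 4, 6, 8, 10]
freqs = [data_to_frequency(v, 0, 10) for v in data]
ratios = [b / a for a, b in zip(freqs, freqs[1:])]
print(freqs)
print(ratios)
```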

This 2D pitch-loudness space based on ISO226:2003 shows equal perceived differences in pitch on the X axis (as log Frequency), and equal perceived differences in loudness (phons) on the Y axis. The colored contours show the SPL required to produce a desired loudness (phon) level at that frequency.

Figure 2. SPL contours for given frequency and loudness level, derived from ISO226:2003. Produced with Matlab.

Let’s hear that. First, here’s a sine tone sweeping from low to high across the audible frequency range, at a single Sound Pressure Level (dB). This sound would be a horizontal line on the first graph, and a contour line on the second graph. Notice how the sound is hard to hear at first (in the low range) and at the end (high range).
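If you want to generate a sweep like this yourself, here is a rough sketch using only the Python standard library. It produces a sine tone whose pitch rises at a constant rate (an exponential frequency sweep) at constant digital gain. The function names, the 44.1 kHz sample rate and the 0.5 amplitude are assumptions of mine, not details of how the audio example in this post was made.

```python
import math
import struct
import wave

def log_sweep(f_start, f_end, duration, sample_rate=44100, amplitude=0.5):
    """Sine sweep from f_start to f_end (Hz) that is linear in pitch over time."""
    n = int(round(duration * sample_rate))
    k = f_end / f_start
    # Phase is the integral of 2*pi*f(t), with f(t) = f_start * k**(t/duration).
    scale = 2 * math.pi * f_start * duration / math.log(k)
    return [amplitude * math.sin(scale * (k ** (i / n) - 1)) for i in range(n)]

def write_wav(path, samples, sample_rate=44100):
    """Write mono 16-bit PCM samples (floats in -1..1) to a WAV file."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in samples))
```

Calling `write_wav("sweep.wav", log_sweep(31.5, 12500.0, 10.0))` would write a ten second sweep over the range used here. Constant gain is only a stand-in for constant SPL: what reaches your ears also depends on your playback system.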

Listen with decent headphones in a quiet room for best results. Turn the volume up to a comfortably loud level.

31.5 Hz to 12,500 Hz, equal SPL:

Next, the same sweep, but with the SPL constantly following the 80 phon “equal loudness contour.” That’s a contour line on the first graph, and a horizontal line on the second graph.

31.5 Hz to 12,500 Hz at equal loudness
(using 80 phon equal loudness contour):

Notice the low and high sounds are easier to hear. If you listen on good “reference” speakers in a recording studio environment, the sine tone should sound equally loud across the whole range.

Using that second graph of pitch-loudness space, you can now imagine constructing more complex scales, such as a diagonal line that goes from low+loud to high+soft, or divergent scales analogous to these divergent color scales (code by Gregor Aisch). (We might want to take into account the Sone scale of loudness to construct certain scales in 2D pitch-loudness space, but that’s a topic for another day…)

Note: All of this is only valid for sounds that are “pure tones” (i.e. sine tones). If we want to construct objective, linear scales for sonification using complex tones we will need a higher-dimensional timbre-space, and one that is perceptually uniform, something like this (Hiroko et al. “Perceptual distance in timbre space”) or this (Hoffman et al. “Feature-Based Synthesis for Sonification and Psychoacoustic Research”). But that’s a topic for yet another day.


Software Tools

The ISO226 standard (2003 revision) is currently the most widely accepted representation of perceptually uniform pitch-loudness space (for sine tones). This is a commercial standard, and you need to purchase it in order to read the complete document. Here are a couple of tools that implement it:

  1. There is a good Matlab function called iso226(), written by Christopher Hummersone, that outputs SPL in dB for given frequency and phon level(s). The function implements the mathematics in ISO226:2003 and uses Matlab’s shape-preserving pchip method for interpolation.

  2. For my own purposes, I wrote an abstraction for Cycling74’s Max environment called can.phon2dB that does the same thing, using a pre-calculated lookup table. The lookup table is in Cycling74’s “jitter matrix” format and is well suited for real-time applications. (I’m using it to build a sonification of microbial populations; more on that in a later post…)

can.phon2dB.zip — The zip file contains 3 files:

  • can.phon2dB.maxpat - the abstraction
  • can.phon2dB.maxhelp - the help patch
  • iso226-2003.jxf.jit - the matrix file (64MB)

The .jxf.jit file must be in your search path. This lookup table has a resolution of 1 Hz and 0.25 phon, which is about the same as or less than the “just-noticeable difference” for pitch and loudness, respectively.


The Mathematics: Formula for deriving SPL from Phons

Read on if you’re interested in the math involved in constructing scales from the equal loudness curves, or in practical tools for sonification. The formula is taken from ISO226.

The sound pressure level $L_p$ of a sine tone at frequency $f$ at perceived loudness level $L_N$ is:

$$L_p = \left( \frac{10}{\alpha_f} \log_{10} A_f \right) \text{dB} - L_U + 94 \text{ dB}$$

where

$$A_f = 4.47 \times 10^{-3} \times \left( 10^{0.025 L_N} - 1.14 \right) + \left[ 0.4 \times 10^{\left( \frac{T_f + L_U}{10} - 9 \right)} \right]^{\alpha_f}$$

and

  • $T_f$ is the threshold of hearing for frequency $f$, in dB;
  • $\alpha_f$ is the exponent for loudness perception at frequency $f$;
  • $L_U$ is a magnitude of the linear transfer function normalized at 1000 Hz;

The values of $T_f$, $\alpha_f$ and $L_U$ are given in the following table:

| Frequency $f$ (Hz) | Loudness perception exponent $\alpha_f$ | Transfer function magnitude $L_U$ (dB) | Threshold of hearing $T_f$ (dB) |
|---|---|---|---|
| 20 | 0.532 | -31.6 | 78.5 |
| 25 | 0.506 | -27.2 | 68.7 |
| 31.5 | 0.480 | -23.0 | 59.5 |
| 40 | 0.455 | -19.1 | 51.1 |
| 50 | 0.432 | -15.9 | 44.0 |
| 63 | 0.409 | -13.0 | 37.5 |
| 80 | 0.387 | -10.3 | 31.5 |
| 100 | 0.367 | -8.1 | 26.5 |
| 125 | 0.349 | -6.2 | 22.1 |
| 160 | 0.330 | -4.5 | 17.9 |
| 200 | 0.315 | -3.1 | 14.4 |
| 250 | 0.301 | -2.0 | 11.4 |
| 315 | 0.288 | -1.1 | 8.6 |
| 400 | 0.276 | -0.4 | 6.2 |
| 500 | 0.267 | 0.0 | 4.4 |
| 630 | 0.259 | 0.3 | 3.0 |
| 800 | 0.253 | 0.5 | 2.2 |
| 1000 | 0.250 | 0.0 | 2.4 |
| 1250 | 0.246 | -2.7 | 3.5 |
| 1600 | 0.244 | -4.1 | 1.7 |
| 2000 | 0.243 | -1.0 | -1.3 |
| 2500 | 0.243 | 1.7 | -4.2 |
| 3150 | 0.243 | 2.5 | -6.0 |
| 4000 | 0.242 | 1.2 | -5.4 |
| 5000 | 0.242 | -2.1 | -1.5 |
| 6300 | 0.245 | -7.1 | 6.0 |
| 8000 | 0.254 | -11.2 | 12.6 |
| 10,000 | 0.271 | -10.7 | 13.9 |
| 12,500 | 0.301 | -3.1 | 12.3 |
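Here is a minimal sketch of the formula and table above in Python, for the 29 tabulated frequencies only (no interpolation). The function and variable names are mine, and the constants are copied from this post as printed rather than checked against the standard itself, so verify them against ISO226 before relying on the results.

```python
import math

# (frequency Hz, alpha_f, L_U dB, T_f dB), copied from the table above.
ISO226_TABLE = [
    (20, 0.532, -31.6, 78.5), (25, 0.506, -27.2, 68.7),
    (31.5, 0.480, -23.0, 59.5), (40, 0.455, -19.1, 51.1),
    (50, 0.432, -15.9, 44.0), (63, 0.409, -13.0, 37.5),
    (80, 0.387, -10.3, 31.5), (100, 0.367, -8.1, 26.5),
    (125, 0.349, -6.2, 22.1), (160, 0.330, -4.5, 17.9),
    (200, 0.315, -3.1, 14.4), (250, 0.301, -2.0, 11.4),
    (315, 0.288, -1.1, 8.6), (400, 0.276, -0.4, 6.2),
    (500, 0.267, 0.0, 4.4), (630, 0.259, 0.3, 3.0),
    (800, 0.253, 0.5, 2.2), (1000, 0.250, 0.0, 2.4),
    (1250, 0.246, -2.7, 3.5), (1600, 0.244, -4.1, 1.7),
    (2000, 0.243, -1.0, -1.3), (2500, 0.243, 1.7, -4.2),
    (3150, 0.243, 2.5, -6.0), (4000, 0.242, 1.2, -5.4),
    (5000, 0.242, -2.1, -1.5), (6300, 0.245, -7.1, 6.0),
    (8000, 0.254, -11.2, 12.6), (10000, 0.271, -10.7, 13.9),
    (12500, 0.301, -3.1, 12.3),
]
PARAMS = {f: (alpha, l_u, t_f) for f, alpha, l_u, t_f in ISO226_TABLE}

def spl_from_phon(freq_hz, phon):
    """SPL in dB needed for a sine tone at freq_hz to reach the given phon level."""
    alpha, l_u, t_f = PARAMS[freq_hz]   # KeyError for non-tabulated frequencies
    a_f = (4.47e-3 * (10 ** (0.025 * phon) - 1.14)   # constants as printed above
           + (0.4 * 10 ** ((t_f + l_u) / 10 - 9)) ** alpha)
    return (10 / alpha) * math.log10(a_f) - l_u + 94

# One point on the 40 phon contour per tabulated frequency.
contour_40 = [(f, round(spl_from_phon(f, 40), 1)) for f, *_ in ISO226_TABLE]
print(contour_40)
```

At 1000 Hz the result should come out very close to the phon value you put in, which is a handy sanity check, and low frequencies should need a much higher SPL for the same loudness.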

To produce smooth curves, we must interpolate between the 29 data points. The graphs and software tools in this post all interpolate using Matlab’s built-in pchip method, which is a shape-preserving PCHIP (Piecewise Cubic Hermite Interpolating Polynomial). It is similar to using Bézier curve interpolation, but with a tighter fitting curve that does not “overshoot” actual data points, so that the curve’s shape is preserved.
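To show what “shape preserving” means in practice, here is a simplified pure-Python sketch of this kind of interpolation. It is not Matlab’s implementation: interior slopes use the weighted harmonic mean of the neighbouring secant slopes, but the end slopes are plain one-sided secants, which is a simplification of mine. The function name `make_pchip` is also my own. The example interpolates a few threshold-of-hearing values from the table directly against frequency in Hz.

```python
from bisect import bisect_right

def make_pchip(x, y):
    """Return f(xq): a monotone piecewise-cubic Hermite interpolant through (x, y)."""
    n = len(x)
    h = [x[i + 1] - x[i] for i in range(n - 1)]            # interval widths
    d = [(y[i + 1] - y[i]) / h[i] for i in range(n - 1)]   # secant slopes
    m = [0.0] * n                                          # slope at each knot
    m[0], m[-1] = d[0], d[-1]      # simplified end slopes (assumption)
    for i in range(1, n - 1):
        if d[i - 1] * d[i] > 0:    # same sign: weighted harmonic mean
            w1 = 2 * h[i] + h[i - 1]
            w2 = h[i] + 2 * h[i - 1]
            m[i] = (w1 + w2) / (w1 / d[i - 1] + w2 / d[i])
        # otherwise the knot is a local extremum or flat spot: slope stays 0

    def interpolate(xq):
        i = min(max(bisect_right(x, xq) - 1, 0), n - 2)    # containing interval
        t = (xq - x[i]) / h[i]
        h00 = (1 + 2 * t) * (1 - t) ** 2                   # Hermite basis functions
        h10 = t * (1 - t) ** 2
        h01 = t * t * (3 - 2 * t)
        h11 = t * t * (t - 1)
        return (h00 * y[i] + h10 * h[i] * m[i]
                + h01 * y[i + 1] + h11 * h[i] * m[i + 1])

    return interpolate

# Threshold of hearing T_f (dB) at five tabulated frequencies (Hz).
freqs = [800, 1000, 1250, 1600, 2000]
t_f = [2.2, 2.4, 3.5, 1.7, -1.3]
threshold = make_pchip(freqs, t_f)
curve = [threshold(f) for f in range(800, 2001, 10)]
print(max(curve), min(curve))
```

The interpolated curve passes through every data point and never rises above the local peak at 1250 Hz, which is the no-overshoot property described above.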

The ISO226 specification also provides the formula for deriving Phons from SPL, which uses the same tables for $T_f$, $\alpha_f$ and $L_U$.
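That formula is not reproduced here. As a sketch only, the SPL-from-phons formula given earlier can be inverted algebraically, using the constants as printed in this post. This is a derivation from that formula, not a quotation from the standard, whose own version may differ slightly in its rounded constants.

```latex
\begin{aligned}
L_p &= \frac{10}{\alpha_f} \log_{10} A_f - L_U + 94
  &&\Longrightarrow\quad
  A_f = 10^{\,\alpha_f \left( L_p + L_U - 94 \right) / 10} \\[4pt]
A_f &= 4.47 \times 10^{-3} \left( 10^{0.025 L_N} - 1.14 \right)
      + \left[ 0.4 \times 10^{\frac{T_f + L_U}{10} - 9} \right]^{\alpha_f}
  &&\Longrightarrow\quad
  L_N = 40 \log_{10} \left(
      \frac{A_f - \left[ 0.4 \times 10^{\frac{T_f + L_U}{10} - 9} \right]^{\alpha_f}}
           {4.47 \times 10^{-3}} + 1.14 \right)
\end{aligned}
```

The first line solves the level equation for $A_f$ given a measured $L_p$; the second solves the definition of $A_f$ for $L_N$, with $1 / 0.025 = 40$ supplying the leading factor.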


  1. There is another, empirically derived scale for frequency perception called the Mel scale, which in some contexts can be more appropriate, but the ordinary logarithmic frequency scale is probably best in most cases. The Mel scale seems to capture the way we perceive frequency best in the absence of any “musical” context. But if we listen to linear scales constructed from small equal steps (where we can easily compare each interval with the adjacent intervals), then scales based on the simple mathematical definition of pitch (equal frequency ratios = equal perceptual difference) produce more even results than ones based on the Mel scale, especially for anyone with musical experience. Listen to an example of the Mel scale here if you’d like to judge for yourself: http://www.sfu.ca/sonic-studio/handbook/Mel.html

August 24, 2015

Global Ocean Surface Waves, Visualized

Here is an exquisite visualization of ocean surface waves by Cameron Beccario @cambecc. The animation below is just one visualization possible with his tool, Earth. Visit earth.nullschool.net to play with wind, current and temperature data. Scroll down to read more about Earth.


The color scale shows the peak wave period (time between crests) for ocean swells. Longest period waves (up to 25 seconds) are brighter cyan, and choppy, short period waves approach black. The animation on top is a representation of the surface wave “vector field,” and not an animation of actual surface waves.

Wave conditions (updated every 3 hours). Scroll down to see more.

Earth: an Excellent Example of Good Design

Earth is an excellent example of how good design can produce visualizations that engage a public audience and do a better job of representing data. A good data visualization begins by formulating the task as a Design Problem. The design problem that Earth sets out to solve might be worded like this:

“Represent near real-time global weather forecast in an accurate yet holistic way that captures geographical context.”

Earth is on view now through Fall 2015 at Point.B Studio in Port Orford, Oregon, as part of the exhibition, Gegenshein.

You might add “and is visually successful as fine art,” since one of Beccario’s intended uses of Earth is to generate fine art prints and other materials for exhibition. But a really good design can produce museum quality work in any case, even if it has a functional purpose.

Beccario decided that what was required was not a single visualization but a tool to generate visualizations of a choice of datasets from a variety of perspectives. The design of the resulting tool, Earth, flows from that requirement. It uses a visually efficient, limited, and unified design language and intuitive interface to allow users to display a very large range of possible visualizations.


Relief map of the moon, color shaded, with rainbow scale. (Image credit: NASA’s Goddard Space Flight Center/DLR/ASU.) See Drew Skau’s visual.ly blog post: “Dear NASA: No More Rainbow Color Scales, Please”

One small component of that design language is the carefully constructed set of color scales. Color scales are often an afterthought in scientific visualizations, often relying on default “rainbow” scales. These naïve color scales can cause intelligibility problems and bias the interpretation of the data. For example, the colors in the standard rainbow scale are not of equal perceived brightness; therefore some bands of information can be perceived as more important than they are, and we can perceive boundaries between color bands that are not necessarily there in the data.1 Beccario designed more perceptually relevant color scales with the help of the online tool ColorBrewer. (There are many other similar tools out there. See also New York Times’ designer Gregor Aisch’s excellent article on color scales.) Beccario’s color scale for wave periods is a perceptually even brightness gradient of a single hue (cyan), so any boundaries we see in the map are real boundaries, and not artifacts of the color scale.
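To put rough numbers on the “unequal perceived brightness” of rainbow colors, here is a small sketch that computes CIE L* (perceptual lightness, 0 to 100) for a few fully saturated sRGB colors. The choice of colors is mine and is only meant to stand in for the bands of a typical rainbow scale; it is not taken from Earth or from any particular rainbow palette.

```python
def srgb_to_lightness(r, g, b):
    """CIE L* (0-100) of an 8-bit sRGB color, relative to the sRGB white point."""
    def linearize(c):
        c = c / 255
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    # Relative luminance Y from linear RGB (sRGB primaries).
    y = 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b)
    # Standard CIE lightness function, with its linear segment near black.
    return 116 * y ** (1 / 3) - 16 if y > 216 / 24389 else (24389 / 27) * y

rainbow = {
    "red": (255, 0, 0),
    "yellow": (255, 255, 0),
    "green": (0, 255, 0),
    "cyan": (0, 255, 255),
    "blue": (0, 0, 255),
}
for name, rgb in rainbow.items():
    print(f"{name:>6}: L* = {srgb_to_lightness(*rgb):.1f}")
```

Yellow comes out far lighter than blue even though both are “full intensity” in RGB terms, which is why a yellow band can look more important than its neighbours.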

Beccario's color scale for wave period is a perceptually linear brightness gradient of a single hue.

I encourage everyone, including scientists, to explore Cameron Beccario’s tool Earth at their leisure, to discover the difference good design can make in data visualization.

Watch Beccario talk about his design process for Earth at the Graphical Web 2014 conference:


  1. Drew Skau, “Dear NASA: No More Rainbow Color Scales, Please.” blog post, Visually. http://blog.visual.ly/rainbow-color-scales/

August 14, 2015

How much oil was that?

Timeline:

-Apr 20: Rig explodes
-Apr 22: Rig sinks
-Jul 15: Well capped
-Sep 19: Well sealed

5 years ago, in 2010, the Deepwater Horizon drilling platform in the Gulf of Mexico exploded when a pulse of high-pressure methane gas from the 1500m deep Macondo wellhead expanded into the drilling riser and rose into the drilling rig. The resulting oil spill was the largest accidental marine oil spill in history.

The quantity of oil released — 4.9 million barrels (206 million gallons)1 — is difficult to grasp. To try to wrap my head around what that volume of oil looks like, I first represented it as an oil storage tank roughly 27m (85ft) in diameter like these…

View of Chevron crude oil depot, Richmond CA. Josh Cassidy/KQED. (Apologies to Chevron. The Gulf spill was BP's accident. Chevron's tanks are only used in this article to show scale.)

…but 1500m high (nearly a mile). That’s the depth of the wellhead. There is no oil tanker in existence that could contain all that oil, though the now scrapped super tanker “Seawise Giant” could have held most of it.
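If you want to check the arithmetic behind that tank, here is a small sketch. It assumes the usual conversion of about 0.159 cubic meters per 42-gallon barrel and treats the tank as a simple cylinder; the function name is my own.

```python
import math

BARREL_M3 = 0.158987     # cubic meters in one 42-gallon oil barrel
FEET_PER_METER = 1 / 0.3048

def cylinder_diameter(volume_m3, height_m):
    """Diameter of a cylinder with the given volume and height."""
    return 2 * math.sqrt(volume_m3 / (math.pi * height_m))

spill_barrels = 4.9e6
spill_gallons = spill_barrels * 42
spill_m3 = spill_barrels * BARREL_M3

# How wide must a 1500 m tall tank be to hold the whole spill?
diameter_m = cylinder_diameter(spill_m3, 1500)
print(f"{spill_gallons / 1e6:.0f} million gallons, {spill_m3:,.0f} cubic meters")
print(f"diameter: {diameter_m:.1f} m ({diameter_m * FEET_PER_METER:.0f} ft)")
```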

Deepwater Horizon oil slick, in a May 2010 NASA image

Of course, the spill didn’t look like that. About half of the oil spread across the surface as a slick, and half spread out in a deepwater plume2, never reaching the surface but impacting the water column and ocean bottom. However, an intuitive feel for the quantities can help us to begin to understand the impact of the spill.

Natural oil and gas seeps in the Gulf

ECOGIG studies the effect of natural hydrocarbon seeps on the Gulf ecosystem, and compares them with the effects of large accidental releases.

The Gulf of Mexico is well known as an oil reservoir, and like other ocean oil reservoirs, the Gulf features thousands of naturally occurring seeps that release small quantities of oil and gas into the ocean on a continual basis. One might think that since the Gulf ecosystem has evolved to cope with these natural hydrocarbon releases, the impact of oil spills would be small (or at least reduced). But to understand the relationship between natural and accidental hydrocarbon releases in the Gulf, we need to begin by understanding the differences in scale.

1 barrel (42 gallons)

A very active natural oil seep in the Gulf of Mexico can put out around 1 barrel of oil per day and looks like this:

Surface sheen above a natural oil seep in the Gulf of Mexico. Photo by Beth Orcutt.

Natural oil seep in the Gulf of Mexico. Photo ECOGIG

There are something like 20,000 seeps in the Gulf but most put out much less than one barrel per day. The total daily output of all the seeps in the Gulf is not well constrained, but a “reasonable” estimate based on extrapolation from looking at small areas would be from 2500 to 10,000 barrels per day.3,4

The large tanks in the background at left, some of the largest ever built, are 88m in diameter and hold 750,000 barrels each. A 747 could park comfortably inside.

One day's output from all 20,000 natural oil seeps in the entire Gulf of Mexico (2500 to 10,000 barrels) would fill a hypothetical tank 19.6m (64 ft) high and 5 to 10 meters (16 to 32 ft) in diameter. *The large tanks in the background hold 750,000 barrels and would comfortably fit a 747 inside.

By contrast, the Macondo wellhead spilled from 57,000 to 70,000 barrels per day from a single point source. It would take from 1 to 4 weeks for the output of all the natural seeps in the Gulf to equal one day’s output from the Macondo wellhead.
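The “1 to 4 weeks” range comes from pairing the extremes of the two estimates. A quick sketch of that arithmetic, with variable names of my own choosing:

```python
# Daily output estimates in barrels per day, as quoted in this article.
SEEPS_PER_DAY = (2_500, 10_000)       # all natural seeps in the Gulf combined
MACONDO_PER_DAY = (57_000, 70_000)    # the single Macondo wellhead

# Shortest case: lowest blowout estimate against the highest seep estimate.
shortest_days = MACONDO_PER_DAY[0] / SEEPS_PER_DAY[1]
# Longest case: highest blowout estimate against the lowest seep estimate.
longest_days = MACONDO_PER_DAY[1] / SEEPS_PER_DAY[0]

print(f"{shortest_days:.1f} to {longest_days:.0f} days "
      f"({shortest_days / 7:.1f} to {longest_days / 7:.0f} weeks)")
```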

Or you could flood the passenger and cargo holds of 8.75 to 10.75 Boeing 747s


Macondo wellhead spewing oil and gas. Photo credit: U.S. Geological Survey

One day's output from the Macondo wellhead blowout would have filled a hypothetical tank the same height (19.6m or 64 ft) but 24 to 27 meters (80 to 88 ft) in diameter.

At that rate, the wellhead would fill a giant 88m diameter storage tank (750,000 barrels) in 11 to 13 days. The wellhead discharged oil continually for 86 days, releasing 4.9 million barrels. It would take all the storage tanks in the field below to contain that oil:

Section of the Chevron Richmond Refinery. These tanks are used to store crude oil offloaded from oil tankers. The tanks on the right are the ersatz 747 hangars.
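Here is a short sketch of the fill-time arithmetic, using only figures quoted in this article. The variable names are mine, and the average rate is simply the total divided by the number of days, not a measured value.

```python
BIG_TANK_BARRELS = 750_000            # one of the 88 m diameter tanks
MACONDO_PER_DAY = (57_000, 70_000)    # estimated flow, barrels per day
SPILL_DAYS = 86
SPILL_TOTAL_BARRELS = 4_900_000

# Days for the wellhead to fill one giant tank, at the high and low flow rates.
fill_days_fast = BIG_TANK_BARRELS / MACONDO_PER_DAY[1]
fill_days_slow = BIG_TANK_BARRELS / MACONDO_PER_DAY[0]

# Average daily flow implied by the total, and giant tanks needed to hold it all.
average_per_day = SPILL_TOTAL_BARRELS / SPILL_DAYS
tanks_needed = SPILL_TOTAL_BARRELS / BIG_TANK_BARRELS

print(f"one tank fills in {fill_days_fast:.1f} to {fill_days_slow:.1f} days")
print(f"average flow {average_per_day:,.0f} barrels/day; "
      f"{tanks_needed:.1f} giant tanks for the whole spill")
```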

Or, one might construct a single storage tank 6 stories tall and nearly 2 1/2 football fields in diameter.

This hypothetical storage tank, too large to ever be built, is 19.6m high and 225m in diameter.
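All of the hypothetical tanks in this article share one height (19.6m), so only the diameter changes with the volume. A sketch of that calculation, again assuming about 0.159 cubic meters per barrel and a plain cylinder (the function name is mine):

```python
import math

BARREL_M3 = 0.158987    # cubic meters in one 42-gallon oil barrel
TANK_HEIGHT_M = 19.6

def tank_diameter_m(barrels, height_m=TANK_HEIGHT_M):
    """Diameter in meters of a cylindrical tank holding the given number of barrels."""
    volume_m3 = barrels * BARREL_M3
    return 2 * math.sqrt(volume_m3 / (math.pi * height_m))

quantities = {
    "all seeps, one day (low)": 2_500,
    "all seeps, one day (high)": 10_000,
    "Macondo, one day (low)": 57_000,
    "Macondo, one day (high)": 70_000,
    "Macondo, total": 4_900_000,
}
diameters = {label: tank_diameter_m(b) for label, b in quantities.items()}
for label, d in diameters.items():
    print(f"{label:>26}: {d:6.1f} m")
```

Because the height is fixed, the diameter grows with the square root of the volume: the total spill is roughly 500 to 2000 times one day of seep output, but its tank is only about 20 to 45 times wider.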

To sum up the comparison of natural vs. accidental oil releases in the Gulf of Mexico, let’s look at those quantities side by side.

You can read more about the Deepwater Horizon spill and its effects on the Gulf ecosystem at ecogig.org.

If you’d like to explore the visualizations in this article yourself in Google Earth, download the KMZ files below. Double click on the .kmz files and they will open in Google Earth.

seeps.kmz
macondo-daily.kmz
macondo-widetank.kmz


  1. Marcia K. McNutt et al., “Review of Flow Rate Estimates of the Deepwater Horizon Oil Spill,” Proceedings of the National Academy of Sciences 109, no. 50 (December 11, 2012): 20260–67, doi:10.1073/pnas.1112139108. Accessed August 11, 2015. http://www.pnas.org/content/109/50/20260.full

  2. McNutt et al., p. 20267

  3. I. R. Macdonald et al., “Natural Oil Slicks in the Gulf of Mexico Visible from Space,” Journal of Geophysical Research: Oceans 98, no. C9 (September 15, 1993): 16351–64, doi:10.1029/93JC01289

  4. Joye, Samantha B., e-mail message to author, June 30, 2015.


© Copyright 2015 Éric Marty