Notes from Edward Tufte's
The Visual Display of Quantitative Information

Matthew R. DeVerna

Ch 4. Data-Ink and Graphical Design

Tufte provides five data-ink principles for thinking about how to use ink when creating data graphics. I list the principles below; however, he summarizes the concept nicely in the chapter's opening lines:

Data graphics should draw the viewer's attention to the sense and substance of the data, not to something else. The data graphical form should present the quantitative contents. Occasionally artfulness of design makes a graphic worthy of the Museum of Modern Art, but essentially statistical graphics are instruments to help people reason about quantitative information.

E.R. Tufte, The Visual Display of Quantitative Information, Ch. 4.

From a more formulaic perspective, Tufte also provides the data-ink ratio:

$$\text{Data-ink ratio} = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}$$

... which should generally be maximized. That is, we should work to remove unnecessary ink from our graphics so that it does not distract from the data.

Tufte's principles below clarify this point further.

Data-Ink Principles

Above all else show the data.
Maximize the data-ink ratio.
Erase non-data-ink.
Erase redundant data-ink.
Revise and edit.
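To make the second principle concrete, here is a minimal Python sketch of my own. Tufte does not define a unit for "ink". I assume it is something countable, such as inked pixels. The function name `data_ink_ratio` and the numbers below are made up for illustration.

```python
def data_ink_ratio(data_ink: float, total_ink: float) -> float:
    """Share of a graphic's ink that encodes data (1.0 means nothing left to erase).

    Both arguments use the same, assumed unit of "ink", e.g. inked pixels.
    """
    if total_ink <= 0:
        raise ValueError("total_ink must be positive")
    if not 0 <= data_ink <= total_ink:
        raise ValueError("data_ink must lie between 0 and total_ink")
    return data_ink / total_ink


# Hypothetical pixel counts for one chart before and after erasing non-data-ink.
# The data-ink stays the same; only the total shrinks.
before = data_ink_ratio(data_ink=1_200, total_ink=4_000)
after = data_ink_ratio(data_ink=1_200, total_ink=1_500)
print(f"before: {before:.2f}, after: {after:.2f}")  # before: 0.30, after: 0.80
```

Erasing non-data-ink (principle 3) leaves the numerator alone and shrinks the denominator, so the ratio rises without any loss of information.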

Principles in Practice

Tufte provides a great example of these principles in practice. Consider the figure below:

Figure: A graph with too much ink. Linus Pauling, General Chemistry (San Francisco, 1947), page 64.

This visualization is overpowered by the "+" symbols, and unnecessary grid lines crowd the image.

Following the principles above, Tufte redraws the figure without the "+" symbols. This frees room for a small amount of helpful non-data-ink, which even highlights the few data points that do not fall on the fitted lines.

Figure: Tufte's cleaned-up version of the same graph.

Other minor changes, such as reducing the number of tick marks and rotating the y-axis label, make the plot easier to read and draw the viewer's eyes to the data itself.
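Tufte redraws Pauling's figure by hand, not in software. Here is a rough matplotlib sketch of the same kinds of changes: small dots instead of "+" symbols, no grid, fewer ticks, and a y-axis label turned to read horizontally (my reading of "rotating"). The helper `plot_quietly` is hypothetical. The least-squares line is my stand-in for Pauling's fit lines.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; no display needed
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import MaxNLocator


def plot_quietly(x, y, xlabel, ylabel, max_ticks=4):
    """Plot points plus a fitted line, applying the cleanups described above."""
    fig, ax = plt.subplots(figsize=(4, 3))
    # Small, plain dots instead of heavy "+" symbols.
    ax.plot(x, y, linestyle="none", marker="o", markersize=3, color="black")
    # A least-squares line as a stand-in for the original fit lines.
    slope, intercept = np.polyfit(x, y, deg=1)
    xs = np.array([min(x), max(x)])
    ax.plot(xs, slope * xs + intercept, color="gray", linewidth=1)
    ax.grid(False)  # no grid competing with the data
    # Fewer tick marks on both axes.
    ax.xaxis.set_major_locator(MaxNLocator(nbins=max_ticks))
    ax.yaxis.set_major_locator(MaxNLocator(nbins=max_ticks))
    ax.set_xlabel(xlabel)
    # Horizontal y-axis label, so the reader need not tilt their head.
    ax.set_ylabel(ylabel, rotation=0, ha="right")
    return fig, ax


# Made-up data, for illustration only.
fig, ax = plot_quietly([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8], "x", "y")
```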

Ch 5. Chartjunk: Vibrations, Grids, and Ducks

With savage pictures fill their gaps
And o'er unhabitable downs
Place elephants for want of towns.

Jonathan Swift's indictment of 17th-century cartographers
As included by E.R. Tufte, The Visual Display of Quantitative Information

Again, I let Tufte introduce the topic in his own words:

The interior decoration of graphics generates a lot of ink that does not tell the viewer anything new. The purpose of decoration varies - to make the graphic appear more scientific and precise, to enliven the display, to give the designer an opportunity to exercise artistic skills. Regardless of its cause, it is all non-data-ink or redundant data-ink, and it is often chartjunk.
...
Like weeds, many varieties of chartjunk flourish. Here three widespread types found in scientific and technical research work are catalogued - unintentional optical art, the dreaded grid, and the self-promoting graphical duck.

E.R. Tufte, The Visual Display of Quantitative Information, page 107

Unintentional Optical Art (Moiré Effects)

Figure: Various examples of the moiré effect. Taken from E.R. Tufte, The Visual Display of Quantitative Information, page 111.

The general problem here is that certain patterns, when used within visualizations, can overpower the graphic in a way that distracts from the data.

Tufte likens the resulting optical effect to a visual vibration.

Here is a good example provided by Tufte:

Figure: A graph heavy on moiré effects (Tufte, page 108). Source: Instituto de Expansão Commercial, Brasil: Graphicos Economics-Estatisticas (Rio de Janeiro, 1929), page 15.

In this example, "the noise clouds the flow of information." The focus should be on the data, but the patterns amount to a hostile takeover of the visualization.
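One practical way to avoid moiré in my own plots is to fill areas with flat gray tones instead of hatch patterns. Below is a minimal matplotlib sketch. The helper `stacked_bars` and the data are hypothetical, and flat fills are only one possible remedy.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; no display needed
import matplotlib.pyplot as plt


def stacked_bars(categories, series):
    """Stacked bars filled with flat grays instead of hatching.

    `series` maps each layer's label to one value per category.
    """
    fig, ax = plt.subplots()
    bottoms = [0.0] * len(categories)
    n = len(series)
    for i, (label, values) in enumerate(series.items()):
        # Gray level runs from dark (0.3) to light (0.8). No hatch= argument
        # is passed, so there are no stripes or cross-hatching to vibrate.
        shade = str(0.3 + 0.5 * i / max(n - 1, 1))
        ax.bar(categories, values, bottom=bottoms, color=shade,
               edgecolor="white", label=label)
        bottoms = [b + v for b, v in zip(bottoms, values)]
    ax.legend(frameon=False)
    return fig, ax


# Made-up data, for illustration only.
fig, ax = stacked_bars(["A", "B", "C"],
                       {"Series 1": [3, 4, 5], "Series 2": [2, 3, 3]})
```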

The Grid

One of the more sedate graphical elements, the grid should usually be muted or completely suppressed so that its presence is only implicit - lest it compete with the data. Grids are mostly for initial plotting at home or office rather than putting into print. Dark grid lines are chartjunk. They carry no information, clutter up the graphic, and generate graphic activity unrelated to data information.

E.R. Tufte, The Visual Display of Quantitative Information, pages 112-113

Tufte offers the comparison below as an example. The figure shows the "age-sex pyramid of the population of France in 1967."

Figure: A graphic that makes poor use of grid lines. Its annotations:
(a) Military losses in World War I
(b) Deficit of births in World War I
(c) Military losses in World War II
(d) Deficit of births in World War II
(e) Rise of births after demobilization after World War II

Note: The annotations above are part of the figure in Tufte's text; I include them manually here for image-sizing purposes.

Tufte offers the improvement below, which "quiets the grid and gives emphasis to the data."

Figure: Tufte's improvement to the graph that uses grid lines poorly.

Note: This revision undoubtedly "quiets the grid"; however, on its own, without any labels at all, it seems like it would be entirely incomprehensible.

While I believe that Tufte's suggestion to mute grid lines is widely adopted and well warranted, I found the examples in this chapter to be lacking.
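Since I agree with Tufte about muting grid lines, here is one way to do it in matplotlib. This is a minimal sketch: the helper `mute_grid` is hypothetical, and its default gray level and line width are my own choices, not Tufte's.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; no display needed
import matplotlib.pyplot as plt


def mute_grid(ax, shade="0.92", width=0.5):
    """Draw a faint grid behind the data instead of dark grid lines."""
    ax.set_axisbelow(True)  # grid and ticks sit beneath the plotted data
    ax.grid(True, color=shade, linewidth=width)
    return ax


# Made-up data, for illustration only.
fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [1, 3, 2, 4], color="black")
mute_grid(ax)
```

Drawing the grid below the data keeps it from cutting across the data marks.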

Self-Promoting Graphics: The Duck

When a graphic is taken over by decorative forms or computer debris, when the data measures and structures become Design Elements, when the overall design purveys Graphical Style rather than quantitative information, then that graphic may be called a duck in honor of the duck-form store, "Big Duck." For this building the whole structure is itself decoration, just as in the duck data graphic.

Figure: Big Duck, Flanders, New York; photographed by Edward Tufte, July 2000.

The general idea here is simple: do not value design over data clarity. Of the duck below, Tufte says, "this may well be the worst graphic ever to find its way into print":

Figure: The worst duck ever? Source: American Education, 1970s.

It took me about five minutes to decode the figure above, which turned out to represent an extremely simple data set. This captures the concept of a duck well: by adding unnecessary stylistic flourishes, the creator of this visualization turned simple data into something nearly incomprehensible.

Tufte's conclusion to this chapter is perfect:

Chartjunk does not achieve the goals of its propagators. The overwhelming fact of data graphics is that they stand or fall on their content, gracefully displayed. Graphics do not become attractive and interesting through the addition of ornamental hatching and false perspective to a few bars. Chartjunk can turn bores into disasters, but it can never rescue a thin data set. The best designs [Tufte gives a long list of the "best designs" here; I omit it for space and because it is not needed to make the central point] are intriguing and curiosity-provoking, drawing the viewer into the wonder of the data, sometimes by narrative power, sometimes by immense detail, and sometimes by elegant presentation of simple but interesting data. But no information, no sense of discovery, no wonder, no substance is generated by chartjunk.

Forgo chartjunk, including
moiré vibration,
the grid, and the duck.

Ch 6. Data-Ink Maximization and Graphical Design

Painting is special, separate, a matter of meditation and contemplation, for me, no physical action or social sport. As much consciousness as possible. Clarity, completeness, quintessence, quiet. No noise, no schmutz, no schmerz, no fauve schwärmerei. Perfection, passiveness, consonance, consummateness. No palpitations, no gesticulation, no grotesquerie. Spirituality, serenity, absoluteness, coherence. No automatism, no accident, no anxiety, no catharsis, no chance. Detachment, disinterestedness, thoughtfulness, transcendence. No humbugging, no button-holing, no exploitation, no mixing things up.

Ad Reinhardt, statement for the catalogue of the exhibition, "The New Decade: 35 American Painters and Sculptors," Whitney Museum of American Art, New York, 1955

In this chapter, Tufte offers alternatives to standard plots by applying his principles of data-ink maximization. I encourage readers of these notes to read the chapter in full; it is quite short, and Tufte does a good job of walking the reader through the process of erasing non-data-ink and making revisions. That said, I summarize the most important takeaways below.

Redesigning the Bar Chart/Histogram

Beginning with the chart below:

Figure: A simple bar chart.

we can maximize the data-ink ratio to create the following:

Figure: The same bar chart after maximizing data-ink.

Note: Tufte takes this process one step further and removes the y-axis tick marks. In my opinion this goes too far, because at that point data-ink is being removed.

The general steps being taken are:

  1. Remove the frame
  2. Remove the vertical axes
  3. Remove ticks
  4. Add a white grid. A "white grid" is created by erasing thin horizontal lines through the bars at the tick values along the y-axis.
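The four steps above can be sketched in matplotlib as follows. This is my own approximation, not Tufte's code. The helper `tufte_bar_chart` and the data are hypothetical, and the white grid is approximated by drawing white horizontal lines over the bars.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; no display needed
import matplotlib.pyplot as plt


def tufte_bar_chart(labels, values, grid_at):
    """Bar chart following the four redesign steps (an approximation)."""
    fig, ax = plt.subplots(figsize=(4, 3))
    ax.bar(labels, values, color="0.5", width=0.6)
    # Steps 1 and 2: remove the frame, including the vertical axis line.
    for spine in ax.spines.values():
        spine.set_visible(False)
    # Step 3: remove tick marks (zero length) but keep the tick labels.
    ax.tick_params(axis="both", length=0)
    # Step 4: a "white grid" of white lines drawn on top of the bars
    # (lines draw above bars by default); on a white background they
    # are only visible where they cross a bar.
    for y in grid_at:
        ax.axhline(y, color="white", linewidth=1)
    ax.set_yticks(grid_at)
    return fig, ax


# Made-up data, for illustration only.
fig, ax = tufte_bar_chart(["a", "b", "c"], [12, 25, 18], grid_at=[5, 10, 15, 20])
```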

Redesigning the Scatterplot

A simple bivariate scatterplot...

Figure: Conventional scatterplot.

... can be improved by using a range-frame:

Figure: Range-frame scatterplot.

Range-frame axes extend only from the minimum to the maximum observed value of each variable, which encodes each variable's range directly in the graphic.

Figure: A visual explanation of range-frame axes.

Here is a nice, simple example of this in practice:

Figure: A practical example of range-frame axes.
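In matplotlib, a range-frame can be approximated by trimming the axis lines (spines) to the data's minimum and maximum. Below is a minimal sketch; the helper `range_frame` and the data are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; no display needed
import matplotlib.pyplot as plt


def range_frame(ax, x, y):
    """Trim the axis lines so they span only the observed data range."""
    for side in ("top", "right"):
        ax.spines[side].set_visible(False)
    # Each remaining axis line now runs from the variable's min to its max.
    ax.spines["bottom"].set_bounds(min(x), max(x))
    ax.spines["left"].set_bounds(min(y), max(y))
    return ax


# Made-up data, for illustration only.
x = [1.0, 2.5, 3.0, 4.0]
y = [3.0, 1.0, 4.5, 5.0]
fig, ax = plt.subplots()
ax.plot(x, y, linestyle="none", marker="o", markersize=3, color="black")
range_frame(ax, x, y)
```

Tick labels can still fall outside the trimmed lines. Placing ticks at the minimum and maximum (for example, with `ax.set_xticks`) is one option if that looks odd.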

Tufte also illustrates the dash-dot-plot, in which the entire frame is turned into data: the scatterplot is framed by the marginal distribution of each variable.

Figure: An example of a dash-dot-plot.
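Below is a minimal matplotlib sketch of the idea: the frame and ticks are removed, and each axis gets one short dash per observation, so the frame itself shows the marginal distributions. The helper `dash_dot_plot` and the data are hypothetical, and this approximates the figure rather than reproducing it.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; no display needed
import matplotlib.pyplot as plt


def dash_dot_plot(x, y):
    """Scatterplot whose frame is made of one dash per x and per y value."""
    fig, ax = plt.subplots(figsize=(4, 4))
    ax.plot(x, y, linestyle="none", marker="o", markersize=3, color="black")
    # Remove the conventional frame and tick marks.
    for spine in ax.spines.values():
        spine.set_visible(False)
    ax.set_xticks([])
    ax.set_yticks([])
    # Vertical dashes along the bottom edge: x in data units, y in axes units (0 = bottom).
    ax.plot(x, [0] * len(x), linestyle="none", marker="|", markersize=8,
            color="black", transform=ax.get_xaxis_transform(), clip_on=False)
    # Horizontal dashes along the left edge: x in axes units (0 = left), y in data units.
    ax.plot([0] * len(y), y, linestyle="none", marker="_", markersize=8,
            color="black", transform=ax.get_yaxis_transform(), clip_on=False)
    return fig, ax


# Made-up data, for illustration only.
fig, ax = dash_dot_plot([1, 2, 2.5, 4, 5], [2, 3, 1.5, 4, 4.5])
```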

Tufte also provides a more sophisticated example that places more "standard" distribution plots along the margins, similar to the output of Seaborn's jointplot function:

Figure: A joint-margin plot example.

Figure note: Narrowband spectra of individual subpulses. Each point of the intensity Iq(t) plotted on the right is the sum of the distribution of intensities across the receiver bandwidth shown in the center. At the top is plotted the spectrum averaged over the pulse. In the limit of many thousands of pulses this would show the receiver bandpass shape.

Figure source: Timothy H. Hankins and Barney J. Rickett, "Pulsar Signal Processing," in Berni Alder et al., eds., Methods in Computational Physics, Volume 14: Radio Astronomy (New York, 1975), pg. 108.
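Seaborn's jointplot produces this kind of layout in one call. To stay with plain matplotlib, here is a minimal sketch of a scatterplot with marginal histograms. It mimics only the layout, not the pulsar figure's averaged spectrum and intensity curves. The helper `scatter_with_margins` and its parameters are my own.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; no display needed
import matplotlib.pyplot as plt


def scatter_with_margins(x, y, bins=10):
    """Central scatterplot with marginal histograms on the top and right."""
    fig = plt.figure(figsize=(5, 5))
    # 2x2 grid: large bottom-left cell for the scatter, thin cells for margins.
    gs = fig.add_gridspec(2, 2, width_ratios=(4, 1), height_ratios=(1, 4),
                          wspace=0.05, hspace=0.05)
    ax = fig.add_subplot(gs[1, 0])
    ax_top = fig.add_subplot(gs[0, 0], sharex=ax)
    ax_right = fig.add_subplot(gs[1, 1], sharey=ax)
    ax.plot(x, y, linestyle="none", marker="o", markersize=3, color="black")
    ax_top.hist(x, bins=bins, color="0.6")
    ax_right.hist(y, bins=bins, orientation="horizontal", color="0.6")
    # The margins carry data only: no frames, ticks, or labels of their own.
    for margin in (ax_top, ax_right):
        margin.set_axis_off()
    return fig, ax, ax_top, ax_right


# Made-up data, for illustration only.
fig, ax, ax_top, ax_right = scatter_with_margins(
    [1, 2, 2, 3, 4, 5], [2, 1, 3, 3, 5, 4], bins=4)
```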

In the end, Tufte closes with the following:

Maximizing data-ink (within reason) is but a single dimension of a complex and multivariate design task. The principle helps conduct experiments in graphical design. Some of those experiments will succeed. There remain, however, many other considerations in the design of statistical graphics - not only of efficiency, but also of complexity, structure, density, and even beauty.

Maximizing the Data-ink Ratio in Practice