Chart Deception: Part 2

Slide 1

In part two of this lecture I’ll talk about X, Y or scatter charts and then provide many examples of poorly constructed graphics. In many cases, I’ll present alternative and better constructions of the same graphics.

Slide 2

Some of this advice is critical to designing non-deceptive scatter charts. It’s important to label the data points.

Slide 3

This graph is essentially meaningless for communicating real information because there are no data points and no axis labels.

Slide 4

In contrast to the previous graph, this graph includes both labeled points and axes.

Slide 5

You’ll notice, as you start looking at graphics more critically, that graphical displays like those in USA Today exaggerate differences, changes, or trends. USA Today does this by not starting its axes at the origin (the 0,0 point). The next four slides illustrate this problem.

Slide 6

Here the Y axis starts at 3 and ends at 9, so there appears to be a huge shift in average orders by month. People will remember the shape of this graph, and a quick glance suggests there were no orders in October, November, and December, but substantial orders in July, August, and September.

Slide 7

In contrast to the previous slide, this slide suggests a much more stable pattern of average orders per salesperson over the six month period. The first slide suggests tremendous variability, which is incorrect. The true message is that orders were a bit higher in July through September and a bit lower in October through December. A graph that suggests otherwise is deceptive. Also notice that the Y axis starts at the zero point and extends beyond the top-most value, which occurred in August.

Slide 8

Here’s another example of the failure to use the 0,0 origin as the starting point. A quick look at this graph suggests that the average orders per salesperson are highly variable and that September was a disastrous month.

Slide 9

In fact, just the opposite is true. The average orders have been extraordinarily stable across this six month period. The first graph leaves viewers with a distorted impression of the average orders over this six month period.
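
To put a number on how much a truncated axis exaggerates a difference, here is a minimal Python sketch. The two monthly values and the function name are hypothetical, chosen only to fit an axis that starts at 3; they are not read from the slides.

```python
def apparent_ratio(high, low, axis_start=0.0):
    """Ratio of the drawn heights of two values when the Y axis starts at axis_start."""
    if low <= axis_start:
        raise ValueError("the lower value must sit above the axis start")
    return (high - axis_start) / (low - axis_start)

# Hypothetical average orders for a strong month and a weak month.
strong_month, weak_month = 8.0, 4.0

true_ratio = apparent_ratio(strong_month, weak_month)            # Y axis starts at 0
truncated_ratio = apparent_ratio(strong_month, weak_month, 3.0)  # Y axis starts at 3

print(f"Axis from 0: the strong month looks {true_ratio:.1f}x the weak month")
print(f"Axis from 3: the strong month looks {truncated_ratio:.1f}x the weak month")
```

With these assumed values, the same two numbers look two and a half times further apart once the axis starts at 3 instead of 0.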

Slide 10 (No Audio)

Slide 11

In this graph, distortion is caused by treating unequal time intervals as equal. The graph on the right, B, shows a huge area associated with the period between 1975 and 1980. As a result, it seems as if there was a long take-off period before the increase in dollars. Alternatively, the graph on the left, A, presents an undistorted X axis. In A, dollars seem relatively stable over the first half of the graph, but from 1985-2000 there was a meaningful increase in dollars. You should avoid graphs like B, in which the period from 1975-1980 and the period from 1995-2000 are distorted.
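
As a rough illustration of the difference between A and B, the Python sketch below places each observation on the X axis in proportion to elapsed time and compares that with equal spacing. The list of years is an assumption made for the example, not the actual years on the slide.

```python
def axis_positions(years):
    """Map each year to a 0-to-1 position proportional to elapsed time."""
    first, last = min(years), max(years)
    return [(year - first) / (last - first) for year in years]

# Hypothetical observation years with unequal gaps between them.
years = [1975, 1980, 1985, 1990, 1995, 1996, 1997, 1998, 1999, 2000]

proportional = axis_positions(years)                              # undistorted, like graph A
equal_steps = [i / (len(years) - 1) for i in range(len(years))]   # distorted, like graph B

for year, good, bad in zip(years, proportional, equal_steps):
    print(f"{year}: proportional={good:.2f}  equally spaced={bad:.2f}")
```

Wherever the two columns disagree, an equally spaced axis stretches or squeezes that period relative to the time that actually passed.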

Slide 12

Even if you depict equal intervals identically in your graph, you can still do things to change the visual impression. The top left graph shows the original scaled arrangement; you can see what happens as you start expanding and contracting the X and Y axes. In some cases, you can make changes seem much smaller; in other cases, you can make the changes seem much greater. You shouldn’t distort the message by arbitrarily expanding or contracting the horizontal and vertical scales of your graph. X, Y plots are meant to indicate trends and variability. You can influence that message by arbitrarily expanding or contracting the X and Y axes.

Slide 13 (No Audio)

Slide 14

As I mentioned earlier, avoid broken axes. Here, the Y axis jumps from 0 to 9, so it seems as if the designer properly started the Y axis at the zero (0) point, but in fact the designer has distorted the message. Thus, it appears that this line graph depicts a huge increase from 1930 to 1970. The numbers in the graph indicate a roughly 50% increase. Do you believe anyone viewing this graph casually will see only a 50% increase when the graph suggests, due to the broken Y axis, that there’s a 400 to 500% increase? Using discontinuous axes to depict data will only confuse and distort the information.

Slide 15

Here are three more examples of why you shouldn’t distort a chart by using broken axes and why you should start from the 0 point on the X and Y axes.

Slide 16 (No Audio)

Slide 17

There are certain conventions when people view graphs, and one of those conventions is that an upward sloping line means the quantity is increasing over time, especially if time is on the X axis and the amount in question is on the Y axis. This cumulative rainfall graph seems to indicate that rainfall increased markedly from June to December. In fact, the individual amounts for each month listed at the bottom of the slide indicate no increase. Avoid using cumulative charts unless you have a good reason for doing so because cumulative charts tend to be inconsistent with people’s chart expectations.
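
The effect is easy to reproduce. In the Python sketch below, the monthly rainfall amounts are hypothetical and flat by construction (they are not the figures on the slide), yet the running total climbs every month.

```python
from itertools import accumulate

# Hypothetical monthly rainfall in inches, June through December.
months = ["Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
monthly = [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]

# Running total: the series a cumulative chart would plot.
cumulative = list(accumulate(monthly))

for month, amount, total in zip(months, monthly, cumulative):
    print(f"{month}: monthly={amount:.1f}  cumulative={total:.1f}")
```

The monthly column never changes, but the cumulative column rises in a straight line, which is exactly the upward slope that viewers read as an increase.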

Slide 18

Although this is a bar chart instead of an X, Y chart, using cumulative bar charts presents a similar problem. Avoid using cumulative charts.

Slide 19

Occasionally, people seem compelled to put multiple graphs into the same single graph. Somehow, this arrangement is meant to depict vital information about how the two graphs are related. This format can only confuse people, as you’ll see in the next three slides.

Slide 20

Someone looking at this graph is supposed to conclude that sales are dropping because inventory is dropping; however, those things could be independent. Graphing those things together suggests a relationship when none may exist. If it’s necessary to show sales and inventories, you should use two graphs, not one.

Slide 21

Here’s another example of graphing two things together in a way that suggests these things are related. It’s doubtful that consumption triples when the outdoor temperature increases from 80 to 95 degrees, yet that’s what this slide suggests. These things may be barely or strongly related, but this slide indicates they’re strongly related, which may or may not be the case.

Slide 22

This graph is meant to indicate newspaper readership over time. The Daily News and The Post are two New York City newspapers. A quick look at this graph suggests that readership is converging, which isn’t true, at least not as dramatically as suggested. Part of the problem is that this graph is discontinuous; the Y axis suddenly jumps from 800,000 readers per day to 1.5 million readers per day. In other words, there are two graphs (the Daily News graph on the top and the Post graph on the bottom) that someone stuck together. Although the readership of the Post is increasing and the readership of the Daily News is decreasing, they are not converging as rapidly as suggested by this graph.

Slide 23

As I mentioned in an earlier lecture, many marketing relationships are non-linear, and one way to linearize a relationship is to transform the data. Converting data into its log can be useful for scientific audiences used to semi-log charts. However, such transformations will deceive non-scientists who don’t understand them. In this case, the use of semi-log charts minimizes the appearance of a trend.

Slide 24

Here’s a logarithmic transformation. Equal intervals denote increases by a power of 10. One is 10 to the zero power, 10 is 10 to the first power, 100 is 10 to the second power, and 1000 is 10 to the third power. Seemingly, there’s not much of a trend from 1995-1997, and the extrapolated area suggests a mild increase over time. In fact, if this graph is correct, then the increase from 1995 to 1999 is from 10 to 1000, or a 100-fold increase. A casual look at the graph suggests only a tripling.
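
Here is a small Python sketch of that arithmetic. The 1995 and 1999 values come from the example above; the 1997 value of 100 is an assumption added only to fill in the middle of the series.

```python
import math

# 1995 and 1999 are from the example; the 1997 value is assumed.
values = {1995: 10, 1997: 100, 1999: 1000}

# Height of each point above the bottom of a log axis that starts at 1 (10 to the zero power).
heights = {year: math.log10(value) for year, value in values.items()}

actual_ratio = values[1999] / values[1995]     # how much the quantity really grew
visual_ratio = heights[1999] / heights[1995]   # how much taller the point is drawn

print(f"Actual growth, 1995 to 1999: {actual_ratio:.0f}-fold")
print(f"Drawn height, 1995 to 1999:  {visual_ratio:.0f} times taller")
```

The quantity grows 100-fold, but on the log axis the 1999 point sits only three times as high as the 1995 point.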

Slide 25

This is another one of those examples of people’s cultural norms regarding the reading of graphs. If the X axis isn’t the time axis then connecting the dots can mistakenly suggest a trend because that’s what people expect when they see X, Y charts, a bunch of dots, and a line that connects those dots. Don’t connect the dots unless the X axis is the time axis.

Slide 26 (No Audio)

Slide 27

Finally, regarding X, Y charts, I want to ensure you’re clear about interpolation and extrapolation. With interpolation you’ve got two points on the graph and you’re trying to guess the midpoint between those two points. With extrapolation, you’re looking at all the points up to the end of the graph and then trying to guess subsequent points beyond the current points. Both interpolation and extrapolation are subjective assessments. Interpolation may seem safer because you’re guesstimating a midpoint. If you have a large series of points, you’ll feel comfortable that there won’t be some dramatic change right at the midpoint or somewhere between two existing points. With extrapolation, it’s impossible to know whether trends will continue or not; there could be a dramatic increase or decrease relative to the current trend not suggested by theory, sophisticated forecasting methods, and the like. Just remember that extrapolations are highly subjective and interpolations are somewhat subjective.
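
One simple way to make both guesses is to draw a straight line through two neighboring points. The Python sketch below does that; the data points are hypothetical, and the straight-line assumption is mine for illustration, not a recommended forecasting method.

```python
def line_through(p0, p1):
    """Return a function for the straight line passing through points p0 and p1."""
    (x0, y0), (x1, y1) = p0, p1
    slope = (y1 - y0) / (x1 - x0)
    return lambda x: y0 + slope * (x - x0)

# Hypothetical (x, y) observations.
points = [(1, 10.0), (2, 12.0), (3, 14.0), (4, 16.0)]

# Interpolation: guess a value between two observed points.
interpolated = line_through(points[1], points[2])(2.5)

# Extrapolation: extend the last observed segment beyond the data.
extrapolated = line_through(points[-2], points[-1])(6)

print(f"Interpolated value at x=2.5: {interpolated:.1f}")
print(f"Extrapolated value at x=6:   {extrapolated:.1f}")
```

The interpolated guess is hemmed in by observed points on both sides; the extrapolated guess is supported only by the assumption that the last trend continues.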

Slide 28 (No Audio)

Slide 29

Here’s what I mean by a radar chart. Although these are popular in Japan, they are problematic. In this example, the axes are not identical; acceleration goes from 0 to 15 but handling goes from 0 to 8, yet these two axes are the same length. Fuel economy goes from 0 to 10, and riding and styling go from 5 to 15. Even if you put multiple plots on the same graphic, you’ve got all the problems that I mentioned earlier about the use of profile analysis and semantic differentials: you can’t know what’s important and you can distort people’s perceptions by changing the relative size of the different and unrelated axes. I urge you to avoid radar charts. Fortunately, no current spreadsheet, graphics, or statistical package uses this approach for plotting data.

Slide 30

People seem to focus on point estimates (modes, medians, and means), which are the single best summary numbers. For metric (interval- or ratio-scaled) data, that point is an estimate based on the sample you drew. If you drew subsequent samples, you may find different point estimates. It’s important to give the viewers of your graphs a sense for the range of likely point estimates for repeated samples. That’s what you depict when you show point estimates and confidence intervals around those estimates. The next three slides show confidence intervals as well as the point estimates.
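
For a sense of how such an interval might be computed, here is a minimal Python sketch for a sample mean. The sample values are hypothetical, and the interval uses a normal approximation, which is an assumption on my part; for a small sample like this one, a t-based interval would be somewhat wider.

```python
from statistics import NormalDist, mean, stdev

def mean_with_interval(sample, confidence=0.95):
    """Return (mean, lower, upper) using a normal-approximation confidence interval."""
    n = len(sample)
    center = mean(sample)
    standard_error = stdev(sample) / n ** 0.5
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return center, center - z * standard_error, center + z * standard_error

# Hypothetical sample of ten measurements.
sample = [52, 48, 55, 50, 47, 53, 49, 51, 54, 46]

center, low, high = mean_with_interval(sample)
print(f"Point estimate: {center:.1f}")
print(f"Approximate 95% interval: {low:.1f} to {high:.1f}")
```

Plotting the interval as an error bar around the point estimate shows viewers how much the estimate could move from one sample to the next.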

Slide 31 to Slide 33 (No Audio)

Slide 34

You always want to use footnotes to indicate the source of the information depicted in any table that you create. Otherwise, you can be accused of plagiarism, and that’s a serious charge. In this Internet era, many people believe that borrowing liberally from other sources is okay, as the true creativity is in the remixing of existing sources. That’s not a good mindset. If it’s obvious that you have borrowed something but haven’t indicated the source, then that’s plagiarism. Try to avoid plagiarism by using footnotes.

You should use proportional fonts and you shouldn’t mix fonts. I use Arial fonts for everything. If you import a JPG file, then it’s difficult to manipulate fonts because you’re starting from a picture. Don’t use all uppercase lettering; instead, mix upper and lower case letters. Using uppercase only is equivalent to shouting. If you need to emphasize, then use underline or italics. Finally, consider how you depict numbers. If the numbers represent data points, then use digits; otherwise, spell out the numbers one through nine and use digits for 10 and above.

Slide 35 (No Audio)

Slide 36

The next four slides depict bad graphics but suggest no fix. The text indicates what’s inappropriate about each graph, and what makes it bad suggests the improvements.

Slide 37 to Slide 38 (No Audio)

Slide 39

This graph is lousy for two reasons. First, the Y axis is logarithmic. Second, the graph depicts two different things, which suggests a relationship when none may exist.

Slide 40

This is another poor graph. Notice the effort to compare males to females over the same period, 1968-1976. Between the two graphs are dotted lines that imply a precipitous drop from males to females. Although there’s a drop, it’s not as much as implied by this graph. By comparing, in that dotted line, 1976 data for males to 1968 data for females, the graph creates a distorted impression.

Slide 41 (No Audio)

Slide 42

Here’s an example of stacked 3-D bar charts to reveal a trend from 1971 to 2000 across five different sources of electricity.

Slide 43

The real message from the previous chart is that there are profound increases predicted for petroleum and nuclear energy, but only modest increases for other sources. That point is obscured in the previous slide but made obvious in a scatter plot instead of a stacked 3-D bar chart.

Slide 44 (No Audio)

Slide 45

In the previous slide, the Y axis did not begin at the origin. Here’s what happens when you take that same data and plot it with the Y axis for expenditures per pupil starting at the origin. In this case, it’s clear that expenditures have been relatively stable. The previous slide implies that upward trends in expenditures have caused upward trends in SAT scores. The data, if depicted properly, suggests just the opposite.

Slide 46 (No Audio)

Slide 47

The problem with the first graph in this two-graph series is that the 1978 data was only partial data. As a result, it suggests that there was a downward trend from 1976 to 1978. In fact, a careful look at the 1978 data indicates an upward trend in commission payments.

Slide 48 (No Audio)

Slide 49

The previous stacked bar chart version of this data obscured that different countries were pulling or not pulling their weight in this regard. By dividing the one stacked bar chart into four separate graphs, it appears that the U.S. has added production, Japanese production is flat, West German reserves have fluctuated, and the stock for all other OECD countries has declined. The first oil stocks graph obscures whether or not U.S. stocks have increased, the stocks of the two other countries have remained stable, and the stocks of the remaining OECD countries have declined.

Slide 50

Here’s an excellent example of why you should avoid the chart junk that appears in publications like USA Today. Notice that the graph in the background contains barrels of beer and this is supposed to artistically indicate changes in beer sales from 1970 to 1978. In 1970, the number of barrels was 120 million. This graph doesn’t start from the zero point; rather, it starts from 100 million. In 1978, total sales were 160 million barrels. The difference between 120 million and 160 million is 40 million, or a 33 1/3% increase. Because the graph starts from a non-zero point on the Y axis and amounts sold are associated with barrel sizes, the 1978 barrel appears huge relative to the 1970 barrel. A quick glance suggests that U.S. beer consumption increased tremendously in the eight-year period. The graph in the foreground indicates the millions of barrels sold by Schlitz, and the brewer’s market share seems in rapid decline.
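
The numbers above can be worked through in a few lines of Python. The sales figures and the 100 million starting point are from this slide; the idea that each barrel picture is scaled in both height and width is my assumption about how the artwork was drawn.

```python
sales_1970 = 120  # millions of barrels
sales_1978 = 160  # millions of barrels
axis_start = 100  # where the Y axis begins, in millions of barrels

# Actual growth in sales.
actual_increase = (sales_1978 - sales_1970) / sales_1970

# Ratio of drawn heights when the axis starts at 100 instead of 0.
height_ratio = (sales_1978 - axis_start) / (sales_1970 - axis_start)

# Assumption: the barrel picture grows in width as well as height,
# so its area grows with the square of the height ratio.
area_ratio = height_ratio ** 2

print(f"Actual increase: {actual_increase:.1%}")
print(f"1978 barrel height relative to 1970: {height_ratio:.0f}x")
print(f"1978 barrel area relative to 1970:   {area_ratio:.0f}x")
```

Under that assumption, a one-third increase in sales is drawn as a barrel three times as tall and nine times as large in area.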

Slide 51

However, as this next graph shows, that’s not the case. In fact, U.S. beer sales grew steadily throughout the 1970s. This graph indicates what I mentioned earlier: sales are up by one-third, and Schlitz’s original market share of 15% first grew to 25% and then declined to 20%. The previous graph suggests that the market exploded and that Schlitz’s sales not only declined but its share of the market dropped far more than 5% from its peak.

Slide 52

This graphic shows someone’s artistic approach to illustrating the declining purchasing power of the dollar. Starting with 1958 as the base year, when Eisenhower was President, the dollar was worth a dollar. By Jimmy Carter’s time in 1978, the value of a dollar dropped to $0.44.

Slide 53

As this graph shows, the purchasing power of the dollar has declined since the Eisenhower Administration through at least half of the Carter Administration. The dollar on this graph is worth less than half in 1978 of what it was worth in 1958. That’s the correct interpretation. The problem with the previous graph and all the chart junk is that dollars were depicted in areas shown as a picture of a dollar. If you were to take a ruler and measure the length and width of the dollars from 1958 and 1978, the ratio is roughly 1 to 0.44. However, the area taken by the 1978 dollar is nowhere near half of the area taken by the 1958 dollar. Once again, as was the case with multiple pie charts, the relative areas are supposed to indicate relative quantities.
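
The gap between length and area is simple to compute. In the Python sketch below, the 0.44 ratio comes from the example above; the function name is a hypothetical helper for illustration.

```python
def area_ratio(linear_ratio):
    """Area ratio of two similar pictures whose length and width both scale by linear_ratio."""
    return linear_ratio ** 2

linear_ratio = 0.44  # 1978 dollar relative to 1958 dollar, in both length and width

shrunken_area = area_ratio(linear_ratio)
print(f"Length and width of the 1978 dollar: {linear_ratio:.0%} of the 1958 dollar")
print(f"Area of the 1978 dollar:             {shrunken_area:.0%} of the 1958 dollar")
```

Shrinking both dimensions to 44% leaves a picture with only about 19% of the original area, so the drawing overstates the decline.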

Slide 54 (No Audio)

Slide 55

As the headings for these two revised graphs show, the message of the previous slide was that during the 1970s there was an increasingly positive balance of trade with China but a worsening trade deficit with Taiwan. Because of the mixed metaphor in the previous graphic, that was difficult to discern.