This post is inspired by an example from the book “The Visual Display of Quantitative Information” by Edward Tufte. The book is a fascinating read for anyone interested in graphs and plots that convey statistically meaningful data. Edward Tufte’s works fill an essential void in the literature, given that almost everyone these days needs to understand graphical displays of information. Some books are meant to be read, while others are meant to be held, caressed and respected at the same time. This book falls into the latter category. The author is personally involved in the final layout and printing of his books, and this is evident when you lay your hands on one.
In the first section of the book, the author provides stunning examples of graphical displays spanning many centuries. The graphics showing Napoleon’s march into Russia and the train timetable between Paris and Lyon alone are worth the price of the book. The author then discusses the importance of graphical integrity (or rather laments its absence): creating graphical displays in which the visual representation of the data reflects the actual magnitude of the numbers. He then shows numerous examples of real displays that “play” with the visual presentation to mask the true meaning of the underlying data. One example in particular impressed me a lot, primarily because I see such graphs often and it took me quite a while to figure out the guile in the display. I do not want to reproduce the example from the book, as I am not sure of the copyright the author holds for that particular example, so I am phrasing it in my own words here.
If I were to tell you that the following bar graph displays the net profit of a company for the past five years (2009-2013), what would you conclude about the first two years (bars 1 and 2) on the plot?
Based on the display alone, the viewer’s perception would be the following:
- The company made a profit in 2009
- The profit in 2010 was more than the profit in 2009 (by around 15-20%).
The actual data represented by the graph above is this:
[Table: net profit in INR for each year, 2009-2013]
Notice that the company actually made a loss in 2009!!! The true representation of the data is this:
How, then, can the following two graphs be based on identical data?
The ingenuity (I call it deceit) lies in where the x-axis crosses the y-axis, i.e., the baseline from which the bars are drawn. In the truthful plot (the green one), the baseline is at zero, as it normally is. The visual deceit of the orange plot is achieved by moving the baseline to a large negative value (-400,000 in this case), so that every bar, including the 2009 loss, rises upward from far below zero. Removing the y-axis labels (as in the first plot in this post) while adding data labels to each bar completes the deceit. If you use Microsoft Excel, go ahead and try this out: create a set of data values and build a bar graph from them. By default you will get the green plot. Double-click anywhere within the label area of the y-axis and you will get the following dialog:
Notice the changes I have made, circled in red: I have changed the default values of these fields to -400,000. I am providing this example to help you understand how graphs can be used to mislead the viewer. I do not endorse any form of trickery, however harmless it may seem.
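The arithmetic behind the trick is simple: a bar is drawn from the baseline up to the data value, so its visible length is the value minus the baseline. The short Python sketch below illustrates this. The profit figures are hypothetical, invented only to mimic the 15-20% impression described earlier, and `bar_length` and `apparent_change` are illustrative helper names, not part of any charting library.

```python
# Hypothetical net profits in INR, invented for illustration only.
profits = {2009: -50_000, 2010: 20_000}


def bar_length(value, baseline):
    """Visible length of a bar drawn from the baseline (where the x-axis
    crosses) to the data value. Negative means the bar points downward."""
    return value - baseline


def apparent_change(v1, v2, baseline):
    """Relative difference a viewer perceives by comparing two bar lengths."""
    return bar_length(v2, baseline) / bar_length(v1, baseline) - 1


if __name__ == "__main__":
    for baseline in (0, -400_000):
        lengths = {year: bar_length(v, baseline) for year, v in profits.items()}
        print(f"baseline {baseline:,}: bar lengths {lengths}")

    # With the shifted baseline, both bars point upward and 2010 looks taller.
    change = apparent_change(profits[2009], profits[2010], -400_000)
    print(f"apparent change 2009 -> 2010: {change:.0%}")
```

With the baseline at zero, the 2009 bar has a negative length and points downward, exposing the loss. With the baseline at -400,000, the 2009 loss becomes an upward bar of length 350,000 and the 2010 profit a bar of length 420,000, so 2010 simply looks about 20% taller than 2009.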
The author then goes on to describe other techniques commonly used to violate graphical integrity. That is perhaps the subject for another post.
Do let me know if you encounter such displays. You can upload them to a photo-sharing site such as imgur.com and share a link in the comments. Please be cognizant of copyright and privacy issues when you post such content.