Chapter 7 Exploring the data with visualisations
7.1 Overview
To prepare for statistical testing it is essential to visualise your data to see ..
7.2 Empirical research into effective data visualisation
There are no definitive principles for best practices in poster design, but there is a substantial body of literature on data visualisation. This literature has produced some widely repeated basic principles, often motivated by ethical and aesthetic concerns. An example of an ethical concern is the principle that only sequential data should be connected in a line chart, because the line implies a continuous change between the points. Measurements of categories, such as artefact types, locations and methods, should not be linked by lines (for example, there is no continuous sequence between a stone artefact and an animal bone). A related principle is that when absolute magnitudes of the data are important, the vertical axis should begin at zero (Robbins 2005; Strange 2007). Displaying data along a vertical axis that does not include zero can misrepresent the data range and exaggerate the relative magnitude between values. One influential set of ideas comes from the minimalist mantras of Tufte. One of his most frequently quoted principles is “maximize the data-ink ratio, within reason”. ‘Data-ink’ refers to the ink used to show data, and so the data-ink ratio is the amount of data-ink relative to the total ink used in a visualisation. These principles aim to ensure that the data are not hidden or distorted by poor choices or irrelevant elements in the visualisation, and that the reader can appreciate the data without distraction or confusion. The practical consequences of this advice include avoiding three-dimensional charts where a two-dimensional chart will suffice (e.g. bar charts and pie charts). Similarly, omitting ‘chartjunk’ (grids, colours, and artistic elements on the charts) helps to improve the data-ink ratio.
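To make the zero-baseline principle above concrete, the following is a minimal sketch of how much a truncated vertical axis can exaggerate a difference between two bars. The function name `apparent_ratio` and the artefact counts are hypothetical, chosen only for illustration.

```python
def apparent_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of the drawn heights of bars b and a when the axis starts at `baseline`."""
    if baseline >= min(a, b):
        raise ValueError("baseline must lie below both values")
    return (b - baseline) / (a - baseline)


# Hypothetical artefact counts from two excavation units (illustrative only).
unit_a, unit_b = 50.0, 55.0

# Axis starting at zero: drawn heights are proportional to the values.
true_ratio = apparent_ratio(unit_a, unit_b)

# Axis starting at 45: only the part of each bar above 45 is drawn.
truncated_ratio = apparent_ratio(unit_a, unit_b, baseline=45.0)

print(f"zero baseline:  bar B is drawn {true_ratio:.2f} times as tall as bar A")
print(f"baseline of 45: bar B is drawn {truncated_ratio:.2f} times as tall as bar A")
```

With these made-up values, a 10% difference in the data is drawn as a doubling of bar height once the axis is truncated, which is the kind of exaggeration the principle warns against.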
While these principles have intuitive appeal and are widely repeated, they can lead to highly minimalist charts that are difficult to interpret (Tukey 1990). Inbar, Tractinsky and Meyer (2007) and Kulla-Mader (2007) found that standard charts were overwhelmingly preferred over Tufte-style minimalist charts. Kosslyn (1985) and Carswell (1992) have raised the question of how to decide what is data-ink and what is not, concluding that the distinction is frequently highly subjective. Hullman et al. (2011) have suggested that some chartjunk may benefit readers by promoting engagement with the visualisation. For example, Bateman et al. (2010) and Li and Moacdieh (2014) found that subjects who were shown charts with chartjunk had a significantly higher chance of comprehending the message of the chart than subjects shown non-embellished charts. Kelly (1989) found no difference in immediate recall of information from high and low data-ink charts in a newspaper format. Similarly, McGurgan (2015) found that participants reported similar levels of accuracy and mental effort when answering graph comprehension questions using bar graphs and boxplots with varying data-ink ratios. On the other hand, Gillan and Richman (1994) and Gillan et al. (1992) found empirical support for the principle of data-ink maximization. They found that the percentage of correct interpretations of chart data was significantly lower for the low data-ink charts compared to medium and high data-ink charts. This brief summary of research into the data-ink ratio shows that it is a problematic concept, with only equivocal empirical support. In sum, it seems that principles based on subjective issues of graph aesthetics, often seen in Tufte-style graph designs, do not always lead to the most effective visualisations.
These mixed findings suggest that aesthetic minimalism might not always ensure that our data visualisations are easy to interpret accurately. What, then, are the basic principles for optimising the speed at which a reader can perceive the patterns in the data, and the accuracy of the information that a reader can extract from the visualisation? In the context of a poster presentation these optimisations are highly desirable, as the reader of a poster is typically hoping to get information from it much more quickly than if they were sitting down to read a scholarly article.
Cleveland and McGill (1984) conducted experiments using several common types of data visualisations to test the accuracy with which subjects could read point-values and make comparisons in the data. They found that chart types based on length (such as dot charts and bar charts) were read much more accurately than chart types based on angle, area or volume (such as pie charts and three-dimensional charts). Furthermore, they found that people perform substantially worse on stacked bar charts than on aligned bar charts, and that comparisons between adjacent bars are more accurate than between widely separated bars. Numerous subsequent studies have generally supported these findings (Heer and Bostock 2010; Kosara and Ziemkiewicz 2010; Talbot, Setlur, and Anand 2014). Heer and Bostock (2010) replicated the core results for comparing sizes across categories, and also found that the addition of gridlines on a plot improved accuracy. Kosara and Ziemkiewicz (2010) tested square pie, or waffle, charts (a square divided into 10 x 10 = 100 cells) along with pie, stacked bar and donut charts, and found that respondents were more confident of their reading of square pie charts, and more accurate in reading the chart values, compared to the other types. Zubiaga and MacNamee (2015) tested respondents with five types of chart (bar charts showing the average value of the distribution, bee swarms, boxplots, stacked bar charts, and histograms) to determine their relative effectiveness in visualising distributions of variables. They found that histograms are the most complete in terms of details given, as well as being the chart type that leads respondents to the most accurate understanding of the underlying data. Rangecroft (2003) and Schonlau and Peters (2012) found that respondents can read 2D pie charts more accurately than 3D pie charts. Zacks et al. (1998) found a similar result for 2D bar charts over 3D bar charts.
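The square pie, or waffle, chart mentioned above can be sketched in a few lines. The example below is an illustrative sketch only, not the procedure used by Kosara and Ziemkiewicz (2010). It converts category totals into whole numbers of cells that sum to 100 and lays them out as a 10 x 10 text grid. The largest-remainder rounding, the function names and the assemblage counts are all assumptions made for this example.

```python
from math import floor


def waffle_counts(totals: dict, cells: int = 100) -> dict:
    """Convert category totals into whole cell counts that sum to `cells`.

    Uses largest-remainder rounding (an assumption of this sketch).
    """
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("totals must sum to a positive number")
    exact = {k: v * cells / grand_total for k, v in totals.items()}
    counts = {k: floor(x) for k, x in exact.items()}
    leftover = cells - sum(counts.values())
    # Give the remaining cells to the categories with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda k: exact[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts


def waffle_rows(counts: dict, symbols: dict, side: int = 10) -> list:
    """Lay the cells out row by row as a side x side grid of one-character symbols."""
    cells = "".join(symbols[k] * n for k, n in counts.items())
    return [cells[i:i + side] for i in range(0, side * side, side)]


# Hypothetical assemblage composition (illustrative counts only).
assemblage = {"stone": 465, "bone": 312, "shell": 223}
counts = waffle_counts(assemblage)
rows = waffle_rows(counts, {"stone": "#", "bone": "o", "shell": "."})

print(counts)
print("\n".join(rows))
```

Because each cell stands for one percent of the whole, a reader can recover a value by counting cells rather than by judging an angle or an area, which may be part of why this chart type is read accurately.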
Recent experiments have cast some light on the traditional rivalry between pie charts and bar charts (Spence 2005). For comparison judgements between categories, bars are more accurately judged than pies (Feldman-Stewart et al. 2000). However, for estimates of the proportion of the whole, pie charts were as accurate as bar charts (Simkin and Hastie 1987; Spence 1990). For pair-wise comparisons, pie and bar charts also perform similarly (Spence and Lewandowsky 1991; Meyer, Shinar, and Leiser 1997). These studies show that under certain conditions, pie charts may be at least as effective as bar charts. The best choice of chart type depends on the purpose of the chart (Kosslyn and Chabris 1992), and the evidence does not support making universal prescriptions about chart types.
Although these empirical studies demonstrate a complex relationship between chart type and effectiveness, they can provide a simple, if approximate, rank-order of strategies for visualising data. Dot charts and bar charts are generally at the high-ranking end of the spectrum, along with more exotic styles such as waffle charts and histograms. Lower-ranking chart types include stacked bar charts and pie charts. Any kind of 3D chart ranks last.
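The approximate rank-order above can be written down as a small lookup table. This is only a sketch of the summary in this section. The three tier numbers, the names `CHART_TIERS`, `tier` and `order_by_tier`, and the convention of marking 3D variants with a "3D " prefix are assumptions made for illustration, not published scores.

```python
# Approximate tiers from the summary above; 1 is read most accurately.
# The tier numbers are an assumption of this sketch, not measured values.
CHART_TIERS = {
    "dot chart": 1,
    "bar chart": 1,
    "waffle chart": 1,
    "histogram": 1,
    "stacked bar chart": 2,
    "pie chart": 2,
}
THREE_D_TIER = 3  # any kind of 3D chart ranks last


def tier(chart_type: str) -> int:
    """Return the approximate tier of a chart type (raises KeyError if unknown)."""
    name = chart_type.strip().lower()
    if name.startswith("3d "):
        return THREE_D_TIER
    return CHART_TIERS[name]


def order_by_tier(candidates: list) -> list:
    """Sort candidate chart types from highest-ranking to lowest-ranking tier."""
    return sorted(candidates, key=tier)


candidates = ["3D pie chart", "pie chart", "dot chart", "stacked bar chart"]
print(order_by_tier(candidates))
```

A table like this is only a starting point. As noted above, the best choice depends on the purpose of the chart, so a pie chart may still be a reasonable choice when the task is to estimate a proportion of the whole.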
References
Bateman, Scott, Regan L Mandryk, Carl Gutwin, Aaron Genest, David McDine, and Christopher Brooks. 2010. “Useful Junk?: The Effects of Visual Embellishment on Comprehension and Memorability of Charts.” In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2573–82. ACM.
Carswell, C Melody. 1992. “Choosing Specifiers: An Evaluation of the Basic Tasks Model of Graphical Perception.” Human Factors: The Journal of the Human Factors and Ergonomics Society 34 (5). Sage Publications: 535–54.
Cleveland, William S., and Robert McGill. 1984. “Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods.” Journal of the American Statistical Association 79 (387): 531–54. http://links.jstor.org/sici?sici=0162-1459%28198409%2979%3A387%3C531%3AGPTEAA%3E2.0.CO%3B2-Y.
Feldman-Stewart, Deb, Nancy Kocovski, Beth A. McConnell, Michael D. Brundage, and William J. Mackillop. 2000. “Perception of Quantitative Information for Treatment Decisions.” Medical Decision Making 20 (2): 228–38. https://doi.org/10.1177/0272989x0002000208.
Gillan, Douglas J, and Edward H Richman. 1994. “Minimalism and the Syntax of Graphs.” Human Factors: The Journal of the Human Factors and Ergonomics Society 36 (4). SAGE Publications: 619–44.
Gillan, Douglas J., Edward Richman, and Michael Neary. 1992. “Minimalism in Graphics.” In Posters and Short Talks of the 1992 SIGCHI Conference on Human Factors in Computing Systems, 75–76. CHI ’92. New York, NY, USA: ACM. https://doi.org/10.1145/1125021.1125090.
Heer, Jeffrey, and Michael Bostock. 2010. “Crowdsourcing Graphical Perception: Using Mechanical Turk to Assess Visualization Design.” In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 203–12. ACM.
Hullman, Jessica, Eytan Adar, and Priti Shah. 2011. “Benefitting Infovis with Visual Difficulties.” IEEE Transactions on Visualization and Computer Graphics 17 (12). IEEE: 2213–22.
Inbar, Ohad, Noam Tractinsky, and Joachim Meyer. 2007. “Minimalism in Information Visualization: Attitudes Towards Maximizing the Data-Ink Ratio.” In Proceedings of the 14th European Conference on Cognitive Ergonomics: Invent! Explore!, 185–88. ECCE ’07. New York, NY, USA: ACM. https://doi.org/10.1145/1362550.1362587.
Kelly, James D. 1989. “The Data-Ink Ratio and Accuracy of Newspaper Graphs.” Journalism and Mass Communication Quarterly 66 (3). Association for Education in Journalism, etc.: 632.
Kosara, Robert, and Caroline Ziemkiewicz. 2010. “Do Mechanical Turks Dream of Square Pie Charts?” In Proceedings of the 3rd BELIV’10 Workshop: BEyond Time and Errors: Novel evaLuation Methods for Information Visualization, 63–70. BELIV ’10. New York, NY, USA: ACM. https://doi.org/10.1145/2110192.2110202.
Kosslyn, Stephen M. 1985. “Graphics and Human Information Processing: A Review of Five Books.” Journal of the American Statistical Association 80 (391). Taylor & Francis: 499–512.
Kosslyn, Stephen M, and Christopher F Chabris. 1992. “Minding Information Graphics.” Folio: The Magazine for Magazine Management 21 (2): 69–71.
Kulla-Mader, Julia. 2007. “Graphs via Ink: Understanding How the Amount of Non-Data-Ink in a Graph Affects Perception and Learning.” Master’s Thesis, Department of Information and Library Science, University of North Carolina.
Li, Huiyang, and Nadine Moacdieh. 2014. “Is ‘Chart Junk’ Useful? An Extended Examination of Visual Embellishment.” Proceedings of the Human Factors and Ergonomics Society Annual Meeting 58 (1): 1516–20. https://doi.org/10.1177/1541931214581316.
McGurgan, Kevin. 2015. “Data-Ink Ratio and Task Complexity in Graph Comprehension.” Master’s Thesis, Department of Psychology, Rochester Institute of Technology.
Meyer, Joachim, David Shinar, and David Leiser. 1997. “Multiple Factors That Determine Performance with Tables and Graphs.” Human Factors: The Journal of the Human Factors and Ergonomics Society 39 (2). SAGE Publications: 268–86.
Rangecroft, Margaret. 2003. “As Easy as Pie.” Behaviour & Information Technology 22 (6): 421–26. https://doi.org/10.1080/01449290310001615437.
Schonlau, Matthias, and Ellen Peters. 2012. “Comprehension of Graphs and Tables Depend on the Task: Empirical Evidence from Two Web-Based Studies.” Statistics, Politics, and Policy 3 (2).
Simkin, David, and Reid Hastie. 1987. “An Information-Processing Analysis of Graph Perception.” Journal of the American Statistical Association 82 (398): 454–65. https://doi.org/10.1080/01621459.1987.10478448.
Spence, Ian. 1990. “Visual Psychophysics of Simple Graphical Elements.” Journal of Experimental Psychology: Human Perception and Performance 16 (4). American Psychological Association: 683.
Spence, Ian. 2005. “No Humble Pie: The Origins and Usage of a Statistical Chart.” Journal of Educational and Behavioral Statistics 30 (4). Sage Publications: 353–68.
Spence, Ian, and Stephan Lewandowsky. 1991. “Displaying Proportions and Percentages.” Applied Cognitive Psychology 5 (1). Wiley Online Library: 61–77.
Talbot, Justin, Vidya Setlur, and Anushka Anand. 2014. “Four Experiments on the Perception of Bar Charts.” IEEE Transactions on Visualization and Computer Graphics 20 (12). IEEE: 2152–60.
Tukey, John W. 1990. “Data-Based Graphics: Visual Display in the Decades to Come.” Statistical Science 5 (3): 327–39. http://www.jstor.org/stable/2245820.
Zacks, Jeff, Ellen Levy, Barbara Tversky, and Diane J Schiano. 1998. “Reading Bar Graphs: Effects of Extraneous Depth Cues and Graphical Context.” Journal of Experimental Psychology: Applied 4 (2). American Psychological Association: 119.
Zubiaga, Arkaitz, and Brian MacNamee. 2015. “Knowing What You Don’t Know: Choosing the Right Chart to Show Data Distributions to Non-Expert Users.” In Web Science 2015 Conference, Oxford, United Kingdom, 28 June-1 July 2015. ACM.