Data Visualization | Office of the Provost

The goal of visualizing data graphically is to make patterns in the data stand out clearly so that comparisons are easier.

Shades of blue and gray provide a palette that’s easy on the eyes while still providing enough contrast to distinguish between the data series. When you look at a graph, the “take-home message” (or messages) should be obvious. Contact Robynn Shannon if you would like assistance with data visualization.

A clustered bar graph makes it possible to compare responses among questions and also to compare the responses within a single question. Horizontal bars allow plenty of space for the cluster labels. The clusters are arranged (top to bottom) from the greatest “Strongly agree” response to the smallest. Percentages are often preferable to raw counts because most people can more easily compare percentages in their heads (e.g., it’s easier to think about the difference between 25% and 75% than between 38 and 114). Data labels were left off the bars because they would make the graph look cluttered. Similarly, a stacked bar graph was not used because four sections of a single bar are harder to read than a cluster of four bars. Including the sample size helps the viewer interpret the results.

Clustered bar graph displaying responses to an institutional evaluation of academic advising.
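The percentage comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used to build the graph: the response labels are made up, and it assumes that the counts 38 and 114 from the example come from the same question, so that they share a total of 152.

```python
def to_percentages(counts):
    """Convert a mapping of response -> count into response -> percentage."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot compute percentages for an empty sample")
    return {label: 100 * n / total for label, n in counts.items()}


# Hypothetical labels; the counts reuse the 38 vs. 114 example from the text.
counts = {"Agree": 38, "Disagree": 114}
n = sum(counts.values())  # sample size, reported alongside the percentages
percentages = to_percentages(counts)

for label, pct in percentages.items():
    print(f"{label}: {pct:.0f}% (n = {n})")
# Agree: 25% (n = 152)
# Disagree: 75% (n = 152)
```

Printing the sample size next to the percentages keeps the information that the conversion would otherwise hide.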

For this graph and the next, stacked bars adding up to 100% were used because there are only two or three values for each bar, and stacked bars make for a less cluttered appearance than clustered bars. As with clustered bars, stacked bars allow you to make two comparisons in one graph, in this case comparisons among items being scored by the rubric and scores within an item. In this graph, the information of greatest interest was the number of “1” scores in each bar, so the bars are arranged (top to bottom) from the largest value for “1” to the smallest. That section of each bar is also the darkest. Data labels were added mostly to make it easier to compare the size of the middle section of each bar.

Stacked bar chart showing rubric scores for GEF area 1 assessment.
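As a rough sketch of the preparation behind a chart like this one, the snippet below scales each bar to 100% and then orders the bars by the share of “1” scores. The item names and counts are invented for illustration and are not the GEF assessment data.

```python
# Hypothetical rubric data: item -> number of papers receiving each score.
scores = {
    "Item A": {"1": 10, "2": 25, "3": 15},
    "Item B": {"1": 20, "2": 20, "3": 10},
    "Item C": {"1": 5, "2": 30, "3": 15},
}


def normalize(row):
    """Scale one bar's counts so that its sections add up to 100%."""
    total = sum(row.values())
    return {score: round(100 * n / total, 1) for score, n in row.items()}


shares = {item: normalize(row) for item, row in scores.items()}

# Order the bars from the largest share of "1" scores to the smallest,
# which is the top-to-bottom order described in the text.
ordered = sorted(shares, key=lambda item: shares[item]["1"], reverse=True)

for item in ordered:
    print(item, shares[item])
```

Sorting on the section of greatest interest means the viewer can read the ranking straight off the chart without comparing data labels.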

This graph has an additional level of complexity because an additional comparison is being made. As in the graph above, it is possible to compare among rubric items (with abbreviated labels in this graph) and also to compare the scores (“1” vs. “2 or 3”) within an item. Additionally, it is easy to make a comparison between course numbers (101 vs. 102) within a rubric item. Because there are only two sections to each bar, data labels were added only to the section of greatest interest. An alternative way to graph these data would be to arrange the bars in just two clusters, one for 101 and one for 102, and to have one bar for each rubric item (GS, SE, CD, and CP) within each cluster.

Stacked bar chart showing comparison between ENGL 101 and 102 in GEF area 1 assessment.
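The alternative arrangement mentioned above (two clusters, one per course, with one bar per rubric item) amounts to regrouping the same numbers. A minimal sketch, using invented percentages rather than the actual assessment results:

```python
# Hypothetical percentages of "1" scores, keyed by rubric item, then course.
by_item = {
    "GS": {"101": 30, "102": 20},
    "SE": {"101": 25, "102": 15},
    "CD": {"101": 40, "102": 35},
    "CP": {"101": 10, "102": 5},
}

# Regroup so that each course becomes a cluster holding one bar per item.
by_course = {}
for item, courses in by_item.items():
    for course, pct in courses.items():
        by_course.setdefault(course, {})[item] = pct

for course, items in by_course.items():
    print(course, items)
```

Which grouping to use depends on the comparison that matters most: grouping by item puts the 101 and 102 bars side by side, while grouping by course puts the four rubric items side by side.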

This graph is an example of a “box” (or “box and whisker”) plot. Box plots contain a lot of information about a set of numbers (values) and are therefore somewhat complicated to read (see labeled illustration). The ends of the lines extending from the box (the “whiskers”) show the minimum and maximum values in the data set; in other words, the ends of the range of values. The line across the inside of the box shows the median, and the “×” shows the mean; the actual value of the mean for each box is labeled. The mean and median can be quite different if the data set includes outliers. The bottom and top lines of the box itself indicate the values of the first and third quartiles, respectively (the first quartile is the number that 25% of values fall below; the third quartile is the number that 75% of values fall below). A small box with short “whiskers” indicates a small range of values in the data set.

Box and whisker chart showing reading comprehension progression in a sequence of Spanish courses.

Generic box and whisker chart with parts labeled.
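The values that a box plot displays can be computed with Python's standard `statistics` module. The sketch below follows the description above, with whiskers at the minimum and maximum. The data are invented, and the quartile method (`"inclusive"`) is an assumption: charting tools use several slightly different quartile conventions, so results may differ a little from a given chart.

```python
import statistics


def box_plot_summary(values):
    """Return the numbers a box plot displays for one set of values."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),                # end of the lower whisker
        "q1": q1,                          # bottom line of the box
        "median": median,                  # line inside the box
        "q3": q3,                          # top line of the box
        "max": max(values),                # end of the upper whisker
        "mean": statistics.mean(values),   # the "x" marker
    }


# Hypothetical scores, first without and then with a single outlier.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summary = box_plot_summary(scores)
summary_with_outlier = box_plot_summary(scores + [100])

print(summary)
print(summary_with_outlier)
```

In the second summary the single outlier pulls the mean up to 14.5 while the median only moves to 5.5, which is the gap between mean and median that the text describes.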