Publishing interactive data documents

Interactive chart

When presented effectively, an interactive chart gives the audience both an overview and an in-depth look at complex data. In this article, I show an example of using interactive charts effectively. I also introduce R markdown as one of the most efficient ways of authoring data-rich documents.

Motivating example

Visualizing data gives us a bird's-eye view of it. Suppose we made many observations of ozone and temperature to learn about air quality (see note 1). We can plot temperature (F) and ozone level (ppb) on the x and y axes, respectively, as in Fig. 1. Just by looking at the plot, we gain some insight into the relationship between the two measurements.

Figure 1: Ozone level and temperature
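
A static plot along the lines of Fig. 1 can be sketched in base R as follows. This is a minimal sketch, not necessarily the code behind the original figure: the variable name aq and the axis labels are illustrative choices, while airquality and its Ozone and Temp columns come from R's built-in datasets package.

```r
# Drop days on which ozone was not recorded (the Ozone column contains NAs).
aq <- airquality[!is.na(airquality$Ozone), ]

# Temperature (degrees F) on the x axis, ozone (ppb) on the y axis.
plot(aq$Temp, aq$Ozone,
     xlab = "Temperature (F)",
     ylab = "Ozone (ppb)",
     pch  = 19)
```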

A more sophisticated visualization can present higher dimensional data on a flat surface: it explains not only the relationship between x and y, but also y and z, z and x, and so on. We can add wind speed measurements (mph) to our plot (Fig. 2).

Figure 2: Ozone level, temperature, and wind speed. Lighter blue indicates higher wind speed.

In the plot, I show the wind speed in shades of blue. A lighter blue indicates higher wind speed. We can observe that the wind speed tends to be low when the ozone level is high.
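
One way to encode a third variable as colour, as described above, is to bin the wind speeds and look each bin up in a blue colour ramp. The sketch below assumes base R graphics; the names n_shades, blues, shade, and point_col, the number of bins, and the two end colours are all illustrative choices, not taken from the original figure.

```r
aq <- airquality[!is.na(airquality$Ozone), ]

# Build a dark-to-light blue ramp and map each wind speed onto it,
# so that lighter blue means higher wind speed.
n_shades  <- 20
blues     <- colorRampPalette(c("navy", "lightblue"))(n_shades)
shade     <- cut(aq$Wind, breaks = n_shades, labels = FALSE)  # bin index per row
point_col <- blues[shade]

plot(aq$Temp, aq$Ozone,
     col  = point_col,
     pch  = 19,
     xlab = "Temperature (F)",
     ylab = "Ozone (ppb)")

# A numeric check on the visual impression: a negative correlation is
# consistent with low wind speed on high-ozone days.
wind_ozone_cor <- cor(aq$Wind, aq$Ozone)
print(wind_ozone_cor)
```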

After getting an overview of the data, we may want to look at more detailed views of particular parts of the data. What if we want to know when the highest ozone level was observed? Fortunately, these days we usually view graphs on computer screens. You can hover the mouse cursor over a point in Fig. 3 to reveal the date of the observation.

Figure 3: Interactive chart of ozone level, temperature, and wind speed. Hover the mouse cursor over a point to reveal the date.
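
The same question can also be answered without a chart. The sketch below looks up the row with the highest ozone reading; the names peak and peak_date are illustrative, and the year 1973 is taken from the dataset's documentation (?airquality), since the data frame itself stores only month and day.

```r
# which.max() discards NAs and returns the row index of the largest reading.
peak <- airquality[which.max(airquality$Ozone), ]

# Combine the Month and Day columns with the documented year into a Date.
peak_date <- as.Date(sprintf("1973-%02d-%02d", peak$Month, peak$Day))

print(peak[, c("Ozone", "Temp", "Wind")])
print(peak_date)
```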

Interactive chart with R markdown

You may already be familiar with the sophisticated interactive visualizations in New York Times articles (see note 2). Creating such visualizations used to require skills in database query languages (e.g. SQL), web frameworks (Django for Python, Ruby on Rails, Node.js, etc.), and front-end web development (JavaScript, HTML, CSS).

Thanks to recent developments such as rCharts, it is now much easier to author and publish data-rich documents in R markdown.

With R markdown, we can:

  • Load and transform data from multiple sources
  • Run statistical analysis
  • Produce figures
  • Write narratives

All of this happens within a single text file. Having to switch between different programs to complete each of those tasks can be very disruptive to your thought process. With R markdown, we can focus on the research without being distracted by how each tool works.
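
To make the single-file idea concrete, here is a minimal sketch that writes a tiny R markdown document from R. The file name example.Rmd, the title, and the chunk contents are all hypothetical; the chunk delimiter is built with strrep() only so that the example fits inside one code listing. Rendering it would additionally need the rmarkdown package and pandoc, which are assumed to be installed, so that call is left commented out.

```r
# The three-backtick delimiter that opens and closes a code chunk.
fence <- strrep("`", 3)

# Narrative, analysis, and figure code live side by side in one text file.
rmd <- c(
  "---",
  "title: \"Ozone and temperature\"",
  "output: html_document",
  "---",
  "",
  "Narrative text goes here, next to the code that backs it up.",
  "",
  paste0(fence, "{r}"),
  "fit <- lm(Ozone ~ Temp, data = airquality)  # statistical analysis",
  "plot(Ozone ~ Temp, data = airquality)       # figure",
  "abline(fit)",
  fence
)

rmd_path <- file.path(tempdir(), "example.Rmd")
writeLines(rmd, rmd_path)

# Assumes rmarkdown and pandoc are available:
# rmarkdown::render(rmd_path)
```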

R markdown example: Twitter impressions and engagements

Figure 4 is a demonstration of using R markdown to visualize my Twitter activities in 2014. The x-axis represents time. The y-axis represents the number of impressions (i.e. how many people saw each of my tweets). The size of each bubble represents the number of engagements (i.e. the total number of viewers' actions, such as clicking the profile or links and expanding the tweet).

Figure 4: Interactive chart: Daigo’s Twitter impressions and engagements in 2014. The size of the bubble corresponds to the number of engagements. Hover the mouse cursor over a bubble to reveal the details.

A bubble chart like this is one way to effectively visualize the relationship between three variables. The chart above is also colored to distinguish between tweets in English and Japanese. Presented in this way, it is easy to see that the majority of my tweets are in English.
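
The Twitter data is not included here, so the sketch below illustrates the same bubble-chart idea with the airquality data instead: position encodes two variables, bubble size a third, and colour a grouping variable (month stands in for tweet language). It uses base R's symbols() rather than rCharts, so it is static, and every variable name in it is an illustrative choice.

```r
aq <- na.omit(airquality)

# Scale the radius by the square root so that bubble area tracks wind speed.
radius <- sqrt(aq$Wind)

# One translucent colour per month; month plays the role of the group.
month_levels <- sort(unique(aq$Month))
palette_cols <- rainbow(length(month_levels), alpha = 0.5)
bubble_col   <- palette_cols[match(aq$Month, month_levels)]

symbols(aq$Temp, aq$Ozone,
        circles = radius,
        inches  = 0.2,
        bg      = bubble_col,
        fg      = "grey30",
        xlab    = "Temperature (F)",
        ylab    = "Ozone (ppb)")
legend("topleft",
       legend = month.abb[month_levels],
       pt.bg  = palette_cols,
       pch    = 21,
       title  = "Month")
```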

After getting an overview of the impressions, engagements, and frequency of tweets in each language, I may want to find out which tweets actually got very high or low impressions or engagements. Hovering the mouse over a bubble reveals the content of each tweet. Each group can be toggled on and off, and the chart adjusts the zoom automatically.

Once the data were prepared, it took only 8 lines of R code to generate the chart, with the help of the rCharts package and a reusable helper function I wrote.

The entire source of this article is available for viewing.

Conclusions

With the help of the rCharts package, it is now much easier to publish data-rich documents. R markdown lets us focus on the data analysis and production of the manuscript without requiring in-depth knowledge of programming languages. Properly presented interactive charts help online publishers communicate data and their interpretations effectively.

Notes

Acknowledgements

Thanks to coffeeandchocolate for proofreading.


  1. The New York Air Quality Measurements data is available in the datasets package in R. For health effects of ozone, see here.
  2. For example, see "Across U.S. Companies, Tax Rates Vary Greatly".

Original post: Feb. 6, 2015 | Last updated: Feb. 9, 2015
