Publishing interactive data documents
When presented effectively, an interactive chart helps the audience get both an overview and an in-depth look at complex data. In this article, I will show an example of using interactive charts effectively. I will also introduce R markdown as one of the most efficient ways of authoring data-rich documents.
Visualizing data gives us a bird's-eye view of it. Suppose we made many observations of ozone and temperature to learn about air quality (the airquality data set that ships with R). We can plot temperature (F) and ozone level (ppb) on the x and y axes, respectively, as in Fig. 1. Just by looking at the plot, we gain some insight into the relationship between the two measurements.
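A plot of this kind can be sketched in a few lines of base R. This is only an illustration, assuming the observations are R's built-in airquality data set; it is not necessarily how Fig. 1 was produced.

```r
# Assumption: the observations are R's built-in airquality data set.
# Keep only the rows where both measurements are present.
aq <- airquality[complete.cases(airquality[, c("Ozone", "Temp")]), ]

# Temperature on the x axis, ozone level on the y axis.
plot(aq$Temp, aq$Ozone,
     xlab = "Temperature (F)",
     ylab = "Ozone (ppb)",
     pch  = 19)

# One number summarizing the trend that the plot shows.
temp_ozone_cor <- cor(aq$Temp, aq$Ozone)
```

A positive value of temp_ozone_cor agrees with what the eye picks up from the plot: ozone tends to be higher on warmer days.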
A more sophisticated visualization can present higher dimensional data on a flat surface: it explains not only the relationship between x and y, but also y and z, z and x, and so on. We can add wind speed measurements (mph) to our plot (Fig. 2).
In the plot, I showed the wind speed in shades of blue. A lighter blue indicates higher wind speed. We can observe that the wind speed tends to be low when the ozone level is high.
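The color encoding can be approximated in base R by mapping wind speed onto a ramp of blues. The palette end points ("navy" and "lightblue") and the number of shades are my own choices for this sketch, not taken from Fig. 2.

```r
# Assumption: same built-in airquality data set as before.
aq <- airquality[complete.cases(airquality[, c("Ozone", "Temp", "Wind")]), ]

# 100 shades running from dark blue (low wind) to light blue (high wind).
shades <- colorRampPalette(c("navy", "lightblue"))(100)

# Bin each wind speed into one of the 100 shades.
shade_index <- cut(aq$Wind, breaks = 100, labels = FALSE)

plot(aq$Temp, aq$Ozone,
     col  = shades[shade_index],
     pch  = 19,
     xlab = "Temperature (F)",
     ylab = "Ozone (ppb)")

# Numeric check of the visual impression: wind and ozone move in opposite directions.
wind_ozone_cor <- cor(aq$Wind, aq$Ozone)
```

A negative wind_ozone_cor matches the observation that the darker (low wind) points sit near the top of the plot.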
After getting an overview of the data, we may want to look at more detailed views of particular parts of the data. What if we want to know when the highest ozone level was observed? Fortunately, graphs are viewed on computer screens more often these days. You can hover the mouse cursor over Fig. 3 to reveal the date of the observation.
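The value that the hover label reveals can also be looked up directly. The sketch below assumes the built-in airquality data set, whose documentation gives the observation period as May to September 1973; the variable names are my own.

```r
# Row with the highest ozone reading (which.max skips missing values).
peak <- airquality[which.max(airquality$Ozone), ]

# Assumption: the year 1973 comes from the data set's documentation,
# since the table itself only stores month and day.
peak_date <- as.Date(sprintf("1973-%02d-%02d",
                             as.integer(peak$Month),
                             as.integer(peak$Day)))

format(peak_date, "%B %d, %Y")
```

An interactive chart spares the reader this step: the same lookup happens by pointing at the highest dot.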
Interactive chart with R markdown
With R markdown, we can:
- Load and transform data from multiple sources
- Run statistical analysis
- Produce figures
- Write narratives
All of this happens within a single text file. Having to switch between different programs to complete each of those tasks can be very disruptive to your thought process. With R markdown, we can focus on the research without being distracted by how each tool works.
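To make the "single text file" idea concrete, here is a minimal sketch that writes a tiny R markdown document from R. The file name, title, and chunk labels are hypothetical, and the document body is an illustration rather than the source of this article. Rendering it assumes the rmarkdown package and pandoc are installed, so that call is left commented out.

```r
fence <- strrep("`", 3)  # delimiter that opens and closes a code chunk
tick  <- "`"             # delimiter for inline R code

rmd_lines <- c(
  "---",
  "title: \"Ozone and temperature\"",
  "output: html_document",
  "---",
  "",
  "Narrative text sits next to the code that backs it up.",
  "",
  paste0(fence, "{r load}"),
  "aq  <- na.omit(airquality)          # load and transform data",
  "fit <- lm(Ozone ~ Temp, data = aq)  # run a statistical analysis",
  fence,
  "",
  paste0("The fitted slope is ", tick, "r round(coef(fit)[2], 2)", tick,
         " ppb per degree F."),
  "",
  paste0(fence, "{r figure}"),
  "plot(aq$Temp, aq$Ozone)             # produce a figure",
  fence
)

# Hypothetical location for the example document.
rmd_path <- file.path(tempdir(), "example.Rmd")
writeLines(rmd_lines, rmd_path)

# Assumes rmarkdown and pandoc are available:
# rmarkdown::render(rmd_path)
```

Data loading, analysis, figure, and narrative all live in example.Rmd, and the number quoted in the text is recomputed every time the document is rendered.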
R markdown example: Twitter impressions and engagements
Figure 4 is a demonstration of using R markdown to visualize my Twitter activities in 2014. The x-axis represents time. The y-axis represents the number of impressions (i.e. how many people saw each of my tweets). The size of each bubble represents the number of engagements (i.e. the total number of viewers' activities, such as clicking the profile or links and expanding the tweets).
A bubble chart like this is one way to effectively visualize the relationship between three variables. The chart above is also colored to distinguish between tweets in English and Japanese. Presented in this way, it is easy to see the majority of my tweets are in English.
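A static version of such a bubble chart can be sketched with base R's symbols(). The data frame below holds placeholder values invented purely for illustration; it is not my Twitter data, and the column names and colors are hypothetical. It also lacks the interactivity of Fig. 4.

```r
# Placeholder values invented for illustration only.
tweets <- data.frame(
  date        = as.Date(c("2014-01-15", "2014-03-02", "2014-06-20",
                          "2014-09-05", "2014-11-30")),
  impressions = c(120, 340, 90, 510, 260),
  engagements = c(4, 12, 1, 25, 9),
  language    = c("English", "English", "Japanese", "English", "Japanese"),
  stringsAsFactors = FALSE
)

# One fill color per language.
bubble_colors <- c(English = "steelblue", Japanese = "tomato")

# Time on the x axis, expressed as day of the year.
day_of_year <- as.numeric(format(tweets$date, "%j"))

symbols(day_of_year, tweets$impressions,
        circles = sqrt(tweets$engagements),  # bubble area tracks engagements
        inches  = 0.3,
        bg      = unname(bubble_colors[tweets$language]),
        xlab    = "Day of 2014",
        ylab    = "Impressions")

legend("topleft", legend = names(bubble_colors),
       pt.bg = bubble_colors, pch = 21)
```

Taking the square root of the engagements makes the bubble area, rather than the radius, proportional to the value, which keeps large bubbles from looking exaggerated.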
After getting an overview of the impressions, engagements, and frequency of tweets in each language, I may want to find out which tweets actually got very high or low impressions or engagements. Hovering the mouse over a bubble will reveal the content of each tweet. Each group can be toggled on and off, and the chart adjusts the zoom automatically.
Once the data were prepared, it took only 8 lines of R code to generate the chart with the help of the rCharts package and a reusable helper function I wrote.
The entire source of this article is available for viewing.
With the help of the rCharts package, it is now much easier to publish data-rich documents. R markdown lets us focus on the data analysis and production of the manuscript without requiring in-depth knowledge of programming languages. Properly presented interactive charts help online publishers communicate data and their interpretations effectively.
- This R markdown document was processed with R version 3.1.2 (2014-10-31) on x86_64-unknown-linux-gnu (64-bit).
- You can see the entire change history of this post.
Thanks to coffeeandchocolate for proofreading.
Original post: Feb. 6, 2015 | Last updated: Feb. 9, 2015