The goal of this tutorial is to introduce the interactive history browser implemented in the experiment
package. It follows one of the examples accessible via experiment::simulate_london_meters
and is based on the London meter data.
The history browser keeps track of all expressions evaluated in an R session. It remembers all objects and plots, and allows the user to move back and forth through that recorded history.
In this short introduction we will perform a simplified data exploration exercise, similar to what a “real” data exploration might look like. To keep the big picture clear, we avoid poking around too much.
We start by loading a number of packages we will need for our analysis. The history tracker does not record commands that produce no new objects or plots, so it ignores this next block of code.
library(dplyr)
library(lubridate)
library(magrittr)
library(ggplot2)
Now it is time to load the experiment
package and turn on its tracing capability. experiment
will register a callback using addTaskCallback
and use that callback to keep a record of changes in the global environment of our R session1.
library(experiment)
tracking_on()
#> Warning: creating a store named "project-store" under
#> "/home/user/my-data-project"
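The callback mechanism itself is plain base R. Here is a minimal toy sketch of the idea (the name "toy-tracker" and the callback body are illustrative, not the actual experiment implementation):

```r
# addTaskCallback() registers a function that base R invokes after every
# successfully completed top-level expression. The callback receives the
# expression, its value, a success flag and a visibility flag.
addTaskCallback(function(expr, value, ok, visible) {
  # a real tracker would diff the global environment here to spot new objects
  message("evaluated: ", deparse(expr)[[1]])
  TRUE  # returning TRUE keeps the callback registered
}, name = "toy-tracker")

x <- 1 + 1                         # triggers the callback
removeTaskCallback("toy-tracker")  # deregister when done
```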
Calling tracking_on()
in a live R session changes the R prompt to [tracked] >
. In this vignette, to make it easier to copy the R code, the prompt remains hidden.
Another important thing to notice is the warning “creating a store named…”, which informs the user that all objects created in the current session will be stored in a newly created object store2. Thus, it is perfectly possible to perform the exercise described in this vignette over a period of multiple days, closing and reopening the R session to pick up the work where it was previously left off.
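For example, a later session started in the same working directory should pick up the existing store instead of creating a new one (a sketch; the exact reattachment behavior is an assumption based on the warning shown above):

```r
# Day two: a fresh R session in the same project directory.
library(experiment)
tracking_on()  # presumably reattaches to the existing "project-store"
               # rather than emitting the "creating a store" warning again
```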
Here is the first command that produces a (new) data object. It reads, transforms and filters a CSV file distributed with the experiment
package.
input <-
  system.file("extdata/block_62.csv", package = "experiment") %>%
  readr::read_csv(na = 'Null') %>%
  rename(meter = LCLid, timestamp = tstp, usage = `energy_kWh`) %>%
  filter(meter %in% c("MAC004929", "MAC000010", "MAC004391"),
         year(timestamp) == 2013)
Let’s look at the data. It turns out that the observations are recorded every 30 minutes.
head(input)
#> # A tibble: 6 x 3
#> meter timestamp usage
#> <chr> <dttm> <dbl>
#> 1 MAC000010 2013-01-01 00:00:00 0.509
#> 2 MAC000010 2013-01-01 00:30:00 0.453
#> 3 MAC000010 2013-01-01 01:00:00 0.500
#> 4 MAC000010 2013-01-01 01:30:00 0.621
#> 5 MAC000010 2013-01-01 02:00:00 0.197
#> 6 MAC000010 2013-01-01 02:30:00 0.176
Let’s aggregate them and continue with hourly readings.
input %<>%
  mutate(timestamp = floor_date(timestamp, 'hours')) %>%
  group_by(meter, timestamp) %>%
  summarise(usage = sum(usage))
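The %<>% operator used above is magrittr’s compound assignment pipe: it pipes a variable into a chain and assigns the result back to that same variable.

```r
library(magrittr)

x <- c(1, 4, 9)
x %<>% sqrt()  # equivalent to: x <- x %>% sqrt()
x
#> [1] 1 2 3
```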
We have three meters in the data set: MAC000010, MAC004391 and MAC004929. We will look at them one by one, starting with MAC004929.
input %<>% filter(meter == "MAC004929")
Let’s take a quick glimpse at the full data set before we look at aggregations.
with(input, plot(timestamp, usage, type = 'p', pch = '.'))
All right! That doesn’t reveal much. How about breaking the data set down by hour and day of week? Any patterns there? We start by aggregating the input
set into a temporary variable x
.
x <-
  input %>%
  mutate(hour = hour(timestamp),
         dow = wday(timestamp, label = TRUE)) %>%
  mutate_at(vars(hour, dow), funs(as.factor)) %>%
  group_by(hour, dow) %>%
  summarise(usage = mean(usage, na.rm = TRUE))
And now we can take a look at the by-hour plot:
with(x, plot(hour, usage))
And the hour-by-day-of-the-week breakdown:
ggplot(x) + geom_point(aes(x = hour, y = usage)) + facet_wrap(~dow)
So these are mean values. How about the distribution around the mean? We can visualize that with a boxplot. We start by overwriting the x
variable and then produce a new plot.
x <-
  input %>%
  mutate(hour = hour(timestamp),
         dow = wday(timestamp)) %>%
  mutate_at(vars(hour, dow), funs(as.factor))
ggplot(x) + geom_boxplot(aes(x = hour, y = usage)) + facet_wrap(~dow)
OK! Let’s look at a linear model for this data.
m <- lm(usage ~ hour:dow, x)
summary(m)
#>
#> Call:
#> lm(formula = usage ~ hour:dow, data = x)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1.04183 -0.19047 -0.03992 0.08349 3.09831
#>
#> Coefficients: (1 not defined because of singularities)
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.761096 0.050023 15.215 < 2e-16 ***
#> hour0:dow1 -0.124288 0.070744 -1.757 0.078973 .
#> hour1:dow1 -0.270596 0.070744 -3.825 0.000132 ***
#> hour2:dow1 -0.478827 0.070744 -6.768 1.39e-11 ***
...
#> hour22:dow7 -0.007462 0.070744 -0.105 0.916003
#> hour23:dow7 NA NA NA NA
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.3607 on 8592 degrees of freedom
#> Multiple R-squared: 0.3471, Adjusted R-squared: 0.3344
#> F-statistic: 27.35 on 167 and 8592 DF, p-value: < 2.2e-16
At this point we might decide we know enough. (We probably don’t yet, but for the sake of the presentation let’s assume we actually do. After all, this is an introduction to the history browser, not to time series analysis.)
So what does the history look like so far? We can open an interactive viewer by calling experiment::browserAddin()
. It is a htmlwidget
, so when you call it in an actual R session in RStudio, it opens in an interactive window overlaying the main RStudio window34. In RStudio you will also have extra buttons and interactions; more about this in the next section.
experiment::browserAddin()
Each node represents either an object introduced into the R session at some point in time, or a plot. Objects have their names displayed inside the node; plots are shown as thumbnails.
You can hover your mouse cursor over each node in the history to see the expression that produced the given object, along with its general characteristics, like the dimensions of a data.frame
or the AIC
value of a linear model.
You can also zoom in and out. When zooming out far enough, the view switches from showing all individual nodes to showing groups of nodes, grouped according to their creation time. Hovering the mouse over a group reveals the names of its nodes.
Let’s go back to the last step before narrowing down to just one meter. Clicking on the second input
node in the history window selects that node (notice the green highlight on the border of the node). In RStudio, at this point you would click the Done button, but since this is a static HTML vignette, that button is not available. The Done button, together with the whole window title bar, looks as below:
Thus, highlighting the node and clicking on the Done button brings back the state of the R session from when that object was created, which we will assume happens at this point of our vignette. We restore the state of the R session from the time when the second input
node was created.
Now we can try a different house.
input %<>% filter(meter == "MAC000010")
We aggregate the data with the same query as before and look at the boxplot. Anything interesting here?
x <-
  input %>%
  mutate(hour = hour(timestamp),
         dow = wday(timestamp)) %>%
  mutate_at(vars(hour, dow), funs(as.factor))
ggplot(x) + geom_boxplot(aes(x = hour, y = usage)) + facet_wrap(~dow)
The history looks different now, as there is a second branch reflecting the last three commands we have just issued.
experiment::browserAddin()
OK, so how about the third house in the data set? We restore the same point in time again, and repeat the same sequence of commands.
input %<>% filter(meter == "MAC004391")
x <-
  input %>%
  mutate(hour = hour(timestamp),
         dow = wday(timestamp)) %>%
  mutate_at(vars(hour, dow), funs(as.factor))
ggplot(x) + geom_boxplot(aes(x = hour, y = usage)) + facet_wrap(~dow)
As we can see, the history gets updated again to reflect the third branching on the third house in the data set.
experiment::browserAddin()
Our last step will be reducing the size of the history graph presented in the widget. We do it with the query_by()
function. Let’s start by finding all variables named input
.
h <- query_by(is_named('input'))
Looking at the history graph reveals that it is now much smaller.
plot(h)
How about finding only data frames?
h <- query_by(inherits('data.frame'))
plot(h)
And finally we ask to see only the plots.
h <- query_by(inherits('plot'))
plot(h)
In case you run into problems when rebuilding this vignette, here is what my current R session looks like:
library(devtools)
devtools::session_info()
#> setting value
#> version R version 3.4.3 (2017-11-30)
#> system x86_64, linux-gnu
#> ui X11
#> language en_US
#> collate en_US.UTF-8
#> tz America/Los_Angeles
#> date 2018-01-29
#>
#> package * version date source
#> assertthat 0.2.0 2017-04-11 CRAN (R 3.4.0)
#> backports 1.1.1 2017-09-25 CRAN (R 3.4.2)
#> base * 3.4.3 2017-12-01 local
#> bindr 0.1 2016-11-13 CRAN (R 3.4.2)
#> bindrcpp * 0.2 2017-06-17 CRAN (R 3.4.2)
#> broom 0.4.2 2017-02-13 CRAN (R 3.4.2)
#> clisymbols 1.2.0 2017-05-21 CRAN (R 3.4.2)
#> colorspace 1.3-2 2016-12-14 CRAN (R 3.4.1)
#> compiler 3.4.3 2017-12-01 local
#> crayon 1.3.4 2017-09-16 CRAN (R 3.4.2)
#> datasets * 3.4.3 2017-12-01 local
#> defer 0.3.0 2017-12-26 local
#> devtools * 1.13.4 2017-11-09 CRAN (R 3.4.2)
#> digest 0.6.12 2017-01-27 CRAN (R 3.4.0)
#> dplyr * 0.7.4 2017-09-28 CRAN (R 3.4.2)
#> evaluate * 0.10.1 2017-06-24 CRAN (R 3.4.2)
#> experiment * 0.1 2018-01-29 local
#> foreign 0.8-69 2017-06-21 CRAN (R 3.4.2)
#> ggplot2 * 2.2.1 2016-12-30 CRAN (R 3.4.1)
#> glue 1.1.1 2017-06-21 CRAN (R 3.4.2)
#> graphics * 3.4.3 2017-12-01 local
#> grDevices * 3.4.3 2017-12-01 local
#> grid 3.4.3 2017-12-01 local
#> gtable 0.2.0 2016-02-26 CRAN (R 3.4.1)
#> hms 0.3 2016-11-22 CRAN (R 3.4.0)
#> htmltools 0.3.6 2017-04-28 CRAN (R 3.4.0)
#> htmlwidgets 0.9 2017-07-10 cran (@0.9)
#> jsonlite 1.5 2017-06-01 CRAN (R 3.4.2)
#> knitr * 1.17 2017-08-10 CRAN (R 3.4.2)
#> labeling 0.3 2014-08-23 CRAN (R 3.4.1)
#> lattice 0.20-35 2017-03-25 CRAN (R 3.4.2)
#> lazyeval 0.2.0 2016-06-12 CRAN (R 3.4.0)
#> lubridate * 1.6.0 2016-09-13 CRAN (R 3.4.0)
#> magrittr * 1.5 2014-11-22 CRAN (R 3.4.0)
#> memoise 1.1.0 2017-04-21 CRAN (R 3.4.0)
#> methods * 3.4.3 2017-12-01 local
#> mnormt 1.5-5 2016-10-15 CRAN (R 3.4.2)
#> munsell 0.4.3 2016-02-13 CRAN (R 3.4.1)
#> nlme 3.1-131 2017-02-06 CRAN (R 3.4.2)
#> parallel 3.4.3 2017-12-01 local
#> pkgconfig 2.0.1 2017-03-21 CRAN (R 3.4.2)
#> plyr 1.8.4 2016-06-08 CRAN (R 3.4.0)
#> psych 1.7.8 2017-09-09 CRAN (R 3.4.2)
#> purrr 0.2.4 2017-10-18 CRAN (R 3.4.2)
#> R6 2.2.2 2017-06-17 CRAN (R 3.4.2)
#> Rcpp 0.12.13 2017-09-28 CRAN (R 3.4.2)
#> readr 1.1.1 2017-05-16 CRAN (R 3.4.0)
#> reshape2 1.4.2 2016-10-22 CRAN (R 3.4.1)
#> rlang 0.1.4 2017-11-05 cran (@0.1.4)
#> rmarkdown 1.6 2017-06-15 CRAN (R 3.4.2)
#> rprojroot 1.2 2017-01-16 CRAN (R 3.4.0)
#> rsvg 1.1 2017-03-21 CRAN (R 3.4.3)
#> scales 0.5.0 2017-08-24 CRAN (R 3.4.1)
#> stats * 3.4.3 2017-12-01 local
#> storage 0.1.0 2018-01-22 local
#> stringi 1.1.5 2017-04-07 CRAN (R 3.4.0)
#> stringr * 1.2.0 2017-02-18 CRAN (R 3.4.0)
#> testthat * 1.0.2.9000 2017-10-22 local
#> tibble 1.3.4 2017-08-22 CRAN (R 3.4.2)
#> tidyr 0.7.2 2017-10-16 CRAN (R 3.4.2)
#> tools 3.4.3 2017-12-01 local
#> utils * 3.4.3 2017-12-01 local
#> withr 2.0.0 2017-07-28 CRAN (R 3.4.2)
#> yaml 2.1.14 2016-11-12 CRAN (R 3.4.0)
Of course, that’s not how things work inside knitr. Thus, this vignette contains code as it is supposed to look, and you should be able to simply copy & paste it into an R session. The actual source code of the vignette will reveal much more than that.↩
An object store is a persistent, filesystem-based repository of R artifacts (data sets, functions, plots, etc.) produced while working with R.↩
The mechanics of that are not part of this vignette; however, in case this turns out to be helpful: in RStudio it is a gadget, created with shiny::runGadget
and displayed in a shiny::dialogViewer
.↩
Also, in RStudio, you can map this function to a keyboard shortcut. experiment
contains all the necessary configuration files. See RStudio Addins for more details.↩