Interactive History Browser

Lukasz A. Bartnik

2018-01-29

Introduction

The goal of this tutorial is to introduce the interactive history browser implemented in the experiment package. It follows one of the examples accessible via experiment::simulate_london_meters and is based on the London meter data.

The history browser keeps track of the expressions evaluated in an R session. It remembers all objects and plots, and allows the user to move back and forth in that recorded history.

In this short introduction, we will perform a simplified data exploration exercise, similar to what a “real” data exploration might look like. In order to keep the big picture clean, we avoid poking around too much.

We start by loading a number of packages we will need for our analysis. The history tracker does not record commands that do not produce new objects or plots, so it ignores this next block of code.

library(dplyr)
library(lubridate)
library(magrittr)
library(ggplot2)

Turn tracing on

Now it is time to load the experiment package and turn on its tracing capability. experiment registers a callback with addTaskCallback and uses that callback to keep a record of changes in the global environment of our R session [1].

library(experiment)
tracking_on()
#> Warning: creating a store named "project-store" under
#> "/home/user/my-data-project"
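To make the callback idea concrete, here is a minimal sketch in base R of how a task callback could notice changes in an environment. It is not the implementation used by experiment: make_tracker, snapshot and history are hypothetical names chosen for illustration. The only real API used is addTaskCallback, which R invokes after each top-level command completes and which keeps the callback registered as long as it returns TRUE.

```r
# Hypothetical helper, NOT part of the experiment package.
make_tracker <- function(env = globalenv()) {
  snapshot <- list()  # objects as they were after the previous command
  history  <- list()  # one entry per command that changed something

  callback <- function(expr, value, ok, visible) {
    current <- as.list(env)
    # names that are new or whose value differs from the snapshot
    changed <- Filter(function(name) {
      !(name %in% names(snapshot)) ||
        !identical(snapshot[[name]], current[[name]])
    }, names(current))
    if (length(changed) > 0) {
      history[[length(history) + 1]] <<- list(expr = expr, changed = changed)
    }
    snapshot <<- current
    TRUE  # returning TRUE keeps the callback registered
  }

  list(callback = callback, history = function() history)
}

tracker <- make_tracker()
id <- addTaskCallback(tracker$callback)
# Later, removeTaskCallback(id) would stop the tracking.
```

Commands that leave the environment untouched, such as the library() calls above, add nothing to history in this sketch, which mirrors the behaviour described earlier.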

Calling tracking_on() in a live R session will change the R prompt to [tracked] >. In this vignette, in order to make it easier to copy the R code, the prompt remains hidden.

Another important thing to notice is the warning “creating a store named…”, which informs the user that all objects created in the current session will be stored in a newly created object store [2]. Thus, it is perfectly possible to perform the exercise described in this vignette over a period of multiple days, closing and reopening the R session to pick up the work where it was previously left off.
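Why does a store on disk allow work to continue across sessions? The following base R sketch illustrates the general idea under simplifying assumptions. The names store_dir, store_put and store_get are hypothetical, and the layout (one .rds file per object in a temporary directory) is an assumption made for the example, not the format experiment actually uses.

```r
# Hypothetical on-disk store: one .rds file per object.
store_dir <- file.path(tempdir(), "project-store")
dir.create(store_dir, showWarnings = FALSE)

# Write an object to the store under a given name.
store_put <- function(name, value) {
  saveRDS(value, file.path(store_dir, paste0(name, ".rds")))
  invisible(name)
}

# Read an object back, possibly in a later R session
# (a real store would live in a permanent directory, not tempdir()).
store_get <- function(name) {
  readRDS(file.path(store_dir, paste0(name, ".rds")))
}

store_put("readings", c(0.509, 0.453, 0.500))
restored <- store_get("readings")
```

Because the objects live in files rather than only in memory, anything written with store_put remains available after R is restarted, provided the directory persists.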

Preparing the data set

Here is the first command that produces a (new) data object. It reads, transforms and filters a CSV file distributed with the experiment package.

input <-
  system.file("extdata/block_62.csv", package = "experiment") %>%
  readr::read_csv(na = 'Null') %>%
  rename(meter = LCLid, timestamp = tstp, usage = `energy_kWh`) %>%
  filter(meter %in% c("MAC004929", "MAC000010", "MAC004391"),
         year(timestamp) == 2013)

Let’s look at the data. It turns out that the observations are recorded every 30 minutes.

head(input)
#> # A tibble: 6 x 3
#>       meter           timestamp usage
#>       <chr>              <dttm> <dbl>
#> 1 MAC000010 2013-01-01 00:00:00 0.509
#> 2 MAC000010 2013-01-01 00:30:00 0.453
#> 3 MAC000010 2013-01-01 01:00:00 0.500
#> 4 MAC000010 2013-01-01 01:30:00 0.621
#> 5 MAC000010 2013-01-01 02:00:00 0.197
#> 6 MAC000010 2013-01-01 02:30:00 0.176
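One way to confirm the recording interval, rather than reading it off the printed rows, is to look at the differences between consecutive timestamps. The sketch below uses a small hand-made vector ts (a hypothetical stand-in that copies the six timestamps printed above) so that it runs without the package data.

```r
# Toy stand-in for input$timestamp: the six timestamps shown above.
ts <- as.POSIXct("2013-01-01 00:00:00", tz = "UTC") + 1800 * 0:5

# Gaps between consecutive observations, expressed in minutes.
gaps <- as.numeric(diff(ts), units = "mins")
unique(gaps)
```

On the real data the same check would have to be run separately for each meter, since the rows of different meters are stacked in one table.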

Let’s aggregate them and continue with hourly readings.

input %<>%
  mutate(timestamp = floor_date(timestamp, 'hours')) %>%
  group_by(meter, timestamp) %>%
  summarise(usage = sum(usage))
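To see what this aggregation does, here is the same operation on four toy readings, written in base R so that it runs without dplyr or lubridate. The data frame readings is a hypothetical example built from the first four values printed earlier; trunc() plays the role of floor_date() and aggregate() the role of group_by() followed by summarise().

```r
# Four half-hourly readings covering two hours (toy data).
readings <- data.frame(
  timestamp = as.POSIXct("2013-01-01 00:00:00", tz = "UTC") + 1800 * 0:3,
  usage     = c(0.509, 0.453, 0.500, 0.621)
)

# Round each timestamp down to the full hour.
readings$hour <- as.POSIXct(trunc(readings$timestamp, units = "hours"))

# Sum the usage within each hour: two rows remain.
hourly <- aggregate(usage ~ hour, data = readings, FUN = sum)
hourly
```

Each pair of half-hourly readings collapses into a single hourly total, which is the shape of the data used in the rest of the analysis.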

The first meter

We have three meters in the data set: MAC000010, MAC004391 and MAC004929. We will look at them one by one, starting with MAC004929.

input %<>% filter(meter == "MAC004929")

Just a glimpse of the full data set before we look at aggregations.

with(input, plot(timestamp, usage, type = 'p', pch = '.'))

All right! That doesn’t reveal much. How about breaking the data set down by hour and day of week? Any patterns here? We start by aggregating the input set into a temporary variable x.

x <-
  input %>%
  mutate(hour = hour(timestamp),
         dow  = wday(timestamp, label = TRUE)) %>%
  mutate_at(vars(hour, dow), as.factor) %>%
  group_by(hour, dow) %>%
  summarise(usage = mean(usage, na.rm = TRUE))

And now we can take a look at the by-hour plot:

with(x, plot(hour, usage))

And the hour-by-day-of-the-week breakdown:

ggplot(x) + geom_point(aes(x = hour, y = usage)) + facet_wrap(~dow)

So these are mean values. How about the distribution around the mean? We can visualize that with a boxplot. We start by overwriting the x variable and then produce a new plot.

x <-
  input %>%
  mutate(hour = hour(timestamp),
         dow  = wday(timestamp)) %>%
  mutate_at(vars(hour, dow), as.factor)
ggplot(x) + geom_boxplot(aes(x = hour, y = usage)) + facet_wrap(~dow)

OK! Let’s look at a linear model for this data.

m <- lm(usage ~ hour:dow, x)
summary(m)
#> 
#> Call:
#> lm(formula = usage ~ hour:dow, data = x)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -1.04183 -0.19047 -0.03992  0.08349  3.09831 
#> 
#> Coefficients: (1 not defined because of singularities)
#>              Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  0.761096   0.050023  15.215  < 2e-16 ***
#> hour0:dow1  -0.124288   0.070744  -1.757 0.078973 .  
#> hour1:dow1  -0.270596   0.070744  -3.825 0.000132 ***
#> hour2:dow1  -0.478827   0.070744  -6.768 1.39e-11 ***
...
#> hour22:dow7 -0.007462   0.070744  -0.105 0.916003    
#> hour23:dow7        NA         NA      NA       NA    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.3607 on 8592 degrees of freedom
#> Multiple R-squared:  0.3471, Adjusted R-squared:  0.3344 
#> F-statistic: 27.35 on 167 and 8592 DF,  p-value: < 2.2e-16
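The note “1 not defined because of singularities” follows from the formula. With usage ~ hour:dow there is one indicator column for every hour and day combination (24 × 7 = 168) plus an intercept, and the indicator columns sum to the intercept column, so one coefficient cannot be estimated. The toy example below reproduces this with two small factors; the data frame d and its columns are made up for illustration.

```r
# Toy data: factor a with 2 levels, factor b with 3 levels, random response.
set.seed(1)
d <- expand.grid(a = factor(1:2), b = factor(1:3), rep = 1:5)
d$y <- rnorm(nrow(d))

# Interaction only, no main effects: 2 * 3 indicator columns plus an intercept.
m_toy <- lm(y ~ a:b, data = d)

length(coef(m_toy))      # 7 coefficients requested
sum(is.na(coef(m_toy)))  # 1 of them is aliased and reported as NA
```

The fit itself is unaffected: the model still has one free parameter per cell, which is why the F-statistic above is reported on 167 (that is, 168 − 1) degrees of freedom.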

At this point we might decide we know enough. (We probably don’t yet, but for the sake of the presentation, let’s assume we actually do. After all, this is an introduction to the history browser, not to time series analysis.)

History recorded so far

So what does the history look like so far? We can open an interactive viewer by calling experiment:::browserAddin(). It is an htmlwidget, so when you call it in an actual R session in RStudio, it opens in an interactive window overlaying the main RStudio window [3][4]. In RStudio you will also have extra buttons and interactions; more about this in the next section.

experiment::browserAddin()