Functionality for missing values in a time series #11

Open
Mullefa opened this issue Nov 18, 2014 · 1 comment

Mullefa commented Nov 18, 2014

I am finding RcppRoll very convenient to use in conjunction with dplyr, with one caveat: if I am computing rolling summaries over a numeric vector that is indexed by date (or another time period) and some dates are missing, I may still want those missing dates to count towards the rolling window (with value 0). This is analogous to using OLAP window functions in SQL with a RANGE clause.

mutate_over() is my most recent attempt at implementing this functionality. @kevinushey @hadley I am wondering whether something like this will sit in RcppRoll or dplyr in the future?
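
To make the request concrete, here is a minimal base R sketch of a range-based rolling mean. It is not the implementation of mutate_over(); the name roll_mean_range() and its arguments are hypothetical. It assumes the window is the n calendar days ending at each row's date, and that a day absent from the data counts as 0.

```r
# Hypothetical helper (not part of RcppRoll, dplyr or dtsr): a rolling mean
# over a calendar window of `n` days, counting absent dates as 0.
roll_mean_range <- function(dates, x, n) {
  stopifnot(length(dates) == length(x), n >= 1)
  vapply(seq_along(dates), function(i) {
    # Rows whose date falls within the n-day window ending at dates[i].
    in_window <- dates > dates[i] - n & dates <= dates[i]
    # Dividing by n (not by the number of rows found) treats missing days as 0.
    sum(x[in_window]) / n
  }, numeric(1))
}

dates <- as.Date(c("2014-01-01", "2014-01-02", "2014-01-05"))
sales <- c(10, 20, 30)
roll_mean_range(dates, sales, n = 3)
```

Note that this sketch divides by n for every row, including the first few, so days before the start of the data are also treated as 0. It is a plain R loop, so it illustrates the semantics rather than the speed one would want from RcppRoll.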

Mullefa commented Feb 1, 2015

@kevinushey @hadley on a related note, the user may want to expand the range so that it includes all dates within a given period.

For example, whilst mutate_over() (or another range-based function) lets the user calculate a rolling metric over the dates that are present in their data frame, they may also want values for the dates that are absent, e.g. for visualisation.

To address this, I have created another function, regularise(), which lets the user create a data frame with a 'full' index of dates. However, on large data frames it is noticeably slower than the RcppRoll and dplyr APIs. Do you think it would be possible to rewrite it using Rcpp? (I have no meaningful experience with Rcpp.)

If this sounds a bit cryptic, hopefully the following example will illustrate my point:

devtools::install_github("Mullefa/dtsr")


library(dtsr)
library(ggvis)


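# Rolling mean that returns a vector the same length as x: the first n - 1
# values are cumulative means, the rest are full n-wide window means.
roll_mean <- function(x, n) {
  out <- RcppRoll::roll_mean(x, n)
  c(dplyr::cummean(x[seq_len(n - 1)]), out)
}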


# A date that doesn't appear in the data frame means no sales occurred on that day.
sales_data <- data_frame(
    date = seq(as.Date("2014-01-01"), as.Date("2014-12-31"), by = 1),
    sales = sample(1:1000, length(date), replace = TRUE)
  ) %>%
  sample_n(250) %>%
  ts_df(date) %>%
  arrange


# This graph doesn't visualize the average weekly sales for dates on which there were no sales.
sales_data %>%
  mutate_over(avg_weekly_sales = roll_mean(sales, n = 7)) %>%
  ggvis(~date, ~avg_weekly_sales) %>%
  layer_bars


# Using regularise() followed by mutate(), this graph does.
sales_data %>%
  regularise %>%
  mutate(avg_weekly_sales = roll_mean(sales, n = 7)) %>%
  ggvis(~date, ~avg_weekly_sales) %>%
  layer_bars
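
For readers without dtsr installed, the idea behind regularise() can be sketched in base R. This is not the dtsr implementation; regularise_sketch() and its arguments are hypothetical. It assumes a daily index and that every non-date column should be filled with 0 on the dates that are added.

```r
# Hypothetical stand-in for regularise(): expand a data frame so that it has
# one row per calendar day between its first and last date.
regularise_sketch <- function(df, date_col, fill = 0) {
  all_dates <- data.frame(
    seq(min(df[[date_col]]), max(df[[date_col]]), by = "day")
  )
  names(all_dates) <- date_col
  # Left join onto the full date index; dates absent from df come back as NA.
  out <- merge(all_dates, df, by = date_col, all.x = TRUE)
  value_cols <- setdiff(names(out), date_col)
  # Replace NA with `fill` in every non-date column.
  out[value_cols] <- lapply(out[value_cols], function(col) {
    col[is.na(col)] <- fill
    col
  })
  out[order(out[[date_col]]), , drop = FALSE]
}

sales_small <- data.frame(
  date  = as.Date(c("2014-01-01", "2014-01-02", "2014-01-05")),
  sales = c(10, 20, 30)
)
regularise_sketch(sales_small, "date")
```

Once the index is regular, an ordinary fixed-width rolling function such as the roll_mean() wrapper above gives a value for every day, which is what the second graph relies on.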
