Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: "Data Validation" functionality #37

Open
palderman opened this issue Dec 1, 2021 · 7 comments
Open

Feature Request: "Data Validation" functionality #37

palderman opened this issue Dec 1, 2021 · 7 comments

Comments

@palderman
Copy link

Hi @DillonHammill ,

I recently stumbled across your package in my search for an R solution for data entry/data capture. It seems like your package would largely fit the bill.

However, I don't see any capability for restricting the range of numerical input values as part of a "Data Validation" workflow when entering data (see e.g. https://datacarpentry.org/spreadsheet-ecology-lesson/04-quality-control/index.html for what I mean by "Data Validation"). The col_options argument seems mostly to address data validation for known lists of discrete values, but I see no way to restrict column type to integer or min/max values for numeric input.

Did I miss this somewhere in the documentation?

Where would you suggest I begin looking in your code to add this capability? I'm happy to collaborate/contribute code for this capability as it would greatly facilitate my own workflow.

@DillonHammill
Copy link
Owner

Hi @palderman, this feature has been on my to do list for a while now (#14) but I haven't had time to look at it further. I would be more than happy to review a PR to add improved support for validating data entries.

As discussed in #14 I think the col_options argument would be a good place to add support for this. For example we could do something similar to the code below to restrict entries in vs column of mtcars to range [0,1]:

data_edit(
  mtcars,
  col_options = list(vs = c(0,1))
)

At the moment, the above code would trigger the vs column to use a dropdown menu with options 0 and 1. This is because any vector passed to col_options that doesn't contain either logical values or "date" or "password" character strings is treated as the options for data entry - so we resort to using dropdown columns to prevent typing errors. I do think there will be cases where users will want the above code to work as is, i.e. have numeric options in dropdown menus - so perhaps we should try to keep this functionality.

An alternative would be to add more flexibility to col_options by allowing a list of arguments per column. For example, we could do something like this (vs is the name of the column):

# numeric dropdown
data_edit(
  mtcars,
  col_options = list(
    vs = list(
       type = "dropdown",
       options  = c(0,1)
  )
)

# numeric range
data_edit(
  mtcars,
  col_options = list(
    vs = list(
       range = c(0, 1),
       precision = 0
  )
)

In the above example each list could have options type, options, range and precision (and any other useful options).

  • type controls the format for data entry (i.e. either password, date, checkbox or dropdown)
  • options indicates the available options for dropdown menus
  • range allows specification of numeric ranges e.g. c(0,1). We may need to add another option like bounds to indicate whether limits include upper and/or lower boundaries defined in range.
  • precision indicates the number of decimal places for rounding (integers could have precision zero for example)

Once we have a format for the new col_options we need to make changes to the code in dataEdit.R to make sure these options are appropriately handled. I think most of the changes will need to occur in this chunk of code which is triggered whenever the data changes (note that the column and row headers are handled separately in the chunk below).

DataEditR/R/dataEdit.R

Lines 260 to 286 in d2840dc

# DATA EDITS - INCLUDES ROW NAME EDITS
observeEvent(input$x, {
# OLD VALUES
x_old <- values$x
x_new <- hot_to_r(input$x)
# NA ROW NAMES - MATCH DATA_FORMAT()
if (!nzchar(trimws(colnames(x_new)[1]))) {
ind <- which(is.na(x_new[, 1]))
if (length(ind) > 0) {
x_new[ind, 1] <- rev(
seq(
nrow(x_new),
nrow(x_new) - length(ind) + 1,
-1
)
)
}
}
# ORDER ROW INDICES
rownames(x_new) <- 1:nrow(x_new)
# RENDER
values$x <- x_new
# REVERT READONLY COLUMNS
if (!is.null(col_readonly)) {
values$x[, col_readonly] <- x_old[, col_readonly]
}
})

Basically all you would need to do to compare the columns defined in col_options in the new data (x_new) and make sure all the listed criteria are met - otherwise we replace that column with the previous version defined in x_old. This method is not very efficient as you will be checking all entries not just the ones that have changed (remember this code is triggered with every single change - i.e. each time only a single value will be changed).

I think there is a way to check exactly which cell has been edited by accessing slots in input$x but you will need to look at rhandsontable to see how this works.

Hope this helps, happy to answer any questions you may have along the way.

Dillon

@palderman
Copy link
Author

palderman commented Dec 2, 2021

Sorry for missing #14. I should have checked the existing issues more thoroughly.

I was thinking of something along the lines you outlined above for an integer with hard min/max values:

# numeric dropdown
data_edit(
  mtcars,
  col_options = list(
    vs = list(
       type = "integer",
       min  = 0,
       max = 10
  )
)

Errant values in that case would be rejected, but I was also thinking of being able to warn about out-of-range values instead of rejecting them out right. In that case, supposing you have an integer that is usually between 3 and 5, but on rare occasions could possibly have values as low as 0 or high as 10, we could specify it as:

# numeric dropdown
data_edit(
  mtcars,
  col_options = list(
    vs = list(
       type = "integer",
       range  = c(0, 10),
       range_warn = c(3, 5)
  )
)

Values outside range would be rejected and values outside range_warn but inside range could be highlighted in red or yellow. Thoughts?

@DillonHammill
Copy link
Owner

DillonHammill commented Dec 2, 2021

Looks like a good starting point. I have played around with colouring cells and it is complicated.

For DataEditR to work, we need to set useTypes = FALSE in rhandsontable() which means we cannot make calls to hot_col() or hot_cell() to format the cells - this is because useTypes = FALSE is required for the column/row addition/removal functionality.

Perhaps you will find a way to get this to work.

@BaruqueRodrigues
Copy link

Dillon, I have a question.

I'm running DataEditR:: inside shiny as I need to validate multiple databases, however col_options() has a stop(!all(colnames %in% df)) if the columns are not contained in the dataframe. Do you have any suggestions, other than rewriting the function locally, on how to deal with this issue?

@DillonHammill
Copy link
Owner

DillonHammill commented Dec 3, 2021

@BarqueRodrigues can you poibt me to exactly where that line exists in the code? I can take a look. Given that it is an error I probably put it there for a good reason.

@BaruqueRodrigues
Copy link

Dillon in line 47 on DataEditR::dataEditServer
if (!is.null(col_readonly)) { if (!all(col_readonly %in% colnames(data_to_edit))) { stop("'col_readonly' must contain valid column names.") } values$col_names <- unique(c(col_names, col_readonly)) }

@DillonHammill
Copy link
Owner

So you are using col_readonly? In that case it must match the columns in the data. What is your use case? You have some code I can look at?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants