Feature Request: "Data Validation" functionality #37

palderman · 2021-12-01T17:09:20Z

I recently stumbled across your package in my search for an R solution for data entry/data capture. It seems like your package would largely fit the bill.

However, I don't see any capability for restricting the range of numerical input values as part of a "Data Validation" workflow when entering data (see e.g. https://datacarpentry.org/spreadsheet-ecology-lesson/04-quality-control/index.html for what I mean by "Data Validation"). The col_options argument seems mostly to address data validation for known lists of discrete values, but I see no way to restrict column type to integer or min/max values for numeric input.

Did I miss this somewhere in the documentation?

Where would you suggest I begin looking in your code to add this capability? I'm happy to collaborate/contribute code for this capability as it would greatly facilitate my own workflow.

The text was updated successfully, but these errors were encountered:

DillonHammill · 2021-12-02T02:54:43Z

Hi @palderman, this feature has been on my to do list for a while now (#14) but I haven't had time to look at it further. I would be more than happy to review a PR to add improved support for validating data entries.

As discussed in #14 I think the col_options argument would be a good place to add support for this. For example we could do something similar to the code below to restrict entries in vs column of mtcars to range [0,1]:

data_edit(
  mtcars,
  col_options = list(vs = c(0,1))
)

At the moment, the above code would trigger the vs column to use a dropdown menu with options 0 and 1. This is because any vector passed to col_options that doesn't contain either logical values or "date" or "password" character strings is treated as the options for data entry - so we resort to using dropdown columns to prevent typing errors. I do think there will be cases where users will want the above code to work as is, i.e. have numeric options in dropdown menus - so perhaps we should try to keep this functionality.

An alternative would be to add more flexibility to col_options by allowing a list of arguments per column. For example, we could do something like this (vs is the name of the column):

# numeric dropdown
data_edit(
  mtcars,
  col_options = list(
    vs = list(
       type = "dropdown",
       options  = c(0,1)
  )
)

# numeric range
data_edit(
  mtcars,
  col_options = list(
    vs = list(
       range = c(0, 1),
       precision = 0
  )
)

In the above example each list could have options type, options, range and precision (and any other useful options).

type controls the format for data entry (i.e. either password, date, checkbox or dropdown)
options indicates the available options for dropdown menus
range allows specification of numeric ranges e.g. c(0,1). We may need to add another option like bounds to indicate whether limits include upper and/or lower boundaries defined in range.
precision indicates the number of decimal places for rounding (integers could have precision zero for example)

Once we have a format for the new col_options we need to make changes to the code in dataEdit.R to make sure these options are appropriately handled. I think most of the changes will need to occur in this chunk of code which is triggered whenever the data changes (note that the column and row headers are handled separately in the chunk below).

DataEditR/R/dataEdit.R

Lines 260 to 286 in d2840dc

    
           # DATA EDITS - INCLUDES ROW NAME EDITS 
        
           observeEvent(input$x, { 
        
             # OLD VALUES 
        
             x_old <- values$x 
        
             x_new <- hot_to_r(input$x) 
        
             # NA ROW NAMES - MATCH DATA_FORMAT() 
        
             if (!nzchar(trimws(colnames(x_new)[1]))) { 
        
               ind <- which(is.na(x_new[, 1])) 
        
               if (length(ind) > 0) { 
        
                 x_new[ind, 1] <- rev( 
        
                   seq( 
        
                     nrow(x_new), 
        
                     nrow(x_new) - length(ind) + 1, 
        
                     -1 
        
                   ) 
        
                 ) 
        
               } 
        
             } 
        
             # ORDER ROW INDICES 
        
             rownames(x_new) <- 1:nrow(x_new) 
        
             # RENDER 
        
             values$x <- x_new 
        
             # REVERT READONLY COLUMNS 
        
             if (!is.null(col_readonly)) { 
        
               values$x[, col_readonly] <- x_old[, col_readonly] 
        
             } 
        
           })

Basically all you would need to do to compare the columns defined in col_options in the new data (x_new) and make sure all the listed criteria are met - otherwise we replace that column with the previous version defined in x_old. This method is not very efficient as you will be checking all entries not just the ones that have changed (remember this code is triggered with every single change - i.e. each time only a single value will be changed).

I think there is a way to check exactly which cell has been edited by accessing slots in input$x but you will need to look at rhandsontable to see how this works.

Hope this helps, happy to answer any questions you may have along the way.

Dillon

palderman · 2021-12-02T15:00:41Z

Sorry for missing #14. I should have checked the existing issues more thoroughly.

I was thinking of something along the lines you outlined above for an integer with hard min/max values:

# numeric dropdown
data_edit(
  mtcars,
  col_options = list(
    vs = list(
       type = "integer",
       min  = 0,
       max = 10
  )
)

Errant values in that case would be rejected, but I was also thinking of being able to warn about out-of-range values instead of rejecting them out right. In that case, supposing you have an integer that is usually between 3 and 5, but on rare occasions could possibly have values as low as 0 or high as 10, we could specify it as:

# numeric dropdown
data_edit(
  mtcars,
  col_options = list(
    vs = list(
       type = "integer",
       range  = c(0, 10),
       range_warn = c(3, 5)
  )
)

Values outside range would be rejected and values outside range_warn but inside range could be highlighted in red or yellow. Thoughts?

DillonHammill · 2021-12-02T19:35:21Z

Looks like a good starting point. I have played around with colouring cells and it is complicated.

For DataEditR to work, we need to set useTypes = FALSE in rhandsontable() which means we cannot make calls to hot_col() or hot_cell() to format the cells - this is because useTypes = FALSE is required for the column/row addition/removal functionality.

Perhaps you will find a way to get this to work.

BaruqueRodrigues · 2021-12-03T17:46:50Z

Dillon, I have a question.

I'm running DataEditR:: inside shiny as I need to validate multiple databases, however col_options() has a stop(!all(colnames %in% df)) if the columns are not contained in the dataframe. Do you have any suggestions, other than rewriting the function locally, on how to deal with this issue?

DillonHammill · 2021-12-03T21:27:28Z

@BarqueRodrigues can you poibt me to exactly where that line exists in the code? I can take a look. Given that it is an error I probably put it there for a good reason.

BaruqueRodrigues · 2021-12-03T21:37:48Z

Dillon in line 47 on DataEditR::dataEditServer
if (!is.null(col_readonly)) { if (!all(col_readonly %in% colnames(data_to_edit))) { stop("'col_readonly' must contain valid column names.") } values$col_names <- unique(c(col_names, col_readonly)) }

DillonHammill · 2021-12-03T21:48:05Z

So you are using col_readonly? In that case it must match the columns in the data. What is your use case? You have some code I can look at?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature Request: "Data Validation" functionality #37

Feature Request: "Data Validation" functionality #37

palderman commented Dec 1, 2021

DillonHammill commented Dec 2, 2021

palderman commented Dec 2, 2021 •

edited

Loading

DillonHammill commented Dec 2, 2021 •

edited

Loading

BaruqueRodrigues commented Dec 3, 2021

DillonHammill commented Dec 3, 2021 •

edited

Loading

BaruqueRodrigues commented Dec 3, 2021

DillonHammill commented Dec 3, 2021

Feature Request: "Data Validation" functionality #37

Feature Request: "Data Validation" functionality #37

Comments

palderman commented Dec 1, 2021

DillonHammill commented Dec 2, 2021

palderman commented Dec 2, 2021 • edited Loading

DillonHammill commented Dec 2, 2021 • edited Loading

BaruqueRodrigues commented Dec 3, 2021

DillonHammill commented Dec 3, 2021 • edited Loading

BaruqueRodrigues commented Dec 3, 2021

DillonHammill commented Dec 3, 2021

palderman commented Dec 2, 2021 •

edited

Loading

DillonHammill commented Dec 2, 2021 •

edited

Loading

DillonHammill commented Dec 3, 2021 •

edited

Loading