
Tidy data

Peter Desmet edited this page Jan 30, 2019 · 4 revisions

The basis for each mapping process is a tidy dataset. This implies that:

  1. Each variable forms a column
  2. Each observation forms a row
  3. Each type of observational unit forms a table

The checklist template provided in this recipe is a good example of a tidy dataset. For more information on tidy datasets, see Hadley Wickham's paper on tidy data. Starting with untidy data makes the mapping script considerably more complex, because many preparatory steps are needed before the mapping itself can begin. A good dose of "data hygiene" is therefore essential.
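To make the three rules concrete, here is a minimal sketch of turning an untidy table into a tidy one. The data frame, its column names (scientific_name, BE, NL, country_code, occurrence_status) and its values are hypothetical and are not taken from the checklist template. The reshaping uses pivot_longer() from the tidyr package, which is an assumption on our part; the recipe itself does not prescribe a tool for this step.

```r
library(tidyr)

# Hypothetical untidy checklist: one column per country, so the variable
# "country" is hidden in the column names instead of having its own column.
untidy <- data.frame(
  scientific_name = c("Species A", "Species B"),
  BE = c("present", "absent"),
  NL = c("present", "present"),
  stringsAsFactors = FALSE
)

# Tidy form: each row is one taxon-country observation and
# each variable (name, country, status) has its own column.
tidy <- pivot_longer(
  untidy,
  cols = c(BE, NL),
  names_to = "country_code",
  values_to = "occurrence_status"
)

print(tidy)
```

In the tidy version, adding a third country means adding rows rather than a new column, so a mapping script that refers to country_code and occurrence_status keeps working unchanged.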

Minor data cleaning

With a tidy dataset, you should be able to start the mapping immediately. However, some small preparatory cleaning steps may be required, such as removing empty rows. For this, you can use the function remove_empty() from the janitor package (the %<>% assignment pipe used here comes from the magrittr package):

input_data %<>% remove_empty("rows")

More cleaning steps could be necessary depending on the specific checklist.
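As an illustration of the empty-row removal described above, the following sketch applies remove_empty() to a small data frame. The contents of input_data here are hypothetical; in a real mapping script, input_data would be the checklist you read in. It assumes the janitor and magrittr packages are installed.

```r
library(janitor)   # provides remove_empty()
library(magrittr)  # provides the %<>% assignment pipe

# Hypothetical input with one completely empty row (row 2)
input_data <- data.frame(
  scientific_name = c("Species A", NA, "Species B"),
  kingdom = c("Plantae", NA, "Animalia"),
  stringsAsFactors = FALSE
)

# Pipe input_data into remove_empty() and assign the result back to it.
# Only rows in which every value is NA are dropped.
input_data %<>% remove_empty("rows")

print(input_data)
```

Rows that are only partly empty are kept, so missing values in individual columns still need to be handled in the mapping itself.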
