Source data

Damiano Oldoni edited this page Aug 30, 2018 · 8 revisions

The source data is the file with species checklist data that you use as input for your Darwin Core mapping. It is the basic ingredient of our recipe. It could be a file you started from scratch, digitized from a publication, or received from someone else.

Where is it located?

If you want to make your source data part of your repository, place it in the data/raw directory. We already added the file checklist.xlsx there as an example. You are welcome to use any other file or format, as long as you can import it into R (we recommend the R package readr for text files).
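As a minimal sketch of importing a text-based source file with readr: the example below writes a small CSV to a temporary file so that it runs on its own. In your repository you would point to a file in data/raw instead. The file contents and the object name raw_data are illustrative assumptions, not part of the recipe.

```r
library(readr)

# Stand-in for a text file in data/raw (hypothetical content, for illustration only)
source_file <- tempfile(fileext = ".csv")
writeLines(
  c(
    "scientific_name,locality,occurrence_status",
    "species A,locality X,present",
    "species A,locality Y,absent",
    "species B,locality X,present"
  ),
  source_file
)

# Read every column as text so that no value is converted on import
raw_data <- read_csv(
  source_file,
  col_types = cols(.default = col_character())
)

print(raw_data)
```

Reading all columns as text is a design choice made here to keep the imported data as close to the raw file as possible; type conversions can then happen explicitly in the mapping script.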

Good source data

  • Is tidy: each species (or species distribution) is a row and each of its attributes is a column. Deviating from this structure is possible, but it will complicate further processing.

    scientific_name   locality     occurrence_status
    species A         locality X   present
    species A         locality Y   absent
    species B         locality X   present
  • Is where you manage the data. That is not always possible (e.g. if you got the file from someone else), but the shorter the path from where you manage the data to what you use as input for the Darwin Core mapping, the better. At the very least, try to keep the structure of the source data the same between updates.

  • Is not altered for Darwin Core. You will do that in the mapping script. Keep your source data raw.
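The tidy structure described above can be checked with a few lines of base R. This is only a sketch: it assumes that a combination of scientific_name and locality identifies one row, which holds for the example table but may differ for your data.

```r
# Example data in the tidy structure shown in the table above
checklist <- data.frame(
  scientific_name = c("species A", "species A", "species B"),
  locality = c("locality X", "locality Y", "locality X"),
  occurrence_status = c("present", "absent", "present"),
  stringsAsFactors = FALSE
)

# Assumption: each species/locality combination should occur on exactly one row
key <- checklist[, c("scientific_name", "locality")]
duplicated_rows <- checklist[duplicated(key) | duplicated(key, fromLast = TRUE), ]

# Rows where part of the key is missing cannot be identified reliably
missing_key <- !complete.cases(key)

nrow(duplicated_rows)  # number of rows that share a key with another row
sum(missing_key)       # number of rows with an incomplete key
```

Running a check like this on the raw file, without editing it, fits the advice to keep the source data raw and do all changes in the mapping script.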

Template for source data

If you are starting your checklist from scratch, our recipe comes with a source data template (checklist.xlsx) to get you started. We decided to use a Microsoft Excel file because it is often used to manage datasets, despite its limitations (proprietary, limited import options in R, etc.). The template contains the worksheet checklist for your data, the worksheet README with instructions, and controlled vocabularies to populate dropdowns. It also contains fictional data for 12 species to test the recipe, which you can remove and replace with your own data. You are free to adapt the structure of the template as you see fit, as long as you adapt the mapping script accordingly.
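One possible way to read the checklist worksheet of the template into R is sketched below. It assumes the readxl package is installed (the recipe itself may import the file differently) and that the template sits at data/raw/checklist.xlsx. The helper name read_template is hypothetical.

```r
# Hypothetical helper: read one worksheet of the template, or return NULL
# when the file is not there (e.g. when run outside the repository)
read_template <- function(path = "data/raw/checklist.xlsx", sheet = "checklist") {
  if (!file.exists(path)) {
    message("File not found: ", path)
    return(NULL)
  }
  # Assumption: readxl is installed; read all columns as text to keep the data raw
  readxl::read_excel(path, sheet = sheet, col_types = "text")
}

raw_data <- read_template()
```

If you rename the worksheet or move the file, pass the new values via the sheet and path arguments and update the mapping script to match.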
