Getting started
So you want to use the Checklist recipe to kickstart your own checklist Darwin Core mapping? Awesome! This page has all the information to get you started. To learn more about the concepts of the recipe, browse the other sections of the wiki.
The basic idea behind the Checklist recipe is:
source data → Darwin Core mapping script → generated Darwin Core files
By changing the source data and/or the mapping script, you can alter the generated Darwin Core files. The main advantage is repeatability: once you have done the mapping, you don't have to start from scratch if your source data has been updated. You can just run the mapping script again (with a little tweak here and there) and upload the generated files to a GBIF Integrated Publishing Toolkit for publication. And by having a mapping script, your mapping is also documented.
To know which files to adapt, you need to understand the structure of the recipe.
The structure of the recipe is based on Cookiecutter Data Science. Files and directories indicated with GENERATED should not be edited manually.
├── README.md : Description of this repository
├── LICENSE : Repository license
├── checklist-recipe.Rproj : RStudio project file
├── .gitignore : Files and directories to be ignored by git
│
├── data
│ ├── raw : Source data, input for mapping script
│ └── processed : Darwin Core output of mapping script GENERATED
│
├── docs : Repository website GENERATED
│
└── src
    ├── dwc_mapping.Rmd : Darwin Core mapping script, core functionality of this repository
    ├── _site.yml : Settings to build website in /docs
    └── index.Rmd : Template for website homepage
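To check that your copy of the recipe still has the expected layout, a small test like the one below can help. It is a minimal sketch in base R: the helper name `check_recipe_structure` is hypothetical and not part of the recipe, and the list of paths is simply taken from the tree above.

```r
# Hypothetical helper (not part of the recipe): reports which of the
# expected files and directories exist under a given project root.
check_recipe_structure <- function(root = ".") {
  expected <- c(
    "README.md",
    "LICENSE",
    "checklist-recipe.Rproj",
    "data/raw",
    "data/processed",
    "src/dwc_mapping.Rmd"
  )
  # file.exists() returns TRUE for both files and directories
  present <- file.exists(file.path(root, expected))
  names(present) <- expected
  present
}

# Run from the project root; any entries printed here are missing paths
status <- check_recipe_structure(".")
print(status[!status])
```

If you deleted the optional files described further down this page (for example `LICENSE`), remove them from `expected` as well.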
The recipe has a functional workflow out of the box:
data/raw/checklist.xlsx → src/dwc_mapping.Rmd → data/processed/taxon.csv & data/processed/distribution.csv
- checklist.xlsx contains some dummy source data to show the functionality of the workflow. It will only be useful if you update it with your own data or replace it with another file (Excel or other). Note that updating or replacing the source data (file) will have consequences for the dwc_mapping.Rmd script: both are closely interlinked. See source data.
- dwc_mapping.Rmd contains functional mapping code, but the output files data/processed/taxon.csv & data/processed/distribution.csv are mostly nonsense. Only by adapting the mapping script will you get proper Darwin Core files. See R Markdown.
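The workflow of source data, mapping and generated files can be sketched as follows. This is an illustration only, not the recipe's own mapping script: it uses base R and an in-memory data frame instead of reading data/raw/checklist.xlsx, the source column names (`id`, `species`, `kingdom`, `country`) and the `checklist:taxon:` identifier prefix are assumptions, and the output goes to a temporary directory so nothing in data/processed is overwritten.

```r
# Dummy source data standing in for data/raw/checklist.xlsx
# (column names are assumptions, not those of the recipe's file)
raw_data <- data.frame(
  id = c(1, 2),
  species = c("Vulpes vulpes", "Quercus robur"),
  kingdom = c("Animalia", "Plantae"),
  country = c("BE", "BE"),
  stringsAsFactors = FALSE
)

# Map source columns to Darwin Core terms for the taxon core
taxon <- data.frame(
  taxonID = paste0("checklist:taxon:", raw_data$id),
  scientificName = raw_data$species,
  kingdom = raw_data$kingdom,
  taxonRank = "species",
  stringsAsFactors = FALSE
)

# Map to the distribution extension, linked to the core by taxonID
distribution <- data.frame(
  taxonID = taxon$taxonID,
  countryCode = raw_data$country,
  occurrenceStatus = "present",
  stringsAsFactors = FALSE
)

# Write the generated files to a temporary directory
out_dir <- file.path(tempdir(), "processed")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
write.csv(taxon, file.path(out_dir, "taxon.csv"), row.names = FALSE, na = "")
write.csv(distribution, file.path(out_dir, "distribution.csv"),
          row.names = FALSE, na = "")
```

Because every output column is derived from the source data in code, rerunning the script after a source update regenerates both files without manual editing.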
The files README.md, LICENSE and .gitignore are used for versioning and to have a proper repository on GitHub. Open the Markdown file README.md in RStudio or another text editor to see the hidden instructions on how to adapt it for your checklist.
If you are not planning to make your checklist mapping available on GitHub (which would be a pity), you can delete these files.
The files src/_site.yml, src/index.Rmd and the directory docs are used to transform your README.md and mapping script into an R Markdown website that can be hosted on GitHub, like this one. This is a bit more advanced, but you can get started with how to generate the website and how to host it on GitHub.
If you are not planning to create a website, you can delete these files.
Convinced? Here's how to set up the recipe on your computer.
- GitHub account: create one if you don't have one yet
- GitHub Desktop: download and install it on your computer
- RStudio: download and install it on your computer
If you are familiar with git, you can also use it directly in RStudio, rather than installing GitHub Desktop.
- Go to https://github.com/trias-project/checklist-recipe.
- In the top right, click Use this template. This will create a copy of the checklist-recipe repository under your GitHub account.
- In your newly created repository at https://github.com/your_username/checklist-recipe, click the green Clone or download button and select Open in desktop. This will open GitHub Desktop and download the repository files to your computer.
- In GitHub Desktop select Repository > Show in Explorer (PC) / Show in Finder (Mac) from the top menu.
- In your file browser, open the checklist-recipe directory and double-click checklist-recipe.Rproj to open the checklist recipe in RStudio.
- You will use a number of R packages, which you will need to install first. Copy/paste the following in your RStudio Console:
  install.packages(c("tidyverse", "tidylog", "magrittr", "here", "janitor", "readxl", "digest", "rgbif"))
  Note: if you get an Updating Loaded Packages warning, click No to not restart the R session.
- Then in the Files pane, go to the src directory and open dwc_mapping.Rmd.
- In the header menu of the open file, click Run > Run All to run the mapping code.
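If Run All stops with an error about a missing package, you can check which packages from the installation step are still absent. The snippet below is a small sketch using only base R functions; the variable names are arbitrary.

```r
# Packages listed in the installation step above
required <- c("tidyverse", "tidylog", "magrittr", "here",
              "janitor", "readxl", "digest", "rgbif")

# Compare against the packages installed in the current library paths
installed <- rownames(installed.packages())
not_installed <- setdiff(required, installed)

if (length(not_installed) > 0) {
  message("Still to install: ", paste(not_installed, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

Any names reported can be passed to `install.packages()` again before rerunning the script.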
If the script ran without problems, you are all set! 🎉 Nothing has changed in the output data though (you just overwrote it with the same data), because neither the script nor the source data were adapted. To understand the basics and the different sections of the mapping script, take a look at the other sections of this wiki.
Happy cooking!