Controlled Vocabularies

Paula Zermoglio edited this page Feb 3, 2018 · 19 revisions

Table of Contents

  1. List of Darwin Core terms that recommend the use of controlled vocabularies
  2. Available Controlled Vocabularies
  3. Consequences of not using Controlled Vocabularies

List of Darwin Core terms that recommend the use of controlled vocabularies

Following is the list of Darwin Core terms that have recommendations to follow a controlled vocabulary.

Please note that some of the terms are linked to recommended resources (for example, ISO codes). For other terms, no exhaustive, comprehensive controlled vocabulary is available. However, several efforts have been made to build controlled vocabularies. We, the community, have started the process of gathering those scattered sources (see below).

Darwin Core terms that are recommended to use controlled vocabularies:

| Class | Term | Recommended |
| --- | --- | --- |
| Record-level | dcterms:type | DCMI Type Vocabulary |
| | dcterms:language | RFC 4646 |
| | basisOfRecord | Darwin Core classes |
| Occurrence | sex | |
| | lifeStage | |
| | reproductiveCondition | |
| | behavior | |
| | establishmentMeans | |
| | occurrenceStatus | |
| | disposition | |
| Organism | organismScope | |
| Event | sampleSizeUnit | Ontology of Units of Measure |
| Location | higherGeographyID | Getty Thesaurus of Geographic Names |
| | continent | Getty Thesaurus of Geographic Names |
| | waterbody | Getty Thesaurus of Geographic Names |
| | islandGroup | Getty Thesaurus of Geographic Names |
| | island | Getty Thesaurus of Geographic Names |
| | country | Getty Thesaurus of Geographic Names |
| | countryCode | ISO 3166-1-alpha-2 |
| | geodeticDatum | EPSG |
| | verbatimCoordinateSystem | |
| | verbatimSRS | EPSG |
| | georeferenceVerificationStatus | {'requires verification', 'verified by collector', or 'verified by curator'} |
| Identification | identificationVerificationStatus | HISPID/ABCD |
| Taxon | taxonRank | |
| | nomenclaturalCode | |
| | taxonomicStatus | |
| MeasurementOrFact | measurementType | |
| | measurementUnit | International System of Units (SI) |
| ResourceRelationship | relationshipOfResource | |
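
As an illustration of what following a controlled vocabulary means in practice, here is a minimal Python sketch that checks a value against the three values listed above for georeferenceVerificationStatus. The function name `check_vocabulary` and the normalization rule (trim, collapse whitespace, lowercase) are assumptions made for this example; they are not part of Darwin Core.

```python
# Allowed values copied from the georeferenceVerificationStatus row above.
GEOREFERENCE_VERIFICATION_STATUS = frozenset({
    "requires verification",
    "verified by collector",
    "verified by curator",
})


def check_vocabulary(value, allowed):
    """Return (is_valid, normalized_value) for one field value.

    Normalization (an assumption of this sketch): collapse runs of
    whitespace, strip the ends, and lowercase before comparing.
    """
    normalized = " ".join(value.split()).lower()
    return normalized in allowed, normalized


if __name__ == "__main__":
    for raw in ["Verified by Curator ", "unverified"]:
        print(repr(raw), check_vocabulary(raw, GEOREFERENCE_VERIFICATION_STATUS))
```

The same function could be reused for other terms by passing a different set of allowed values, for instance one built from a shared vocabulary file.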

Other Darwin Core fields that have inherent restrictions on values are:

| Class | Term | Restriction |
| --- | --- | --- |
| Record-level | dcterms:modified | ISO 8601:2004(E) <= now |
| Occurrence | individualCount | positive integer or 0 |
| Event | eventDate | ISO 8601:2004(E) <= now |
| | eventTime | ISO 8601:2004(E) <= now |
| | startDayOfYear | positive integer <= 366 (or 365) |
| | endDayOfYear | positive integer <= 366 (or 365) |
| | year | integer <= current year |
| | month | positive integer <= 12 |
| | day | positive integer <= 31 (or 30, or 28) |
| Location | decimalLatitude | real number between -90 and 90 inclusive |
| | decimalLongitude | real number between -180 and 180 inclusive |
| | coordinateUncertaintyInMeters | real number > 0 |
| | coordinatePrecision | subset of positive real numbers > 0 |
| | pointRadiusSpatialFit | 0 or positive real number >= 1 |
| | footprintWKT | valid geometry in Well-known Text |
| | footprintSRS | valid SRS in Well-known Text |
| | footprintSpatialFit | 0 or positive real number >= 1 |
| | georeferencedDate | ISO 8601:2004(E) <= now |
| Identification | dateIdentified | ISO 8601:2004(E) <= now |
| Taxon | namePublishedInYear | four-digit year |
| MeasurementOrFact | measurementDeterminedDate | ISO 8601:2004(E) <= now |
| ResourceRelationship | relationshipEstablishedDate | ISO 8601:2004(E) <= now |
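
Some of the numeric restrictions above are simple enough to check mechanically. The following Python sketch covers only a handful of them; the names `RANGE_RULES` and `find_range_violations` are hypothetical, and the record is assumed to be a dictionary mapping term names to string values. It deliberately ignores the finer rules (for example, it does not check day against month, or 366 against leap years).

```python
# (term, lower bound, upper bound, must be an integer), taken from the
# restrictions listed above. Only a subset of terms is covered.
RANGE_RULES = [
    ("decimalLatitude", -90, 90, False),
    ("decimalLongitude", -180, 180, False),
    ("month", 1, 12, True),
    ("day", 1, 31, True),
    ("startDayOfYear", 1, 366, True),
    ("endDayOfYear", 1, 366, True),
]


def _in_range(text, low, high, integer):
    """Parse text as a number and test low <= number <= high."""
    try:
        number = int(text) if integer else float(text)
    except (TypeError, ValueError):
        return False
    # NaN and infinities fail this comparison, so they count as violations.
    return low <= number <= high


def find_range_violations(record):
    """Return the terms in record whose values break a rule in RANGE_RULES.

    Missing or empty values are skipped rather than reported.
    """
    violations = []
    for term, low, high, integer in RANGE_RULES:
        value = record.get(term)
        if value is None or value == "":
            continue
        if not _in_range(value, low, high, integer):
            violations.append(term)
    return violations


if __name__ == "__main__":
    record = {"decimalLatitude": "91.2", "decimalLongitude": "-70.5", "month": "13"}
    print(find_range_violations(record))
```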

Available Controlled Vocabularies

So far, you can find a preliminary compilation of controlled vocabularies in this file: ControlledVocabs Resources.

That file is open for everyone to comment on, and we invite you to do so!

If you have a controlled vocabulary for any term in Darwin Core that recommends the use of a controlled vocabulary (see the table above), please consider sharing it. To do so, you can:

a) post an issue in this GitHub repository.

b) send your message via the Darwin Core Hour Input Form. Don't forget to provide your email address so that we can contact you.

Consequences of not using Controlled Vocabularies

The lack of adherence to controlled vocabularies causes the data to be highly heterogeneous, and therefore very difficult to find and use. To expose how messy the data are and how difficult they are to work with, we have compiled the actual values found in GBIF, VertNet and iDigBio for fields that recommend the use of controlled vocabularies. You can take a look at those values HERE.
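
To get a feel for this kind of heterogeneity in your own data, a small Python sketch like the one below can group the raw values of a field by a case- and whitespace-insensitive key. The function name `summarize_variants` is hypothetical, and the sample values are invented for illustration; they are not taken from the GBIF, VertNet or iDigBio compilation.

```python
from collections import Counter, defaultdict


def summarize_variants(values):
    """Group raw values by a normalized key and count each raw spelling.

    The key is the value with whitespace collapsed and letters lowercased,
    so spellings that differ only in case or spacing land in one group.
    """
    groups = defaultdict(Counter)
    for raw in values:
        key = " ".join(raw.split()).lower()
        groups[key][raw] += 1
    return {key: dict(counter) for key, counter in groups.items()}


if __name__ == "__main__":
    # Invented example values for a field such as sex.
    sample = ["Male", "male", "MALE ", "M", "female"]
    for key, variants in summarize_variants(sample).items():
        print(key, variants)
```

Note that this only merges trivial variants. Values such as "M" and "male" still end up in separate groups, which is exactly the kind of mapping a shared controlled vocabulary would settle.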