Tabular

GCHQDeveloper81 edited this page Jul 16, 2024 · 1 revision

https://www.w3.org/TR/tabular-metadata/

The CSVW vocabulary was created alongside the W3C Recommendation "Metadata Vocabulary for Tabular Data". The standard is motivated by the fact that CSV is widely used on the web but cannot be annotated directly. Consider the following CSV data.

ID,name,year
1,Undertow,1993
2,Ænima,1996
3,Lateralus,2001
4,"10,000 Days",2006
5,Fear Inoculum,2019

Some people will know exactly what this data represents, while others may be able to infer at least part of the meaning from context. When it comes to automated processes, though, not much can be done beyond displaying the data in a pretty table: there simply isn't enough information to draw any conclusions about what the data actually means. More becomes possible if we provide some additional metadata to describe the data.

  • User interfaces could be more descriptive about any tabular data they're displaying, or could appropriately prompt users for information that will eventually go into a CSV file.
  • You could use the metadata to validate a CSV file, making sure all the columns are present and in the correct format.
  • Mapping between different formats and locations becomes easier: for example, you could automatically map CSV to RDF, or map CSV to the appropriate columns/tables in some database somewhere.
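The validation idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `COLUMNS` list is a hypothetical, simplified stand-in for real CSVW column descriptions, the `is_valid` and `validate` helpers are invented for this page, and the `gYear` check is far looser than the XML Schema definition.

```python
import csv
import io

# Hypothetical, simplified column descriptions (not the full CSVW model).
COLUMNS = [
    {"name": "ID", "datatype": "string", "required": True},
    {"name": "name", "datatype": "string", "required": True},
    {"name": "year", "datatype": "gYear", "required": False},
]

CSV_TEXT = """ID,name,year
1,Undertow,1993
2,Ænima,1996
3,Lateralus,2001
4,"10,000 Days",2006
5,Fear Inoculum,2019
"""


def is_valid(value, datatype):
    """Very rough datatype check; only distinguishes gYear from plain strings."""
    if datatype == "gYear":
        return len(value) == 4 and value.isascii() and value.isdigit()
    return True  # everything else is treated as a plain string


def validate(csv_text, columns):
    """Return a list of human-readable problems found in the CSV text."""
    problems = []
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    expected = [column["name"] for column in columns]
    if header != expected:
        return [f"header {header} does not match {expected}"]
    for row_number, row in enumerate(reader, start=1):
        if len(row) != len(columns):
            problems.append(
                f"row {row_number}: expected {len(columns)} cells, got {len(row)}"
            )
            continue
        for cell, column in zip(row, columns):
            if cell == "":
                if column["required"]:
                    problems.append(f"row {row_number}: {column['name']} is required")
            elif not is_valid(cell, column["datatype"]):
                problems.append(
                    f"row {row_number}: {column['name']}={cell!r} "
                    f"is not a {column['datatype']}"
                )
    return problems


print(validate(CSV_TEXT, COLUMNS))
```

A row such as `4,10,000 Days,2006`, with the album name left unquoted, would be reported as having four cells instead of three, which is exactly the kind of problem a plain CSV reader cannot detect without knowing what columns to expect.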

Note that, according to the guidance, the intended serialisation format for this vocab is JSON rather than RDF. The spec also states that implementations should automatically recognise the prefixes defined in the RDFa Core Initial Context rather than requiring users to define them every time. Despite this, it is still perfectly possible to use this vocab outside the scope of modelling CSV on the web, so long as you're talking about the same kinds of entity.
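To illustrate what "automatically recognising prefixes" might look like, here is a minimal Python sketch. The `KNOWN_PREFIXES` table and the `expand` helper are hypothetical names made up for this page, and the table holds only three of the prefixes from the RDFa Core Initial Context rather than the full list.

```python
# A small, hand-copied subset of the RDFa Core Initial Context prefixes.
KNOWN_PREFIXES = {
    "dc": "http://purl.org/dc/terms/",
    "dcat": "http://www.w3.org/ns/dcat#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(name):
    """Expand a prefixed name such as 'dc:title'; leave anything else untouched."""
    prefix, separator, local = name.partition(":")
    if separator and prefix in KNOWN_PREFIXES:
        return KNOWN_PREFIXES[prefix] + local
    return name


print(expand("dc:title"))      # http://purl.org/dc/terms/title
print(expand("dcat:keyword"))  # http://www.w3.org/ns/dcat#keyword
print(expand("url"))           # url
```

Note that in this initial context the `dc` prefix stands for the DCMI Metadata Terms namespace, not the older Dublin Core elements 1.1 namespace.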

Example

The following block demonstrates using the CSVW vocab in its intended serialisation format (JSON) to describe the table from the body of the article above.

{
  "@context": ["http://www.w3.org/ns/csvw", { "@language": "en" }],
  "url": "http://example.com/tool_albums.csv",
  "dc:title": "Tool albums",
  "dcat:keyword": ["music", "rock"],
  "dc:modified": { "@value": "2023-09-20", "@type": "xsd:date" },
  "tableSchema": {
    "primaryKey": "ID",
    "columns": [
      {
        "name": "ID",
        "dc:description": "Unique primary key for this album",
        "datatype": "string",
        "required": true
      },
      {
        "name": "name",
        "titles": ["Album name"],
        "dc:description": "The name of the album",
        "datatype": "string",
        "required": true
      },
      {
        "name": "year",
        "titles": ["Album year"],
        "dc:description": "The year the album was released",
        "datatype": "gYear"
      }
    ]
  }
}
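As a rough idea of how a program might consume this metadata, the Python sketch below loads a trimmed copy of the JSON and uses it to turn CSV rows into typed records. It is not a conforming CSVW processor: the `CASTS` table, `load_rows` and `primary_keys_unique` are hypothetical helpers, and treating `gYear` as a plain integer is a deliberate simplification.

```python
import csv
import io
import json

# Trimmed copy of the metadata document above (descriptive properties removed).
METADATA = json.loads("""
{
  "@context": ["http://www.w3.org/ns/csvw", { "@language": "en" }],
  "url": "http://example.com/tool_albums.csv",
  "tableSchema": {
    "primaryKey": "ID",
    "columns": [
      { "name": "ID", "datatype": "string", "required": true },
      { "name": "name", "titles": ["Album name"], "datatype": "string", "required": true },
      { "name": "year", "titles": ["Album year"], "datatype": "gYear" }
    ]
  }
}
""")

CSV_TEXT = """ID,name,year
1,Undertow,1993
2,Ænima,1996
3,Lateralus,2001
4,"10,000 Days",2006
5,Fear Inoculum,2019
"""

# Simplified mapping from CSVW datatype names to Python types (an assumption).
CASTS = {"string": str, "gYear": int}


def load_rows(csv_text, metadata):
    """Read the CSV text and convert each cell using the column's datatype."""
    columns = metadata["tableSchema"]["columns"]
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    rows = []
    for cells in reader:
        record = {}
        for column, cell in zip(columns, cells):
            cast = CASTS.get(column.get("datatype", "string"), str)
            record[column["name"]] = cast(cell)
        rows.append(record)
    return rows


def primary_keys_unique(rows, metadata):
    """Check that the column named by primaryKey holds no duplicate values."""
    key = metadata["tableSchema"]["primaryKey"]
    values = [row[key] for row in rows]
    return len(values) == len(set(values))


rows = load_rows(CSV_TEXT, METADATA)
print(rows[3])  # {'ID': '4', 'name': '10,000 Days', 'year': 2006}
print(primary_keys_unique(rows, METADATA))  # True
```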

Here is the same example represented as RDF in Turtle syntax. In this case, the prefixes used are explicitly defined at the top of the document.

@prefix : <http://www.example.com/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix csvw: <http://www.w3.org/ns/csvw#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dc: <http://purl.org/dc/terms/> .

:my_table a csvw:Table ;
    dc:title "Tool albums" ;
    dc:modified "2023-09-20"^^xsd:date ;
    dc:description "A list of all Tool albums released as of 2023" ;
    dcat:keyword "music", "rock" ;
    csvw:url <http://example.com/tool_albums.csv> ;

    csvw:tableSchema [
        csvw:primaryKey "ID" ;
        csvw:column (
            [
                csvw:name "ID" ;
                dc:description "Unique primary key for this album" ;
                csvw:required true ;
                csvw:datatype "string"
            ]
            [
                csvw:name "name" ;
                csvw:title "Album name" ;
                dc:description "The name of the album" ;
                csvw:required true ;
                csvw:datatype "string"
            ]
            [
                csvw:name "year" ;
                csvw:title "Album year" ;
                dc:description "The year the album was released" ;
                csvw:datatype "gYear"
            ]
        ) ;
    ] .