glossarist/csv-converter-ts

Glossarist CSV conversion script in TypeScript

Initializes a Paneron repository with a Glossarist dataset from a CSV file with terminological data.

Requires Node 15 and NPM.

The generated repository also includes scaffolding for building a static site from the glossary, and a GitHub Actions (GHA) workflow for deploying the site to AWS S3 and CloudFront infrastructure (see the Deployment section).

Usage

Generating data

Assuming you have the CSV ready (see CSV structure below), invoke the script as follows:

% npx @riboseinc/glossarist-csv-converter -l <language_code> -i <glossary_id> -d <domain_name> </path/to/file.csv> -o </repository/container/directory>

Where:

  1. Paths can be relative.

  2. Language code is expected to be a three-letter ISO 639-2/T code (“eng” for English).

  3. Glossary ID can be a descriptive alphanumeric string without spaces.

  4. Domain name should match the domain at which the registry will eventually be made accessible (uniformResourceIdentifier per ISO 19135-1). Currently, the glossary can only be deployed at the domain root.

  5. Repository root will be created at /repository/container/directory/glossary_id. This directory must not exist yet, but the container directory must.
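
The argument rules above can be sketched as a small pre-flight check. This is an illustration only, not the converter's actual validation: the names `ConverterArgs`, `validateArgs`, and `repositoryRoot` are hypothetical, the language check only tests for three lowercase letters (not membership in ISO 639-2/T), and allowing hyphens and underscores in the glossary ID is an assumption.

```typescript
// Hypothetical shape of the command-line arguments described above.
interface ConverterArgs {
  languageCode: string; // -l
  glossaryId: string;   // -i
  domainName: string;   // -d
  containerDir: string; // -o
}

// Returns a list of human-readable problems; an empty list means the
// arguments look plausible. Only the shape of each value is checked.
function validateArgs(args: ConverterArgs): string[] {
  const problems: string[] = [];
  if (!/^[a-z]{3}$/.test(args.languageCode)) {
    problems.push(`language code "${args.languageCode}" is not three lowercase letters`);
  }
  // Assumption: hyphens and underscores are acceptable in addition to
  // letters and digits; whitespace is not.
  if (!/^[A-Za-z0-9_-]+$/.test(args.glossaryId)) {
    problems.push(`glossary ID "${args.glossaryId}" must be alphanumeric without spaces`);
  }
  // The glossary is deployed at the domain root, so no scheme or path is expected.
  if (args.domainName === "" || /[\s/]/.test(args.domainName)) {
    problems.push(`domain name "${args.domainName}" should be a bare domain without a path`);
  }
  return problems;
}

// The repository root is the container directory joined with the glossary ID.
function repositoryRoot(args: ConverterArgs): string {
  return `${args.containerDir.replace(/\/+$/, "")}/${args.glossaryId}`;
}
```

Running such a check before invoking the converter would catch a two-letter language code or a domain with a path early, before any files are written.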

Finalizing repository setup

After navigating to /repository/container/directory/glossary_id, initialize a local repository, commit the generated files, add your GitHub repository as a remote, and push:

% git init
% git add .
% git commit -m "Initial migration complete"
% git branch -M main
% git remote add origin "<your GitHub origin URL>"
% git push -u origin main

Building site locally

After navigating to /repository/container/directory/glossary_id, install dependencies and build the site, after which you can serve it locally:

% yarn
% yarn build
% cd dist
% python3 -m http.server 8000

(Navigate to http://localhost:8000/ in your favorite browser.)

CSV structure

Note: for a sample file, refer to sample/glossary.csv.

The fields are as follows:

  1. Human-readable identifier (required): a string

  2. Date accepted (default is now)

  3. Definition (required): plain text, with possible AsciiMath

  4. Note 1: plain text, with possible AsciiMath

  5. Note 2: plain text, with possible AsciiMath

  6. Note 3: plain text, with possible AsciiMath

  7. Example 1: plain text, with possible AsciiMath

  8. Example 2: plain text, with possible AsciiMath

  9. Example 3: plain text, with possible AsciiMath

  10. Authoritative source reference (to a standard, e.g., "ITU-R Recommendation 592")

  11. Authoritative source clause (in the standard referenced, e.g., "4.4")

  12. Authoritative source link (URL, to the standard referenced)

  13. Term 1 designation (required): as text with possible AsciiMath

  14. Term 1 type: expression | symbol | prefix

  15. Term 1 part of speech (expressions only): noun | adjective | adverb | verb

  16. Term 1 grammatical number (nouns only): plural | singular | mass

  17. Term 1 grammatical gender (nouns only): common | feminine | masculine | neuter

  18. Term 1 participle marker (adjectives and adverbs only): non-empty value for “true”

  19. Term 1 abbreviation marker: non-empty value for “true”

  20. Term 2 designation: as text with possible AsciiMath

  21. Term 2 type: expression | symbol | prefix

  22. Term 2 part of speech (expressions only): noun | adjective | adverb | verb

  23. Term 2 grammatical number (nouns only): plural | singular | mass

  24. Term 2 grammatical gender (nouns only): common | feminine | masculine | neuter

  25. Term 2 participle marker (adjectives and adverbs only): non-empty value for “true”

  26. Term 2 abbreviation marker: non-empty value for “true”

  27. Term 3 designation: as text with possible AsciiMath

  28. Term 3 type: expression | symbol | prefix

  29. Term 3 part of speech (expressions only): noun | adjective | adverb | verb

  30. Term 3 grammatical number (nouns only): plural | singular | mass

  31. Term 3 grammatical gender (nouns only): common | feminine | masculine | neuter

  32. Term 3 participle marker (adjectives and adverbs only): non-empty value for “true”

  33. Term 3 abbreviation marker: non-empty value for “true”
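
The column layout above can be sketched as a mapping from one row of 33 cells to a structured record. This is a minimal sketch under stated assumptions, not the converter's own parser: the type and function names (`TermColumns`, `ConceptRow`, `parseRow`) are hypothetical, the row is assumed to be already split into cells by a CSV parser, and values are kept as raw strings without checking them against the allowed options.

```typescript
// Hypothetical record types for one CSV row; all names are illustrative.
interface TermColumns {
  designation: string;
  type: string;              // expression | symbol | prefix
  partOfSpeech: string;      // noun | adjective | adverb | verb
  grammaticalNumber: string; // plural | singular | mass
  grammaticalGender: string; // common | feminine | masculine | neuter
  isParticiple: boolean;
  isAbbreviation: boolean;
}

interface ConceptRow {
  identifier: string;
  dateAccepted: string; // empty string means "use the current date"
  definition: string;
  notes: string[];
  examples: string[];
  source: { reference: string; clause: string; link: string };
  terms: TermColumns[];
}

const TERM_START = 12; // zero-based index of field 13 (Term 1 designation)
const TERM_WIDTH = 7;  // each term occupies seven consecutive fields
const TERM_COUNT = 3;

function parseRow(cells: string[]): ConceptRow {
  // Missing trailing cells are treated as empty.
  const cell = (i: number): string => (cells[i] ?? "").trim();
  const nonEmpty = (values: string[]): string[] => values.filter((v) => v !== "");

  // Required fields: identifier, definition, Term 1 designation.
  if (cell(0) === "" || cell(2) === "" || cell(TERM_START) === "") {
    throw new Error("identifier, definition and Term 1 designation are required");
  }

  const terms: TermColumns[] = [];
  for (let t = 0; t < TERM_COUNT; t++) {
    const base = TERM_START + t * TERM_WIDTH;
    if (cell(base) === "") continue; // skip unused term slots
    terms.push({
      designation: cell(base),
      type: cell(base + 1),
      partOfSpeech: cell(base + 2),
      grammaticalNumber: cell(base + 3),
      grammaticalGender: cell(base + 4),
      isParticiple: cell(base + 5) !== "",   // any non-empty value means true
      isAbbreviation: cell(base + 6) !== "", // any non-empty value means true
    });
  }

  return {
    identifier: cell(0),
    dateAccepted: cell(1),
    definition: cell(2),
    notes: nonEmpty([cell(3), cell(4), cell(5)]),
    examples: nonEmpty([cell(6), cell(7), cell(8)]),
    source: { reference: cell(9), clause: cell(10), link: cell(11) },
    terms,
  };
}
```

Computing each term's offset from `TERM_START` and `TERM_WIDTH` avoids repeating the seven term fields three times and makes the fixed-width layout explicit.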

Deployment

Deployment requires an S3 bucket associated with a CloudFront distribution, which in turn is associated with a domain name.

Provided the following variables are specified in the GitHub repository secrets, the included GHA workflow should deploy the site automatically:

  • DOMAIN_NAME

  • AWS_REGION

  • AWS_ACCESS_KEY_ID

  • AWS_SECRET_ACCESS_KEY

  • CLOUDFRONT_DISTRIBUTION_ID

  • S3_BUCKET_NAME
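
Since the workflow depends on all six values being present, a quick check for missing ones can save a failed deployment run. The sketch below is not part of the provided workflow; `REQUIRED_SECRETS` and `missingSecrets` are hypothetical names, and the environment is passed in as a plain object so the function does not depend on any runtime.

```typescript
// The six variables listed above.
const REQUIRED_SECRETS = [
  "DOMAIN_NAME",
  "AWS_REGION",
  "AWS_ACCESS_KEY_ID",
  "AWS_SECRET_ACCESS_KEY",
  "CLOUDFRONT_DISTRIBUTION_ID",
  "S3_BUCKET_NAME",
] as const;

// Returns the names of variables that are absent or blank in the given
// environment-like object.
function missingSecrets(env: Record<string, string | undefined>): string[] {
  return REQUIRED_SECRETS.filter((name) => (env[name] ?? "").trim() === "");
}
```

In a Node script, one could pass `process.env` to `missingSecrets` and exit with an error if the returned list is not empty.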

Roadmap

  1. Explain how to load the created dataset in Paneron

  2. Render static HTML site locally without manual intervention
