Improving use of TEI dating attributes #63

Open
adunning opened this issue May 10, 2023 · 4 comments

@adunning

adunning commented May 10, 2023

At present the schema requires @when, @notBefore, or @notAfter on most dating elements. These are part of att.datable.w3c, i.e. using the W3C XML Schema datatype, which is based on ISO 8601 but with some important differences.

I've encountered a number of problems with our approach:

  • We are not strictly using the TEI attributes correctly: according to the specification, @notBefore and @notAfter should give the earliest and latest possible dates for an event, and we are not distinguishing these from precise beginning/end points where we have them, which should instead be specified in @from and @to.
  • We allow all the various other TEI dating attributes on our elements, which is confusing to cataloguers and can result in conflicting data.
  • Because Oxygen sorts attributes alphabetically, it's very easy to become confused between @notBefore and @notAfter.
  • Unlike ISO 8601, the W3C datatype does not allow for the expression of centuries or decades as such, meaning that the 19th century can appear in our records as any of notBefore="1800" notAfter="1899" or notBefore="1800" notAfter="1900" or notBefore="1801" notAfter="1900".
  • It is difficult to distinguish instances of notBefore/notAfter that give actual dates from those that give estimates to the century or decade. I would expect the machine-readable date to reflect the prose precisely. In an ideal world, it should be possible to eliminate the prose date entirely and instead generate a precise multilingual text representation of a date from the machine-readable version.
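
To illustrate the century ambiguity described in the list above, here is a rough Python sketch that recognises the three competing encodings of a whole century and maps them to a single two-digit ISO 8601 century value. The function name `century_code` and the matching rule (start year ending in 00 or 01, end year 99 or 100 years after the century start) are assumptions made for illustration, not anything in our schema or tooling.

```python
from typing import Optional


def century_code(not_before: int, not_after: int) -> Optional[str]:
    """Return a two-digit ISO 8601 century (e.g. "18") if the pair of years
    looks like a whole-century estimate under any common convention,
    otherwise None. Hypothetical helper, positive years only."""
    base = (not_before // 100) * 100  # first year of the enclosing century block
    starts_century = not_before - base in (0, 1)   # 1800 or 1801
    ends_century = not_after - base in (99, 100)   # 1899 or 1900
    if starts_century and ends_century:
        return f"{base // 100:02d}"
    return None


# The three encodings of the 19th century collapse to the same value,
# while a genuine date range is left alone.
for pair in [(1800, 1899), (1800, 1900), (1801, 1900), (1457, 1460)]:
    print(pair, century_code(*pair))
```

A heuristic like this cannot tell a real range of 1800 to 1899 from a century estimate, which is exactly the problem with the current encoding.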

In the ISO 8601 specification, one can write when-iso="18" to represent the 19th century (a date between 1800–1899); or 196 to indicate the 1960s. It also allows intervals such as 2020/2022 for 2020–22 or 181/185 for the 1810s × 1850s.
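
As a minimal sketch of how these reduced-precision values could be read by software, the following Python function expands a century, decade, year, or a simple interval of these into the earliest and latest years covered. The name `expand_iso` is hypothetical, and the sketch assumes positive years of at most four digits with no month or day component.

```python
from typing import Tuple


def expand_iso(value: str) -> Tuple[int, int]:
    """Return (earliest_year, latest_year) for a reduced-precision
    ISO 8601 value such as "18", "196", "2020" or "181/185"."""
    if "/" in value:
        # Interval: earliest year of the start, latest year of the end.
        start, end = value.split("/", 1)
        return expand_iso(start)[0], expand_iso(end)[1]
    if not value.isdigit() or not 2 <= len(value) <= 4:
        raise ValueError(f"unsupported value: {value!r}")
    missing = 4 - len(value)              # number of omitted year digits
    first = int(value) * 10 ** missing    # e.g. "18" -> 1800
    return first, first + 10 ** missing - 1


for example in ["18", "196", "2020/2022", "181/185"]:
    print(example, expand_iso(example))
```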

ISO 8601-2:2019 helps further with extensions to improve the syntax of imprecise dates, allowing for example 18XX for an unspecified point in the 19th century. It also gives machine-readable equivalents to 'circa' etc, based on the Extended Date/Time Format (EDTF) Specification. TEI have planned support for this in @when-iso.
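
The unspecified-digit notation can be expanded mechanically. This is only a sketch in Python, assuming a four-character year in which trailing digits are replaced by `X`; the function name `expand_unspecified` is made up for illustration, and it does not attempt the other extensions such as the 'circa' qualifiers.

```python
import re
from typing import Tuple


def expand_unspecified(value: str) -> Tuple[int, int]:
    """Return (earliest_year, latest_year) for a year such as "18XX",
    where trailing X characters stand for unspecified digits."""
    if len(value) != 4 or not re.fullmatch(r"\d+X*", value):
        raise ValueError(f"unsupported value: {value!r}")
    # Lowest year fills the gaps with 0, highest year fills them with 9.
    return int(value.replace("X", "0")), int(value.replace("X", "9"))


for example in ["18XX", "196X", "1985"]:
    print(example, expand_unspecified(example))
```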

The radical solution would be to normalize all date attributes to when-iso – it would be easier to write, more precise, and more interoperable (e.g. by avoiding the problem of different definitions of centuries). Whether that is realistic is another question, and we would need to develop guidance on translating parts of centuries into ISO notation; I noted some of these problems in bodleian/medieval-mss#623.

Note that ISO 8601-1:2019+A1:2022 and ISO 8601-2:2019 are available through British Standards Online (requires the university VPN).

@holfordm
Collaborator

My understanding is that @from and @to are used for continuous periods, as in the example <date from="1863-05-28" to="1863-06-01">28 May through 1 June 1863</date> (https://tei-c.org/release/doc/tei-p5-doc/en/html/ref-att.datable.w3c.html#index-egXML-d54e5495) and also <p>Those five years — <date from="1918" to="1923">1918 to 1923</date> — had been, he suspected, somehow very important.</p> (https://tei-c.org/release/doc/tei-p5-doc/en/html/CO.html#CONADA). So it would only be correct to use those attributes where it was known that a manuscript had been written over a continuous period. It is correct (again as I understand it) to use @notBefore and @notAfter for earliest and latest possible dates of a discrete event e.g. writing a manuscript, whether those are known exactly (1457 x 1460) or not (15th century); our guidelines recommend using the @cert attribute to distinguish the former cases.

@holfordm
Collaborator

(confusion of the values in @notBefore / @notAfter is easy to do, but there is a schematron rule in place to flag when this occurs)
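
For anyone wanting to run a similar check outside the schema, here is a rough Python equivalent of that kind of test. It is not the Schematron rule itself: the function name `swapped_dates` and the sample record are invented for illustration, and the sketch compares only the leading four-digit year, so it assumes positive years and will not catch a swap within the same year.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Invented sample; real records would carry the TEI namespace.
SAMPLE = """<msDesc>
  <origDate notBefore="1457" notAfter="1460">1457 x 1460</origDate>
  <origDate notBefore="1500" notAfter="1401">15th century</origDate>
</msDesc>"""


def swapped_dates(xml_text: str) -> List[Tuple[str, str, str]]:
    """Return (tag, notBefore, notAfter) for every element whose
    notBefore year is later than its notAfter year."""
    flagged = []
    for element in ET.fromstring(xml_text).iter():
        not_before = element.get("notBefore")
        not_after = element.get("notAfter")
        if not_before and not_after and int(not_before[:4]) > int(not_after[:4]):
            flagged.append((element.tag, not_before, not_after))
    return flagged


print(swapped_dates(SAMPLE))
```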

@adunning
Author

That is my understanding as well, which I think means that we should technically be using from/to on <provenance> in most cases? Whether it is a helpful distinction or not is another question!

@holfordm
Collaborator

Indeed - but even with provenance from and to would only be appropriate where the exact duration of a provenance event is known; that is what our guidelines recommend (https://msdesc.github.io/consolidated-tei-schema/msdesc.html#provenance) and has been adopted in many cases; incorrect uses of notAfter and notBefore could certainly be corrected, although I would agree that this wouldn't be a high priority at the moment.
