Changes where the about node cannot be unambiguously determined #16

Open
cmungall opened this issue Aug 31, 2021 · 1 comment

Comments

@cmungall
Collaborator

<http://purl.obolibrary.org/obo/NCBITaxon_2> <http://www.w3.org/2000/01/rdf-schema#label> "Bacteria" .
<http://purl.obolibrary.org/obo/NCBITaxon_3> <http://www.w3.org/2000/01/rdf-schema#label> "Virus" .

rename 'Bacteria' to 'Virus'
rename 'Virus' to 'Vaccine'

KGCL Model:
NodeRename(ID=test_id_322, Old Value='Bacteria', New Value='Virus')
NodeRename(ID=test_id_323, Old Value='Virus', New Value='Vaccine')

Output Graph:
<http://purl.obolibrary.org/obo/NCBITaxon_2> <http://www.w3.org/2000/01/rdf-schema#label> "Vaccine" .
<http://purl.obolibrary.org/obo/NCBITaxon_3> <http://www.w3.org/2000/01/rdf-schema#label> "Vaccine" .

This behavior is not wrong given the current under-specification, but we should specify it better.
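To make the example concrete, here is a minimal sketch of why both nodes end up labelled "Vaccine". It assumes a purely label-based rename with no about-node, applied in order, and uses a plain dict as a stand-in for the graph. The function name `rename_by_label` is hypothetical and not part of KGCL.

```python
# Hypothetical in-memory stand-in for the graph: node IRI -> rdfs:label.
labels = {
    "http://purl.obolibrary.org/obo/NCBITaxon_2": "Bacteria",
    "http://purl.obolibrary.org/obo/NCBITaxon_3": "Virus",
}


def rename_by_label(labels, old_value, new_value):
    """Relabel every node whose label equals old_value (no about-node)."""
    return {
        node: (new_value if label == old_value else label)
        for node, label in labels.items()
    }


# Apply the two renames from the example in sequence.
after_first = rename_by_label(labels, "Bacteria", "Virus")
after_second = rename_by_label(after_first, "Virus", "Vaccine")

for node, label in after_second.items():
    print(node, label)
```

After the first rename both nodes carry the label "Virus", so the second rename matches both of them.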

I think we are better off being strict here, forcing every SimpleChange to be about a single node. If we later have a use case for updating multiple nodes at once, we can create specific change classes for that.

NodeRename has a field with cardinality 0..1 for about-node.

I propose that the procedure for going from the change model to the output graph first fills in the about-node (or about-edge) slot. If this cannot be done unambiguously, an error is raised.
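A rough sketch of that procedure, under several assumptions: the graph is again a plain dict from node IRI to label, the about-node is resolved against the graph state at the time each change is applied, and the Python names (`NodeRename` fields, `fill_about_node`, `apply_rename`, `AboutNodeError`) are illustrative rather than the actual KGCL data model or tooling.

```python
from dataclasses import dataclass
from typing import Dict, Optional


class AboutNodeError(ValueError):
    """Raised when the about-node cannot be determined unambiguously."""


@dataclass
class NodeRename:
    id: str
    old_value: str
    new_value: str
    about_node: Optional[str] = None  # cardinality 0..1, as in the model


def fill_about_node(change: NodeRename, labels: Dict[str, str]) -> NodeRename:
    """Return a copy of the change with about_node filled in, or raise."""
    if change.about_node is not None:
        # An explicit about-node is trusted here; this sketch does not verify it.
        return change
    matches = sorted(n for n, lab in labels.items() if lab == change.old_value)
    if len(matches) != 1:
        raise AboutNodeError(
            f"{change.id}: {len(matches)} nodes have label {change.old_value!r}"
        )
    return NodeRename(change.id, change.old_value, change.new_value, matches[0])


def apply_rename(change: NodeRename, labels: Dict[str, str]) -> Dict[str, str]:
    """Resolve the about-node first, then relabel exactly that node."""
    resolved = fill_about_node(change, labels)
    updated = dict(labels)
    updated[resolved.about_node] = resolved.new_value
    return updated


labels = {
    "http://purl.obolibrary.org/obo/NCBITaxon_2": "Bacteria",
    "http://purl.obolibrary.org/obo/NCBITaxon_3": "Virus",
}
step1 = apply_rename(NodeRename("test_id_322", "Bacteria", "Virus"), labels)
try:
    apply_rename(NodeRename("test_id_323", "Virus", "Vaccine"), step1)
except AboutNodeError as err:
    print("rejected:", err)
```

With this ordering the second rename is rejected, because "Virus" matches two nodes once the first rename has been applied. Resolving all about-nodes against the input graph up front would be a different design choice and would accept both changes.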

However, I appreciate that this may be harder for a SPARQL implementation. It's nice to be able to translate the change object into a simple SPARQL object. I am happy with the current behavior for now, provided we document it.

This might be non-ideal for cases where no about node can be determined: a direct SPARQL translation would simply result in no changes. But maybe that can just be detected directly.

@ckindermann
Collaborator

We can validate constraints on change operations in two ways depending on how KGCL operations are used.

Validation as part of SPARQL UPDATE queries

The current design allows representing KGCL operations as SPARQL UPDATE queries.
Since this representation is independent of any input graph and can be used independently of the KGCL implementation, it would be desirable to check constraints on change operations as part of the query. However, doing so comes with a couple of potential drawbacks:

  1. If a constraint is not satisfied, then no changes will be made, as you say. Having a query 'fail' silently in such a case may be inconvenient for users in practice, especially for a (seemingly straightforward) command like rename 'Virus' to 'Vaccine', where a user will most likely only check whether 'Virus' occurs as a label.

  2. Checking constraints as part of the query could come with a performance cost.
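As an illustration of what checking a constraint inside the query could look like, the sketch below builds a SPARQL UPDATE string for a label rename, optionally guarded by a `FILTER NOT EXISTS` clause that only matches when exactly one node carries the old label. This is an assumption about how such a translation might look, not the query the KGCL implementation actually emits; `rename_update` and `sparql_literal` are hypothetical helpers.

```python
def sparql_literal(value: str) -> str:
    """Quote a plain string literal (escapes backslashes and double quotes only)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def rename_update(old_value: str, new_value: str, require_unique: bool = False) -> str:
    """Build a SPARQL UPDATE that renames a label, optionally only if it is unique."""
    old, new = sparql_literal(old_value), sparql_literal(new_value)
    where = f"?node rdfs:label {old} ."
    if require_unique:
        # No match (hence no change) if another node has the same label.
        where += (
            "\n  FILTER NOT EXISTS {"
            f" ?other rdfs:label {old} . FILTER (?other != ?node) }}"
        )
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        f"DELETE {{ ?node rdfs:label {old} }}\n"
        f"INSERT {{ ?node rdfs:label {new} }}\n"
        f"WHERE {{\n  {where}\n}}"
    )


print(rename_update("Virus", "Vaccine", require_unique=True))
```

When the guard does not hold, the WHERE clause has no solutions and the update changes nothing, which is exactly the silent 'failure' described in point 1.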

Validation using KGCL tooling

An alternative to the above would be to validate constraints as part of the KGCL toolchain.
So, when a user decides to apply a KGCL operation to a graph, we can validate constraints and raise errors as needed, independently of the corresponding SPARQL UPDATE query. This essentially corresponds to your proposal to complete instances of the KGCL data model (and validate all constraints) before they are translated to SPARQL UPDATE queries. Note, however, that this requires the user to specify both a graph and a set of KGCL operations.

I think both ways are reasonable and have their pros and cons.
Validating constraints as part of SPARQL UPDATE queries would be nice for portability: a KGCL patch could be made available to any tool that supports SPARQL. However, (new) users might get confused or frustrated if simple expectations are not met, especially without helpful error messages.

So, maybe we would like to provide both ways for validating constraints and let the user decide what to use?
