
[Question] Update/Merge RDF triple or KG incrementally #60

Open
tangyong opened this issue Mar 5, 2021 · 9 comments

tangyong commented Mar 5, 2021

From the paper "SDM-RDFizer: An RML Interpreter for the Efficient Creation of RDF Knowledge Graphs", SDM-RDFizer seems to have the capability to update a KG incrementally while new data arrives (e.g., in a streaming way).

[screenshot of the relevant excerpt from the paper]

Concretely, we have built a KG with SDM-RDFizer from multiple data sources for an IoT use case. IoT data keeps arriving from many sensors via message middleware (e.g., Kafka), so we need to constantly update the existing KG to reflect the most recent data changes; however, we do not want to rebuild the KG from scratch.

Instead, we want to update the previously built KG incrementally, adding/updating the data so that the KG reflects the new data in (near) real time as soon as possible.

So, I want to ask the team whether the above case is supported or not.

Thanks!

dachafra added the question (Further information is requested) label on Mar 5, 2021
dachafra (Collaborator) commented Mar 5, 2021

Dear @tangyong,
At this moment we do not support streaming construction of KGs. The incremental generation of the KG means that we do not keep all the generated triples in memory: thanks to the structures we have defined, we write chunks of generated triples to the output file. Additionally, it is in our plans to support the creation of KGs using previous versions, as you mention, but right now that is not supported either.

Thanks for using the SDM-RDFizer; I hope this answers your questions.

David
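
For illustration only: a minimal, generic sketch of the chunked-writing idea described above (flushing generated triples to the output file in chunks instead of keeping the whole graph in memory). This is not SDM-RDFizer's actual code; the triple source and chunk size are assumptions.

def write_triples_in_chunks(triple_source, output_path, chunk_size=10000):
    # triple_source: any iterable of (subject, predicate, object) strings (assumption)
    buffer = []
    with open(output_path, "a", encoding="utf-8") as out:
        for s, p, o in triple_source:
            buffer.append(f"{s} {p} {o} .\n")
            if len(buffer) >= chunk_size:
                out.writelines(buffer)  # flush this chunk to disk
                buffer.clear()          # free the memory held by the chunk
        if buffer:
            out.writelines(buffer)      # write any remaining triples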

dachafra self-assigned this Mar 5, 2021
mevs (Collaborator) commented Mar 5, 2021

Dear @tangyong,

Many thanks for your interest in our work! The members of the team are constantly working on adding more features to the SDM-RDFizer.
This public version of the SDM-RDFizer does not support the incremental creation of the knowledge graph. Nevertheless, the structures and strategies implemented in the SDM-RDFizer make it possible to include new incoming data into an RDF knowledge graph incrementally. We have a beta version of the SDM-RDFizer that implements these features, and we could give you access in case you are interested. It is essential to highlight that we have applied this incremental SDM-RDFizer in the context of evolving data, e.g., new scholarly data, rather than in the context of IoT. Extending the SDM-RDFizer for IoT data is part of our future plans. Thus, if you have a specific task or use case, please contact me directly at my personal account [email protected].

Best regards, Maria-Esther Vidal

tangyong (Author) commented Mar 5, 2021

> Dear @tangyong,
> At this moment we do not support streaming construction of KGs. The incremental generation of the KG means that we do not keep all the generated triples in memory: thanks to the structures we have defined, we write chunks of generated triples to the output file. Additionally, it is in our plans to support the creation of KGs using previous versions, as you mention, but right now that is not supported either.
>
> Thanks for using the SDM-RDFizer; I hope this answers your questions.
>
> David

Thanks very much for the reply, I see. Another question: if I use SDM-RDFizer to build a KG from existing data sources and wish to implement KG updates, could you give me some suggestions from your experience?

Thanks!

tangyong (Author) commented Mar 5, 2021

> Dear @tangyong,
>
> Many thanks for your interest in our work! The members of the team are constantly working on adding more features to the SDM-RDFizer.
> This public version of the SDM-RDFizer does not support the incremental creation of the knowledge graph. Nevertheless, the structures and strategies implemented in the SDM-RDFizer make it possible to include new incoming data into an RDF knowledge graph incrementally. We have a beta version of the SDM-RDFizer that implements these features, and we could give you access in case you are interested. It is essential to highlight that we have applied this incremental SDM-RDFizer in the context of evolving data, e.g., new scholarly data, rather than in the context of IoT. Extending the SDM-RDFizer for IoT data is part of our future plans. Thus, if you have a specific task or use case, please contact me directly at my personal account [email protected].
>
> Best regards, Maria-Esther Vidal

Great news and plans! I am very interested in the beta version, and I have such a case for a smart city, i.e., an IoT use case. I will send you an email.

Thanks again very much, @mevs!

eiglesias34 (Collaborator) commented

> Thanks very much for the reply, I see. Another question: if I use SDM-RDFizer to build a KG from existing data sources and wish to implement KG updates, could you give me some suggestions from your experience?

Dear @tangyong,

To update an existing KG, you need access to the KG in question (be it from an endpoint, a database, a file, etc.) so you can compare the new triples against the KG and determine whether they already exist.

Best regards, Enrique Iglesias
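
For illustration only: a minimal sketch of this comparison step, assuming the existing KG and the newly generated triples are both available as N-Triples files and using rdflib. The file names are placeholders, not part of SDM-RDFizer.

from rdflib import Graph

existing = Graph()
existing.parse("existing_kg.nt", format="nt")       # the previously built KG (placeholder path)

new_triples = Graph()
new_triples.parse("new_output.nt", format="nt")     # triples produced from the new data (placeholder path)

added = 0
for triple in new_triples:
    if triple not in existing:    # only insert triples that do not already exist in the KG
        existing.add(triple)
        added += 1

existing.serialize(destination="existing_kg.nt", format="nt")
print(f"Added {added} new triples")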

tangyong (Author) commented Mar 6, 2021

> Thanks very much for the reply, I see. Another question: if I use SDM-RDFizer to build a KG from existing data sources and wish to implement KG updates, could you give me some suggestions from your experience?
>
> Dear @tangyong,
>
> To update an existing KG, you need access to the KG in question (be it from an endpoint, a database, a file, etc.) so you can compare the new triples against the KG and determine whether they already exist.
>
> Best regards, Enrique Iglesias

Dear @eiglesias34 @mevs @dachafra

Thanks for Enrique Iglesias's suggestions! I have some comments, as follows:

First, I totally agree with what you said: "compare the new triples with the KG to determine if they do not already exist".

Second, I want to give some details on a real-world case, e.g., IoT interconnected sensor data arriving in a streaming way:

  1. I have built a knowledge graph from historical data sources related to such an IoT scenario. We can build it offline because it takes a relatively long time, and I ignore the RDF store (in reality, I will use gStore) for now.

  2. Sensor data arrives continuously in a streaming way from many devices and enters, e.g., Kafka. I then wish to update the built knowledge graph to reflect the newest data changes in real time. Since I plan to update in real time, the data volume should stay within a reasonably acceptable range, determined by many factors. I will use Spark/Flink to split the data according to a time window. Then, following your suggestion, I have four ways in mind to use the split data to update the KG:

(1) Using the same RML mapping file, I use a different RML tool (not SDM-RDFizer) to generate new triples from the split data and other properties (e.g., manually triggering insert/delete operations...); then I compare the new triples with the KG and decide whether to insert/update them.

(2) Using the same RML mapping file, I still use SDM-RDFizer to produce a new KG (.nt file) from the split data; then I read the new .nt file, parse the new triples, compare them with the KG, and decide whether to insert/update them (a rough sketch of this flow appears after this comment).

(3) Based on (2), SDM-RDFizer exposes an interface to obtain the new triples, so I do not have to parse them myself.

(4) Based on (2), the whole update logic is offered by SDM-RDFizer, and I can imagine the following context:

from rdfizer.semantify import semantify
import sys

# the built knowledge graph: KG is returned as a context object, and we add a transaction idea
KG = semantify(str(sys.argv[1]))

# use Spark/Flink to split the data according to a time window
newSensorData = streamingWindowData(...)
# convert newSensorData into CSV format
newCsvfiedSensorData = formatCsv(newSensorData, ...)

# expose a new interface called "update"
KG.update(newCsvfiedSensorData)
...

From my point of view, I wish SDM-RDFizer would expose more operations/interfaces, e.g., (3) and (4).

Thanks!
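
For illustration only, a rough sketch of option (2) above: for each time window, write the sensor records to a CSV file, run SDM-RDFizer with a config whose mapping points at that CSV, and merge the resulting .nt output into the existing KG. The helper name, file paths, and config file here are assumptions; SDM-RDFizer is invoked through its documented command-line form (python3 -m rdfizer -c <config>), and the merge uses rdflib.

import csv
import subprocess
from rdflib import Graph

def process_window(records, fieldnames,
                   config_path="window_config.ini",    # config whose mapping reads window.csv (assumption)
                   window_csv="window.csv",
                   window_output="window_output.nt",   # output file named in the config (assumption)
                   kg_path="existing_kg.nt"):
    # 1. Dump the window of sensor records (a list of dicts) to the CSV the RML mapping expects
    with open(window_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)

    # 2. Run SDM-RDFizer on that window with the same RML mapping
    subprocess.run(["python3", "-m", "rdfizer", "-c", config_path], check=True)

    # 3. Merge only the triples that are not already in the existing KG
    kg = Graph()
    kg.parse(kg_path, format="nt")
    new_graph = Graph()
    new_graph.parse(window_output, format="nt")
    for triple in new_graph:
        if triple not in kg:
            kg.add(triple)
    kg.serialize(destination=kg_path, format="nt")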

eiglesias34 (Collaborator) commented

Dear @tangyong,

Thank you again for your interest in the SDM-RDFizer.

Given the complexity of your most recent question, the head of our group would like to meet so that we can discuss it further. Please contact Prof. Maria-Esther Vidal at [email protected].

Best regards, Enrique Iglesias

tangyong (Author) commented Mar 8, 2021

> Dear @tangyong,
>
> Thank you again for your interest in the SDM-RDFizer.
>
> Given the complexity of your most recent question, the head of our group would like to meet so that we can discuss it further. Please contact Prof. Maria-Esther Vidal at [email protected].
>
> Best regards, Enrique Iglesias

OK, I will ping Prof. Maria-Esther Vidal.

Thanks!

LangDaoAI commented

Any update on this issue?

Thanks!
