The time-consuming problem of converting csv data to RDF #59

nullgogo · 2021-03-05T03:13:32Z

Problem Description:

Converting 8 CSV files (about 600M of data in total) to RDF took more than a day. We also tested converting just two of the CSV files to RDF separately, and that still took several hours.

Data source:

The data comes from a CMDB: 8 CSV files in total, including host (18M), vm (18M), software (160M), and others. The data contain one-to-many and many-to-many semantic relationships.
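
As a rough illustration of the kind of join that a one-to-many relationship between two CSV files implies, here is a minimal sketch. It is not SDM-RDFizer's implementation; the column names (`host_id`, `vm_id`, `hostname`), the sample rows, the `hosts` predicate, and the `http://example.org/` namespace are all hypothetical, since the real mapping is not shown here. It indexes the child rows by the join key once, so each parent row needs only a dictionary lookup rather than a full rescan of the child file.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data; the real CMDB columns are not shown in this issue.
HOSTS_CSV = "host_id,hostname\nh1,alpha\nh2,beta\n"
VMS_CSV = "vm_id,host_id\nv1,h1\nv2,h1\nv3,h2\n"

BASE = "http://example.org/"  # placeholder namespace (assumption)


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def join_triples(hosts, vms):
    """Return one N-Triples line per (host, vm) pair sharing a host_id.

    The VM rows are indexed by host_id once, so each host does a
    dictionary lookup instead of rescanning every VM row.
    """
    vms_by_host = defaultdict(list)
    for vm in vms:
        vms_by_host[vm["host_id"]].append(vm["vm_id"])

    triples = []
    for host in hosts:
        for vm_id in vms_by_host.get(host["host_id"], []):
            triples.append(
                f"<{BASE}host/{host['host_id']}> <{BASE}hosts> <{BASE}vm/{vm_id}> ."
            )
    return triples


if __name__ == "__main__":
    for line in join_triples(read_rows(HOSTS_CSV), read_rows(VMS_CSV)):
        print(line)
```

With a nested scan, the work grows with the product of the two file sizes; with the index above, it grows roughly with their sum plus the number of matching pairs.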

Config.ini and mapping.ttl Configuration:

Execute:

Environment:
os: CentOS 7
cpu cores: 64
memory: 96G

tangyong · 2021-03-05T07:48:54Z

@eiglesias34 We would like to ask the team to help us look into the performance problem above.

1 [Problem domain] Our AIOps team is building our infrastructure operations KG using SDM-RDFizer.
2 Please give us some suggestions or directions for deeper investigation.
3 If you need any other information, please let me know.

Thanks!

mevs · 2021-03-05T10:29:34Z

Dear @tangyong

Many thanks for sharing this use case. We have implemented new optimization techniques to speed up the execution of the joins in the mappings. Please let us arrange a meeting, and we can share with you the new version, which is still in the development stage. Please contact me at [email protected]

Best regards, Maria-Esther Vidal

tangyong · 2021-03-05T10:45:21Z

Thanks very much, @mevs! I will arrange a meeting and contact you.

tangyong · 2021-03-06T03:24:53Z

Dear @mevs,

I have discussed this with my team, and we would first like to obtain the new optimized version so that we can compare the performance improvement and give you feedback. I will send my request to your email.

Thanks!

tangyong · 2021-03-10T10:56:10Z

Dear @mevs @dachafra @eiglesias34,

We have prepared a dataset that reproduces the problem, and we would like to send it to you to assist with the investigation and fix. If you have time to help us, please tell me how to share the dataset (~800M), and we will upload it to shared storage.

Thanks!
Best regards, Tang.

dachafra assigned eiglesias34 and dachafra Mar 5, 2021

dachafra added the enhancement (New feature or request) label Mar 5, 2021
