The time-consuming problem of converting csv data to RDF #59

Open
nullgogo opened this issue Mar 5, 2021 · 5 comments
Labels
enhancement New feature or request

Comments

@nullgogo

nullgogo commented Mar 5, 2021

Problem Description:

Converting 8 CSV files (about 600M of data in total) to RDF took more than a day. We also tested converting just two of the CSV files to RDF, and that alone took several hours.

Data source:

The data comes from a CMDB: 8 CSV files in total, including host (18M), vm (18M), software (160M), and other data. There are one-to-many and many-to-many semantic relationships among these datasets.

[image 1]

Config.ini and mapping.ttl Configuration:

[image 2]
[image 3]

Execute:

[image 4]

Environment:
OS: CentOS 7
CPU cores: 64
Memory: 96G

@tangyong

tangyong commented Mar 5, 2021

@eiglesias34 We would like to ask the team to help us look into the performance problem above.

1. [Problem domain] Our AIOps team is building our infrastructure operations KG using SDM-RDFizer.
2. Please give us some suggestions or directions for deeper investigation.
3. If you need any other information, please let me know.

Thanks!

@dachafra dachafra added the enhancement New feature or request label Mar 5, 2021
@mevs
Collaborator

mevs commented Mar 5, 2021

Dear @tangyong

Many thanks for sharing this use case. We have implemented new optimization techniques to speed up the execution of the joins in the mappings. Please, let us arrange a meeting, and we can share with you the new version which is still in development stage. Please, contact me at [email protected]

Best regards, Maria-Esther Vidal
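
For readers following this thread, here is a minimal sketch of why join execution can dominate conversion time when the data has one-to-many relationships. It is not SDM-RDFizer's implementation and does not reflect the optimizations mentioned above; it only contrasts a nested-loop join with a hash-indexed join on a join key. The column names (`host_id`, `vm_id`), the `runsOn` predicate, and the `http://example.org/` namespace are all assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data; real CMDB columns will differ.
HOST_CSV = "host_id,hostname\nh1,alpha\nh2,beta\n"
VM_CSV = "vm_id,host_id\nv1,h1\nv2,h1\nv3,h2\n"

BASE = "http://example.org/"  # assumed namespace, for illustration only


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def make_triple(vm, host):
    """Build one (subject, predicate, object) triple linking a VM to its host."""
    return (
        f"<{BASE}vm/{vm['vm_id']}>",
        f"<{BASE}runsOn>",
        f"<{BASE}host/{host['host_id']}>",
    )


def nested_loop_join(children, parents, key):
    """Scan every parent row for every child row: O(len(children) * len(parents))."""
    triples = []
    for child in children:
        for parent in parents:
            if child[key] == parent[key]:
                triples.append(make_triple(child, parent))
    return triples


def hash_join(children, parents, key):
    """Index parents by join key once, then look up each child: roughly O(n + m)."""
    index = defaultdict(list)
    for parent in parents:
        index[parent[key]].append(parent)
    triples = []
    for child in children:
        for parent in index.get(child[key], []):
            triples.append(make_triple(child, parent))
    return triples


hosts = read_rows(HOST_CSV)
vms = read_rows(VM_CSV)

# Both strategies produce the same triples; only the cost differs.
joined = hash_join(vms, hosts, "host_id")
assert joined == nested_loop_join(vms, hosts, "host_id")

for s, p, o in joined:
    print(s, p, o, ".")
```

With files of the sizes reported in this issue, the difference between the two strategies grows with the product of the row counts, which is one plausible reason a join-heavy mapping can take hours.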

@tangyong

tangyong commented Mar 5, 2021

Thanks very much, @mevs! I will arrange a meeting and contact you.

@tangyong

tangyong commented Mar 6, 2021

Dear @mevs ,

I have discussed this with my team, and we would first like to obtain the new optimized version so that we can compare the performance improvement and report back to you. I will send my request to your email.

Thanks!

@tangyong

Dear @mevs @dachafra @eiglesias34

We have prepared a dataset that reproduces the problem, and we would like to send it to you to assist with the investigation and fix. If you have time to help us, please tell me how to share the dataset (~800M), and we will upload it to shared storage.

Thanks!
Best regards, Tang.

5 participants