Add in flags which enable creation of nodes grouped by source #138
Comments
What does data source mean in this context? Should we be thinking in terms of just supplying a field or list of fields that would be used to split into separate files? I don't know if
A data source example would be this one. We would like to have something like
where the idea is to generate KGs of everything possible w.r.t. what's available in the data source. This will allow downstream projects to pick and choose either the individual KGs of interest (sort of like building a bouquet of flowers) or the whole thing, based on their requirements. Hope this makes sense.
I spent a little time looking at refactoring to go from a single writer to a dict of writers, but it wasn't the kind of refactor that just easily falls into place. I might start with a CLI command to split the files after the ingest, because that's much more straightforward to implement and much less likely to break the existing behavior.
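To make the post-ingest split idea concrete, here is a rough sketch of what such a step might look like. It is not Koza code: the function name `split_tsv_by_field` and the output file naming are hypothetical, and the column to split on (for example a KGX-style `provided_by` column) is an assumption about what the node file contains.

```python
from pathlib import Path


def split_tsv_by_field(input_path, output_dir, field):
    """Split a TSV into one file per distinct value of `field` (hypothetical helper).

    Rows are copied verbatim, so quoting and escaping in the source are preserved.
    Returns a dict mapping each sanitized field value to its row count.
    """
    input_path = Path(input_path)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    handles = {}  # sanitized field value -> open output file
    counts = {}   # sanitized field value -> number of rows written
    try:
        with input_path.open(encoding="utf-8", newline="") as src:
            header = src.readline()
            columns = header.rstrip("\r\n").split("\t")
            if field not in columns:
                raise ValueError(f"column {field!r} not found in {input_path}")
            idx = columns.index(field)

            for line in src:
                if not line.strip():
                    continue  # skip blank lines
                cells = line.rstrip("\r\n").split("\t")
                value = cells[idx] if idx < len(cells) else ""
                # Keep only filename-safe characters; empty values go to "unknown".
                key = "".join(
                    c if c.isalnum() or c in "-_." else "_" for c in value
                ) or "unknown"
                if key not in handles:
                    out_path = output_dir / f"{input_path.stem}_{key}.tsv"
                    out = out_path.open("w", encoding="utf-8", newline="")
                    out.write(header)  # every split file repeats the header
                    handles[key] = out
                    counts[key] = 0
                handles[key].write(line)
                counts[key] += 1
    finally:
        for out in handles.values():
            out.close()
    return counts
```

Because this runs on the finished node file, the existing single-writer path is untouched. The trade-off is that the combined file must be written first, and one output file stays open per distinct value, which is fine for a handful of sources but would need a different approach for a very large number of groups.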
Posting as a result of KG Construction Crew discussion on July 8, 2024.
The current configuration of Koza is to generate one large TSV file for all nodes parsed from a single data source. To help with debugging and certain use cases, having the ability to output a separate node file for each individual data source could be useful.
In addition to this behavior, adding a flag which could be used to disable the creation of the large node TSV file may also be helpful.
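As an illustration of how the two requested behaviors could fit together, the sketch below keeps a dict of per-group writers next to an optional combined writer. Everything here is hypothetical and not part of Koza's API: the class name `GroupedNodeWriter`, the `group_by` and `write_combined` parameters, and the file naming are assumptions, and group values are assumed to be safe to use in file names.

```python
import csv
from pathlib import Path


class GroupedNodeWriter:
    """Hypothetical writer (not Koza's API) that routes node rows to per-group TSVs."""

    def __init__(self, output_dir, name, columns, group_by=None, write_combined=True):
        self.output_dir = Path(output_dir)
        self.output_dir.mkdir(parents=True, exist_ok=True)
        self.name = name
        self.columns = list(columns)
        self.group_by = group_by    # node field used to pick the per-group file
        self._handles = []          # every open file, so close() can release them
        self._group_writers = {}    # group value -> csv.DictWriter
        # The combined file is only created when write_combined is True.
        self._combined = self._open(f"{name}_nodes.tsv") if write_combined else None

    def _open(self, filename):
        handle = (self.output_dir / filename).open("w", encoding="utf-8", newline="")
        self._handles.append(handle)
        writer = csv.DictWriter(
            handle,
            fieldnames=self.columns,
            delimiter="\t",
            lineterminator="\n",
            restval="",             # missing fields become empty cells
            extrasaction="ignore",  # fields outside `columns` are dropped
        )
        writer.writeheader()
        return writer

    def write(self, node):
        if self._combined is not None:
            self._combined.writerow(node)
        if self.group_by is not None:
            group = str(node.get(self.group_by) or "unknown")
            if group not in self._group_writers:
                # Lazily create one writer per group the first time it is seen.
                self._group_writers[group] = self._open(f"{self.name}_{group}_nodes.tsv")
            self._group_writers[group].writerow(node)

    def close(self):
        for handle in self._handles:
            handle.close()
        self._handles.clear()
```

With `group_by` set and `write_combined=False`, only the per-group files are produced; with the defaults, the behavior reduces to a single combined node file, which mirrors the current output.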
Summary of request:
Please reach out to @hrshdhgd for more details of the advantages of this approach!