Add NDJSON loading script and update database connection settings #92
PR Type
[Feature, Fix, Documentation]
Short Description
PR Summary:
- Added load_ndjson_to_postgres.py to load MIMIC-IV FHIR dataset files in NDJSON format directly into a PostgreSQL database.
- Updated collect.py to use environment variables for the database connection to improve security.
- Updated .gitignore to exclude the data files in the physionet.org directory, keeping the repository clean.
Tests Added
No unit tests were added for this script. It was tested manually with sample NDJSON files, which imported successfully into PostgreSQL.
Issue Reference
Closes #84 – resolves the need for loading MIMIC-IV FHIR NDJSON files into the PostgreSQL database for use with collect.py.
Detailed Description
1. Added Script:
The new script load_ndjson_to_postgres.py reads each NDJSON file in a specified directory, flattens nested JSON data where necessary, and loads the data into a specified PostgreSQL database. This streamlines the process of loading large FHIR datasets for analysis.
2. Updates to collect.py:
The script collect.py now uses environment variables to fetch database credentials.
3. .gitignore Update:
Excluded the physionet.org directory in case users download the dataset inside the main repository.
Environment Variable Setup:
To use the new scripts, please set up environment variables:
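As an illustration of the pattern only, a script might read its connection settings from the environment roughly as follows. This is a minimal sketch: the variable names (DB_NAME, DB_USER, DB_PASSWORD, DB_HOST, DB_PORT), the defaults, and the helper load_db_config are hypothetical and are not taken from collect.py or load_ndjson_to_postgres.py.

```python
import os

# Hypothetical variable names, chosen for illustration only.
REQUIRED = ("DB_NAME", "DB_USER", "DB_PASSWORD")
DEFAULTS = {"DB_HOST": "localhost", "DB_PORT": "5432"}


def load_db_config(env=None):
    """Build a dict of connection settings from environment variables.

    Raises RuntimeError if a required variable is unset or empty, so a
    misconfigured environment fails early instead of at connect time.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "dbname": env["DB_NAME"],
        "user": env["DB_USER"],
        "password": env["DB_PASSWORD"],
        "host": env.get("DB_HOST", DEFAULTS["DB_HOST"]),
        "port": int(env.get("DB_PORT", DEFAULTS["DB_PORT"])),
    }
```

The returned keys mirror the keyword arguments accepted by psycopg2.connect, so if the scripts use that driver the dict could be passed as psycopg2.connect(**load_db_config()). Keeping credentials out of the source this way is the security benefit the PR describes.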