Add NDJSON loading script and update database connection settings #92

Open: wants to merge 3 commits into base: main


@zzadxz commented Oct 30, 2024

  • Adds a new script to load NDJSON files into PostgreSQL
  • Updates the database connection parameters to use environment variables
  • Adds the physionet.org data directory to .gitignore

PR Type

[Feature, Fix, Documentation]

Short Description

PR Summary:

  • Adds a new Python script, load_ndjson_to_postgres.py, that loads MIMIC-IV FHIR dataset files in NDJSON format directly into a PostgreSQL database.
  • Updates collect.py to take its database connection settings from environment variables, improving security.
  • Adds the physionet.org directory to .gitignore so that downloaded data files stay out of the repository.

Tests Added

No unit tests were added for this script. It was tested manually with sample NDJSON files, which were imported into PostgreSQL successfully.

Issue Reference

Closes #84 – resolves the need for loading MIMIC-IV FHIR NDJSON files into the PostgreSQL database for use with collect.py.

Detailed Description

1. Added Script:
The new script load_ndjson_to_postgres.py reads each NDJSON file in a specified directory, flattens nested JSON data where necessary, and loads the data into a specified PostgreSQL database. This streamlines the process of loading large FHIR datasets for analysis.
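
The read-and-flatten step described above might look roughly like the following. This is a minimal sketch, not the code in load_ndjson_to_postgres.py: the helper names `flatten` and `iter_ndjson`, the underscore separator, and the choice to keep lists as JSON text are all assumptions made for illustration, and the database insert is left out.

```python
import io
import json
from typing import Any, Dict, Iterator, TextIO


def flatten(obj: Dict[str, Any], parent_key: str = "", sep: str = "_") -> Dict[str, Any]:
    """Flatten nested dicts into one level; lists are kept as JSON text (assumed policy)."""
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, prefixing child keys with the parent key.
            flat.update(flatten(value, full_key, sep))
        elif isinstance(value, list):
            # Serialize arrays so each value fits in a single column.
            flat[full_key] = json.dumps(value)
        else:
            flat[full_key] = value
    return flat


def iter_ndjson(stream: TextIO) -> Iterator[Dict[str, Any]]:
    """Yield one flattened record per non-empty line of an NDJSON stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield flatten(json.loads(line))


# Hypothetical sample data standing in for a MIMIC-IV FHIR NDJSON file.
sample = io.StringIO(
    '{"resourceType": "Patient", "id": "p1", "meta": {"profile": ["x"]}}\n'
    "\n"
    '{"resourceType": "Patient", "id": "p2", "name": {"family": "Doe"}}\n'
)
rows = list(iter_ndjson(sample))
```

Reading line by line keeps memory use flat for large files, since only one record is parsed at a time.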

2. Updates to collect.py:
The script collect.py now uses environment variables to fetch database credentials.

3. .gitignore Update:
Excluded the physionet.org directory in case users download the dataset inside the repository.

Environment Variable Setup:
To use the new scripts, set the following environment variables:

export DB_HOST=localhost
export DB_PORT=5432
export DB_NAME=mimiciv_fhir
export DB_USER=your_username
export DB_PASSWORD=your_password
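
On the Python side, these variables could be read along the following lines. This is a sketch under assumptions, not the actual code in collect.py: the function name `db_config_from_env` and the fail-fast check for missing variables are hypothetical, and the output keys are chosen to match the keyword arguments of `psycopg2.connect`, on the assumption that a psycopg2-style driver is used.

```python
import os
from typing import Dict, Mapping

# Maps each environment variable from the setup above to a connection keyword.
# The keyword names follow psycopg2.connect (assumed driver).
ENV_TO_PARAM = {
    "DB_HOST": "host",
    "DB_PORT": "port",
    "DB_NAME": "dbname",
    "DB_USER": "user",
    "DB_PASSWORD": "password",
}


def db_config_from_env(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build connection settings from the environment, failing early if any are unset."""
    missing = [name for name in ENV_TO_PARAM if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {param: env[name] for name, param in ENV_TO_PARAM.items()}


# Example with an explicit mapping instead of the real environment.
example_env = {
    "DB_HOST": "localhost",
    "DB_PORT": "5432",
    "DB_NAME": "mimiciv_fhir",
    "DB_USER": "your_username",
    "DB_PASSWORD": "your_password",
}
config = db_config_from_env(example_env)
# With psycopg2 installed, the result could be used as: psycopg2.connect(**config)
```

Failing early with the list of missing names gives a clearer error than a connection failure later in the run.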

Kiarash Sotoudeh and others added 3 commits October 30, 2024 05:15
- a new script to load NDJSON files into PostgreSQL
- updated database connection parameters to use environment variables
- added physionet.org data directory to .gitignore
@amrit110
Member

@zzadxz thanks for this PR! Great to see you think of ideas to improve the repo.

@zzadxz
Author

zzadxz commented Oct 30, 2024

@amrit110 Of course! More than happy to help :)

Successfully merging this pull request may close these issues.

before running collect.py
2 participants