IDEARs - Integrated Disease Explanation and Associations Risk Scoring

Applies to the UK Biobank (UKB) datasets: dementia, Alzheimer's disease (AD) and Parkinson's disease (PD) classification, with model interpretability using SHAP.

Overview

This is the codebase for IDEARs - Integrated Disease Explanation and Associations Risk Scoring. Its overall architecture is shown below.

To simplify configuration, install Anaconda and set up the project in a dedicated conda environment:

conda env create -f conda-env.yml

conda activate conda-env

Then start the platform locally without Docker: on Windows, run startlocal_woDocker.bat; on Linux, run startlocal_woDocker.sh.

data_gen.py performs ETL (extract, transform, load) on the data and creates the model datasets.
data_proc.py performs additional data processing, including creating the normalised datasets.
ml.py runs the models, including logistic regression and XGBoost, and provides model interpretability using SHAP.
analysis.py creates charts and performs additional statistical tests, including paired t-tests.
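The README does not show how the normalisation in data_proc.py or the paired t-tests in analysis.py are computed. The following is a minimal, standard-library-only Python sketch under stated assumptions: z-score standardisation is assumed as the normalisation, and the paired t-test is shown comparing matched scores of two models (for example, the same metric measured on the same cross-validation folds). The function names zscore and paired_t are hypothetical and are not taken from the codebase. In practice, scipy.stats.ttest_rel returns the same statistic together with a p-value.

```python
import math
import statistics


def zscore(values):
    """Standardise a column to mean 0 and sample standard deviation 1.

    Hypothetical helper: the README does not say which normalisation
    data_proc.py applies; z-scoring is assumed here for illustration.
    """
    mu = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("cannot standardise a constant column")
    return [(v - mu) / sd for v in values]


def paired_t(a, b):
    """Return (t statistic, degrees of freedom) for matched samples a and b.

    Hypothetical helper: tests whether the mean of the per-pair
    differences a[i] - b[i] is zero, e.g. per-fold scores of two
    models evaluated on the same folds.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD of the differences
    if sd_d == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean_d / (sd_d / math.sqrt(n))  # mean difference / standard error
    return t, n - 1
```

For example, with the toy values paired_t([5, 7, 9, 11], [4, 5, 6, 7]), the differences are 1, 2, 3 and 4, giving t = sqrt(15) ≈ 3.873 with 3 degrees of freedom. A paired test, rather than an independent two-sample test, fits when both models are scored on the same units, because it removes the shared per-unit variation.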

The Jupyter notebooks used for AD are:


This folder contains the implementation of the IDEARs platform.