Know2BIO

Know2BIO is a comprehensive biomedical knowledge graph benchmark harmonizing heterogeneous database sources.

Getting Started

Environment Setup

We recommend using Anaconda3 to manage the environment.

1. Install Anaconda3.
2. Edit env.yaml: set $USER_PATH to your user directory.
3. Create the know2bio environment with conda env create -f env.yaml.
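If you prefer to script the $USER_PATH edit above, the helper below is a minimal sketch. It assumes env.yaml contains the literal placeholder text $USER_PATH. The function names substitute_user_path and fill_user_path are hypothetical and are not part of the repository.

```python
from pathlib import Path

# Assumed placeholder text inside env.yaml (not verified against the repo).
PLACEHOLDER = "$USER_PATH"


def substitute_user_path(text: str, user_path: str) -> str:
    """Return text with every $USER_PATH placeholder replaced by user_path."""
    if PLACEHOLDER not in text:
        # Either the file was already edited or it uses a different placeholder.
        raise ValueError(f"{PLACEHOLDER} not found; check env.yaml manually")
    return text.replace(PLACEHOLDER, user_path)


def fill_user_path(env_file: str, user_path: str) -> None:
    """Rewrite env_file in place with the placeholder substituted."""
    path = Path(env_file)
    path.write_text(substitute_user_path(path.read_text(), user_path))
```

For example, run fill_user_path("env.yaml", str(Path.home())) from the repository root before conda env create -f env.yaml. Raising an error when the placeholder is missing avoids silently "succeeding" on a file that was already edited.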

Hardware Requirements

Server: AMD EPYC 7542 Processor (128 cores), 1.73 TB RAM, and 8 NVIDIA A100-SXM4-80GB GPUs.
Operating system: Ubuntu 20.04 LTS.

Benchmarking

Setup

Python environment: follow the guide in the Environment Setup section above.

Experiments

To run the experiments, execute the main.py script. Its arguments are listed below.

usage: main.py [-h] [--dataset {ontology,instance,whole,FB15K,WN,WN18RR,FB237,YAGO3-10}]
               [--model {TransE,TransR,DistMult,CP,MurE,RotE,RefE,AttE,RotH,RefH,AttH,ComplEx,RotatE}] [--regularizer {N3,F2}] [--reg REG]
               [--optimizer {Adagrad,Adam,SparseAdam}] [--max_epochs MAX_EPOCHS] [--patience PATIENCE] [--valid VALID] [--rank RANK] [--batch_size BATCH_SIZE] [--neg_sample_size NEG_SAMPLE_SIZE]
               [--init_size INIT_SIZE] [--learning_rate LEARNING_RATE]

Knowledge Graph Embedding

options:
  -h, --help            show this help message and exit
  --dataset {ontology,instance,whole}
                        Knowledge Graph dataset: ontology, instance, whole views
  --model {TransE,TransR,DistMult,CP,MurE,RotE,RefE,AttE,RotH,RefH,AttH,ComplEx,RotatE}
                        Knowledge Graph embedding model
  --optimizer {Adagrad,Adam,SparseAdam}
                        Optimizer
  --max_epochs MAX_EPOCHS
                        Maximum number of epochs to train for
  --patience PATIENCE   Number of epochs before early stopping
  --valid VALID         Number of epochs before validation
  --rank RANK           Embedding dimension
  --batch_size BATCH_SIZE
                        Batch size
  --neg_sample_size NEG_SAMPLE_SIZE
                        Negative sample size, -1 to not use negative sampling
  --dropout DROPOUT     Dropout rate
  --init_size INIT_SIZE
                        Initial embeddings' scale
  --learning_rate LEARNING_RATE
                        Learning rate
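For validating a configuration before a long GPU run, or for scripting sweeps, the following is a hypothetical argparse sketch that mirrors the documented options. It is not the repository's own parser. The choices are copied from the usage text, but the argument types (int vs. float) are inferred from the descriptions, the meaning of --reg is assumed, and no defaults are set because this README does not list them.

```python
import argparse

# Choices copied from the usage text; types are assumptions inferred from the
# option descriptions, and every default is left at argparse's None.
DATASETS = ["ontology", "instance", "whole", "FB15K", "WN", "WN18RR", "FB237", "YAGO3-10"]
MODELS = ["TransE", "TransR", "DistMult", "CP", "MurE", "RotE", "RefE", "AttE",
          "RotH", "RefH", "AttH", "ComplEx", "RotatE"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the options documented in the README."""
    parser = argparse.ArgumentParser(description="Knowledge Graph Embedding")
    parser.add_argument("--dataset", choices=DATASETS, help="Knowledge Graph dataset")
    parser.add_argument("--model", choices=MODELS, help="Knowledge Graph embedding model")
    parser.add_argument("--regularizer", choices=["N3", "F2"], help="Regularizer")
    parser.add_argument("--reg", type=float, help="Regularization strength (assumed)")
    parser.add_argument("--optimizer", choices=["Adagrad", "Adam", "SparseAdam"], help="Optimizer")
    parser.add_argument("--max_epochs", type=int, help="Maximum number of epochs to train for")
    parser.add_argument("--patience", type=int, help="Number of epochs before early stopping")
    parser.add_argument("--valid", type=int, help="Number of epochs before validation")
    parser.add_argument("--rank", type=int, help="Embedding dimension")
    parser.add_argument("--batch_size", type=int, help="Batch size")
    parser.add_argument("--neg_sample_size", type=int,
                        help="Negative sample size, -1 to not use negative sampling")
    parser.add_argument("--dropout", type=float, help="Dropout rate")
    parser.add_argument("--init_size", type=float, help="Initial embeddings' scale")
    parser.add_argument("--learning_rate", type=float, help="Learning rate")
    return parser
```

Passing a model name that is not in the list (for example --model TransH) makes parse_args exit with an error that lists the valid choices, which catches typos before any training starts.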

Example: train a TransE model on Know2BIO's whole view

CUDA_VISIBLE_DEVICES=0 python main.py --model TransE --dataset whole --valid 10 --patience 5 --rank 512 --neg_sample_size 150 --optimizer Adam --learning_rate 0.001
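To launch the example command above from Python, or to repeat it for several models, a minimal sketch like the one below rebuilds the same argument list. build_command and run are hypothetical helpers; they assume main.py is in the working directory and that python resolves to the know2bio environment.

```python
import os
import subprocess

# Settings copied from the TransE example command; the model is passed separately.
BASE_ARGS = {
    "dataset": "whole",
    "valid": 10,
    "patience": 5,
    "rank": 512,
    "neg_sample_size": 150,
    "optimizer": "Adam",
    "learning_rate": 0.001,
}


def build_command(model: str, **overrides) -> list[str]:
    """Return the argv list for one main.py run; overrides replace BASE_ARGS values."""
    args = {**BASE_ARGS, **overrides, "model": model}
    cmd = ["python", "main.py"]
    for key, value in args.items():
        cmd += [f"--{key}", str(value)]
    return cmd


def run(model: str, gpu: int = 0, **overrides) -> int:
    """Launch one run restricted to a single GPU and return its exit code."""
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": str(gpu)}
    return subprocess.run(build_command(model, **overrides), env=env).returncode
```

For example, looping over ["TransE", "RotatE", "ComplEx"] and calling run(model, gpu=0) trains the three models one after another on GPU 0, and a keyword such as rank=32 replaces the corresponding value from the example.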


Folders and files

Latest commit

History

Repository files navigation

Know2BIO

Getting Started

Environment Setup

Hardware Requirements

Benchmarking

Setup

Experiments

Dataset Construction

Dataset Schema

Data Source and Relationships

Usage and Datasheet

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages