Skip to content

Know2BIO: A Comprehensive Dual-View Benchmark for Evolving Biomedical Knowledge Graphs

License

Notifications You must be signed in to change notification settings

DylanSteinecke/Know2BIO

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation


Know2BIO

Know2BIO is a comprehensive biomedical knowledge graph benchmark harmonizing heterogeneous database sources.

Getting Started

Environment Setup

We recommend using Anaconda3 to manage the environment.

  • Install Anaconda3.
  • Edit env.yaml: set $USER_PATH to user's directory.
  • Create know2bio environment using conda env create -f env.yml.

Hardware Requirements

  • Server: AMD EPYC 7542 Processor (128 cores), 1.73 TB RAM, and 8 NVIDIA A100-SXM4-80GB GPUs.
  • Operating system: Ubuntu 20.04 LTS.

Benchmarking

Setup

  • Python environment: follow the guide in Environment Setup Section.

Experiments

  • To run the experiments, please execute main.py script. Arguments are listed below.
usage: run.py [-h] [--dataset {ontology,instance,whole,FB15K,WN,WN18RR,FB237,YAGO3-10}]
              [--model {TransE,TransR,DistMult,CP,MurE,RotE,RefE,AttE,RotH,RefH,AttH,ComplEx,RotatE}] [--regularizer {N3,F2}] [--reg REG]
              [--optimizer {Adagrad,Adam,SparseAdam}] [--max_epochs MAX_EPOCHS] [--patience PATIENCE] [--valid VALID] [--rank RANK] [--batch_size BATCH_SIZE] [--neg_sample_size NEG_SAMPLE_SIZE]
              [--init_size INIT_SIZE] [--learning_rate LEARNING_RATE]

Knowledge Graph Embedding

options:
  -h, --help            show this help message and exit
  --dataset {ontology,instance,whole}
                        Knowledge Graph dataset: ontology, instance, whole views
  --model {TransE,TransR,DistMult,CP,MurE,RotE,RefE,AttE,RotH,RefH,AttH,ComplEx,RotatE}
                        Knowledge Graph embedding model
  --optimizer {Adagrad,Adam,SparseAdam}
                        Optimizer
  --max_epochs MAX_EPOCHS
                        Maximum number of epochs to train for
  --patience PATIENCE   Number of epochs before early stopping
  --valid VALID         Number of epochs before validation
  --rank RANK           Embedding dimension
  --batch_size BATCH_SIZE
                        Batch size
  --neg_sample_size NEG_SAMPLE_SIZE
                        Negative sample size, -1 to not use negative sampling
  --dropout DROPOUT     Dropout rate
  --init_size INIT_SIZE
                        Initial embeddings' scale
  --learning_rate LEARNING_RATE
                        Learning rate
  • Example: Train TransE model on Know2BIO's whole view
CUDA_VISIBLE_DEVICES=0 python main.py --model TransE --dataset whole --valid 10 --patience 5 --rank 512 --neg_sample_size 150 --optimizer Adam --learning_rate 0.001

Code and README for the benchmarking Know2BIO can be found in benchmark.

Dataset Construction

Dataset Schema

Know2BIO Schema

Data Source and Relationships

Know2BIO Data Source

Usage and Datasheet

Code and README for the construction of Know2BIO can be found in dataset.

About

Know2BIO: A Comprehensive Dual-View Benchmark for Evolving Biomedical Knowledge Graphs

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 99.8%
  • Shell 0.2%