CogKR: Cognitive Graph for Multi-hop Knowledge Reasoning
Accepted to IEEE TKDE.
Under construction.
- Python 3
- PyTorch >= 1.1.0
- NVIDIA GPU + CUDA cuDNN
Clone this repo
git clone https://github.com/THUDM/CogKR
cd CogKR
Please install the dependencies by running
pip install -r requirements.txt
Then install pytorch_scatter 2.0.4 manually.
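For example, a minimal way to do this with pip (a sketch, not the only option: the PyPI package is named torch-scatter, and it may build against your local PyTorch/CUDA toolchain, so install PyTorch first; a prebuilt wheel matching your versions can be used instead if one is available):
pip install torch-scatter==2.0.4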
Three public datasets, FB15K-237, WN18RR, and YAGO3-10, are used for knowledge graph completion. The original datasets can be downloaded from their respective sources.
Two public datasets, NELL-One and Wiki-One (slightly modified), are used for one-shot link prediction. The original datasets can be downloaded from the One-shot Relational Learning repository. You can download the preprocessed datasets from the OneDrive link. If you're in a region where OneDrive is not available (e.g., Mainland China), try the Tsinghua Cloud link instead.
After downloading the dataset, please unzip it into the datasets/{dataset_name}/data
folder.
To use your own dataset, see the "Use your dataset" part below.
Preprocess the data by running
python src/preprocess.py --directory datasets/{dataset_name} --process_data --save_train
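For example, if the FB15K-237 data has been unzipped into datasets/FB15K-237/data (the folder name here is only an assumption; use whatever name you gave the dataset folder):
python src/preprocess.py --directory datasets/FB15K-237 --process_data --save_train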
For training, simply run
python src/main.py --directory datasets/{dataset_name} --gpu {gpu_id} --config {config_file} --comment {experiment_name}
- --directory specifies the path to the dataset.
- --gpu specifies the id of the GPU to use.
- --config specifies the configuration file with the experimental settings and hyperparameters. Configurations for the datasets used in the paper are stored under the configs/ folder.
- --comment specifies the name of the experiment.
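For example, a training run on NELL-One might look like the following (the config file name is an assumption; pick the matching file under configs/, and the comment is simply a label of your choice):
python src/main.py --directory datasets/NELL-One --gpu 0 --config configs/NELL-One.json --comment nell_one_baseline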
For evaluation, simply run
python src/main.py --inference --directory datasets/{dataset_name} --gpu {gpu_id} --config {config_file} --load_state {state_file}
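For example (the config name is again an assumption and the state file path is hypothetical; point --load_state at a checkpoint saved during training):
python src/main.py --inference --directory datasets/NELL-One --gpu 0 --config configs/NELL-One.json --load_state {path_to_saved_state}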
To use your own dataset, please put the files of the dataset under datasets/
in the following structure:
- {dataset_name}/data
    - train.txt
    - valid_support.txt
    - valid_eval.txt
    - test_support.txt
    - test_eval.txt
    - ent2id.txt (optional)
    - relation2id.txt (optional)
    - entity2vec.{embed_name} (optional)
    - relation2vec.{embed_name} (optional)
    - rel2candidates.json (optional)
For one-shot link prediction, train.txt contains the facts of the training relations, valid_support.txt and valid_eval.txt contain the support and evaluation facts of the validation relations, and test_support.txt and test_eval.txt contain the support and evaluation facts of the test relations. Each line is in the format {head}\t{relation}\t{tail}\n. For knowledge graph completion, train.txt, valid_eval.txt, and test_eval.txt should be the train, valid, and test sets, and valid_support.txt and test_support.txt should be empty.
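As a concrete illustration, the following sketch creates a hypothetical toy dataset in this format (all entity and relation names are made up):
mkdir -p datasets/toy/data
# one fact per line, in the format {head}\t{relation}\t{tail}
printf 'alice\tworks_for\tacme\nbob\tworks_for\tglobex\n' > datasets/toy/data/train.txt
printf 'carol\tlives_in\tparis\n' > datasets/toy/data/valid_support.txt
printf 'dave\tlives_in\tberlin\n' > datasets/toy/data/valid_eval.txt
printf 'erin\tborn_in\tparis\n' > datasets/toy/data/test_support.txt
printf 'frank\tborn_in\tberlin\n' > datasets/toy/data/test_eval.txt
# for knowledge graph completion, the two support files would instead be left empty:
# : > datasets/toy/data/valid_support.txt
# : > datasets/toy/data/test_support.txt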
ent2id.txt, relation2id.txt, entity2vec.{embed_name}, and relation2vec.{embed_name} are used for pretrained KG embeddings and are optional. Each line of ent2id.txt or relation2id.txt is the name of the entity or relation whose id is the line number (starting from 0). Each line of entity2vec.{embed_name} or relation2vec.{embed_name} is the vector of the entity or relation whose id is the line number.
rel2candidates.json lists the candidate entities of the validation and test relations. The file is only used for one-shot link prediction in our experiments.
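Continuing the hypothetical toy example above, the optional files could be produced as follows (the vectors are dummy values, {embed_name} is shown as TransE purely for illustration, and whitespace-separated vector components are an assumption):
# ids are the 0-based line numbers of these files
printf '%s\n' alice acme bob globex carol paris dave berlin erin frank > datasets/toy/data/ent2id.txt
printf '%s\n' works_for lives_in born_in > datasets/toy/data/relation2id.txt
# line i holds the pretrained vector of the entity/relation with id i (dummy 3-d vectors here)
awk '{print 0.1*NR, 0.2*NR, 0.3*NR}' datasets/toy/data/ent2id.txt > datasets/toy/data/entity2vec.TransE
awk '{print 0.1*NR, 0.2*NR, 0.3*NR}' datasets/toy/data/relation2id.txt > datasets/toy/data/relation2vec.TransE
# candidate entities for the validation and test relations
cat > datasets/toy/data/rel2candidates.json <<'EOF'
{"lives_in": ["paris", "berlin"], "born_in": ["paris", "berlin"]}
EOF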
Please cite our paper if you use the code or datasets in your own work:
@ARTICLE {9512424,
author = {Z. Du and C. Zhou and J. Yao and T. Tu and L. Cheng and H. Yang and J. Zhou and J. Tang},
journal = {IEEE Transactions on Knowledge and Data Engineering},
title = {CogKR: Cognitive Graph for Multi-hop Knowledge Reasoning},
year = {5555},
volume = {},
number = {01},
issn = {1558-2191},
pages = {1-1},
keywords = {cognition;task analysis;urban areas;training;computational modeling;benchmark testing;scalability},
doi = {10.1109/TKDE.2021.3104310},
publisher = {IEEE Computer Society},
address = {Los Alamitos, CA, USA},
month = {aug}
}