Claim Detection & Semantics Extraction (Covid-19)

Installation

Clone the repo.
Create a Python virtual environment.
Make sure that your current Java environment is Java 8.
- If the setup fails at the JAMR step, check that Java 8 is configured for the newly downloaded transition-amr-parser project.
Make sure CUDA is enabled if you are on a machine with a GPU.
Run make install [$isi_username]
- This assumes that your conda installation is within ~/miniconda3. If it is not, replace Line 27 of setup.sh with: source ~/PATH_TO_MINICONDA_INSTALL.
- If you provide isi_username, make install assumes that you can access the minlp-dev-01 server and that you are working from a local system; you will be prompted for a password after you see "Downloading model...". If you omit it, make install assumes that you are working from a /nas-mounted server.
You will also need to download and unzip this file into data/:
1. UIUC EDL data (param: edl.edl_output_dir): https://drive.google.com/file/d/16ANEPjqy4byNY3B2BmYqsu1ZcBlp9tfR/view?usp=sharing
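
The Java 8 requirement above can be checked before running make install. The following is a minimal sketch, assuming `java` is on your PATH; the helper names `parse_java_major` and `current_java_major` are hypothetical and not part of this repo. Note that `java -version` writes its report to stderr, and that Java 8 reports itself as `1.8`.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output.

    Java 8 and earlier report versions like "1.8.0_292";
    Java 9 and later report versions like "11.0.2".
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Legacy scheme: the real major version is the second component.
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def current_java_major() -> Optional[int]:
    """Return the major version of the `java` on PATH, or None if absent."""
    if shutil.which("java") is None:
        return None
    result = subprocess.run(
        ["java", "-version"], capture_output=True, text=True, check=False
    )
    # The version report goes to stderr; stdout is included defensively.
    return parse_java_major(result.stderr + result.stdout)


if __name__ == "__main__":
    major = current_java_major()
    if major == 8:
        print("Java 8 detected")
    else:
        print(f"Expected Java 8, found: {major}")
```

This only inspects the `java` found on PATH; if the transition-amr-parser setup uses a different Java configuration, check that one separately.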

Docker

These instructions assume that you are building the image on the SAGA cluster.

Clone the repo.
cd into cdse-covid and clone the following repos:
1. git clone https://github.com/isi-vista/aida-tools.git
2. git clone https://github.com/elizlee/amr-utils.git
3. git clone https://github.com/isi-vista/saga-tools.git
4. git clone https://github.com/IBM/transition-amr-parser.git
  1. Make sure that your transition-amr-parser checkout is up to date and on the master branch.
  2. cd to transition-amr-parser/preprocess and do the following:
    1. git clone https://github.com/jflanigan/jamr.git
    2. git clone https://github.com/damghani/AMR_Aligner.git
    3. mv AMR_Aligner kevin
    4. cd to transition-amr-parser/preprocess/kevin and run:
      1. git clone https://github.com/moses-smt/mgiza.git
5. Copy the following files from /scratch/dockermount/cdse_covid_resources:
  1. The Wikidata classifier: wikidata_classifier.state_dict --> cdse-covid/wikidata_linker/resources
  2. The AMR parser model: /scratch/dockermount/cdse_covid_resources/AMR2.0 --> transition-amr-parser/DATA
6. cd back into cdse-covid and build the image:
```
docker build . -t isi-cdse-covid:<tag>
```
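
Because the build depends on several repos and resource files being in place, it can help to verify the layout before running docker build. The following is a minimal sketch, assuming it is run from the cdse-covid directory; `EXPECTED_PATHS` and `missing_paths` are hypothetical names, and the `transition-amr-parser/DATA/AMR2.0` entry is an assumption about where the copied model ends up.

```python
from pathlib import Path
from typing import List

# Paths relative to the cdse-covid checkout, taken from the steps above.
# ASSUMPTION: copying AMR2.0 into transition-amr-parser/DATA yields DATA/AMR2.0.
EXPECTED_PATHS = [
    "aida-tools",
    "amr-utils",
    "saga-tools",
    "transition-amr-parser",
    "transition-amr-parser/preprocess/jamr",
    "transition-amr-parser/preprocess/kevin",
    "transition-amr-parser/preprocess/kevin/mgiza",
    "wikidata_linker/resources/wikidata_classifier.state_dict",
    "transition-amr-parser/DATA/AMR2.0",
]


def missing_paths(repo_root: Path) -> List[str]:
    """Return the expected paths that do not exist under repo_root."""
    return [rel for rel in EXPECTED_PATHS if not (repo_root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths(Path.cwd())
    if missing:
        print("Missing before docker build:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected repos and resources are present.")
```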

Usage

Via Pegasus WMS

Generate workflow

conda activate <cdse-covid-env>
python -m cdse_covid.pegasus_pipeline.claim_pipeline params/claim_detection.params

Navigate to the experiment directory specified in your params file, execute the workflow, and monitor its progress

bash setup.sh
pegasus-status PEGASUS/RUN/DIR -w 60

Via Shell Script

We provide a simple way to run the whole pipeline without needing Pegasus WMS.

Create a parameter file with your own values for the parameters in params/run_pipeline_params.params
Make sure that your cdse-covid conda environment is active.

Run

bash ./run_pipeline.sh your/params/file

Via Individual Scripts

Create the AMR files

Each file in TXT_FILES should contain one sentence per line.

conda activate transition-amr-parser
python -m cdse_covid.pegasus_pipeline.run_amr_parsing_all \
    --corpus TXT_FILES \
    --output AMR_FILES \
    --max-tokens MAX_TOKENS \
    --amr-parser-model TRANSITION_AMR_PARSER_PATH
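
If your input text is not yet in the one-sentence-per-line layout expected for TXT_FILES, a sketch like the following can write it out. It assumes you already have the sentences as a list of strings (sentence splitting itself is not shown), and `write_sentences` is a hypothetical helper, not part of this repo.

```python
from pathlib import Path
from typing import Iterable


def write_sentences(sentences: Iterable[str], out_path: Path) -> int:
    """Write one sentence per line and return the number of lines written."""
    count = 0
    with out_path.open("w", encoding="utf-8") as out:
        for sentence in sentences:
            # Collapse internal whitespace (including newlines) so that
            # each sentence occupies exactly one line.
            line = " ".join(sentence.split())
            if line:  # skip empty entries rather than emit blank lines
                out.write(line + "\n")
                count += 1
    return count
```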

Preprocessing

conda activate <cdse-covid-env>
python -m cdse_covid.pegasus_pipeline.ingesters.aida_txt_ingester \
    --corpus TXT_FILES --output SPACIFIED --spacy-model SPACY_PATH

EDL ingestion

conda activate <cdse-covid-env>
python -m cdse_covid.pegasus_pipeline.ingesters.edl_output_ingester \
    --edl-output EDL_OUTPUT --output EDL_MAPPING_FILE

Claim detection

conda activate <cdse-covid-env>
python -m cdse_covid.claim_detection.run_claim_detection \
    --input SPACIFIED \
    --patterns claim_detection/topics.json \
    --out CLAIMS_OUT \
    --spacy-model SPACY_PATH

Semantic extraction from AMR

conda activate transition-amr-parser
python -m cdse_covid.semantic_extraction.run_amr_parsing \
    --input CLAIMS_OUT \
    --output AMR_CLAIMS_OUT \
    --amr-parser-model TRANSITION_AMR_PARSER_PATH \
    --max-tokens MAX_TOKENS \
    --domain DOMAIN

Semantic extraction from SRL

conda activate <cdse-covid-env>
python -m cdse_covid.semantic_extraction.run_srl \
    --input AMR_CLAIMS_OUT \
    --output SRL_OUT \
    --spacy-model SPACY_PATH

Wikidata linking

conda activate <cdse-covid-env>
python -m cdse_covid.semantic_extraction.run_wikidata_linking \
    --claim-input CLAIMS_OUT \
    --srl-input SRL_OUT \
    --amr-input AMR_CLAIMS_OUT \
    --output WIKIDATA_OUT

Entity merging

conda activate <cdse-covid-env>
python -m cdse_covid.semantic_extraction.run_entity_merging \
    --edl EDL_MAPPING_FILE \
    --qnode-freebase QNODE_FREEBASE_MAPPING \
    --freebase-to-qnodes FREEBASE_TO_QNODES \
    --claims WIKIDATA_OUT \
    --output ENTITY_OUT \
    --include-contains

Postprocessing

conda activate <cdse-covid-env>
python -m cdse_covid.pegasus_pipeline.convert_claims_to_json \
    --input ENTITY_OUT \
    --output OUTPUT_FILE

Converting the JSON to AIF

conda activate <cdse-covid-env>
python -m cdse_covid.pegasus_pipeline.ingesters.claims_json_to_aif \
    --claims-json OUTPUT_FILE \
    --aif-dir AIF_OUTPUT_DIR
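
The individual scripts above must run in an order where each step's inputs already exist. The following sketch records, for each step, the conda environment and the placeholders it consumes and produces (taken from the commands above), and checks that ordering. The `Step` type and `out_of_order` helper are hypothetical, and external inputs such as TXT_FILES, SPACY_PATH, and EDL_OUTPUT are deliberately left out.

```python
from typing import List, NamedTuple, Tuple


class Step(NamedTuple):
    name: str
    env: str              # conda environment the command runs in
    consumes: List[str]   # placeholders produced by earlier steps
    produces: str         # placeholder this step writes


CDSE = "<cdse-covid-env>"
AMR = "transition-amr-parser"

# AMR_FILES is not passed to any later command shown in this section.
STEPS = [
    Step("Create the AMR files", AMR, [], "AMR_FILES"),
    Step("Preprocessing", CDSE, [], "SPACIFIED"),
    Step("EDL ingestion", CDSE, [], "EDL_MAPPING_FILE"),
    Step("Claim detection", CDSE, ["SPACIFIED"], "CLAIMS_OUT"),
    Step("Semantic extraction from AMR", AMR, ["CLAIMS_OUT"], "AMR_CLAIMS_OUT"),
    Step("Semantic extraction from SRL", CDSE, ["AMR_CLAIMS_OUT"], "SRL_OUT"),
    Step(
        "Wikidata linking",
        CDSE,
        ["CLAIMS_OUT", "SRL_OUT", "AMR_CLAIMS_OUT"],
        "WIKIDATA_OUT",
    ),
    Step("Entity merging", CDSE, ["EDL_MAPPING_FILE", "WIKIDATA_OUT"], "ENTITY_OUT"),
    Step("Postprocessing", CDSE, ["ENTITY_OUT"], "OUTPUT_FILE"),
    Step("Converting the JSON to AIF", CDSE, ["OUTPUT_FILE"], "AIF_OUTPUT_DIR"),
]


def out_of_order(steps: List[Step]) -> List[Tuple[str, str]]:
    """Return (step name, placeholder) pairs consumed before being produced."""
    produced = set()
    problems = []
    for step in steps:
        for needed in step.consumes:
            if needed not in produced:
                problems.append((step.name, needed))
        produced.add(step.produces)
    return problems


if __name__ == "__main__":
    for step in STEPS:
        print(f"[{step.env}] {step.name}: {step.consumes} -> {step.produces}")
    print("Ordering problems:", out_of_order(STEPS))
```

The listing also makes the environment switches visible: only the two AMR steps run in the transition-amr-parser environment.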

Contributing

Before pushing, run make precommit to run all precommit checks.
- You can also run these checks individually. See [Makefile](./Makefile) for a list of all commands.
After ensuring that all linting requirements are met, rebase the new branch against master.
Create a new PR and request review from at least one collaborator.
