Skip to content

Aurora-Network-Global/TMD

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

83 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SDG Classifier

The github for the sustainability project for the course Text Mining Domains at Vrije Universiteit Amsterdam (2021). Students: Myrthe Buckens, Dyon van der Ende and Eva den Uijl

data

In the data folder, you will find the following files:

  • original data file provided by Aurora Universities Alliance: SDG-1-17-basic-excl-target13.0.csv
  • preprocessed training data with 2 SDG's and negative class: SDG-training.csv
  • preprocessed test data with 2 SDG's and negative class: SDG-test.csv
  • preprocessed training data with all SDG's: SDG-training-all.csv
  • preprocessed test data with all SDG's: SDG-test-all.csv
  • file with predictions for baseline: SDG-predictions-baseline.csv
  • file with predictions for RoBERTa on title: SDG-predictions-title.csv
  • file with predictions for RoBERTa on abstract: SDG-predictions-abstract.csv

requirements

The needed requirements can be found in requirements and installed by running pip install requirements from your terminal.

code

In the code folder, you will find the following scripts, to be run with the specified arguments:

  • baseline system: baseline.py <test data> <output location>
  • preprocessing script: preprocessing.py <data set>
  • basic statistics on the data: basic_stats.py <data set>
  • RoBERTa classifier with abstracts: RoBERTa_classifier_abstract.py <train data> <test data> <output location>
  • RoBERTa classifier with titles: RoBERTa_classifier_title.py <train data> <test data> <output location>
  • evaluation script: evaluation.py <classifier predictions>

For running the mBERT models, we used google colaboration. It is possible to download this code on top of the page as .ipynb or .py, but for speed and possible hardware limitations, we advise you run the code online with the GPU from google. The code can be found by following the links below:

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%