Raw dataset #20
Comments
Same here.
Yes, there has been an issue with downloading the raw data using the script: some of the Wayback URLs have expired. The dataset at the URL mentioned at the top of this post is the same one that was used in our paper and was also shared with the BART authors. Please only use the entries that are listed here: https://github.com/EdinburghNLP/XSum/blob/master/XSum-Dataset/XSum-TRAINING-DEV-TEST-SPLIT-90-5-5.json
When I tried to get the XSum dataset from the URL you shared (http://kinloch.inf.ed.ac.uk/public/direct/XSUM-EMNLP18-Summary-Data-Original.tar.gz), I encountered the following error: Can you help me access the raw dataset?
Try replacing kinloch with bollin.
It works!
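For anyone scripting the download, the host swap suggested above can be sketched as follows. This is only a minimal illustration: the helper names `swap_host` and `download` are hypothetical, and it assumes the `bollin` host serves the archive under the same path as `kinloch`, which this thread suggests but does not guarantee.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlretrieve

ORIGINAL_URL = (
    "http://kinloch.inf.ed.ac.uk/public/direct/"
    "XSUM-EMNLP18-Summary-Data-Original.tar.gz"
)


def swap_host(url: str, old: str = "kinloch", new: str = "bollin") -> str:
    """Replace the first label of the hostname (e.g. kinloch -> bollin)."""
    parts = urlsplit(url)
    labels = parts.netloc.split(".")
    if labels[0] == old:
        labels[0] = new
    return urlunsplit(parts._replace(netloc=".".join(labels)))


def download(url: str, dest: str) -> str:
    """Fetch url to dest and return the local path (hypothetical helper)."""
    path, _headers = urlretrieve(url, dest)
    return path


# Assumption: same path on the other host, as suggested in the comment above.
MIRROR_URL = swap_host(ORIGINAL_URL)
```

Calling `download(MIRROR_URL, "XSUM-EMNLP18-Summary-Data-Original.tar.gz")` would then fetch the archive, provided the host is still reachable.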
I saw that the dataset is available at http://kinloch.inf.ed.ac.uk/public/direct/XSUM-EMNLP18-Summary-Data-Original.tar.gz (link given by @shashiongithub).
I tried to use this dataset to reproduce the BART results, but I couldn't. According to one of the BART authors on this issue (fairseq repository), this is because I'm using an already processed version of the dataset.
Is it possible to get a link to the raw dataset (with no post-processing of any kind)?