Paper-processing issues in the latest processing batch of papers #178
Here is the list of papers that @rreas processed, for continuing the analysis above. Access the reader for a specific paper at https://scholarphi.semanticscholar.org/?file=https://arxiv.org/pdf/.pdf&preset=demo
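To open many papers from the batch, the reader link can be generated from an arXiv ID. This is a minimal sketch, assuming the arXiv ID (for example `1908.00300v1`, which is mentioned in this issue) goes between `/pdf/` and `.pdf` in the template above. The helper name `build_reader_url` is hypothetical and not part of the ScholarPhi codebase.

```python
# Hypothetical helper: builds a ScholarPhi reader URL for an arXiv paper.
# Assumption: the arXiv ID (e.g., "1908.00300v1") goes between "/pdf/" and ".pdf".
READER_BASE = "https://scholarphi.semanticscholar.org/"


def build_reader_url(arxiv_id: str, preset: str = "demo") -> str:
    """Return the reader URL for an arXiv ID, following the template in this issue."""
    pdf_url = f"https://arxiv.org/pdf/{arxiv_id}.pdf"
    # The inner PDF URL is passed unencoded, matching the template as written.
    return f"{READER_BASE}?file={pdf_url}&preset={preset}"


if __name__ == "__main__":
    # Print reader links for a couple of the papers discussed in this issue.
    for paper_id in ["1908.00300v1", "1906.00414v2"]:
        print(build_reader_url(paper_id))
```

The inner URL is left unencoded to mirror the template exactly; if the reader ever rejects such links, percent-encoding the `file` value with `urllib.parse.quote` would be the first thing to try.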
One of the reasons citations resolve to the incorrect paper is that, in some papers, the … This also appears to be the cause of some failed resolutions for paper 1908.00300v1. I suspect that this format, which includes hyperlinks in reference entries, is common to the ACL reference style.
I looked into the papers that were missing symbols and identified two main reasons why symbols were missing:
@rreas recently processed approximately 100 recent machine learning papers. In this issue, I'm cataloging all of the issues I've seen in the processed papers I've inspected. Data quality has improved dramatically with recent changes. However, we still have quite a way to go. The list below should help us prioritize which changes to address first.
General
Complete failure (no citations, no symbols, no definitions, nothing):
1906.00414v2 (citations are mysteriously working now) (it seems that if there is no entity data, the "Loading citation data..." loading bar never disappears)
Massive entity:
An annotation erroneously spans across columns:
Annotation bounding box includes a clipping of a figure:
Annotation breaks across columns, and tooltip shows at the top of the second column:
Citations
Citation clicked in one column highlights in another column:
A large number of citations aren't detected:
A lot of citations resolve to the wrong paper:
On the initial load of a paper, the underlines don't appear once citations have been fetched:
Multiple citations to the same paper are grouped into one annotation:
Symbols
Symbol parsing is off (possibly an issue with macros?):
Very few / no symbols detected:
(It seems that, in particular, there's something going wrong with processing anything besides the citations in the NeurIPS papers. Maybe it's an issue with colorization of entities.)
Clicking on a symbol causes the interface to crash:
Display equation is not clickable:
LaTeX doesn't render for symbol:
All entities in one column are slightly vertically offset:
Symbol "e" isn't detected:
Symbol includes underscore that should be merged with the other symbols:
Sentences
Declutter doesn't work in algorithm (i.e., pseudocode) listings:
Declutter doesn't work in figure captions / subfigure captions:
Declutter doesn't work at very beginning of abstract:
Some other sentence missing from declutter:
Terms
Term span extends into words outside of it (e.g., a section header or citation):
False positives from an abbreviation that matches a common word:
Citation is marked as a term:
Term includes citation:
Definitions
No definitions found:
Very few definitions found (this is not a complete list. Many papers have only very few definitions found):
Definition starts in the middle of a word:
No definition shows when an underlined term/symbol is clicked:
Reviewed
(Two papers were chosen from each year in {2017, 2018, 2019} for each of ICML, NeurIPS, CVPR, and ACL, for a total of 24 papers, or perhaps a couple more.)
Examples of pretty good papers: