Entity Localization Bug: Symbol. pp_t subsymbols are grouped incorrectly in paper 1906.04604v1 #183

Open
2 tasks
andrewhead opened this issue Dec 30, 2020 · 0 comments
Labels
bad-entity-detection An issue or task related to an entity that was detected in the wrong place bug Something isn't working entity-localization An issue or task related to entity localization symbols An issue or task related to symbols

Assign this issue two labels, one for each of:

  1. The entity type (citation or symbols)
  2. The localization issue type (missing-entity-detection or bad-entity-detection)

Description: As you can see, the symbol pp_t is grouped into subsymbols as follows:

  • one for the first p
  • one for the second p and the subscript t (i.e., p_t)
  • one for the second p alone
  • one for the t subscript

image

URL (optional): If you run a local instance of the UI connected to the dev_symbol_failure schema in the database, you can see this behavior at the bottom of page 4 at http://localhost:3001/?file=https://arxiv.org/pdf/1906.04604v1.pdf&preset=demo.

How to fix (optional): The ideal behavior is that:

  • The two ps (pp) are grouped into a single symbol.
  • t is made a subscript of the grouped pp.

This will likely require modifying parse_equation.py so that, when merging identifiers in a row, it also merges the contents of subscript and superscript elements (i.e., 'msub', 'msup', 'msubsup') into the elements before them, provided the base element is mergeable (i.e., it is a letter, and the elements before it in the row are also letters). Test cases should include:

  • 'foo_{i}bar_{j}' -> '{foo}_i', '{bar}_j'
  • '22foo_i' -> '22', 'foo_i'
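
A rough sketch of the proposed merging rule, to make the test cases above concrete. This is not the code in parse_equation.py: the function name merge_row, the helpers, and the string output format (e.g. 'foo_i' rather than '{foo}_i') are all hypothetical, and it assumes the row arrives as namespace-free MathML in which each letter is its own 'mi' element and a number such as 22 is a single 'mn' element.

```python
import xml.etree.ElementTree as ET

# Script elements and the number of children each one needs (hypothetical table).
SCRIPT_ARITY = {"msub": 2, "msup": 2, "msubsup": 3}


def is_letter(node):
    """True for an <mi> element holding exactly one alphabetic character."""
    text = (node.text or "").strip()
    return node.tag == "mi" and len(text) == 1 and text.isalpha()


def text_of(node):
    """Concatenate all text inside a node, ignoring markup."""
    return "".join(node.itertext()).strip()


def merge_row(mathml):
    """Group the children of an <mrow> into symbols, returned as strings."""
    row = ET.fromstring(mathml)
    symbols = []
    letters = []  # run of adjacent letters that has not been emitted yet

    def flush():
        # Emit the pending letter run (if any) as one symbol.
        if letters:
            symbols.append("".join(letters))
            letters.clear()

    for node in row:
        arity = SCRIPT_ARITY.get(node.tag)
        if is_letter(node):
            letters.append(node.text.strip())
        elif arity and len(node) == arity and is_letter(node[0]):
            # Mergeable base: fold the preceding letters into the script's base.
            base = "".join(letters) + node[0].text.strip()
            letters.clear()
            if node.tag == "msub":
                symbols.append(f"{base}_{text_of(node[1])}")
            elif node.tag == "msup":
                symbols.append(f"{base}^{text_of(node[1])}")
            else:  # msubsup: children are base, subscript, superscript
                symbols.append(f"{base}_{text_of(node[1])}^{text_of(node[2])}")
        else:
            # Anything else (numbers, operators, ...) ends the letter run.
            flush()
            symbols.append(text_of(node))
    flush()
    return symbols


PP_T = "<mrow><mi>p</mi><msub><mi>p</mi><mi>t</mi></msub></mrow>"
FOO_BAR = (
    "<mrow><mi>f</mi><mi>o</mi><msub><mi>o</mi><mi>i</mi></msub>"
    "<mi>b</mi><mi>a</mi><msub><mi>r</mi><mi>j</mi></msub></mrow>"
)
NUM_FOO = (
    "<mrow><mn>22</mn><mi>f</mi><mi>o</mi>"
    "<msub><mi>o</mi><mi>i</mi></msub></mrow>"
)

print(merge_row(PP_T))     # ['pp_t']
print(merge_row(FOO_BAR))  # ['foo_i', 'bar_j']
print(merge_row(NUM_FOO))  # ['22', 'foo_i']
```

The sketch returns flat strings only to keep the expected groupings easy to compare; an actual fix would presumably keep working on the parsed elements and their character offsets.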
@andrewhead andrewhead added bug Something isn't working entity-localization An issue or task related to entity localization symbols An issue or task related to symbols bad-entity-detection An issue or task related to an entity that was detected in the wrong place labels Dec 30, 2020
@andrewhead andrewhead added this to the LaTeX Updates for Alpha milestone Dec 30, 2020