Entity Localization Bug: Symbol. pp_t subsymbols are grouped incorrectly in paper 1906.04604v1 #183
Labels
bad-entity-detection
An issue or task related to an entity that was detected in the wrong place
bug
Something isn't working
entity-localization
An issue or task related to entity localization
symbols
An issue or task related to symbols
Assign this issue two labels, one for each of:
* citation or symbols
* missing-entity-detection or bad-entity-detection
Description: As you can see, the symbol pp_t is grouped into subsymbols as follows:
* p
* p and the subscript t (i.e., p_t)
  * p alone
  * t subscript
URL (optional): If you run a local instance of the UI that connects to the dev_symbol_failure schema from the database, you can see this behavior at the bottom of page 4 at http://localhost:3001/?file=https://arxiv.org/pdf/1906.04604v1.pdf&preset=demo.
How to fix (optional): The ideal behavior is that:
* The p's (pp) are grouped into a single symbol.
* t is made into a subscript for the p's.
This will likely require modifying parse_equation.py so that, when merging identifiers in a row, it also merges subscript and superscript elements (i.e., 'msub', 'msup', 'msubsup') into the elements before them, if the base element is mergeable (i.e., if it's a letter, and the elements before it in the row are also letters). Test cases should include:
* '22foo_i' -> '22', 'foo_i'
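The merging rule proposed here could be sketched roughly as follows. This is not the code in parse_equation.py: the function names (group_symbols, render, is_letter) are hypothetical, and the input is assumed to be a simplified, namespace-free MathML row whose children are 'mi', 'mn', 'mo', 'msub', 'msup', or 'msubsup' elements.

```python
import xml.etree.ElementTree as ET

# Script elements whose base may be merged with the letters before it.
SCRIPT_TAGS = {"msub", "msup", "msubsup"}


def is_letter(node):
    """True for an <mi> element whose text is purely alphabetic."""
    return node.tag == "mi" and (node.text or "").isalpha()


def render(node):
    """Flatten an element to a string, e.g. <msub><mi>p</mi><mi>t</mi></msub> -> 'p_t'."""
    if node.tag == "msub":
        return f"{render(node[0])}_{render(node[1])}"
    if node.tag == "msup":
        return f"{render(node[0])}^{render(node[1])}"
    if node.tag == "msubsup":
        return f"{render(node[0])}_{render(node[1])}^{render(node[2])}"
    if len(node):  # container such as <mrow>: concatenate its children
        return "".join(render(child) for child in node)
    return node.text or ""


def group_symbols(mrow_xml):
    """Group the children of an <mrow> into symbol strings (hypothetical helper)."""
    row = ET.fromstring(mrow_xml)
    symbols = []
    run, run_kind = "", None  # pending run of merged tokens and its kind

    def flush():
        nonlocal run, run_kind
        if run:
            symbols.append(run)
        run, run_kind = "", None

    for child in row:
        if child.tag in SCRIPT_TAGS:
            # Merge the preceding letters into the script's base only when
            # the base is a letter and the pending run is made of letters.
            if not (is_letter(child[0]) and run_kind == "letter"):
                flush()
            symbols.append(run + render(child))
            run, run_kind = "", None
        elif is_letter(child) or child.tag == "mn":
            kind = "letter" if is_letter(child) else "number"
            if kind != run_kind:
                flush()
            run, run_kind = run + (child.text or ""), kind
        else:
            # Operators and anything else end the current run.
            flush()
            symbols.append(render(child))
    flush()
    return symbols


PP_T = "<mrow><mi>p</mi><msub><mi>p</mi><mi>t</mi></msub></mrow>"
FOO_I = "<mrow><mn>22</mn><mi>f</mi><mi>o</mi><msub><mi>o</mi><mi>i</mi></msub></mrow>"
print(group_symbols(PP_T))   # ['pp_t']
print(group_symbols(FOO_I))  # ['22', 'foo_i']
```

In this sketch a scripted element always closes the symbol it belongs to, so letters that follow a subscript start a new symbol. Whether that matches the behavior wanted in the real parser is an open question.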