Java-based content parser for TTS #976

Draft: wants to merge 4 commits into base: master

di72nn (Member) commented May 13, 2020

I drafted a content parser for TTS written in Java (based on the jsoup library).
The parser itself should be OK, but the code in general is very much a work in progress. I tried the parser on a couple of articles, and it seemed to work much like the current JS-based parser.

Main advantages:

  • Parsing is no longer coupled with WebView.
    That means the TTS service may be made to work completely by itself (just feed it the article ID).
    The UI can pull metadata from the service (currently it's the opposite).
  • I guess it is easier to maintain, since I know Java, but not so much JS.

Disadvantages:

  • The Java parser can't use computed styles (for detecting block elements). I don't think that makes any difference, but I wouldn't know since I'm not a web developer.
  • The XPath ranges produced by the Java parser are applicable only to pure article content: any post-processing in WebView (such as annotation highlights) may interfere with them. That shouldn't be a serious problem either. It actually applies to the JS parser too (though maybe not as much).
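
To illustrate the first disadvantage: without computed styles, block detection has to fall back on something like a fixed list of tag names. Below is a minimal sketch of that idea. The `BlockTags` helper and its tag list are hypothetical (not taken from this PR), and the list is deliberately incomplete; an element that is made block-level only through CSS would be missed by this approach.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not code from this PR: decides "block or inline"
// from the tag name alone, because computed styles are unavailable
// outside of a WebView.
final class BlockTags {
    // Assumed, incomplete list of tags that are block-level by default.
    private static final Set<String> BLOCK = Set.of(
            "address", "article", "aside", "blockquote", "dd", "div", "dl",
            "dt", "figcaption", "figure", "footer", "h1", "h2", "h3", "h4",
            "h5", "h6", "header", "hr", "li", "ol", "p", "pre", "section",
            "table", "ul");

    private BlockTags() {
    }

    // Case-insensitive lookup; null is treated as "not a block".
    static boolean isBlock(String tagName) {
        return tagName != null
                && BLOCK.contains(tagName.toLowerCase(Locale.ROOT));
    }
}
```

With this sketch, `BlockTags.isBlock("DIV")` is true and `BlockTags.isBlock("span")` is false, regardless of any stylesheet applied to the page.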

There is also XPath-range-based highlighting/focusing implemented, since the Java parser can't provide pixel-based offsets. As an optimization, the UI part could compute and store pixel-based offsets itself.
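
For readers unfamiliar with XPath-based addressing, here is a rough sketch of how a positional XPath for a text node can be built. It is an assumption-laden illustration, not the PR's implementation: it uses the JDK's `org.w3c.dom` API instead of jsoup to stay dependency-free, and the `XPathBuilder` name is hypothetical. A range would then be a pair of such paths plus character offsets.

```java
import org.w3c.dom.Node;

// Hypothetical sketch, not the PR's code: builds a positional XPath such as
// /div[1]/p[2]/text()[2] by walking from a node up to the document root.
final class XPathBuilder {
    private XPathBuilder() {
    }

    static String xpathOf(Node node) {
        StringBuilder path = new StringBuilder();
        for (Node n = node;
                n != null && n.getNodeType() != Node.DOCUMENT_NODE;
                n = n.getParentNode()) {
            // Text nodes are addressed as text(); elements by tag name.
            String step = n.getNodeType() == Node.TEXT_NODE
                    ? "text()"
                    : n.getNodeName();
            path.insert(0, "/" + step + "[" + indexAmongSiblings(n) + "]");
        }
        return path.toString();
    }

    // 1-based position among preceding siblings of the same type and name.
    private static int indexAmongSiblings(Node node) {
        int index = 1;
        for (Node s = node.getPreviousSibling();
                s != null;
                s = s.getPreviousSibling()) {
            if (s.getNodeType() == node.getNodeType()
                    && s.getNodeName().equals(node.getNodeName())) {
                index++;
            }
        }
        return index;
    }
}
```

Because the indexes are positional, any extra element or split text node inserted by WebView post-processing shifts them, which is the interference mentioned in the disadvantages.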

Although the parser itself is pretty much done, there's still a lot of work to do: the whole UI <-> service interaction should be redone. Preferably, that horrendous TtsData and co. logic should be moved to the service, and the UI should just pull all the TTS information from the service.
