Java-based content parser for TTS #976

di72nn · 2020-05-13T10:16:34Z

I drafted a content parser for TTS written in Java (based on the jsoup library).
The parser itself should be OK, but the code in general is very much a work in progress. I tried the parser on a couple of articles, and it seemed to work pretty much like the current JS-based parser.
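For readers unfamiliar with the approach, here is a minimal sketch of the kind of traversal such a parser performs: walk the DOM, accumulate text, and start a new chunk whenever a block-level element begins or ends. To stay dependency-free it uses the JDK's XML DOM on well-formed XHTML instead of jsoup. The tag list and the `BlockTextExtractor` name are assumptions for illustration, not code from this PR. With jsoup itself, `Element.isBlock()` and a `NodeVisitor` passed to `NodeTraversor.traverse` could play the same roles.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Sketch: one text chunk per block-level element, with "block" decided by tag
// name only (no computed styles). Uses the JDK's XML DOM on well-formed XHTML
// as a stand-in for jsoup so the example has no external dependencies.
final class BlockTextExtractor {
    // Assumed subset of block-level tags; not the PR's actual list.
    private static final Set<String> BLOCK_TAGS = Set.of(
            "p", "div", "section", "article", "h1", "h2", "h3", "h4", "h5", "h6",
            "li", "ul", "ol", "blockquote", "pre", "table", "tr");

    static List<String> extract(String xhtml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XHTML", e);
        }
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        walk(doc.getDocumentElement(), current, chunks);
        flush(current, chunks);
        return chunks;
    }

    private static void walk(Node node, StringBuilder current, List<String> chunks) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            current.append(node.getNodeValue());
            return;
        }
        boolean block = node.getNodeType() == Node.ELEMENT_NODE
                && BLOCK_TAGS.contains(node.getNodeName().toLowerCase(Locale.ROOT));
        if (block) flush(current, chunks); // a block element starts a new chunk
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, current, chunks);
        }
        if (block) flush(current, chunks); // and closes it when it ends
    }

    // Emits the accumulated text (whitespace collapsed) if it is not empty.
    private static void flush(StringBuilder current, List<String> chunks) {
        String text = current.toString().replaceAll("\\s+", " ").trim();
        if (!text.isEmpty()) chunks.add(text);
        current.setLength(0);
    }
}
```

For `<div><h1>Title</h1><p>First <b>bold</b> para.</p><p>Second.</p></div>` this yields three chunks: `Title`, `First bold para.`, and `Second.`; the inline `<b>` does not break the paragraph.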

Main advantages:

Parsing is no longer coupled with WebView.
That means the TTS service can be made to work entirely on its own (just feed it the article ID).
The UI can pull metadata from the service (currently it's the other way around).
It is probably easier to maintain, at least for me: I know Java, but not so much JS.
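The decoupling could look roughly like this hypothetical sketch: the service receives only an article ID, fetches the stored content, and parses it itself, while the UI reads state back through a getter. `ArticleContentSource`, `StandaloneTtsService`, and `TtsMetadata` are invented names, and the parser is injected as a plain function standing in for the jsoup-based one.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical source of stored article HTML (e.g. backed by the app's database).
interface ArticleContentSource {
    String getContent(long articleId);
}

// Hypothetical metadata snapshot the UI could pull from the service.
final class TtsMetadata {
    final long articleId;
    final int chunkCount;
    final int currentChunk;

    TtsMetadata(long articleId, int chunkCount, int currentChunk) {
        this.articleId = articleId;
        this.chunkCount = chunkCount;
        this.currentChunk = currentChunk;
    }
}

// Sketch of a TTS service that needs nothing but an article ID: it loads and
// parses the content itself, so no WebView is involved.
final class StandaloneTtsService {
    private final ArticleContentSource source;
    private final Function<String, List<String>> parser; // stand-in for the Java parser
    private long articleId = -1;
    private List<String> chunks = List.of();
    private int current = 0;

    StandaloneTtsService(ArticleContentSource source, Function<String, List<String>> parser) {
        this.source = source;
        this.parser = parser;
    }

    // Entry point for the UI (or anything else): just an ID.
    void play(long articleId) {
        this.articleId = articleId;
        this.chunks = parser.apply(source.getContent(articleId));
        this.current = 0;
        // A real service would now pass chunks.get(current) to the TTS engine.
    }

    // Moves to the next chunk; returns false once the article is finished.
    boolean next() {
        if (current + 1 >= chunks.size()) return false;
        current++;
        return true;
    }

    // The UI pulls state from the service rather than pushing it in.
    TtsMetadata getMetadata() {
        return new TtsMetadata(articleId, chunks.size(), current);
    }
}
```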

Disadvantages:

The Java parser can't use computed styles (used for detecting block elements). I don't think that makes any difference, but I can't be sure, since I'm not a web developer.
The XPath ranges produced by the Java parser apply only to the pure article content: any post-processing in WebView (such as annotation highlights) may interfere with them. That shouldn't be a serious problem either, and it actually applies to the JS parser too (though maybe not as much).
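To illustrate why WebView post-processing can break XPath ranges, here is a sketch that builds a positional XPath (such as `/div[1]/p[2]/text()[2]`) for a DOM node, counting preceding siblings of the same kind as XPath indices do. It shows the general technique under assumptions, not the PR's code: `XPathBuilder` is a hypothetical name, it uses the JDK DOM rather than jsoup, and it assumes the DOM has no adjacent text nodes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Sketch: build a positional XPath for an element or text node.
final class XPathBuilder {
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XHTML", e);
        }
    }

    // Walks up to the document node, prepending one location step per level.
    static String xpathOf(Node node) {
        StringBuilder path = new StringBuilder();
        for (Node n = node; n != null && n.getNodeType() != Node.DOCUMENT_NODE; n = n.getParentNode()) {
            path.insert(0, "/" + step(n));
        }
        return path.toString();
    }

    // One location step: "name[i]" for elements, "text()[i]" for text nodes.
    private static String step(Node n) {
        int index = 1;
        for (Node s = n.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
            if (sameKind(s, n)) index++;
        }
        String name = n.getNodeType() == Node.TEXT_NODE ? "text()" : n.getNodeName();
        return name + "[" + index + "]";
    }

    // Text nodes match text nodes; elements match elements with the same name.
    private static boolean sameKind(Node a, Node b) {
        if (a.getNodeType() != b.getNodeType()) return false;
        return a.getNodeType() != Node.ELEMENT_NODE || a.getNodeName().equals(b.getNodeName());
    }
}
```

For example, in `<div><p>a</p><p>b<i>c</i>d</p></div>` the text `d` is `/div[1]/p[2]/text()[2]`. If post-processing wraps `b` in a `<span>` (as an annotation highlight might), `d` becomes `/div[1]/p[2]/text()[1]`, so a stored range pointing at `text()[2]` no longer resolves.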

Highlighting/focusing based on XPath ranges is also implemented, since the Java parser can't provide pixel-based offsets. As an optimization, the UI could obtain and store pixel-based offsets on its own.
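One way to represent and resolve such a range, sketched under assumptions: a range is a start and an end text node, each addressed by XPath, plus character offsets; resolving it means looking up both nodes and collecting the text between them in document order. In the app the resolution would presumably happen inside the WebView (for example via the DOM's `document.evaluate`); this Java version using `javax.xml.xpath` only demonstrates the idea, and `XPathRange` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical range format: start/end text nodes given as XPaths, plus
// character offsets into those nodes (end offset exclusive).
final class XPathRange {
    final String startXPath;
    final int startOffset;
    final String endXPath;
    final int endOffset;

    XPathRange(String startXPath, int startOffset, String endXPath, int endOffset) {
        this.startXPath = startXPath;
        this.startOffset = startOffset;
        this.endXPath = endXPath;
        this.endOffset = endOffset;
    }

    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XHTML", e);
        }
    }

    // Returns the text covered by the range, walking text nodes in document order.
    String textIn(Document doc) {
        Node start = resolve(doc, startXPath);
        Node end = resolve(doc, endXPath);
        StringBuilder sb = new StringBuilder();
        boolean inside = false;
        for (Node n = doc.getDocumentElement(); n != null; n = nextInDocumentOrder(n)) {
            if (n.getNodeType() != Node.TEXT_NODE) continue;
            if (n == start) inside = true;
            if (!inside) continue;
            String value = n.getNodeValue();
            int from = n == start ? startOffset : 0;
            int to = n == end ? endOffset : value.length();
            sb.append(value, from, to);
            if (n == end) break;
        }
        return sb.toString();
    }

    private static Node resolve(Document doc, String xpath) {
        Node node;
        try {
            node = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODE);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + xpath, e);
        }
        if (node == null) throw new IllegalStateException("stale range: " + xpath);
        return node;
    }

    // Pre-order successor: first child, else the next sibling of the node or of an ancestor.
    private static Node nextInDocumentOrder(Node n) {
        if (n.getFirstChild() != null) return n.getFirstChild();
        for (Node m = n; m != null; m = m.getParentNode()) {
            if (m.getNextSibling() != null) return m.getNextSibling();
        }
        return null;
    }
}
```

For `<div><p>Hello <b>brave</b> new world</p></div>`, the range from (`/div[1]/p[1]/text()[1]`, 2) to (`/div[1]/p[1]/text()[2]`, 4) covers `llo brave new`.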

Although the parser itself is pretty much done, there's still a lot of work to do: the whole UI <-> service interaction should be redone. Preferably, the horrendous TtsData-and-co. logic should be moved to the service, and the UI should just pull all TTS information from the service.
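A hypothetical sketch of the pull model described here: the service owns all TTS state as an immutable snapshot, and listeners are only told that something changed, after which the UI reads whatever it needs through `getState()`. `TtsStateSnapshot` and `TtsStateHolder` are invented for illustration; they are not the existing `TtsData` classes.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical immutable snapshot of TTS progress that the UI can pull at any time.
final class TtsStateSnapshot {
    final long articleId;
    final int chunkIndex;
    final int chunkCount;
    final boolean speaking;

    TtsStateSnapshot(long articleId, int chunkIndex, int chunkCount, boolean speaking) {
        this.articleId = articleId;
        this.chunkIndex = chunkIndex;
        this.chunkCount = chunkCount;
        this.speaking = speaking;
    }
}

// Sketch: the service is the single owner of TTS state. Listeners are only told
// that the state changed; they pull the details through getState().
final class TtsStateHolder {
    private final List<Runnable> listeners = new CopyOnWriteArrayList<>();
    private volatile TtsStateSnapshot state = new TtsStateSnapshot(-1, 0, 0, false);

    TtsStateSnapshot getState() {
        return state;
    }

    void addListener(Runnable listener) {
        listeners.add(listener);
    }

    void removeListener(Runnable listener) {
        listeners.remove(listener);
    }

    // Called from the service's own TTS callbacks, never by the UI.
    void update(TtsStateSnapshot newState) {
        state = newState;
        for (Runnable listener : listeners) {
            listener.run();
        }
    }
}
```

Notifying without a payload keeps a single source of truth in the service: the UI never holds state that could drift from what the service is actually doing.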

di72nn added 4 commits May 13, 2020 13:44

TTS: minor parsing improvements (b33a4fb)
TTS: split sentences on ellipsis (41c8c41)
WIP: ranges (97d7a51)
WIP: java parser (3931fec)
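The ellipsis commit suggests sentence splitting that treats an ellipsis as a sentence end. Below is a minimal regex-based sketch of that rule, assuming splitting happens on whitespace after `.`, `!`, `?`, or the single-character ellipsis `…` (U+2026); a three-dot `...` is already covered because it ends in `.`. `SentenceSplitter` is a hypothetical name, and the PR's actual rule may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Sketch: split text into sentences at whitespace that follows a sentence-ending
// character, where the single-character ellipsis (U+2026) also counts as one.
final class SentenceSplitter {
    private static final Pattern BOUNDARY = Pattern.compile("(?<=[.!?\u2026])\\s+");

    static List<String> split(String text) {
        List<String> out = new ArrayList<>();
        for (String s : BOUNDARY.split(text.trim())) {
            if (!s.isEmpty()) out.add(s);
        }
        return out;
    }
}
```

`split("Wait… what? Yes... really. OK")` yields five sentences. A naive rule like this also splits after abbreviations such as "e.g.", which a real implementation would need to handle.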

di72nn mentioned this pull request Oct 12, 2020

Don't steal focus when text-to-speech (TTS) advances to next article #1077 (Open)
